Investigating CrystalDiskMark

mikeblas · Jul 9, 2009

I've noticed a couple of threads using Crystal Disk Mark to benchmark drives. I hadn't heard of this tool before and was curious about its implementation.

Interestingly, the threads I saw were pasting digital photos of the monitor where the application was running. It's easy enough to take a screenshot with Alt+PrintScreen to capture the benchmark results, but even that isn't exactly a clear representation of them. The program features a "paste" command in its "Edit" menu that copies the results, formatted as plain text, to the clipboard.

I ran the program against a single 300 gig Hewlett Packard 10K RPM SAS drive (HP Part 492620-B21) attached to one of the servers I have at work. All of the tests I did were with Win64; in this case, the server is an HP ProLiant rig with a StorageArray 400 backplane running Windows 2003 Server R2 x64. It has 32 gigs of memory.

Code:

--------------------------------------------------
CrystalDiskMark 2.2 (C) 2007-2008 hiyohiyo
      Crystal Dew World : http://crystalmark.info/
--------------------------------------------------

   Sequential Read :  536.489 MB/s
  Sequential Write :  315.936 MB/s
 Random Read 512KB :  522.720 MB/s
Random Write 512KB :  310.560 MB/s
   Random Read 4KB :   91.374 MB/s
  Random Write 4KB :   73.389 MB/s

         Test Size : 100 MB
              Date : 2009/07/09 7:46:08

I spent some time reviewing the source code, and found some interesting anomalies. One of the most important is that the program uses FILE_FLAG_NO_BUFFERING when creating the file handle that it reads from or writes to, but does not provide an aligned buffer to the ReadFile() or WriteFile() calls subsequently used against the handle. This means that the driver must still do some buffering, which enables it to do caching, which will alter the results.
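
For reference, the documented contract for FILE_FLAG_NO_BUFFERING is that the buffer address, the transfer length, and the file offset all be multiples of the volume's sector size. Below is a minimal, portable sketch of what checking and satisfying that contract could look like. The names IsMultipleOf, IsValidUnbufferedRequest, and SectorAlignedBuffer are made up for illustration and do not come from the CrystalDiskMark source. Real Windows code would query the volume for its sector size and would more likely get its buffer from VirtualAlloc, which returns page-aligned memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// True when value is a multiple of alignment.
// alignment must be a nonzero power of two (sector sizes are).
inline bool IsMultipleOf(std::uint64_t value, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value & (alignment - 1)) == 0;
}

// Checks the three conditions for an unbuffered transfer:
// aligned buffer address, aligned length, aligned file offset.
inline bool IsValidUnbufferedRequest(const void* buffer, std::size_t length,
                                     std::uint64_t fileOffset,
                                     std::size_t sectorSize) {
    return IsMultipleOf(reinterpret_cast<std::uintptr_t>(buffer), sectorSize)
        && IsMultipleOf(length, sectorSize)
        && IsMultipleOf(fileOffset, sectorSize);
}

// Hypothetical helper: over-allocates by one sector, then uses std::align
// to find a sector-aligned region of the requested length inside it.
class SectorAlignedBuffer {
public:
    SectorAlignedBuffer(std::size_t length, std::size_t sectorSize)
        : storage_(length + sectorSize), length_(length) {
        void* p = storage_.data();
        std::size_t space = storage_.size();
        data_ = static_cast<unsigned char*>(
            std::align(sectorSize, length, p, space));
        assert(data_ != nullptr);
    }
    // Copying would leave data_ pointing into the other object's storage.
    SectorAlignedBuffer(const SectorAlignedBuffer&) = delete;
    SectorAlignedBuffer& operator=(const SectorAlignedBuffer&) = delete;

    unsigned char* data() { return data_; }
    std::size_t size() const { return length_; }

private:
    std::vector<unsigned char> storage_;
    unsigned char* data_;
    std::size_t length_;
};
```

A benchmark could assert IsValidUnbufferedRequest() before each transfer so that a misaligned buffer shows up as a bug instead of quietly changing what gets measured.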

I also ran the program against an Intel X25-E 64 gig drive on my desktop machine. That rig is a Core i7 945 box running Windows Vista 64 with an EVGA SLI motherboard and 12 gigs of memory, and the drive is attached to a SATA port on the motherboard. These are the results:

Code:

Intel X25-E SSD (64 gigs)

--------------------------------------------------
CrystalDiskMark 2.2 (C) 2007-2008 hiyohiyo
      Crystal Dew World : http://crystalmark.info/
--------------------------------------------------

   Sequential Read :  230.891 MB/s
  Sequential Write :   77.822 MB/s
 Random Read 512KB :  163.622 MB/s
Random Write 512KB :   78.434 MB/s
   Random Read 4KB :   17.134 MB/s
  Random Write 4KB :   42.133 MB/s

         Test Size : 100 MB
              Date : 2009/07/09 8:11:07

The SAS drive handily out-performs the SSD drive. Right now, the SSD drive costs about the same as the spinning HP drive, though the HP drive is about five times cheaper when measuring cost per gigabyte. It's also about twice as cheap when measuring cost per IO operation per second.

The HP server has a fiber channel card connected to an external RAID chassis which hosts sixteen Seagate ST373455SS SAS drives in RAID10. There are only 14 actual drives in the array, and 2 drives are hot spares. I was surprised at the poor performance of the array:

Code:

--------------------------------------------------
CrystalDiskMark 2.2 (C) 2007-2008 hiyohiyo
      Crystal Dew World : http://crystalmark.info/
--------------------------------------------------

   Sequential Read :  189.025 MB/s
  Sequential Write :  100.712 MB/s
 Random Read 512KB :  183.226 MB/s
Random Write 512KB :  101.221 MB/s
   Random Read 4KB :   26.228 MB/s
  Random Write 4KB :   10.310 MB/s

         Test Size : 100 MB
              Date : 2009/07/09 7:50:49

I was able to reproduce the poor performance of this setup using a different machine with a similarly configured array. However, when doing so, I was surprised to find that the access pattern of the test seems to be off. When running the test, I only saw five disk drive lights in the array flickering. I would have expected the accesses to be spread across all drives in the array, making all the lights flicker. (It's easy to verify this happens by copying a file to the array--all drives are pretty equally active.)

The stripe size on the array is 256 kilobytes, so my theory is that the tool doesn't appropriately generate random numbers which access all the drives in the broadly-striped array.
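
One way to test that theory would be to log the offsets the tool generates and map each one to a stripe member. The sketch below assumes plain round-robin striping that starts at member 0, which is a simplification: the real layout depends on the controller and on where the file system places the test file. StripeMemberFor and MembersTouched are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Maps a byte offset within a striped volume to the index of the stripe
// member (a drive, or a mirror pair in RAID10) that holds it.
// Assumes simple round-robin striping beginning at member 0.
inline std::size_t StripeMemberFor(std::uint64_t offset,
                                   std::uint64_t stripeSize,
                                   std::size_t memberCount) {
    assert(stripeSize > 0 && memberCount > 0);
    return static_cast<std::size_t>((offset / stripeSize) % memberCount);
}

// Returns the set of stripe members hit by a list of access offsets.
inline std::set<std::size_t> MembersTouched(
        const std::vector<std::uint64_t>& offsets,
        std::uint64_t stripeSize,
        std::size_t memberCount) {
    std::set<std::size_t> touched;
    for (std::uint64_t offset : offsets) {
        touched.insert(StripeMemberFor(offset, stripeSize, memberCount));
    }
    return touched;
}
```

With a 256 KB stripe, a 100 MB test file spans 400 stripe units, so offsets spread uniformly over the file should reach every member of a 14-drive RAID10 (seven mirror pairs under the assumption above). If the logged offsets map to only a few members, that would support the theory; if they map to all of them, the idle lights need another explanation.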

I'm also a bit concerned with the way that the tool initializes its test file, as it seems to create it very quickly. Since most people (me, too) run the tests as administrator, it's possible that the file system creates the file and extends it without filling it, so that the first writes to the file actually cause that fill. This leads to another concern with the tests: the order of the tests matters, because nothing is done to flush the cache or reset the file between them.

Has anyone else investigated this benchmark?

nitrobass24 · Jul 9, 2009

So you are saying that a single 2.5in 10k SAS disk is doing 500+ MB/s sequential reads, and 300+ MB/s sequential writes?
Something seems a bit off... and I think it's the cache on the controller.

Your test size is 100MB... change that to 1000MB, and then make sure write caching and read-ahead are turned off on the controller.

How about you take that same drive, and those arrays, and bench them with HD Tach and IOmeter as well, so we can compare their results to CDM?

mikeblas · Jul 9, 2009

Yes, I'm going to collect and post HD Tach results, too. I've been reading the Crystal source for now, though ...

AreEss · Jul 9, 2009

I'm just going to come right out and say the facts:
Crystal DiskMark is utter garbage.
Those numbers are complete bull. Especially the SAS.
Your RAID is configured wrong; align file system to segment size.

extide · Jul 9, 2009

LOL @ Mechanical disk getting nearly 100MB a sec on 4k random reads. That's OBVIOUSLY wrong. Put that X25-E on that same SAS controller and see what it gets. Seems like you are benchmarking the controller cache, and not the drive...

EDIT: FWIW here is my 160GB VelociRaptor at work
--------------------------------------------------
CrystalDiskMark 2.2 (C) 2007-2008 hiyohiyo
Crystal Dew World : http://crystalmark.info/
--------------------------------------------------

Sequential Read : 97.891 MB/s
Sequential Write : 102.550 MB/s
Random Read 512KB : 48.171 MB/s
Random Write 512KB : 75.113 MB/s
Random Read 4KB : 0.785 MB/s
Random Write 4KB : 2.227 MB/s

Test Size : 100 MB
Date : 2009/07/09 11:12:18

And here is a screenie of my SSD at home:

Annihilation.

And unlike your benchmarks there, these numbers agree with what everyone else gets. Both of the above were run on ICH10R in AHCI mode.

mikeblas · Jul 9, 2009

extide said:
Seems like you are benchmarking the controller cache, and not the drive...

Actually, I'm investigating the tool, first. How can I measure anything meaningful without first validating the tool that I'm using to measure? Indeed, there's more caching on the server's controller. But they're also running completely different operating systems.

The sequential test is calling WriteFile() in a loop

Code:

WaitFlag = TRUE;
SetTimer(((CDiskMarkDlg*)dlg)->GetSafeHwnd(), TIMER_ID, DISK_TEST_TIME, TimerProc);
start = timeGetTime();
do{
        for(i = 0; i < Loop; i++)
        {
                result = WriteFile(hFile, buf, BufSize, &writeSize, NULL);
                if(result)
                {
                        count++;
                }
                else
                {
                        FlushFileBuffers(hFile);
                        CloseHandle(hFile);
                        AfxMessageBox(((CDiskMarkDlg*)dlg)->m_MesDiskWriteError);
                        ((CDiskMarkDlg*)dlg)->m_DiskBenchStatus = FALSE;
                        return ;
                }
        }
        SetFilePointer(hFile, 0, NULL, FILE_BEGIN);
}while(WaitFlag);

to do blocking I/O against the file. It writes the same buffer again and again. The buffer is a megabyte in size. The docs say that the API doesn't do partial writes, but not checking this in the code seems a bit iffy to me.
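
A rough sketch of two changes that would address those concerns: count a write only when the reported byte count matches the request, and vary the buffer contents between writes. WriteFn is a stand-in for a WriteFile-style call so that the example stays portable; WriteChecked and RefillBuffer are hypothetical names and are not part of the tool's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Stand-in for a WriteFile-style call: returns false on failure and reports
// the number of bytes actually written through bytesWritten.
using WriteFn = std::function<bool(const unsigned char* data,
                                   std::uint32_t size,
                                   std::uint32_t* bytesWritten)>;

// Treats a write as successful only if the call succeeded AND every
// requested byte was reported as written.
inline bool WriteChecked(const WriteFn& write, const unsigned char* data,
                         std::uint32_t size) {
    std::uint32_t written = 0;
    if (!write(data, size, &written)) {
        return false;
    }
    return written == size;
}

// Overwrites the buffer with fresh pseudo-random bytes so that consecutive
// writes do not carry identical data.
inline void RefillBuffer(std::vector<unsigned char>& buffer,
                         std::mt19937_64& rng) {
    assert(!buffer.empty());
    std::size_t i = 0;
    while (i < buffer.size()) {
        std::uint64_t word = rng();
        // Spread the 8 bytes of each random word across the buffer.
        for (int b = 0; b < 8 && i < buffer.size(); ++b, ++i) {
            buffer[i] = static_cast<unsigned char>(word >> (8 * b));
        }
    }
}
```

Refilling inside the timed loop would add CPU time to the measurement, so a real benchmark would probably prepare several distinct buffers before starting the timer and rotate through them.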

nitrobass24 · Jul 9, 2009

Investigating how the tool works is great...but if the test you are using are invalid then so will your conclusions about the tool.

No one is saying you should not investigate the tool....but you are not exactly comparing apples to apples in your test above.

You essentially benched a RamDrive and an SSD (not to mention they were not even on the same system or chipset).

If you want to really find out how something works you need to use a valid test.

unhappy_mage · Jul 9, 2009

nitrobass24 said:
Investigating how the tool works is great...but if the test you are using are invalid then so will your conclusions about the tool.

But if the tool is invalid, it'll never produce valid results, regardless of hardware. The code isn't written in a way that produces interesting or correct results.

Suppose someone used a thermometer to measure top speed of a vehicle, and found that a motorcycle could go 30 mph and a Civic could go 500. You'd complain about the testing methodology before you even looked at the results. It's not a great analogy (CDM seems to at least be interacting with the hard drive in some way) but the premise is the same.

extide · Jul 9, 2009

The thing is the results from that test (especially the 4K random read/write) do have a pretty strong correlation to real-world performance, at least in a desktop machine. Things like database access patterns are usually a little more intelligent and tend to try to do larger sequential reads and writes.

nitrobass24 · Jul 9, 2009

unhappy_mage said:
But if the tool is invalid, it'll never produce valid results, regardless of hardware. The code isn't written in a way that produces interesting or correct results.

Suppose someone used a thermometer to measure top speed of a vehicle, and found that a motorcycle could go 30 mph and a Civic could go 500. You'd complain about the testing methodology before you even looked at the results. It's not a great analogy (CDM seems to at least be interacting with the hard drive in some way) but the premise is the same.

I'm not disagreeing with that.
My point is on the *SAS* tests he didn't really test the drive. He tested the cache on the controller.

mikeblas · Jul 9, 2009

nitrobass24 said:
but you are not exactly comparing apples to apples in your test above.

I haven't made any comparisons yet; that'll have to wait until I have time to run HDTach and IOmeter.

But I might not bother: CrystalDiskMark seems fundamentally flawed in that it always issues synchronous I/O to the device under test. As a result, it never queues up more than one operation, which means that larger-scale devices (such as enterprise drives or advanced RAID controllers) are never going to get a chance to reach their potential, particularly in the random-access tests. Even commodity-class devices will perform at least a bit better with a deeper queue.
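
The effect of queue depth can be put into a back-of-the-envelope model: with a fixed per-request latency, throughput is at most queue depth times block size divided by latency. The sketch below is an idealized upper bound that assumes the device can service all queued requests in parallel, which a single spindle cannot fully do (it gains from reordering, not parallelism). The function names are hypothetical and the latency figures used with them are illustrative assumptions, not measurements.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on throughput, in bytes per second, when queueDepth requests
// of blockBytes each are kept in flight and every request takes
// latencySeconds. Assumes perfect overlap of the queued requests.
inline double ThroughputBytesPerSec(std::size_t queueDepth,
                                    std::size_t blockBytes,
                                    double latencySeconds) {
    assert(latencySeconds > 0.0);
    return static_cast<double>(queueDepth)
         * static_cast<double>(blockBytes) / latencySeconds;
}

// Per-request latency, in seconds, implied by a throughput measured with
// only one request outstanding (queue depth 1).
inline double ImpliedLatencySeconds(double bytesPerSec,
                                    std::size_t blockBytes) {
    assert(bytesPerSec > 0.0);
    return static_cast<double>(blockBytes) / bytesPerSec;
}
```

Assuming a 7 ms access time for a mechanical drive, 4 KB requests at queue depth 1 top out near 0.59 MB/s, and sixteen outstanding requests raise the ceiling sixteenfold in this model. Running it the other way, the 91.374 MB/s that the tool reported for 4 KB random reads on the single SAS drive implies roughly 45 microseconds per request (taking MB as 10^6 bytes, which is an assumption about how the tool reports), while a 10K RPM platter needs 6 ms for one revolution.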

nitrobass24 · Jul 9, 2009

mikeblas said:
I haven't made any comparisons yet; snip.

mikeblas said:
The SAS drive handily out-performs the SSD drive. Right now, the SSD drive costs about the same as the spinning HP drive, though the HP drive is about five times cheaper when measuring cost per gigabyte. It's also about twice as cheap when measuring cost per IO operation per second.

Really you havent??

mikeblas · Jul 9, 2009

nitrobass24 said:
Really you havent??

I'm here to investigate the tool, not the relative performance of the drives. Having a gut feel for the quantitative performance of these different systems makes them a known. By using a new measurement to measure something that's known, I learn something about the new measurement.

Would it help you understand my goal better if I deleted that observation from the original post?

AreEss · Jul 9, 2009

mikeblas said:
Actually, I'm investigating the tool, first. How can I measure anything meaningful without first validating the tool that I'm using to measure? Indeed, there's more caching on the server's controller. But they're also running completely different operating systems.

The sequential test is calling WriteFile() in a loop

BAHAHHAHAHAHAHAHHAHAHAHAHAHAHAHAHAHAHHAHAHAHAHAHHAHAHAHAHAHAHHAHA
*deep breath*
AHAHAHHAHAHAHAHAHAHAHAHAHHAAHHAAHAHAHAHAHAHHAHAHAHAHAHAHAHAHAHAHAHAHAHAHHAHAHAHAHA

Yeah. That's just the height of incompetence right there. It's sitting in a reused OS buffer so you get an insane CPU cache hit rate, which then gets passed onto the disk cache, which if the algorithm is smart - like Hitachi - will discard the additional writes and instead mark down to write a duplicate of what's already in cache, since it's the same write request!

BLOODY BRILLIANT!

mikeblas · Jul 9, 2009

I spent some time adding versions of the random tests that do overlapped I/O. They keep a maximum of sixteen I/Os pending for the duration of the test run. The green "512K" and "4K" buttons are the existing synchronous code, and the "512KB" and "4KB" links show the rates from the overlapped code.

The top image is for my RAID1 SAS array at home, and the bottom image is the RAID5 SATA array on the same rig. Giving the OS, the controller, and the drives the ability to work through the random queue intelligently results in a substantial improvement over the numbers previously reported by the tool.

extide · Jul 9, 2009

I dont really think it's necessary, I mean the numbers (in stock form) are still an indicator of real world performance, AND also repeatable and comparable to other results obtained with the same tool. It might even be better to get rid of the units (MB/s) and just give a number.

mikeblas · Jul 9, 2009

They might be indicators of real world performance, but I think they're neither accurate nor reliable indicators of in situ performance.

unhappy_mage · Jul 10, 2009

nitrobass24 said:
I'm not disagreeing with that.
My point is on the *SAS* tests he didn't really test the drive. He tested the cache on the controller.

But why was the cache involved in the test? A good benchmark wouldn't fit in cache (unless that's what your final workload will look like, I guess) so the way the program was written has an effect on the outcome of the tests. The point is that the benchmark is poorly written, so arguing about the results is irrelevant.

extide · Jul 10, 2009

You can change the benchmark size by using the drop down there.... (On most desktop configurations a 100MB test size is large enough to not fit in cache, but guess what, it happens to be configurable in case it does fit in the cache on your setup)

No need to bash the app for no reason there...

AreEss · Jul 11, 2009

extide said:
I dont really think it's necessary, I mean the numbers (in stock form) are still an indicator of real world performance, AND also repeatable and comparable to other results obtained with the same tool. It might even be better to get rid of the units (MB/s) and just give a number.

Uh. No. They aren't. Period. They're completely bogus and have absolutely no legitimacy whatsoever. If you're foolish enough to buy into those numbers, don't come here whining about how you're not getting them.

unhappy_mage said:
But why was the cache involved in the test? A good benchmark wouldn't fit in cache (unless that's what your final workload will look like, I guess) so the way the program was written has an effect on the outcome of the tests. The point is that the benchmark is poorly written, so arguing about the results is irrelevant.

Exactly; Crystal DiskMark is the worst-written disk bench I've seen in a long, long time. The grand real sum of your test data is 1MB. And it gets repeated. So it sits in memory, sits in CPU cache, sits in OS buffers, sits in disk cache, et cetera.

extide said:
No need to bash the app for no reason there...

I explained my reasons. Which also explains why going past 100MB is irrelevant and as pointless as this shitty benchmark itself.

Rubycon · Jul 11, 2009

Hey guys you may want to check out another benchmark aimed at SSDs. It will work with mechanical drives too and show their obvious weaknesses.

http://alex-is.de/PHP/fusion/downloads.php?cat_id=4

I've been using this to fine-tune my SSD arrays, primarily as a function of stripe size and HBA parameters.

extide · Jul 11, 2009

AreEss said:
Uh. No. They aren't. Period. They're completely bogus and have absolutely no legitimacy whatsoever. If you're foolish enough to buy into those numbers, don't come here whining about how you're not getting them.

I am not buying into those numbers, but I am saying I have been running an SSD for the past few months and those numbers represent my experience.

mikeblas · Jul 14, 2009

Rubycon said:
Hey guys you may want to check out another benchmark aimed at SSDs.

I don't see any source code for this one.
