SSD disk performance issues with ZFS

bibin

Hello,

We are currently testing ZFS on Linux as a storage platform for our VPS nodes, but we don't seem to be getting the performance figures we expected. Can you suggest what we should be tweaking to reach higher IOPS?

Hardware is SuperMicro with the MegaRAID 2108 chipset as a daughter card on each server. We tested three servers: pure SSD with 4 x 480GB Chronos drives; 4 x 600GB 10k SAS drives with a 480GB SSD cache; and 4 x 1TB 7.2k SAS drives with a 480GB SSD cache.

We set the onboard RAID controller to essentially JBOD (a single-drive RAID 0 per disk with the controller cache turned off). We got the best performance when using RAID-Z2 with LZ4 compression. Here are the results we saw:

Server | RAID | Filesystem | Read Speed | Write Speed | Read IOPS | Write IOPS
4 x 480GB Chronos SSD (pure SSD) | Soft Z2 | ZFS without compression | 4.1 GB/s | 778 MB/s | 23025 | 7664
4 x 480GB Chronos SSD (pure SSD) | Soft Z2 | ZFS with lz4 compression | 4.6 GB/s | 1.8 GB/s | 47189 | 15715
4 x 600GB SAS 10k + 480GB SSD cache | Soft Z2 | ZFS without compression | 4.0 Gb/s | 486 Mb/s | 10234 | 3413
4 x 600GB SAS 10k + 480GB SSD cache | Soft Z2 | ZFS with lz4 compression | 4.8 Gb/s | 2.2 Gb/s | 51056 | 17077
4 x 1TB SAS 7.2k + 480GB SSD cache | Soft Z2 | ZFS without compression | 4.1 Gb/s | 1.4 Gb/s | 53486 | 17840
4 x 1TB SAS 7.2k + 480GB SSD cache | Soft Z2 | ZFS with lz4 compression | 4.4 Gb/s | 1.7 Gb/s | 37803 | 12594


There doesn't seem to be a big difference between the pure SSD setup and the others, even without the SSD cache on the spinning-disk setups. Is there something we are missing here or something we should be looking into? We were expecting the IOPS to be a lot higher than these results.

Thank you for your help!
 
How are you running the tests? Linux generally caches reads and writes, which tends to skew these results. You often need a data set at least twice the size of your system memory before you start seeing actual disk performance. I'm not sure what tool you are using, but I've used Bonnie++ and Iozone, both of which have a direct-to-disk flag to prevent OS caching, although they still recommend random datasets at least twice the size of your memory (which hurts when you have a system with a few hundred gigs of RAM).
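For example, an Iozone run along those lines might look like this (just a sketch; the /tank/test path and the 128g size, roughly twice the RAM of a 64 GB box, are assumptions to adjust for your setup):

iozone -I -s 128g -r 4k -i 0 -i 2 -f /tank/test/iozone.tmp

Here -I requests direct I/O where the filesystem allows it (ZFS on Linux may refuse O_DIRECT, in which case the oversized working set is what keeps the ARC and page cache honest), -i 0 and -i 2 select the sequential write and random read/write tests, and -r 4k matches the small-block pattern a VPS node tends to generate.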

Also, what block size are you testing with? SSDs generally use 4k blocks, I believe, which is 8x the 512-byte sectors of traditional spinning disks. And have you checked your disk alignment? If your disks aren't aligned, you'll be doing twice the work to get the results you want.
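A quick way to sanity-check this on the ZFS side (pool and device names here are only examples) is to compare the pool's ashift with the drives' sector sizes:

zdb -C tank | grep ashift
lsblk -o NAME,PHY-SEC,LOG-SEC

ashift=12 means ZFS is issuing 4K-aligned writes; ashift=9 on drives that report 4096-byte physical sectors would mean exactly the doubled work described above. The value is fixed when the vdev is created, e.g. with zpool create -o ashift=12.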

That said, I don't see data for a non-cached spinning-disk setup; they all have an SSD cache. The IOPS you are getting on those mean either things are being cached by your OS, or your test dataset is a bunch of repeats that are already cached on the SSD. In general, expect about 150 IOPS from a SATA drive, 350 from a SAS drive, and 1000 or so from an SSD, then multiply by how many active spindles you have. For those nearline "SAS" drives, I would expect about 600 IOPS, so there is definitely some caching going on somewhere.

Lastly, I'm guessing the Chronos drives are SATA? SATA max queue depth is 32 vs. 128 for SAS, I think. That's probably not what is affecting your results, since SSDs can generally absorb data much faster than spinning disks (unless you are testing sequential writes?) and therefore don't need to queue as much, but the controller also has an overall max queue depth, which might be causing issues. Anyway, something worth thinking about.
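If you want to see what depth the kernel is actually using per device (the sdb name is a placeholder), something like this should show it:

cat /sys/block/sdb/device/queue_depth
lsscsi -l

Keep in mind that behind the MegaRAID you are looking at the RAID 0 virtual drives the controller exports, and the controller adds its own global limit on outstanding commands on top of whatever the driver reports.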
 
Hi,

We have tested on the bare-metal server with RAM caching enabled, so I expect the reported IOPS are being served by the RAM cache (ARC) and not by the disks. We are using the fio tool for the IOPS benchmarking. I am thinking of creating a VM with KVM virtualization on the zpool (as a dataset) to test the disk IOPS. Please advise.
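If the concern is the ARC answering the reads, one blunt way to take it mostly out of the picture for a test is to cap its size (a sketch; the 1 GiB figure is arbitrary and the setting can be reverted afterwards):

echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max

or persistently via /etc/modprobe.d/zfs.conf:

options zfs zfs_arc_max=1073741824

With the ARC capped that small, a multi-gigabyte fio run has to be served by the disks rather than RAM, which is usually simpler than standing up a KVM guest just to get honest numbers.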
 
Hello,

..........

Hardware is SuperMicro with the MegaRAID 2108 chipset as a daughter card on each server. We tested three servers: pure SSD with 4 x 480GB Chronos drives; 4 x 600GB 10k SAS drives with a 480GB SSD cache; and 4 x 1TB 7.2k SAS drives with a 480GB SSD cache.

We set the onboard RAID controller to essentially JBOD (a single-drive RAID 0 per disk with the controller cache turned off).......

RAID 0 is not JBOD. :D

You need to use an HBA card, not a hardware RAID card, e.g. an LSI 2008.

When you set up RAID 0, you are putting a "middle man" between the drive and ZFS.

JBOD on an LSI controller is called "Unconfigured Drive".
 
How much data are you writing to the "arrays" with FIO?

Is your MegaRAID 2108 in IT mode? (not sure if it supports IT mode)
 
RAID 0 is not JBOD.

You need to use an HBA card, not a hardware RAID card, e.g. an LSI 2008.

When you set up RAID 0, you are putting a "middle man" between the drive and ZFS.

JBOD on an LSI controller is called "Unconfigured Drive".

Yes, our RAID card does not support a JBOD configuration and can't be flashed to IT mode either. I have set things up as per the LSI doc, which recommends: "To use the drives connected to the 2108/2208 controller, a RAID must be created. If using drives like in JBOD, each single drive must be created as a RAID 0 individually." https://www.supermicro.com/manuals/other/LSI_2108_2208_SAS_MegaRAID_Configuration_Utility.pdf


Do you have any idea how to test the disk IOPS without the RAM cache (ARC)? I have tested using the fio tool, but I believe the IOPS shown reflect the RAM cache.
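Another way to keep a benchmark out of the RAM cache, without touching the whole system, is to disable data caching on a throwaway dataset (the dataset name is just an example; both properties are standard and can be undone with zfs inherit):

zfs create tank/fio-test
zfs set primarycache=metadata tank/fio-test
zfs set secondarycache=none tank/fio-test

primarycache=metadata keeps file data out of the ARC for that dataset, and secondarycache=none keeps the L2ARC SSD out of the measurement as well, so fio runs placed there land on the actual vdevs.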
 
How much data are you writing to the "arrays" with FIO?

Is your MegaRAID 2108 in IT mode? (not sure if it supports IT mode)

We are using the following fio configuration to test:
./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

This creates a 4 GB file and performs 4KB reads and writes using a 75%/25% split (i.e. 3 reads for every 1 write) within the file, with 64 operations in flight at a time. I think the output still reflects the RAM cache. Do you know how we can test the disk IOPS against the ZFS pool using other tools like bonnie++, iozone, etc.? If so, please share the details.
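For comparison, a run that is harder to satisfy from cache might look something like this (directory, size, and job count are placeholders to adjust to the pool and the amount of RAM in the box):

fio --name=randrw --directory=/tank/fio --size=64G --bs=4k --iodepth=64 --numjobs=4 --ioengine=libaio --rw=randrw --rwmixread=75 --group_reporting --end_fsync=1

Note that --direct=1 may simply be rejected by ZFS on Linux of this vintage, so a working set far larger than RAM (or the ARC/primarycache tuning discussed above) is the practical way to keep the page cache and ARC out of the numbers.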

Thanks
 
Yes, our RAID card does not support a JBOD configuration and can't be flashed to IT mode either. I have set things up as per the LSI doc, which recommends: "To use the drives connected to the 2108/2208 controller, a RAID must be created. If using drives like in JBOD, each single drive must be created as a RAID 0 individually." https://www.supermicro.com/manuals/other/LSI_2108_2208_SAS_MegaRAID_Configuration_Utility.pdf

Do you have any idea how to test the disk IOPS without the RAM cache (ARC)? I have tested using the fio tool, but I believe the IOPS shown reflect the RAM cache.

You have to use an HBA card with SAS2008 IT firmware. By making each drive a RAID 0, the controller is doing its own processing in the middle.

Their recommendation is not good. Cut out the middle man, i.e. skip the controller's processing; an HBA card is the answer.

The other problem is that ZFS likes to control the drives by itself. Since you are trusting the RAID controller instead, all sorts of unexpected situations can happen. :p

on linux? just knowing on linux ha!...
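One quick way to see how much the controller is really in the path (the device name and drive index are placeholders) is to check whether SMART data is reachable directly or only through the MegaRAID pass-through:

smartctl -a /dev/sda
smartctl -a -d megaraid,0 /dev/sda

On a plain HBA the first form works; if you need the second, ZFS is talking to a RAID 0 virtual drive rather than the raw disk, which is exactly the middle man being described here.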
 