ZFS performance tuning

I've been reading information about getting the best performance out of ZFS and was looking for additional input.

Currently I have 8x WD5000AAKX drives in a RAIDZ1 pool. These are 4k drives and OpenIndiana creates the pool with ashift=9. From what I'm reading, it would be advantageous to grab a modified binary so I can re-create the pool with ashift=12. These drives will be hosting iSCSI and NFS datastores for my ESXi servers so I'd like to squeeze as much performance out of them as I can.
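
For anyone checking their own setup: zdb will show what ashift a pool actually ended up with, and some later OpenZFS builds accept an ashift override right at pool creation. On OpenIndiana of this era the patched zpool binary mentioned above is still the usual route. Pool and device names below are just placeholders:

Code:
# show the ashift recorded for an existing pool (run as root)
zdb -C mypool | grep ashift

# some newer OpenZFS implementations let you force it at creation time:
zpool create -o ashift=12 mypool raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0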

How big of a performance hit do I take from using 8 drives vs the preferred 3, 5, or 9 in a RAIDZ1 set? Am I better off using RAIDZ2 or RAIDZ3? Mirrors? What's the best method for creating this pool so my virtual machines get the best performance (aside from the SSD read cache and ZIL, which are on the way!)?

Thanks
 
From what I can tell, the performance hit from either the wrong ashift or a suboptimal number of data drives in a vdev is not significant. As for the type of pool, what is your workload? Read heavy? Write heavy? Mixed?
 
I would caution against RAIDZ1 for ESX datastores if you want to maximize performance. No matter how many drives you have in a vdev, you will only get the IOPS of one (assuming the ARC is not a factor). The L2ARC and a dedicated ZIL device will help you a lot, but you will get better performance for the money with mirrors or striped mirrors, or even striped RAIDZ1s if you want to max out storage.

And then you have the reliability argument with only one parity drive, especially with that many in a vdev.
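
To make the comparison concrete, the two layouts being discussed would be created roughly like this (pool name and the c#t#d# device names are just placeholders):

Code:
# single 8-drive RAIDZ1 vdev: ~7 drives of space, but random-write IOPS of roughly one drive
zpool create datastores raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0

# 4x 2-way mirrors (striped mirrors): 4 drives of space, roughly 4 drives' worth of write IOPS
zpool create datastores mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
    mirror c1t4d0 c1t5d0 mirror c1t6d0 c1t7d0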
 

So perhaps creating a pool of 4 mirror vdevs would be best? This would essentially resemble a RAID 10 set, correct?

IOPS will be my primary concern, as there could be 8-15 VMs running on these drives at any given time, including one running PlayOn to stream internet video to our TV.

 
A pool of mirror vdevs is hands down optimal for IOPS performance, though I doubt you'll have issues if you get an L2ARC SSD big enough to hold the entire working set of your VMs (which you should).

edit: and yes, a pool of mirror vdevs is the equivalent of a raid10
 
A 4-mirror pool would be ideal, although I haven't used an SSD L2ARC or ZIL yet, so I can't say if you'd be OK with one vdev. The ARC should soak up all your read IOPS; the writes I don't know about. I'm assuming you have SLC SSDs for that?
 

The L2ARC SSD I'll be using is just a 64GB Kingston. I've also got 16GB of RAM in my file server.

Since I have 8 drives, I should still see pretty significant throughput, correct? Since I only have 2x 1Gb network ports devoted to iSCSI and another 2 aggregated for NFS, the most I can push at the pool over the network is about 500MB/s.
 
I thought ESX couldn't multipath NFS. AFAIK you would have to dedicate the second NIC to a second NFS datastore to get the bandwidth of both.
 

I'm not multipathing, just creating a 2-port LACP link on the ZFS server. My two ESXi servers have a single Gb port dedicated to NFS, so I'd like to have 2Gb available in case they're both hitting the NFS shares hard simultaneously.
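
For reference, on OpenIndiana the aggregation itself is set up with dladm, something along these lines, assuming e1000g0/e1000g1 as the two GbE links (names and address are placeholders, and the switch ports have to be configured for LACP as well):

Code:
# create a 2-port LACP aggregation
dladm create-aggr -L active -l e1000g0 -l e1000g1 aggr0

# plumb an address on the new aggregation
ifconfig aggr0 plumb 192.168.1.10 netmask 255.255.255.0 up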
 
Just got done creating a 4-mirror pool with ashift=12. Ran a dd bench and got 369MB/s writes. With my old RAIDZ1 pool at ashift=9 I'd get 190MB/s. Very impressive.
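
For anyone who wants to repeat it, a dd bench is typically just a big sequential write of zeros, something like the below (path and size are placeholders; run it against a dataset with compression off, and use a file larger than RAM so caching doesn't flatter the numbers):

Code:
# ~20GB sequential write to a test dataset
dd if=/dev/zero of=/datastores/test/ddfile bs=1024k count=20000

# and the sequential read back
dd if=/datastores/test/ddfile of=/dev/null bs=1024k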
 
Bonnie++ looks even better.

Previously I was getting around 200MB/s write and 360MB/s read. Now...

Code:
NAME        SIZE   Bonnie  Date(y.m.d)  File    Seq-Wr-Chr  %CPU  Seq-Write  %CPU  Seq-Rewr  %CPU  Seq-Rd-Chr  %CPU  Seq-Read  %CPU  Rnd Seeks  %CPU  Files  Seq-Create  Rnd-Create
datastores  1.81T  start   2011.08.30   32248M  75 MB/s     97    348 MB/s   62    228 MB/s  38    54 MB/s     99    633 MB/s  40    2102.0/s   17    16     +++++/s     31793/s

EDIT: While I'm sure the new pool layout and ashift help, it's also worth noting that I went from connecting my drives across two cheap SATA controllers, each on PCIe x1, to a single IBM BR10i on a PCIe x4 bus.
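
If anyone wants to run the same Bonnie++ comparison, a typical manual invocation looks something like this (directory and user are placeholders; the file size should be roughly twice your RAM):

Code:
# 32GB test set against the pool, writing files as an unprivileged user
bonnie++ -d /datastores/test -s 32g -u someuser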
 
I would be making a couple of test pools, as I think you might find IOPS to be higher with more spindles per vdev.

Throughput might be higher for the mirrors compared to the RAIDZ, but you might find IOPS are the other way around. Make both types of pools and compare the difference.

Then, just for fun, make yourself 2x 4-drive RAIDZ1 vdevs in the one pool and compare that. ;) That might get you the best of both worlds for the money: double the throughput of one drive's bandwidth, with a decent (for HDDs) IOPS score as well.
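
That test layout would be created along these lines (device names are placeholders again):

Code:
# two 4-drive RAIDZ1 vdevs striped in one pool: ~6 drives of space, ~2 vdevs' worth of write IOPS
zpool create testpool raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
    raidz1 c1t4d0 c1t5d0 c1t6d0 c1t7d0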
 
Code:
NAME        SIZE   Bonnie  Date(y.m.d)  File    Seq-Wr-Chr  %CPU  Seq-Write  %CPU  Seq-Rewr  %CPU  Seq-Rd-Chr  %CPU  Seq-Read  %CPU  Rnd Seeks  %CPU  Files  Seq-Create  Rnd-Create
tank        10.9T  start   2011.08.30   32G     80 MB/s     99    408 MB/s   35    203 MB/s  28    64 MB/s     99    450 MB/s  25    2067.1/s   6     16     +++++/s     +++++/s

Just as a comparison, here is my 6 drive Z1 pool with cache and log on an SSD. Before the SSD, the Rnd Seeks were around 800-900/s IIRC.
 
Right now I'm running with a 64GB L2ARC but no ZIL yet. A pair of 8GB SLC drives should be arriving soon, which will run as a mirrored ZIL.
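
Once the SLC drives arrive, adding them as a mirrored log (and the Kingston as cache, if it isn't in the pool already) is one command each; device names below are placeholders:

Code:
# add the two 8GB SLC SSDs as a mirrored log (SLOG) device
zpool add datastores log mirror c2t0d0 c2t1d0

# add the 64GB Kingston as L2ARC
zpool add datastores cache c2t2d0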
 
Here's what I got from a Windows 7 x64 VM running by itself on an iSCSI volume from the new pool. Write performance seems low but reads are filling up the two 1Gb links on the host.

[Benchmark screenshot: vmfsperf.png]
 
Which ones are you getting?

-TLB

Just a pair of cheap Transcend 8GB SLCs, TS8GSSD25S-S. Since I'm running vSphere 5 in my lab, I'd like to be able to experiment with Storage DRS and Profiles by having two different types of storage available. Without a dedicated ZIL, NFS latency averages 35ms on an idle VM and spikes over 250ms during write activity.
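
Once the log device is in, zpool iostat makes it easy to watch whether sync writes are actually landing on the SLOG instead of the spindles:

Code:
# per-vdev operations and bandwidth, refreshed every 5 seconds
zpool iostat -v datastores 5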
 
No, 2x 4-drive RAIDZ1s would only have the write IOPS of 2 drives, since the parity writes within a RAIDZ vdev can't be spread out across its drives independently, whereas write IOPS in the mirror setup will be roughly equal to the IOPS of 4 drives. Reads, however, will be good on the RAIDZ1.

I'm not a big fan of mirrors in home environments though. I can see the need in large, high-performance database systems or many-user environments, but for home use, even intensive home use, I think you'd be happier spending a few bucks on good SSDs to take advantage of the awesome hybrid storage possibilities of ZFS. If all your VMs get most of their reads from L2ARC, the IOPS of your vdevs/pool will be of less importance.
Remember, you can add multiple L2ARC SSDs for increased performance, since ZFS spreads the data across them, so 2 drives of 32GB should be quite a bit faster than 1 drive of 64GB, and if 1 drive dies your pool won't be totally gimped by having no L2ARC at all. And since you're not making any vdevs out of the L2ARC SSDs, they don't have to be the same size or type; get whatever is cheaper.
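
Adding a second (or third) cache device is the same zpool add as before, and cache devices can be pulled back out with zpool remove at any time since they hold no pool data; device names are placeholders:

Code:
# add two L2ARC devices; reads get spread across both
zpool add datastores cache c2t3d0 c2t4d0

# cache devices can be removed later without harming the pool
zpool remove datastores c2t3d0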
 