Finalizing my ESXi/ZFS AIO storage design

Ladies and gentlemen,

I'm done hunting down the parts to put this thing together. Everything is on its way. Let me know what you think of this design and whether there's anything I can do better.

Quick details on the rest of the box: Supermicro X9SCM-F-O w/ Xeon E3-1230 CPU. 16GB ECC UDIMM (4GB goes to Solaris VM). ESXi 5.

Code:
ITEM   QTY  DESCRIPTION
HBA    2    Dell SAS6/iR (LSI 1068e), flashed with LSI IT firmware
Boot   1    Cheap SATA drive: boot device and primary datastore holding the Solaris VM
SAS    8    4 mirror vdevs (8x 73GB 2.5" 10K SAS): 292GB of fast usable storage for VMs, databases, etc.
NL     6    1 RAIDZ vdev (6x 2TB WD20EARX): 10TB of nearline usable storage
ZIL    2    1 mirror vdev: Transcend TS8GSSD25S-S 2.5" 8GB SATA II SLC SSD
L2ARC  1    ADATA 500 Series AS592S-32GM-C 2.5" 32GB SATA II MLC SSD

Some notes and questions thus far:
  • Obviously, the SAS/NL storage will be controlled by Solaris and shared back to ESXi.
  • Solaris will have 4GB of RAM, so should I slice the SSDs down to 2GB for the ZIL, or leave them at their full 8GB? I only ask because the general rule of thumb for ZIL size is half your RAM.
  • I am going to divide the mirrored vdevs evenly across the HBAs: one disk of each pair goes to HBA0, the other to HBA1, so the pool survives the loss of either controller. Seems like the right thing to do. (Rough layout sketched below.)
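
For clarity, here's a rough sketch of how I picture creating the pools (device names are placeholders, not the real ones):

Code:
# Placeholder device names; a sketch of the intended layout, not final commands
# Fast SAS pool: four mirrored pairs, each pair split across the two HBAs
zpool create saspool \
    mirror c2t0d0 c3t0d0 \
    mirror c2t1d0 c3t1d0 \
    mirror c2t2d0 c3t2d0 \
    mirror c2t3d0 c3t3d0

# Nearline pool: one 6-disk RAIDZ vdev
zpool create nlpool raidz c2t4d0 c2t5d0 c2t6d0 c3t4d0 c3t5d0 c3t6d0

# Mirrored ZIL and L2ARC attached to the SAS pool as whole raw devices
zpool add saspool log mirror c2t7d0 c3t7d0
zpool add saspool cache c4t0d0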

What do you guys think?
 
I think people frown upon WD 4K-sector drives for ZFS builds. Even with ashift=12 you still see performance issues you don't get with other drives.

Hitachi or Samsung.
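
For what it's worth, some of the open-source ZFS ports (ZFS on Linux, for example) let you force 4K alignment at pool creation; stock Solaris zpool doesn't expose the property directly:

Code:
# Force 4K-aligned allocations for 512-byte-emulating 4K drives like the EARX
# (ZFS-on-Linux syntax; device names are placeholders)
zpool create -o ashift=12 nlpool raidz disk0 disk1 disk2 disk3 disk4 disk5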
 
Thanks, noted. Good thing I didn't unbox them yet. I could trade them for a different model.
 
Also, if possible, get the +F rather than the -F; the latter has an onboard NIC that ESXi does not yet support.
 
Too late! I also have an Intel dual-port GbE NIC, so three is enough for me. Four would be even better, whenever they get around to supporting the other onboard NIC.
 
The other thing I'd point out is that the LSI 1068e controllers don't support disks larger than 2TB.

Perhaps not a huge issue now, but if you're looking to add/upgrade any more storage later on it might be a problem for you.
 
Thought about that. I don't plan on getting 3TB drives in the near future, and until then I'd rather not purchase two new HBAs and cables. I might as well use this stuff while I have it lying around.
 
Can anyone offer advice on the ZIL sizing question from my first post? I'm not sure whether I should partition (slice) the SSDs or just leave them at the full 8GB.
 
I wouldn't partition the ZIL. I haven't confirmed this, but I suspect you may run into issues with drive cache utilization, the ability to flush, etc. if you try to use the SSD for different purposes (i.e. both ZIL and L2ARC).

If you are just asking whether you should partition and leave the excess un-utilized, then, well, I wouldn't. You don't actually need a slice/partition at all; just set up the unpartitioned raw devices as the ZIL. The SSD's firmware will handle wear leveling, so you will get extra longevity out of the unused area.
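
In other words (hypothetical device names), hand ZFS the bare devices rather than slices:

Code:
# Whole raw devices: ZFS labels them itself, and the SSD firmware can
# wear-level across all the flash, including the space you never fill
zpool add saspool log mirror c2t7d0 c3t7d0

# Not this: s0 slices pin the ZIL to fixed partitions
# zpool add saspool log mirror c2t7d0s0 c3t7d0s0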
 
The storage design looks fairly high-end (mirrored SAS drives, ZIL, etc.), but I think it's out of balance with the amount of memory you have. I'd expect you to be memory-starved, not only in your storage VM but also with little memory left to spread around to the other VMs that would use that super-fast storage. I'd also agree with the other folks' comments about the controller (I'd go with something SAS2008-based like the IBM M1015) and the slow disks (I'd go with the 3TB 5400RPM Hitachis, although with the recent floods now is a bad time to be buying drives).

As to the ZIL, I'm not 100% sure, but I recall reading that you need at least as much memory as ZIL, so I don't think an 8-gig ZIL would get fully used on a 4-gig system.

Also, by my count you're one port short for drives. The two HBAs are 8 ports each, for a total of 16. With 8 SAS, 6 SATA, 2 ZIL, and 1 L2ARC you have 17. I assume you'll be using the onboard controller for ESXi and not passing it through (although I'm not familiar with the X9SCM-F-O; it's possible the onboard controller is split in two).

Lastly, although not really needed, if you're going to all this trouble you may want to mirror your ESXi drive just so you don't have to bother with a reinstall or continual backups of your storage VM. If you don't want to drop in another RAID controller, you could use something like an IcyDock with built-in mirroring.
 
Yes, I'm asking whether I should leave the excess unutilized. I've read that best practice for ZIL size is half of the RAM, but I don't know the reason for that rule, so I wanted to know if anyone else did.

Thank you! Unfortunately, I'm limited to 4x4GB UDIMMs because memory manufacturers haven't released 8GB UDIMMs yet. I decided 4GB should go to Solaris and 12GB to the guests, but I suppose I could give 8GB to Solaris if I had to. We'll see. They sure are taking their sweet time announcing the 8GB sticks.

As for the controllers, I said earlier that I'm not really interested in spending extra money on them, since I don't need anything they offer: I don't need 6Gb/s and I don't need 3TB support yet. One day, sure… not today though.

DOH! I screwed that one up big time. I'll have to go with 6x SAS, which leaves one port left over. I'll make it a hot spare or another 2TB drive in the RAID-Z.
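
If I go the spare route it's a one-liner (placeholder device name below). If I go the extra-2TB route, the disk has to be part of the RAIDZ vdev when the pool is created, since a drive can't be added into an existing RAIDZ vdev after the fact:

Code:
# Attach the leftover port's disk to the nearline pool as a hot spare
zpool add nlpool spare c3t7d0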

Mirroring the boot device is definitely in the plans. Check out my thread here: http://hardforum.com/showthread.php?t=1626406
 
As for leaving the excess unutilized: just add it as a raw device, no partitions, and let the firmware handle wear leveling. You won't actually use the extra space (how much you use is a function of how fast you can write data); the sizing guide is just an indication that you need a certain size to support a certain transfer rate. There's no penalty for having a larger-than-required ZIL (unlike the L2ARC, it doesn't take system RAM to maintain).
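
As a rough upper bound, if you assume something like ten seconds of sync writes in flight before a transaction group commits, and your ingest tops out at the NICs:

Code:
2 GbE links x ~125 MB/s x ~10 s  =  ~2.5 GB of outstanding sync writes
So an 8GB ZIL device is already far larger than this box can ever dirty.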

Chris
 
Damn it, I just realized that the ZIL/L2ARC SSDs can only be attached to one pool. I guess they'll go to the SAS pool.
 
hotzen, apparently Supermicro did make an X9SCM+-F motherboard. It appeared on their website for a short while but is now no longer there! I even saw it on sale on a U.S. site for a while.

I believe even my X9SCM manual references the +F. Naturally, I can't find the +F for sale anywhere now!
 
AFAIK there is no "+" version of the X9SCM that provides the Intel 82574L NIC.
There is only one for the full ATX X9SCL [1].

That's pretty much the exact setup I have planned for next month ;) good luck

[1] http://www.supermicro.nl/products/motherboard/Xeon3000/#1155

http://www.supermicro.com/products/motherboard/Xeon/C202_C204/X9SCL_-F.cfm

The X9SCL is mATX, not full ATX. It doesn't have the two SATA 3 ports the SCM does, and it has one fewer PCIe x4 slot (basically C204 vs. C202).
 
I plugged everything in last night and booted up. The 1068e cards don't recognize the 8GB SSDs. I'm going to have to find out why.

I might drop the SSDs altogether. I'm not even sure I'd benefit from them, and they're using up three valuable ports on my HBAs. I have to think about it.
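
First thing I'll try from the Solaris side (assuming the passthrough itself is fine) is listing what the HBAs actually present:

Code:
# In the Solaris VM: non-interactively list every disk the controllers expose
echo | format
# If the 8GB SSDs don't show up here, the problem is below ZFS:
# HBA firmware, cabling, or the drives themselves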
 