Another Help me decide on ZFS, ESXi hardware thread

cbutters

Just bought myself a secondhand rack for cheap locally, and I'm looking into building an ESXi + ZFS file server... I need a bit of confirmation before I pull the trigger on this hardware. It will be used to store my large collection of movies/photos/websites/documents and some work backups off-site. ZFS is attractive to me because of its protection against data corruption, security, snapshots, and compression options (despite this, I already have a secondary backup system for important files). I will also be virtualizing a web server, a domain server, and any other projects I come up with.

Core Hardware:
Chassis: Norco RPC-4020 (20 hot-swap drive bays)
Mobo: Supermicro MBD-X9SCL+-F
Memory: 2 x Kingston 8GB DDR3 ECC Registered KVR1333D3D4R9S/8G
CPU: Core i5-2400

I'll be running ESXi off a USB thumb drive, and virtualizing OpenIndiana (also on the thumb drive) to run the large filesystem.


Current Storage Plans:
I plan to run a 10 x 2TB RAIDZ2 initially, which should give me 2 parity drives and 8 data drives. Using the striping rule of thumb, the 128K ZFS record size / 8 data drives = 16K per drive, so I should be good. I don't mind having only one vdev starting out, since I won't normally need more than SATA speed as I will mostly be interfacing over gigabit.

Drives:10x2TB
Controller: IBM M1015 8-port controller, plus 2 onboard ports.

So I plan to run the 10 drives using both the M1015 and 2 of the onboard motherboard ports. Since ZFS addresses each drive individually, this shouldn't be a problem, right?
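(A rough sketch of the pool creation I have in mind, with placeholder OpenIndiana device names; I'd check the real ones with format first:)

# 8 disks on the M1015 plus 2 on the onboard SATA ports, as a single RAIDZ2 vdev
zpool create tank raidz2 \
  c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 \
  c3t0d0 c3t1d0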



Future Expansion:
Going forward, in the next year or so I would like to max out the hot-swap bay capacity by adding another 10 drives as a second vdev, this time with 3TB drives. I plan on adding another M1015 controller at that time and building a 10 x 3TB RAIDZ2 vdev.
I expect the additional vdev will improve performance of the pool when I get around to adding it.

Should there be any problems expanding my pool by adding this second vdev, even though one vdev uses 2TB drives and the other 3TB?
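(As I understand it, the expansion itself would just be a single command along these lines, again with placeholder device names for the ten new drives:)

# add the ten 3TB drives as a second RAIDZ2 vdev; ZFS stripes new writes across both vdevs
zpool add tank raidz2 \
  c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c4t8d0 c4t9d0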

Other expansion possibilities include throwing an Intel SSD or two into place as a ZFS caching device to improve performance for the virtual machines.
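(Something along these lines, if I go that route; the device names are placeholders:)

# SSD as an L2ARC read cache for the pool
zpool add tank cache c1t5d0
# and/or a mirrored log device (slog) if sync writes turn out to be the bottleneck
zpool add tank log mirror c1t6d0 c1t7d0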


So to wrap it up, these are my plans, giving me 16TB of storage initially and a future expansion to 40TB later on when I add the new vdev. Are there any glaring problems with this strategy? Thanks for reviewing.
 
First off, you're off to a good start; I just built a very similar box for ESXi (Supermicro board, M1015), but you're missing a few things.

First, like danswartz said, you'll need a 2.5" SSD or laptop drive to run the OS on, ideally a small (16-32GB) one, as that's all you'll need.

Second, you're going to need a Xeon to support VT-d in order to pass through the HBA to the OpenIndiana install. I'd recommend an E3-1230, as it's the best in terms of price/performance. An E3-1220 would work too, but then you miss out on Hyper-Threading, which is nice for an ESXi box.

Third, you've picked out the wrong RAM; unfortunately the E3 boards do not take registered RAM, so you're going to need unbuffered ECC. 8GB unbuffered ECC DIMMs are way too expensive for the time being, so if you want to go with 16GB you'll need 4x4GB. This is what I used in my board (X9SCM+-F, very similar to yours): http://www.amazon.com/gp/product/B002T3JN0Y/ref=oh_o00_s00_i01_details

Hopefully this helps
 
Get the RPC-4220 over the RPC-4020. The SFF-8087 backplanes make things much easier, and the drive trays are better. Worth a few more dollars.

Also, be mindful of the PSU. I have seen multi-rail 1050W PSUs unable to handle the startup load of 20 drives without staggered spin-up.
 
One note: if you can get a cheap/small drive for the local datastore, do that instead of an SSD. OI will not be doing much I/O to the boot drive, so unless you have a small/cheap SSD lying around, it's overkill...
 
If you are using ESXi and virtualizing OI, obviously you will be using direct hardware passthrough... I am pretty sure you cannot pass through only 2 onboard SATA ports, as you would need to pass through the entire controller (which won't work unless your main VM datastore/drive is on its own storage controller). Long story short, you'd better add another M1015 to your list...
 
Second, you're going to need a Xeon to support VT-d in order to pass through the HBA to the OpenIndiana install. I'd recommend an E3-1230, as it's the best in terms of price/performance. An E3-1220 would work too, but then you miss out on Hyper-Threading, which is nice for an ESXi box.

Are you sure? The i5-2400 spec sheet says that VT-d is supported: http://ark.intel.com/products/52207

Third, you've picked out the wrong RAM; unfortunately the E3 boards do not take registered RAM, so you're going to need unbuffered ECC. 8GB unbuffered ECC DIMMs are way too expensive for the time being, so if you want to go with 16GB you'll need 4x4GB. This is what I used in my board (X9SCM+-F, very similar to yours): http://www.amazon.com/gp/product/B002T3JN0Y/ref=oh_o00_s00_i01_details

Hopefully this helps
THANKS! I've made that mistake in the past and had it bite me... I think I will go with something like this Crucial CT2KIT51272BA1339 8GB kit for now, until the 8GB DIMMs are more readily available.


If you are using ESXi and virtualizing OI, obviously you will be using direct hardware passthrough... I am pretty sure you cannot pass through only 2 onboard SATA ports, as you would need to pass through the entire controller (which won't work unless your main VM datastore/drive is on its own storage controller). Long story short, you'd better add another M1015 to your list...
Thanks for the catch! If I went with a motherboard that had 2x 6Gb/s and 4x 3Gb/s SATA ports, like the Supermicro MBD-X9SCM-F-O for example, would it be possible to pass only the 3Gb/s ports through VT-d? If not, I guess I need three M1015s in the future.
 
Get the RPC-4220 over the RPC-4020. The SFF-8087 backplanes make things much easier, and the drive trays are better. Worth a few more dollars.

Also, be mindful of the PSU. I have seen multi-rail 1050W PSUs unable to handle the startup load of 20 drives without staggered spin-up.

Thanks for the tip; I thought I had read somewhere that backplanes sometimes caused issues with ZFS setups, but I just realized it was really talking about SAS expanders! Going with the 4220 should clean up the cabling a bit, thanks...
 
Thanks for the catch! If I went with a motherboard that had 2x 6Gb/s and 4x 3Gb/s SATA ports, like the Supermicro MBD-X9SCM-F-O for example, would it be possible to pass only the 3Gb/s ports through VT-d? If not, I guess I need three M1015s in the future.

Depends... you would really want to make sure that two separate storage controllers are actually in use, one for the 6Gb/s ports and one for the 3Gb/s ports; then you could. I ended up with 3 M1015s in my build... they are cheap enough (in the grand scheme of things) and there is no guesswork anymore.

Something else to think about is that you may want a fault-tolerant datastore for your VMs. I use a simple RAID1 mirror for my VM datastore. I also back up my raw VM files regularly in case something happens... Just some food for thought.
 
THANKS! I've made that mistake in the past and had it bite me... I think I will go with something like this Crucial CT2KIT51272BA1339 8GB kit for now, until the 8GB DIMMs are more readily available.

If you look at the i5-2400 specs on Intel's site, you'll see that the processor DOESN'T support ECC memory.
 
If you look at the i5-2400 specs on Intel's site, you'll see that the processor DOESN'T support ECC memory.

Aww Dag! I had the i5-2400 lying around... Looks like I'll be picking up a Xeon, most likely the E3-1230; hopefully prices drop soon with Ivy Bridge chips coming... With this much data to store, I don't want to run without ECC; this type of protection is the reason I'm moving to ZFS anyway, so I can't skimp on it here.
Thanks, everyone; since your suggestions have basically made me rethink the majority of my components, I'd say this is a very productive thread so far.
 
Just a thought... why does the Supermicro motherboard state that Core i3 processors work in it, yet list the memory specification as unbuffered ECC? As far as I can tell, only Xeons support ECC memory... is this really true?
 
Aww Dag! I had the i5-2400 lying around... Looks like I'll be picking up a Xeon, most likely the E3-1230; hopefully prices drop soon with Ivy Bridge chips coming... With this much data to store, I don't want to run without ECC; this type of protection is the reason I'm moving to ZFS anyway, so I can't skimp on it here.
Thanks, everyone; since your suggestions have basically made me rethink the majority of my components, I'd say this is a very productive thread so far.

A suggestion: install OpenIndiana without ESXi and use VirtualBox and Zones to virtualize your stuff; that way you don't need a Xeon processor.
 
Just a thought... why does the Supermicro motherboard state that Core i3 processors work in it, yet list the memory specification as unbuffered ECC? As far as I can tell, only Xeons support ECC memory... is this really true?

Wrong thread, sorry
 
OK, based on the feedback, the new parts list is as follows:

-Habey RL-26 26" 3-sections Ball-Bearing Slide Rail
-SUPERMICRO MBD-X9SCL+-F Micro ATX Server Motherboard LGA 1155 Intel C202 DDR3 1333
-NORCO RPC-4220 4U Rackmount Server Chassis w/ 20 Hot-Swappable SATA/SAS 6G Drive Bays
-Crucial 8GB (2 x 4GB) 240-Pin DDR3 SDRAM DDR3 1333 (PC3 10600) ECC Unbuffered Server Memory Model CT2KIT51272BA1339
-Intel Xeon E3-1230 Sandy Bridge 3.2GHz LGA 1155 80W Quad-Core Server Processor BX80623E31230
-Intel 80GB X25-M OS Drive
-2 x IBM M1015
-10 x Samsung 2TB HD204UI

Also: 4 x 3ware CBL-SFF8087-05M 0.5m multi-lane internal (SFF-8087) Serial ATA cables.

Is 0.5m long enough to reach the backplane in this case?

Unless I hear of any issues with these parts, I plan on ordering the ones I don't have today.
 
The IBM M1015 only supports 8 drives, doesn't it? I'm just curious why you'd opt for a 20-bay enclosure and then use 2x IBM M1015. I am struggling to select an HBA for an identical project. Can you do one RAID5 array spanning both M1015s?
 
The IBM M1015 only supports 8 drives, doesn't it? I'm just curious why you'd opt for a 20-bay enclosure and then use 2x IBM M1015. I am struggling to select an HBA for an identical project. Can you do one RAID5 array spanning both M1015s?

The 20-bay enclosure is for future expansion.

2 x M1015 gets me up to 16 bays, and I can use 6 more of the onboard motherboard ports, which puts me past 20 bays; for virtualization (passthrough) purposes, though, I'll probably get a 3rd M1015 if I ever have enough drives to fill it.

I am not doing RAID5; I am doing something roughly similar to RAID6 (a ZFS RAIDZ2), and with ZFS the controller does not matter, as each drive is accessed natively.
 
Hmm, sad to say, but a 1Gbit network connection can easily surpass the ZFS write speed of that RAIDZ2.

If you write 1 byte of data, you have to read from and write to every single disk in that vdev to update that 1 byte (you can skip the read if the data is cached in ARC).

If you only use this to store ISOs, movies, backups, etc., it's fine, but it's not normally fine for VMware usage.
 
Hmm, sad to say, but a 1Gbit network connection can easily surpass the ZFS write speed of that RAIDZ2.

If you write 1 byte of data, you have to read from and write to every single disk in that vdev to update that 1 byte (you can skip the read if the data is cached in ARC).

If you only use this to store ISOs, movies, backups, etc., it's fine, but it's not normally fine for VMware usage.

Thanks for the advice... I am going to do some tests and see how it goes (see the quick test sketch after the list below). If it is too slow, I have several options; let me know what you think:

1) Maybe look at reconfiguring the drives into two separate vdevs instead of one.

2) Incorporate an SSD as a "ZIL" caching device to speed up writes?

3) Use a separate MIRRORED disk on the motherboard SATA controller to store the VMs on.
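(For the initial tests, I'm thinking of a simple sequential-write check like this; "tank/benchtest" is just a placeholder dataset name, and compression has to be off or the zeros will compress away:)

# create a test dataset with compression disabled, then time a 10 GiB write
zfs create -o compression=off tank/benchtest
time dd if=/dev/zero of=/tank/benchtest/bigfile bs=1024k count=10240
# 10240 MiB divided by the elapsed seconds gives MB/s; ~110 MB/s saturates gigabit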
 
1) Option one is good; you could make 5 mirrors (the optimal way), or maybe 3 raidz's.

2) I'm not sure this would help at all; an slog is there to speed up sync writes as seen by the host, not to actually speed them up to the physical disks. Turning sync off would show whether this would help, though.

3) This would put you in the same position (though without contention with the other raidz), unless you used several mirror vdevs.

But then again, everything might be fine; it all depends on workload.

Right now I have 4 vdevs for my VMs, but I don't use them often, and performance is *ok*.
Normally I only use those VMs for disaster recovery of broken systems, which just needs lots of space to recover to and to hold the data being recovered.
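(A rough sketch of what option 1 and the sync test might look like; the pool name and device names are placeholders:)

# option 1: a pool of 5 two-way mirrors instead of one wide raidz2
zpool create vmtank \
  mirror c2t0d0 c2t1d0 \
  mirror c2t2d0 c2t3d0 \
  mirror c2t4d0 c2t5d0 \
  mirror c2t6d0 c2t7d0 \
  mirror c3t0d0 c3t1d0

# quick check for option 2: if disabling sync makes the VMs fast, an slog SSD would help
zfs set sync=disabled vmtank   # for testing only; not safe to leave on for VM storage
zfs set sync=standard vmtank   # put it back afterwards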
 
So while I am waiting on parts to arrive, I am playing with OpenIndiana, Solaris, and FreeNAS in VMware. I was keen on OpenIndiana, but I realized that it does not support the compression features that Solaris does... Is there any benefit to running OpenIndiana over Solaris? They seem pretty comparable, except that Solaris has more features. I know that OpenIndiana is illumos-based, while Solaris is Oracle's and is not being open-sourced as it should be... Is that why people choose OI, to stay away from Oracle?
 
So while I am waiting on parts to arrive, I am playing with OpenIndiana, Solaris, and FreeNAS in VMware. I was keen on OpenIndiana, but I realized that it does not support the compression features that Solaris does... Is there any benefit to running OpenIndiana over Solaris? They seem pretty comparable, except that Solaris has more features. I know that OpenIndiana is illumos-based, while Solaris is Oracle's and is not being open-sourced as it should be... Is that why people choose OI, to stay away from Oracle?

Currently, ZFS encryption is Oracle Solaris 11 only, so if you need it, you have no other option.
But you lose compatibility forever with all other ZFS implementations like FreeBSD, the illumos-based ones (Illumian, OpenIndiana, SmartOS), Linux, or OS X, which are on ZFS v28.

I do not expect newer versions than v28 that are Oracle-compatible.
It's more likely that we will see feature flags in the future, rather than versions > 28 from anyone other than Oracle.
http://blog.delphix.com/csiden/files/2012/01/ZFS_Feature_Flags.pdf
 
Umm? I have been using gzip and lzjb compression in OpenIndiana; maybe you meant encryption, like _Gea is assuming?
 
Umm? I have been using gzip and lzjb compression in OpenIndiana; maybe you meant encryption, like _Gea is assuming?

I meant encryption, but only because it seemed to be what gave me the compression features. Where do you configure compression in OpenIndiana installations with napp-it?

[EDIT] OK, just kidding, I found where to enable/disable compression on an already established ZFS dataset... I'm not seeing where to choose the type of compression, though, and I am still showing a compression ratio of 1.0...?
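(In case the napp-it menus don't expose the algorithm choice, I assume it can also be set per dataset from a shell; "tank/documents" and "tank/movies" are just example names. From what I've read, compressratio only reflects data written after compression is enabled, which would explain the 1.0:)

# pick an algorithm per dataset (lzjb is cheap on CPU, gzip-1 through gzip-9 compress harder)
zfs set compression=gzip-6 tank/documents
zfs set compression=lzjb tank/movies
# check what is set and how well it compresses after copying new data in
zfs get compression,compressratio tank/documents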
 
Hmm, sad to say, but a 1Gbit network connection can easily surpass the ZFS write speed of that RAIDZ2.

If you write 1 byte of data, you have to read from and write to every single disk in that vdev to update that 1 byte (you can skip the read if the data is cached in ARC).

If you only use this to store ISOs, movies, backups, etc., it's fine, but it's not normally fine for VMware usage.

My 3-drive raidz1 of Samsung F4s seems to saturate a single gigabit connection pretty acceptably. I'm running Realtek NICs, so I might be losing a bit, but it's not uncommon to see a consistent 90-100MB/s on large sustained transfers.

That's not to say there aren't other relevant reasons for going with smaller vdevs. ;)
 
Things are going pretty well with my build; I finally have things working as planned.
I have a 10-drive RAIDZ2 going... a question, though: I thought RAIDZ2 on 10 x 2TB drives would give me 2 drives of parity (4TB), leaving 16TB of total usable space. For some reason I'm showing 18.1T as the size? I am of course losing additional space to ZFS overhead, but where is it getting that 18.1T number? See screenshot:
[screenshot: zfs.png]
 
Things are going pretty well with my build; I finally have things working as planned.
I have a 10-drive RAIDZ2 going... a question, though: I thought RAIDZ2 on 10 x 2TB drives would give me 2 drives of parity (4TB), leaving 16TB of total usable space. For some reason I'm showing 18.1T as the size? I am of course losing additional space to ZFS overhead, but where is it getting that 18.1T number? See screenshot:
[screenshot: zfs.png]

OK, never mind, I just realized myself that the 18.1T is the raw size of all the drives together. One drive has 1.81T of space, although it is funny that napp-it shows each drive as having 2.00TB and then later adds them up to 18.1T.
Or is this a function of TB vs TiB?
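(Answering my own question, this looks like plain TB vs TiB, roughly:)

2 TB = 2,000,000,000,000 bytes / 1024^4 ≈ 1.82 TiB (shown as 1.81T per drive)
10 drives x 1.82 TiB ≈ 18.2 TiB raw, reported as 18.1T
8 data drives x 1.82 TiB ≈ 14.5 TiB usable, before ZFS metadata overhead

(zpool list reports the raw pool size including parity, while zfs list shows usable space.)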
 
How did those rails work out for you? I bought a set of rails that don't fit; they're too wide to fit between the rack and the 4220, and I'd like to replace them one of these days. Where did you get your rails?
 
How did those rails work out for you? I bought a set of rails that don't fit; they're too wide to fit between the rack and the 4220, and I'd like to replace them one of these days. Where did you get your rails?

I got the Habey RL-26 26" 3-section ball-bearing slide rails, and they were a very tight fit... I managed to get them into place, but they do not slide smoothly and I wouldn't recommend them; it's a bad fit.
 