ZFS RAIDz2 Homeserver - sanity check

Hi all,
my old HP Microserver is reaching its limits storage-wise, so I thought I'd tackle a fun new project and build myself a bigger homeserver.

Been reading up a lot over the last few years about ZFS and its hunger for RAM, so I thought of going for a "beefier" build for the server... just wanted a sanity check of the build from experienced storage addicts :)

Hardware should be as follows:

The build will be used primarily for home storage and for running a few VMs (mailserver, lightweight webserver, Sickbeard, firewall...). All VMs will run off the SSD.

Both LSI controllers (the one on the mobo and the 9201-16i) will be passed through via VT-d to the storage VM... I thought of experimenting with Nas4Free (I use it on the Microserver and am very happy with it) and napp-it. Nas4Free would also let me play around with GELI encryption, though I don't know how far I'll take that.
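
If I do go down the GELI route, my rough understanding is that each disk gets an encrypted provider and the pool is then built on top of the .eli devices. Just a sketch with placeholder device and pool names (da0..da5, "tank"), nothing I've tested yet:

  # initialise and attach a GELI provider per disk (prompts for a passphrase)
  geli init -e AES-XTS -l 128 -s 4096 /dev/da0
  geli attach /dev/da0        # exposes /dev/da0.eli
  # ...repeat for da1..da5, then build the raidz2 on the encrypted providers
  zpool create tank raidz2 da0.eli da1.eli da2.eli da3.eli da4.eli da5.eli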

As for hard disks, I thought of going with 4TB Hitachi Deskstars (or the NAS models). After quite a lot of questions and discussions on #zfs (thanks guys, you were impressively helpful!) I plan to start with a zpool of two 6-disk raidz2 vdevs... mostly to get the "perfect" 2^n number of data disks per vdev and to keep rebuild times down.
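
Roughly what I have in mind, as a sketch with placeholder disk names (the real names will depend on whether it ends up on Nas4Free or OmniOS):

  # two 6-disk raidz2 vdevs in one pool; ZFS stripes writes across both vdevs,
  # and each vdev can survive losing any two of its six disks
  zpool create tank \
    raidz2 disk1 disk2 disk3 disk4 disk5 disk6 \
    raidz2 disk7 disk8 disk9 disk10 disk11 disk12
  # the second vdev could also be added later with "zpool add tank raidz2 ..."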

The 2x 10GbE on the mainboard would be used to connect to my workstation and backup server, as I don't feel like paying for a 10GbE switch yet. The 4-port gigabit Ethernet card would take care of DMZ, LAN and outbound traffic, all routed through a pfSense VM, probably.

There should be very little concurrent IO load on the ZFS array, as I will be the only user, plus a few music/XBMC appliances in the house... or maybe the occasional share to friends via VPN.

In this scenario, I hope I can manage 300-400 MB/s transfers from my Windows workstation to the array via CIFS. Does this seem feasible (10GbE/vmxnet3/ESXi/ZFS array bottlenecks?)

I'd love to hear your comments on the build and any suggestions/corrections on what I should do.

Thanks all :)
 
Sounds fine. The disks should give you around 600 MB/s, maybe up around 800 MB/s.

The limiting factor here will be CIFS itself; I have not seen CIFS go faster than 300mbit :)
There might be some tuning that could be done to make it go faster, but this was just with Windows defaults, and I wasn't in a position to *optimize* it.
 
300mbit? You should easily max out a 1GbE link; I'd assume more with 10GbE hardware...

That said, with incremental backups my desktop usually finishes in less than a couple of minutes even over standard gigabit. Not that I haven't toyed with the idea of trying some 10GbE hardware (why won't consumer prices come down!?).

I'm considering a similar build though. I've always been a Linux guy, so mdadm with RAID6 has served me well for years, but lately the idea of going ZFS, probably ZFS on Linux (ZoL), has gone through my head a few times.

What are you going to use for the bare-metal OS? ESXi?
 
Sounds fine. The disks should give you around 600 MB/s, maybe up around 800 MB/s.

The limiting factor here will be CIFS itself; I have not seen CIFS go faster than 300mbit :)
There might be some tuning that could be done to make it go faster, but this was just with Windows defaults, and I wasn't in a position to *optimize* it.

You mean MB/s, right (megabytes per second), as opposed to megabits per second?
 
Agreed. My Win7 Pro workstation running CrystalDiskMark against a CIFS share on an OmniOS server gets 276 MB/s read and 209 MB/s write.
 
300mbit? You should easily max out a 1GbE link; I'd assume more with 10GbE hardware...

That said, with incremental backups my desktop usually finishes in less than a couple of minutes even over standard gigabit. Not that I haven't toyed with the idea of trying some 10GbE hardware (why won't consumer prices come down!?).

I'm considering a similar build though. I've always been a Linux guy, so mdadm with RAID6 has served me well for years, but lately the idea of going ZFS, probably ZFS on Linux (ZoL), has gone through my head a few times.
What are you going to use for the bare-metal OS? ESXi?

Yes, thinking of going with ESXi for its very stable VT-d passthrough of storage controllers. I'm expecting bottlenecks both in CIFS and in the ESXi vmxnet adapters and vSwitches... so I chose a CPU with high single-thread performance.

I've read about several interesting new things in SMB 3 and 3.02... so I'll give it a try. vmxnet3 seems to be more of a problem child; maybe I'll have to play with TCP window sizes or other stuff I have no clue about yet.
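
Before touching any SMB knobs I'll probably test the raw network path on its own, so I know whether it's the vmxnet3/vSwitch side or SMB that's holding things back. Something like this (just a sketch; "storagevm" is a placeholder hostname and iperf3 would need to be installed on both ends):

  # on the storage VM
  iperf3 -s
  # on the workstation: 4 parallel streams for 30 seconds
  iperf3 -c storagevm -P 4 -t 30

If that already falls well short of 10 Gbit, it's the virtual networking; if it's near line rate, the tuning effort belongs on the SMB side.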

If it gets too complex I can still install the storage OS on bare metal, but then I would lose a lot of the flexibility to use the power of this always-on machine.
 
As a 10GbE owner I would advise you not to do it unless you really need the speed or the hardware investment is genuinely affordable to you. You have to really think about when 10GbE is actually going to come into play. 800 MB/s from mechanical drives is going to be hard to achieve unless you're using a stripe of mirrors. You can achieve that speed with multiple RAIDZ/2/3 vdevs in a striped zpool as well. If you go RAIDZ (which most people do), each vdev only performs about as fast as a single disk for random IO, minus whatever caching (ARC, L2ARC, ZIL) buys you.
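
For comparison, a stripe of mirrors is just a pool made of many 2-disk mirror vdevs, so random IO scales with the number of vdevs at the cost of half the raw capacity. A rough sketch with placeholder disk names:

  # six mirror vdevs striped together: roughly six disks' worth of IOPS,
  # but only half of the raw capacity is usable
  zpool create fastpool mirror d1 d2 mirror d3 d4 mirror d5 d6 \
    mirror d7 d8 mirror d9 d10 mirror d11 d12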

I see a lot of people get hyped about benchmarks, but for me they never translated into my real-world usage.

Let's say 800 MB/s is your goal and you'll do whatever it takes to get your array moving that fast. Remember that to push 800 MB/s across a network, both the source and the destination need to be able to hit those speeds. This is something I didn't really keep in mind when I did my build-out.

Anyways just my 2 cents.
 
You're absolutely right about the necessity/price part. I have to admit I could mostly do well enough with gigabit, but I love tinkering with IT :) 1 Gbit does feel a bit slowpokish though, especially when transferring multiple terabytes of raw video files. It'd be a dream to work with those raw files directly over CIFS (around 100 MB/s per video stream).

In a way, if I get 300 MB/s out of the box I'll be content. Naturally I'll tinker with it to see if I can make it faster... that's its own reward. About the receiving ends: the workstation runs 500 MB/s read/write SSDs, and the backup server will probably be a zpool of 12x2TB raidz2 plus 5x4TB raidz1... not as fast as the main one, but it should do (those disks are my current main storage and backup... they'll all get demoted to backup duty).
 
LSI 9201-16i

Thank you for ending my day-long search! I think I'm going to stick with this card.

Do you all think that having all drives of a given array on the same HBA is more desirable than having two controllers with half of the array on each, or does it not really matter? I had hoped to get two 8i controllers (8 internal drives each) and divide the processing between the two. Any thoughts?
 
Thank you for ending my day-long search! I think I'm going to stick with this card.

Do you all think that having all drives of a given array on the same HBA is more desirable than having two controllers with half of the array on each, or does it not really matter? I had hoped to get two 8i controllers (8 internal drives each) and divide the processing between the two. Any thoughts?

I don't know if I should answer, because I don't have any hands-on experience yet, but here's my take: I would keep each raidz2 array on one controller:
  • Bandwidth-wise there should be no reason the arrays would starve: even on PCI-Express 1.0, an LSI 9201-16i in an x8 slot has about 2 GB/s of bandwidth per direction, which still works out to roughly 125 MB/s per disk across all 16 ports
  • I want to minimize IRQ crosstalk problems (if both controllers ended up sharing the same interrupt)
  • More importantly, if one controller dies and I have 6-disk raidz2 arrays spread evenly between 2 controllers (3 disks each), those arrays crap out with it. I prefer a "clean" failure: with each array on its own controller, arrays on the other controller are unaffected
  • Finally, I think spreading an array over multiple controllers only makes sense with something like 3 controllers per 6-disk raidz2: if one controller fails, it only takes 2 disks out of the array, leaving it merely degraded.
tl;dr: either put the whole array on one controller, or put at most the number of redundancy disks on any one controller. That would mean 1 disk per controller for a mirror or raidz1, 2 disks for raidz2 and 3 disks for raidz3 (sketch below).
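
To make the tl;dr concrete, the 6-disk raidz2 over 3 controllers case would be built something like this (sketch only; the cXtYd0 names are placeholders, the point is just that no controller carries more than 2 of the 6 disks):

  # two disks from each of three controllers in one raidz2 vdev
  zpool create tank raidz2 c1t0d0 c1t1d0 c2t0d0 c2t1d0 c3t0d0 c3t1d0
  # losing any single controller removes only 2 disks, so the vdev stays online (degraded)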
 
This might be difficult to achieve depending on your OS, the size of your files and how you access them over the network.
  • sole heavy-duty user on the NAS
  • NAS OS should be either OmniOS or Nas4Free, I will test both... the same OS will be used for the main server and the backup server
  • Workstation OS is either Win7 or my Hackintosh partition
  • File sizes range from 4 MB (CinemaDNG) to 40-50 GB files. No small stuff except a few ROMs, but I don't care about speed there. Access to the data is nearly always sequential
  • Access to the files will be done via CIFS... I *could* do iSCSI but I feel that would leave me a lot less flexible.
Can you ballpark anything with this?

Also, GreenLED made me ponder a bit about my RAIDZ2/controller setup... thanks a lot! I remembered that the 9201-16i is in fact 2 separate 8-port chips on the same card... this now gives me 3 controllers to work with, so I can implement the 6-disk / 3-controller RAIDZ2 scenario. I like how things would look rather streamlined:
[attached: diagram of the planned disk/controller layout]
 
Glad I could be of some help. I'm also glad I took a look at this post, because after I looked up that controller it made me wonder whether running one HBA versus two was a good idea for my NAS. I really like to spread the workload in my builds because it's less intensive on the hardware. Well, good thing you brought that up. I've been "thumbing" through this best-practices guide for ZFS and they recommend using separate hardware for the same reason. Win-win!
 
Well folks, it is done, the parts are ordered :)

I'll do a follow-up thread to keep you guys updated once the build starts!
 
Well folks, it is done, the parts are ordered :)

I'll do a follow-up thread to keep you guys updated once the build starts!

Please post the name of the thread here so we can subscribe. Oh, and thanks for your suggestion on the HBA. Saved me tons of time. I will be using the same model.
 
I will post the name of the thread here.
Just ordered 12x 4TB Hitachi Deskstar NAS drives too... love those Hitachis. That will let me build the first zpool :)
 
I will post the name of the thread here.
Just ordered 12x 4TB Hitachi Deskstar NAS drives too... love those Hitachis. That will let me build the first zpool :)

How much did the drives cost you?
 
  • sole heavy-duty user on the NAS
  • NAS OS should be either OmniOS or Nas4Free, I will test both... the same OS will be used for the main server and the backup server
  • Workstation OS is either Win7 or my Hackintosh partition
  • File sizes range from 4 MB (CinemaDNG) to 40-50 GB files. No small stuff except a few ROMs, but I don't care about speed there. Access to the data is nearly always sequential
  • Access to the files will be done via CIFS... I *could* do iSCSI but I feel that would leave me a lot less flexible.
Can you ballpark anything with this?

Also, GreenLED made me ponder a bit about my RAIDZ2/controller setup... thanks a lot! I remembered that the 9201-16i is in fact 2 separate 8-port chips on the same card... this now gives me 3 controllers to work with, so I can implement the 6-disk / 3-controller RAIDZ2 scenario. I like how things would look rather streamlined:
[attached: diagram of the planned disk/controller layout]

It's possible. I have a pool with two 5-disk RAIDZ vdevs; I'll do some quick read/write tests for you tonight.
 
140€ per drive, VAT included.
Oh, and I already have a lesson learned:
Be careful with LGA2011 coolers: for the same chip there are two kinds of sockets, ILM square and ILM narrow. I had to reorder a different Noctua cooler :)

It's possible. I have a pool with two 5-disk RAIDZ vdevs; I'll do some quick read/write tests for you tonight.
Thanks a lot!
 
These are the results from my iMac connected via 10GbE. On the Mac I used a RAM disk. The ZFS pool I used consists of two 5-disk 4TB Seagate RAIDZ vdevs. I moved the same file around enough times that it should have been in RAM on the ZFS side (but who knows). Note that I tried to snap the screenshots at the peaks, so please keep that in mind.
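
(In case anyone wants to reproduce the RAM disk side: it can be set up roughly like this on OS X; the 8 GiB size is just an example, ram:// counts 512-byte sectors.)

  # 16777216 * 512 B = 8 GiB RAM disk, formatted and mounted as "RAMDisk"
  diskutil erasevolume HFS+ "RAMDisk" $(hdiutil attach -nomount ram://16777216)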

SMB
[screenshot]


NFS
[screenshot]


Upload
[screenshot]


I did another run that peaked at 333 MB/s but dropped down to 250ish.
[screenshot]


Reading from my 2X256 M4 mirror
[screenshot]



The latest OmniOS with napp-it 0.9e1

  pool: ZFS
 state: ONLINE
  scan: resilvered 536K in 0h0m with 0 errors on Mon Feb 24 20:51:30 2014
config:

        NAME                       STATE     READ WRITE CKSUM
        ZFS                        ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c5t5000C5004FA34CA9d0  ONLINE       0     0     0
            c5t5000C5004FB8D4C1d0  ONLINE       0     0     0
            c5t5000C5004FB917D0d0  ONLINE       0     0     0
            c5t5000C5004FBBA37Fd0  ONLINE       0     0     0
            c5t5000C5004FBBCACDd0  ONLINE       0     0     0
          raidz1-1                 ONLINE       0     0     0
            c5t5000C500606FD815d0  ONLINE       0     0     0
            c5t5000C5006072404Cd0  ONLINE       0     0     0
            c5t5000C50060724912d0  ONLINE       0     0     0
            c5t5000C500607321F0d0  ONLINE       0     0     0
            c5t5000C50065331CCBd0  ONLINE       0     0     0
        cache
          c5t500A07510909439Fd0    ONLINE       0     0     0
        spares
          c5t5000C5006088312Cd0    AVAIL

System Memory:
Physical RAM: 49134 MB
Free Memory : 10309 MB
LotsFree: 767 MB



Sux, I know. I would love to tune it when I have more time.
 
Thanks for doing these benchies, twistacatz :)

Reads look about like what I expected from a vanilla CIFS setup, but I wonder if I misread the write speed: does it max out at 80 MB/s? That'd be quite a bit lower than what I'd expect. OS X SMB implementation, or something else?

Looking forward to running my first tests, but it'll take a week until all the parts arrive :/
 
I'm sorry man, I updated the SMB graphic to reflect the proper image.

Thanks for doing these benchies, twistacatz :)

Reads look about like what I expected from a vanilla CIFS setup, but I wonder if I misread the write speed: does it max out at 80 MB/s? That'd be quite a bit lower than what I'd expect. OS X SMB implementation, or something else?

Looking forward to running my first tests, but it'll take a week until all the parts arrive :/

Just to make sure, I ran a second write test and got better results. The only difference I can really point to is that this time I moved the file from my SSD instead of the RAM drive, which should have been the faster source. This is part of my issue: I never really get consistent numbers, and they always underwhelm me considering the hardware I'm using.

[screenshot]


Who knows, you might have better luck.
 
Have you run a bonnie benchmark or similar on the server to get a baseline without the network in the way?
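
Something along these lines, sized well past RAM so the ARC doesn't flatter the result (just a sketch; the /ZFS/bench path is a placeholder, and /dev/zero will overstate things if compression is on):

  # sequential write baseline, ~100 GB against 48 GB of RAM
  dd if=/dev/zero of=/ZFS/bench/testfile bs=1024k count=102400
  # sequential read-back
  dd if=/ZFS/bench/testfile of=/dev/null bs=1024k
  # or bonnie++, also sized past RAM, skipping the small-file tests
  bonnie++ -d /ZFS/bench -s 96g -n 0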
 
Have you run a bonnie benchmark or similar on the server to get a baseline without the network in the way?

Without a doubt, and each time it returns amazing numbers. Like I said above, I don't put too much value in those benchmarks.
 