RAID-Z

Hi, I'm planning an 18-drive (2x 9-drive arrays) storage server for this summer, and I've decided to use Solaris and ZFS to take advantage of RAID-Z.

http://en.wikipedia.org/wiki/ZFS

RAID-Z is basically a tweaked RAID-5, and can tolerate a single drive failure.

Anyway, I've been reading a bunch of docs, but a lot of them contradict each other because of the dates they were published. ZFS still seems to be under quite active development, so it's hard to find a concrete answer sometimes.

Questions:

-If a drive fails, can I drop a cold-spare in and easily rebuild the array? Does RAID-Z support hot-spares? Are you notified by some means when a device fails, or do you have to custom script / keep an eye on the diagnostics?

-I don't care that much about expansion of arrays, but I read that expansion/growing an array by adding a disk is not supported. Is this correct, or was I reading old news?

-Are there any on-the-fly disk encryption solutions that support plausible deniability that will function in the ZFS/RAID-Z setup? Looking for something similar to TrueCrypt for Linux.

-Does anyone have any experience administering ZFS storage that can add anything? I set up a theoretical system in VMWare, and it seemed easy enough, but I had no way to get a feel for performance because all of my virtual drives were on the same physical device.

Thanks, any input appreciated.
 
disclaimer: I've never heard of ZFS or RAID-Z

From my RAID5 experience, whether or not you can do hot swaps and expansions is entirely dependent on the controller.
 
disclaimer: I've never heard of ZFS or RAID-Z

From my RAID5 experience, whether or not you can do hot swaps and expansions is entirely dependent on the controller.

This would be software RAID using cheap SATA controllers.

http://solaris.reys.net/english/2005/11/zfs_raid-z

This link lists some quick-and-dirty points on why RAID-Z is a good (software) alternative to RAID-5. ZFS itself seems like an amazing filesystem.

See http://www.sun.com/software/solaris/faqs/zfs.xml for a quick overview of the whole shebang.
 
-If a drive fails, can I drop a cold-spare in and easily rebuild the array? Does RAID-Z support hot-spares? Are you notified by some means when a device fails, or do you have to custom script / keep an eye on the diagnostics?
I don't know. It isn't mentioned in the ZFS Best Practices Guide, either. The guide is still worth reading, but I haven't found out one way or the other whether Solaris does anything noticeable when a disk fails.
-I don't care that much about expansion of arrays, but I read that expansion/growing an array by adding a disk is not supported. Is this correct, or was I reading old news?
Well, sort of. You can't add a disk to an existing RAID-Z group, but if you decide 18 disks isn't enough and you need more, you can add another 9 (or 4, or 13 - the number is irrelevant) disks as their own RAID-Z group in the same pool, and ZFS will stripe across them. You can have many vdevs in one pool, basically. Each vdev can be a mirror, a RAID-Z group, or just plain concatenated disks. Then all the data gets spread across every vdev in the pool.
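To make that concrete, here's roughly what creating the first 9-disk RAID-Z group and later growing the pool with a second one would look like. This is just a sketch - the pool name "tank" and the c#t#d# device names are made up:

  # create the pool with one 9-disk raidz vdev
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0

  # later: grow the pool by adding a second 9-disk raidz vdev; data stripes across both
  zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c2t8d0

  # check the resulting layout
  zpool status tank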
-Are there any on-the-fly disk encryption solutions that support plausible deniability that will function in the ZFS/RAID-Z setup? Looking for something similar to TrueCrypt for Linux.
Not for the whole volume (to quote Wikipedia, "ZFS lacks transparent encryption, although there is an OpenSolaris project underway"), but you can of course put a TrueCrypt volume in a file and do things that way. Solaris lacks support for TC itself, but you can still mount the volume in Windows or Linux over the network. You can even easily export an entire chunk over iSCSI and use a software initiator to write to that chunk. Well, okay, perhaps "easy" is the wrong word ;)
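For the iSCSI route, the idea would be to carve a block device (a zvol) out of the pool and export that; the initiator then sees a raw disk it can format and encrypt however it likes (TrueCrypt included). A rough sketch, with a made-up dataset name and size, and noting that the shareiscsi property may or may not exist on whichever Solaris build you end up running (older builds need the iscsitadm target daemon configured by hand):

  # create a 200GB block device backed by the pool
  zfs create -V 200g tank/cryptvol

  # share it as an iSCSI target, if the build supports this property
  zfs set shareiscsi=on tank/cryptvol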
-Does anyone have any experience administering ZFS storage that can add anything? I set up a theoretical system in VMWare, and it seemed easy enough, but I had no way to get a feel for performance because all of my virtual drives were on the same physical device.
It's pretty fast for the hardware I'm using with it... but I'm using crappy hardware :p Still, I get ~60 MB/s writes to a three disk array with two of the disks on the same (!) ATA/33 (!) bus. Turning on compression would probably help things, but I haven't tested it yet.
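(If you want to try compression, it's just a property flip - something like the following, with "tank" standing in for whatever the pool or filesystem is actually named; only data written after the change gets compressed:)

  # enable compression on the filesystem
  zfs set compression=on tank

  # see how much space it's actually saving
  zfs get compressratio tank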

A side note: Every time someone brings up ZFS, I find something else cool and simple it's doing. I need to buy more disks now :p
 
For disk failures, Solaris spits out error messages on the console and also writes them to the system log under /var.
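You can also poll the pool health yourself; a quick sketch, assuming a pool named "tank":

  # one-line summary - prints "all pools are healthy" or flags the sick one
  zpool status -x

  # full per-device status, including which disk faulted
  zpool status tank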
 
All my problems with ZFS have not actually been with ZFS but with Solaris in general; the filesystem is amazing, but Solaris is not really like Linux/BSD whatsoever. You should also note that 32-bit Solaris has a hard 2TB-per-device limit (note: this only counts for incoming devices, so an iSCSI'd array presented as one big volume has this issue, but your 6x500GB internal array does not). I don't know if there is a workaround, but as far as I discovered, there isn't.

Sun has an excellent introductory guide on their website; I highly suggest checking it out to get a feel for the command interface and its capabilities.

Also, as a disclaimer, I only tried using Solaris for a week and wasn't using approved hardware, so the entire process was incredibly painful. More experienced users/approved hardware would probably breeze through most of the issues I encountered.
 
With regards to disk failures, you need a setup that supports hotplugging on the hardware side. You can't just disconnect and reconnect drives unless the hardware can handle it. So once ZFS reports a failure, that doesn't mean you can just pull the old drive and slap in a new one; you'll need hotplug support to do that.
 
Lots of responses, I'll try to touch all the bases...

It's pretty fast for the hardware I'm using with it... but I'm using crappy hardware :p Still, I get ~60 MB/s writes to a three disk array with two of the disks on the same (!) ATA/33 (!) bus. Turning on compression would probably help things, but I haven't tested it yet.

What kind of processor usage do sustained transfers into and out of your array result in? I'm trying to keep the budget on the low-ish side, and have a low-end AMD Athlon 64 X2 (Brisbane) picked out. Do you know if the SMP capabilities will dramatically help read/write/transfer times?

All my problems with ZFS have not actually been with ZFS but with Solaris in general; the filesystem is amazing, but Solaris is not really like Linux/BSD whatsoever. You should also note that 32-bit Solaris has a hard 2TB-per-device limit (note: this only counts for incoming devices, so an iSCSI'd array presented as one big volume has this issue, but your 6x500GB internal array does not). I don't know if there is a workaround, but as far as I discovered, there isn't.

Sun has an excellent introductory guide on their website; I highly suggest checking it out to get a feel for the command interface and its capabilities.

Also, as a disclaimer, I only tried using Solaris for a week and wasn't using approved hardware, so the entire process was incredibly painful. More experienced users/approved hardware would probably breeze through most of the issues I encountered.

I got a feel for administration of a RAID-Z array by setting one up in VMWare - as far as running the array goes, it was disconcertingly easy. I felt like I had missed a few steps, but things seemed to be going okay. I'll be accessing files on the server through SMB/CIFS to support Windows clients. En masse content addition will be handled by a simple SCP connection. I haven't done any research on iSCSI because a brief glance told me that it wasn't what I wanted, but I'm assuming that I won't hit the 2TB limit using the means I've listed. I'd like very much to install Solaris in 64-bit - I don't recall being presented with an option while I was installing in VMWare. Was I missing something, or does it automatically detect the processor's capabilities and adjust accordingly?

With regards to disk failures, you need a setup that supports hotplugging on the hardware side. You can't just disconnect and reconnect drives unless the hardware can handle it. So once ZFS reports a failure, that doesn't mean you can just pull the old drive and slap in a new one; you'll need hotplug support to do that.

I was told somewhere along the line that SATA as an interface included hotplugging in its featureset. I was thinking that I could tell ZFS to drop the drive from the pool, unplug the failed drive and replace it with a new one, and resilver to the fresh drive. Maybe this is not actually what hotplugging is - could be a misunderstanding on my part.

I found a blog entry talking about dealing with hot spares. HTH!

This is quite a useful link. Since it was published nearly a year ago, I'll go ahead and make the assumption that hot spares are well supported by now. Thank you, sir, for that blog entry.
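(For anyone following along, the hot-spare mechanics in that blog entry boil down to something like the commands below - a sketch only, with made-up device names, and assuming a Solaris build recent enough to have spare support:)

  # attach a shared hot spare to the pool
  zpool add tank spare c3t0d0

  # if a disk faults and the spare doesn't kick in automatically, swap it in by hand
  zpool replace tank c1t4d0 c3t0d0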
 
What kind of processor usage do sustained transfers into and out of your array result in? I'm trying to keep the budget on the low-ish side, and have a low-end AMD Athlon 64 X2 (Brisbane) picked out. Do you know if the SMP capabilities will dramatically help read/write/transfer times?
I'm only using a dual Pentium 3 to get those transfer rates. So an X2 should be more than enough. A 1 GHz CPU is supposed to handle 500 MB/s of checksumming (according to Sun), and I don't know how much parity generation (but in Linux I get ~800 MB/s of raid 5 parity from one of these CPUs). In addition, ZFS supposedly uses multiple CPUs to do checksumming and parity these days. I can't give you actual numbers right now; my array is down because I'm an idiot and left my extra sata cables at home. In general, though, I'd say Solaris eats more CPU standing still doing nothing (~10%) than Linux, but it doesn't peg the CPUs by any means to do fast reads or writes.
I got a feel for administration of a RAID-Z array by setting one up in VMWare - as far as running the array goes, it was disconcertingly easy. I felt like I had missed a few steps, but things seemed to be going okay. I'll be accessing files on the server through SMB/CIFS to support Windows clients. En masse content addition will be handled by a simple SCP connection.
"Disconcertingly easy" is the right phrase for it :D Samba will be installed when you install Solaris; copy /etc/sfw/smb.conf-example to /etc/sfw/smb.conf and edit to your needs (or create a new config file from scratch). Then run "/usr/sfw/bin/testparm" to show any errors in your config, and finally "svcadm samba start" to fire it up. You can also experiment with NFS to see if it works better for your needs; I think OS X supports it natively, and the SFU (free from MS) package lets you mount NFS shares on Windows.
I'd like very much to install Solaris in 64-bit - I don't recall being presented with an option while I was installing in VMWare. Was I missing something, or does it automatically detect the processor's capabilities and adjust accordingly?
Exactly. The x86 dvd boots in 32-bit mode all the time, but installs binaries for the proper architecture. It might be that your machine doesn't have the necessary processor extensions to run 64-bit guest OSes, or that Solaris guesses wrong in VMware.
I was told somewhere along the line that SATA as an interface included hotplugging in its featureset. I was thinking that I could tell ZFS to drop the drive from the pool, unplug the failed drive and replace it with a new one, and resilver to the fresh drive. Maybe this is not actually what hotplugging is - could be a misunderstanding on my part.
SATA does support hot plugging, but only optionally; it's controller-dependent. Have you decided on a controller yet? I have the Supermicro AOC-SAT2-MV8, which is quite nice (and supported just fine - they use the same controller in the Thumper) but uses a PCI-X bus, which most (all?) X2 platforms lack. Thus you'd be stuck with PCI transfer rates of 120-ish megabytes per second. Not that that's a terrible thing; for most uses that's plenty. But a PCI Express card with enough ports would be a marked improvement.
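If the controller does support hotplug, the manual swap under Solaris would go roughly like this - a sketch with made-up attachment-point and device names, and cfgadm syntax you'd want to double-check against your actual controller:

  # see which attachment point the failed disk sits on
  cfgadm -al

  # offline the failed disk so it can be pulled safely
  cfgadm -c unconfigure sata1/3

  # swap the drive, then bring the new one online
  cfgadm -c configure sata1/3

  # tell ZFS to resilver onto the replacement (same slot, so no new device name needed)
  zpool replace tank c1t3d0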
 
Some good info here...

I'm only using a dual Pentium 3 to get those transfer rates. So an X2 should be more than enough. A 1 GHz CPU is supposed to handle 500 MB/s of checksumming (according to Sun), and I don't know how much parity generation (but in Linux I get ~800 MB/s of raid 5 parity from one of these CPUs). In addition, ZFS supposedly uses multiple CPUs to do checksumming and parity these days. I can't give you actual numbers right now; my array is down because I'm an idiot and left my extra sata cables at home. In general, though, I'd say Solaris eats more CPU standing still doing nothing (~10%) than Linux, but it doesn't peg the CPUs by any means to do fast reads or writes.

If Solaris uses that much CPU while doing absolutely nothing, that's upsetting. I was hoping that I could keep this machine a little more eco-friendly / power-bill-friendly - I have an efficient PSU and was considering undervolting the CPU. I know a full port has been made to FreeBSD - would you have any idea how the FreeBSD port compares to the Solaris original? I have a little BSD experience and would be much more comfortable running a BSD system than a Solaris system (no Dvorak keyboard layout on Solaris without breaking my back!), but if the port is still immature, I'll stick with Solaris. Bleh, that's a lot of CPU to use for doing nothing. :(

"Disconcertingly easy" is the right phrase for it :D Samba will be installed when you install Solaris; copy /etc/sfw/smb.conf-example to /etc/sfw/smb.conf and edit to your needs (or create a new config file from scratch). Then run "/usr/sfw/bin/testparm" to show any errors in your config, and finally "svcadm samba start" to fire it up. You can also experiment with NFS to see if it works better for your needs; I think OS X supports it natively, and the SFU (free from MS) package lets you mount NFS shares on Windows.

I was planning on running the most stripped down package set to counteract bloat and to enhance security. I would install the packages as I would need them, including ssh, smb or cifs, and nfs. Really would like to keep everything to a bare minimum. I wasn't aware of that SFU package for MS - Any idea if throughput performance would take a hit by running NFS through SFU into Windars?

Exactly. The x86 dvd boots in 32-bit mode all the time, but installs binaries for the proper architecture. It might be that your machine doesn't have the necessary processor extensions to run 64-bit guest OSes, or that Solaris guesses wrong in VMware.

Ah, I see. Perhaps the fact that I was running the VM on a 32-bit host machine could have affected it, but I'm not sure, because I've done 64-bit VMs on the same host with no problem. No matta.

SATA does support hot plugging, but only optionally; it's controller-dependent. Have you decided on a controller yet? I have the Supermicro AOC-SAT2-MV8, which is quite nice (and supported just fine - they use the same controller in the Thumper) but uses a PCI-X bus, which most (all?) X2 platforms lack. Thus you'd be stuck with PCI transfer rates of 120-ish megabytes per second. Not that that's a terrible thing; for most uses that's plenty. But a PCI Express card with enough ports would be a marked improvement.

Price is a HUGE concern of this project, so I settled on cheapo Rosewill PCI SATA150 controllers. ( http://www.newegg.com/product/product.asp?item=N82E16816132006 ) I'm not too concerned about the limitations of the PCI bus because my bottleneck will be my gigabit ethernet equipment anyway. I have a friend with a 1.5TB RAID5 array with 250gig drives who uses these controllers, and they've been working just fine and have support in Linux. Part of the reason I decided to go with RAID-Z was because it was supposed to be such a successful form of software RAID, and would allow the use of cheap controllers such as these. I noticed in the Newegg reviews (rarely can be trusted, but still...) that these cannot handle >500GB drives, so I may do a little more shopping and pick something a little more robust. Not too much more robust though. Right now, with 18 500gb drives, the entire system would run me less than 31 cents per gig (before shipping and formatting.) If the 750gig drives drop significantly in price, that number will plunge even further.

Thank you for your points so far!
 
If Solaris uses that much CPU while doing absolutely nothing, that's upsetting. I was hoping that I could keep this machine a little more eco-friendly / power-bill-friendly - I have an efficient PSU and was considering undervolting the CPU. I know a full port has been made to FreeBSD - would you have any idea how the FreeBSD port compares to the Solaris original? I have a little BSD experience and would be much more comfortable running a BSD system than a Solaris system (no Dvorak keyboard layout on Solaris without breaking my back!), but if the port is still immature, I'll stick with Solaris. Bleh, that's a lot of CPU to use for doing nothing. :(
Ah, I was reading the stats in a silly way. Logging in through the GUI takes a good deal of CPU just by itself, because my video is apparently not accelerated at all (well, onboard ATI from 2002 will do that). Right now the only thing using CPU is smbd, at under 1%. Stick with Solaris; dtrace is really cool. One of these days I've gotta play with that some more...
I was planning on running the most stripped down package set to counteract bloat and to enhance security. I would install the packages as I would need them, including ssh, smb or cifs, and nfs. Really would like to keep everything to a bare minimum. I wasn't aware of that SFU package for MS - Any idea if throughput performance would take a hit by running NFS through SFU into Windars?
By default Solaris 10 installs everything you'll need, including ssh and samba. There's probably a way to counteract this... but why? Firewall what you don't want running, or disable the service. Not installing things only leads (in my experience) to having to install them later :p Your decision, though.
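(Disabling services under Solaris 10's SMF is straightforward - a quick sketch, with telnet just as an example of something you might not want running:)

  # list the services that are currently online
  svcs

  # list every service, enabled or not
  svcs -a

  # permanently disable one you don't want
  svcadm disable svc:/network/telnet:default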

Performance over NFS... I dunno. I'll run some quick tests and let you know.
Price is a HUGE concern of this project, so I settled on cheapo Rosewill PCI SATA150 controllers. ( http://www.newegg.com/product/product.asp?item=N82E16816132006 ) I'm not too concerned about the limitations of the PCI bus because my bottleneck will be my gigabit ethernet equipment anyway. I have a friend with a 1.5TB RAID5 array with 250gig drives who uses these controllers, and they've been working just fine and have support in Linux.
I don't see support for these cards specifically, but the Adaptec 1205SA and Vantec UGT-ST200 are reported to work, and they use almost the same chipset (Sil3112 versus 3114). What board are you using to get that many PCI slots? Will anything else be going in this machine?
Part of the reason I decided to go with RAID-Z was because it was supposed to be such a successful form of software RAID, and would allow the use of cheap controllers such as these. I noticed in the Newegg reviews (rarely can be trusted, but still...) that these cannot handle >500GB drives, so I may do a little more shopping and pick something a little more robust. Not too much more robust though. Right now, with 18 500gb drives, the entire system would run me less than 31 cents per gig (before shipping and formatting.) If the 750gig drives drop significantly in price, that number will plunge even further.
I'd suggest buying the disks from ZipZoomFly. They have very similar prices, and their shipping is in manufacturer-approved material. Newegg usually ships in bubble wrap, although I understand this may change for large batches of drives.

What's your whole hardware configuration look like? Case? Power supply? Planning to use hotswap cages? Post a list and I'll see if I see any potential problems with it.
 
Whole build, minus case, PSU, and drive cages: https://secure.newegg.com/NewVersio...stNumber=4613711&WishListTitle=AMD+fileserver - I have another 1GB stick of that RAM that I'll be adding alongside the 1GB in the wishlist.

Case: Cooler Master Stacker 810: http://www.sundialmicro.com/Cooler-Master-Tower-Case-rc810ssn1_1701_541.html - Also looking to buy used to save money.

PSU: Silverstone Olympia 650w: http://deadeyedata.com/index.php?main_page=product_info&cPath=7_33_36_41&products_id=137 - Can use a discount code to save even more from this site. I did plenty of research for this, so I'm firm in my decision.

Drive cages are made by CalPC Systems - They have been ordered and delivered already. They are 3x 5.25" -> 5x 3.5" adapters, non-hotswap, and with a 120mm fan in each unit. I bought 4. (Pretty much identical to Ockie's Galaxy 2.0: http://www.hardforum.com/showpost.php?p=1028869849&postcount=1 )

I'll be buying 9 of the 500GB drives to start with, and making the first array with them. I'll probably buy the second 9 once 750GB drive prices get reduced so that I can take advantage of the greater storage.

Neither of the SATA controllers you linked me to has four internal SATA connectors, so I'm afraid I can't use them.

EDIT: http://www.newegg.com/Product/Product.aspx?Item=N82E16816110002 These caught my eye.
 
Whole build, minus case, PSU, and drive cages:
Might want to look into a different board; with that one you'll need to tie up the PCI-E x16 slot with a video card. I don't have any specific recommendations, but the power draw on the 6100-series chipsets is pretty low, and it's generally cheaper to get onboard video than to buy a separate card. Here's an example.
Case: Cooler Master Stacker 810: http://www.sundialmicro.com/Cooler-Master-Tower-Case-rc810ssn1_1701_541.html - Also looking to buy used to save money.

PSU: Silverstone Olympia 650w: http://deadeyedata.com/index.php?main_page=product_info&cPath=7_33_36_41&products_id=137 - Can use a discount code to save even more from this site. I did plenty of research for this, so I'm firm in my decision.
Both good choices. Here's JonnyGuru's review of the OP650, in case you need further reinforcement ;)
Neither of the SATA controllers you linked me to has four internal SATA connectors, so I'm afraid I can't use them.
I was simply pointing out that the controllers I linked use the same SATA chipset as the ones you're planning on, so yours have a good chance of working fine. Just in case you're interested in higher port density, here's the controller I'm using. Sun actually uses them, and it has worked just fine so far for me. They are about twice as expensive, but if you might expand past 18 drives in the future, it'd be worth buying them now. They work fine in PCI slots.

PS: Might consider these, since they're $10 cheaper and based on the same chipset.
Get eyes back ;) Those cards are notorious for not actually doing redundancy, breaking people's configurations and generally being pieces of crap, not to mince words. Not to mention that the company that actually made them (Netcell) is apparently out of business. PNY just resells them.

I'm going to have to beg off on NFS benchmarking; I'm coming down with something and my everything hurts. And of course SFU isn't cooperating (in either sense of the word). I can say, though, that it wasn't noticeably terrible when I was using it.
 
Just switched over to the board you linked me to - the extra cost is no problem, because I save the money that I'd have to spend on a video card. I can't believe I overlooked this one during my shopping. Looks perfect, thanks man.

As for the Supermicro controllers, they certainly seem sexy - however, due to space limitations in the Stacker case, I can't imagine I'd be able to handle more than 20 drives on one system. Unless I suddenly found a very(!) cheap rackable enclosure that could accommodate 32+2 drives, I don't think I can justify the extra capacity. (You wouldn't happen to know of any way to get all these drives into a rackable box, would you? I started out wanting to rack them, but every option was too expensive.)

Also, it seems that the el-cheapo chipset on those Rosewill controllers will not handle >500GB drives, so that's about it in terms of expandability. I may give this some thought; I don't like being limited when prices keep dropping on more spacious drives.

Don't worry about the benches, and get well soon!
 
Just switched over to the board you linked me to - the extra cost is no problem, because I save the money that I'd have to spend on a video card. I can't believe I overlooked this one during my shopping. Looks perfect, thanks man.
Good deal, I hope it works out :D
As for the Supermicro controllers, they certainly seem sexy - however, due to space limitations in the Stacker case, I can't imagine I'd be able to handle more than 20 drives on one system. Unless I suddenly found a very(!) cheap rackable enclosure that could accommodate 32+2 drives, I don't think I can justify the extra capacity. (You wouldn't happen to know of any way to get all these drives into a rackable box, would you? I started out wanting to rack them, but every option was too expensive.)
Expensive is the word. My brother bought a Supermicro 4u case for his file server for about $500, and it only has room for 8 drives. Most of the cases I find are limited to only 24 drives for 4u, which is kind of a waste of space IMO. If it were possible to find the case from a Thumper, that'd be a good way to rackmount disks. It wouldn't waste so much of the depth, anyways. Other than that, I guess you could build your own.
Also, it seems that the el-cheapo chipset on those Rosewill controllers will not handle >500GB drives, so that's about it in terms of expandability. I may give this some thought; I don't like being limited when prices keep dropping on more spacious drives.
The Sil3112 claims to use 48-bit addressing for sectors. Since a sector is 512 = 2**9 bytes long, that means you can address 2**(48+9) = 2**57 bytes = 128 PiB with it. It's strange that they would only partially implement that support. Stopping at roughly 500GB suggests it's really only using 30 bits for the sector address, since 2**30 sectors x 512 bytes = 512 GiB. Weird.
 
Upon further contemplation, I've decided to go with the 8-port Supermicro controller that you linked me to. I'm going to change the motherboard again to one with only 3 PCI slots, since I won't be able to fit 4*8+4 drives in a case anyway. Thank you for all of these recommendations, they are really helping out.

Also, I don't have any definitive proof that the Sil chipset doesn't handle >500GB devices - IIRC there is more than one report on Newegg mentioning it, though. While I take most Newegg reviews with a grain of salt, I'm not sure this is something that can be attributed to user error. Either way, I've decided on the Supermicro controller, so no problem.
 
I've run out of other things to do, so I gave NFS-on-Windows benchmarking a shot. I didn't do anything fancy (i.e. non-sequential), but I still think the results are interesting ;) I used a 2GB file (the same one in every case). I transferred it from Solaris to Windows via three methods and kept the "Network Utilization" graph set to "normal" update speed. In order, you have NFS, SCP, and SMB. Testing was not very scientific - I was running BitTorrent, playing music from the network drive, and so forth - but nothing you wouldn't be likely to do while copying files, so it's probably a reasonable guess at their real-life performance numbers. A 100 megabit LAN was used, with Intel controllers of various sorts at both ends. I can be more specific if you want, but I don't think it's really relevant.

Image, linked for large number of pixels but 56k safe :p

First was NFS. I got up to about 78% utilization, which isn't too bad. Next, with SCP (using WinSCP 3), I got shockingly bad results - only about 21% utilization! Only 7% cpu was being burned on the Solaris end, and only about 12% at this end, so I don't know what the bottleneck was. If you have suggestions I'll be glad to test them. Finally, SMB. I don't know how to characterize this other than "all over the place". Perhaps I should use NFS all the time... It's much more consistent in terms of maxing out the interface. Or at least tune SMB ;)
 
Excellent info. I can think of a handful of people who would be interested in these results. While not scientific, they clearly show NFS to be the undisputed winner. Thanks for doing this, looks like I'll be avoiding SMB.
 