Quiet high-end ZFS NAS home file server, Dec '09 edition

dandv

[Cross-posted at SPCR]

I want to build some really reliable storage for my data, and it seems that ZFS is the only filesystem at the moment that does live checksumming. That rules out the DroboPro, so I'm looking to build a quiet ZFS NAS that would start with six 1TB or larger hard drives. I'd like this system to be very reliable and relatively future-proof for a few years, so I'm willing to invest some serious $$$.

Case: I think I've settled on the Antec Twelve Hundred because it cools well, is quiet, and simply has 12 bays that allow elastic mounting. The SilverStone Raven is its counter-candidate, but I find its construction quite odd.

For the PSU, I'm torn between the Antec CP-850 and the Nexus RX-8500. The Nexus has a very uniform fan-noise profile, and I'd rather not have the Antec spin its fan up and down based on load. On the other hand, I'm not sure how often my file server will draw more than 400W under use.

For the hard drives, I've read that WD Black drives are actually WD RE3s with a software setting changed. I'd also like to buy *different* drive types, not just 6 WDs; maybe 2 WDs, 2 Seagates and 2 other drives. Recommendations?

For the motherboard, CPU and RAM I have no idea, other than the CPU has to be 64-bit and the RAM must be ECC.

Advice please?

Lastly, I'm wondering if I can make this PC into a media server by adding a Blu-ray drive and a good sound card. But I have no idea about OpenSolaris support for such multimedia usage, or whether Windows 7 on VirtualBox would get sufficient hardware access to output HDMI or S/PDIF signals. (Running OpenSolaris virtualized is not an option because of the reliability risk.)

Suggestions on that would be highly appreciated as well.
 
Subscribing to this as a reminder for me to answer/reply/post in 12 hours. A bit too tired at the moment to answer properly.

Oh, and no system in the world is ever future-proof. Also, you're only using 12 drives, correct?

EDIT: To start off, you're gonna have to go Xeon for your wants:
$240 - Intel Xeon X3440 CPU

Supports ECC RAM, is a Xeon, and should be more than enough CPU power for your needs
 
Hi, I am actually in the process of building a file server myself, so perhaps I can share some suggestions. I will focus on WD disks here.

You need to study the hard disk models carefully, especially if you are buying a lot of them. According to info elsewhere and also on the storage forum, WD has deactivated TLER support on the consumer models in the latest manufacturing batches, so you can no longer use the WDTLER utility to change the setting. This may be an issue if you want to use them as RAID drives.

Obviously, for RAID you can buy the WD RAID Edition models. However, locally they are about 40% more expensive compared to consumer models of the same capacity.

I understand you are focusing on 1TB or higher. According to initial reports, TLER support on the Caviar Black is deactivated; I have read and heard this but have no first-hand info.

For me, I am currently stuck because my budget is limited and TLER is critical for me. I am trying to find WD6400AAKS drives from an earlier manufacturing batch where TLER can still be enabled. If not, then I will consider the RE drives.
 
In the US you can actually get a storage enclosure at pretty reasonable pricing compared to building your own. I found the following listing on Newegg. It looks like it even provides hot-swap bays.

http://www.newegg.com/Product/Produc...-016-_-Product

That enclosure uses SATA port multipliers, which are not supported in OpenSolaris or FreeBSD, so he wouldn't be able to use ZFS with it. If you are going with an external enclosure, I would recommend a SAS enclosure.

If you're staying with one case, then either do internal SAS or direct-connect SATA ports.

As far as watching movies or listening to music, OpenSolaris should work OK, but I would test it first to be sure all of your HD content doesn't lag or anything. I had some issues playing certain HD movies with VLC in Windows. If you are looking into a TV tuner and doing the TiVo thing, it may or may not work so well; I don't know what driver support or TV software (MythTV?) OpenSolaris has now. If you want to play games (PC or console emulator) on your box, then OpenSolaris (and arguably even Linux) is out of the question.

If you are looking at trying to do a lot of things with your server, you may want to weigh the pros and cons of OpenSolaris vs. FreeBSD. Both support ZFS; the version of ZFS in OpenSolaris is newer, but FreeBSD has a better software collection in ports than OpenSolaris does with its pkg system. Note that if you decide to try both, once you create your zpool in OpenSolaris you will not be able to import it into FreeBSD, because it will be a newer version than FreeBSD supports.
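If you do want to keep the option of moving the pool between the two later, check the pool versions up front. Something like this (the pool name and version number are just examples; I believe OpenSolaris also lets you pin an older version at creation time, but double-check your build):

Code:
# Highest pool version this OS supports (run on both OSes and compare)
zpool upgrade -v

# Version of an existing pool
zpool get version tank

# On OpenSolaris, supposedly you can create the pool at an older,
# FreeBSD-compatible version so it stays importable there:
zpool create -o version=13 tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0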

EDIT: To start off, you're gonna have to go Xeon for your wants:
$240 - Intel Xeon X3440 CPU
Why do you have to use a Xeon? Those are more expensive and generally draw more power. That's bad if this thing is on 24/7. As I understand it Xeons really only come into play when you want to have a motherboard that supports multiple processors.

A Core 2 Quad or an i5/i7 will work just fine. I'm using a Q9450 with ECC RAM in my server.

Have you considered the Norco 4220?
http://www.newegg.com/Product/Product.aspx?Item=N82E16811219033&Tpk=norco 4220

Also take a look at the case offerings supermicro has, like this one
http://www.supermicro.com/products/chassis/4U/747/SC747TG-R1400-SQ.cfm
Their cases and drive bays tend to be high quality.

Supermicro also makes pretty good server motherboards. You can take a look at their offerings on their site.

On the other hand, I'm not sure how often my file server will draw more than 400W under use.
Unless you are putting in some heavy-duty graphics cards, probably never, even with all 12 drives spinning. My server idles at around 150W. Someone made a thread earlier in the data storage forum where people posted the power draw of their servers.

or whether Windows 7 on VirtualBox would get sufficient hardware access to output HDMI or S/PDIF signals. (Running OpenSolaris virtualized is not an option because of the reliability risk.)

OpenSolaris does xVM (Xen), which can pass through PCI devices, so you could potentially install a graphics card and a sound card and give them to the VM. I don't know how well it works or what the performance would be like. If you try this and get it to work, I'd be very eager to know how it turns out.
 
Thank you very much for pointing out that port multiplier driver support is an issue for OpenSolaris and FreeBSD. I had also missed that driver support check.
 
I hope you are aware that ZFS currently (or the last time I checked, anyway) doesn't support RAID-Z or RAID-Z2 expansion, so you can't just add a single disk and expand the array.

As far as hardware, I'll also recommend SuperMicro boards and chassis. They are excellent quality, though the chassis will not be silent, which seems important to you. It will be tough to get a very silent server with lots of disks anyway: disks are fairly loud, and you'll need decent airflow to keep a stack of 12 of them cool. It could be made fairly quiet, but not silent.

I would look at the SuperMicro X8SIL-F: 3x PCIe x8 slots, 6 onboard SATA ports, a pair of high-quality Intel NICs, and it fills all your other requirements. Lots of room for expansion in those PCIe slots; consider the SuperMicro AOC-USAS-L8i, 8 SATA ports for about $125 and well supported in pretty much every OS. Officially only the Xeon 3400 series is supported; it might work with a Core i5, but the low-end Xeons are about the same price anyway and far more power than you need. Toss in a couple of gigs of <insert favourite brand> DDR3 ECC memory and you've got most of what you need.
 
Why do you have to use a Xeon? Those are more expensive and generally draw more power. That's bad if this thing is on 24/7. As I understand it Xeons really only come into play when you want to have a motherboard that supports multiple processors.

A Core 2 Quad or an i5/i7 will work just fine. I'm using a Q9450 with ECC RAM in my server.

Well, for one: the Xeon I recommended is simply a server version of the LGA 1156 Core i7 CPUs. In fact, it's the cheapest Core i7-based CPU (LGA 1156 and LGA 1366) with Hyper-Threading support if you don't live near a Microcenter, and it won't hurt to have HT support IMO. In addition, its power usage is pretty much the same as the C2Qs and i5/i7 CPUs. And finally, at least there's a guarantee that the above Xeon will work in a server-class LGA 1156 motherboard.

If you look at the specs and pricing, the Intel Xeon X3440 really is a good deal. It's $50 cheaper than the closest Core i7/LGA 1156 CPU (the Core i7 860) yet is only 266MHz slower. It's only $40 more than the Core i5, but it does have HT, which may come into play later on. So in other words, in this case, the Xeon I recommended is not that much more expensive, nor does it draw more power.
 
I'm using the Intel S3210SHLC for my OpenSolaris build. It works well; all the onboard devices are supported except the video (in graphics mode; it works fine for the console). Two x8 slots (one is x4 electrical) and an x16, plus two PCI. It takes Core 2 CPUs and does ECC on DDR2.

OpenSolaris does xVM (Xen), which can pass through PCI devices, so you could potentially install a graphics card and a sound card and give them to the VM. I don't know how well it works or what the performance would be like. If you try this and get it to work, I'd be very eager to know how it turns out.

My understanding is that while Xen does PCI passthrough, XVM (the OpenSolaris variant) doesn't include that capability yet.
 
Hi, I am actually in the process of building a file server myself, so perhaps I can share some suggestions. I will focus on WD disks here.

You need to study the hard disk models carefully, especially if you are buying a lot of them. According to info elsewhere and also on the storage forum, WD has deactivated TLER support on the consumer models in the latest manufacturing batches, so you can no longer use the WDTLER utility to change the setting. This may be an issue if you want to use them as RAID drives.

Obviously, for RAID you can buy the WD RAID Edition models. However, locally they are about 40% more expensive compared to consumer models of the same capacity.

I understand you are focusing on 1TB or higher. According to initial reports, TLER support on the Caviar Black is deactivated; I have read and heard this but have no first-hand info.

For me, I am currently stuck because my budget is limited and TLER is critical for me. I am trying to find WD6400AAKS drives from an earlier manufacturing batch where TLER can still be enabled. If not, then I will consider the RE drives.


I have 6 WD6400AAKS drives that all have TLER enabled, which I'm currently using in RAID 6 and looking to sell. I purchased all of them from different vendors intentionally to ensure they are from different batches. Five are ~2-3 months old; the other is 1.3 years old.
 
Hi all,

I'm back to the task of building a reliable file server, and it's almost March 2010.

Any updates?

Should I still be looking at ZFS and OpenSolaris? Anything new on the scene that I should know about in terms of cases, hard drives, mobos, CPUs, maybe even a prebuilt reliable storage solution?

Again, my requirements are:
1. very high reliability
2. 6TB+
3. Price is less important. I'm willing to invest in this system, but still, 6TB of SSD storage would be overkill.
4. noise on the lower side
 
Does the system need to be reliable?

Not completely sure what you mean here, since I mentioned reliability at least twice in my previous post. But in case you are asking for a component drill-down:

* I want the storage to be reliable. If a drive has a silent read error, I want it detected and reported. I also want enough redundancy to survive the simultaneous failure of two hard drives.

* I obviously need the other path components between the server and the client (say, my laptop) to be reliable.

If there are other items I should look at, please let me know.
 
You would want to deactivate TLER if you are going to use ZFS (or any advanced RAID). TLER prevents your hard drives from properly recovering data, and may turn errors that were in fact recoverable into permanent ones.

The only reason to use TLER is to prevent the system from locking up or freezing for more than a few seconds when a drive in a RAID encounters an I/O error. Expensive servers may lose thousands of dollars for each second the server is down; there, TLER is useful. For most home users, TLER is harmful; be sure to deactivate it.
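For what it's worth, newer smartmontools builds can supposedly read and change this error-recovery timeout (SCT ERC) directly, without the DOS WDTLER tool; whether the setting is exposed at all depends on the drive's firmware, and the device name below is just an example. Roughly:

Code:
# Read the current SCT Error Recovery Control (TLER) setting, if the drive exposes it
smartctl -l scterc /dev/ada0

# Enable a 7 second limit for reads and writes (values are tenths of a second)
smartctl -l scterc,70,70 /dev/ada0

# Disable it again, letting the drive retry as long as it needs (what you want for ZFS)
smartctl -l scterc,0,0 /dev/ada0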

OP, you opened this thread months ago. Are you serious about this plan? Why haven't you bought anything yet? What questions are still unanswered? Do you already have hardware to build a box? Did you already test ZFS in a VM?
 
Why are you so concerned about data integrity? Is this thing going to house mission-critical data? If it's just a media server, I'd say drop the ECC RAM requirement and move to a desktop chipset (an i3 or Athlon II would work fine). If you really want data integrity, get a hardware RAID controller and run RAID 6 on this thing.

That all said, if you want to stick with what you have, get some Seagate LP drives (or WD Green, Samsung EcoGreen, etc.), mount them elastically, get a low-TDP chip and a decent passive heatsink. Water-cool if you're really worried, but odds are the chip will never be stressed. I have a 20TB file server at work that rarely goes over 5% utilization (quad-core 1.8GHz Opteron, 4GB RAM). It's mounted as a network share on a few machines, and our other servers back up to it.
 
If you really want data integrity, get a hardware RAID controller and run RAID 6 on this thing.
Of course not; RAID cannot distinguish corrupt from uncorrupted data. That's why ZFS comes into play: it uses the redundancy from the RAID to repair any corruption on the fly, as it is discovered.

Hardware RAID is clearly inferior to ZFS; with hardware RAID, many of the benefits of ZFS are wasted. ZFS has a smarter RAID engine, and the combination of RAID engine and filesystem yields emergent properties.

You also don't need water cooling for 5W of heat (the idle power consumption of AMD chips). ECC memory would not be strictly required either, as any memory error that causes corruption would be detected by ZFS, which would use data from the other disks to repair the corruption on the fly.
 
If you're doing software RAID with ZFS, you don't have to worry about all the TLER crap, meaning you can buy WD's 2TB GP drives: 5400rpm = cheap = quiet = cool. Win, win, win.

First thing you should do is decide how much space you will need. 6TB is 'easy', as it's only 4 drives in RAID-Z; that fits in any case. If you really want future-proofing, you should just get a Norco 4220 now. BTW, the 4220 with the 120mm fan bracket isn't THAT loud a system, especially if you're only filling it with GP drives.
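Something like this, assuming the four GP drives show up as ada0-ada3 (device names are just an example and will differ per OS):

Code:
# One 4-disk RAID-Z vdev out of 4x 2TB drives
zpool create tank raidz ada0 ada1 ada2 ada3

# Sanity check: AVAIL should show roughly three drives' worth of space
# (the fourth drive's worth goes to parity)
zfs list tank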
 
Of course not; RAID cannot distinguish corrupt from uncorrupted data. That's why ZFS comes into play: it uses the redundancy from the RAID to repair any corruption on the fly, as it is discovered.

Hardware RAID is clearly inferior to ZFS; with hardware RAID, many of the benefits of ZFS are wasted. ZFS has a smarter RAID engine, and the combination of RAID engine and filesystem yields emergent properties.

You also don't need water cooling for 5W of heat (the idle power consumption of AMD chips). ECC memory would not be strictly required either, as any memory error that causes corruption would be detected by ZFS, which would use data from the other disks to repair the corruption on the fly.

Hardware RAID will verify array integrity every day if configured to do so. There's also battery backup for writes, and easy expandability of the array from one type to another.
 
Hardware RAID will verify array integrity every day if configured to do so. There's also battery backup for writes, and easy expandability of the array from one type to another.

This is true for software RAID in general, at least the part about verifying array integrity. Battery backup is a different story, but that's what UPSes are for.

But ZFS cannot deal with reshaping arrays and can't add disks to an existing array; you have to add a new array (with its own redundancy) to the pool. This is a big problem for most folks with home servers who would like to add capacity incrementally.

I know folks at Sun who had mapped out what it would take to do this with RAID-Z, but of course the Sun sale to Oracle stopped all new ZFS development.
 
Hardware RAID will verify array integrity every day if configured to do so. There's also battery backup for writes, and easy expandability of the array from one type to another.
Hardware RAID doesn't know the difference between corrupt data and data that is intact, because it has no knowledge of the filesystem.

The only thing hardware RAID can do is make sure the parity data matches the stripe data. But when a bit flip occurs on a single HDD in the RAID and you rebuild the array, it will 'correct' the parity using the corrupt data and thus permanently destroy the affected data.

ZFS is smarter and can see which copy (data or reconstructed parity data) is uncorrupted, because it has stored a checksum. It can therefore detect corruption at any time and correct it on the fly, without running a filesystem check or taking the array offline.
 
Forget it, MikeSM, you'd have better luck trying to talk an Apple nerd out of his iPhone or his action figures.
 
But ZFS cannot deal with reshaping arrays and can't add disks to an existing array; you have to add a new array (with its own redundancy) to the pool. This is a big problem for most folks with home servers who would like to add capacity incrementally.
I don't think this is so bad.

You can expand RAID0s and mirrors (essentially creating RAID10), but RAID-Z and RAID-Z2 cannot be expanded. Bummer, you think. But then look at this upgrade path:

- user starts with ZFS and 4 disks in RAID-Z
- user buys another 4 disks in half a year and wants to expand the array
- user adds a second 4-disk RAID-Z array to the existing pool, not destroying any existing data but adding to the capacity/free space of the volume
- user now has one pool containing two RAID-Z arrays of 4 disks each, with the same free space as a single RAID-Z2 array of 8 disks.

This does mean you use 2 parity disks for 8 drives, like RAID6/RAID-Z2. The cool thing here is that newly written files are striped across both arrays (RAID0-style), so the pool becomes faster with each array you add.

Really sexy, however, is when you set copies=2 on some directory (in practice, a dataset mounted at that directory), causing each newly written file there to be stored on 2 devices (a RAID-Z array counts as one device). So even if one array is totally destroyed, the data would still exist on the second array. It also means more copies are present to heal corruption, making it more resilient against damage/memory problems/etc.

Best yet, as this works per dataset/directory, you can give your personal documents 2 or more copies, while your less important mass-storage data makes do with only the basic RAID-Z parity protection.
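Roughly like this (the dataset names are made up; copies is a per-dataset property, so you give the important directory its own dataset):

Code:
# Keep two copies of every block for the important stuff
zfs create tank/documents
zfs set copies=2 tank/documents

# Mass storage stays at the default single copy (RAID-Z parity only)
zfs create tank/media
zfs get copies tank/documents tank/media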

Also remember that expanding a RAID5 or RAID6 array is a dangerous operation: a single read or write failure on one of the member disks may destroy all data, and you lose the parity protection of RAID5/6 during the expansion process.

All in all, it's not a big miss; it still allows plenty of options to expand in a safe way.
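The whole upgrade path above is basically one command per step; a sketch with made-up Solaris-style device names:

Code:
# Day one: a 4-disk RAID-Z
zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0

# Half a year later: add a SECOND 4-disk RAID-Z to the same pool.
# Existing data stays where it is; new writes get striped across both vdevs.
zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0

# The pool now shows two raidz1 vdevs
zpool status tank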
 
Forget it, MikeSM, you'd have better luck trying to talk an Apple nerd out of his iPhone or his action figures.
Funny, but what exactly do you think I'm doing here? Selling ZFS?

If you guys just think FAT32 is the bomb, then I will leave you to it. If anyone is truly interested in ZFS, I'll gladly help on a voluntary basis.
 
To anyone who doesn't get the problem, here it is:

4-drive RAID5 array, where the last drive is the parity bit, just even or odd (think ECC):
0 1 0 | 1 -- this is our working array
Now a single drive fails...
_ 1 0 | 1 -- this is okay! We can calculate the missing data, _ needs to be a 0!
Instead now we have data corruption...
1 1 0 | 1 -- this is a huge problem: we can tell something is wrong, but we don't know what; most likely the controller will decide to update the parity to 'fix' the problem:
1 1 0 | 0 -- and now we have data corruption, most likely unknown to us.
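Same thing as plain XOR arithmetic, if that helps (a toy sketch only; real controllers work on whole sectors, not single bits):

Code:
# Healthy stripe 0 1 0: parity = 0^1^0 = 1, matches the stored parity bit
echo $(( 0 ^ 1 ^ 0 ))   # -> 1

# Drive 1 dies: its bit is recoverable from the survivors plus the parity
echo $(( 1 ^ 0 ^ 1 ))   # -> 0, the missing bit

# Silent flip on drive 1 (0 becomes 1): recomputed parity is 0, stored parity is 1.
# The stripe is clearly inconsistent, but nothing says WHICH of the four values
# is wrong, so the controller "fixes" the parity instead.
echo $(( 1 ^ 1 ^ 0 ))   # -> 0, disagrees with the stored 1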
 
With RAID 6, it should be possible to detect and fix an error in one of the drives. I'm not sure if any of the controllers actually do so, but I think it is possible.
 
OK, after reading a bit on the ZFS healing thing, it is cool and most likely "better" than hardware RAID. That said, I can't think of a scenario in which RAID 6 would not be sufficient for data integrity.
 
Darakian: think about memory errors that cause corrupted data to be written. The hardware RAID controller makes no distinction between corrupt and valid data; it only knows that the parity data needs to be in sync with the real data. So ZFS can protect against RAM errors - meaning you don't need ECC RAM; ZFS does that part already using checksums. This only works for redundant ZFS pools, not for single disks or RAID0s; in that case, ZFS would detect corruption but won't be able to fix it. Still, you would know exactly which files are affected, which is very useful for end users. Just deleting the files, if they are unimportant, fixes the error. Normal operation remains possible, but read access to corrupt files will be denied as a precaution.

Other than RAM errors, you can also have bit flips on your hard drives. Especially now that HDDs keep getting bigger while the uncorrectable error rate stays roughly the same, bigger HDDs are more susceptible to bit flips. If you're using hardware RAID and a bit flip occurs in a location used for parity data, there is no immediate problem: the controller won't use the parity data until a disk is missing, and a rebuild will fix it. You will be unaware of this bit flip, however. And should you lose a disk before such a rebuild, you get the 'corrupted' version, because now the controller has to use the parity data, including the corrupted part, and it has no way to know it is corrupt.

In the case of using hardware RAID where a bit flip occurs on a data HDD (i.e. not parity information), you're screwed, since hardware RAID will assume the real data is more trustworthy than the parity data. Again: it cannot distinguish corrupt from non-corrupt parity/data. All it can do is rebuild the parity, and you will have lost all opportunity to recover the uncorrupted version of that location's data.

If such a thing happens in a location that is used for filesystem metadata, you're really screwed: entire directories can be lost or hidden, even if they haven't been written to or changed for years. ZFS has redundant metadata, even on single disks, and as this metadata is checksummed just like the data, ZFS will also know if the metadata is corrupt and use a redundant copy. For RAID-Z you get 3 metadata copies by default, adjustable via the ZFS interface.

So hardware RAID is vulnerable both to RAM errors affecting disk I/O and to bit flips that occur on the hard drives, and worse yet, you have no way to detect this corruption. If the corruption occurs on live data, you're instantly corrupted with no way of detecting it at the RAID level. If the corruption occurs on parity data and you then lose a disk without rebuilding first, you get corruption too. This goes for RAID1, RAID3, RAID4, RAID5 and RAID6 alike.

So if you want some protection against (silent) corruption, ZFS is your friend.
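And you don't have to wait for a read to hit the bad spot; you can sweep the whole pool on demand (the pool name is just an example):

Code:
# Walk every block in the pool and verify it against its checksum;
# anything repairable from redundancy gets fixed on the fly
zpool scrub tank

# Check progress and results; -v also lists any files with permanent errors
zpool status -v tank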
 
Let me give you an example:

Code:
# zpool status cortex
  pool: cortex
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
config:

        NAME        STATE     READ WRITE CKSUM
        cortex      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14a   ONLINE       0     0     0
            ad6a    ONLINE       0     0     3  19.5K repaired
            ad12a   ONLINE       0     0     0
            ad26a   ONLINE       0     0     0
            ad10a   ONLINE       0     0     0
            ad24a   ONLINE       0     0     0
            ad8a    ONLINE       0     0     1  18.5K repaired
            ad4a    ONLINE       0     0     0

errors: No known data errors

As you can see here, this RAID-Z array ("RAID5") has 8 disks, and two of the disks had some corruption. ZFS instantly repaired it and reports normal operation - no application ever gets to see corrupt data with ZFS, ever!

If this had happened with hardware RAID, you would have been unaware of the corruption and would simply have corrupted some files, which you might not notice for a long time. Then one day you open your wedding pictures and find that some of them show JPEG corruption - and that while you thought you were so safe with hardware RAID5.

Of course, RAID cannot replace a backup. ZFS is a different beast, however: since ZFS has instant snapshots, it can roll back a directory or an entire filesystem to an earlier date. So even if a virus wipes all your data or your cat jumps on the delete button, you can still roll back to an earlier snapshot.

Creating snapshots is instant (about a second) and takes no extra space up front; space is only used as you change or add files relative to the snapshot. The snapshot is read-only. A simple script (cronjob) that snapshots your ZFS filesystems every 24 hours offers additional protection against user/application malfunction that RAID cannot provide. In essence, ZFS sort of is a backup if you use snapshots properly.
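A minimal version of such a cronjob could look like this (the pool/dataset names and the naming scheme are just examples):

Code:
# /etc/crontab entry: snapshot every filesystem in the pool each night at 03:00
0 3 * * * root zfs snapshot -r tank@daily-$(date +\%Y\%m\%d)

# See what you have, and pull a deleted file straight out of a snapshot
zfs list -t snapshot
cp /tank/documents/.zfs/snapshot/daily-20100301/important.doc /tank/documents/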
 
In the case of using hardware RAID where a bit flip occurs on a data HDD (i.e. not parity information), you're screwed, since hardware RAID will assume the real data is more trustworthy than the parity data. Again: it cannot distinguish corrupt from non-corrupt parity/data. All it can do is rebuild the parity, and you will have lost all opportunity to recover the uncorrupted version of that location's data.

....

This goes for RAID1, RAID3, RAID4, RAID5 and RAID6 alike.

I do not think that is true for dual-parity arrays such as RAID 6. If a bit flips on any hard drive in a RAID 6, a routine scan can detect it. And theoretically it can be corrected, since an N-drive RAID 6 can be rebuilt using only N-2 drives.

Even if one of the drives died, the remaining bits for the failed drive, and one other drive, could be successively computed from the N-1 possible ways of selecting N-2 drives from the set of N-1 working drives. One of the N-1 possible ways will compute bits different from the others, and that will be the one that excluded the bit-flipped drive, and so will be correct. (There is probably a more efficient way to do the computation I just described.)

I'm not sure if any of the hardware RAID controllers support this. But it is possible, I think.
 
John4200: what you say is "theoretically" true, but how would the hardware RAID controller know which disk has corruption? For all it knows, the parity might be off. Generally, I think all hardware RAID considers parity to be less trustworthy than data, and will not correct data from parity but rather correct parity to match the data.

The problem is that hardware RAID can't detect these bit flips; it would only know that the parity and data are not in sync. Even if your story has some merit (things I didn't think of), I don't think any actual product implements it.

Hardware RAID works fine for detecting drives that fail in an assertive way, like not responding at all or returning I/O errors. That's the signal for the RAID controller to disconnect the drive and count the array as degraded. But for silent bit flips, the RAID controller gets no such warning, and only a routine rebuild/verify would detect that some parity is not in sync - at which point it updates the parity to reflect the (corrupted) data. So while hardware RAID can cope with failing or "crashing" hard drives, it's actually very bad at dealing with bit corruption across several drives in a RAID. If a drive so much as has a hiccup, the hardware RAID controller may choose to disconnect it and count it as failed; ZFS is much smarter than that. Even with excessive bit flips on all drives in a mirror, RAID-Z or RAID-Z2, or when using zfs set copies=2 on a dataset, ZFS would repair all data on the fly and operate normally, assuming at least one copy of each file/metadata block is uncorrupted.
 
John4200: what you say is "theoretically" true, but how would the hardware RAID controller know which disk has corruption?

No, with RAID 6 there is enough information to correct a bit flip on one drive, as I already explained. Perhaps the problem is that you do not understand RAID 6: it does not use a simple parity checksum, but rather a kind of Reed-Solomon error-correcting code based on Galois field math. The computation is complicated, but I think I explained clearly in my previous post how an error could be detected and corrected with RAID 6.
 
Funny, but what exactly do you think I'm doing here? Selling ZFS?

If you guys just think FAT32 is the bomb, then I will leave you to it. If anyone is truly interested in ZFS, I'll gladly help on a voluntary basis.

I really like ZFS. I wish Sun had not taken such a proprietary path with it, but had instead fully open-sourced it so it could be made part of the Linux kernel. If that had happened, the community would have added reshaping support to it; there is nothing in the ZFS architecture that prevents it. The other issue is that OpenSolaris has limited driver support for a lot of the hardware that folks like us like to use, which makes it non-trivial to build a system that can run ZFS well. Plus there is a lack of the GUI management tools that folks here like.

I know senior people at Sun, and they know what needs to change to make it work really well for our type of use.

However, Sun's fortunes were less than bright, and the Oracle purchase has basically gutted ZFS development. Oracle has a next-gen filesystem called Btrfs that isn't bad, but the conflict means that ZFS will be EOL'd at some point. Maybe they can get the whole ZFS code base open-sourced so it can live on past Sun, but Oracle isn't exactly that type of organization...
 
Both FreeNAS and FreeBSD have ZFS, and I actually prefer FreeBSD (8.0) over OpenSolaris. I do agree ZFS is not widespread. But for a NAS you don't really need to know the target OS that well; just the things relevant to your NAS. I guess for a home-built NAS made by a Windows user, trying FreeNAS first to get the basics and then moving on to something like FreeBSD would be a learning curve, but it sure would be a rewarding one. ZFS is also very low-maintenance; you never need to rebuild it or do anything other than replace failed disks when that happens.

You're right about Btrfs, and yes, Sun did use the CDDL license to keep ZFS from running natively on Linux, so they had a selling point for OpenSolaris. But the truth is, ZFS *IS* open source. It's just not compatible with the (rather restrictive) GPL license. Linux itself isn't really 'free'; it has lots of restrictions that make it incompatible with almost anything but licenses that basically disclaim everything, like the BSD license. As FreeBSD uses the BSD license, it has no problem porting good technologies like Sun's DTrace and Sun's ZFS.

In fact, projects like Ubuntu are looking at alternatives to the Linux kernel; there is already a project, kFreeBSD, that replaces the Linux kernel in Ubuntu with the FreeBSD kernel. That may allow usage of "native ZFS", though it would still require some work. I agree that a less restrictive license than the CDDL would make ZFS more popular, but apparently that wasn't Sun's direct goal, as they wanted some hot selling points for their OpenSolaris platform.

But to say ZFS is dead... well, ZFS is here now, it has features superior to virtually any other filesystem, it works stably for me, and it surely is a huge leap from something like NTFS or Ext3. If you can and want to use it, I don't see any reason not to.
 
Both FreeNAS and FreeBSD have ZFS, and I actually prefer FreeBSD (8.0) over OpenSolaris. I do agree ZFS is not widespread. But for a NAS you don't really need to know the target OS that well; just the things relevant to your NAS. I guess for a home-built NAS made by a Windows user, trying FreeNAS first to get the basics and then moving on to something like FreeBSD would be a learning curve, but it sure would be a rewarding one. ZFS is also very low-maintenance; you never need to rebuild it or do anything other than replace failed disks when that happens.

You're right about Btrfs, and yes, Sun did use the CDDL license to keep ZFS from running natively on Linux, so they had a selling point for OpenSolaris. But the truth is, ZFS *IS* open source. It's just not compatible with the (rather restrictive) GPL license. Linux itself isn't really 'free'; it has lots of restrictions that make it incompatible with almost anything but licenses that basically disclaim everything, like the BSD license. As FreeBSD uses the BSD license, it has no problem porting good technologies like Sun's DTrace and Sun's ZFS.

In fact, projects like Ubuntu are looking at alternatives to the Linux kernel; there is already a project, kFreeBSD, that replaces the Linux kernel in Ubuntu with the FreeBSD kernel. That may allow usage of "native ZFS", though it would still require some work. I agree that a less restrictive license than the CDDL would make ZFS more popular, but apparently that wasn't Sun's direct goal, as they wanted some hot selling points for their OpenSolaris platform.

But to say ZFS is dead... well, ZFS is here now, it has features superior to virtually any other filesystem, it works stably for me, and it surely is a huge leap from something like NTFS or Ext3. If you can and want to use it, I don't see any reason not to.

Personally, I think ZFS is a better technology than Btrfs (we still need to use LVM with an 'advanced' filesystem?? please...), but Btrfs is part of the core Linux kernel development plan, and ZFS is not. And Oracle is not the kind of place where a lot of redundant technologies get funded.

Good luck!
 
There is a native port of ZFS to Linux in the works too:

http://kqinfotech.wordpress.com/2009/10/23/hello-world/

One of the biggest questions around this effort would be “licensing”. As far as our understanding goes, the CDDL doesn't restrict us from modifying ZFS code and releasing it. However, GPL and CDDL code cannot be mixed, which implies that ZFS cannot be compiled into the Linux kernel, which is GPL. But we believe the way to get around this issue is to build ZFS as a module under the CDDL license; it can still be loaded into the Linux kernel. It would be restricted to using only non-GPL symbols, but as long as that rule is adhered to, there is no legal issue.
 
Yes, but at least it's theoretically possible to develop ZFS for all open-source operating systems with this module trick. Better yet, a group of volunteers could develop ZFS without Sun's or Oracle's help. So ZFS still has a future, I think, and right now it's just the sleekest filesystem you can find for both business and pleasure.
 
"No, with RAID 6 there is enough information to correct a bit flip on one drive, as I already explained. Perhaps the problem is that you do not understand RAID 6. RAID 6 does not use a simple parity checksum, but rather a kind of Reed-Solomon error correcting code with Galois fields math. The computation is complicated, but I think I explained clearly in my previous post how an error could be detected and corrected with RAID 6."

With RAID5 you cannot fix data corruption, no matter what. RAID6 would be able to detect corruption and fix it, as long as it's limited to one drive. However, the big asterisk is that in order for RAID6 to detect/fix corruption, it needs to actively scan for it. With filesystem checksums, whether ZFS or Btrfs, any data read from the array is checked for corruption. With RAID6, a random read is not going to read the entire stripe and verify there is no corruption. Yeah, you could do RAID6 where all reads were verified, but it would just kill performance.

If you really care about data corruption, you need to use either Btrfs or ZFS; that is really the only sure way to protect against silent data corruption. Personally I don't use either; I run RAID5 with XFS. I'm not trying to say everyone needs to use Btrfs/ZFS, but I'm not tricking myself into thinking silent data corruption cannot affect RAID5/6 arrays. It's a real risk, even if small.
 
BTW, ZFS on Linux is just not worth it IMO. If you care enough about your data to use ZFS, why would you then trust it to a barely maintained FUSE plugin? ZFS *IS* awesome, but it's just not going to be an option under Linux unless they GPL it. Btrfs is the only option under Linux IMO.
 