ZFSonLinux 0.6.1 Released

My biggest concern (for using this at work) is that at some point they stop supporting newer kernels and stick with Red Hat or Ubuntu releases. Beyond that, I do not like the restrictions ZFS imposes on expanding RAID arrays, and finally I am not yet sure how bulletproof it is under failure situations. For example, if 3 drives in a RAID 6 get kicked out, can I recover? With mdadm I am fully confident (after extensive testing) that if at least 1 of the 3 kicked-out drives is mostly readable, I will be able to recover.
 
My biggest concern (for using this at work) is that at some point they stop supporting newer kernels and stick with Red Hat or Ubuntu releases.
The biggest concern with ZFS on Linux is whether the implementation is mature enough. On Solaris and BSD platforms it took a long time before all the bugs and issues were ironed out and ZFS became a mature next-generation filesystem. On Linux the ZFS support appears to be improving, but I'm not sure it is anywhere near the maturity of the Solaris and BSD implementations, or whether it ever will be.

Beyond that, I do not like the restrictions ZFS imposes on expanding RAID arrays
Well, I don't like the unsafe expansion that is the only possibility on legacy RAID. Expansion on legacy RAID works by redistributing all the data; if bad sectors turn up during that reshape, all your data can be lost or unrecoverable.

ZFS also has benefits such as allowing expansion with bigger disks, and load balancing between the vdevs by writing more data to vdevs that have more free space and/or are faster. Legacy RAID can't do that.
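For example, a rough sketch of those two expansion paths, assuming a hypothetical pool named 'tank' and placeholder device names:

# Grow the pool by adding a second raidz2 vdev; new writes are then balanced across vdevs
zpool add tank raidz2 sde sdf sdg sdh
# Or grow in place by swapping each disk for a larger one, then letting the vdev expand
zpool set autoexpand=on tank
zpool replace tank sda sdi   # repeat for every disk in the vdev
zpool online -e tank sdi     # -e asks ZFS to use the new, larger capacity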

and finally I am not yet sure how bulletproof it is under failure situations. For example, if 3 drives in a RAID 6 get kicked out, can I recover? With mdadm I am fully confident (after extensive testing) that if at least 1 of the 3 kicked-out drives is mostly readable, I will be able to recover.
Then you don't have any experience with ZFS? Because there is nothing better than ZFS in this regard. You can basically do everything. If you reboot and make sure ZFS sees your disks, you should have access to your data, even if you changed the disk order, changed controllers, swapped some disks onto other controllers, disconnected certain disks and later re-attached them, etc.

In legacy RAID, if one disk 'fails', the metadata of the remaining members is updated to reflect this. When you reboot - even with all disks present - it will have remembered that the array was broken/incomplete at one point in the past, and it will continue to show the affected disk(s) as 'detached', 'faulted', 'non-member' or 'free'. This means the RAID layer itself is extremely fragile and can break very easily: only if everything works perfectly will the RAID work perfectly too. ZFS is much more resilient in this regard.

When it comes to data security, nothing beats ZFS. The legacy solutions are simply outdated, and the alternatives, ReFS and Btrfs, are so immature that ZFS is the only reliable method of data storage at this point in time.
 
In more than 20 years I have used lots of high-end hardware RAID solutions, and I have lost data more than once, suddenly and without any warning.

With ZFS I have lost data too, but that happened only when disk failures exceeded my selected RAID level. For such a disaster you have backups.

Aside from such a disaster, ZFS is miles better than anything else, and with ZoL it is the universal filesystem everywhere except Windows.
 
Then you don't have any experience with ZFS?

I have tested it from time to time over the last 4+ years, but not much RAID usage. Some time in the next week or so I will try to set up a 6-to-10-drive raidz2 array, put some data on it, hot-pull 3 drives, and then see how well I can re-add them.
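Roughly the setup I have in mind, with placeholder pool and device names (a throwaway test pool, not production):

zpool create testpool raidz2 sdb sdc sdd sde sdf sdg
cp -a /some/test/data /testpool/
zpool status testpool   # confirm all members are ONLINE before pulling any drives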
 
The biggest concern with ZFS on Linux is whether the implementation is mature enough. On Solaris and BSD platforms it took a long time before all the bugs and issues were ironed out and ZFS became a mature next-generation filesystem. On Linux the ZFS support appears to be improving, but I'm not sure it is anywhere near the maturity of the Solaris and BSD implementations, or whether it ever will be.

I wouldn't put the open-source alternatives next to Solaris; the same goes for OI and ZoL. The ZFS version on Solaris is higher than any of the others. That said, there's nothing wrong with any of them, and any of them can work in production. You just need to recognize what they can and can't do and plan accordingly.

We already have ZoL in production, largely serving backup duty. Since the backup software encrypts the data anyway, there's no need to worry about LUKS or eCryptfs. So it works.
 
I have tested it from time to time over the last 4+ years, but not much RAID usage. Some time in the next week or so I will try to set up a 6-to-10-drive raidz2 array, put some data on it, hot-pull 3 drives, and then see how well I can re-add them.
I can save you some time: if you pull 3 drives from a double-parity array, it is going to fail.
 
Some time in the next week or so I will try to set up a 6-to-10-drive raidz2 array, put some data on it, hot-pull 3 drives, and then see how well I can re-add them.
You shouldn't need to re-add them. You simply reboot, or in the case of hot-swap this isn't even necessary. If the operating system resides on the pool that goes offline, you would be in trouble, of course.

The important thing is that ZFS just doesn't crap out like other RAID solutions do, kicking disks out and refusing to use them again. That behaviour has a valid reason, by the way: if they did otherwise they could no longer guarantee data integrity, because they might accept a stale disk as a valid member, and they refuse to do that.

ZFS basically lives up to the original promise of RAID. Legacy RAID solutions promised data security, but frankly, all they did was add a fragile, vulnerable layer (a single point of failure) to the storage chain. While they do improve resilience in some cases (ordinary disk failure), they also added a risk of their own that was sometimes a much greater threat. Legacy RAID is old; we now have next-generation filesystems that do not blindly trust a single disk or assume a disk is 'perfect'. We have intelligent filesystems now. All that is missing is letting them mature and reach the general public, and ZFS is the furthest along towards that goal.

I can save you some time: if you pull 3 drives from a double-parity array, it is going to fail.
Not really; the pool will become UNAVAIL. But when you reboot with any one of those disks back, it will be DEGRADED and already accessible and writable.
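A minimal sketch of that recovery path, assuming a hypothetical pool named 'tank' and placeholder device names:

zpool import tank          # after the reboot, if the pool was not imported automatically
zpool status tank          # with one pulled disk back it should report DEGRADED rather than UNAVAIL
zpool online tank sdd sde  # bring the remaining re-attached disks back in
zpool clear tank           # reset the error counters
zpool scrub tank           # verify/repair so everything returns to ONLINE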
 
I can save you some time: if you pull 3 drives from a double-parity array, it is going to fail.

This is a test that I have done easily 100 times with mdadm RAID 6, and from my testing I expect to be able to recover from this type of scenario with the array intact. If I were to move part or all of my 50TB at work to ZFS, it would have to be recoverable after 3 drives are pulled from a live raidz2 array.
 
This is a test that I have done easily 100 times with mdadm RAID 6, and from my testing I expect to be able to recover from this type of scenario with the array intact. If I were to move part or all of my 50TB at work to ZFS, it would have to be recoverable after 3 drives are pulled from a live raidz2 array.

What do you mean by "array intact"?
If 3 disks are lost in a striped two-parity RAID like RAID 6 or ZFS RAID-Z2, your array or pool is lost and not recoverable.

In the case of ZFS, your pool is usable again in a degraded state, without any further action or pool resilver, if you put back at least one of the missing disks. If you really need to survive the complete loss of 3 disks, you must use RAID-Z3.

Otherwise this is a case for a backup, as with all striped RAIDs.
 
What do you mean by "array intact"?
If 3 disks are lost in a striped two-parity RAID like RAID 6 or ZFS RAID-Z2, your array or pool is lost and not recoverable.

In this scenario with mdadm (after a reboot of the server), I can put the 3 disks back and force the array to reassemble, even though they were kicked out and the array was marked degraded, and even if the array was being written to while the hot-pull occurred. In the write case, yes, there will be a little bit of corruption, but nearly all of the array will be readable. I can also put back just 1 or 2 of the disks and then force the array to assemble.
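For reference, the mdadm sequence I am describing is roughly this (md device and member names are placeholders for my setup):

mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sd[b-g]1   # force the kicked-out members back into the array
cat /proc/mdstat                                  # confirm the array came back (clean or degraded)
fsck -n /dev/md0                                  # read-only check for the minor corruption mentioned above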

In the case of ZFS, your pool is usable again in a degraded state, without any further action or pool resilver, if you put back at least one of the missing disks

That is what I am talking about.
 
It seems you mean that you pull out a few disks and then reinsert them, and are asking how well ZFS will handle that situation, where no disk has actually crashed? ZFS handles this scenario fine. You can reinsert the disks in a different order and everything will still work. There are no difficulties if the disks are intact.
 
It seems you mean that you pull out a few disks and then reinsert them, and are asking how well ZFS will handle that situation?

Exactly. I want to pull out more disks than the redundancy supports and then reinsert them (most likely after a reboot), and determine whether, once enough of the pulled disks are available again, the array will assemble with the data that was stored on the filesystem intact.
 
Exactly. I want to pull out more disks than the redundancy supports and then reinsert them (most likely after a reboot), and determine whether, once enough of the pulled disks are available again, the array will assemble with the data that was stored on the filesystem intact.
The thing is, we ZFS users had trouble understanding this question, because it is so odd from a ZFS perspective. The situation you describe is handled without any problems in ZFS; you can reinsert the disks in different slots if you wish and then boot up fine. Do you mean hardware RAID might have problems with this? That is shocking to ZFS users, and it is why we did not understand your question. It is so basic that every RAID solution should handle it. If one cannot handle it without problems, change systems?
 
Exactly. I want to pull out more disks than the redundancy supports and then reinsert them (most likely after a reboot), and determine whether, once enough of the pulled disks are available again, the array will assemble with the data that was stored on the filesystem intact.

Sorry, I was confused. Basically you're doing an irrelevant test, or you're testing to see how reliable crappy SATA disks behind expanders may or may not be.

I was under the impression you wanted to test actual disk-failure scenarios, when instead you really just want to poke and prod.
 
I believe the test is very important, since it has happened to me more than once. What happens if you have a power failure that knocks out half of the disks in an array? In my opinion this scenario should not render your array unusable after the reboot. On fake RAID, or even hardware RAID, in many cases your array will not assemble after a reboot even with the power situation fixed.

you wanted to test actual disk-failure scenarios

That is also part of my test. After making sure that an array will survive the hot-removal failure, it must also survive some minor corruption on the disks. I was planning on taking a few disks, overwriting a few sectors, and seeing how well the RAID handled that.
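On the ZFS side, that corruption test would look something like this (device name and offsets are only examples, and this destroys data on that disk, so test pool only):

dd if=/dev/urandom of=/dev/sdc bs=1M seek=2048 count=8   # clobber a few MB in the middle of one member
zpool scrub testpool                                     # let ZFS detect and repair it from parity
zpool status -v testpool                                 # shows checksum errors and any unrecoverable files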
 
You're testing scenarios that almost never happen in the real world.

ZFS will handle re-adding disks it knows about quite well.

Not sure what will occur if you pull the drives, write some data to them, then add them back; it probably won't like that much. But when would that actually occur in production? Do you have people running around randomly pulling drives and sticking them in other servers?
 
You're testing scenarios that almost never happen in the real world.

I must be unlucky then...

But when would that actually occur in production?

The test was trying to simulate the power-failure scenario plus drives with unreadable sectors. I do not have any faulty disks to test with, so the best I can do at the moment is force bad data. I know it's different, because the drive will not report the sectors as bad. The warranty on my 2008-class Seagate 7200.X drives is almost up, though, so I should have a few unreliable drives in a few months (given that I RMA about one of these per month). I expect ZFS to handle this well.
 
I dunno about unlucky, but I don't think I agree this is a reasonable requirement :)
 
You have never had a power supply failure that only affected some of your drives, or a 5-in-3 drive cage that lost power (loose SATA power connector or some other reason) while the rest of your drives stayed powered?
 
I have had awesome luck with my ZFSonLinux setup. Been running well for over a year now, and I update it pretty frequently. I am on Ubuntu using the stable PPA.

I am currently on the PPA version 0.6.0.98, and will go ahead and upgrade it to 0.6.1 when I get home after work today. I am not a big fan of upgrading ZFS remotely ;)

In the near future I will be tearing my array apart and rebuilding it from scratch, as my disk alignment is off (I am using 4K AF drives) and my performance is TERRIBLE. I will be testing some failure scenarios before I tear this current implementation apart. I expect it all to work very well :)
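When I rebuild, the alignment fix should come down to forcing 4K sectors at pool creation; a sketch with placeholder names (ashift=12 means 2^12 = 4096-byte sectors):

zpool create -o ashift=12 tank raidz2 sdb sdc sdd sde sdf sdg
zdb -C tank | grep ashift   # verify the vdev was created with ashift: 12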
 
You have never had a power supply failure that only affected some of your drives, or a 5-in-3 drive cage that lost power (loose SATA power connector or some other reason) while the rest of your drives stayed powered?

That seems like a common failure scenario to me. I'm not sure why some people are claiming it is not.
 
What are the pros/cons of Solaris-based ZFS vs Linux-based ZFS?

I love Solaris. All production systems at work are Solaris 10 SPARC systems. At home, I run OpenIndiana for ZFS. However, so many sysadmin tasks would be easier with Linux. I kind of want to migrate.
 
What are the pros/cons of Solaris-based ZFS vs Linux-based ZFS?

I love Solaris. All production systems at work are Solaris 10 SPARC systems. At home, I run OpenIndiana for ZFS. However, so many sysadmin tasks would be easier with Linux. I kind of want to migrate.

Maturity and (arguably) stability. Obviously ZFS is developed for Solaris-based systems, and it needs to be ported over to Linux. To my knowledge, these are the major reasons given.
 
What are the pros/cons of Solaris-based ZFS vs Linux-based ZFS?

I love Solaris. All production systems at work are Solaris 10 SPARC systems. At home, I run OpenIndiana for ZFS. However, so many sysadmin tasks would be easier with Linux. I kind of want to migrate.

None, if we aren't including the commercial version of Solaris. All of the open-source versions, Linux-based or not, have the same ZFS version, give or take. Therefore they all lack the same features found only in the commercial version of Solaris.

In terms of stability I wouldn't put any of the other open-source versions above or below ZoL. For me, ZoL seems the best way to go for non-enterprise use, because with it you get all of the packages that come with whatever distro you are running it on.
 
You have never had a power supply failure that only affected some of your drives, or a 5-in-3 drive cage that lost power (loose SATA power connector or some other reason) while the rest of your drives stayed powered?

Or a backplane fails, or one of your controllers dies, or a cable fails, or a controller port dies. I'd say temporarily losing access to 4+ drives is one of the more common failure scenarios.
 
You have never had a power supply failure that only affected some of your drives, or a 5-in-3 drive cage that lost power (loose SATA power connector or some other reason) while the rest of your drives stayed powered?
I never thought of this scenario. But it seems realistic and reasonable. Thanks for asking this question!

I would be very interested in knowing your findings with ZFS. Please test and enlighten us! (I expect ZFS to handle this without problems.)
 
It may take longer than expected since other tasks at work have to come before this. I will see what I can do.
 
Does anyone know if ZoL will import a GPT-partitioned pool created on a recent version of FreeBSD (PC-BSD 9.0 in this case)? I know the various Solaris-based implementations don't like GPT partitions.
 
Metaluna, Solaris should handle GPT partitions fine if you change the partition type to Solaris /usr. This can be done with ZFSguru via the web interface, or manually in Linux for ZFS on Linux. I would expect Linux to handle GPT-partitioned disks better than Solaris, but I have never tried this or heard of anyone trying. Please give us feedback if you intend to perform this kind of test on ZoL.
 
sub.mesa, I went ahead and tried ZoL on a new Debian 7 installation. zpool detected and imported my BSD raidz3 pool flawlessly. I'm doing a scrub right now just to give it a little exercise. So far, so good.
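For anyone else trying this, the steps were essentially the following (pool name is a placeholder for mine):

zpool import -d /dev/disk/by-id           # list importable pools found on the attached disks
zpool import -d /dev/disk/by-id mypool    # import the FreeBSD-created pool
zpool scrub mypool
zpool status mypool                       # watch scrub progress and any errors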
 
I'm using ZoL for small-office file service on Ubuntu 12.04; here is my experience so far:

1. ZFS ACLs have not been ported yet. Only standard Unix permissions (666, 777, ...).
2. If the file server is running as an ESXi VM, under low-memory conditions the VMware balloon driver will try to reclaim memory => the VM freezes (100% CPU, can't log in to a shell) => must reset :(

Other features such as RAID-Z and recovery are the same as on the OpenSolaris distros.
 
You have never had a power supply failure that only affected some of your drives, or a 5-in-3 drive cage that lost power (loose SATA power connector or some other reason) while the rest of your drives stayed powered?

Exactly this happened to me a few years ago. And even worse, it was an intermittent failure that randomly power-cycled my disks AND my RAID cards and onboard SATA. It took me a week to figure out what was going on. ZFS handled it gracefully, and when I finally replaced the PSU, ZFS automatically performed a scrub and fixed the massive corruption on each of my 12 disks with no difficulty. NO DATA WAS LOST!
 
Exactly this happened to me a few years ago. And even worse, it was an intermittent failure that randomly power-cycled my disks AND my RAID cards and onboard SATA. It took me a week to figure out what was going on. ZFS handled it gracefully, and when I finally replaced the PSU, ZFS automatically performed a scrub and fixed the massive corruption on each of my 12 disks with no difficulty. NO DATA WAS LOST!
So you have experienced this scenario both with hardware RAID and with ZFS? Did you lose data with hardware RAID?
 
1. ZFS ACLs have not been ported yet. Only standard Unix permissions (666, 777, ...).
I wouldn't expect this anytime soon. Linux itself doesn't have NFSv4/Windows-style ACLs, only POSIX ACLs, so it's going to be a while (if ever) for this feature.

2. If the file server is running as an ESXi VM, under low-memory conditions the VMware balloon driver will try to reclaim memory => the VM freezes (100% CPU, can't log in to a shell) => must reset :(
This problem is actually shared by all of the other open-source alternatives; ZFS under memory pressure will cause this scenario. However, the fix is setting an ARC limit and capping resources on the VM host.
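A minimal sketch of that ARC limit on ZoL (the 4 GiB value is just an example; size it to the guest's RAM):

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=4294967296
# reload the zfs module or reboot, then confirm with:
cat /sys/module/zfs/parameters/zfs_arc_max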
 
"However, the fix is setting an ARC limit and limiting resources on the VM host. "

+1
 
So you have experienced this scenario both with hardware RAID and with ZFS? Did you lose data with hardware RAID?

The problem with HW RAID... how do you know? If I checksummed each data block into a PostgreSQL server, then I could maybe find some errors. But doing that is way too complex and extremely difficult to keep in sync (if it is possible at all). Not to mention the lack of self-healing. That is why I use ZFS for critical data.
 
I'm using ZoL for small-office file service on Ubuntu 12.04; here is my experience so far:

1. ZFS ACLs have not been ported yet. Only standard Unix permissions (666, 777, ...).

This is basically not a ZFS feature but a feature of Solaris and the Solaris CIFS server.
You should not expect this functionality in BSD or Linux with Samba.

The Solaris CIFS server is the only Unix-like CIFS server that fully supports Windows SIDs and Windows-compatible ACLs. (Solaris CIFS stores real Windows SIDs as references for the ACLs.)

Question:
Is ZoL capable of booting from ZFS and from a ZFS mirror (allowing bootable snapshots of former versions)?
 
One more reason to use ZoL under ESXi, instead of the OpenSolaris clones, is vmxnet3.

I bought some cheap Mellanox 10GbE SFP+ cards, an IBM BNT RackSwitch on eBay, and some SFP+ Cu cables. My file server could pump ~1100 MB/s (SSD pool, of course) to other Linux/Windows machines.

Reaching 1100 MB/s on Linux (CentOS, Ubuntu) is so easy: install VMware Tools, set the MTU to 9000, done!
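On Ubuntu that MTU change is just something like this (interface name is an example; the switch ports need jumbo frames enabled too):

ip link set dev eth0 mtu 9000
# or make it persistent by adding 'mtu 9000' to the iface stanza in /etc/network/interfaces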

vmxnet3 on the BSDs: recompile from source, tinker with TCP parameters, MTU, ...

vmxnet3 on OpenSolaris: can only run with MTU 1500, max speed around 160 MB/s. e1000g could reach ~300 MB/s with MTU 9000, TCP window size tuning, ...

However, it seems that the ESXi side gets along better with the Solaris network stack: my ESXi server has better storage network performance with OpenIndiana NFS than with Ubuntu. :-|
 