ZFS: bringing a disk online in an unavailable pool

indotoonster · Dec 22, 2013

I have a home server running FreeBSD and ZFS that has worked well for the last 5 years, and on several occasions I have successfully replaced faulty disks.

However, today a minor disaster has happened, and I'm hoping to find a solution.

I have a top-level pool that consists of 3 vdevs, each of which is raidz1, so up to 3 disks can fail -- assuming that they all belong to different vdevs -- and the data remains intact.
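The failure tolerance described above can be sketched as a small model. This is only an illustration, not anything ZFS itself runs: the 4-disks-per-vdev layout, the vdev names, and every disk label other than gpt/ta2 and gpt/ta4 are assumptions made up for the example.

```python
# Number of member disks each vdev type can lose before it becomes UNAVAIL.
PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def pool_survives(vdevs, failed):
    """Return True if no vdev has lost more disks than its parity allows.

    vdevs:  dict mapping a vdev name to (vdev_type, set_of_disk_labels)
    failed: set of disk labels that are dead or offline
    """
    for vdev_type, disks in vdevs.values():
        if len(disks & failed) > PARITY[vdev_type]:
            return False  # one lost vdev takes the whole pool with it
    return True


# Hypothetical layout: 3 raidz1 vdevs of 4 disks each (the real disk count
# per vdev is not stated in the thread).
tank = {
    "raidz1-0": ("raidz1", {"gpt/ta1", "gpt/ta2", "gpt/ta3", "gpt/ta4"}),
    "raidz1-1": ("raidz1", {"gpt/tb1", "gpt/tb2", "gpt/tb3", "gpt/tb4"}),
    "raidz1-2": ("raidz1", {"gpt/tc1", "gpt/tc2", "gpt/tc3", "gpt/tc4"}),
}

# One failure in each vdev: the pool stays up.
print(pool_survives(tank, {"gpt/ta1", "gpt/tb1", "gpt/tc1"}))  # True
# Two losses in the same vdev (the situation in this thread): the pool is gone.
print(pool_survives(tank, {"gpt/ta2", "gpt/ta4"}))  # False
```

Note that an OFFLINE disk counts as a loss here exactly like a dead one, which is why offlining gpt/ta4 left the vdev with no margin.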

Yesterday, I noticed quite a few errors being reported by 1 disk in 1 vdev. From past experience, this usually indicates that the disk is about to fail, so I did what I usually do:

1. Offline the disk: zpool offline tank gpt/ta4

2. Physically replace the disk.

3. Set up the new disk with gpart, and then call zpool replace tank gpt/ta4

However, this time between steps 2 and 3 disaster struck: when I powered on the server after installing the new drive, I smelled something burning, and my HBA indicated that 4 of the drives weren't available! By a stroke of unbelievably bad luck, there must have been some voltage surge, because another drive in the same vdev (gpt/ta2) is now completely dead, and visual inspection reveals one of the MOSFETs on the PCB is blown.

So now gpt/ta2 is UNAVAIL and gpt/ta4 is OFFLINE, so obviously the vdev, which is raidz1, is also UNAVAIL.

My questions are:

1) Is there a way to bring gpt/ta4 back online? When I try to issue "zpool online tank gpt/ta4", it tells me that the pool is unavailable, so I can't do so. I can understand why this is so, but I was thinking that gpt/ta4, although experiencing some read errors, was basically still a 'good' member of the raidz1 vdev prior to taking it offline (zpool status reported there were no known data errors), and immediately upon doing so I shut the server off to replace the drive. Is there any way to achieve this?

2) Failing that, is there a way to at least bring the remainder of my top-level pool (which consists of 3 raidz1 vdevs) online? The other 2 vdevs are perfectly fine.

Please help, I have a lot of precious data on this pool.

Thanks in advance.

Cheers,
Ruli

Shockey · Dec 22, 2013

Have you tried putting the old disk gpt/ta4 back in? I'm a bit confused by your post.

Also, I take it there's no hot spare in the pool?

Nex7 · Dec 22, 2013

New OS install with latest ZFS bits - try to follow instructions in Serverfault reply.

Also, to answer your second question: no, absolutely no way. If you cannot 'fool' the system into re-importing the pool with the old drive, you're not getting it back through any non-Herculean methods. And that is likely only achievable with a minor rollback (if it will let you get away with it). You offlined the disk and then shut down, which means /some/ time passed between those two actions, so the txg_id on the old gpt/ta4 is likely older than on the rest of the drives. It can't just resilver the disk, since you've permanently lost gpt/ta2, so you need the pool to import at the txg_id of the older disk and just forget (data loss) everything that happened after you took that drive offline. This is only going to be achievable if:

a) the gpt/ta4 disk behaves well enough during all this
b) the other disks still have a consistent on-disk txg_id that lines up with a consistent txg_id on the gpt/ta4 disk (/this/ is actually likely /if/ the gpt/ta4 disk has no read errors on the consistent txg_id's, because not much time passed after it was offlined and you shut down)
c) the fates are with you
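Condition (b) can be pictured with a toy model: treat each surviving disk as holding a short history of consistent txg ids, and look for the newest one they all share. This is a deliberately simplified sketch, not how ZFS stores or selects its on-disk state; the txg numbers and the labels gpt/ta1 and gpt/ta3 are invented for the example.

```python
def newest_common_txg(txgs_by_disk):
    """Return the newest txg id present on every disk, or None if there is none.

    txgs_by_disk: dict mapping a disk label to an iterable of txg ids that
                  disk still holds in a consistent state.
    """
    if not txgs_by_disk:
        return None
    common = set.intersection(*(set(t) for t in txgs_by_disk.values()))
    return max(common) if common else None


# Hypothetical histories: gpt/ta4 was offlined slightly before shutdown,
# so it stops two transaction groups earlier than its neighbours.
surviving = {
    "gpt/ta1": [1001, 1002, 1003, 1004, 1005],
    "gpt/ta3": [1001, 1002, 1003, 1004, 1005],
    "gpt/ta4": [1001, 1002, 1003],
}

# Importing at this txg would discard whatever was written in 1004 and 1005.
print(newest_common_txg(surviving))  # 1003
```

If the histories do not overlap at all the function returns None, which corresponds to the case where no rollback can line the disks up.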

indotoonster · Dec 22, 2013

@Shockey: sorry, I should have been clearer. Yes, before trying to bring gpt/ta4 back online, I physically reconnected it to the server. It shows up fine, and gpart recognizes it as still having label ta4; it is only when I try to bring it online that the error message above occurs.

@Nex7: once again, thank you for your detailed help, both here and on Serverfault (sorry for posting the same question on various forums, I know it's not very good netiquette, but I was in a real panic). Given the 3 preconditions you mention above, the outlook doesn't look very promising, but I'll see what can be done.

DPI · Dec 23, 2013

This is why I stay far away from zfs. Hope you have backup because ZFS is not a replacement for a good strong backup.

indotoonster · Dec 23, 2013

Well, the fates must definitely be with me -- I bit the bullet and opted for a different route: swapping the faulty PCB on the HDD with the blown MOSFET (gpt/ta2) for the working PCB from the HDD with the read errors (gpt/ta4), i.e. combining the best bits from both faulty devices. They are identical devices with the same model and PCB code. Reading up on PCB swapping from sites such as donordrives.com and hddguru.com helped a lot, but it was still a huge gamble. Lo and behold -- it worked! The array is resilvering with a new replacement drive as I type.

Since many of you seem to think ZFS (in particular raidz1) is not a good set-up, do you have any recommendations for what I should try out as an alternative?

Many thanks again for all your comments!

danswartz · Dec 23, 2013

Um, yeah. He removed a disk from the pool, and while it was out, suffered a catastrophic failure of another drive, and now it won't let him enable the offlined drive. And this is a reason to 'stay far away from zfs'? I've used linux logical volumes and had more than one cluster-f*ck where the entire set was rendered unusable because of some bug or other. Or take a hardware raid where a drive fails, you replace it with a new one, and while it is rebuilding the array, a drive's PCB catches fire. As far as I know, you're SOL there too. Also, nice strawman with the comment about 'zfs not being a backup'. Troll much?

danswartz · Dec 23, 2013

OP, who are the 'many of you' apart from DPI? ZFS is just fine, as long as you have backups too (but this is true of any raid/pool/volume setup...) My only criticism is single-parity raid-z. Depending on how many drives you have, raidz2 would be better...

indotoonster · Dec 23, 2013

Hi danswartz, sorry, you're right, I was actually aggregating some opinions from across several forums. Indeed on this thread it is just DPI's comment above. I'm sure when I think about this properly in the cold light of day and having had a good night's sleep, I will conclude that indeed ZFS is fine -- I'm actually counting my lucky stars I've managed to recover my data from this series of very unfortunate events. You're right on the issue of raidz1, though, this episode has really underlined how vulnerable it can be. However, first order of business will be setting up a proper backup policy for the most critical data.

danswartz · Dec 23, 2013

The problem is that not all opinions are created equal. And no, I'm not blowing my own horn here. The main problem ZFS has is that there is a tremendous amount of misinformation floating around out there, usually because one of those 'everyone knows X' posts on a blog somewhere gets repeated until it becomes 'reality'. If you read something by someone like Richard Elling or nex7 or certain other folks, you can take that to the bank. Anything else, especially if it is anecdotal or came off a random blog somewhere... take with a grain (or possibly a shaker) of salt.

djflow195 · Dec 23, 2013

Any reason for 3 striped RAIDz1 vdevs versus 1 RAIDz3 vdev?

RAIDz3 would be slower and your IO would be capped with only one vdev but you could suffer three dead drives and still not lose data. Right now if you lose two drives in one RAIDz1 vdev you lose all your data.
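The trade-off above can be put into numbers by enumerating every possible set of failed disks. The sketch below assumes 12 disks in total (3 raidz1 vdevs of 4 disks versus one 12-disk raidz3 vdev) and treats all failure sets as equally likely; neither assumption comes from the thread, and a power surge that kills neighbouring drives is obviously not an independent failure.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def survival_fraction(vdevs, n_failed):
    """Fraction of equally likely n_failed-disk failure sets the pool survives.

    vdevs: list of (parity, disk_count) tuples, one per vdev.
    """
    # Tag every disk with the index of the vdev it belongs to.
    owner = [i for i, (_parity, count) in enumerate(vdevs)
             for _slot in range(count)]
    survived = total = 0
    for failed in combinations(range(len(owner)), n_failed):
        total += 1
        losses = Counter(owner[d] for d in failed)
        # The pool survives only if every vdev stays within its parity.
        if all(losses[i] <= parity for i, (parity, _count) in enumerate(vdevs)):
            survived += 1
    return Fraction(survived, total)


striped_z1 = [(1, 4), (1, 4), (1, 4)]  # assumed: 3 raidz1 vdevs, 4 disks each
single_z3 = [(3, 12)]                  # assumed: one 12-disk raidz3 vdev

print(survival_fraction(striped_z1, 2))  # 8/11
print(survival_fraction(striped_z1, 3))  # 16/55
print(survival_fraction(single_z3, 3))   # 1
```

Under these assumptions both layouts give 9 disks' worth of usable space, so the difference is purely redundancy versus the IO of three vdevs.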

You need backups if you have "precious data". No single server is fail-proof.

PinkyThePig · Dec 23, 2013

indotoonster said:
Hi danswartz, sorry, you're right, I was actually aggregating some opinions from across several forums. Indeed on this thread it is just DPI's comment above. I'm sure when I think about this properly in the cold light of day and having had a good night's sleep, I will conclude that indeed ZFS is fine -- I'm actually counting my lucky stars I've managed to recover my data from this series of very unfortunate events. You're right on the issue of raidz1, though, this episode has really underlined how vulnerable it can be. However, first order of business will be setting up a proper backup policy for the most critical data.

Wow, congratulations on the rebuild! I wasn't holding out much hope for you with so many drives failing. If you want a more secure setup there are three routes you could take. One would be using a higher raidz level (2 or 3). Two would be having each vdev in its own pool with some sort of overlying tech providing simple drive pooling (where it decides how to distribute data to the pools). Three would be setting up a secondary pool and doing backups to the second pool (on another computer).

Breakdown of each:
1. Better redundancy but if same thing happens you could still potentially be in a bad spot (i.e. 3-4 drives fail)
2. This would cost the same as your current setup but data loss would be limited to the unrecoverable pool. So you would lose a third of your data instead of all of it. (Also, I'm not aware of any tech offhand, never personally looked into it, but the concept should work).
3. This is the most expensive by far but is going to be the safest, as you could just move data back over from the good pool. You could stick the pool into your normal desktop and have the NAS wake it via WoL (Wake-on-LAN) or similar to back up to it at night.

I would recommend getting set up with backblaze or crashplan. Both are cheap ($5ish a month) and would provide quite a bit of peace of mind. A nice advantage of crashplan is that if you have a storage hoarder friend you can back up to their computer and they can back up to yours, free of charge.

My preferred configurations for pool building are 6-disk z2 and 11-disk z3 vdevs, as that stacks the odds much higher in your favor. Then use some sort of offsite backup, whether that is tapes, an online service, or simply a second computer you have set up at a friend's place.

Also, if most of your data is replaceable (e.g. media gotten from online), you could just back up the unique data and then back up a text file listing your files to a cloud provider. Back when I was Windows-only I had a 2TB drive of random junk and I would run "tree H: >> backup.txt" to keep a listing of files on the drive (I may have gotten the syntax wrong on that command, it's been a while). This is what I do: a few hundred gigs is unique data and is backed up to my other computer, and I just keep a list of my media so that if I lose it all I just spend some (significant) time redownloading.

dandragonrage · Dec 23, 2013

DPI said:
This is why I stay far away from zfs. Hope you have backup because ZFS is not a replacement for a good strong backup.

You stay away from ZFS because RAID-Z can't handle 2 bad drives when no other RAID-5-based solution can, either?

This shows that backups have their use, yes, but it doesn't illustrate any issues with ZFS.

Shockey · Dec 23, 2013

I'm guessing you have no hot spares to resilver to when you see a failed drive? One could have saved you some headache and stress.

indotoonster · Dec 23, 2013

@djflow195: it's purely an issue of chronology: I originally started out with 1 raidz1 vdev, and then subsequently added another, and another. If I had to redo it all now I'd probably go with raidz3 -- is there a way to migrate from one to the other that doesn't require having to recreate the pool with the same number of drives?

@PinkyThePig: thanks so much for the tip about backblaze and crashplan -- I had no idea they were $5/mo. for *unlimited* storage! I will definitely have a look into those...

@Shockey: correct, I had no hotspares.

danswartz · Dec 24, 2013

No, you can't change existing vdevs. You need to migrate the data off somewhere, then destroy and recreate the pool.

omniscence · Dec 24, 2013

This is a perfect example of why RAID/ZFS alone cannot protect your data. An electrical problem like this could easily have fried multiple drives; you were very lucky it was only one. In the worst case you have to restore from backups (which you probably don't have, judging from your statements).
