Recover RAID 5 with two dropped disks?

unhappy_mage

[H]ard|DCer of the Month - October 2005
Joined
Jun 29, 2004
Messages
11,455
Hi guys,
My RAID 5 array just dropped two disks due to a controller problem that has since been resolved. There's nothing on it that isn't backed up or re-downloadable, but restoring it all would be a pain in the butt, so I'd rather get it back up in one piece, or some semblance thereof. The data is still there, but the superblock is telling it to drop the drives, so they stay dropped. If I delete the array and rebuild it, would that work? evmsn segfaults when I try to do that through it; is mdadm the tool for this job?

Any help appreciated, and TIA.
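For anyone else hitting this, a rough sketch of the mdadm route; the device names, member count, and chunk size here are guesses, not the OP's actual layout:

Code:
# First try a forced assemble, which will pull back members whose
# metadata merely looks out of date:
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1

# If the superblocks are too far gone, re-create the array over the same
# members, in the same order, with the same chunk size (64k was the old
# default); --assume-clean skips the initial resync so the data on disk
# isn't touched:
mdadm --create /dev/md0 --level=5 --raid-devices=3 --chunk=64 \
    --assume-clean /dev/sdb1 /dev/sdc1 /dev/sdd1

Either way it only works if the device order and array parameters match the original exactly.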

 
I realize that. What I'm saying is I haven't lost the data, just the metadata: the header on the disk that says "this is part of a RAID set, and this is what the config looks like" is missing.

 
I was about to say: "just wait for U_M"... but then I saw who the OP was.
If the drives are only dropped and not "broken", then there should be a way to re-add them, shouldn't there? I remember that guy from the storage forum whose PC hosed itself during an ORLM process, and he was able to recover the stuff.

I doubt that there is an "out of the box" solution for this though. You'd probably have to write the stuff yourself, or talk to the authors of EVMS.
 
drizzt81 said:
If the drives are only dropped and not "broken", then there should be a way to re-add them, shouldn't there? I remember that guy from the storage forum whose PC hosed itself during an ORLM process, and he was able to recover the stuff.
Yeah, I could indubitably get the data off, but I'm not so much concerned with getting it off as re-finding it in place. I only have ~300GB of space to work with for a 600GB array, which would make it a bit messy, and copying all the data off the array in order to copy it back on seems silly.

I've figured out mdadm's syntax enough that I'm going for the gusto and building arrays (with rebuild speed set to 0!) and xfs_repair'ing them. We'll see how that goes, I guess.
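Roughly what that looks like, assuming /dev/md0 and three members as an example; the speed_limit files are the standard md sysctl knobs:

Code:
# Throttle resync so a wrong guess can't start rewriting parity:
echo 0 > /proc/sys/dev/raid/speed_limit_max
echo 0 > /proc/sys/dev/raid/speed_limit_min

# Build a candidate array without an initial sync, then check it read-only:
mdadm --create /dev/md0 --level=5 --raid-devices=3 --assume-clean \
    /dev/sdb1 /dev/sdc1 /dev/sdd1
xfs_repair -n /dev/md0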

 
Bleh, I seem to have screwed it up. Time to admit defeat and nuke it, I think.

For future reference, xfs_repair is *not* a read-only operation unless one specifies -n on the command line. :eek:
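In other words (with /dev/md0 just as an example):

Code:
xfs_repair -n /dev/md0    # no-modify mode: reports problems, writes nothing
xfs_repair /dev/md0       # rewrites filesystem metadata as it goes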

 
Had this happen to me with an Adaptec controller running RAID 5 with some Seagate Cheetahs.

Now, this was only to replace the metadata.

I went into the RAID controller setup BIOS.

Deleted the existing array... yes, it was a scary moment.

Went to create a new array.

Added all available disks (they should all be there for this, and the drives should all be on their original connections, showing up as the proper drive #)

When it asked me to set up the parameters of the array, I left it all the same, but if you had changed the block size or any other setting, they will have to be identical.

Now here is where the magic happened (and this might not be an available option depending on your controller): I was given the choice of creating the array, but not erasing/formatting the disks. If you don't have this option, you are screwed.

Reboot, and pray to the Almighty.

For me it worked, and my client stopped banging his head against the door jamb (true story; the only time in my life I have ever seen someone so distraught that they actually banged their head against the wall).

If this was done in software RAID, then I have no freakin' clue...
 
It was done in software, but there were problems: I had taken the disks out to check SMART status (my controller doesn't support pass-through :() and hadn't made a note of which order they were in, so I had 6 possible orderings of disks to make an array. So I tried creating every possibility, one at a time, and running xfs_repair on them. Unfortunately, xfs_repair searched long enough on the fifth combination that it found something that looked like a superblock (but probably wasn't) and borked the filesystem.
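The brute-force part might have looked something like this (device names made up); the crucial bit is the -n on xfs_repair, which is exactly what went wrong above:

Code:
#!/bin/sh
# Try every ordering of the three members. --assume-clean stops a resync
# from rewriting parity, --run suppresses the confirmation prompt when old
# metadata is found, and xfs_repair -n only reports, never writes.
for order in "sdb1 sdc1 sdd1" "sdb1 sdd1 sdc1" "sdc1 sdb1 sdd1" \
             "sdc1 sdd1 sdb1" "sdd1 sdb1 sdc1" "sdd1 sdc1 sdb1"
do
    set -- $order
    mdadm --stop /dev/md0 2>/dev/null
    mdadm --create --run /dev/md0 --level=5 --raid-devices=3 \
        --assume-clean /dev/$1 /dev/$2 /dev/$3
    echo "=== order: $order ==="
    xfs_repair -n /dev/md0 && echo "order $order looks plausible"
done
mdadm --stop /dev/md0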

 
unhappy_mage said:
It was done in software, but there were problems: I had taken the disks out to check SMART status (my controller doesn't support pass-through :() and hadn't made a note of which order they were in, so I had 6 possible orderings of disks to make an array. So I tried creating every possibility, one at a time, and running xfs_repair on them. Unfortunately, xfs_repair searched long enough on the fifth combination that it found something that looked like a superblock (but probably wasn't) and borked the filesystem.


So from now on you will only be taking one drive out at a time so you don't forget, right? :p Sometimes lessons need to be learned... the hard way :(
 
It wouldn't have helped - LSR is supposed to keep track of which disk goes where, even if you plug them in in a different order, so when I initially built the array I gave that a try. Good news is, it works; bad news is, it hurts your debugging in the future ><

In any case, I'm not real worried. I'll probably format it this evening and restore what I have from backups.
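For reference, that ordering lives in the per-disk md superblock, and it can be read back directly (any member device will do, /dev/sdb1 here just as an example):

Code:
# Prints the array UUID, this device's slot in the array, the chunk size,
# and the event counter from the md superblock:
mdadm --examine /dev/sdb1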

 
unhappy_mage said:
It wouldn't have helped - LSR is supposed to keep track of which disk goes where, even if you plug them in in a different order, so when I initially built the array I gave that a try. Good news is, it works; bad news is, it hurts your debugging in the future ><

In any case, I'm not real worried. I'll probably format it this evening and restore what I have from backups.


Ouch, that sucks, man. Looks like I learned something new today :) Hopefully everything goes well when you rebuild your server.
 
Well, evms is being uncooperative, so I'm blanking the disks first. A little late to cry over spilt milk, methinks :p

On a slightly different note, dd seems to be taking an inordinate amount of CPU time, like ~25% of one CPU per process. I wasn't expecting that, but I guess with 512-byte blocks it's par for the course.
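A bigger block size cuts that overhead way down; something like the following (output device is just an example, and this does wipe it):

Code:
# bs=512 (the default) means millions of tiny read/write calls; bs=1M does
# the same wipe with a fraction of the syscall overhead:
dd if=/dev/zero of=/dev/sdb bs=1M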

 