Recover raid5 with two dropped disks?

Discussion in 'Linux/BSD/Free Systems' started by unhappy_mage, Jul 1, 2006.

  1. unhappy_mage

    unhappy_mage [H]ard|DCer of the Month - October 2005

    Messages:
    11,455
    Joined:
    Jun 29, 2004
    Hi guys,
    My RAID 5 array just dropped two disks due to a controller problem that has since been resolved. There's nothing on it that isn't backed up or re-downloadable, but re-fetching it all would be a pain in the butt, so I'd rather get it up in one piece, or some semblance thereof. The data is still there, but the superblock is telling md to drop the drives, so they stay dropped. If I delete the array and rebuild it, would that work? evmsn segfaults when I try to do that through it; is mdadm the tool for this job?

    Any help appreciated, and TIA.
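
For an array whose members are only marked failed (not physically dead), a forced assemble is usually the first thing to try before deleting anything. A rough sketch, assuming the array is /dev/md0 with three members; the array name and device names are placeholders, not the OP's real config:

```shell
# Stop the half-assembled array first (names are examples)
mdadm --stop /dev/md0

# --force tells mdadm to ignore the stale event counts on the
# dropped members and assemble the array anyway
mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
```

If this works, the on-disk data is untouched; re-creating the array from scratch is the riskier fallback.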

     
  2. Child of Wonder

    Child of Wonder 2[H]4U

    Messages:
    3,269
    Joined:
    May 22, 2006
    RAID 5's fault tolerance only extends to one drive.

    If you lose the data on two drives, you're done.
     
  3. unhappy_mage

    unhappy_mage [H]ard|DCer of the Month - October 2005

    Messages:
    11,455
    Joined:
    Jun 29, 2004
    I realize that. What I'm saying is that I haven't lost the data, just the metadata: the header on each disk that says "this is part of a RAID set, and this is what the config is like" is missing.
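
Whatever metadata does survive can be inspected per member; a quick sketch (the device name is a placeholder):

```shell
# Dump the md superblock, if any, from one member of the array;
# repeat for each disk and compare UUIDs, event counts, and roles
mdadm --examine /dev/sda1
```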

     
  4. drizzt81

    drizzt81 [H]ardForum Junkie

    Messages:
    12,361
    Joined:
    Jan 21, 2004
    I was about to say "just wait for U_M"... but then I saw who the OP was.
    If the drives are only dropped and not actually broken, then there should be a way to re-add them, shouldn't there? I remember that guy from the storage forum whose PC hosed itself during an ORLM process, and he was able to recover the stuff.

    I doubt that there is an "out of the box" solution for this, though. You'd probably have to write the stuff yourself, or talk to the authors of EVMS.
     
  5. unhappy_mage

    unhappy_mage [H]ard|DCer of the Month - October 2005

    Messages:
    11,455
    Joined:
    Jun 29, 2004
    Yeah, I could indubitably get the data off, but I'm not so much concerned with getting it off as with re-finding it in place. I only have ~300GB of space to work with for a 600GB array, which would make it a bit messy, and copying all the data off the array just to copy it back on seems silly.

    I've figured out mdadm's syntax well enough that I'm going for the gusto: building arrays (with the rebuild speed set to 0!) and running xfs_repair on them. We'll see how that goes, I guess.
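
That plan might look something like the following sketch. The array name, member devices, and chunk size here are assumptions; when re-creating in place, the level, device count, chunk size, and device order must all match the original array exactly:

```shell
# Throttle md resync to zero so a wrong guess doesn't start
# rewriting parity before the filesystem can be checked
echo 0 > /proc/sys/dev/raid/speed_limit_min
echo 0 > /proc/sys/dev/raid/speed_limit_max

# Re-create the array in place; --assume-clean skips the initial sync.
# Values below are examples, not the OP's real config
mdadm --create /dev/md0 --level=5 --raid-devices=3 --chunk=64 \
      --assume-clean /dev/sda1 /dev/sdb1 /dev/sdc1

# Check the result read-only; without -n, xfs_repair writes to the device
xfs_repair -n /dev/md0
```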

     
  6. drizzt81

    drizzt81 [H]ardForum Junkie

    Messages:
    12,361
    Joined:
    Jan 21, 2004
    *crosses fingers for you*
     
  7. unhappy_mage

    unhappy_mage [H]ard|DCer of the Month - October 2005

    Messages:
    11,455
    Joined:
    Jun 29, 2004
    Bleh, I seem to have screwed it up. Time to admit defeat and nuke it, I think.

    For future reference: xfs_repair is *not* a read-only operation unless one specifies -n on the command line. :eek:

     
  8. hardwarephreak

    hardwarephreak [H]ard|Gawd

    Messages:
    1,283
    Joined:
    Jul 13, 2002
    Had this happen to me with an Adaptec controller running RAID 5 on some Seagate Cheetahs.

    Now, this was only to replace the metadata.

    I went into the RAID controller's setup BIOS.

    Deleted the existing array... yes, it was a scary moment.

    Went to create a new array.

    Added all available disks (they should all be present for this, and each drive should be on its original connection, showing up as the proper drive #).

    When it asked me to set up the parameters of the array, I left them all the same; but if you had changed the block size or any other setting, they will have to be identical.

    Now here is where the magic happened (and this might not be an available option depending on your controller): I was given the choice of creating the array *without* erasing/formatting the disks. If you don't have this option, you are screwed.

    Reboot, and pray to the Almighty.

    For me it worked, and my client stopped banging his head against the door jamb (true story; only time in my life I have ever seen someone so distraught that they actually banged their head against the wall).

    If this was done in software (RAID), then I have no freakin' clue...
     
  9. unhappy_mage

    unhappy_mage [H]ard|DCer of the Month - October 2005

    Messages:
    11,455
    Joined:
    Jun 29, 2004
    It was done in software, but there were problems: I had taken the disks out to check their SMART status (my controller doesn't support pass-through :() and hadn't made note of which order they were in, so I had six possible disk orderings for the array. So I tried creating every possibility, one at a time, and running xfs_repair on each. Unfortunately, on the fifth combination xfs_repair searched long enough that it found something that looked like a superblock (but probably wasn't) and borked the filesystem.
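
With three members there are 3! = 6 orderings, so the trial can be scripted. A sketch with placeholder device names, where echo stands in for the real (destructive) mdadm --create and xfs_repair -n calls:

```shell
#!/bin/sh
# Try every ordering of the three members. In the real run, each echo
# would be an 'mdadm --create --assume-clean ...' in that device order,
# followed by a read-only 'xfs_repair -n' to see if a filesystem appears.
for a in sda1 sdb1 sdc1; do
  for b in sda1 sdb1 sdc1; do
    for c in sda1 sdb1 sdc1; do
      # skip orderings that reuse a disk
      if [ "$a" != "$b" ] && [ "$a" != "$c" ] && [ "$b" != "$c" ]; then
        echo "trying order: $a $b $c"
      fi
    done
  done
done
```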

     
  10. Farva

    Farva [H]ard as it Gets

    Messages:
    35,141
    Joined:
    Feb 3, 2004
    So from now on you will only be taking one drive out at a time so you don't forget, right? :p Sometimes lessons need to be learned... the hard way :(
     
  11. unhappy_mage

    unhappy_mage [H]ard|DCer of the Month - October 2005

    Messages:
    11,455
    Joined:
    Jun 29, 2004
    It wouldn't have helped: LSR is supposed to keep track of which disk goes where, even if you plug them in in a different order, so when I initially built the array I tried it. The good news is, it works; the bad news is, it hurts your debugging in the future ><

    In any case, I'm not really worried. I'll probably format it this evening and restore what I have from backups.
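
One way to make that kind of debugging easier is to record each member's identity and slot before pulling any drives; a sketch with placeholder paths:

```shell
# Persistent names tie each /dev/sdX node to a drive's model and serial
ls -l /dev/disk/by-id/

# The md superblock also records which slot this member occupies
mdadm --examine /dev/sda1
```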

     
  12. Farva

    Farva [H]ard as it Gets

    Messages:
    35,141
    Joined:
    Feb 3, 2004
    Ouch, that sucks, man. Looks like I learned something new today :) Hopefully everything goes well when you rebuild your server.
     
  13. unhappy_mage

    unhappy_mage [H]ard|DCer of the Month - October 2005

    Messages:
    11,455
    Joined:
    Jun 29, 2004
    Well, evms is being uncooperative, so I'm blanking the disks first. A little late to cry over spilt milk, methinks :p

    On a slightly different note, dd seems to be taking an inordinate amount of CPU time: around 25% of one CPU per process. I wasn't expecting that, but I guess with 512-byte blocks it's par for the course.
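
dd defaults to 512-byte blocks, i.e. one read()/write() syscall pair per sector, which is where that CPU time goes; a larger bs moves the same bytes in far fewer calls. A harmless illustration on a scratch file (the path is arbitrary):

```shell
# Same 8 MiB of zeros two ways: bs=512 issues ~16384 syscall pairs,
# bs=1M issues ~8, so the second form burns far less CPU
dd if=/dev/zero of=/tmp/scratch bs=512 count=16384 2>/dev/null
dd if=/dev/zero of=/tmp/scratch bs=1M  count=8     2>/dev/null
rm -f /tmp/scratch
```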
