Restoring a bombed RAID5 array


Boomslang
11-04-2007, 11:49 PM
Recently, one of the drives in an 8-drive software RAID5 array began making a buzzing noise, but the system wasn't throwing any errors, so I couldn't work out which drive it was by checking logs. So I thought it might be a reasonable idea to unmount the array and unplug the drives one by one, because hey, the SATA spec allows for hotswapping, right? I found which drive it was because I could hear it spinning down. Naturally, I'm going to replace it, but I decided to fire up the array again first and back up my most recently added data, which is the most important of all the data on it.

Interestingly, the drive no longer buzzes. I was happy to see this, but unhappy to see that my array no longer appears anywhere on my system. I had it set up with LVM on top of mdadm RAID5. I had unplugged five devices before identifying the one that was making the noise, and that's precisely the number of devices that can't be added to the array, according to dmesg:

raid5: automatically using best checksumming function: generic_sse
generic_sse: 5878.000 MB/sec
raid5: using function: generic_sse (5878.000 MB/sec)
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdi1 ...
md: adding sdi1 ...
md: adding sdh1 ...
md: adding sdg1 ...
md: adding sdf1 ...
md: adding sde1 ...
md: adding sdd1 ...
md: adding sdc1 ...
md: adding sdb1 ...
md: created md1
md: bind<sdb1>
md: bind<sdc1>
md: bind<sdd1>
md: bind<sde1>
md: bind<sdf1>
md: bind<sdg1>
md: bind<sdh1>
md: bind<sdi1>
md: running: <sdi1><sdh1><sdg1><sdf1><sde1><sdd1><sdc1><sdb1>
md: kicking non-fresh sdg1 from array!
md: unbind<sdg1>
md: export_rdev(sdg1)
md: kicking non-fresh sde1 from array!
md: unbind<sde1>
md: export_rdev(sde1)
md: kicking non-fresh sdd1 from array!
md: unbind<sdd1>
md: export_rdev(sdd1)
md: kicking non-fresh sdc1 from array!
md: unbind<sdc1>
md: export_rdev(sdc1)
md: kicking non-fresh sdb1 from array!
md: unbind<sdb1>
md: export_rdev(sdb1)
raid5: device sdi1 operational as raid disk 7
raid5: device sdh1 operational as raid disk 6
raid5: device sdf1 operational as raid disk 4
raid5: not enough operational devices for md1 (5/8 failed)
RAID5 conf printout:
--- rd:8 wd:3
disk 4, o:1, dev:sdf1
disk 6, o:1, dev:sdh1
disk 7, o:1, dev:sdi1
raid5: failed to run raid set md1
md: pers->run() failed ...
md: do_md_run() returned -5
md: md1 stopped.
md: unbind<sdi1>
md: export_rdev(sdi1)
md: unbind<sdh1>
md: export_rdev(sdh1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: ... autorun DONE.

/proc/mdstat shows no arrays, and I can't manually remove and re-add the devices to the array to get it back online, as it says that it can't get array info, and thus can't complete these operations.
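
For anyone reading a similar log, the members that md dropped can be pulled out of the kernel messages with a small shell filter. This is only a sketch: the function name kicked_members is made up for illustration, and it assumes the "kicking non-fresh" message format quoted in the dmesg output above.

```shell
#!/bin/sh
# kicked_members: read kernel log text on stdin and print the name of each
# member that md dropped as "non-fresh", one per line.
# Hypothetical helper; it relies only on the message format quoted above.
kicked_members() {
    sed -n 's/.*md: kicking non-fresh \([^ ]*\) from array!.*/\1/p'
}

# Example use against the live kernel log:
#   dmesg | kicked_members
```

Each member it lists could then be inspected with "mdadm --examine" to compare its Events value against the members that stayed in the array.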

So, am I hosed?

Rakinos
11-05-2007, 12:24 AM
Did you have the array completely unmounted before you started unplugging drives? Have you tried mdadm --assemble --scan?

Boomslang
11-05-2007, 12:49 AM
Got it back with the help of a friend from home. Thanks for your response.

Where is the delete post button...?

hokatichenci
11-05-2007, 02:00 AM
Why not post the solution? People do sometimes use the search tool.

protias
11-05-2007, 06:00 PM
Why not post your solution? Someone else may have the same problem and search the forum for the answer.

Boomslang
11-05-2007, 08:05 PM
Sure thing -

The array was explicitly defined in /etc/mdadm.conf, then "mdadm /dev/md1 --assemble --force" was run, which corrected the mismatched superblock values (the out-of-date event counts that had gotten five drives kicked as "non-fresh").

All in all, a pretty simple fix.
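
A rough sketch of how that sequence might look as a script. The thread doesn't give the exact mdadm.conf contents, so using "mdadm --examine --scan" to generate the ARRAY line is an assumption, and the run and recover_md1 helpers and the DRY_RUN switch are made up for illustration. Only /dev/md1 comes from the post above. When mdadm is given just the array device in assemble mode, it looks up the member devices in the config file, which is why the ARRAY entry has to be there first. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the recovery described above. Prints the commands by default;
# set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_md1() {
    # 1. Print the ARRAY definitions found in the member superblocks.
    #    Assumption: the matching line is then copied into /etc/mdadm.conf
    #    by hand.
    run mdadm --examine --scan
    # 2. Force assembly so that members with out-of-date metadata are
    #    accepted back into the array.
    run mdadm --assemble --force /dev/md1
    # 3. Check that the array is listed again.
    run cat /proc/mdstat
}

recover_md1
```

Forcing assembly accepts members whose metadata looks stale, so it's worth backing up the important data as soon as the array is readable again.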

protias
11-05-2007, 08:57 PM
Thanks :)

Gambit
11-06-2007, 10:08 AM
Just an aside... couldn't he have failed each drive manually and then unplugged it, which would have kept the array up without unmounting it?
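
For reference, the manual fail-and-remove approach mentioned here would look roughly like the following. It's a sketch only: the helper names and the DRY_RUN switch are made up for illustration, /dev/md1 is taken from the thread, and /dev/sdb1 just stands in for whichever member is being tested. Keep in mind that RAID5 tolerates only one missing member, so each drive would have to be re-added and fully resynced before the next one is pulled.

```shell
#!/bin/sh
# Sketch: take one member out of a running md array, then put it back.
# Prints the commands by default; set DRY_RUN=0 and run as root to execute.
DRY_RUN="${DRY_RUN:-1}"

# run: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pull_member ARRAY MEMBER: mark the member faulty, then detach it so the
# drive can be unplugged while the array keeps running degraded.
pull_member() {
    run mdadm "$1" --fail "$2"
    run mdadm "$1" --remove "$2"
}

# readd_member ARRAY MEMBER: attach the drive again. The rebuild progress
# shows up in /proc/mdstat and must finish before pulling another drive.
readd_member() {
    run mdadm "$1" --add "$2"
}

pull_member /dev/md1 /dev/sdb1
readd_member /dev/md1 /dev/sdb1
```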