Any chance of recovering this md raid 5 array?

Red Squirrel

We had a long power outage. The UPS shut everything down properly, but the outage was so long that the drives all had a chance to completely cool down before they were powered up again. Upon powering up everything looked OK, but while the VMs were loading, 2 of the 5 drives dropped out of my main mdadm array. Everything started going insane, spamming dmesg errors and such, but I could still access some of the data on the array, hit and miss. Maybe it was cached, I don't know. So I rebooted and hoped for the best. All the drives are up, but the RAID does not start; it comes up with two drives missing. I can add them, and it actually says it's clean but not started. Ironically, there is no start command. Normally an array just starts automatically.

Is there any way to actually recover this, or am I screwed? I do have backups, but it would be such a royal pain having to sift through all that, do a restore, reset all the permissions so stuff works, etc... just the thought of it is brutal.

Before I give up, I'm just wondering if there's anything I can try. Though even if I get it going, I don't know how long it's really going to last.

Never buy Hitachi drives. They are the biggest pieces of crap ever. Ironically, I have 5 incoming in the mail that I RMAed not too long ago, and now 2 more dropped at once. Pure crap.

Guess it's time to go shopping and order 5 new drives. Maybe WD blacks.
 
Playing around with adding/removing drives (through mdadm), I managed to get it started again. It turns out there is a --run command to start it. Though it's still odd that it won't start at system startup and that the drives get pushed out. But now it's mounted.
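For anyone else hitting this, the rough sequence that got mine going looked something like this (just a sketch from memory; the device names are examples from my box, adjust for yours):

Code:
# see what state the array thinks it's in
cat /proc/mdstat
mdadm --detail /dev/md0

# re-add the two members that got kicked out (sdX1/sdY1 are placeholders)
mdadm /dev/md0 --re-add /dev/sdX1
mdadm /dev/md0 --re-add /dev/sdY1

# the array shows clean but inactive, so force it to start
mdadm --run /dev/md0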

mdadm's solidness never ceases to amaze me. My priority now is updating all my backups (thankfully the most important one was only 2 days old anyway) and then deciding on what new drives to order.

I'm also starting to wonder if it's the SATA cards, power, or the bays causing the drives to fail. I've had so many drives fail in this server that it's getting scary.
 
It is scary. I rebooted and now it's OK, so it seems to stick, but one of the drives dropped out again while running a backup. This time it's one of the new drives. No errors or anything, it just drops out. I'm running a backup job right now with all VMs turned off just so I have a very fresh backup, and I'll run another job that will get the entire array, including some of the less important stuff like movies. I'm just glad I DID have a backup to begin with... it really shows how important it is to stay on top of that. It was 2 days old, which is not too bad; not much has changed.

I am ordering 6 new drives and 2 new enclosures. Not that I have the money for this... guess that's what the credit line is for. :( I'll also be doing RAID 6 instead of 5. Given that even a new drive dropped out, I'm starting to suspect it may be my enclosures that are bad.
 
I think your problem is most likely one of these:

1) power

2) cables

3) HBA

I doubt it is the HDDs themselves, Linux, or mdadm. It could possibly be a bug in the driver for your HBA, depending on which HBA you have.
 
^ This.

Really sounds like a bad HBA above all else. But yes, MDADM is very robust.
 
I'm afraid it might be that too.

I have to ask myself at what point I just build a whole new server; then I could troubleshoot this one on the side and use it for something less critical.
 
Not sure what hardware you are using, but something like this might help: http://www.newegg.com/Product/Product.aspx?Item=N82E16816103232

I was looking at buying one myself and using it as a pass-through for software RAID and/or hardware RAID. 8 SAS/SATA ports isn't too bad for such a controller and the price point.

Might be cheaper than building a whole new server if it is just the HBA going bad.
 
Not sure what hardware you are using, but something like this might help: http://www.newegg.com/Product/Product.aspx?Item=N82E16816103232

I was looking at buying one myself and using it as a pass-through for software RAID and/or hardware RAID. 8 SAS/SATA ports isn't too bad for such a controller and the price point.

Might be cheaper than building a whole new server if it is just the HBA going bad.

Is that PCIe 1x? That could maybe fit in my server. Right now I have 2 PCIe 1x cards that are 2-port only, then I'm using the motherboard ports for the rest. So it's all over the place.

So I just need to get a special fanout cable for these types of cards, I'm guessing? So each of those 2 ports is 4 SATA, right? So 8 total? I might consider that card if I still get issues after the stuff comes in.
 
That could maybe fit in my server. Right now I have 2 PCIe 1x cards that are 2-port only, then I'm using the motherboard ports for the rest. So it's all over the place.

This is your problem. MDADM is great with most controllers, but when you mix and match, you end up with too many points of failure, especially when they are all attached to HDDs within the same array.

Is that PCIe 1x?
It is PCI-E 4x, but there are 1x and 8x variants; just look at Adaptec HBAs at Newegg in the $200-300 price range.


If I were you, the first thing I would do is move all of those drives to a single HBA, or two HBAs with the same controller, port speed, etc. That keeps the transfers consistent.

What you are doing is normally fine for light operations and low-usage storage. But for a VM server and heavy usage, mixing controllers and bus speeds is a huge no-no, even for MDADM.

In my server, my four WD Blue drives are on a single controller acting as a pass-through node, and my four WD green drives are on a single controller, also acting as a pass-through node.

Each array has its own controller. This helps to limit the points of failure and keeps consistency throughout the array for speed, latency, and uptime insurance.

So I just need to get a special fan out cable for these type of cards I'm guessing? So one of those 2 ports is 4 sata right? So total 8? I might consider that card if I still get issues after the stuff comes in.
It uses an SFF-8087 cable, which basically fans one mini-SAS port out into 4 SATA connectors, so yes, 8 SAS/SATA ports total. If you get that specific card, you will need to order two cables with it.
 
@Red Squirrel what was the reason they were dropping out? In my experience, I've built a RAID 5 with Linux RAID that dropped drives multiple times before the array was even initialized because of bad sectors (new drives, by the way). The sectors were able to reallocate and read just fine after the second pass.

What I suggest is to try running a check every couple weeks to a month:

echo check > /sys/block/mdX/md/sync_action
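If you want it to happen automatically, a line along these lines in /etc/cron.d should do it (just a sketch; adjust the device name and schedule to whatever fits your setup):

Code:
# kick off an md data check on md0 at 3am on the 1st of every month
0 3 1 * * root echo check > /sys/block/md0/md/sync_action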

My drives have been working perfectly for almost a year now since then.
 
I'm actually debating trying out ZFS, though that's too drastic a change for now. For the next server I build, I may give it a try.

I hope NCIX does not screw around with my order and actually ships it today, and hopefully nothing is backordered. I'll start with all new drives and two new backplanes and see where that brings me. If not, I'll try to find a PCIe 1x card that has 8 ports, though I wonder if PCIe 1x has enough bandwidth for that...

I'd have to double check, though; I MAY have a 4x slot in there, now that I recall. There is an 8x (or 16x?) slot for a video card, but according to the manual it's strictly for a video card, as it disables the onboard one.
 
This is one nice thing about ZFS. You can set up a pool scrub in a cron job :)

It makes no difference in his setup. He is still using multiple controllers with varying attributes, and it sounds like one of them is failing.

ZFS may be robust, but it can't stop hardware from failing.
I still stand by what I said above.
 
I wasn't contradicting you - I was addressing his comment about trying out ZFS, which I took to be a more 'down the road' thing. Sorry if I was unclear...
 
OK, so I went home for lunch, found out which bay the bad drive was in (the one that dropped out again), and moved it to another bay. It's rebuilding fine now.

So I think I can almost rule out the drives. The Hitachi ones I RMAed DID have tons of SMART errors, so I think those were indeed bad, but this WD has no signs of errors at all. It's rebuilding now with no VMs or anything running, so I figure the rebuild will be done in about 3 hours. I'll see how things look when I get back home. So it's either the backplane, cable, or controller, I'm thinking. I will rule out the backplane and cable once the replacements come in.
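For what it's worth, I'm just keeping an eye on the rebuild the usual way (md0 is just what my array happens to be called):

Code:
# watch rebuild progress and the estimated finish time
watch -n 60 cat /proc/mdstat
mdadm --detail /dev/md0 | grep -i rebuild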
 
I wasn't contradicting you - I was addressing his comment about trying out ZFS, which I took to be a more 'down the road' thing. Sorry if I was unclear...

I didn't think you were contradicting me; I was just pointing out that even if he were to use ZFS (which is a good idea, btw), using that many different controllers, with the issues he's recently had, may not be such a wise decision. :)
 
One thing too: if I go with a single controller, I'd probably get better performance, right? I'll have to double check what slots I have in there; I may just do that and rule everything out that way. If I do have a PCIe 4x slot, then I can free up the two 1x slots and use them for NICs or something.
 
One thing too: if I go with a single controller, I'd probably get better performance, right?

Not necessarily. This depends on the HBAs involved.
 
One thing too: if I go with a single controller, I'd probably get better performance, right?

This has nothing to do with performance, though you may get a slight increase with it.

The whole point is to use a single HBA acting as a pass-through for MDADM. This reduces the number of controllers, which in turn reduces the number of points of failure and the variables that can affect reliability, especially within a single array.

Even though MDADM and ZFS can both operate with HDDs on multiple controllers, doing so increases the chances of failure and makes troubleshooting a hardware fault that much more difficult.

If you go with hardware RAID on the controller I listed, or one of the others, there may in fact be a performance increase, but it depends on the type of RAID, and that controller only offers RAID 0, 1, 10, 1E, and JBOD. So no RAID 5 or 6.

Also, if you do decide to use hardware RAID, make sure your HDDs have TLER or TLER-like functionality enabled to avoid dropping disks.
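On drives that support SCT ERC you can usually check and set it with smartctl, something like this (a sketch only; the 70 is in tenths of a second, so 7.0 seconds, /dev/sdX is a placeholder, and not every drive accepts the command):

Code:
# show the current error recovery control timeouts
smartctl -l scterc /dev/sdX
# set read and write recovery timeouts to 7.0 seconds
smartctl -l scterc,70,70 /dev/sdX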

If you are using desktop-class drives, definitely stick with MDADM.
 
Ah I see, that makes sense. So I'd just be reducing the points of failure.

I'm most likely going to stick with software RAID, though in the future I'd like to do hardware RAID 1 for the OS drive; I'll do that later down the line.

I'll see what happens once I replace the backplanes, and go from there.

The array is going to finish any minute now, so it's definitely not the drive that was at fault, now that I've moved it to another bay. I am suspecting the controller, though I'm hoping it's the backplane, given that I'm going to be replacing it anyway. I'll see what happens. I'm just glad I managed to get the data back, and I'm glad I had a good backup too in case I could not, so at least I'm reassured.

This just shows the importance of backups. Raid != backup!

I think after all this I will look at converting to RAID 6. If I use a Linux live CD with a newer kernel, I think I'll be able to do it; my current kernel does not support it, and I'm not comfortable enough to start messing with that.
 
future I'd like to do hardware RAID 1 for the OS drive

I would not bother. You will spend $150+ on a HW RAID 1 controller and get no added reliability over mdadm, and no noticeable performance benefit unless you get a HW RAID card with a large cache and BBU, which will probably cost $400 minimum.
 
Well, you can't really use mdadm for the actual OS... the one that is running mdadm. Apparently there's a way, but it's really complicated. So I'd just do a hardware RAID 1 for the OS; that way, if one of those drives fails, I don't have to reconfigure everything, which can take days, if not weeks. Though I guess I could just take an image of it, since not much changes on it.
 
I would not bother. You will spend $150+ on a HW RAID 1 controller and get no added reliability over mdadm, and no noticeable performance benefit unless you get a HW RAID card with a large cache and BBU, which will probably cost $400 minimum.

FakeRAID or hardware RAID with RAID1 will give a boost in read speeds, but it is not worth it imo for the added costs.

You do not need a $400 hardware RAID card w/ cache and BBU to have the increased speed, a simple motherboard FakeRAID controller can do the same.

Also, certain versions of software RAID can also have the read speed boost, but only if certain algorithms are used. To my knowledge, MDADM and Windows RAID cannot do this.

GMirror on FreeBSD has the read speed boost with RAID1 if the "load" balancing algorithm is used.

If anyone knows of a way to do this successfully with MDADM, I would really like to know how, personally.
 
Well, you can't really use mdadm for the actual OS... the one that is running mdadm. Apparently there's a way, but it's really complicated. So I'd just do a hardware RAID 1 for the OS; that way, if one of those drives fails, I don't have to reconfigure everything, which can take days, if not weeks. Though I guess I could just take an image of it, since not much changes on it.

Yes you can use MDADM on the OS.

I've had Ubuntu 10.10 64-bit installed and running on 4x 80GB HDDs in RAID0, so I personally know it can be done. If you want to know how, PM me and I will be glad to help.

It's not that complicated, just a bit time consuming.
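The general shape of it is roughly this, done from a live environment or during the install (only a sketch; the partition names are examples, and the bootloader/installer steps vary by distro):

Code:
# mirror the root partition across two disks (example devices)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mkfs.ext3 /dev/md0

# record the array so it assembles at boot (config file path varies by distro)
mdadm --detail --scan >> /etc/mdadm.conf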

Yes, setting up the OS on hardware RAID will be much faster, I agree.
 
To my knowledge, MDADM and Windows RAID cannot do this.

Recent mdadm is supposed to do that as well, or at least that is what I was told the last time I checked. I have never tested it, however, since I do not use RAID 1 for anything larger than /boot.

Edit: I did some googling on this for clarification. It appears that single reads will not be sped up by RAID 1, but multiple simultaneous reads to the same array will use more than one disk at a time.

Yes you can use MDADM on the OS.

At work I have done this since about 2000, so I know it works. However, I have always had /boot on its own 256 MB ext2 partition (which is RAID 1), so the OS does not need GRUB2 to boot.
 
Recent mdadm is supposed to do that as well, or at least that is what I was told the last time I checked. I have never tested it, however, since I do not use RAID 1 for anything larger than /boot.

Edit: I did some googling on this for clarification. It appears that single reads will not be sped up by RAID 1, but multiple simultaneous reads to the same array will use more than one disk at a time.

Also, there is striped RAID-1 (technically it is RAID-10,f2 aka far, but it works with only 2 drives), which gives nearly double the read speed of a single drive for QD=1 sequential reads.
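In mdadm terms that is just something like this (a sketch, with two example partitions):

Code:
# two-drive "far" layout RAID 10, i.e. striped RAID-1
mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 /dev/sda1 /dev/sdb1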
 
Running an fsck on the resurrected volume. Things aren't looking too good as far as data integrity goes, but I'll let it run and see what happens. It's been going for a good 5 hours with tons of bad inodes and such.

Also, speaking of mdraid, my current kernel does not support going from RAID 5 to RAID 6. Is it safe to use a live CD with a newer kernel and do it that way? I'd be adding an extra drive. Even though I'm no longer suspecting the drives, to be safe I'm replacing them all anyway; the others will just go into my backup rotation pool.
 
Also, there is striped RAID-1 (technically it is RAID-10,f2 aka far, but it works with only 2 drives), which gives nearly double the read speed of a single drive for QD=1 sequential reads.

The far data layout works with all four drives in an MD RAID 10 setup, giving read speeds which are close to a 4-drive RAID 0. Standard RAID 10 algorithms only use 2 drives.

With RAID 10 the reads go over the stripe (RAID 0). So looking at our RAID 10 diagram, we are reading from two RAID 1 mirrors. In this case we expect to see the sequential read performance of about two drives in RAID 0, according to the mdadm maintainer Neil Brown. If this were the far layout configuration of RAID 10, we would get read performance on par with a 4-drive RAID 0 array, but would incur a larger write penalty.

Look here for more detail.



Also, speaking of mdraid, my current kernel does not support going from RAID 5 to RAID 6. Is it safe to use a live CD with a newer kernel and do it that way? I'd be adding an extra drive. Even though I'm no longer suspecting the drives, to be safe I'm replacing them all anyway; the others will just go into my backup rotation pool.

As long as the live CD has a) a kernel that is 2.6.33 or newer AND b) mdadm version 3.1 or newer, you should be set. Go to the mdadm blog for more detail.
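The reshape itself then looks roughly like this (a sketch; the device names are examples, and the backup file has to live on a disk that is not part of the array):

Code:
# add the new disk as a spare first
mdadm /dev/md0 --add /dev/sdf1
# reshape from RAID 5 to RAID 6 across 6 devices
mdadm --grow /dev/md0 --level=6 --raid-devices=6 --backup-file=/root/md0-reshape.backup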
 
The far data layout works with all four drives in an MD RAID 10 setup, giving read speeds which are close to a 4-drive RAID 0. Standard RAID 10 algorithms only use 2 drives.

I was talking about striped RAID-1 (two drives), which mdadm calls RAID 10,f2. That works with 2 drives, 3 drives, 4 drives, etc.
 
I was talking about striped RAID-1 (two drives), which mdadm calls RAID 10,f2. That works with 2 drives, 3 drives, 4 drives, etc.

Very interesting. Have you tried this out at all? I wonder how stable it would be, even for a test bed system.
 
I'm starting to think I have a much, much bigger problem on my hands. I have a completely separate machine with a RAID 5 that is starting to show similar issues. A drive dropped out for no reason; I added it back, and now I'm getting a bunch of these errors:

Code:
[root@gbcserver ~]# dmesg -c
ata3.00: limiting speed to UDMA/33:PIO4
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata3.00: failed command: WRITE DMA EXT
ata3.00: cmd 35/00:00:00:6f:00/00:04:00:00:00/e0 tag 0 dma 524288 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3: hard resetting link
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: configured for UDMA/33
ata3.00: device reported invalid CHS sector 0
ata3: EH complete
ata3: unhandled interrupt: cmd=0xff irq_stat=0xa4 idma_stat=0xa0
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata3.00: failed command: WRITE DMA
ata3.00: cmd ca/00:98:58:87:00/00:00:00:00:00/e0 tag 0 dma 77824 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3: hard resetting link
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: configured for UDMA/33
ata3.00: device reported invalid CHS sector 0
ata3: EH complete
ata3: unhandled interrupt: cmd=0xff irq_stat=0xa4 idma_stat=0xa0
ata3: unhandled interrupt: cmd=0xff irq_stat=0xa4 idma_stat=0xa0
ata3: unhandled interrupt: cmd=0xff irq_stat=0xa4 idma_stat=0xa0
[root@gbcserver ~]#

OMG I have such bad luck with storage. :mad:

Could I be getting dirty power or something? All my plugs do read 120, and I would think the UPS would be filtering out any noise anyway. I also just realized that smartctl does not work with my WD Green drives, so I can't even tell which drive is which. I normally keep track of each serial and which bay it's in, but now I can't even get the serial off the failing drive.
 
I was talking about striped RAID-1 (two drives), which mdadm calls RAID 10,f2. That works with 2 drives, 3 drives, 4 drives, etc.

Ahhh, I read "works with only two drives" as "only works with two drives". My mistake. :)


@OP:

Looks like that drive is toast, or you have a bad SATA cable. It's likely the former, but you never know. If you're using hotswap bays, do hdparm -tT /dev/sd[X], where X is the dropped drive; the bay light will flicker (obviously). If you want the drive to drop again, initiate a scrub.
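Something like this, for example (sdX being whatever the dropped drive shows up as; hdparm -I should also spit out the serial even when smartctl is being difficult, assuming the drive still answers an IDENTIFY):

Code:
# hammer the drive with reads so its bay light flickers
hdparm -tT /dev/sdX
# pull the model and serial straight from the drive's identify data
hdparm -I /dev/sdX | grep -i serial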
 
So far so good for the server in the OP; switching the drive to another bay seems to have done the trick. I never thought of using hdparm to locate a drive, though. That's a good idea.

Now for the one I just posted about: I popped the drive into the backup bay of my main server and wrote data to it, and it was fine. I put it back in the other server and now it's fine there too.

So are SATA cables THAT sensitive? All I essentially did was unplug and replug it, and now it's rebuilding without all those errors. Is there a proven way to test a hard drive connection before putting it into production? It looked plugged in OK when I checked.
 
Not really, but you can help yourself by (for example) using latched cables. Some of the ports on the cheaper SiI-based SATA cards won't let the latches grip, but ICH10R-style ports (and of course the hard drive ports) work well. Giving the SATA ports a blast of electrical contact cleaner, or compressed air, can help blow out some of the dust and debris that can ruin an electrical contact.

Having said that, though, SATA cables can go bad anyway, probably due to a fracture in one of the wires. Usually this shows up as UDMA CRC errors, but your situation proves that such an assumption isn't necessarily a good one. :)
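If you want to keep an eye on a suspect cable, the counter to watch is in the SMART attributes (a sketch; it is attribute 199 on most drives, and sdX is a placeholder):

Code:
# a rising UDMA_CRC_Error_Count usually points at the cable or backplane, not the platters
smartctl -A /dev/sdX | grep -i crc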
 
Replaced one of the backplanes (the other was backordered) and have all drives but one running in it. New cables... still having errors. Dammit. I'm really leaning towards the controller now, as I plugged the same port into the new backplane just to see. I hate this micro-troubleshooting; I just want it to work! If it does end up being the controller, that means I'll need a new motherboard, as I don't have the appropriate slots for a better card, and given that the sockets change like every month, I'll need a new CPU too. ARG! I just don't have the money for this anymore, but I need it to work, period. What a pain.

The backplane I ordered is better, though; it uses trays, which feel more solid. With the other one I sometimes had to play around with the drive until it would latch inside, especially the slimmer drives.

Guess I'll be shopping for a new motherboard and CPU. I have been wanting to upgrade this server, so I guess this is an excuse. :p Then I can get a decent controller card.
 