RAID 10 and HDD failure

legrand

Weaksauce
Joined
Mar 30, 2004
Messages
79
So... we have a RAID 10 setup using 4 HDDs. The HDDs are "named" 0, 1, 2 and 3. 0 and 1 are in one "subunit" and 2 and 3 are in a second "subunit" on the controller card.

HDD #3 is experiencing read/write failures, and a check via a utility program (PowerMax) confirms that the drive needs to be replaced.

This problem was found because #2 had degraded (even though #3 still displayed as "ok"). Shortly after #2 degraded, #1 also degraded, which drastically affected overall performance during the rebuild attempts.

After adjusting some settings on the controller card (AMCC 9550SX), we now have 0 and 1 as OK, but #2 cannot get past a 49% rebuild status. The rebuild keeps pausing and resetting back to 44%, where it rebuilds to 49%, and then the cycle starts again. #3 shows as being "ok" as well; however, we know that this drive is failing, cannot be read from or written to correctly, and must be replaced.

My questions are:
1) Why did #1 degrade? (I understand that #2 degraded because it can't read off of #3 correctly, but what happened to #1?)
2) Can #3 be yanked and replaced with only #0 and #1 showing as "OK" while #2 is still degraded? (Don't 2 and 3 rebuild from each other?)

Thanks for any assistance and I hope this isn't as confusing as it sounds!
 
What do you see if you disconnect 2 and 3 and run only the first unit?
 
Axman said:
What do you see if you disconnect 2 and 3 and run only the first unit?

He'd get nothing. 0 and 1 are mirrors of each other, and so are 2 and 3. Thus, if he removed 2 and 3, half the data would be gone. Unless it's allowing him to mirror the subunits, but that's just a little odd.

I'd say back up all the data, replace drive 3, and have it try to rebuild again.
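
To make the mirror/stripe point concrete, here's a rough sketch of which drive losses that kind of layout can tolerate. It assumes the subunits are mirror pairs (0,1) and (2,3) striped together, as described above; the names `MIRROR_PAIRS`, `array_survives` and `fatal_combinations` are made up for illustration and have nothing to do with the controller's own tools.

```python
from itertools import combinations

# Assumed layout from the thread: two mirrored subunits, striped together.
MIRROR_PAIRS = [(0, 1), (2, 3)]


def array_survives(failed_drives, mirror_pairs=MIRROR_PAIRS):
    """Return True if every mirror pair still has at least one good drive."""
    failed = set(failed_drives)
    return all(any(d not in failed for d in pair) for pair in mirror_pairs)


def fatal_combinations(n_failed, mirror_pairs=MIRROR_PAIRS):
    """List every set of n_failed drives whose loss takes the array down."""
    drives = sorted(d for pair in mirror_pairs for d in pair)
    return [
        combo
        for combo in combinations(drives, n_failed)
        if not array_survives(combo, mirror_pairs)
    ]


if __name__ == "__main__":
    print(array_survives([3]))      # one drive lost from the second pair
    print(array_survives([2, 3]))   # both drives of the second pair lost
    print(fatal_combinations(2))    # two-drive losses that kill the array
```

The model only counts a drive as "good" if it holds a complete copy. A drive that is still mid-rebuild (like #2 here) shouldn't be counted as good, so pulling #3 before #2 finishes is effectively the `[2, 3]` case. Hence the backup first.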
 
Shit, I was thinking 0+1.

In any case, I was getting around to suggesting that you isolate disk 2 and run some disk tools over it. Who knows what's wrong with it inside a RAID, but you might be able to figure something out with it as a single drive.

Also, now is when you need to budget for replacements. And this might sound ridiculous in a server environment, but make sure those cables are snug.
 