Online RAID capacity expansion risks?

BENN0

Weaksauce
Joined
Nov 23, 2009
Messages
95
I can't get my head around this; maybe it depends on the RAID controller and its implementation of capacity expansion algorithms.

When expanding an existing RAID 5 set with one extra disk, what will happen if the extra disk or a disk in the current set fails during the expansion procedure?

Same for RAID 6?

Same when converting from RAID 5 to RAID 6.
 
I would plan for the possibility of total loss, which is another reason why RAID isn't a replacement for a backup ;)

I have had a 19-drive RAID 5 array suffer multiple drive failures during the rebuild of the first failure. Lost about 20% of the data on the array, IIRC.
 
Well, there's no reason it can't be safe; implementation is the problem. I've tried pulling drives and power cycling while running an expand on Linux software RAID and could not trip it up. It claims to be, and apparently is, safe. Any expand is adding more space, so at any time the data can always be written at least once. All it needs to keep track of is where the layout changes from, for example, a 3-drive RAID 5 to a 4-drive RAID 5. There will actually be a gap between the two layouts; eventually that gap will reach the end of the drives, and that becomes your new free space.

The only unsafe part is when the expand first starts. Linux MD gets around this by asking you where it can write some backup data to make that part safe too; it usually needs only a couple of MB, I think. Often there is empty space at the end of the array where it can sneak this data in, but in the worst case you need to put it on another filesystem somewhere.
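The mechanism described above can be shown with a toy model: because the new layout is wider, the write position trails the read position everywhere except at the very start, and only those first few "critical" regions need a backup copy (what mdadm's --backup-file provides). Everything below is illustrative, not mdadm's actual on-disk format; the recovery here restarts from scratch, whereas real mdadm checkpoints its progress.

```python
# Toy model of a reshape from a 3-drive to a 4-drive RAID 5:
# 2 -> 3 data blocks per stripe region, parity ignored for clarity.
# Purely illustrative -- not mdadm's real format or algorithm.

OLD_WIDTH, NEW_WIDTH = 2, 3

def reshape(disk, n_blocks, crash_at=None, backup=None):
    """Rewrite `disk` (dict: region -> list of blocks) into the wider layout.
    A region whose old blocks have no durable new-layout copy yet when it is
    overwritten is 'critical' and gets copied into `backup` first."""
    data, read_region = [], 0
    for j in range(-(-n_blocks // NEW_WIDTH)):
        need = min(NEW_WIDTH * (j + 1), n_blocks)
        while len(data) < need:             # read ahead in the old layout
            data.extend(disk[read_region])
            read_region += 1
        # critical iff the last old block in region j would only be
        # rewritten at step j or later, i.e. it has no safe copy yet
        if (OLD_WIDTH * (j + 1) - 1) // NEW_WIDTH >= j and backup is not None:
            backup[j] = list(disk[j])
        if crash_at == j:                   # simulate power loss mid-write
            disk[j] = ["garbage"]
            raise RuntimeError("power lost")
        disk[j] = data[NEW_WIDTH * j:need]  # write region j, new layout

disk = {i: [2 * i, 2 * i + 1] for i in range(6)}  # 12 blocks, old layout
backup = {}
try:
    reshape(disk, 12, crash_at=1, backup=backup)  # crash in critical section
except RuntimeError:
    disk.update(backup)                 # restore the backed-up regions
reshape(disk, 12)                       # rerun; now completes cleanly
print([disk[j] for j in range(4)])      # new layout, no data lost
# regions 4 and 5 still hold stale old blocks: the reclaimed free space
```

Only regions 0 and 1 ever land in the backup here, which matches the "couple of MB" observation: the critical window is tiny compared to the array.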

Anyway, yeah, without testing it first I wouldn't exactly trust any random controller to do this cleanly.
 
I'm not talking about another drive failing during a rebuild, but during a planned online capacity expansion.

Reading bexamous' post, I think it should be possible to survive a drive failure without corrupting the whole existing array if the controller indeed copies the blocks to the new disk first and only erases them from the old disk after the expansion has completed, while keeping a "transaction log" somewhere safe.
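That copy-then-erase idea can be sketched in a few lines. This is a hedged toy model, not any real controller's firmware: `old`/`new` stand in for the old and new on-disk locations, `log` is the durable transaction log, and all names are hypothetical. Each move writes the new copy, commits it to the log, and only then erases the old copy, so a crash at any step leaves exactly one intact, identifiable copy.

```python
# Toy "copy first, commit to a transaction log, erase later" scheme.
# All names are illustrative; a real controller would keep the log in
# NVRAM or a reserved on-disk area.

def move_block(old, new, log, i, crash_at=None):
    new[i] = old[i]                     # 1) write the new copy
    if crash_at == "before_commit":
        raise RuntimeError("crash")
    log.add(i)                          # 2) commit: the move is now durable
    if crash_at == "before_erase":
        raise RuntimeError("crash")
    old[i] = None                       # 3) erase the old copy

def recover(old, new, log, n):
    for i in range(n):
        if i in log:
            old[i] = None               # finish an interrupted step 3
        else:
            new.pop(i, None)            # discard a possibly partial copy

def read_block(old, new, log, i):
    # the log decides which copy is authoritative
    return new[i] if i in log else old[i]

for crash in ("before_commit", "before_erase", None):
    old, new, log = {0: "A"}, {}, set()
    try:
        move_block(old, new, log, 0, crash_at=crash)
    except RuntimeError:
        recover(old, new, log, 1)
    assert read_block(old, new, log, 0) == "A"  # survives every crash point
```

The key design point is ordering: the new copy must be durable before the commit, and the commit durable before the erase, or a crash can leave zero good copies.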

It never occurred to me to test this before deploying my array (on an HP P410 controller), but I just might, as I currently have four empty drives lying around to do some experimenting.
 
I've experienced this exact problem on an Areca controller.
I always run a full surface scan to check for bad sectors before adding a drive to an array, but there's still risk involved.

During an expansion the new drive failed. Areca support helped me force-resume the migration process, but the final result was corruption.

When a normal drive fails during an expansion the data is still safe (tested on a RAID 6 array);
after the expansion completes, a rebuild is done.

I'm not sure what happens when you do a capacity expansion plus a RAID-level migration and a drive fails... :)
 
[OC]Pik4chu said:
I would plan for the possibility of total loss, which is another reason why RAID isn't a replacement for a backup ;)

I have had a 19-drive RAID 5 array suffer multiple drive failures during the rebuild of the first failure. Lost about 20% of the data on the array, IIRC.

Nineteen drives in RAID 5 :eek: lol, someone likes to fly by the seat of their pants...what controller were you using?
 