RAID reconstruction (expanding) is taking a very long time!

SedoSan

Hello all,

I have an LSI MegaRAID 9261-8i and I just added the 8th HDD to it.
Before continuing, here are the numbers from my previous expansions.
I had 4x 3TB HDDs in RAID 5; expanding to 5 HDDs took around 4 days.
Expanding to 6 HDDs also took around 4 days, and expanding to 7 disks while migrating to RAID 6 took around 5-6 days.
All of this was on my main rig, which had an i7-3930K and 16GB of RAM.

Having 7 HDDs in my main rig was making it difficult to do some stuff, so I decided to move everything HDD-related to a new rig and make it a NAS box. (This is only a temporary move, because I'm planning to get a dedicated server (twin Xeon), so I went cheap with the temp NAS box.)
I got an Intel Pentium G2020 (2 cores), 4GB of RAM, and a 450W PSU, installed the RAID card, and moved all 7 HDDs over. It worked fine; I can stream without any problems, even with 5 people streaming at the same time. Anyway, my 7 HDDs filled up and I wanted to expand, so I got the 8th HDD and started the expansion, which I had already done 3 times before.

I got an estimated time of around 120 hours. In the first couple of days it reached 15% (around 7% per day), which isn't really bad. However, on the 3rd day it only increased to 16%, and the next day 17%; it seems it has dropped to only 1% per day.
I started the reconstruction process on the 5th of June and it has only reached 29% now (that's 15 days!)...

I'm sure this isn't normal, but what might be the cause? From my understanding, when reconstructing, the RAID card uses its onboard CPU to do the operation, or does it use the host PC's CPU?

Another thing: could the 450W power supply be causing this problem? Lastly, since this was a new setup I didn't configure Windows Update, so the system got rebooted a couple of times in the middle of the operation. (I rebooted a lot on my old rig and it never caused a delay.)

At this rate it will take over a month to expand successfully (if no other drive fails because of the expansion)... Any suggestions, guys?
 
A bit more information would be greatly helpful. What motherboard (brand/model/BIOS) did the card originally come out of, and which one did it get installed into? What style of PCIe slot was on both sides of this equation, both physically and electrically (x16, x8, x4, etc.)? Logs from the card would also be helpful. Are the drives connected directly to the card, and are you using the same breakout cables you used initially? How much activity is this array seeing while you are doing the expansion? Generally, when an array move from an old host to a new one causes a speed decrease (with no other issues complicating the comparison), it comes down to PCIe throughput: the old computer had an x16 physical / x8 electrical slot and the new one has an x16 physical / x1 electrical slot, etc.
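
If you don't have MegaRAID Storage Manager handy, something along these lines via MegaCli should dump the logs and progress I'm after straight from the OS (exact flags can differ a bit between MegaCli versions, and the log file name is just a placeholder, so double-check against -h first):

Code:
# dump the full controller event log to a file
MegaCli -AdpEventLog -GetEvents -f lsi-events.log -a0
# virtual drive state/summary
MegaCli -LDInfo -LAll -a0
# progress of the running reconstruction (OCE) on VD 0
MegaCli -LDRecon -ShowProg -L0 -a0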
 
Nice RAID card, I have the same one ;). It sounds like your problem is that it's having to shuffle the data around a lot more. I know an expansion on a RAID 6 setup will take a bit longer since it has 2 drives' worth of parity. I also wouldn't trust the estimated time all that much; mine always runs a little over (4-12 hours extra).

Yes, it uses the CPU on the RAID card to do all of its thinking, estimation, rebuild, etc. You're not doing file transfers while it's doing a 'background initialization' on that drive, are you?
 
I have an MSI H61M-P31/W8 mobo with 1x PCIe 3.0 x16 slot. The HDDs are all connected directly to the RAID card using SFF-8087 to 4x SATA breakout cables, and the HDDs are WD Red 3TB. They aren't connected in the same order as on the old rig, but during initialization the card already worked out which HDD goes where, so that doesn't matter (I'm using the virtual drive without any problems). What do you mean by how much activity the array is seeing?
As for the log, the only thing I'm getting is this warning:
"Controller ID: 0 Reminder: Potential non-optimal configuration due to PD commissioned as emergency spare -:-:3". I get this every day at exactly 10:31:15 AM. Other than that everything is fine; the status says "Optimal".

As for transfers during the background operation, I'm guilty of that. There was only around 500GB left in the array and I was still downloading files from the net to it. I'm also using the NAS box as a media streaming box as well as for application streaming.
 
I see a few issues here that are causing your time problem:

1) Adding files to the array while you are expanding it. This severely eats into the card's CPU and RAM, which it needs for the expansion and for writing the new parity for the data (not to mention it's churning the file structure). The absolute worst thing you could do is BitTorrent directly to the array; BitTorrent will hammer the card and the drives.

2) You have a ton of data. The length of time is not about how many drives you have; it is about how full the drives are and how much total data you have. When the card does the expansion calculation it has to rewrite every piece of every file on all the drives, and the more data you have, the longer this will take.

Yeah, 1 month is not totally unusual. I once started a low-level format of a 3TB drive and 2 weeks later it was only about 30% done.
 
I see; yes, I was torrenting directly to the array >.<
A couple of hours ago I restarted my NAS box, went into the RAID card BIOS, and opened the progress window there; it was at 29%. Now that nothing is running at all (no transfers, no torrents, nothing), would you say it will speed up again, or is the damage already done?

When I was expanding from 6 HDDs to 7 HDDs it took only a week or less to complete (even though the 6 HDDs were FULL)...
 
Ah yeah, torrenting is your problem.

Nah, no damage was done; just stop torrenting if you want the expansion to speed up.
 
Yeah, let's say you were downloading 2 files and seeding 20... that's way, way more load on the array than 4 people watching movies from it.

When you torrent to a network share it's far less efficient than if the drive or the array were installed in your own system (unless you are using iSCSI). It's better to torrent to a drive in your system and then transfer over.
 
Think of it like this:

Torrenting to a network share is like trying to help someone put together a model airplane by giving them directions over the telephone. It's difficult and clunky because there are lots of small pieces, and since you aren't physically there with the guy you can't easily show him where they should go, and you have a really hard time figuring out where he is in the process, or even whether he's doing it correctly.

Whereas if you were right next to the guy, helping him put it together step by step, it would be easy: you could show him where all the pieces go, and you'd have a good understanding of what needs to be done to build the model airplane.


So with a network share, the computer that is torrenting to the drive can't access the advanced features of the drive that make it easier to write and read the data.

But if the drive is attached more directly to the system, then the drive controller can make better decisions about how, and in what order, to write and read the data.
 
I see, thanks for the info.
On a side note, I have a question I've been trying to answer for a long time and still don't have a clear answer to.

I have an HBA and a SAS expander card that I'm planning to install in my new build once the rest of the parts arrive; I'm planning on dual Xeon E5s and 64GB+ of ECC RAM. What I'm planning to do is get 8 new HDDs, add them to this build, install ESXi 5.1, then install FreeNAS and manage these HDDs with ZFS.
The reason for this is to take advantage of the CPU and ECC RAM, as this will be a more powerful solution than a hardware RAID card (please confirm this). Then I will copy everything from the original 8 HDDs to the new ones, then add the original 8 to the new array and expand to 16 drives.
Now what I want to do is mount this virtual array on a Windows install.
Is this possible? Can the ZFS array be mounted as an NTFS virtual disk in Windows?
 
Yes, ZFS on that machine will be far superior to a hardware RAID card in terms of reliability and data security. Performance-wise it will probably be quite a bit better as well with the large amount of RAM you will have.

The best way to present the ZFS array to a Windows install would be iSCSI; that way Windows just sees it as one huge hard drive. With ZFS it's very simple.

https://www.youtube.com/watch?v=p97Yg7u8fSk

http://doc.freenas.org/index.php/ISCSI

It looks complicated, but it's easy once you have done it once.
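
Roughly, the ZFS side is just carving out a zvol and exporting it as an iSCSI extent. A minimal sketch (the pool name "tank", the zvol name, and the 10T size below are placeholders; the actual extent/target setup is done in the FreeNAS iSCSI GUI per the doc linked above):

Code:
# on the FreeNAS box: create a block device (zvol) out of the pool
zfs create -V 10T tank/windowslun
# the zvol appears as /dev/zvol/tank/windowslun; point a device extent at it
# in the FreeNAS iSCSI configuration, then connect from Windows with the
# built-in iSCSI Initiator and format it as NTFS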
 
I've heard of some problems with iSCSI, such as having to disconnect the iSCSI target before every shutdown to avoid losing data, etc... how true is this?
I'm really looking forward to this. The case is being shipped by sea because it would be too expensive to ship by air (SuperMicro case); it will take around a month to arrive~
 
Do you have a BBU? There is almost no reason left to use a RAID card without a BBU. The RAID card has no concept of the contents of the array; it does not matter whether it is full or empty. An online capacity expansion basically reads every disk and rewrites every disk completely. If you have no BBU, this process is either very unsafe (against power failures) or very slow, because the controller has to back up each stripe before it gets restructured.
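
As a quick sanity check (command names from memory, so verify against your MegaCli version), you can confirm the BBU is healthy and that the card is actually running the virtual drive in WriteBack rather than falling back to WriteThrough:

Code:
# BBU charge and state
MegaCli -AdpBbuCmd -GetBbuStatus -a0
# current cache/write policy of the virtual drives
MegaCli -LDGetProp -Cache -LAll -a0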
 
Yes, I'm using a BBU. The reason I got a hardware RAID card in the first place was cost:
an HBA wouldn't have been as effective with my gaming mobo and RAM for RAID 5/6.
However, I'm switching to HBA + ZFS once I receive my server parts :).
I got the LSI 9261 + BBU used for around $250 about a year ago.
 
2) You have a ton of data. The length of time is not about how many drives you have; it is about how full the drives are and how much total data you have. When the card does the expansion calculation it has to rewrite every piece of every file on all the drives, and the more data you have, the longer this will take.

Wha???

No, it does not matter how much data you have on the drives. It's all about how many drives you have and the *size* of those drives; the amount of data on them has nothing to do with it. RAID controllers work at the block level, so they have to process the entire drive regardless of what has been written to it by a file system the controller knows nothing about.

The only time I have seen the amount of data change rebuild/expansion times is with ZFS, where the RAID is part of the file system.



To the OP: this really does not sound normal to me. I don't have a ton of experience doing RAID expansions on LSI, but that seems really slow. I remember hearing Adaptec was ridiculously slow, and that people lost data during RAID expansions as well...

I guess Areca is the only vendor that can actually do expansions reliably in an efficient/speedy manner?

It took half the time your array is taking for me to replace my 20x 1TB drives with 20x 2TB drives way back in the day, rebuilding the array 20 times to expand it. I have also done single-disk and RAID-level migrations, and they have always been decently fast.

Torrenting is bad for disk I/O and will definitely slow things down, but unless you have a decently fast connection (100 megabit+) it shouldn't make the expansion go more than 2x slower than it otherwise would. Maybe the background-task priority on the card should be changed via MegaCli. It looks like the defaults are 30%:

Code:
root@capone:~# megacli  -AdpAllInfo -a0 | grep -i rate
Rebuild Rate                     : 30%
PR Rate                          : 30%
BGI Rate                         : 30%
Check Consistency Rate           : 30%
Reconstruction Rate              : 30%
Ecc Bucket Leak Rate             : 1440 Minutes
Rebuild Rate                    : Yes
CC Rate                         : Yes
BGI Rate                        : Yes
Reconstruct Rate                : Yes
Patrol Read Rate                : Yes
Background Rate                  : 30
BIOS Enumerate VDs               : Yes

Code:
MegaCli -AdpSetProp {CacheFlushInterval -val} | { RebuildRate -val}
    | {PatrolReadRate -val} | {BgiRate -val} | {CCRate -val} | {ForceSGPIO -val}
    | {ReconRate -val} | {SpinupDriveCount -val} | {SpinupDelay -val}
    | {CoercionMode -val} | {ClusterEnable -val} | {PredFailPollInterval -val}
    | {BatWarnDsbl -val} | {EccBucketSize -val} | {EccBucketLeakRate -val}
    | {AbortCCOnError -val} | AlarmEnbl | AlarmDsbl | AlarmSilence
    | {SMARTCpyBkEnbl -val} | {SSDSMARTCpyBkEnbl -val} | NCQEnbl | NCQDsbl
    | {MaintainPdFailHistoryEnbl -val} | {RstrHotSpareOnInsert -val}
    | {DisableOCR -val} | {BootWithPinnedCache -val} | {enblPI -val} |{PreventPIImport -val}
    | AutoEnhancedImportEnbl | AutoEnhancedImportDsbl
    | {EnblSpinDownUnConfigDrvs -val}|{UseDiskActivityforLocate -val} -aN|-a0,1,2|-aALL
    | {ExposeEnclDevicesEnbl -val} | {SpinDownTime -val}
    | {SpinUpEncDrvCnt -val} | {SpinUpEncDelay -val} | {Perfmode -val} -aN|-a0,1,2|-aALL

I would try upping some of those 'rate' values.
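
For example, something like this should bump the reconstruction and rebuild priority to 60% (following the {ReconRate -val} form from the help output above; 60 is just an example value and the exact syntax may vary by MegaCli build, so verify on yours. Higher rates mean more impact on normal array I/O):

Code:
# raise the online capacity expansion / reconstruction priority
MegaCli -AdpSetProp ReconRate -60 -a0
# raise the rebuild priority as well
MegaCli -AdpSetProp RebuildRate -60 -a0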
 
So, this post was 10 days ago, and I just reached 45%... that's 16% in 10 days, so 1.6% per day xD
It takes so long o.o I just hope this ends well without any problems... *crosses fingers*
 
So guys, I've noticed something weird.
I keep track of my virtual drive volume (D:),
and its free space keeps decreasing. Two weeks ago it had around 250GB left, around 4 days ago it had 180GB left, and today it has 171GB left...
Nothing is open; only the expansion process is running. Does the expansion process use up the space and decrease it?
 