My RAID card keeps crashing lately, can't even do a backup.

I have an LSI MegaRAID 9260-8i RAID card. It was originally an IBM ServeRAID M5014, but since those are just rebranded 9260-8i cards, I reflashed it with LSI firmware. On the LSI firmware it took FAR less time to finish POST, and it was updated more frequently to boot.

I then set up a RAID 5 array using four 3TB WD Red hard drives, connected through a SAS-to-4×SATA breakout cable. I admit, though, that I am not very knowledgeable about the lower-level workings of RAID setups.

Anyway, the card was running fine for years, but lately it's been crashing on me.

I noticed these crashes started when I upgraded to firmware 12.15.0-0189, although that may just be a coincidence. It wasn't until after the upgrade that I noticed a warning that this firmware has issues with expander backplanes, advising anyone who uses one not to upgrade and to wait for a later firmware that addresses these issues.

Other than the BBU and the SAS-to-SATA cable, there is no additional hardware attached to this card, and the card is simply in a desktop system, not any kind of server or NAS. However, when I use MegaRAID Storage Manager to check the status of the card (it is always listed as optimal and healthy), it lists my drives under a backplane. I admit I have no idea what a backplane even is (Googling it leads me to believe it refers to hot-swap drive bays in servers), or whether the SAS-to-SATA cable is what it considers a backplane.

Anyway, lately the controller keeps crashing when the RAID array is under heavy use, with errors like these:

Controller ID: 0 Fatal firmware error:
Driver detected possible FW hang, halting FW.

Controller ID: 0 Fatal firmware error:
Line 1209 in ../../raid/1078dma.c

Controller ID: 0 Fatal firmware error:
Line 1209 in ../../raid/1078dma.c

Then the array stays down until I reboot the system. It can go for weeks without a problem, but whenever I use it heavily, it crashes. Whether it's one very large read operation (such as a backup of the array) or a very large number of small random reads and writes, it seems to "overload" and crash under heavy load, yet when it isn't under heavy load it works fine for weeks.

This means I can't even back up my array, because after a few hours it WILL crash. (I will have to see whether I can make the backup app read from the array at a slower speed. For the moment, just so I have SOME backup, I am literally dragging and dropping files in Windows from the array onto the backup drive instead of making a backup image of the drive.)
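The idea of having the backup read from the array at a slower speed can be sketched as a rate-limited file copy. This is a minimal Python sketch under stated assumptions, not a feature of any particular backup app: `throttled_copy`, `throttled_copytree`, and the default 20 MiB/s cap are hypothetical choices, and whether a lower read rate actually avoids this firmware hang is untested.

```python
import os
import shutil
import time


def throttled_copy(src: str, dst: str,
                   max_bytes_per_sec: int = 20 * 1024 * 1024,  # hypothetical cap
                   chunk_size: int = 1024 * 1024) -> int:
    """Copy src to dst in chunks, sleeping as needed so the average rate for
    this file stays at or below max_bytes_per_sec. Returns bytes copied."""
    copied = 0
    start = time.monotonic()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            # Earliest moment at which `copied` bytes are allowed at the target rate.
            earliest = start + copied / max_bytes_per_sec
            delay = earliest - time.monotonic()
            if delay > 0:
                time.sleep(delay)
    # Preserve timestamps and permission bits, as shutil.copy2 would.
    shutil.copystat(src, dst)
    return copied


def throttled_copytree(src_root: str, dst_root: str, **kwargs) -> int:
    """Walk src_root and copy every file into the same relative path under
    dst_root using throttled_copy. Returns total bytes copied."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            total += throttled_copy(os.path.join(dirpath, name),
                                    os.path.join(target_dir, name), **kwargs)
    return total
```

For example, `throttled_copytree("D:/", "E:/backup", max_bytes_per_sec=10 * 1024 * 1024)` would copy a hypothetical array volume `D:` onto a backup drive `E:` at roughly 10 MiB/s per file. The pacing is computed against the cumulative byte count since the file started, rather than as a fixed sleep per chunk, so time already spent reading and writing counts toward the budget.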

A firmware update, version 12.15.0-0205, was released about two weeks ago, but it didn't seem to fix the issue: I get the same hangs when attempting a backup. On top of that, the card now takes much, much longer to finish POST, as if it were an IBM card again.

As for trying to downgrade my firmware to see if that fixes it, there are a few issues with that.

First of all, as I said, I am not very knowledgeable about the lower-level workings of RAID setups, so I don't know whether a downgrade could wipe my array. I know upgrades don't, but I am not sure about downgrades (mainly because newer firmware may have updated the configuration data), so I don't want to try that until I have at least finished copying my files to the backup drive.

Second, I can't find any information on the largest single drive (a physical drive, not the array) supported by any firmware version, or by the card itself. My current 3TB drives work fine, but I want to upgrade to 6TB or larger drives in a year or two, and if firmware older than 12.15.0-0189 doesn't support drives that big, I am stuck.

Finally, LSI now seems to host only the latest versions of the drivers, software, and firmware on the main product page, referring you to another page for previous versions... which is down.

Does anyone have any ideas or suggestions? Does anyone else happen to have this card and run into these errors? Does this look like a fault in the hard drives? Or the card? Or is neither likely at fault, and it's a firmware issue?
 
Any bad caps on the card? I just had a customer whose Dell SAS5iR card went bad with weird errors due to a bulging cap.
 
Well, I just ran into something interesting while using Windows to copy the files onto a backup drive.

A single video file on the entire array is what's causing my card to crash. I can access the rest of the array just fine, but trying to access that one file instantly crashes the card, whether I have just rebooted and accessed only that file, or it's been weeks of heavy read/write activity before I hit it.

This doesn't make sense to me. If the file were corrupt or a disk were dying, I would understand, but then the card would just warn me about it. Why would the RAID card's own firmware crash when accessing a specific file?
 