Cyber Akuma
Gawd
- Joined
- Jan 3, 2009
- Messages
- 645
I have an LSI MegaRAID 9260-8i RAID card. It was originally an IBM ServeRAID M5014 card, but since those are just re-branded 9260-8i cards I reflashed it, and it has been working fine for a few years. I run Windows 7 Professional 64-bit on the system.
It has four 3TB Western Digital RED drives connected to it using a SAS->SATA cable in a RAID5 configuration.
Lately I have been having many issues with the RAID controller itself crashing. The error logs keep mentioning that the firmware "detected a possible hang", or that it crashed and rebooted. Originally I thought this was a firmware issue, since there was a warning about a recent update causing problems with backplanes (which I am not using, unless it sees the SAS to SATA cable as one).
I had posted about it previously here: http://hardforum.com/showthread.php?t=1840958
However, after much trial and error attempting to back up my data, I found the source of the crash... but I have no idea why this could make the controller crash or what to do to fix it.
Out of the hundreds of folders and hundreds of thousands of files spread across the 8TB array, it is a single file that is causing this. I can access the entire rest of the RAID5 array indefinitely with no problems, but attempting to read around the 80% point of that single file causes the card itself to crash!
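To pin down exactly where in the file the read fails, rather than "around 80%", one option is to read it sequentially in fixed-size chunks and record the last offset that came back cleanly. The following is only a rough sketch, not something from the controller's tooling: the function name `read_until_failure`, the chunk size, and the log format are all made up for illustration, and the log file is assumed to live on a drive that is NOT on the RAID5 array, so it survives if the card goes down mid-read.

```python
import os

def read_until_failure(path, log_path, chunk_size=1024 * 1024):
    """Read `path` sequentially and log the offset after each good chunk.

    Hypothetical helper: the log is flushed and fsynced after every chunk,
    so if the machine or controller hangs, the last line of the log shows
    the last byte offset that was read successfully.
    Returns the number of bytes read before EOF or a read error.
    """
    offset = 0
    # buffering=0 gives an unbuffered raw file, so Python itself does no read-ahead.
    with open(path, "rb", buffering=0) as src, open(log_path, "w") as log:
        while True:
            try:
                chunk = src.read(chunk_size)
            except OSError as exc:
                # A clean I/O error from the OS (as opposed to a controller hang).
                log.write(f"read error at offset {offset}: {exc}\n")
                log.flush()
                os.fsync(log.fileno())
                break
            if not chunk:
                break  # end of file reached without problems
            offset += len(chunk)
            log.write(f"{offset}\n")
            log.flush()
            os.fsync(log.fileno())  # force the progress line to disk
    return offset

# Example call (both paths are placeholders):
# read_until_failure(r"D:\path\to\problem_file", r"C:\temp\progress.log")
```

Note that this deliberately triggers the same read that crashes the card, so it is only useful as a one-off to get a byte offset to compare against whatever the controller's own event log reports for that moment.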
This makes no sense to me. Isn't the whole point of a redundant disk setup and a dedicated controller card that it can cope even if an entire drive fails, and warn you so you can replace it (assuming you aren't running RAID0)? So why would a single FILE, not even a bad disk, cause the card itself to crash? If the filesystem itself is corrupted, that should cause Windows to hit a read error, or possibly crash, not the card, right? And if it's a hardware issue with the physical drive, then the RAID card should notice the read error and report it, not crash, shouldn't it? I know the issue isn't limited to Windows either, since creating a backup image from an Acronis boot disk caused the card to crash when it got to that point as well.
I have no idea what to do. I really don't care if I have to delete the file, since it's nothing important. But I am worried that even deleting the file would cause the card to crash again, or that if the problem is not the file but that particular area of one disk, deleting the file will just bring the problem back when a new file is written to that area. I also don't know whether it would be wise to run chkdsk on the array, or whether the card would still crash when chkdsk gets to that area of the RAID5. If the controller goes down mid-scan, chkdsk might assume it found a million errors and try to fix them, corrupting tons of stuff in the process. That is, if the cause even is the physical location of that file and not somehow the file itself.
Any suggestions? Would my card itself have any type of diagnostic or self-checking tools for this? Any idea what I can try to do to figure out why this is happening or try to fix it?