Possible drive failure on LSI card, also need to replace battery.

I have an IBM ServeRAID M5014 which was reflashed to LSI 9260-8i firmware. It currently has four WD Red drives in RAID 5. In the past few months I have been having some issues with it: massive slowdowns (we're talking speeds that floppy disks could outperform), loud disk access during the weekends, and random crashes.

And yes, I updated my backup just last night, which was lucky, since this morning the array started interfering with booting the system at all, despite not being a bootable drive. I had to disable the virtual drive from the card's BIOS just to get Windows to finish booting.

I narrowed down the crashes to a single video file. For some reason, trying to access roughly the last quarter of the file crashes the entire RAID card. The rest of the 8TB+ array can be read or written to just fine, but that one file cannot be touched.

And that's what led me to find out what happens over the weekend. The card is scheduled to perform a weekly consistency check every Saturday, and this seems to take forever, sometimes running well past Sunday! The massive slowdowns also occur during this time.

This is my first raid setup, but I am pretty sure it's not supposed to take that long. I was monitoring the status of the consistency check, and two sets of errors kept appearing:

http://i.imgur.com/JHew3i7.jpg

From what I understand, the "Unexpected Sense" message is normal and just a result of me using a SAS->SATA adapter cable, but I can't verify this. Googling just resulted in many other people on different forums asking the same thing, all with different results.

The one that most concerns me though is the other one about Correcting Medium Errors. I used to see that Unexpected Sense one all the time, but the media errors one is new. I also noticed that the "Media Error Count" for one of my drives (shown in the screenshot) is astronomical, while all the others are 0. Again, Googling this resulted in nothing more than some forum posts where people were asking "What is this for?"

I am assuming that one of my drives is failing (I plan to connect the drives separately and run CrystalDiskInfo to see if there are any errors), but I just wanted to make sure, since again, I have never set up a RAID before, and Googling any of this was fruitless.

Is this Unexpected Sense thing something to be worried about? Or is it nothing, or just a side effect of my SAS->SATA cable? What about the medium errors? I would assume those SHOULD be an issue, and likely the cause of that corrupt file, but my RAID card doesn't seem to consider them critical, or even worth logging as anything more important than the "information" level. How could accessing a corrupt file even crash the entire RAID card itself? Isn't the point of the card to keep the array going even if a physical drive fails? I could understand the virtual drive or the OS crashing when trying to read a corrupt file, but the RAID card itself? And would it even be safe to run a scandisk on a RAID? Especially a possibly corrupted one? I am worried that if I delete the file, it will just leave that corrupted area as empty space for another file to be written to, and then that new file will be corrupted.

Also, I have been getting errors that my battery has failed. Well, nothing really to misunderstand about that, although it's confusing that the battery is still listed as having a 98% charge even though it has apparently failed, and the card won't let me charge it any further or run a re-learn cycle. The battery is optional, right? (Especially since I also have a UPS for my computer.) But regardless of whether it's optional, where would one even get a replacement? It's not like they sell them at Best Buy or Fry's. Would eBay be my only option? Can I even trust the batteries I find there not to be knockoffs that could cause damage?

Speaking of which, it would have been nice to know the battery had failed a month ago. I just found out today, even though the card has apparently been logging it daily since April 16. Is there any way I can have it display critical errors like that in Windows? The only way I have found to get any information out of it is to boot and then log into the management software. It's not something I can just leave memory-resident in my system tray.
 
It's the URE (unrecoverable read error) you should be looking at, not the sense error. That drive is toast. Check the SMART data for it if you can, and order a replacement ASAP.
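If you want to spot this kind of thing without clicking through the management GUI, here is a minimal sketch that scans a saved per-drive text report and flags any drive with a nonzero media error count. The field labels ("Slot Number", "Media Error Count", and so on) are modeled on what LSI's command-line tools typically print for a physical drive list, but treat them as an assumption and check them against your own output. The sample text and its numbers are invented for illustration, and `parse_drives` and `suspect_drives` are hypothetical helper names.

```python
# Sample report text. The layout is an assumption and the numbers are made up.
SAMPLE = """\
Enclosure Device ID: 252
Slot Number: 0
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0

Enclosure Device ID: 252
Slot Number: 1
Media Error Count: 4127
Other Error Count: 3
Predictive Failure Count: 0
"""


def parse_drives(text):
    """Return one dict of counters per physical drive found in the report."""
    drives = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "label: value"
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            # A new slot number starts a new drive record.
            current = {"slot": int(value)}
            drives.append(current)
        elif current is not None and key.endswith("Count") and value.isdigit():
            current[key] = int(value)
    return drives


def suspect_drives(drives, threshold=0):
    """Slots whose media error count is above the threshold."""
    return [d["slot"] for d in drives if d.get("Media Error Count", 0) > threshold]


if __name__ == "__main__":
    for slot in suspect_drives(parse_drives(SAMPLE)):
        print(f"Slot {slot}: media errors reported, check SMART and plan a replacement")
```

Run on a schedule against a fresh report, something like this could at least write a warning to a file or send an email rather than leaving the errors buried in the card's log.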

New drive and rebuild should be enough but you could hose the array and create a new one with the replacement drive and restore from backup if you wanted.
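For anyone wondering what the rebuild actually does: RAID 5 keeps one parity block per stripe, which is the XOR of the data blocks in that stripe, so any single missing block can be recomputed from the survivors. A toy sketch of the idea follows. The block contents are made up, `xor_blocks` is a hypothetical helper, and a real controller also rotates parity across the drives and works on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a four-drive RAID 5: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] died. XOR of everything that survived
# (the other data blocks plus parity) gives back the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

The catch is that this only works if every surviving block in the stripe is readable. An unreadable sector on a second drive during the rebuild leaves that stripe unrecoverable, which is why having a current backup before starting matters.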
 
Are those Media Error Counts the same thing as an Unrecoverable Read Error? Because I can't find something specifically labeled as that.
 