RAID card alarm goes off overnight - new Seagate drive dying already?

crystalline

In the middle of the night last night, an alarm started blaring from my case and woke me up. Yeah, 9am IS the middle of the night when you go to sleep at 5, lol. I was startled/terrified/confused, the last thing I wanted to do was troubleshoot this piece of crap, and I desperately wanted to get back to sleep, so I just shut it down (properly, through Windows) and decided to worry about it in the morning.

Turns out the incident happened in the middle of a scheduled virus scan, and it generated a load of errors in Administrative Tools -> Event Viewer. The cause was either one of my drives failing or the RAID card itself somehow losing its connection to the drive. Drive failure seems more likely, but this drive is new and has only been in use for about 2 weeks. Why would Seagate give 5-year warranties on drives that die in less than a month...?

The drive is a 500GB Seagate ST3500630AS (http://www.newegg.com/Product/Product.asp?Item=N82E16822148136)

RAID card is HighPoint RocketRAID 2300 (http://www.newegg.com/Product/Product.asp?Item=N82E16816115029)

This is the message shown in the "HighPoint RAID Management Console":

3/3/2007 8:55:27 AM
Disk 'ST3500630AS' at Controller1-Channel4 failed.

Administrative Tools -> Event Viewer reports a bunch of errors, with event IDs 9, 12, 15, 51, and 57. Event 15 is logged 11 times, event 51 twice, and the rest once:

Event Type: Warning
Event Source: Disk
Event Category: None
Event ID: 51
Date: 3/3/2007
Time: 8:55:24 AM
User: N/A
Computer: ANONYMOUS
Description:
An error was detected on device \Device\Harddisk7\D during a paging operation.


Event Type: Error
Event Source: 2310_00
Event Category: None
Event ID: 9
Date: 3/3/2007
Time: 8:55:35 AM
User: N/A
Computer: ANONYMOUS
Description:
The device, \Device\Scsi\2310_001, did not respond within the timeout period.


Event Type: Error
Event Source: Disk
Event Category: None
Event ID: 15
Date: 3/3/2007
Time: 8:55:39 AM
User: N/A
Computer: ANONYMOUS
Description:
The device, \Device\Harddisk7\D, is not ready for access yet.


Event Type: Warning
Event Source: Ftdisk
Event Category: Disk
Event ID: 57
Date: 3/3/2007
Time: 8:55:39 AM
User: N/A
Computer: ANONYMOUS
Description:
The system failed to flush data to the transaction log. Corruption may occur.


Event Type: Error
Event Source: PlugPlayManager
Event Category: None
Event ID: 12
Date: 3/3/2007
Time: 8:55:39 AM
User: N/A
Computer: ANONYMOUS
Description:
The device 'HPT DISK 0_2 SCSI Disk Device' (SCSI\Disk&Ven_HPT&Prod_DISK_0_2&Rev_4.00\5&7345b61&0&020) disappeared from the system without first being prepared for removal.


Event Type: Information
Event Source: Application Popup
Event Category: None
Event ID: 26
Date: 3/3/2007
Time: 8:55:39 AM
User: N/A
Computer: ANONYMOUS
Description:
Application popup: Windows - Delayed Write Failed : Windows was unable to save all the data for the file M:\$Mft. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.

There were 4 of these application popups, each naming a different folder ("Delayed Write Failed: Windows was unable to save all the data for the file M:\.... The data has been lost"), but weirdly enough, I can now access all of the folders listed in the errors, even though "the data has been lost".

This is the report from Speedfan (screenshot not preserved):

The errors here seem troubling... I've run a few Western Digitals through the Speedfan utility, drives that had been used for 3+ years, and they were rated 100% across the board except for power-on hours.

Are these results typical of Seagate drives?

Should I ignore this incident, or RMA this drive?

Any other diagnostics I should try first?

thanks for any help...
 
It could be the way Newegg ships their hard drives. I had 2 of the 5 Maxtor MaXLine III hard drives in my RAID 5 fail in the first week. After I sent them back to Maxtor, I never had another problem. If you compare the way Maxtor ships its drives with the way Newegg ships theirs, you will understand why. I'm sure the Seagates are fine.

One other thing that can be a problem is the power supply. Make sure you have a good one. Good luck.
 
Seagate are, in my experience at least, some of the best drives there are.

I have had much better luck with Seagate vs. WD and others.

But there are always going to be bad ones in a batch, so just RMA the drive. Seagate is awesome for RMAs, and the replacement will be shipped to you packed very well, as DB stated (Maxtor is owned by Seagate, BTW).
 
It's probably nothing. The ES series is designed for RAID, and yours is not an ES. One thing about the ES series is ringing a little bell somewhere in my head: something about the amount of time the drive takes to report something like "I am busy, go away for a while." I don't remember the details.

An error was detected on device \Device\Harddisk7\D during a paging operation
The device, \Device\Scsi\2310_001, did not respond within the timeout period.
The device, \Device\Harddisk7\D, is not ready for access yet.
The rest of it is a chain reaction.
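One way to check the "chain reaction" reading is to put the records on a single timeline. Below is a minimal Python sketch, assuming the records were copied out of Event Viewer as plain text in the layout shown in the first post (the sample is abridged to the fields the script uses). `parse_records` and `timeline` are hypothetical helper names, not part of any Windows tool.

```python
import re
from collections import Counter
from datetime import datetime

# Four records from the first post, abridged to the fields used below.
SAMPLE = r"""
Event Type: Warning
Event Source: Disk
Event ID: 51
Date: 3/3/2007
Time: 8:55:24 AM
Description:
An error was detected on device \Device\Harddisk7\D during a paging operation.

Event Type: Error
Event Source: 2310_00
Event ID: 9
Date: 3/3/2007
Time: 8:55:35 AM
Description:
The device, \Device\Scsi\2310_001, did not respond within the timeout period.

Event Type: Error
Event Source: Disk
Event ID: 15
Date: 3/3/2007
Time: 8:55:39 AM
Description:
The device, \Device\Harddisk7\D, is not ready for access yet.

Event Type: Error
Event Source: PlugPlayManager
Event ID: 12
Date: 3/3/2007
Time: 8:55:39 AM
Description:
The device 'HPT DISK 0_2 SCSI Disk Device' disappeared from the system without first being prepared for removal.
"""

# "Key: value" header lines such as "Event ID: 51" or "Time: 8:55:24 AM".
HEADER = re.compile(r"^([A-Za-z ]+):\s*(.+)$")


def parse_records(text):
    """Split the dump on blank lines and return records sorted by timestamp."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rec = {}
        lines = [line.strip() for line in block.splitlines()]
        for i, line in enumerate(lines):
            if line == "Description:":
                # Everything after the Description: line is free text.
                rec["Description"] = " ".join(lines[i + 1:])
                break
            m = HEADER.match(line)
            if m:
                rec[m.group(1)] = m.group(2)
        if "Event ID" in rec:
            rec["when"] = datetime.strptime(
                f"{rec['Date']} {rec['Time']}", "%m/%d/%Y %I:%M:%S %p")
            records.append(rec)
    # sorted() is stable, so events logged in the same second keep their order.
    return sorted(records, key=lambda r: r["when"])


def timeline(records):
    """Return (seconds since first event, event ID, source) tuples."""
    start = records[0]["when"]
    return [(int((r["when"] - start).total_seconds()), r["Event ID"], r["Event Source"])
            for r in records]


if __name__ == "__main__":
    recs = parse_records(SAMPLE)
    for secs, event_id, source in timeline(recs):
        print(f"+{secs:>2}s  ID {event_id:>2}  {source}")
    print(Counter(r["Event ID"] for r in recs))
```

On these four records, the paging error (ID 51) lands at +0s, the controller timeout (ID 9) at +11s, and the Disk 15 and PlugPlayManager 12 errors at +15s.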


What I am thinking is that your virus scan had the drive so busy that when the OS wanted to write to the page file or something like that, the drive reported itself as busy too quickly, or not quickly enough, or the planets were not in proper alignment, and the controller card freaked out, thinking the drive was gone when it was just very busy.

Your SMART info looks good, so I don't think it's a drive failure. I think it's a timing thing. I am pretty sure about that.

Hopefully a super RAID guru will come along and comment. I might be full of crap, but I swear I have seen this before somewhere, and it's not a huge deal.

It does show why the extra $10 for the ES drives is worth it :p


If possible, try lowering the priority of your virus scanner; the setting might be hidden somewhere in its options.
 
According to the SR drive reliability survey, the 7200.9 is in the 19th percentile (better than 19% of the models in the database), while the 7200.7 is in the 89th percentile. I am not sure we can say with certainty that Seagate drives are better than others in general.
 

The drive in question is actually a 7200.10... I had considered going with the ES series, but the price spread on Newegg was much bigger in November when I ordered. As I remember, the 7200.9s were around $160, the 7200.10s were $200, and the ES series were like $250.

BTW, the drive is not currently running in an actual RAID config; right now the RAID card is just being used for the additional SATA ports...

What makes you say that the SMART info looks good? The WD Raptors and a few older IDE Western Digitals I've tested on the same system all display 100% perfect ratings for all attributes, but this drive shows significant occurrences of "raw read error rate", "seek error rate", and "hardware ECC recovered"... Are these kinds of results typical for Seagate drives? Why the difference between them and Western Digital in SMART readings?
 

I say the SMART data looks good in the sense that it does not indicate any impending failure, which was the basis for your thread title.

I have no way of knowing how long, and under what conditions, those other drives were in service; SMART data is cumulative.

Replace it if you want. I was not, and am not, interested in a discussion of "whose drive is best". I still think it's a timing issue between the RAID card and the drive.

here:

When a drive is under a continuous I/O load and performs its own error recovery, it can easily exceed 8 seconds. During that time, the normal desktop hard drive does not respond. RAID cards will typically wait 8 seconds for a drive to respond, and if the drive does not respond, RAID cards are programmed to take action. The “mis-coordination” of error handling between hard drives and RAID cards occurs when desktop drives are programmed to take responsibility for all error recovery, while RAID cards are also programmed to take responsibility for error recovery.

from
http://www.techworld.com/storage/features/index.cfm?featureid=1019
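The mismatch the article describes can be shown with a toy model. This Python sketch assumes the 8-second controller wait from the quote and made-up drive recovery times; `outcome` and `drive_cap_s` are hypothetical names, not anything from HighPoint's driver or Seagate's firmware. "Capped" stands for a drive that gives up on internal recovery early and reports the error instead of going silent, which may be the ES behaviour half-remembered earlier in the thread.

```python
# Toy model of the drive/controller timeout mismatch from the quoted article.
# Assumed numbers: 8 s controller wait (from the article); recovery times are made up.
CONTROLLER_TIMEOUT_S = 8.0


def outcome(recovery_s, timeout_s=CONTROLLER_TIMEOUT_S, drive_cap_s=None):
    """What the controller sees for one command that hits a bad spot.

    recovery_s:  seconds the drive would spend on its own error recovery.
    drive_cap_s: if set, the drive stops recovering after this long and
                 reports the error (hypothetical RAID-oriented behaviour).
    """
    # How long the drive stays silent before answering the controller.
    silent_s = recovery_s if drive_cap_s is None else min(recovery_s, drive_cap_s)
    if silent_s > timeout_s:
        # Controller gives up waiting and marks the disk as failed.
        return "drive dropped"
    if drive_cap_s is not None and recovery_s > drive_cap_s:
        # Controller gets a timely error instead of silence; the disk stays online.
        return "error reported"
    return "ok"


if __name__ == "__main__":
    for secs in (2, 6, 12, 20):
        desktop = outcome(secs)
        capped = outcome(secs, drive_cap_s=7.0)
        print(f"recovery {secs:>2}s  desktop: {desktop:<14} capped: {capped}")
```

With these assumptions, a desktop-style drive that spends 12 or 20 seconds recovering gets dropped, while the capped drive answers with an error at 7 seconds and stays online.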

Search for "raid timeout".

Or replace it, I don't give a crap.
 