SMART: really?

XTF

I've got two Samsung 1 TB (HD103UJ) drives. One is going bad: it has bad sectors, and SeaTools says the drive self test is failing. SMART, however, is saying nothing. Why doesn't it say anything at bootup?
 
I've worked on MANY computers with failing hard drives, and I've only seen 2 or 3 get reported on boot up, even with SMART monitoring turned on in the BIOS. Usually you have to run a tool on them like HDTune or CrystalDiskInfo. Sometimes you will get messages in the Windows event log saying that device such and such has a bad block.
 
Windows should really report bad blocks better than hiding them in some log.
 
Because SMART pass/fail generally has such a high threshold that by the time you get a fail, the drive is probably almost unusable. At work, where I have RMA'd 75+ hard drives over the last 10 or so years, only 1 of them had a SMART fail status, and that happened late last year. What I do instead is look at the raw values of the individual SMART attributes and understand which attributes are the most important. For example, if I see a drive with the raw "Reallocated Sectors Count" at a low number (less than 20), I am usually not very concerned. This could be just an isolated media defect. However, if this number grows by the day or week, then I watch the drive very closely. One of the heads could be going bad. In every single case where this growth has begun, the drive has been totally unreadable shortly after, usually in less than 3 weeks. Other things to look at are

"Current Pending Sector Count" and "Uncorrectable Sector Count"

These should be at 0. However, I have had new drives that had low nonzero (1 to 4) values for these, and after a full write of the disk the values went back to 0.
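
For anyone who wants the rules of thumb above in a form they can script, here is a minimal Python sketch. The "less than 20" limit is just my own comfort level from above, not a vendor spec, and the names `SmartRaw` and `triage` are made up for illustration. It only looks at one snapshot; whether the number is growing matters more than the number itself.

```python
from dataclasses import dataclass
from typing import List

# Rule-of-thumb limit taken from the post above; an assumption, not a vendor spec.
LOW_REALLOCATED_LIMIT = 20


@dataclass
class SmartRaw:
    """Raw values of the three attributes discussed above (hypothetical container)."""
    reallocated: int    # raw "Reallocated Sectors Count"
    pending: int        # raw "Current Pending Sector Count"
    uncorrectable: int  # raw "Uncorrectable Sector Count"


def triage(raw: SmartRaw) -> List[str]:
    """Return a list of notes for one snapshot; an empty list means nothing stands out."""
    notes = []
    if 0 < raw.reallocated < LOW_REALLOCATED_LIMIT:
        notes.append("reallocated: low, possibly an isolated media defect; recheck later")
    elif raw.reallocated >= LOW_REALLOCATED_LIMIT:
        notes.append("reallocated: high, watch closely for growth")
    if raw.pending > 0:
        notes.append("pending: nonzero, should be 0; may clear after a full write")
    if raw.uncorrectable > 0:
        notes.append("uncorrectable: nonzero, should be 0; may clear after a full write")
    return notes


if __name__ == "__main__":
    for note in triage(SmartRaw(reallocated=5, pending=2, uncorrectable=0)):
        print(note)
```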

http://en.wikipedia.org/wiki/S.M.A.R.T.

If you do not want to learn this, CrystalDiskInfo, a free program for Windows, can tell you if things do not look good in the SMART data.
 
Hmmm, interesting. I have 6 600GB WD Blue drives in a 3x2 raid10 zfs array. 5 drives have no pending sectors or bad sectors or anything at all. One has 6 pending sectors and 11 uncorrectable sectors, but zero reallocated sectors. Not a new drive (several years old, in fact). Should I be concerned?
 
Pending and Uncorrectable sectors are sectors that had data on them but that the drive was unable to read the last time a read was attempted. If you have not yet written to every sector on the drive, this could be left over from the factory initialization, which is the same case I was talking about that I was able to fix with a full drive write (same manufacturer as well - WDC Black in my case). However, if you did write a sector and the drive cannot read it any more, it could be a problem with the heads or possibly just a media defect. Do you scrub your zfs?


Edit: I know I was a little short on this. I am just about to leave for work. I will try to add some later.
 
How are you reading SMART values in a RAID? I have tried on my server, but the only thing I can get is the Dell OpenManage Server Administrator. It's not SMART; it just tells me if it's good or bad.

I have another computer with a Promise SATA controller with no RAID. Just 4 SATA disks, and I can't even read SMART values on that.

Learn me something!
Thanks
 
I use mostly Linux software RAID. However, some RAID cards do expose SMART via their own program (I have seen that with 3ware cards). I also believe some work with smartmontools (I have not tried that).
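
As a rough idea of what the smartmontools route looks like, here is a small Python sketch that builds a `smartctl -A` command line, optionally with the `-d TYPE,PORT` option smartmontools uses to reach a disk behind a controller (for example `-d 3ware,0`). The device paths and helper names are assumptions for illustration, and I have not tested this against a real RAID card; which device type your controller needs, if it is supported at all, is in the smartctl man page.

```python
import subprocess


def smartctl_cmd(device, dev_type=None, port=None):
    """Build the argv for 'smartctl -A', optionally with '-d TYPE[,PORT]'."""
    cmd = ["smartctl", "-A"]
    if dev_type is not None:
        spec = dev_type if port is None else f"{dev_type},{port}"
        cmd += ["-d", spec]
    cmd.append(device)
    return cmd


def read_attributes(device, dev_type=None, port=None):
    """Run smartctl and return its text output (needs smartmontools and root).

    smartctl's exit status is a bitmask and can be nonzero even when the
    attribute table printed fine, so the return code is not treated as fatal.
    """
    result = subprocess.run(
        smartctl_cmd(device, dev_type, port),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical examples: a plain disk, and port 0 behind a 3ware card.
    print(" ".join(smartctl_cmd("/dev/sda")))
    print(" ".join(smartctl_cmd("/dev/twa0", "3ware", 0)))
```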
 
SpinRite 6.0 is fantastic at scanning and finding problems before they become serious.
 

I scrub my data pool weekly and zero errors have shown up. On the other hand, the pool is not even half full, so possibly the sectors in question have never been used. I am thinking of pulling that drive (all 6 are in hot plug trays), moving it to my win7 workstation and checking with CrystalDiskInfo, and/or doing a complete block by block write.
 
Because SMART pass/fail generally has such a high threshold that by the time you get a fail, the drive is probably almost unusable.
Is it the BIOS or the drive that decides the threshold?
 
I think if CDI shows me anything questionable about the drive, I'll do a complete block-level overwrite, re-run CDI and see what it shows. If all looks good, re-insert in the disk enclosure and let the pool resilver.
 
Well, time for a new disk, methinks. Zeroed it out with 'dd' using sysrescuecd. Rebooted win7 and reran CDI. The uncorrectable and pending went away. The bad news: the reallocated sector count is 140!
 
No way. Drive is 3+ years old :( And with disk prices thru the roof, I have to bite the bullet and get something that size or greater...
 
Shop around locally and see if there are any external drives that for some reason didn't get price hiked...Walmart can be good for this sometimes. Just pulled a 1TB Seagate out of a FreeAgent Desk 3.5"; it was USB 2.0/3.0, model is 7200.12 - pretty fast when not used externally. They had it for $79.99 on clearance. The normal price for it was $139.99...lol.
 
Try another dd and see if the number grows again. There is a possibility that there are one or more isolated media defects. I have had a few drives that have 100 or so reallocated sectors, but then the count does not grow and the drive has run for years after that. However, if you run it again (and again) and it keeps growing, this is a sign of a head problem. Most of the drives I RMA exhibit this last behavior.
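
If you note down the raw reallocated count after each pass, the "does it keep growing" check can be sketched like this in Python. The function name and the choice to look at only the last three readings are my assumptions; the point is that a count that jumped once and then sat still reads as stable, while one that is still climbing does not.

```python
def reallocation_trend(readings, recent=3):
    """Classify raw reallocated-sector counts, oldest first (e.g. one per dd pass).

    Only the last 'recent' readings are compared, so an old one-time jump
    that has since stopped growing is reported as stable.
    """
    if len(readings) < 2:
        return "unknown"  # a single reading says nothing about growth
    tail = readings[-recent:]
    if tail[-1] > tail[0]:
        return "growing"
    return "stable"


if __name__ == "__main__":
    print(reallocation_trend([0, 100, 100, 100]))  # jumped once, then stopped
    print(reallocation_trend([100, 140, 190]))     # still climbing
```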
 
Weird, rebooting, it's now (in opensolaris) showing no remaps, but UEC is now 11 again. I wonder if OS is reading the smart stuff wrong or something? My concern is this is a production server (albeit home), and this destructive test is a couple of hours. Might need to anyway tho...
 
Well, I feel better. Pulled the drive and plugged it back into the win7 box. Re-ran CDI. I was reading the wrong column. The 140 was the reallocation threshold - the actual value was zero. So the drive looks to still be okay...
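
Since the mix-up here was between the threshold column and the raw value, here is a small Python sketch that splits one row of a `smartctl -A` attribute table into named fields so the two cannot be confused. The column order (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) is smartctl's; the sample row is made up for illustration and is not output from the drive in this thread. Per the smartctl man page, an attribute counts as failed when its normalized VALUE is less than or equal to THRESH, so the threshold is compared against VALUE, never against the raw count.

```python
from typing import NamedTuple, Optional


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded
    thresh: int  # failure threshold for the normalized value
    raw: str     # raw value, e.g. an actual sector count


def parse_attr_line(line: str) -> Optional[SmartAttr]:
    """Parse one attribute row of 'smartctl -A' output; return None for other lines."""
    parts = line.split()
    # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
    if len(parts) < 10 or not parts[0].isdigit():
        return None
    try:
        return SmartAttr(
            attr_id=int(parts[0]),
            name=parts[1],
            value=int(parts[3]),
            worst=int(parts[4]),
            thresh=int(parts[5]),
            raw=" ".join(parts[9:]),
        )
    except ValueError:
        return None  # non-numeric VALUE/WORST/THRESH field


# Made-up sample row for illustration only.
SAMPLE = "  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0"

if __name__ == "__main__":
    attr = parse_attr_line(SAMPLE)
    print(attr.name, "threshold:", attr.thresh, "raw:", attr.raw)
    print("failed:", attr.value <= attr.thresh)
```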
 