SMART errors not caused by bad drive.

chrcoluk

So, a bit of background story. It first started with this.

https://hardforum.com/threads/samsung-mlc-850-pro-about-to-die.1985642/

I made the above thread when my SATA Samsung 850 PRO started behaving oddly, including being unable to boot. After all the problems and the errors logged in SMART, I sent it to Samsung for RMA. They sent it back saying the drive was not defective.

I then tested it in other machines and sure enough it ran at full speed without any errors, but as soon as it was back in this machine (even with a new SATA cable) there were CRC errors and much lower read speeds. The Samsung 830 from my laptop was the same: errors in this PC, but error free in other devices. Only TLC devices with powerful ECC chips can run in this PC error free (on SATA); MLC SSDs are only error free when the SATA port is capped to SATA2 mode.

So what I ended up doing was getting an M.2 drive. My theory was that I had defective traces for the SATA ports on the board.

After about 3 months of use I noticed the drive had 5 "media and data integrity errors" logged in SMART, however there was no odd behaviour on the drive. I carried on using it.

Two weeks ago, after being lucky enough to get an RTX 3080 at MSRP, I started work on upgrading Windows as these GPUs require Windows 10 (yes, I have been on Windows 8 that long). As part of this process I was backing up my games partition on the M.2, as I was planning to make the Windows partition bigger than it is now.

Then the problems started: a CRC read error in Macrium Reflect. I retried it numerous times and got a CRC read error in the same spot every time. Every time this error was logged, two things happened in SMART.
1 - Media and data integrity errors increased.
2 - Available spare decreased.
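
For anyone wanting to track the same two counters between backup attempts, here is a minimal sketch. It assumes the SMART values have already been read by whatever tool you use and typed into plain dicts. The key names `media_errors` and `available_spare_pct` and the helper names are made up for illustration, and the numbers are invented, not readings from my drive.

```python
def smart_delta(before, after):
    """Return the change in every counter that appears in both snapshots."""
    return {key: after[key] - before[key] for key in before if key in after}


def matches_pattern(delta):
    """True when media errors went up and available spare went down."""
    return (delta.get("media_errors", 0) > 0
            and delta.get("available_spare_pct", 0) < 0)


# Invented example values (hypothetical key names, not real tool output).
before = {"media_errors": 5, "available_spare_pct": 100}
after = {"media_errors": 7, "available_spare_pct": 98}

delta = smart_delta(before, after)
print(delta, matches_pattern(delta))
```

Taking one snapshot before and one after each backup run makes it easy to see whether a particular run moved the counters, rather than eyeballing the totals afterwards.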

I ran chkdsk /r multiple times as Macrium suggested, and eventually I was able to do a backup without CRC read errors, but by the end of it available spare was down to a very low number and there were over 50 media and data integrity errors.

I then removed my CPU o/c, set RAM to stock, and tried another backup. Success again, but media and data integrity errors were still increasing during the backup.

At this point I removed the drive and did the Win10 upgrade on a spare MX500 (which is what I am currently running on). As it's a TLC drive there are no SMART errors.

So I thought, I don't want to send this in for another failed RMA, especially as now with Brexit Samsung have gone from free shipping to customer pays shipping, and the cheapest quote just to send it is almost £20, and that's uninsured.

I put it in my spare Ryzen rig and ran a Macrium backup. I even selected the option to back up empty sectors to force every sector to be read. I was praying for a media and data integrity error, but zero, a flat nothing. I did it 3 times.

To me the nightmare scenario is swapping out my board, and even that isn't a sure fix as I simply don't know what's going on here. I have always thought that if bad RAM is corrupting data, that would result either in silently written corruption or in system instability like BSODs or crashing apps. I never thought something like that could actually move SMART counters. However, given what happened with my 850 PRO, it seems a failure external to a drive can actually cause it, and this is now making me think the drive is fine and something in my main rig is causing this problem. But the question is what? I can get SMART errors with no overclocking, and I have stress tested the RAM like crazy. It seems I might have some kind of rare fail condition like a damaged SoC, DMI bus or something.

I welcome any input on this: insight into how SMART counters can increase and so forth, and whether it's still worth trying the RMA. I feel I can't trust a drive with SMART errors, and at the same time buying a new drive isn't a solution if the board is the problem.
 
I've seen defective / incompatible RAM cause things like this, as well as a bad PCH (chipset) or various motherboard-related issues.
 
Thanks. What I am going to try is simply swapping in the RAM from the Ryzen rig, since that's easy enough to do for diagnostics, and seeing if the behaviour persists, although it's a pain swapping the drive itself (I hate the finicky little M.2 screws).

I feel very uncertain about this, as the errors are always in the same spot, and it is odd that chkdsk /r, when done enough times, has since allowed the backup to finish (and since then I only get 1-2 errors per run instead of about 6-8). But at the same time I would expect the behaviour to persist in other machines.

I perhaps should have thought of doing the test in the other machine before running the chkdsk /r commands.

Do you think drivers or OS version could possibly have this impact? What I haven't done yet is test the M.2 drive on this machine in Windows 10, although a clean Win10 install when testing the 850 PRO didn't fix anything.
 
I have made some progress on this. I remembered the SSD problems were first noticed within months of upgrading my CPU, so I compared the voltages (took a photo of the BIOS screen on my phone), made some adjustments, and retested the RAM as well.

So here is a summary.

RAM swap didn't change any results.
Turned down system agent voltage, and reduced RAM clock speed.
Retested RAM on the new settings for several hours, still 0 errors. Also, since this CPU has gone in there have been no BSODs, no WHEA errors and no app crashes.
Tested the Samsung 830, and the errors are gone even with SATA3 gen selected in the BIOS.

I have yet to move the M.2 back in here and test it as I ran out of time; I've been a bit busy with other stuff as well the last 2 days.
 
Well, I just got two 250 GB Samsung 870 EVOs installed in two AMD computers. The 870 was released just a month ago and this is my first experience with SSDs. After installing the OS, running benchmarks and checking SMART, I noticed the CRC Error Count increased with each benchmark run. I did some lengthy reading and many say it's an incompatibility with AMD motherboard SATA controllers using ASMedia chips, and there may be others. The benchmark scores were within reason and I haven't experienced any problems like the freezes some have reported. I currently don't know how seriously I should take the CRC Error Count if everything else seems to work fine. I may get a SATA controller card and see if the problem goes away.
 
Well, you won't necessarily notice anything odd, but it might hit you at a later date when you try to read a file and the OS or app tells you that it can't be read. In nearly all my cases I only really noticed when something had gone wrong, e.g. with my 850 PRO my OS locked up with the I/O light permanently on and Task Manager reporting 100% usage on the SSD; on the reboot the SSD was not showing as a boot device, and then after a power cycle the BCD was corrupt.

On the 970 EVO I noticed whilst I was making a backup: Macrium reported a CRC read error and aborted.

Now the 970 EVO has been back in this machine for a couple of weeks or so and there are no new errors. However, my backup spindle now has a bunch of pending sectors, which started going up after Macrium failed to restore a backup. I am in the process of emptying the drive so I can do more diagnostics to see if the pending sectors go down or turn into reallocated sectors. I was feeling confident; this HDD issue may just be coincidence, but it has me paranoid again.
 
A little off the type of drives, but I've been having horrid luck with Seagate drives of late, the 8TB Compute and Archive drives. I'm getting a lot of the cycle recount errors, 98/100 in Crystal Disk and HWiNFO. I recently tried building a nice little NAS/server with 2 of these drives JBOD'ed together and started copying over my backup data from my current NAS. About 85% in, "source disk not available" kept cropping up until a reboot, then I could copy the rest of the data. I did chkdsk /f to check and it didn't find any issues.

I wonder if it's just me, but that seems to be an issue with every Seagate I've bought; I've had to RMA them like 2 weeks after getting them.
 
One could argue luck is catching up with me. I don't think I had any drive fail whilst in my main rig for over a decade before these issues; I have usually only seen drives fail after I retire them and later try to reuse them or demote them to a 2nd rig.

My situation is scary: the drive causing issues right now is my newest spindle, a 14 month old WD Red 4TB.
I checked the warranties of all of my WD Reds and some are approaching 7 years of age. The cost of replacing them all is a fair chunk of change though.

Personally I don't use Seagate. I don't think that's out of lack of trust, but rather I just stayed loyal to WD, as up until 18 months ago they had a perfect record for me. To be fair to them also, when I did my first ever RMA with them 18 months ago, the drive was out of warranty and they said no problem, so that was hard to fault; it was good service.
 
Same with Seagate. I had an external 8TB, used on my old Xbox One X, die. They could tell by the serial it was an external drive but still sent me an internal one, even out of warranty. Only problem is that's the drive that stopped working as of last week. It was used for maybe 25 mins to test if it worked, and it did. Then I built a 9900K system, put it together, and it was dead: no power, no life whatsoever.
 
I wasn't able to make the pending sectors change to reallocated; instead they went back to 0 on a full format. Subsequent chkdsk /r runs have yielded no errors. Very odd. The M.2 still has no new errors or change to spare sectors.
 
I had 2 RMAs with these drives. One came back, was tested, worked for the test, went back into its bubble-wrapped enclosure, then came out for a build dead. No power. lol, so here's hoping this replacement isn't like the last.
 