CRC Error Count

Sepe

n00b
Joined
Jul 1, 2010
Messages
57
Hello, I checked my 3 TB WD Black HDD and I see this message: C7 Interface CRC Error Count. Is it bad? What do I have to do? Thanks.
 

_CiPHER_

Weaksauce
Joined
Sep 30, 2015
Messages
74
Those are cabling errors.

It is normal to 'see' this attribute in the SMART output; post the SMART output here to get relevant replies. The raw value is the one to look at. If it is above 0, then you had cable errors in the past.

It does not mean you still have cabling errors, though. That is only the case if the value keeps increasing. So keep an eye on the raw value and save a copy of your SMART output periodically.
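
The "compare periodic copies" advice can be sketched in a few lines of Python. This is only a rough illustration: it assumes the copies were saved as text in the attribute-table layout printed by `smartctl -A` (ten columns, raw value last), and the function names `crc_raw_value` and `crc_trend` are made up for this example. Attribute C7 (hex) is 199 in decimal.

```python
def crc_raw_value(smart_text, attr_id=199):
    """Return the raw value of one attribute from saved `smartctl -A` text.

    Assumes the usual 10-column attribute table, where the first column
    is the attribute ID and the tenth column is RAW_VALUE.
    """
    for line in smart_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == str(attr_id):
            return int(fields[9])
    return None  # attribute not present in this snapshot


def crc_trend(old_text, new_text):
    """Compare two snapshots and describe what the CRC counter did."""
    old = crc_raw_value(old_text)
    new = crc_raw_value(new_text)
    if old is None or new is None:
        return "attribute not found"
    if new > old:
        return f"still increasing (+{new - old}): check the cable"
    return "stable: errors happened in the past only"


if __name__ == "__main__":
    # Two hypothetical snapshots taken some days apart.
    before = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 12"
    after = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 15"
    print(crc_trend(before, after))
```

A non-zero but unchanged value only says there were errors at some point; it is the difference between snapshots that matters.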
 

SomeGuy133

2[H]4U
Joined
Apr 12, 2015
Messages
3,447
How does that affect USB drives? I get that all the time on my USB drive and I don't know what to do. Is it the cable or the hub?
 

staticlag

[H]ard|Gawd
Joined
Mar 26, 2010
Messages
1,679
A CRC error indicates interference somewhere: it could be the cable, the power supply, nearby fluorescent lights, static electricity...
 

staticlag

[H]ard|Gawd
Joined
Mar 26, 2010
Messages
1,679
now it is but in the past it wasn't

The CRC counter never goes back down; it only increments, AFAIK. Most likely you have to update your program's drive database. It's got to be a software error. Or it's the OS or BIOS doing power-saving stuff to the USB link. Either way, the drive looks perfectly fine.
 

SomeGuy133

2[H]4U
Joined
Apr 12, 2015
Messages
3,447
OK. I thought that number goes up while an issue is present but goes back to zero once the issue is gone, but who knows.
 

patrickdk

Gawd
Joined
Jan 3, 2012
Messages
744
Those are not cable errors, or even CRC errors.

Those are blocks of data read from the disk that did not pass the validity checks, but the ECC information *corrected* them without any problem.

This could happen for any number of reasons: the disk was bumped while it was writing, corrupted data was written (but only a little of the data was corrupted), a bad spot on the disk, ...

On some drives this is always rather high; on most drives it is low. It just depends, and it's not really something I would worry about.

(Oh, just noticed, two issues in this thread)

The first issue, CRC errors, yeah, that is a cable problem. Try unplugging and replugging the cable, or maybe change the cable. It doesn't cause any real harm, except that if it's bad enough it will slow the transfer speed due to retry attempts.
 

michalrz

2[H]4U
Joined
Jun 4, 2012
Messages
3,738
You need to watch that one. It's a connectivity issue. It used to happen to me all the time back when SATA cables didn't have latches. Even now it's still a huge issue because after latching there's still enough wiggle room to cause UDMA CRC Errors to go up.
It has to be addressed ASAP because I had plenty of drives interpret those as media errors, and healthy sectors would get reallocated every few days.
 

SomeGuy133

2[H]4U
Joined
Apr 12, 2015
Messages
3,447
Weird. I have no reallocations, and it's an MLC USB 3.0 flash drive in that link. So is it a bad USB cable or a bad hub?
 

michalrz

2[H]4U
Joined
Jun 4, 2012
Messages
3,738
Well, it depends on the case, but if you see 'current pending sectors', 'reallocation events' or 'offline uncorrectable' rise from 0, then expect instability. Overall, if you repair the connection, perform a full drive scan and then shut down the computer, it should clear the 'media related' errors back to 0, because the full scan gives the drive an opportunity for a 'second look' to find that the sector is good and the error was interface related.
In your case I would say it could be either. Start by getting something like HDD Guardian: it sits in your tray and monitors changes in SMART params. Set it to report UDMA CRC and you're set.
Remove excess/unneeded USB devices to free up power in the hub for the time being.
If that doesn't help, change the USB cable.
 

SomeGuy133

2[H]4U
Joined
Apr 12, 2015
Messages
3,447
I have Sentinel; did you not even look at the picture? -_- Alright, I am ignoring everything you said since you didn't even bother to look at the picture -_-
 

patrickdk

Gawd
Joined
Jan 3, 2012
Messages
744
ECC corrected won't cause pending or offline uncorrectable to go up. They are different.

ECC corrected means the disk read fine, but some data was damaged.

Offline uncorrectable is a sector it has had to retry reading many, many times, and that is too damaged for ECC to fix.

Current pending sectors is an offline uncorrectable that has been marked as damaged beyond repair and recovery attempts, and is waiting for a write so that it can be reallocated.
 

_CiPHER_

Weaksauce
Joined
Sep 30, 2015
Messages
74
patrickdk, i think that is not really correct.

In my experience:

ECC corrected means ECC error correction was necessary to read the sector properly (i.e. without bits being flipped/corrupted). Not all reads require ECC, but many reads do, since hard drives with high data density have long passed the point of being able to do their job without ECC. ECC is mandatory today, unlike DRAM for example, which can do without; there ECC is just a plus to increase reliability to levels enterprise users feel comfortable with.

Current Pending Sector counts sectors where, after applying ECC, the data is still no good. If there were more ECC correcting bits available, the data might be reconstructed properly, but the available ECC capability is not enough and the entire sector is rejected. The ATA standard prohibits the drive from returning data to the host which is known to be corrupt. So the drive will return an I/O error instead after it gives up sector recovery. But it will not forfeit the ability to recover the data - meaning it will not do anything like overwrite or replace that sector - unless either 1) the data is successfully read (rare) or 2) the data is overwritten by the host. Upon 2) the disk will try to read the sector again. If it reads back normal, the sector stays in use and is NOT swapped with a reserve sector (sector reallocation). Only if the sector still cannot be read after overwriting does it count as physically damaged and get replaced by the drive - i.e. swapped for a reserve sector.

In the case that the sector is reallocated, the Reallocated Sector Count is increased. But about 90% of bad sectors today are only unreadable due to limited ECC available - known as uBER bad sectors. This is specified by the drive as uncorrectable Bit-Error Rate (uBER) and most consumer drives get 10^-14, which means up to 1 bad sector per day at 100% duty cycle, or one per 4-6 months with light consumer usage. This concerns healthy sectors - not sectors with physical damage. The drive is basically designed to cause unreadable sectors. Enterprise drives get up to 10^-16 uBER specification, meaning this problem is up to 100 times smaller, causing a factor of 100 fewer bad sectors. Again, this concerns bad sectors that are NOT physically damaged. And uBER can fluctuate wildly with specific drives - it is only roughly valid for very large groups of disks.
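
To make the "1 bad sector per day" arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The 10^-14 figure is the spec value quoted above; the sustained read rate of 150 MB/s is purely an assumption for illustration and is not from any datasheet, and the function names are made up.

```python
UBER = 1e-14                 # unrecoverable read errors per bit read (spec figure)
THROUGHPUT_BYTES_S = 150e6   # ASSUMED sustained read rate: 150 MB/s
SECONDS_PER_DAY = 86_400


def expected_unreadable_per_day(duty_cycle):
    """Expected number of uBER read errors per day at a given duty cycle (0..1)."""
    bits_read_per_day = THROUGHPUT_BYTES_S * 8 * SECONDS_PER_DAY * duty_cycle
    return bits_read_per_day * UBER


def days_between_errors(duty_cycle):
    """Average number of days between uBER read errors at that duty cycle."""
    return 1 / expected_unreadable_per_day(duty_cycle)


if __name__ == "__main__":
    for duty in (1.0, 0.01):
        print(f"duty cycle {duty:>4.0%}: "
              f"{expected_unreadable_per_day(duty):.4f} errors/day, "
              f"one every {days_between_errors(duty):.1f} days")
```

Under these assumptions a drive reading flat out sees about 1.04 expected errors per day, and at a 1% duty cycle roughly one every 96 days, which is the same order of magnitude as the figures above. These are statistical averages over many drives, not a prediction for any single disk.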

Offline Uncorrectable is the same as Current Pending Sector, but it is not updated immediately (online); instead it is updated only periodically (offline). This can reveal evidence in cases where all evidence in SMART would otherwise have disappeared after the unreadable sectors were overwritten. The Current Pending Sector is reset back to 0, and when no sectors are swapped for reserve sectors, Reallocated Sector Count also stays at 0. Only the Offline Uncorrectable can betray that there were actually bad sectors in the recent past. I personally think SMART should be extended with a counter that is never reset, but counts the total number of unreadable sectors, much like UDMA CRC Error Count works for cabling errors. The raw value of that attribute is never reset either, but just stays at its value if no further cabling errors occur.
 

SomeGuy133

2[H]4U
Joined
Apr 12, 2015
Messages
3,447
Actually, RAM needs ECC, especially at 32GB plus, if accuracy is important. At least that's what I read in various articles. RAM at 32GB will have errors daily even when not pushing a lot of transfers.
 

_CiPHER_

Weaksauce
Joined
Sep 30, 2015
Messages
74
Well, i have 32GiB RAM in three of my NAS systems which use ZFS. This means memory bitflips would be detected in the form of disk corruption. Not one checksum error has arisen in the many years of service (4-5 years i think). So ECC for RAM is nice, but in many circumstances you can do without.

If you say more than one bitflip per day i really would like to see a source on that. Because if that is true, then virtually all systems would become unusable/unstable and ZFS would generate tons of checksum errors 'daily' which is simply not true. I do know RAM bitflips are dependent on the duty cycle though. Many consumer-grade systems have very low duty cycle (1-2%) while heavily used servers can push beyond 50%.
 

SomeGuy133

2[H]4U
Joined
Apr 12, 2015
Messages
3,447
Newegg was one source where I have seen this stated before. It was the easiest place to find, but I have seen this stated in other places too. I would assume Newegg wouldn't post bad information like this, and it wouldn't stay on their site for years or however long it has been there.

http://www.newegg.com/product/CategoryIntelligenceArticle.aspx?articleId=126

What do you mean by duty cycle?

BTW, I sent that USB drive's report to HD Sentinel's owners for review.
 

patrickdk

Gawd
Joined
Jan 3, 2012
Messages
744
I fail to see what was wrong in what I said.

ECC error: it read the disk fine, with a small amount of corruption.

If the read was worse, it could be an alignment issue or worse; the disk was not read fine, and it is having major issues.
 

SomeGuy133

2[H]4U
Joined
Apr 12, 2015
Messages
3,447
OK, never mind, I see what is being talked about now... sorry, I got lost on who was referring to what.
 

_CiPHER_

Weaksauce
Joined
Sep 30, 2015
Messages
74
Well, i explained it in my post, but if you want additional comment on it, here goes:
patrickdk said: offline uncorrectable, is a sector it has had to retry reading many many times, and is too damaged for ECC to fix it.
This description is pretty much accurate, but concerns Current Pending Sector instead. Also, it will apply to the first unsuccessful read attempt - often having triggered recovery. But the recovery can be interrupted by a RESET command.

patrickdk said: current pending sectors, is an offline uncorrectable that has been marked as damaged beyond repair and recovery attempts, and is waiting for a write for it to be reallocated.
This appears to suggest that all 'Current Pending Sectors' are physically damaged and will be swapped for a reserve sector when overwritten. This, however, is not true. In fact, about 90% of pending sectors are without physical damage and will NOT be swapped for a reserve sector when overwritten. It will simply keep being used, because the sector itself is not damaged or anything. It just had bit errors that exceeded the ECC correcting capability. And upon overwriting/refreshing the sector, it will read back normal without exceeding the ECC capability.

Also, it appears to suggest that Current Pending Sector is different from Offline Uncorrectable. But the only difference is that Offline Uncorrectable is an offline SMART attribute, while Current Pending Sector is online and thus is updated on-the-fly and always accurate, whereas Offline Uncorrectable is not. Upon overwriting the bad sectors, the Current Pending Sector can be 0 while Offline Uncorrectable is not. It is the only evidence remaining that there ever were bad sectors in the past.
 

JoeComp

[H]ard|Gawd
Joined
Jan 23, 2012
Messages
1,036
Little of that is necessarily true. It may be true for some drives from some manufacturers, but it is not true for all drives from all manufacturers. Statements like "90% of pending sectors are without physical damage" are not useful, since if they are correct at all, they are only correct for certain drives from certain manufacturers under certain conditions.

One of the problems with SMART attributes is that they are not well-defined, and can vary a lot between manufacturers. When you get into the sorts of specifics that you are mentioning, you are going to be incorrect for some -- and probably many -- drives.

Note that attribute 198 is not necessarily ONLY uncorrectable sectors detected during off-line SMART testing. Some manufacturers include in attribute 198 uncorrectable sectors detected during off-line SMART testing, during SMART self-test, or during drive operation, or some combination.

Attribute 197 also can vary between drives. More specifically, the way the count gets decremented varies. The way the count increments is fairly consistent among drives (it gets incremented when there is an uncorrectable read error, UNC). As for decrementing it, that can happen if the drive tries to read a previously UNC and successfully reads it -- in that case the drive will usually reallocate the sector and write the newly read value to the remapped sector. However, some drives may try to rewrite the same sector that was previously UNC. Another way it can be decremented is if there is a write request to the sector, then the drive will possibly reallocate the sector. But it may also try to write the previously UNC sector, and if the write succeeds, then it will not reallocate the sector but just use the old sector and decrement the pending count.

And there is no such distinction as "offline" and "online" SMART attributes, such that some are always guaranteed to be accurate. That is not what is being referred to at all. In the SMART attributes, Offline (sometimes written off-line) refers to the off-line SMART test, which runs automatically on some drives during idle time (can be enabled or disabled on some drives), or can be forced to run by sending a SMART command.
 

patrickdk

Gawd
Joined
Jan 3, 2012
Messages
744
Current pending sectors, WILL get reallocated when written to.

If you don't write to it, and the drive gets a successful read of it, it may mark it as no longer pending, or it may reallocate it. It depends on the drive's firmware.

This is the only time I have ever seen Current Pending Sector go down without Reallocated going up.
 

JoeComp

[H]ard|Gawd
Joined
Jan 23, 2012
Messages
1,036
patrickdk said: Current pending sectors, WILL get reallocated when written to.

No, not always. It depends on the drive. As I said, some drives will try writing to the same sector and if the write completes without error, then they will not reallocate it.

Here is an excerpt from the Wikipedia entry about SMART attribute 197:

https://en.wikipedia.org/wiki/S.M.A.R.T.

However some drives will not immediately remap such sectors when written; instead the drive will first attempt to write to the problem sector and if the write operation is successful then the sector will be marked good (in this case, the "Reallocation Event Count" (0xC4) will not be increased).

The Wikipedia article also says that this is not a good feature, and I tend to agree. I'd rather have the drive remap the sector.
 

_CiPHER_

Weaksauce
Joined
Sep 30, 2015
Messages
74
JoeComp said: Little of that is necessarily true. It may be true for some drives from some manufacturers, but it is not true for all drives from all manufacturers. Statements like "90% of pending sectors are without physical damage" are not useful, since if they are correct at all, they are only correct for certain drives from certain manufacturers under certain conditions.
They generally are true for all recent consumer-grade drives by a simple function of mathematics. Consumer-grade drives are specified at uBER 10^-14. For lower capacities and duty cycles, this results in very few bad sectors due to insufficient ECC correcting capability, but with today's drives with very high data density and high throughput, the same 10^-14 specification results in up to one bad sector per day (100% duty cycle), on average, when testing many samples over a long period of time. These bad sectors are not due to physical damage, but simply because of insufficient ECC correcting capability.

In reality, what i think is happening is that with every increase of data density by bumping the platter capacity, the problem of uBER bad sectors becomes bigger. The number of physically damaged sectors doesn't need to grow at all. If this is true, then it is simply a matter of time: 10 - 15 years ago the problem of uBER was not big at all, while with today's high capacity PMR drives, uBER is such a concern that techniques such as RAID5 are no longer feasible, and even RAID6 is quickly losing its polish.

This very problem - of uBER becoming a problem due to increasing data densities - is discussed in these articles:

Why RAID5 stops working in 2009
Why RAID6 stops working in 2019

JoeComp said: One of the problems with SMART attributes is that they are not well-defined, and can vary a lot between manufacturers. When you get into the sorts of specifics that you are mentioning, you are going to be incorrect for some -- and probably many -- drives.
Possibly, but i have not had one confirmed case of this. Current Pending Sector, Reallocated Sector Count and UDMA CRC Error Count all seem to behave the same on all drives i ever saw the SMART output for, in all my years of experience.

JoeComp said: Note that attribute 198 is not necessarily ONLY uncorrectable sectors detected during off-line SMART testing. Some manufacturers include in attribute 198 uncorrectable sectors detected during off-line SMART testing, during SMART self-test, or during drive operation, or some combination.
What i understand of 198 (Offline Uncorrectable) is that it is the same as Current Pending Sectors - which often get detected by host reads, not by SMART tests - but that this SMART attribute gets collected/updated offline, and not online like Current Pending Sector (197).

JoeComp said: Attribute 197 also can vary between drives. More specifically, the way the count gets decremented varies. The way the count increments is fairly consistent among drives (it gets incremented when there is an uncorrectable read error, UNC). As for decrementing it, that can happen if the drive tries to read a previously UNC and successfully reads it -- in that case the drive will usually reallocate the sector and write the newly read value to the remapped sector. However, some drives may try to rewrite the same sector that was previously UNC. Another way it can be decremented is if there is a write request to the sector, then the drive will possibly reallocate the sector. But it may also try to write the previously UNC sector, and if the write succeeds, then it will not reallocate the sector but just use the old sector and decrement the pending count.
Are you sure this is correct? Because in my experience there is no difference in how drives handle Current Pending Sector.

It gets increased once the drive knows there is an unreadable sector. Such a sector can be with physical damage, or without physical damage (uBER). Such a sector can only disappear when:

1) during a read request by the host, or SMART long-test, or background media scanning (BGMS), the contents of the sector are recovered without error, after applying ECC correction.

2) the sector is overwritten with new data by the host - after which the original data is forfeited.

In both cases, Current Pending Sector is decreased. Whether Reallocated Sector Count (5) is increased depends on whether the sector was physically damaged or not. If after overwriting the sector still cannot be read, then the sector is swapped for a reserve sector instead. The latter would count as physical damage, while the former would count as a uBER bad sector - which is increasingly becoming an issue and is why new-generation filesystems like ZFS, Btrfs and ReFS are in such high demand with consumer-grade hard drives. Microsoft wanted to add redundancy in their Drive Extender 2.0 software included in a future Windows Home Server product. But they cancelled DE 2.0 because of incompatibility with existing NTFS storage, and opted to further develop ReFS instead.
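
The counter behaviour described in this post can be written down as a tiny model. This is only a sketch of the behaviour as claimed here, not of any particular firmware (other replies in this thread point out that drives may differ); the class and method names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DriveCounters:
    """Toy model of two SMART raw values, as described in this post."""
    current_pending: int = 0   # attribute 197
    reallocated: int = 0       # attribute 5

    def unreadable_found(self):
        """A read failed even after ECC: the sector becomes pending."""
        self.current_pending += 1

    def recovered_by_read(self):
        """Case 1: a later read (host, long test, BGMS) succeeds after ECC."""
        self.current_pending -= 1

    def overwritten(self, readable_after_write):
        """Case 2: the host overwrites the sector.

        If the sector reads back fine it stays in use (uBER case);
        otherwise it is swapped for a reserve sector (physical damage).
        """
        self.current_pending -= 1
        if not readable_after_write:
            self.reallocated += 1


if __name__ == "__main__":
    drive = DriveCounters()
    drive.unreadable_found()
    drive.overwritten(readable_after_write=True)    # uBER bad sector
    drive.unreadable_found()
    drive.overwritten(readable_after_write=False)   # physically damaged
    print(drive)
```

In this model a pending sector that turns out to be healthy leaves no trace in either counter, which is the point made earlier about Offline Uncorrectable being the only remaining evidence.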

JoeComp said: And there is no such distinction as "offline" and "online" SMART attributes
There is, they are included in the SMART 'flags' which indicate what those attributes do.

I could not find a good online source of all SMART flags in a simple search query, but you can often find them explained in SMART applications which tell whether a SMART attribute is updated on-the-fly (online) or only periodically (offline).

Some SMART applications also have the name of the 198 attribute spelled as 'Offline Uncorrectable Sectors'. For example: http://www.hddstatus.com/hdrepanalysis.php and http://www.hdsentinel.com/smart/index.php.

Also, SmartMonTools lists the description when using the -x parameter, like this:

Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0

                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

The O flag here describes the attributes as being updated on-the-fly or Online, while without O flag it would be offline like Offline Uncorrectable in the example above (Toshiba drive).
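
For reading those flag columns without counting characters, a small decoder helps. This is a minimal sketch that assumes the six-letter flag column and legend exactly as in the listing above; the helper names `decode_flags` and `parse_attribute_row` are made up, and any other character that might show up in the column is simply reported as unknown.

```python
# Legend copied from the listing above.
FLAG_NAMES = {
    "P": "prefailure warning",
    "O": "updated online",
    "S": "speed/performance",
    "R": "error rate",
    "C": "event count",
    "K": "auto-keep",
}


def decode_flags(flags):
    """Turn a flag column such as '-O--C-' into a list of flag descriptions."""
    return [FLAG_NAMES.get(ch, f"unknown flag {ch!r}") for ch in flags if ch != "-"]


def parse_attribute_row(row):
    """Split one attribute row of the listing into ID, name, flags and raw value."""
    fields = row.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flags": decode_flags(fields[2]),
        "raw": fields[-1],
    }


if __name__ == "__main__":
    rows = [
        "197 Current_Pending_Sector  -O--C-   100   100   000    -    0",
        "198 Offline_Uncorrectable   ----C-   100   100   000    -    0",
    ]
    for row in rows:
        attr = parse_attribute_row(row)
        online = "updated online" in attr["flags"]
        print(attr["id"], attr["name"], "online" if online else "not online")
```

For the two rows above this reports 197 as updated online and 198 as not.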

I think your best argument is that SMART is not very well defined, with no strict standardised specification. This leaves margin for brands and individual products to differ from the norm. But in my experience this is true for specific SMART attributes and not for the most important and most ubiquitous ones: Reallocated Sector Count, Current Pending Sector and UDMA CRC Error Count. I think those pretty much work the same for all drives, at least those i have seen in my years of experience.
 

JoeComp

[H]ard|Gawd
Joined
Jan 23, 2012
Messages
1,036
_CiPHER_ said: The O flag here describes the attributes as being updated on-the-fly or Online, while without O flag it would be offline like Offline Uncorrectable in the example above (Toshiba drive).

No, that is incorrect.

As I already explained, when referencing an attribute, "offline" or "always" refers to whether the attribute is updated only during offline testing, or whether it is updated during normal operation of the device or during both normal operation and off-line testing.

As for attribute 198, I repeat that it is not always ONLY uncorrectable sectors detected during off-line testing (it may be for some drives, but not all drives). Some drives increment that count during self-test, or during normal operation. Some drives do not even include the word "offline" in the description of attribute 198.
 

_CiPHER_

Weaksauce
Joined
Sep 30, 2015
Messages
74
What exactly is incorrect about it? Just trying to learn from you by the way, not trying to nitpick. :)

I can at least confirm that Offline Uncorrectable can trigger without ever having run a SMART test, purely during host reads. Again, it may be possible that individual drives or brands behave differently, but i have not seen any evidence of this thus far. I think this attribute is basically the same as Current Pending Sector but is updated more slowly (offline) instead of on-the-fly (online).

If you have better information than me, then i would like to learn about it.
 

_CiPHER_

Weaksauce
Joined
Sep 30, 2015
Messages
74
Could you show me the SMART output of one drive where attribute #198 does have the 'Online' flag set? I would be surprised if you can find one. The name of the attribute indeed can vary, but i think the actual mechanic behind this attribute does not. Only the (offline) period when it is updated, may vary. I have not seen any different behaviour in the wild.
 