Scary SMART Values on New Seagate Enterprise Drives?

Zarathustra[H] (Fully [H]; joined Oct 29, 2000; 31,221 messages)
Hey all,

So, I'm four disks into my "swap all 12 drives in my NAS with larger drives and resilver to grow my ZFS pool" project, and I decided to go with 10TB Seagate Helium Enterprise drives (ST10000NM0016).

The four drives I have so far come from two orders of two drives each: two from Newegg and two from Amazon.

All drives passed the following tests before being resilvered into the pool:
- SMART Short test
- SMART Conveyance Test
- Badblocks write test (all four test patterns, taking ~5 days)
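For anyone repeating this burn-in: the SMART tests correspond to `smartctl -t short` and `smartctl -t conveyance`. One catch with badblocks on drives this size is that it counts blocks in 32 bits, so with its default 1024-byte block size it cannot address a 10TB disk and needs a larger `-b`. The helper below is a minimal sketch for picking that value; `badblocks_block_size` is a hypothetical name, not part of any tool, and this is not necessarily how the drives above were tested.

```bash
#!/bin/bash
# Hypothetical helper: smallest power-of-two block size, starting at
# badblocks' default of 1024 bytes, that keeps the block count of a
# device of the given size (in bytes) within 32 bits.
badblocks_block_size() {
    local bytes=$1
    local bs=1024
    while (( bytes / bs > 4294967295 )); do
        (( bs *= 2 ))
    done
    echo "${bs}"
}
```

For example, `badblocks_block_size "$(blockdev --getsize64 /dev/sdX)"` should print 4096 for a 10TB drive, which could then be passed as `badblocks -wsv -b 4096 /dev/sdX`. Be aware that `-w` is destructive: it overwrites the whole disk with the four patterns (0xaa, 0x55, 0xff, 0x00), which is why it takes days on a drive this size.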

I've read in the past that one should pretty much ignore the RAW_VALUE column in SMART readouts from Seagate drives, since those values probably don't mean what you think they mean, and use SeaTools for any diagnostics instead.

The problems I have with SeaTools:

1.) The Linux version is old and unmaintained, and it didn't appear to give me any useful information.

2.) The Windows version might be more fully featured, but it doesn't seem to recognize Seagate hard drives as genuine Seagate drives when they are sitting in a USB dock, and right now I don't have a box with easily accessible SATA ports to put one into.

3.) There is a DOS version, but that requires taking my server offline and booting from a FreeDOS USB stick. My server may not be production, but it is more "home production" than lab, so this would be really inconvenient.


Why I am concerned:

I know "ignore the RAW_VALUE field" is what I've found when googling in the past, but what about the three-digit normalized values? Are those to be ignored as well?

Just look at some of these; the Hardware_ECC_Recovered values in particular look pretty scary on all of them:
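On the three-digit columns: VALUE is the drive's current normalized score, WORST is the lowest it has ever reached, and THRESH is the failure threshold. smartctl marks an attribute FAILING_NOW in the WHEN_FAILED column when VALUE drops to or below a non-zero THRESH, and In_the_past when only WORST has done so; a THRESH of 000 means the attribute is never flagged. The function below is a minimal sketch of that rule for `smartctl -A` output on stdin. `check_thresholds` is a hypothetical name, and the column positions assume the standard attribute table layout used in the dumps in this thread.

```bash
#!/bin/bash
# Hypothetical helper: apply smartctl's threshold rule to the attribute
# table of `smartctl -A /dev/sdX`, read on stdin. Assumes the standard
# column order: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH ...
check_thresholds() {
    awk '
        # Attribute rows have a numeric ID and a hex FLAG in column 3.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-fA-F]+$/ {
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
            if (thresh == 0)
                status = "no threshold, never flagged"
            else if (value <= thresh)
                status = "FAILING_NOW"
            else if (worst <= thresh)
                status = "In_the_past"
            else
                status = sprintf("ok, %d above threshold", value - thresh)
            printf "%3s %-24s %s\n", $1, $2, status
        }'
}
```

Usage would be `smartctl -A /dev/sda | check_thresholds`. Applied to the dumps below, every attribute with a non-zero THRESH is still above it, and Hardware_ECC_Recovered has a THRESH of 000, so its low normalized value is never treated as a failure.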

Disk 1:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   075   064   044    Pre-fail  Always       -       32160040
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   045    Pre-fail  Always       -       129401214
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       494 (1 186 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       3
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   088   088   000    Old_age   Always       -       12
190 Airflow_Temperature_Cel 0x0022   071   055   040    Old_age   Always       -       29 (Min/Max 26/35)
191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4187
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       30
194 Temperature_Celsius     0x0022   029   045   000    Old_age   Always       -       29 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   027   003   000    Old_age   Always       -       32160040
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       491 (144 32 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83916735051
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       83452883922

Disk 2:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   080   066   044    Pre-fail  Always       -       91957936
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   079   061   045    Pre-fail  Always       -       81905155
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       283 (213 63 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       4
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   090   090   000    Old_age   Always       -       10
190 Airflow_Temperature_Cel 0x0022   072   067   040    Old_age   Always       -       28 (Min/Max 20/33)
191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4777
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       16
194 Temperature_Celsius     0x0022   028   040   000    Old_age   Always       -       28 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a   006   002   000    Old_age   Always       -       91957936
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       273 (12 162 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83844942895
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       82966201892

Disk 3:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   079   064   044    Pre-fail  Always       -       70937936
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   045    Pre-fail  Always       -       53737869
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       158 (50 181 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   071   067   040    Old_age   Always       -       29 (Min/Max 24/33)
191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4720
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       14
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   007   006   000    Old_age   Always       -       70937936
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       157 (185 136 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83754676019
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       78155955092

Disk 4:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   065   044    Pre-fail  Always       -       193684680
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   061   045    Pre-fail  Always       -       54155047
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       158 (142 118 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   067   040    Old_age   Always       -       29 (Min/Max 25/33)
191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4822
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       10
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   011   009   000    Old_age   Always       -       193684680
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       157 (133 76 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83781193947
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       78158256547


So, am I concerned over nothing? Does one just ignore SMART readouts on Seagate Enterprise drives, or did I get four drives from two different retailers that are all going bad within just a couple of weeks?

Much appreciated.

Cross-posted here and here for more eyeballs, since server/enterprise/*nix topics tend to get less traffic.

drescherjm ([H]F Junkie; joined Nov 19, 2008; 14,863 messages)
I don't ignore all of the SMART data on these drives, but I only look at a few of the values.

This is a script I use for all drive models. For Seagate drives, I also ignore the Hardware_ECC_Recovered line in its output.

Code:
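#!/bin/bash
# Print a short SMART summary for every /dev/sd? disk (needs root).
process_device()
{
    local device="$1"
    local smart
    # Read the full SMART report once and reuse it below.
    smart=$(smartctl --all "/dev/${device}")
    # Name and capacity, then the serial number on the same line
    # ("al Number" matches hdparm's "Serial Number:" field).
    echo -n "${device} $(grep -e "User Capacity" <<< "${smart}") "
    hdparm -I "/dev/${device}" | grep "al Number"
    # The handful of attributes worth watching.
    grep -e "Reallocated_Sector_Ct" -e "Current_Pending_Sector" \
         -e "Offline_Uncorrectable" -e "UDMA_CRC_Error_Count" \
         -e "Hardware_ECC_Recovered" -e "Command_Timeout" \
         -e "Power_On_Hours" <<< "${smart}"
    # Any lines containing FIRMWARE, with 10 lines of context either side.
    grep -C 10 FIRMWARE <<< "${smart}"
    echo
}

date
for a in /dev/sd?; do
    process_device "${a#/dev/}"
done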
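The Seagate-specific filtering mentioned above could also be automated. The function below is a minimal sketch of one way to do it, not the script from this post. `summarize_smart` is a hypothetical name; it reads `smartctl --all` output on stdin and assumes a drive is a Seagate when smartctl's Device Model line starts with "ST" (as ST10000NM0016 and ST8000NM0055 do) or its Model Family line mentions Seagate.

```bash
#!/bin/bash
# Hypothetical helper: print the watch-list attributes from
# `smartctl --all /dev/sdX` output (on stdin), omitting
# Hardware_ECC_Recovered for drives that look like Seagates.
summarize_smart() {
    local report is_seagate=0
    report=$(cat)

    # Assumption: Seagate model strings start with "ST", or the family names Seagate.
    if grep -Eq '^(Device Model:[[:space:]]+ST|Model Family:.*Seagate)' <<< "${report}"; then
        is_seagate=1
    fi

    local watch='Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
    watch+='|UDMA_CRC_Error_Count|Hardware_ECC_Recovered|Command_Timeout|Power_On_Hours'

    grep -E "${watch}" <<< "${report}" | {
        if (( is_seagate )); then
            grep -v 'Hardware_ECC_Recovered'
        else
            cat
        fi
    }
}
```

Usage would be `smartctl --all /dev/sda | summarize_smart`. Keying off the model string keeps one script usable across a mixed pool, at the cost of relying on that naming assumption.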
 
Deleted member 245375 (Guest)
Looks fine to me. The Hardware_ECC_Recovered value is basically a natural side effect of how physical hard drives work: it's just telling you that the ECC (error-correcting code) circuitry and subsystem is working as it should. If Offline_Uncorrectable were anything other than 0, it would be time to watch the drive more closely, but that's fine as well.
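The check described here can be scripted. The sketch below watches the sector-health counters rather than the corrected-read count; `bad_sector_counts` is a hypothetical name, it reads `smartctl -A` output on stdin, and including Reallocated_Sector_Ct and Current_Pending_Sector alongside Offline_Uncorrectable is an assumption in the spirit of this thread, not a vendor rule. It prints any of those attributes with a non-zero raw value and exits with status 1 if it finds one.

```bash
#!/bin/bash
# Hypothetical helper: report non-zero raw counts for the sector-health
# attributes in `smartctl -A` output (stdin). Exit status 1 if any are found.
bad_sector_counts() {
    awk '
        $2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            # RAW_VALUE is column 10 in the standard attribute table.
            if ($10 + 0 != 0) {
                print $2 ": " $10
                found = 1
            }
        }
        END { exit found }'
}
```

Because of the exit status, something like `smartctl -A /dev/sda | bad_sector_counts || echo "check sda"` works in a cron job. All of the dumps in this thread would pass silently.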
 

Kuba1 (n00b; joined Jan 9, 2020; 1 message)
Hi

I just came across this thread while investigating my own setup of Seagate enterprise disks. I have four ST8000NM0055 drives in RAID10, and I noticed that SMART attribute 191 (G-Sense_Error_Rate) is slowly growing on all four. I'm not sure what kind of vibration is causing it, or whether the sensors are just too sensitive. So far I've only managed to narrow it down to random writes (more so than random reads), where the heads move across the platters very often.
How did your investigation end? Did your disks survive? Could you post your SMART attributes now?
Thank you!

PS: I've attached a SMART dump from one of them.

Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   064   044    Pre-fail  Always       -       154780
  3 Spin_Up_Time            0x0003   090   089   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       41
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   045    Pre-fail  Always       -       167547890
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       3183
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       41
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   052   040    Old_age   Always       -       33 (Min/Max 30/36)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       210
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       26
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       5606
194 Temperature_Celsius     0x0022   033   048   000    Old_age   Always       -       33 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a   100   064   000    Old_age   Always       -       154780
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2558h+20m+13.661s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3389148449
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       60374597957
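To tell whether G-Sense_Error_Rate keeps climbing, and under which workload, it can help to save SMART snapshots before and after a test run and compare them. A minimal sketch follows; `raw_value` and `raw_delta` are hypothetical helper names, input is `smartctl -A` output, and the value is taken as the first token of RAW_VALUE (column 10), which suits plain counters like attribute 191.

```bash
#!/bin/bash
# Hypothetical helper: print the first RAW_VALUE token (column 10) of the
# named attribute from `smartctl -A` output on stdin.
raw_value() {
    local name=$1
    awk -v name="${name}" '$2 == name { print $10; exit }'
}

# Hypothetical helper: given two saved `smartctl -A` snapshots and an
# attribute name, print how much its raw value grew between them.
raw_delta() {
    local before=$1 after=$2 name=$3
    local old new
    old=$(raw_value "${name}" < "${before}")
    new=$(raw_value "${name}" < "${after}")
    # Treat a missing attribute as 0 rather than failing the arithmetic.
    echo $(( ${new:-0} - ${old:-0} ))
}
```

For example: `smartctl -A /dev/sda > before.txt`, run only random writes for a while, `smartctl -A /dev/sda > after.txt`, then `raw_delta before.txt after.txt G-Sense_Error_Rate`. Repeating that with only random reads would show whether the write workload really is the trigger.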
 

Zarathustra[H] (Fully [H]; joined Oct 29, 2000; 31,221 messages)

Just saw this post.

I'll have to check what mine look like tomorrow, but what I have learned about Seagate drives is to just ignore the raw_value. It is mostly meaningless.

It's kind of frustrating when coming from other brands where the raw_value actually means something, but it is what it is.
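Part of why the raw numbers look alarming on Seagates: a widely repeated but unofficial interpretation (not documented by Seagate) is that the 48-bit raw values of Raw_Read_Error_Rate and Seek_Error_Rate pack an error count into the upper 16 bits and a total operation count into the lower 32 bits. smartctl can print those raw values in hex, e.g. `smartctl -A -v 1,hex48 -v 7,hex48 /dev/sdX`, which makes the split easier to see. The function below, `split_seagate_raw` (a hypothetical name), is a sketch that decodes a decimal raw value under that assumption.

```bash
#!/bin/bash
# Hypothetical helper, based on the unofficial interpretation above:
# split a decimal 48-bit raw value into an upper-16-bit "errors" field
# and a lower-32-bit "operations" field.
split_seagate_raw() {
    local raw=$1
    local errors=$(( (raw >> 32) & 0xFFFF ))
    local ops=$(( raw & 0xFFFFFFFF ))
    echo "errors=${errors} operations=${ops}"
}
```

Under that reading, every Raw_Read_Error_Rate and Seek_Error_Rate raw value in this thread is below 2^32 (4,294,967,296), so the "errors" half is 0 on all of them; for example, `split_seagate_raw 129401214` (Seek_Error_Rate on the first disk) prints `errors=0 operations=129401214`.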

The important part is that the normalized (processed) VALUE is at 100, which is as good as it gets.

Mine have had no problems after more than two years of 24/7 use in my server. I haven't looked at the SMART data in a while, though.
 