Scary SMART Values on New Seagate Enterprise Drives?

Discussion in 'SSDs & Data Storage' started by Zarathustra[H], Nov 7, 2017.

  1. Zarathustra[H]

    Zarathustra[H] Official Forum Curmudgeon

    Messages:
    27,642
    Joined:
    Oct 29, 2000
    Hey all,

    So, I'm 4 disks into my "swap all 12 drives in my NAS with larger drives and resilver to grow my ZFS pool" project, and decided to go with 10TB Seagate Helium Enterprise drives (ST10000NM0016).

    The four drives I have thus far come from two orders of two drives each. Two from Newegg and two from Amazon.

    All drives passed the following tests before being resilvered into the pool:
    - SMART Short test
    - SMART Conveyance Test
    - Badblocks write test (all four test patterns, taking ~5 days)

    I've done some reading on this in the past where it was suggested that one pretty much ignore the "RAW VALUES" in SMART readouts from Seagate drives as they probably don't mean what you think they mean, and instead use Seatools for any diagnostics.

    The problems I have with Seatools:

    1.) The Linux version is old and not maintained and didn't appear to give me any useful information.

    2.) The Windows version might be more fully featured, but it doesn't seem to recognize Seagate hard drives as true Seagate hard drives when they are sitting in a USB dock, and I don't have a box with easily accessible SATA ports I can stick one into right now.

    3.) There is a DOS version, but that requires taking my server offline and booting from a FreeDOS USB stick. My server may not be production, but it is more "home production" than lab, so this would be really inconvenient.


    Why I am concerned:

    I know "ignore the RAW VALUE field" is what I've found when googling in the past, but what about the three-digit weighted values? Are those to be ignored as well?

    Just look at some of these. "Hardware_ECC_Recovered" in particular looks pretty scary on all of them:

    Disk 1:
    Code:
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   075   064   044    Pre-fail  Always       -       32160040
      3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   081   060   045    Pre-fail  Always       -       129401214
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       494 (1 186 0)
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       3
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   088   088   000    Old_age   Always       -       12
    190 Airflow_Temperature_Cel 0x0022   071   055   040    Old_age   Always       -       29 (Min/Max 26/35)
    191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4187
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       5
    193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       30
    194 Temperature_Celsius     0x0022   029   045   000    Old_age   Always       -       29 (0 19 0 0 0)
    195 Hardware_ECC_Recovered  0x001a   027   003   000    Old_age   Always       -       32160040
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       491 (144 32 0)
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83916735051
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       83452883922
    Disk 2:
    Code:
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   080   066   044    Pre-fail  Always       -       91957936
      3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   079   061   045    Pre-fail  Always       -       81905155
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       283 (213 63 0)
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       4
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   090   090   000    Old_age   Always       -       10
    190 Airflow_Temperature_Cel 0x0022   072   067   040    Old_age   Always       -       28 (Min/Max 20/33)
    191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4777
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
    193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       16
    194 Temperature_Celsius     0x0022   028   040   000    Old_age   Always       -       28 (0 20 0 0 0)
    195 Hardware_ECC_Recovered  0x001a   006   002   000    Old_age   Always       -       91957936
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       273 (12 162 0)
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83844942895
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       82966201892
    Disk 3:
    Code:
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   079   064   044    Pre-fail  Always       -       70937936
      3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   077   060   045    Pre-fail  Always       -       53737869
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       158 (50 181 0)
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
    190 Airflow_Temperature_Cel 0x0022   071   067   040    Old_age   Always       -       29 (Min/Max 24/33)
    191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4720
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
    193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       14
    194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 21 0 0 0)
    195 Hardware_ECC_Recovered  0x001a   007   006   000    Old_age   Always       -       70937936
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       157 (185 136 0)
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83754676019
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       78155955092
    Disk 4:
    Code:
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   083   065   044    Pre-fail  Always       -       193684680
      3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   077   061   045    Pre-fail  Always       -       54155047
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       158 (142 118 0)
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   071   067   040    Old_age   Always       -       29 (Min/Max 25/33)
    191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4822
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
    193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       10
    194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 21 0 0 0)
    195 Hardware_ECC_Recovered  0x001a   011   009   000    Old_age   Always       -       193684680
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       157 (133 76 0)
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83781193947
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       78158256547

    So, am I concerned for nothing? Does one just ignore SMART readouts on Seagate Enterprise drives, or did I get 4 drives from 2 different retailers that are all going bad within only a couple of weeks?

    Much appreciated.

    Crossposted here and here for more eyeballs, as server/enterprise/*nix stuff tends to get less traffic.
     
    Last edited: Nov 7, 2017
  2. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    14,199
    Joined:
    Nov 19, 2008
    I don't ignore all of the SMART data on these drives, but I only look at a few of the values.

    This is a script I use for all drive models. For Seagate drives I also ignore the Hardware_ECC_Recovered line in its output.

    Code:
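    #!/bin/bash
    # Print a short SMART summary for every /dev/sd? device (run as root).
    process_device()
    {
        local device=$1
        # Device name plus capacity; the substitution is left unquoted so its whitespace collapses.
        echo -n "${device}" $(smartctl --all "/dev/${device}" | grep -e "User Capacity")
        # "al Number" matches the "Serial Number" line of the hdparm output.
        hdparm -I "/dev/${device}" | grep "al Number"
        # The handful of attributes worth watching.
        smartctl --all "/dev/${device}" | grep -e "Reallocated_Sector_Ct" -e "Current_Pending_Sector" -e "Offline_Uncorrectable" -e "UDMA_CRC_Error_Count" -e "Hardware_ECC_Recovered" -e "Command_Timeout" -e "Power_On_Hours"
        # Ten lines of context around any line containing FIRMWARE.
        smartctl --all "/dev/${device}" | grep -C 10 FIRMWARE
        echo
    }
    date
    for a in /dev/sd?
    do
        # Strip the leading /dev/ so only the device name (e.g. sda) is passed.
        process_device "${a#/dev/}"
    done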
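
    The "only look at a few values" idea can also be sketched as a filter that stays quiet unless one of those counters is non-zero. This is only an illustration, not part of the script above: the function name flag_nonzero_raw and the choice of four watched attributes are assumptions, and it expects the attribute table printed by smartctl -A on standard input.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): reads `smartctl -A` output on
# stdin and prints any watched attribute whose RAW_VALUE is not zero.
flag_nonzero_raw()
{
    awk '
        # Attribute rows start with a numeric ID. Column 2 is the attribute
        # name and column 10 is the first token of RAW_VALUE.
        $1 ~ /^[0-9]+$/ && $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ {
            if ($10 + 0 != 0) print $2, $10
        }
    '
}

# Example usage (needs root and a real device):
#   smartctl -A /dev/sda | flag_nonzero_raw
```

    Hardware_ECC_Recovered is deliberately left off the watch list here, in line with ignoring it on Seagate drives.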
    
     
  3. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    14,199
    Joined:
    Nov 19, 2008
    I have the most faith in this
     
  4. Tiberian

    Tiberian DILLIGAFuck

    Messages:
    5,759
    Joined:
    Feb 12, 2012
    Looks fine to me. The Hardware_ECC_Recovered count is basically a natural side effect of how physical hard drives work: it just tells you that the ECC side (the error correction circuitry and subsystem) is working as it should. If Offline_Uncorrectable were anything other than 0, it would be time to watch the drives more closely, but that's fine as well.
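
    On the three-digit values: as the smartctl man page describes it, an attribute counts as failed only when its normalized VALUE is less than or equal to THRESH. Hardware_ECC_Recovered has a THRESH of 000 in all four readouts, so a VALUE of 027 or 006 does not trip it. A minimal sketch of that comparison follows. The function name check_thresholds and the output format are assumptions, and it reads the smartctl -A attribute table from standard input.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): reads `smartctl -A` output on
# stdin and reports attributes whose normalized VALUE is at or below THRESH.
check_thresholds()
{
    awk '
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value  = $4 + 0
            thresh = $6 + 0
            if (value <= thresh)
                printf "FAILING %s value=%d thresh=%d\n", $2, value, thresh
        }
    '
}

# Example usage (needs root and a real device):
#   smartctl -A /dev/sda | check_thresholds
```

    No output means no attribute has reached its threshold. That says nothing about the raw counters, which are worth checking separately.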