how to check new hdd for health?

extrafuzzyllama · May 4, 2011

Hi all, I am rather new to setting up any type of server or storage system,
but I picked up a few Hitachi Deskstar 3TB drives and wanted to know what the best tools are to check their health before using them in a RAID setup.

Like checking for bad sectors and that kind of thing.

I am running mostly Mac OS at home, but I have a PC as well, so software for either OS would be great.

I have another quick question:
do HDDs also get firmware updates?
And if so, is there one for my drive, the Hitachi 3TB Deskstar 7200?

drescherjm · May 4, 2011

If you are a Windows user, do a full format (not a quick format) and then look at the SMART status with a program like CrystalDiskInfo.

If you are comfortable with Linux, do a 4-pass badblocks read/write test and check the SMART status with smartctl.

drescherjm · May 4, 2011

extrafuzzyllama said:
I have another quick question: do HDDs also get firmware updates?

Sometimes. Check the vendor website. Once a drive holds valuable data I would not do any firmware updates, or at least I would not update more than one disk of a RAID 6 set at a time.

extrafuzzyllama · May 4, 2011

I am mostly a Mac user, but the drives are going into a Synology NAS. Probably ext4.
I don't have a Linux machine, so I might have to get one of my older PCs and load it with Linux.

drescherjm · May 4, 2011

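I believe badblocks and smartctl are available for the Mac as well. They are not part of the stock install, so they would need to be installed first (for example through MacPorts).

Also, the NAS could be Linux-based.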

extrafuzzyllama · May 4, 2011

Lol, this is a lot more complicated than I thought it would be.
But it's better to be safe than sorry.

drescherjm · May 4, 2011

I do this on every disk I put in a RAID. This week I added two 750 GB drives to a system to swap out a bad 750 GB drive and upgrade from RAID 5 to RAID 6 in the process. For this I verified that both added drives were trouble-free. Today I will add the drives and reshape the array before kicking out an additional drive that is showing signs of being on its way out.

extrafuzzyllama · May 4, 2011

I am actually just setting up the NAS, since it's new and so are the drives,
so hopefully that makes this process a lot easier.
And yes, the Synology NAS I bought is Linux-based.
It has its own HDD testing before it builds, but I just want to run more tests to be completely sure.

Joe Average · May 4, 2011

Use the manufacturer's diagnostic and do a full scan (not just the short/quick test) of the drive and until that's done, don't trust anything else. Tools that just show S.M.A.R.T. info cannot be trusted - the only tool that will give the proper state of the drive is the manufacturer's diagnostic.

I've got a shoebox here with 12 different drives (some WDs, a Hitachi, two Samsungs, a Toshiba, and a Seagate) and they're all "dead" but the S.M.A.R.T. status for each one shows "green" and all clear - none of the drives pass the manufacturer's diagnostic test(s), however.

The only one you can "trust" is the tool made by the manufacturer of the drive to tell you that very information.

extrafuzzyllama · May 4, 2011

My drives didn't ship with a CD.
I can't find a download for my drive on the Hitachi site.
I will look a bit harder.

drescherjm · May 4, 2011

Joe Average said:
Use the manufacturer's diagnostic and do a full scan (not just the short/quick test) of the drive and until that's done, don't trust anything else. Tools that just show S.M.A.R.T. info cannot be trusted - the only tool that will give the proper state of the drive is the manufacturer's diagnostic.

I've got a shoebox here with 12 different drives (some WDs, a Hitachi, two Samsungs, a Toshiba, and a Seagate) and they're all "dead" but the S.M.A.R.T. status for each one shows "green" and all clear - none of the drives pass the manufacturer's diagnostic test(s), however.

The only one you can "trust" is the tool made by the manufacturer of the drive to tell you that very information.

I guess everyone has their own opinion. I absolutely do not trust manufacturer tools, because they tend to protect the warranty more aggressively than your data. I know because I have seen drives from both Seagate and WDC that were in very bad shape yet passed the manufacturer tests. In the Seagate case, 1/3 of the drive was totally unreadable but the drive tools said the drive was fine. Also, all the long test in these tools does is execute a SMART long test, which you can easily run yourself with smartctl. As for the SMART data being wrong or untrustworthy: of Samsung, Seagate, Hitachi and WDC, only Seagate puts bogus-looking info in the SMART raw data, and even that is readable if you know how to decode it. This comes from looking at 100s of drives over the last 15 years.
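
For reference, the SMART long test mentioned here can be started and checked by hand. This is a minimal sketch, assuming smartmontools is installed and /dev/sdX stands in for the real device. The helper name latest_selftest_ok is made up for illustration, and the exact wording in the self-test log can vary by drive and smartctl version.

```shell
#!/bin/sh
# Start the drive's built-in extended (long) self-test, then read the log later:
#   sudo smartctl -t long /dev/sdX       # returns at once; the test runs inside the drive
#   sudo smartctl -l selftest /dev/sdX   # check after the estimated run time has passed

# Hypothetical helper: reads `smartctl -l selftest` output on stdin and
# succeeds only if the newest log entry ("# 1") completed without error.
latest_selftest_ok() {
    awk '/^# *1 / { found = 1; if ($0 ~ /Completed without error/) ok = 1 }
         END { if (found && ok) exit 0; exit 1 }'
}

# Typical use:
#   sudo smartctl -l selftest /dev/sdX | latest_selftest_ok && echo "long test passed"
```

A passing long test only says the drive could read every sector at that moment, so it complements the attribute table rather than replacing it.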

extrafuzzyllama · May 4, 2011

How long would badblocks take to run the test on my 3TB drives?

I am going to use a live CD, which is probably going to be easier.

How exactly do I run a 4-pass badblocks test?

Here is what I am going to do to run the test; please correct me if there is a better way.

On my Windows PC I am going to load a Fedora live USB or CD and run badblocks that way.
Not sure how to run smartctl.

Do I format the drives first? I plan on using ext4 as the filesystem.

Should I put the drive in the Synology NAS, let the NAS format it to ext4 and run its HDD tests, and then run the badblocks and smartctl tests on the PC running the Fedora live USB?

drescherjm · May 4, 2011

4 pass will take 30 to 40 hours. You can reduce the number of passes to save time. I usually do 4 passes for my drives.

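By default, badblocks -w writes and verifies four test patterns (0xaa, 0x55, 0xff, 0x00), which is where the 4 passes come from. The following command should do a 2 pass read / write badblocks by naming just two patterns:

Code:

badblocks -wsv -t 0xaa -t 0x55 /dev/sdX

where X = your drive (a, b, c ...)

Note that -p does not limit the number of passes; it makes badblocks repeat the scan until that many consecutive scans find no new bad blocks.

Use fdisk -l and hdparm -I to determine which drive is which if you have valuable data on some drives.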
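
To make the "which drive is which" step less error-prone, the identifying lines can be pulled out of the hdparm output. This is a minimal sketch, assuming the usual hdparm -I layout with "Model Number:", "Serial Number:" and "Firmware Revision:" labels. The helper name drive_identity is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `hdparm -I /dev/sdX` output on stdin and prints
# the model, serial number, and firmware revision, one per line.
drive_identity() {
    awk -F': *' '/Model Number:|Serial Number:|Firmware Revision:/ {
        gsub(/^[ \t]+/, "", $1)      # strip leading indentation from the label
        gsub(/[ \t]+$/, "", $2)      # strip trailing whitespace from the value
        print $1 ": " $2
    }'
}

# Typical use on a live system (needs root):
#   sudo hdparm -I /dev/sdX | drive_identity
```

Comparing the printed serial number with the label on the physical drive is a simple way to confirm the device letter before running anything destructive. The firmware revision line is also the value to compare against whatever the vendor site lists.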

extrafuzzyllama · May 4, 2011

Oh man, that is a long time. Lol.
Will the HDD be fine after being on for 30-40 hrs?

drescherjm · May 4, 2011

Hard drives can handle the continuous reading and writing for much longer than that. However this is a good stress test. You can cut the time in half with the command I posted in my last reply. And as I said in the first reply look at the SMART data after the badblocks has finished.

Here is a script that I use to look at devices in my mdadm arrays.

https://github.com/drescherjm/jmdgentoooverlay/raw/master/Other/shell-scripts/examine_mdraid.sh

And here is an example output:

Code:

datastore2 shell-scripts # examine_mdraid.sh
sda     Serial Number:      3QK086XB
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       73
195 Hardware_ECC_Recovered  0x001a   034   017   000    Old_age   Always       -       227900446
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

sdb     Serial Number:      3QK07S9A
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
195 Hardware_ECC_Recovered  0x001a   032   019   000    Old_age   Always       -       142708189
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

sdc     Serial Number:      3QD0Q9VS
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   061   051   000    Old_age   Always       -       219335794
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       2

sdd     Serial Number:      5QK0AEX8
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   032   030   000    Old_age   Always       -       17051703
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

sdf     Serial Number:      3QD0Q1JF
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   062   056   000    Old_age   Always       -       230992581
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

sdg     Serial Number:      3QK00P56
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   042   030   000    Old_age   Always       -       127051287
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

sdh     Serial Number:      3QK09QDS
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   026   012   000    Old_age   Always       -       45597342
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

sdi     Serial Number:      9QK0XTCQ
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
195 Hardware_ECC_Recovered  0x001a   033   020   000    Old_age   Always       -       76658567
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

Notes: These are Seagate drives, so the Hardware_ECC_Recovered value is bogus. It's actually masked, but I did not bother to decode it in this example. These are the SMART parameters that I consider the most important.

I plan to kick /dev/sda out of the array and test it further after adding two more disks that have just finished their badblocks test successfully. Note that I am not so concerned with the other drives that have 6 or so bad sectors. I am also not extremely concerned with /dev/sda, since the count of 73 has been pretty stable for a few months. A second drive I kicked from the array this week started at 4 bad sectors and went to over 200 in less than 3 weeks. That one I tested rigorously with badblocks, and it ended up recording 5000 SATA events along with over 200 unrecoverable sectors. I was able to get that drive to fail SeaTools testing at that point and I have it out on RMA.
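
As an illustration of how that table can be read mechanically, here is a minimal sketch that lists attributes 5, 197, 198 and 199 when their raw value is nonzero. It assumes the standard smartctl -A column layout shown in the example output, with the raw value in the tenth column. flag_smart_attrs is a hypothetical helper name and is not part of smartmontools or of examine_mdraid.sh. Attribute 195 is left out because its raw value is described above as bogus on Seagate drives.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A /dev/sdX` output on stdin and prints
# the name and raw value of selected attributes whose raw count is above zero.
flag_smart_attrs() {
    awk '($1 == 5 || $1 == 197 || $1 == 198 || $1 == 199) && $10 + 0 > 0 {
        print $2 " raw=" $10
    }'
}

# Typical use (needs root):
#   sudo smartctl -A /dev/sdX | flag_smart_attrs
```

A nonzero count is not automatically a failure (the 73 on sda has been stable), so the output is better treated as a list of things to watch over time than as a verdict.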

extrafuzzyllama · May 4, 2011

OK, thanks for the help.

When running these tests I must connect the HDD via SATA, right?

drescherjm · May 4, 2011

extrafuzzyllama said:
When running these tests I must connect the HDD via SATA, right?

Yes. You can however do many in parallel. For example, on a new RAID test I have run 12 simultaneous instances of badblocks over a weekend.
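
Running several instances at once can be scripted instead of typed into separate windows. This is a minimal sketch under stated assumptions: the wrapper name run_parallel_badblocks, the DRY_RUN switch and the log file names are all made up for illustration, and the device names are placeholders. It defaults to a dry run that only prints the commands, because the real run destroys all data on every listed drive.

```shell
#!/bin/sh
# Hypothetical wrapper: start one badblocks run per device in the background,
# each with its own log file, then wait for all of them to finish.
# With DRY_RUN=1 (the default here) it only prints what it would run.
DRY_RUN=${DRY_RUN:-1}

run_parallel_badblocks() {
    for dev in "$@"; do
        log="badblocks-$(basename "$dev").log"
        if [ "$DRY_RUN" = 1 ]; then
            echo "badblocks -wsv $dev > $log 2>&1 &"
        else
            # DESTRUCTIVE: overwrites the whole device
            badblocks -wsv "$dev" > "$log" 2>&1 &
        fi
    done
    wait   # returns once every background job has exited
}

# Dry run, prints the two commands without touching any disk:
#   run_parallel_badblocks /dev/sdX /dev/sdY
```

Setting DRY_RUN=0 before the call would start the real tests, so the device list is worth checking twice first.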

extrafuzzyllama · May 6, 2011

Is there any way to run badblocks on one HDD and speed up the test?

drescherjm · May 6, 2011

Do fewer passes. However, 1 drive versus many should not make a difference unless you have a slow machine that cannot keep up with the bandwidth of the drives being tested. For me, I believe the first 8 drives did not have any negative performance effect on the others.

extrafuzzyllama · May 7, 2011

I am lost.
I installed the 3TB hard drive via SATA
and loaded up the Fedora live USB, and I don't see the HDD being detected in my Computer window.
The BIOS sees the HDD.
I need some assistance please.
And how would I unmount the HDD once detected?
And I really only want to run 1 read/write pass.

jbraband · May 8, 2011

For reference: I just ran badblocks -wsv /dev/sdb (4 pass) on a new Hitachi 2TB 5K3000 and the final elapsed time was 44:36:12.

With the procedure tested, I now get to run 4 more 2-day tests. Thanks to drescherjm for the note about running in parallel; that saved me 6 days.
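
The elapsed time reported here is consistent with simple arithmetic: each of the four -w patterns is one full write plus one full read of the disk, so a 4 pass run moves about eight times the drive's capacity. This is a rough sketch, not a benchmark. The function name estimate_hours is made up, and the 100 MB/s average throughput is an assumption chosen to roughly match the 2 TB result, not a measured figure.

```shell
#!/bin/sh
# Rough estimate: every -w pattern is one full write plus one full read.
# Arguments: size in GB (decimal), assumed average throughput in MB/s,
# number of test patterns.
estimate_hours() {
    size_gb=$1; mb_per_s=$2; patterns=$3
    total_mb=$(( size_gb * 1000 * 2 * patterns ))
    echo $(( total_mb / mb_per_s / 3600 ))   # whole hours, rounded down
}

estimate_hours 2000 100 4   # 2 TB, assumed 100 MB/s, 4 patterns
estimate_hours 3000 100 4   # 3 TB under the same assumption
```

Under that assumption the script prints 44 for the 2 TB drive and 66 for a 3 TB drive. A faster drive would finish sooner, but a 3 TB drive would need a clearly higher average speed to land in the 30 to 40 hour range.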

drescherjm · May 8, 2011

extrafuzzyllama said:
I am lost.
I installed the 3TB hard drive via SATA
and loaded up the Fedora live USB, and I don't see the HDD being detected in my Computer window.
The BIOS sees the HDD.
I need some assistance please.

You need to open a shell window (press Ctrl-Alt-F1 or open a GNOME Terminal or Konsole window).

Then type

fdisk -l /dev/sd?

and possibly

hdparm -I /dev/sdX

or

smartctl --all /dev/sdX

where X is a device (a, b, c ...)

to see what drives are in your system and make sure you have the correct one.

You may need to put sudo before each command, because by default these will probably not be available to whatever user is logged in to the GUI.

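Then after you determine what drive you want

badblocks -wsv /dev/sdX

to do a 4 pass read write badblocks test.

To reduce the number of passes, name the test patterns yourself with

-t pattern

(for example -t 0xaa for a single pass); by default -w runs four patterns. The -p # parameter does not reduce the passes; it repeats the scan until # consecutive scans find no new bad blocks.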

To test more than 1 drive at a time either open more shell windows or use the screen command. If you type screen in a shell (and screen is installed) you can create many shell sessions that you can switch between.

ctl-a ctl-c

will create a new shell session

ctl-a ctl-n

will switch to the next shell session

For beginners it's probably best to just open more windows.

EDIT:
Also remember that badblocks -wsv will overwrite your disk, destroying all data on it, so do not run it on a drive that has valuable data.
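
One cheap safeguard against wiping the wrong disk is to refuse to run when the target shows up in the mount table. This is a minimal sketch, assuming a Linux system where /proc/mounts lists mounted devices in the first column. The function name device_is_unmounted is hypothetical, and the check does not detect drives that belong to an md array or an LVM volume group, so it is no substitute for confirming the serial number.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if no partition of the given device appears
# in the mounts table. The second argument defaults to /proc/mounts (Linux).
device_is_unmounted() {
    dev=$1; mounts=${2:-/proc/mounts}
    # The first field of each line is the device, e.g. /dev/sdb1
    if awk -v d="$dev" 'index($1, d) == 1 { hit = 1 }
                        END { if (hit) exit 0; exit 1 }' "$mounts"; then
        return 1   # at least one mount uses this device
    fi
    return 0
}

# Typical use before a destructive run:
#   device_is_unmounted /dev/sdX && sudo badblocks -wsv /dev/sdX
```

The match is a simple prefix test, so /dev/sdb also covers /dev/sdb1, /dev/sdb2 and so on.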

extrafuzzyllama · May 8, 2011

Once the test is done, if bad blocks are found, what do I do next? And if none are found, can I then format the drive to ext4 or another file system?
