Problematic drive: bad magic number invalid superblock, I/O errors, just couple of bad sectors, SW that tries to read sectors constantly?

postcd · Sep 3, 2020

This issue has a solution described a few posts below!

Hello,
the Linux system drive (HDD) has two bad blocks (or bad parts within them). HD Tune and MiniTool Partition Wizard reported them, and SMART shows 2 pending sectors and 4 offline uncorrectable.
If I understand correctly, I may want some software to detect the unreadable/unwritable sectors and keep accessing them until they are marked offline uncorrectable, so that the OS skips them?
Or do I not need to care, because Windows and Linux automatically use a different sector when a write to one fails?
So what do I need to do, other than ideally replacing the drive?

The drive shows weird symptoms:

dmesg

Code:

[  580.537308] print_req_error: critical medium error, dev sdc, sector 261314564
[  580.537311] Buffer I/O error on dev sdc2, logical block 4, async page read
[  582.378165] sd 11:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  582.378173] sd 11:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
[  582.378175] sd 11:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
[  582.378178] sd 11:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 0f 93 58 04 00 00 01 00
[  582.378180] print_req_error: critical medium error, dev sdc, sector 261314564
[  582.378182] Buffer I/O error on dev sdc2, logical block 4, async page read
[  584.270037] sd 11:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  584.270045] sd 11:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
[  584.270048] sd 11:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
[  584.270052] sd 11:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 0f 93 58 04 00 00 01 00
[  584.270054] print_req_error: critical medium error, dev sdc, sector 261314564
[  584.270057] Buffer I/O error on dev sdc2, logical block 4, async page read

I already tried replacing the cable and switching the BIOS SATA mode between IDE and AHCI. The other drive, with Windows, works fine. What is strange is that the HDD LED stays lit nonstop with this drive, and when I boot a live system from a flash drive, that system tries to access the faulty HDD, which stops the live system from booting. I did manage to boot the system from the faulty HDD, but there were I/O errors and the LED stayed lit nonstop.
In Linux recovery mode, it found an unexpected inconsistency and required me to run fsck manually, which failed: details.

fsck /dev/sdc

Code:

fsck from util-linux 2.29.2
e2fsck 1.43.4 (31-Jan-2017)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/sdc

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
or
    e2fsck -b 32768 <device>

Found a dos partition table in /dev/sdc

I found a command that once helped me fix similar superblock errors:
e2fsck -f -y -v -C 0 /dev/sdc3

I ran GParted, which showed "Input/output error during read on /dev/sdc".

It still opened after I ignored these messages. I selected only the ext* filesystem partition on that drive and clicked to check it. The result showed only filesystem errors and no bad blocks.

smartctl -a /dev/sdd

Code:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2217
  3 Spin_Up_Time            0x0027   142   133   021    Pre-fail  Always       -       3866
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1798
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       19154
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1766
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       144
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1653
194 Temperature_Celsius     0x0022   100   097   000    Old_age   Always       -       43
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       4
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       4

toast0 · Sep 6, 2020

There seems to be a bit of confusion about the partition number. Your errors are on sdc2, but you're talking about sdc3, and you ran fsck on the whole drive, then you gave us smartctl output for sdd.

Do you care about the data on the drive right now? If so, best thing to do is use ddrescue to make an image of it, as best it can before you mess with anything else.

If you want to give the drive another chance, you need to write to the sectors it can't read anymore. If it was a fluke, a fresh write will be good. Otherwise, a fresh write might convince it to reallocate the sector. If the drive is going to be a pain, a fresh write will succeed, but that sector still won't be readable. At that point, I would partition around it, if you really can't afford a new drive. Either way, this is a good reminder that you need an effective backup plan, because hard drives fail. If you actively monitor SMART and have a data safety plan, I'm pretty OK with a couple of bad sectors. When I ran HDDs in storage servers, we would run them to 100 reallocated sectors before getting them replaced. For personal use, I would replace at around 10, because I don't have good monitoring, a great safety plan, or ready spares.
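For the monitoring part, the three counters that track bad sectors are attributes 5, 197 and 198. Here is a minimal sketch for pulling out their raw values, assuming the table layout smartctl printed in the first post (ID in the first column, raw value in the tenth). The function name smart_sector_counts is made up for illustration, and the sample rows are copied from that post rather than read from a live drive.

```shell
# Print "NAME=RAW_VALUE" for the three attributes that track bad sectors.
# Input: the attribute table as printed by smartctl (ID in column 1, raw value in column 10).
smart_sector_counts() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Sample rows copied from the first post.
# On a live system you could pipe in: sudo smartctl -A /dev/sdX
sample='  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       4'

printf '%s\n' "$sample" | smart_sector_counts
```

Running this from cron and comparing against the previous values would be one simple way to notice when the counts start climbing.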

It seems that at least one of the bad sectors is right near the front of partition sdc2, so it would be pretty easy to write to that, with something like dd bs=512 (or 4096) count=1 seek=2 ... But writing to partitions like that is dangerous to your data, especially if I got the parameters wrong.
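Since a wrong parameter here overwrites the wrong sector, it may be worth rehearsing on a scratch file first. With dd, seek= sets the offset into the output and skip= sets the offset into the input, so a write to a given sector uses seek= and a read of it uses skip=. The sketch below is only a rehearsal on a temporary file under that assumption, with an 8-sector image and target sector 4 chosen arbitrarily. It never touches a real device.

```shell
# Practice on a scratch file, not on the real disk, until the numbers are confirmed.
img=$(mktemp)

# Build an 8-sector (4096-byte) image filled with 0xFF so that a zeroed sector stands out.
dd if=/dev/zero bs=512 count=8 2>/dev/null | LC_ALL=C tr '\000' '\377' > "$img"

# seek= counts blocks in the OUTPUT; skip= counts blocks in the INPUT.
# conv=notrunc keeps the rest of a regular file intact.
dd if=/dev/zero of="$img" bs=512 count=1 seek=4 conv=notrunc 2>/dev/null

# Read sector 4 back and count its zero bytes, then confirm sector 3 is untouched.
zeroed=$(dd if="$img" bs=512 count=1 skip=4 2>/dev/null | LC_ALL=C tr -d '\377' | wc -c | tr -d ' ')
untouched=$(dd if="$img" bs=512 count=1 skip=3 2>/dev/null | LC_ALL=C tr -d '\000' | wc -c | tr -d ' ')
size=$(wc -c < "$img" | tr -d ' ')

echo "sector 4 zero bytes: $zeroed, sector 3 0xFF bytes: $untouched, image size: $size"
rm -f "$img"
```

If the counts come out as one full sector each and the image size is unchanged, the same bs/seek pattern should land on the intended sector of a real device.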

postcd · Sep 7, 2020

toast0 said:
There seems to be a bit of confusion.

Sorry, I was playing with the drive, rebooting, and attaching it over USB, which is why I wrote different sdX names, but it is still one drive.

toast0 said:
Do you care about the data

Thanks for the reminder. Yes, I have made a backup.

toast0 said:
If you want to give the drive another chance, you need to write to the sectors it can't read anymore. ... It seems that at least one of the bad sectors is right near the front of partition sdc2 ... something like dd bs=512 (or 4096) count=1 seek=2 ... that is dangerous

Yes, I want to learn how to find bad sectors and have them properly marked as either good or uncorrectable, so that the OS (Windows and Linux) does not touch these sectors during installation and use.
dd is not a very beginner-friendly utility, and I need to know the exact location of the problematic sectors, which I do not. Can you suggest how to find the exact location/numbers, and explain the dd parameters you are using so that dd writes to that sector (after I have made a drive backup using, for example, ddrescue; thanks for the suggestion)? Or do you know another tool that does this automatically, finding the problem sectors and writing to them (I am aware this means possible data loss)?

toast0 said:
At that point, I would partition around it

You mean this is again done with "parted", by manually typing a command with the right parameters, including sector numbers? Is there no GParted feature or other utility that is more beginner-friendly or made for that purpose?

toast0 · Sep 7, 2020

postcd said:
can you suggest how to exactly learn the location/numbers and explain the dd parameter you are using so the dd write to that sector

postcd said:
[ 582.378180] print_req_error: critical medium error, dev sdc, sector 261314564
[ 582.378182] Buffer I/O error on dev sdc2, logical block 4, async page read

The log is helpfully giving you the sector number, both from the start of the drive (261314564) and from the start of the partition (sdc2, block 4, not block 2 like I typed earlier). It's up to you to confirm whether sectors/blocks start at zero or one (I think they start at zero), and what size your sectors are.
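The same number can be cross-checked against the CDB line in the log. In a Read(10) command, bytes 2 to 5 are the big-endian LBA and bytes 7 and 8 are the transfer length. Here is a rough sketch of that arithmetic, assuming 512-byte logical sectors and assuming "logical block 4" is counted in 512-byte units from the start of sdc2. Both assumptions need confirming on the actual system, for example with cat /sys/block/sdc/sdc2/start.

```shell
# Read(10) CDB from the log: 28 00 0f 93 58 04 00 00 01 00
# Byte 0 is the opcode, bytes 2-5 the big-endian LBA, bytes 7-8 the transfer length.
cdb_lba=$(printf '%d' 0x0f935804)
cdb_len=$(printf '%d' 0x0001)

# ASSUMPTION: 512-byte logical sectors, and the kernel's "logical block 4"
# counted in 512-byte units from the start of the partition.
failing_sector=261314564
block_in_partition=4
partition_start=$((failing_sector - block_in_partition))

# A partition start that is a multiple of 2048 sectors is 1 MiB aligned,
# which is what most partitioning tools produce.
aligned=$((partition_start % 2048 == 0))

echo "LBA from CDB:       $cdb_lba"
echo "sectors requested:  $cdb_len"
echo "implied sdc2 start: $partition_start"
echo "1 MiB aligned:      $aligned"
```

If the implied start matches what the kernel reports for the partition, the 512-byte assumption holds and block 4 really is the fifth sector of sdc2, counting from zero.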

You could do dd if=/dev/sdc2 of=/dev/null bs=512 count=1 skip=4 and see if it retriggers that error; that would be a good confirmation. You can use a GUI partition tool if you want, but I don't think they have easy ways to tell them to skip sectors. There is a badblocks tool that might help. I'm not totally sure; its man page/documentation should be helpful.
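If it does come to partitioning around a bad spot, the manual part is mostly arithmetic: end one partition before the bad range and start the next one after it, on aligned boundaries. Here is a small sketch of that calculation, assuming 512-byte sectors and the usual 2048-sector (1 MiB) alignment. The helper name gap_around is hypothetical, and the example uses the single failing sector from the log.

```shell
# Print the last usable sector before a bad range and the first aligned sector after it.
# $1 = first bad sector, $2 = last bad sector,
# $3 = alignment in sectors (default 2048, i.e. 1 MiB at 512 bytes per sector).
# Note: a bad range inside the first alignment unit yields -1 (no room before it).
gap_around() {
    ga_first=$1; ga_last=$2; ga_align=${3:-2048}
    ga_end_before=$(( ga_first / ga_align * ga_align - 1 ))
    ga_start_after=$(( (ga_last + ga_align) / ga_align * ga_align ))
    echo "$ga_end_before $ga_start_after"
}

# Example: the single failing sector reported in dmesg.
gap_around 261314564 261314564
```

The two numbers printed are the end sector for the partition before the gap and the start sector for the partition after it. Leaving a larger margin around the bad area is a reasonable precaution, since neighbouring sectors often fail next.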

Hope these help you get started.

drescherjm · Sep 8, 2020

I would do a 4-pass badblocks data-destructive test:

badblocks -wsv device

Caution: badblocks will not care if the drive you choose has data on it. It will blindly do as you tell it (to overwrite every single sector with a pattern). Make absolutely sure you have the correct drive and you don't care about data existing on the drive.

postcd · Sep 26, 2020

Hi, I think this issue is SOLVED now, thanks to your suggestions. This is what I did on Linux (if you don't have it installed, try a live bootable USB Linux):

- sdX is my faulty drive; replace it with yours, see "lsblk" or "fdisk -l"
- Do this at your own risk, and back up first with "ddrescue" or other software if the drive contains precious data not backed up elsewhere.

$ sudo badblocks -vs /dev/sdX > 160GB_badblocks_bad_sectors
$ cat 160GB_badblocks_bad_sectors

83416588
83416589
83416590
83416591
130657280
130657281
130657282
130657283

$ smartctl -a /dev/sdX

...showed 2 pending sectors, 4 offline uncorrectable...

$ for block in $(cat 160GB_badblocks_bad_sectors);do sudo dd if=/dev/zero of=/dev/sdX bs=1024 count=1 seek=$block;done
(bs=1024 is necessary because that is the default block size badblocks uses for the numbers it reports)
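Because each of these commands overwrites data, it can help to print them first and read them over before running anything. Below is a dry-run sketch along those lines. The function name print_dd_commands is made up for illustration, /dev/sdX is a placeholder, and it assumes the list file holds one block number per line in badblocks' default 1024-byte units.

```shell
# Print (do not run) one dd command per bad block listed by badblocks.
# $1 = target device, $2 = file with one block number per line,
# $3 = block size in bytes (default 1024, the badblocks default).
print_dd_commands() {
    pd_dev=$1; pd_list=$2; pd_bs=${3:-1024}
    while IFS= read -r pd_block; do
        case $pd_block in
            ''|*[!0-9]*) continue ;;   # skip blank or non-numeric lines
        esac
        echo "dd if=/dev/zero of=$pd_dev bs=$pd_bs count=1 seek=$pd_block"
    done < "$pd_list"
}

# Example with two of the block numbers above and a placeholder device name.
tmp_list=$(mktemp)
printf '%s\n' 83416588 130657280 > "$tmp_list"
print_dd_commands /dev/sdX "$tmp_list"
rm -f "$tmp_list"
```

Once the printed lines look right, they can be run one at a time with sudo. If badblocks was run with a different -b value, the same value has to be passed as the third argument.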

1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 4.59708 s, 0.2 kB/s
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.0149379 s, 68.6 kB/s
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00450658 s, 227 kB/s
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00585185 s, 175 kB/s
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00143635 s, 713 kB/s
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000484465 s, 2.1 MB/s
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.0451608 s, 22.7 kB/s
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00502066 s, 204 kB/s

After that, "smartctl -a /dev/sdX" shows 0 pending sectors and 4 uncorrectable, so it looks like the writes succeeded and the 2 pending sectors were marked OK.

I then ran a short drive self-test: sudo smartctl -t short /dev/sdX
After 2 minutes, the "sudo smartctl -a /dev/sdX" output showed that this test found no read errors, unlike the tests I had run before writing to the sectors using "dd":

Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error        00%            19684  -
# 2  Extended offline    Completed: read failure        90%            19684  166833180
# 3  Short offline       Completed: read failure        90%            19684  166833180
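The LBA in that log can be tied back to the badblocks list with a little arithmetic. SMART and dmesg count in sectors, while badblocks counts in 1024-byte blocks by default. Here is a small sketch of the conversion, assuming 512-byte logical sectors. The helper name lba_to_badblock is made up for illustration.

```shell
# Convert a sector number (LBA) to the block number badblocks would report.
# $1 = LBA, $2 = sector size in bytes (default 512),
# $3 = badblocks block size in bytes (default 1024).
lba_to_badblock() {
    echo $(( $1 * ${2:-512} / ${3:-1024} ))
}

lba_to_badblock 166833180   # LBA_of_first_error from the SMART self-test log
lba_to_badblock 261314564   # sector from the dmesg medium error in the first post
```

The two results are 83416590 and 130657282, both of which appear in the badblocks list above, so the self-test, the kernel log and badblocks all seem to point at the same two spots on the disk.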

full SMART:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2449
  3 Spin_Up_Time            0x0027   137   133   021    Pre-fail  Always       -       4108
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1817
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       19684
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1768
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       145
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1671
194 Temperature_Celsius     0x0022   105   095   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       4
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       4

The drive now boots the OS without any apparent issue, and dmesg does not show any related errors. Thank you for helping solve this. Feedback welcome.
