Reconciling smartctl with hdrecover

I started getting a few log messages about sectors which could not be written on an older hard drive, so I investigated with smartctl.
The short test produced zero errors, but the long test stopped 30% of the way through with an unreadable sector.
After reading about the issue I found a reference to hdrecover, which can automatically fix sector issues.
I ran this programme and it did find a number of bad sectors; it was able to fix some of them, and others had to be zeroed out.
Repeated runs of hdrecover produce no new errors; all looks clear.
I have also run fsck on the drive from the rescue disk, and that runs clear too.
The disk is not in a critical role so I am leaving it in place, more for the sake of education than anything else.

The issue is that although hdrecover now reports zero errors/bad sectors, smartctl is not in step with the situation: the long test still stops at 30% done.

# smartctl -l selftest /dev/sda
smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (SUSE RPM)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       70%     43232         22513253
# 2  Extended offline    Completed: read failure       70%     43231         22513253
# 3  Extended offline    Completed: read failure       70%     43217         22511097
# 4  Short offline       Completed without error       00%     43217         -
# 5  Extended offline    Completed: read failure       70%     43206         22511082
# 6  Short offline       Completed without error       00%     43206         -
# 7  Extended offline    Completed: read failure       70%     43205         22511082

The LBA seems to change, as if smartctl can sometimes get a step further each time it runs, but not always.
It would be nice to have smartctl not see any errors, in tune with hdrecover.
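
One way to check whether the failing LBA really moves between runs is to pull the LBA_of_first_error column out of the self-test log. What follows is only a minimal Python sketch and not part of smartmontools: the function name failed_lbas is made up for illustration, and the row layout is assumed to match the smartctl 5.40 output shown above.

```python
import re

# Sample rows copied from the `smartctl -l selftest /dev/sda` output above.
SELFTEST_LOG = """\
# 1  Extended offline    Completed: read failure       70%     43232         22513253
# 2  Extended offline    Completed: read failure       70%     43231         22513253
# 3  Extended offline    Completed: read failure       70%     43217         22511097
# 4  Short offline       Completed without error       00%     43217         -
# 5  Extended offline    Completed: read failure       70%     43206         22511082
# 6  Short offline       Completed without error       00%     43206         -
# 7  Extended offline    Completed: read failure       70%     43205         22511082
"""

# Assumed row layout: "# N  <description>  <status>  NN%  <hours>  <LBA or ->"
ROW = re.compile(r"^#\s*(\d+)\s+.*?\s(\d+)%\s+(\d+)\s+(\d+|-)\s*$")


def failed_lbas(log_text):
    """Return (power_on_hours, lba) for every test that recorded a failing LBA."""
    results = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match and match.group(4) != "-":
            results.append((int(match.group(3)), int(match.group(4))))
    return results


if __name__ == "__main__":
    # Print oldest first so any drift in the failing LBA is easy to see.
    for hours, lba in sorted(failed_lbas(SELFTEST_LOG)):
        print(hours, lba)
```

On the log above this yields three distinct LBAs across the five failed extended tests, which fits the impression that the first unreadable sector moves forward between some runs.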

Your hard disk is flaky/dying.
This cannot be fixed by software.

Replace it with a new one.

That said, a hard disk does have some spare sectors that bad ones can be remapped to, but their number is limited.
In your case this space seems to be exhausted, but you still have a lot of bad sectors or are even getting new ones.

The output of “smartctl -HA /dev/sda” should show you how many bad sectors have been re-allocated already.
Or use “smartctl --all /dev/sda” to show all information.

Oh, and you have to run long selftests manually.

smartctl --test=long /dev/sda

If you run “smartctl -l selftest” it only shows the results of previous self-tests. That won’t change of course, even if all your bad sectors have been “fixed” by remapping them now.

Indeed the drive is dying; I can, however, get some educational value from it as it teeters.
I have been running the long tests as smartctl -t long /dev/sda, which I assume is equivalent to your suggested command.
The status is:

# smartctl -HA /dev/sda
smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (SUSE RPM)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0029   100   253   020    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   081   080   020    Pre-fail  Always       -       2496
  4 Start_Stop_Count        0x0032   099   099   008    Old_age   Always       -       1083
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   001   023    Pre-fail  Always   In_the_past 0
  9 Power_On_Hours          0x0012   035   035   001    Old_age   Always       -       43234
 10 Spin_Retry_Count        0x0026   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   008    Old_age   Always       -       869
 13 Read_Soft_Error_Rate    0x000b   100   077   023    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   088   079   042    Old_age   Always       -       32
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       378251
196 Reallocated_Event_Count 0x0010   100   099   020    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   099   099   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x001a   200   200   000    Old_age   Always       -       0

If I’m reading this correctly, item 5 (Reallocated_Sector_Ct) indicates that there was a stock of 100 spare sectors and that all of them have been used up.

For now I am going to ignore smartctl and occasionally repeat hdrecover to see if and when new bad sectors develop, just to see what happens.
Thanks for your suggestions.

No.
Actually NO sector has been reallocated yet. You have to look at the RAW_VALUE column, which is 0 for item 5.

And the “Offline_Uncorrectable” value (which is 1 in your case) means that there is indeed a bad sector that was found during the long (“offline”) self-test but hasn’t been reallocated/remapped yet.
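
To make the difference between the normalized VALUE column (a health score where higher is better) and the RAW_VALUE column (the actual count) concrete, here is a small Python sketch that reads a few rows of the attribute table posted above. It is an illustration only: parse_attributes is a made-up helper, and it assumes the ten-column layout printed by this smartctl version, with a plain integer in RAW_VALUE.

```python
# A few rows copied from the `smartctl -HA /dev/sda` output above.
ATTRIBUTES = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   001   023    Pre-fail  Always   In_the_past 0
196 Reallocated_Event_Count 0x0010   100   099   020    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   099   099   000    Old_age   Offline      -       1
"""


def parse_attributes(table_text):
    """Map attribute ID -> dict with name, normalized value, worst, threshold, raw."""
    attributes = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Assumed layout: 10 whitespace-separated columns, ID first, RAW_VALUE last.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attributes[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),   # normalized score, not a sector count
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": int(fields[9]),     # assumes a plain integer raw value
        }
    return attributes


if __name__ == "__main__":
    attrs = parse_attributes(ATTRIBUTES)
    # The sector-related counters: the raw column carries the real numbers.
    for attr_id in (5, 196, 197, 198):
        a = attrs[attr_id]
        print(a["name"], "normalized:", a["value"], "raw:", a["raw"])
```

Read this way, item 5 has a normalized value of 100 but a raw count of 0, and only item 198 carries a non-zero raw count. The same table also shows why Seek_Error_Rate is flagged In_the_past: its WORST value once dropped below its THRESH.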

Sectors are only remapped on write access. So running “badblocks -n /dev/sda1” or similar should remap bad sectors.
See “man badblocks”:

      -n     Use non-destructive read-write mode. By default only a non-destructive
              read-only test is done. This option must not be combined with the -w
              option, as they are mutually exclusive.

Depending on the file system you use, you should also be able to specify a list of bad blocks that are not to be used for data.
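
File system tools generally want block numbers relative to the partition, not the whole-disk LBA that smartctl reports, so the LBA has to be converted first. The arithmetic commonly given for this is block = (LBA - partition start sector) * 512 / file system block size, rounded down. The Python sketch below is an illustration only: the partition start (63) and block size (4096) are placeholders that do NOT come from this thread, and the real values would have to be read from the partition table and the file system itself.

```python
SECTOR_SIZE = 512  # bytes per LBA sector; assumed, typical for a drive of this age


def lba_to_fs_block(lba, partition_start, fs_block_size):
    """Convert a whole-disk LBA to a block number inside a partition's file system."""
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    # Integer division rounds down to the block that contains the sector.
    return (lba - partition_start) * SECTOR_SIZE // fs_block_size


if __name__ == "__main__":
    # Both values below are placeholders, NOT taken from the thread.
    partition_start = 63
    fs_block_size = 4096
    # LBAs reported in the self-test log earlier in the thread.
    for lba in (22511082, 22511097, 22513253):
        print(lba, "->", lba_to_fs_block(lba, partition_start, fs_block_size))
```

With 4096-byte blocks, eight consecutive sectors fall into the same file system block, so several failing LBAs can end up as a single entry in a bad block list.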

But as the actual bad sectors seem to fluctuate, you probably will have problems again anyway.

On Tue 11 Nov 2014 01:16:01 PM CST, colbec wrote:

Indeed the drive is dying, I can however get some educational value from
it as it teeters.
I have been running the long tests as smartctl -t long /dev/sda, which I
assume is equivalent to your suggested command.
The status is:

smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (SUSE RPM) <====

Hi
Also note that the version you're using, on an old release of openSUSE (10.?), may
not have the features of newer versions of smartctl…


Cheers Malcolm °¿° LFCS, SUSE Knowledge Partner (Linux Counter #276890)
SUSE Linux Enterprise Desktop 12 GNOME 3.10.1 Kernel 3.12.28-4-default

Indeed you are right, it is openSUSE 11.4, but the drive itself predates the software by a large number of years, so it may lack some of the hardware sophistication that even that version of smartctl is able to leverage.

It’s all a good learning experience as I work on building the replacement drive and its functionality.