I started getting a few log messages about sectors which could not be written on an older hard drive, so I investigated with smartctl.
The short test produced zero errors, but the long test stopped at 30% done, reporting an unreadable sector.
After reading about the issue I found a reference to hdrecover which can automatically fix sector issues.
I ran this programme and indeed it found a number of bad sectors. It was able to fix some of them; others had to be zeroed out.
Repeated runs of hdrecover do not produce new errors; all looks clear.
I have run fsck on the drive using the rescue disk, and that also comes back clean.
The disk is not in a critical role so I am leaving it in place, more for the sake of education than anything else.
The issue is that although hdrecover now reports zero errors or bad sectors, smartctl is not in step with that: the long test still stops at 30% done.
# smartctl -l selftest /dev/sda
smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (SUSE RPM)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                     Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       70%         43232        22513253
# 2  Extended offline    Completed: read failure       70%         43231        22513253
# 3  Extended offline    Completed: read failure       70%         43217        22511097
# 4  Short offline       Completed without error       00%         43217        -
# 5  Extended offline    Completed: read failure       70%         43206        22511082
# 6  Short offline       Completed without error       00%         43206        -
# 7  Extended offline    Completed: read failure       70%         43205        22511082
The LBA of the first error changes between runs, as if smartctl can resolve one more bad sector each time it runs. But not always.
It would be nice to have smartctl not see any errors, in tune with hdrecover.
Your hard disk is flaky/dying.
This cannot be fixed by software.
Replace it with a new one.
That said, a hard disk does have some spare sectors that bad ones can be remapped to, but their number is limited.
In your case this space seems to be exhausted, yet you still have a lot of bad sectors or are even getting new ones.
The output of “smartctl -HA /dev/sda” should show you how many bad sectors have been re-allocated already.
Or use “smartctl --all /dev/sda” to show all information.
Oh, and you have to run long selftests manually.
smartctl --test=long /dev/sda
If you run “smartctl -l selftest” it only shows the results of previous self-tests. That won’t change of course, even if all your bad sectors have been “fixed” by remapping them now.
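Since that log keeps a history of earlier runs, it can be filtered to compare where each failed test stopped. The following is only a rough sketch: `failed_selftests` is a made-up helper name, and it assumes the log layout shown above, where the last two columns are LifeTime(hours) and LBA_of_first_error.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -l selftest` output on stdin and,
# for each entry that ended in a read failure, print
# "<lifetime hours> <LBA of first error>" (the last two columns).
failed_selftests() {
    awk '/read failure/ { print $(NF-1), $NF }'
}

# Usage (needs root and a real device):
#   smartctl -l selftest /dev/sda | failed_selftests
```

If the same LBA shows up in the newest entries, the drive is still stopping at the same sector; a different LBA means the earlier one was dealt with and the test reached another bad sector.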
Indeed the drive is dying; I can, however, get some educational value from it as it teeters.
I have been running the long tests as smartctl -t long /dev/sda, which I assume is equivalent to your suggested command.
The status is:
# smartctl -HA /dev/sda
smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (SUSE RPM)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
  1 Raw_Read_Error_Rate     0x0029  100   253   020    Pre-fail  Offline  -            0
  3 Spin_Up_Time            0x0027  081   080   020    Pre-fail  Always   -            2496
  4 Start_Stop_Count        0x0032  099   099   008    Old_age   Always   -            1083
  5 Reallocated_Sector_Ct   0x0033  100   100   020    Pre-fail  Always   -            0
  7 Seek_Error_Rate         0x000b  100   001   023    Pre-fail  Always   In_the_past  0
  9 Power_On_Hours          0x0012  035   035   001    Old_age   Always   -            43234
 10 Spin_Retry_Count        0x0026  100   100   000    Old_age   Always   -            0
 11 Calibration_Retry_Count 0x0013  100   100   020    Pre-fail  Always   -            0
 12 Power_Cycle_Count       0x0032  099   099   008    Old_age   Always   -            869
 13 Read_Soft_Error_Rate    0x000b  100   077   023    Pre-fail  Always   -            0
194 Temperature_Celsius     0x0022  088   079   042    Old_age   Always   -            32
195 Hardware_ECC_Recovered  0x001a  100   100   000    Old_age   Always   -            378251
196 Reallocated_Event_Count 0x0010  100   099   020    Old_age   Offline  -            0
197 Current_Pending_Sector  0x0032  100   100   020    Old_age   Always   -            0
198 Offline_Uncorrectable   0x0010  099   099   000    Old_age   Offline  -            1
199 UDMA_CRC_Error_Count    0x001a  200   200   000    Old_age   Always   -            0
If I’m reading this correctly, item 5 reallocated sector count indicates that there was a stock of 100 free sectors and it has used all of them up.
For now I am going to ignore smartctl and occasionally repeat hdrecover to see if and when new bad sectors develop, just to see what happens.
Thanks for your suggestions.
No.
Actually NO sector has been reallocated yet. You’ll have to look at the RAW_VALUE: the 100 under VALUE is a normalized health score, not a count of spare sectors, and the raw value for item 5 is 0.
And the “Offline_Uncorrectable” value (which is 1 in your case) means that there is indeed a bad sector that was found during the long (“offline”) self-test but hasn’t been reallocated/remapped yet.
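To pull just those raw counts out of the attribute table, something like the sketch below might do. `smart_sector_counts` is a made-up name, and it assumes the table layout shown above, with the attribute ID in the first column and RAW_VALUE in the last.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print
# "ATTRIBUTE_NAME RAW_VALUE" for the attributes tied to sector remapping:
#   5   Reallocated_Sector_Ct   (sectors already remapped)
#   197 Current_Pending_Sector  (sectors waiting to be remapped)
#   198 Offline_Uncorrectable   (bad sectors found by offline tests)
smart_sector_counts() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2, $NF }'
}

# Usage (needs root and a real device):
#   smartctl -A /dev/sda | smart_sector_counts
```

Some drives append extra text to raw values, so the last column is not guaranteed to be a plain number on every model.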
Sectors are only remapped on write access, so running “badblocks -n /dev/sda1” or similar on the unmounted partition should remap bad sectors.
See “man badblocks”:
-n Use non-destructive read-write mode. By default only a non-destructive read-only test is done. This option must not be combined with the -w option, as they are mutually exclusive.
Depending on the file system you use, you should also be able to specify a list of bad blocks that are not to be used for data.
But as the actual bad sectors seem to fluctuate, you probably will have problems again anyway.
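If you want to hand the file system a list of bad blocks, the LBA from the self-test log first has to be turned into a file system block number. A minimal sketch of that arithmetic follows, assuming 512-byte sectors; `lba_to_fs_block` is a made-up name, and the partition start (63) and block size (4096) in the usage line are placeholders, not values read from this drive.

```shell
#!/bin/sh
# Hypothetical helper: convert a drive LBA into a file system block number.
# Arguments: LBA, first sector of the partition, file system block size in bytes.
# Assumes 512-byte sectors and an LBA that lies inside the partition.
lba_to_fs_block() {
    lba=$1
    part_start=$2
    block_size=$3
    # Offset into the partition in bytes, divided by the block size
    # (integer division rounds down to the containing block).
    echo $(( (lba - part_start) * 512 / block_size ))
}

# Usage with placeholder partition values:
#   lba_to_fs_block 22513253 63 4096
```

The real partition start can be read from “fdisk -lu /dev/sda”, and the block size from the file system's own tools.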
On Tue 11 Nov 2014 01:16:01 PM CST, colbec wrote:
Indeed the drive is dying, I can however get some educational value from
it as it teeters.
I have been running the long tests as smartctl -t long /dev/sda, which I
assume is equivalent to your suggested command.
The status is:
smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (SUSE RPM) <====
Hi
Also note the version you're using on an old release of openSUSE (10.?) may
not have the features of newer versions of smartctl…
–
Cheers Malcolm °¿° LFCS, SUSE Knowledge Partner (Linux Counter #276890)
SUSE Linux Enterprise Desktop 12 GNOME 3.10.1 Kernel 3.12.28-4-default
Indeed you are right, it is openSUSE 11.4. The drive itself predates the software by many years, though, so it may lack some of the hardware sophistication that even this older smartctl could make use of.
It’s all a good learning experience as I work on building the replacement drive and its functionality.