Reusing a faulty hard disk

Hi all,
I have a hard disk that has had many hardware problems in the past.
I attach the SMART output:

smartctl -a /dev/sda
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.1.10-1.16-default] (SUSE RPM)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Scorpio Blue Serial ATA
Device Model:     WDC WD3200BEVT-22ZCT0
Serial Number:    WD-WXF0A99P7681
LU WWN Device Id: 5 0014ee 20375eb73
Firmware Version: 11.01A11
User Capacity:    320,072,933,376 bytes [320 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 3.0 Gb/s                                                                                                                                                                                             
Local Time is:    Tue Mar 19 08:17:11 2013 EET                                                                                                                                                                                   
SMART support is: Available - device has SMART capability.                                                                                                                                                                       
SMART support is: Enabled                                                                                                                                                                                                        
                                                                                                                                                                                                                                 
=== START OF READ SMART DATA SECTION ===                                                                                                                                                                                         
SMART overall-health self-assessment test result: PASSED                                                                                                                                                                         
                                                                                                                                                                                                                                 
General SMART Values:                                                                                                                                                                                                            
Offline data collection status:  (0x00) Offline data collection activity                                                                                                                                                         
                                        was never started.                                                                                                                                                                       
                                        Auto Offline Data Collection: Disabled.                                                                                                                                                  
Self-test execution status:      (   0) The previous self-test routine completed                                                                                                                                                 
                                        without error or no self-test has ever                                                                                                                                                   
                                        been run.                                                                                                                                                                                
Total time to complete Offline                                                                                                                                                                                                   
data collection:                ( 9960) seconds.                                                                                                                                                                                 
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 118) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   198   198   051    Pre-fail  Always       -       23359
  3 Spin_Up_Time            0x0027   184   183   021    Pre-fail  Always       -       1758
  4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4244
  5 Reallocated_Sector_Ct   0x0033   186   186   140    Pre-fail  Always       -       111
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7379
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3402
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       75
193 Load_Cycle_Count        0x0032   124   124   000    Old_age   Always       -       228553
194 Temperature_Celsius     0x0022   114   074   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   184   184   000    Old_age   Always       -       16
197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       112
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   100   253   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 22858 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 22858 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2c 7f e4 40  Error: WP at LBA = 0x00e47f2c = 14974764

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 10 e8 29 7c 01 00      03:46:29.525  WRITE FPDMA QUEUED
  61 08 08 98 28 7a 01 00      03:46:29.524  WRITE FPDMA QUEUED
  61 08 f0 68 3f 53 06 00      03:46:29.524  WRITE FPDMA QUEUED
  61 08 c8 a8 b4 81 01 00      03:46:29.524  WRITE FPDMA QUEUED
  61 08 b0 b8 29 7c 01 00      03:46:29.524  WRITE FPDMA QUEUED

Error 22857 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2c 7f e4 40  Error: UNC at LBA = 0x00e47f2c = 14974764

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 28 80 08 7f e4 24 00      03:46:25.969  READ FPDMA QUEUED
  60 40 78 ba 80 91 12 00      03:46:25.969  READ FPDMA QUEUED
  60 80 70 60 68 25 0e 00      03:46:25.969  READ FPDMA QUEUED
  61 03 68 60 36 a3 09 00      03:46:25.969  WRITE FPDMA QUEUED
  60 40 60 10 cb e8 01 00      03:46:25.969  READ FPDMA QUEUED

Error 22856 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2c 7f e4 40  Error: WP at LBA = 0x00e47f2c = 14974764

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 18 a0 28 7a 01 00      03:46:22.588  WRITE FPDMA QUEUED
  61 08 10 b0 32 7b 01 00      03:46:22.588  WRITE FPDMA QUEUED
  61 08 08 90 28 7a 01 00      03:46:22.587  WRITE FPDMA QUEUED
  61 08 00 98 28 7a 01 00      03:46:22.587  WRITE FPDMA QUEUED
  61 01 f8 c8 d8 ff 02 00      03:46:22.586  WRITE FPDMA QUEUED

Error 22855 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2c 7f e4 40  Error: UNC at LBA = 0x00e47f2c = 14974764

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 28 80 08 7f e4 24 00      03:46:19.106  READ FPDMA QUEUED
  60 38 78 3a 80 91 12 00      03:46:19.105  READ FPDMA QUEUED
  60 80 70 10 67 25 0e 00      03:46:19.105  READ FPDMA QUEUED
  60 20 68 40 36 dd 01 00      03:46:19.105  READ FPDMA QUEUED
  60 38 60 62 e0 86 01 00      03:46:19.105  READ FPDMA QUEUED

Error 22854 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2c 7f e4 40  Error: UNC at LBA = 0x00e47f2c = 14974764

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 40 0e 07 a7 01 00      03:46:15.868  READ FPDMA QUEUED
  61 08 38 a0 28 7a 01 00      03:46:15.868  WRITE FPDMA QUEUED
  60 1b 30 16 07 a7 01 00      03:46:15.868  READ FPDMA QUEUED
  61 08 28 98 32 7b 01 00      03:46:15.867  WRITE FPDMA QUEUED
  60 28 20 08 7f e4 24 00      03:46:15.867  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       60%      7378         31108065

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I have recovered the disk to a new one with dd_rescue, so the old disk is left unused. I wanted to know if I could do some low-level format and get rid of the bad areas, so I could keep it as a sort of disk for moving files between systems.

I would like to thank you in advance for your help

Regards
Alex

A “Low Level Format” no longer really works on most hard disks and is no better than an ordinary format. The best you can do is visit the manufacturer’s site for this drive and download any specific tools they provide. Many may require Windows, but some may be bootable and load their own OS; you just have to go to their Web site and see what your options are. If you turn on SMART in your PC setup and then partition and format the hard drive, that may give you an idea of which part of the disk is bad, and you could decide to just not use that area. But even that is kind of awkward, given how the drive internally tries to map out bad places on its own. In the end, in my opinion, it is best to get rid of it and move on when you can, but do visit the Web site to see if any other options exist.
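
To get a rough idea of where the known bad spots sit, the two failing LBAs in the posted report can be turned into byte offsets. This is only a sketch in Python: the function names are made up for illustration, the sector size and capacity are copied from the smartctl output above, and it covers just the two addresses the logs happen to show, not all 112 pending sectors.

```python
# Values copied from the smartctl report in the first post.
SECTOR_SIZE = 512                  # "Sector Size: 512 bytes logical/physical"
CAPACITY_BYTES = 320_072_933_376   # "User Capacity"

# The only two failing addresses the report names explicitly.
KNOWN_BAD_LBAS = {
    "ATA error log": 14974764,
    "short self-test": 31108065,
}


def lba_to_offset(lba, sector_size=SECTOR_SIZE):
    """Return the byte offset at which the given LBA starts."""
    return lba * sector_size


def percent_into_disk(lba, capacity=CAPACITY_BYTES, sector_size=SECTOR_SIZE):
    """Return how far into the disk (in percent) the given LBA lies."""
    return 100.0 * lba_to_offset(lba, sector_size) / capacity


if __name__ == "__main__":
    for source, lba in KNOWN_BAD_LBAS.items():
        offset = lba_to_offset(lba)
        print(f"{source}: LBA {lba} starts at byte {offset} "
              f"({offset / 2**30:.2f} GiB, "
              f"{percent_into_disk(lba):.1f}% into the disk)")
```

Both addresses fall within about the first 5% of the disk, but since the other pending sectors are not listed anywhere in the report, leaving that region unpartitioned would not guarantee that the rest is clean.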

Thank You,

On Fri, 02 Aug 2013 20:06:01 +0000, alaios wrote:

> I have recovered the disk to a new one with dd_rescue so the old disk is
> left unused. I wanted to know if I could do some low-level format and
> get rid of the bad areas so I could keep him as sort of disk moving
> files between systems.

I wouldn’t use it for anything critical. It’s been my experience that
when a drive fails, it fails. It doesn’t get better.

Jim


Jim Henderson
openSUSE Forums Administrator
Forum Use Terms & Conditions at http://tinyurl.com/openSUSE-T-C

SponRight is a commercial program that was always pretty good at fixing disks, but its cost is as much as a new drive these days. Soooo…

If there is any method of recovering the drive, you’re most likely to find it by googling “western digital remapping bad blocks”. One of Western Digital’s tools might tell you where the bad blocks are, if that’s the problem.

Looking at the SMART output, I might be inclined to look at the drive’s life specifications. It doesn’t seem to have lasted very long. Manufacturers specify drive life in a number of ways. Power management, things like desktop indexing, and the type of machine use can have a distinct bearing on which life aspect is most relevant. I think I would be complaining to them if any of the figures are significantly short.

John

On 2013-08-02 22:06, alaios wrote:
>
> Hi all,
> I have a hard disk that had in the past many hardware problems.
> I attach you the smart output

> Code:
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 5 Reallocated_Sector_Ct 0x0033 186 186 140 Pre-fail Always - 111
> 9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7379

> 196 Reallocated_Event_Count 0x0032 184 184 000 Old_age Always - 16
> 197 Current_Pending_Sector 0x0032 198 198 000 Old_age Always - 112
> 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0

> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Short offline Completed: read failure 60% 7378 31108065
> --------------------

> I have recovered the disk to a new one with dd_rescue so the old disk
> is left unused. I wanted to know if I could do some low-level format and
> get rid of the bad areas so I could keep him as sort of disk moving
> files between systems.

There is no such thing as “low level format”.

You can use the disk as long as you don’t expect to read the data back.
I.e., write and forget, like storing goods in a storehouse with holes in
the floor and leaks in the roof, where you do not know which shelves are
the dangerous ones.

There are 111 sectors that have been remapped, 112 waiting to be
remapped, and even the short test failed to complete.
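
Those counts come from the raw values of attributes 5 and 197. As a minimal sketch, assuming the table layout shown in the original post (smartctl 6.0), something like the following could pull them out of saved output. The helper names are hypothetical, and the regular expression has only been matched against the rows quoted here.

```python
import re

# Rows copied from the attribute table in the first post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   186   186   140    Pre-fail  Always       -       111
196 Reallocated_Event_Count 0x0032   184   184   000    Old_age   Always       -       16
197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       112
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute ID -> dict of name, value, worst, thresh and raw."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attr_id, name, value, worst, thresh, raw = match.groups()
            attrs[int(attr_id)] = {
                "name": name,
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs


def summarize(attrs):
    """Pick out the three counters that matter for bad sectors."""
    return {
        "reallocated": attrs[5]["raw"],
        "pending": attrs[197]["raw"],
        "offline_uncorrectable": attrs[198]["raw"],
    }


if __name__ == "__main__":
    print(summarize(parse_attributes(SAMPLE)))
```

Newer smartmontools releases (7.0 and later) can emit JSON with --json, which would be more robust than a regular expression; the version in the original post predates that option.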

You might write to the entire surface with dd, and then run the long
test. If even a single sector is still pending (not relocated), you have
to repeat. When there are no more bad sectors, keep doing it for a week.
If there are no more failures and not a single additional bad sector,
then you might risk using the disk.
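
The loop described above could be tracked with a small script. This is a sketch under stated assumptions: one overwrite-plus-long-test pass per day, so “a week” becomes seven consecutive clean passes, and the names PassResult and verdict are invented for the example. The numbers for each pass would be read by hand from smartctl -a after the pass.

```python
from dataclasses import dataclass

# One pass (this destroys all data on the disk; /dev/sdX is a placeholder):
#   dd if=/dev/zero of=/dev/sdX bs=1M
#   smartctl -t long /dev/sdX     # about 118 minutes on this drive per the report
#   smartctl -a /dev/sdX          # read attributes 5 and 197 and the self-test log


@dataclass
class PassResult:
    pending: int         # raw value of attribute 197 after the pass
    reallocated: int     # raw value of attribute 5 after the pass
    long_test_ok: bool   # extended self-test completed without error


def verdict(history, clean_passes_required=7):
    """Decide what to do next from the list of passes so far (oldest first).

    clean_passes_required=7 is an assumption standing in for "a week".
    """
    if not history:
        return "no-data"
    last = history[-1]
    if last.pending > 0 or not last.long_test_ok:
        return "repeat"  # overwrite everything and test again
    # Count trailing clean passes during which no new sector was remapped.
    streak = 0
    for result in reversed(history):
        clean = result.pending == 0 and result.long_test_ok
        if clean and result.reallocated == last.reallocated:
            streak += 1
        else:
            break
    if streak >= clean_passes_required:
        return "usable-with-caution"
    return "keep-testing"


if __name__ == "__main__":
    # The state in the original post: 112 pending, short test failed.
    print(verdict([PassResult(pending=112, reallocated=111, long_test_ok=False)]))
```

Even the best outcome here only means the counters stopped moving for a while, which is why the result is labelled as usable with caution rather than as healthy.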


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

I would beg to differ, except that “low level format” is the wrong term. This gives a good description of what manufacturers actually do and how these manufacturer-refurbished drives come up for sale. I have to wonder whether the manufacturers actually did remap them, though.

How to Fix Bad Sectors on a Hard Drive

I should add that these people have an axe to grind but the generalities are correct.

Not that we can get our hands on the utility and personally I wouldn’t fancy using it on a truly worn out drive. There is also the possibility that something found on the web might completely trash a disc anyway.

John

On 2013-08-03 16:36, John 82 wrote:
>
> I would beg to differ other than the term low level format is the wrong
> one. This gives a good description of what manufacturers actually do and
> how these manufacturer refurbished drives come up for sale. Have to
> wonder if the actual manufacturers actually did remap them though.
>
> ‘How to Fix Bad Sectors on a Hard Drive’ (http://tinyurl.com/nybl8fp)

Interesting.

> I should add that these people have an axe to grind but the
> generalities are correct.

This paragraph is incorrect:

“Now that we understand how a PList and a GList work let’s take a look
and see why a single bad sector on a drive is an extremely bad situation
to ignore. Since we know that all bad sectors are remapped to a pool of
unused sectors, and the size of this pool is substantial, the only way a
bad sector will show up on your system is if the pool has been
completely used. In other words, your remapping pool is so full that it
cannot take another bad sector and now the drive is throwing errors that
it can’t read from a sector.”

The first bad sector to appear will become the first entry in the GList,
because the remap is not done at all if you only read that region. The
remap is done only when writing.

Thus you may have some bad sectors waiting for remap while the GList is
empty.

The situation of the OP is intermediate: he has sectors remapped and
waiting.
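
To make the read/write distinction concrete, here is a toy model in Python. It is deliberately simplified and not how any particular firmware works: ToyDrive and its fields are invented, a failed read only marks the sector as pending, and a write to a bad sector always remaps it to a spare (real drives may first retry the write in place).

```python
class ToyDrive:
    """Toy model of pending vs. remapped sectors (not real firmware)."""

    def __init__(self, bad_sectors, spare_count):
        self.bad = set(bad_sectors)      # physically unreadable LBAs
        self.pending = set()             # seen as bad on read, not yet remapped
        self.glist = {}                  # grown defect list: LBA -> spare slot
        self.spare_count = spare_count   # size of the spare pool

    def read(self, lba):
        """Return True if the read succeeds, False on an uncorrectable error."""
        if lba in self.glist:
            return True                  # served from the spare sector
        if lba in self.bad:
            self.pending.add(lba)        # remembered, but NOT remapped
            return False
        return True

    def write(self, lba):
        """Return True if the write succeeds; bad sectors are remapped here."""
        if lba in self.bad and lba not in self.glist:
            if len(self.glist) >= self.spare_count:
                return False             # spare pool exhausted
            self.glist[lba] = len(self.glist)
            self.pending.discard(lba)
        return True


if __name__ == "__main__":
    drive = ToyDrive(bad_sectors={14974764}, spare_count=4)
    print("read ok:", drive.read(14974764),
          "pending:", len(drive.pending), "glist:", len(drive.glist))
    print("write ok:", drive.write(14974764),
          "pending:", len(drive.pending), "glist:", len(drive.glist))
    print("read ok:", drive.read(14974764))
```

In this model a read error shows up while the GList is still empty, and only the later write moves the sector from pending to remapped, which is the point being made against the quoted paragraph.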

> Not that we can get our hands on the utility and personally I wouldn’t
> fancy using it on a truly worn out drive. There is also the possibility
> that something found on the web might completely trash a disc anyway.

The utility they talk about might not do anything different from dd_rescue in Linux.


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

On Sat, 03 Aug 2013 04:36:03 +0000, gogalthorp wrote:

> SponRight is a commercial program that was always pretty good fixing
> disks but it’s cost is as much as a new drive these days.
> Soooo…

Spinrite - and yeah, I’ve used that way back in the past, but even still,
modern hard drives tend to continue to fail once they start. Especially
consumer drives, which aren’t tested as thoroughly as drives intended for
continuous use.

Some have said that periodically reading all the data can help with “data
alignment” issues, but I’ve not seen evidence that demonstrates that.

Jim



I agree SpinRite was great at reviving things, but once a drive starts throwing bad sectors you had better figure on getting a new one.

+1

As I stated in a discussion back in August 2011:

When bad sectors begin to be detected in an HDD for the first time it means
the surface of the disk is deteriorating. When that happens it most often
happens slowly at first then more and more rapidly as time passes. There is NO
WAY to reverse the process. In my considerable experience the rate at which a
disk’s surface fails is progressively more and more rapid as time passes. One
moment they work, and the next moment they don’t. It’s not a question of “if”,
it’s a question of “when”.

The bottom line: Listen to the advice you have been given in the previous posts.
Don’t screw around analyzing the problem. Get a new hard drive and copy the data
off the old one now… not “tomorrow”, or when you “get a spare moment”, or “as
soon as the kids go back to school”… NOW.

Sorry if that sounds harsh, but I’ve seen a few hundred of these failures, and
I’ve also listened to several dozen customers say “I should have listened to
your advice.”

The deterioration I mention above usually appears as a section of the coating on
the disk begins to detach from the disk, forming a tiny raised bubble on the
surface which the head can not read. With time that bubble will grow, and as it
does the number of bad sectors will increase more and more rapidly as the
diameter of the bubble steadily increases, thereby increasing the area of the
“bad sectors”.

BTW, guess what happens if and when the height of the bubble grows enough to
close the gap between the platter and the drive head. Once that happens the
drive is toast, and any chance to recover data, etc. is instantly gone without
any prior warning other than exactly the symptoms described in this post…

Jim Henderson wrote:

> Some have said that periodically reading all the data can help with “data
> alignment” issues, but I’ve not seen evidence that demonstrates that.
>

Way, way back (when “Winchester” wasn’t automatically assumed to be a
firearm… ) we commonly did lowlevel format/restore ops because head
positioning was pretty sloppy and domain spread of the magnetic medium was
actually a problem. Some kept it up out of habit but somewhere about the
time IBM got into the desktop hard drive game with the AT they ran some
tests that indicated the re-write wasn’t worth the effort and time.

After the price for the first 5 MB (!!!) drives came down to an affordable
point, it seems like I paid the same price for the disk du jour - the cost
remained the same as each larger size became available. I will say that the
reliability of a current generation disk drive is immeasurably better than
the ones from the 80s or 90s.


Will Honea
whonea@yahoo.com

On 08/03/2013 10:15 PM, Will Honea wrote:
> After the price for the first 5 MB (!!!) drives came down to an affordable
> point, it seems like I paid the same price for the disk de jure - the cost
> remained the same as each larger size became available. I will say that the
> reliability of a current generation disk drive is immeasurably better than
> the ones from the 80s or 90s.

Amen. I once gave a talk to the chemistry department of a major state university
where my major topic was that the price of disks had come down to $1 per MB, and
that meant that departments or even users would be able to afford sufficient
computing resources to free themselves from the mainframe in the university’s
computer center. That was about 1985, and would translate to an
inflation-adjusted price of about $3/MB. Without the price decreases that Will
mentions, a modern 4 TB disk would cost $12 million, rather than $180!