Hard disk failing?

Dear all,
over the last few days I have been having a problem with my hard disk. I guess it only affects Windows (NTFS-specific), as my Linux works just great.

For the last few days Windows has had a lot of trouble reading from the hard disk, and on boot it asks to run scandisk. Problems are indeed found and fixed. As I was concerned about the files there, I have already finished the backups.

I would like to ask for your help finding a decent tool that can examine the hard disk and show what is going on at the low level (bad sectors?). The program should support NTFS partitions and must not mess with my ext4 Linux partitions.

I would like to thank you in advance for your help

Regards
Alex

Never used it, but I’ve heard of people using TestDisk (download from CGSecurity).

alaios wrote:
> Dear all,
> in the last days I am having a problem with my hard disk. I guess this
> only windows (ntfs specific) as my linux works just great.
>
> In the last days the windows have a lot of problems reading from the
> hard disks and when booting up windows are calling for scandisks. Indeed
> problems are found and fixed. As I was concerned regarding the files
> there, I have already finished the backups.
>
> I would like to ask your help for some decent tool that can examine the
> hard disk and see what is going on in the low level (bad sectors?).
> This program should support ntfs partitions and being able to not mess
> up with my ext4-linux partitions.

What does SMART say?

No clue. What do I do to get that information?

Alex

Forgot to mention that the HD is a WD3200 (at least that is what YaST reports).

Alex

On 2013-03-18 13:26, alaios wrote:

>> What does SMART say?
>
> No clue, what I do to get that information?

You run “smartctl -a /dev/whateverdisk” and post the result here for
interpretation. Then you will also have to run the short and long
tests, wait the appropriate time, and post that output again. More
explanation in the manual.
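
As a rough sketch of that sequence, assuming smartmontools is installed and with /dev/sdX standing in for the real device, the invocations could be collected like this. The helper name `smart_commands` is made up for illustration; the script only prints the commands and does not touch the disk.

```python
def smart_commands(device):
    """Return the smartctl invocations for a full check, in the order to run them."""
    return [
        ["smartctl", "-a", device],              # full report: info, attributes, logs
        ["smartctl", "-t", "short", device],     # start the short self-test
        ["smartctl", "-t", "long", device],      # start the long (extended) self-test
        ["smartctl", "-l", "selftest", device],  # read the self-test log afterwards
    ]


if __name__ == "__main__":
    # /dev/sdX is a placeholder, not a real device name.
    for cmd in smart_commands("/dev/sdX"):
        print(" ".join(cmd))
```

Each command has to be run as root, one test at a time. When a test is started, smartctl reports roughly how long to wait before the self-test log is worth reading.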

Warning: it is possible that the long test finds errors on the surface.
Those sectors can be lost, and your data there damaged or lost. So some
advise doing a backup first.


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

Hi there,
does this lovely tool work for both NTFS and ext4, or does it work at the low level regardless of filesystem?
I have backups ready, so this is the best time to run the command.
I just execute it as root.

Could it also be that stressing the hard disk does not help and I should just replace it?

WD provide a hard disk test program that works at the physical disk level and does not care if the partitions are ntfs, ext4, etc. If the disk is still under warranty then this is required for RMA. The test program can be downloaded from the WD site.

Thanks, I will try the WD program first. I wonder, though, why the problem seems to be only in the NTFS partition. My Linux works just fine.

On 03/18/2013 05:36 PM, alaios wrote:
>
> Thanks I will try first with the WD program. I wonder though why the
> problem looks to be only in the ntfs partition. My linux works just fine

In addition, smartctl uses firmware built into the disk. It also does not care
about any file structure - the disk is a collection of sectors to it.

Who knows why only the NTFS file system is affected. Perhaps only that section
of the disk is failing. If it really is a hardware failure, it will not be
restricted to that partition for very long.

smartctl should be run as root with the disk unmounted, right?

Alex

Also, SMART output here (screenshot hosted on ImageShack).

On 2013-03-19 00:06, alaios wrote:
>
> The smartctl should run as root with the disk unmounted right?

No, the nice thing is that it runs while you keep working. That program
is not the worker, it is the hard disk firmware itself which does the
testing. You go on working, and at the appropriate time you query the
results.

Actually, there is a daemon you can use which can test the disk, say,
every week.

No, it does not care at all how the disk is formatted.

Yes, damage can affect a single partition. It can be a few tracks of the
disk.

I see you posted a photo. No, I will not comment on it, I want text
output pasted here in code tags so that I can see the fields and comment
on them.


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

Here you are

smartctl -a /dev/sda
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.1.10-1.16-default] (SUSE RPM)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Scorpio Blue Serial ATA
Device Model:     WDC WD3200BEVT-22ZCT0
Serial Number:    WD-WXF0A99P7681
LU WWN Device Id: 5 0014ee 20375eb73
Firmware Version: 11.01A11
User Capacity:    320,072,933,376 bytes [320 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 3.0 Gb/s
Local Time is:    Tue Mar 19 08:17:11 2013 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 9960) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 118) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   198   198   051    Pre-fail  Always       -       23359
  3 Spin_Up_Time            0x0027   184   183   021    Pre-fail  Always       -       1758
  4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4244
  5 Reallocated_Sector_Ct   0x0033   186   186   140    Pre-fail  Always       -       111
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7379
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3402
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       75
193 Load_Cycle_Count        0x0032   124   124   000    Old_age   Always       -       228553
194 Temperature_Celsius     0x0022   114   074   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   184   184   000    Old_age   Always       -       16
197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       112
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   100   253   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 22858 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 22858 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2c 7f e4 40  Error: WP at LBA = 0x00e47f2c = 14974764

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 10 e8 29 7c 01 00      03:46:29.525  WRITE FPDMA QUEUED
  61 08 08 98 28 7a 01 00      03:46:29.524  WRITE FPDMA QUEUED
  61 08 f0 68 3f 53 06 00      03:46:29.524  WRITE FPDMA QUEUED
  61 08 c8 a8 b4 81 01 00      03:46:29.524  WRITE FPDMA QUEUED
  61 08 b0 b8 29 7c 01 00      03:46:29.524  WRITE FPDMA QUEUED

Error 22857 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2c 7f e4 40  Error: UNC at LBA = 0x00e47f2c = 14974764

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 28 80 08 7f e4 24 00      03:46:25.969  READ FPDMA QUEUED
  60 40 78 ba 80 91 12 00      03:46:25.969  READ FPDMA QUEUED
  60 80 70 60 68 25 0e 00      03:46:25.969  READ FPDMA QUEUED
  61 03 68 60 36 a3 09 00      03:46:25.969  WRITE FPDMA QUEUED
  60 40 60 10 cb e8 01 00      03:46:25.969  READ FPDMA QUEUED

Error 22856 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2c 7f e4 40  Error: WP at LBA = 0x00e47f2c = 14974764

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 18 a0 28 7a 01 00      03:46:22.588  WRITE FPDMA QUEUED
  61 08 10 b0 32 7b 01 00      03:46:22.588  WRITE FPDMA QUEUED
  61 08 08 90 28 7a 01 00      03:46:22.587  WRITE FPDMA QUEUED
  61 08 00 98 28 7a 01 00      03:46:22.587  WRITE FPDMA QUEUED
  61 01 f8 c8 d8 ff 02 00      03:46:22.586  WRITE FPDMA QUEUED

Error 22855 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2c 7f e4 40  Error: UNC at LBA = 0x00e47f2c = 14974764

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 28 80 08 7f e4 24 00      03:46:19.106  READ FPDMA QUEUED
  60 38 78 3a 80 91 12 00      03:46:19.105  READ FPDMA QUEUED
  60 80 70 10 67 25 0e 00      03:46:19.105  READ FPDMA QUEUED
  60 20 68 40 36 dd 01 00      03:46:19.105  READ FPDMA QUEUED
  60 38 60 62 e0 86 01 00      03:46:19.105  READ FPDMA QUEUED

Error 22854 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 2c 7f e4 40  Error: UNC at LBA = 0x00e47f2c = 14974764

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 40 0e 07 a7 01 00      03:46:15.868  READ FPDMA QUEUED
  61 08 38 a0 28 7a 01 00      03:46:15.868  WRITE FPDMA QUEUED
  60 1b 30 16 07 a7 01 00      03:46:15.868  READ FPDMA QUEUED
  61 08 28 98 32 7b 01 00      03:46:15.867  WRITE FPDMA QUEUED
  60 28 20 08 7f e4 24 00      03:46:15.867  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       60%      7378         31108065

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


On 2013-03-19 08:26, alaios wrote:
>
> Here you are

Thanks.


> --------------------
>     smartctl -a /dev/sda

>   SMART Attributes Data Structure revision number: 16
>   Vendor Specific SMART Attributes with Thresholds:
>   ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>     1 Raw_Read_Error_Rate     0x002f   198   198   051    Pre-fail  Always       -       23359
>     3 Spin_Up_Time            0x0027   184   183   021    Pre-fail  Always       -       1758
>     4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4244
>     5 Reallocated_Sector_Ct   0x0033   186   186   140    Pre-fail  Always       -       111
>     7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
>     9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7379
>    10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
>    11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
>    12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3402
>   192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       75
>   193 Load_Cycle_Count        0x0032   124   124   000    Old_age   Always       -       228553
>   194 Temperature_Celsius     0x0022   114   074   000    Old_age   Always       -       33
>   196 Reallocated_Event_Count 0x0032   184   184   000    Old_age   Always       -       16
>   197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       112
>   198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
>   199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
>   200 Multi_Zone_Error_Rate   0x0009   100   253   051    Pre-fail  Offline      -       0


Ok, parameter 197 is bad. You have at least 112 bad sectors which have
not been remapped (reallocation happens automatically when writing to a
bad sector), and parameter 5 shows that another 111 have already been
reallocated.
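
For anyone who wants to pull those numbers out of a report automatically, here is a minimal sketch. It assumes the plain attribute table as printed by "smartctl -a" (no quote markers in front), and the names `WATCHED`, `raw_values` and `suspicious` are made up for illustration. The sample rows are copied from the report above.

```python
# Three rows copied from the attribute table in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   186   186   140    Pre-fail  Always       -       111
197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       112
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

# Attribute IDs that count bad or suspect sectors.
WATCHED = {5, 197, 198}


def raw_values(report):
    """Map watched attribute IDs to their RAW_VALUE column."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in WATCHED and fields[9].isdigit():
                values[attr_id] = int(fields[9])
    return values


def suspicious(values):
    """Keep only the watched attributes whose raw count is above zero."""
    return {attr_id: raw for attr_id, raw in values.items() if raw > 0}


if __name__ == "__main__":
    print(suspicious(raw_values(SAMPLE)))  # {5: 111, 197: 112}
```

A non-zero raw value on any of these is worth a closer look. The normalized VALUE and THRESH columns are a separate matter and are not checked here.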

So that’s a pretty bad sign. The disk is not too old, just 7379 hours,
if this is a desktop. For a laptop it is, though.


>
>   Error 22858 occurred at disk power-on lifetime: 7374 hours (307 days + 6 hours)
>   When the command that caused the error occurred, the device was active or idle.

....


>   SMART Self-test log structure revision number 1
>   Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>   # 1  Short offline       Completed: read failure       60%      7378         31108065

> --------------------

Not even the short test completed, and that is a worse sign.

Advice: get a backup done ASAP, and I really mean ASAP.

This can mean a file backup, an image backup (with dd_rescue), or
both. Your choice.

Once done, rewrite the entire disk surface with zeroes, then run the long
test. If it fails, replace the disk.

If it does not fail, repeat.

Compare both test results. If bad sector count increased, there is no
doubt: replace the hard disk. If it is still under warranty, follow the
appropriate procedure to get it replaced or whatever they do.

If the bad sector count does not increase, you might risk using the disk
again. I would not use it for important things, though.
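
The decision above can be written down as a small sketch. One assumption to flag loudly: "bad sector count" is taken here as the sum of the raw values of attributes 5 (reallocated) and 197 (pending), which is my reading and not something smartctl defines. The function names are made up for illustration.

```python
def bad_sectors(values):
    """Assumed measure: reallocated (ID 5) plus pending (ID 197) raw counts."""
    return values.get(5, 0) + values.get(197, 0)


def verdict(before, after, long_test_passed):
    """Compare the counts from two rounds of 'zero the disk, run the long test'.

    before and after map attribute ID to raw value, taken after the first
    and after the second round respectively.
    """
    if not long_test_passed:
        return "replace"
    if bad_sectors(after) > bad_sectors(before):
        return "replace"
    return "usable, but not for important data"


if __name__ == "__main__":
    first = {5: 111, 197: 112}   # counts from the report in this thread
    second = {5: 230, 197: 5}    # hypothetical counts after a second round
    print(verdict(first, second, long_test_passed=True))
```

Summing the two attributes is deliberate: zeroing the disk turns pending sectors into reallocated ones, so a shift from 197 to 5 alone does not look like growth.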


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

I am not even going to do that. A new hard disk costs 60 euros… so I am buying a new one and keeping the old one …

As this is a laptop hard disk, are there any limitations I should keep in mind when ordering a new one? I am thinking of a 1 or 2TB internal hard disk. What are the fastest speeds for normal SATA disks? Is it 7200rpm with a 64MB internal cache? Will such a disk be supported by a three-year-old laptop?

This faulty disk also has a hidden partition for installing/recovering Windows. How can I clone this partition once the new hard disk has arrived? (I already have backups of all the other files.)
I have one of those external cables that convert ATAPI/SATA to USB.
I am thinking of something like booting a Linux distro and connecting the new disk as an external USB disk to copy the partitions I had (the most important is the hidden recovery partition). Is this what you also had in mind? Which programs would help me accomplish that?

Alex

On Tue 19 Mar 2013 02:56:01 PM CDT, alaios wrote:

> I am not even gonna do that. A new hard disk costs 60 euros… so I am
> buying a new one and keeping the old one …
>
> As this is a laptop hard disk are there any limitations I should take
> care for ordering a new hard disk? I am thinking for a 1 or 2TB internal
> hard disk. What are the fastest speeds for normal sata disks_ Is it
> 7200rpm with a 64mb internal cache? Will that disk be supported from a
> three years old laptop?
>
> This faulty disk has a hidden partition for installing/recovering also
> windows. How can I clone this partition once the new hard disk has
> arrived? (I have backups of all the other files already).
> I have those external cables that convert atapi/sata to usb.
> I am thinking something like booting a linux distro connecting the new
> disk as external usb disk to copy the partitions I had (the most
> important is the one with the hidden recovery partition). Is this what
> you also had in mind? Which programs would help me accomplish that?
>
> Alex

Hi
I would check the laptop manufacturer's website to see what size drives
are available as options, then pick the biggest one. The system power
requirements would probably have been designed around these. Or check
the drive you're considering (both physical dimensions [as in thickness]
and power requirements [as in current required]) to see if it’s a
suitable alternative.

Why not consider an SSD?


Cheers Malcolm °¿° (Linux Counter #276890)
openSUSE 12.3 (x86_64) Kernel 3.7.10-1.1-desktop
up 1 day 18:31, 3 users, load average: 0.16, 0.07, 0.06
CPU Intel® i5 CPU M520@2.40GHz | GPU Intel® Ironlake Mobile

There are no options on the Acer website, as this model is too old for them to support.
An SSD is too expensive.
For example, I was thinking of buying one of these two models:

Western Digital WD7500BPKT Black 750GB interne: Amazon.de: Computer & Zubehör

or

Seagate Momentus XT Interne Festplatte 750GB 2,5 Zoll: Amazon.de: Computer & Zubehör

The site is in German, but there are thumbnails with their measured speeds. I can remove the hard disk a bit later today, so I can measure its dimensions and see how much power it needs. Would that suffice?

After edit: I have googled a bit and this looks to be the model I already have
http://www.ebay.com/itm/WESTERN-DIGITAL-WD3200BEVT-22ZCT0-See-list-for-DCMs-320gb-2-5-Sata-HDD-/190577925523

On 2013-03-19 15:56, alaios wrote:
>
> I am not even gonna do that. A new hard disk costs 60 euros… so I am
> buying a new one and keeping the old one …

My laptop disk broke recently, similar problem to yours. I bought the
exact same model of HD, and cloned the old to the new byte by byte.

(I bought the same model because I found it on the list at the site I
was shopping from. Less thinking needed :-) - and that model has good
references)
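
To show what "byte by byte" means in practice, here is a toy sketch of a block-wise copy that fills unreadable blocks with zeros and carries on. It is only an illustration under my own assumptions (the function name `clone` and the 64 KiB block size are arbitrary), not a replacement for a real tool such as dd_rescue, which handles failing disks far more carefully.

```python
import os


def clone(src_path, dst_path, block_size=64 * 1024):
    """Copy src to dst block by block; unreadable blocks are written as zeros.

    Returns the number of blocks that could not be read.
    """
    bad_blocks = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the source in bytes
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                src.seek(offset)
                data = src.read(want)
            except OSError:
                # Read error (for example a bad sector): substitute zeros.
                data = b"\0" * want
                bad_blocks += 1
            dst.seek(offset)  # keep source and destination offsets aligned
            dst.write(data)
            offset += want
    return bad_blocks
```

With device nodes as paths, this would have to run as root, with the target at least as large as the source and neither disk mounted. Getting source and target the wrong way round overwrites the good copy, so double-check the device names first.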


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

On Tue 19 Mar 2013 03:16:04 PM CDT, alaios wrote:

> malcolmlewis;2537924 Wrote:
> > Hi
> > I would check the laptop manufacturers website to see what size drives
> > are available as options, then pick the biggest one. The system power
> > requirements would probably have been designed around these, or check
> > the drive your considering (both physical dimensions [as in thickness]
> > and power requirements [as in current required]) to see if it’s a
> > suitable alternative.
> >
> > Why not consider an SSD?
> >
> > –
> > Cheers Malcolm °¿° (Linux Counter #276890)
> > openSUSE 12.3 (x86_64) Kernel 3.7.10-1.1-desktop
> > up 1 day 18:31, 3 users, load average: 0.16, 0.07, 0.06
> > CPU Intel® i5 CPU M520@2.40GHz | GPU Intel® Ironlake Mobile
>
> There are no options at the acer web site as this model is too old for
> them… to support it.
> ssd is too expensive
> for example I was thinking for those one of those two models to buy
>
> ‘Western Digital WD7500BPKT Black 750GB interne: Amazon.de: Computer &
> Zubehör’ (http://tinyurl.com/crleveg)
>
> or
>
> ‘Seagate Momentus XT Interne Festplatte 750GB 2,5 Zoll: Amazon.de:
> Computer & Zubehör’ (http://tinyurl.com/cs2jqgw)
>
> the site is in German but there are thumbnails with their measurement
> speeds. I can remove the hard disk a bit later today, so I can measure
> the dimensions and see how much power the hard disk needs. Would that
> suffice?
>
> After edit: I have googled a bit and this looks to be the model I
> already have
> http://tinyurl.com/cxyoeb9

Hi
If you enter the model number under support, you should be able to see
the spec. Otherwise, check the hard drive manufacturer's specs (get
your model from smartctl) and compare the power requirements with those
of the drives listed above.


Cheers Malcolm °¿° (Linux Counter #276890)
openSUSE 12.3 (x86_64) Kernel 3.7.10-1.1-desktop
up 1 day 19:41, 3 users, load average: 0.02, 0.06, 0.05
CPU Intel® i5 CPU M520@2.40GHz | GPU Intel® Ironlake Mobile