Am I facing a failing HDD?

Hey guys (and gals, if there are any here :slight_smile:).

So I started up my 12.1 box today expecting to watch a movie; instead, I was staring at a black screen with text that read, “Welcome to Emergency Mode…enter root password to log in.” As you can imagine, my heart sank. So I did some googling and found that this usually happens when a drive listed in /etc/fstab either fails to mount or is missing. More heart sinking.

This PC has two hard drives: a 160 GB drive formatted as ext4 with the OS on it, and a 2 TB drive with one large partition formatted as JFS. When I ran the “systemctl…” command that the Emergency Mode prompt suggested, it showed the 2 TB drive in a FAILED state. (Insert weeping and gnashing of teeth here.)
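To see which unit failed, `systemctl --failed` lists the failed units and `systemctl status` shows the details of one of them. systemd names a mount unit after its mount point (a filesystem mounted at /mnt/data is handled by mnt-data.mount), so here is a small bash sketch that derives that name. `mount_unit_name` and the mount point /data are hypothetical, and the escaping is simplified: it only handles "/" and "-", not every special character systemd escapes.

```shell
# Hypothetical helper: turn a mount point into its systemd mount unit name.
# Simplified escaping: collapse repeated slashes, strip the leading and
# trailing slash, escape "-" as \x2d, then turn "/" into "-".
# Newer systemd versions offer "systemd-escape -p --suffix=mount PATH"
# with the complete rules.
mount_unit_name() {
    local name
    name=$(printf '%s' "$1" | sed -e 's|//*|/|g' -e 's|^/||' -e 's|/$||' \
        -e 's|-|\\x2d|g' -e 's|/|-|g')
    # The root directory "/" maps to the special name "-"
    [ -z "$name" ] && name="-"
    printf '%s.mount\n' "$name"
}

# Example (as root; /data stands in for wherever the 2 TB drive is mounted):
#   systemctl --failed
#   systemctl status "$(mount_unit_name /data)"
```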

So I ran “fsck /dev/sdb1” and the result came back clean. I rebooted the PC and it booted up completely!!! YAY!!! I’m watching a movie now and everything seems to be fine, BUT my question is: is this a forewarning of a hard drive failure to come? If so, I need to back this puppy up ASAP.
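A word of caution on fsck: it should only be run on a filesystem that is not mounted (for a JFS partition, fsck hands the job to fsck.jfs from jfsutils). A minimal bash sketch of a guard, assuming /dev/sdb1 as in this post; `is_mounted` is a hypothetical helper that reads /proc/mounts and only matches the device path exactly as it is listed there.

```shell
# Hypothetical helper: succeed (exit 0) if DEVICE is listed as the source of
# a mounted filesystem. The optional second argument is the mounts file to
# read (default /proc/mounts); the match is on the exact path string.
is_mounted() {
    local device=$1 mounts=${2:-/proc/mounts}
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$mounts"
}

# Example (as root): only check the filesystem while it is unmounted.
#   if is_mounted /dev/sdb1; then
#       echo "/dev/sdb1 is mounted; unmount it first" >&2
#   else
#       fsck /dev/sdb1
#   fi
```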

Thanks in advance! And sorry for the life story. :shame:

So shane2943, all drives do fail, and the older the drive, the more likely it is to fail, so you must keep a backup of all important data that you cannot replace. Of course, a power failure or a sudden lockup can cause the same thing, but it is not normal to see such things on a daily basis. I would take this reprieve as a warning: get a replacement drive while you can, and back up anything you cannot replace. Good luck!

Thank You,

10-4. This drive isn’t old at all and, so far, this is a one-time deal. (sigh) This couldn’t have happened when HDDs were cheap!

Thanks for the reply, sir. I will heed this warning.

You must come back and share with us what you did and how it has worked out for you and of course, to ask for any help that you might require.

Thank You,

Before you rush out and buy a new terabyte drive, have a look at smartmontools. For example, I have one hard drive, with device name sda.

I run su to get rootly powers, then I ask smartctl about the drive like this (as root): smartctl -i /dev/sda
and the output looks like this:

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.7-9-desktop] (SUSE RPM)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K1000.C
Device Model:     Hitachi HDS721050CLA362
Serial Number:    JP8521HJ09A57V
LU WWN Device Id: 5 000cca 377c43d83
Firmware Version: JP2OA3MA
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Feb 27 15:19:51 2012 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
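Because the information section is made of "Name: value" lines, single fields are easy to pick out in a script, for example to confirm that SMART support is enabled (if it is not, `smartctl -s on /dev/sdX` turns it on). A bash sketch; `smart_field` is a hypothetical helper, not part of smartmontools, and it assumes the layout shown above.

```shell
# Hypothetical helper: read smartctl output on stdin and print the value of
# every "NAME: value" line whose NAME matches the first argument exactly.
smart_field() {
    awk -v name="$1" '
        {
            i = index($0, ":")          # split at the first colon only
            if (i == 0 || substr($0, 1, i - 1) != name) next
            value = substr($0, i + 1)
            sub(/^[ \t]+/, "", value)   # drop the alignment padding
            sub(/[ \t]+$/, "", value)   # and any trailing blanks
            print value
        }'
}

# Example (as root):
#   smartctl -i /dev/sdb | smart_field "Model Family"
#   smartctl -i /dev/sdb | smart_field "SMART support is" | grep -qx Enabled && echo "SMART is on"
```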

Very interesting. Then I ask smartctl if the drive is feeling well, like this: smartctl -H /dev/sda
and I get this:

smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.7-9-desktop] (SUSE RPM)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
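Besides the PASSED/FAILED text, smartctl also reports its findings in the exit status, which is a bit mask; according to the smartctl man page, bit 3 (value 8) means the SMART status check returned "DISK FAILING". A bash sketch that checks that bit, handy for a script or a cron job; `drive_health_ok` is a hypothetical name, and /dev/sdb is the drive from this thread.

```shell
# Hypothetical helper: exit 0 if DEVICE's SMART health check passes,
# 1 if the drive reports that it is failing, 2 if smartctl could not query it.
drive_health_ok() {
    local status=0
    smartctl -H "$1" > /dev/null || status=$?
    # Bits 0 and 1: bad command line, or the device could not be opened
    if (( status & 3 )); then
        echo "could not query $1 (smartctl exit status $status)" >&2
        return 2
    fi
    # Bit 3 (value 8): SMART status check returned "DISK FAILING"
    (( (status & 8) == 0 ))
}

# Example (as root):
#   drive_health_ok /dev/sdb && echo "sdb: health check passed"
```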

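Also try this: smartctl -l selftest /dev/sda (note that it only shows the log of self-tests the drive has already run; it does not start a test)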

Your 2 TB drive is probably sdb, but you can find out with this command (as root): fdisk -l
or this one, which is also useful: smartctl --scan
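smartctl --scan prints one line per detected device, with the device name in the first field followed by a -d type and a comment, so it is easy to loop over every drive. A bash sketch assuming that layout; `list_smart_devices` is a hypothetical helper. Drives behind USB bridges or RAID controllers may also need the -d option from the same scan line.

```shell
# Hypothetical helper: read "smartctl --scan" output on stdin and print the
# device name (first field) of each non-empty, non-comment line.
list_smart_devices() {
    awk 'NF && $1 !~ /^#/ { print $1 }'
}

# Example (as root): run the health check on every detected drive.
#   smartctl --scan | list_smart_devices | while read -r dev; do
#       echo "== $dev"
#       smartctl -H "$dev"
#   done
```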

Run a few queries on your TB drive with smartctl; it might be sick or it might be well. Here is more on smartmontools: Smartmontools - Community Help Wiki

If the drive is not very old, it may still be under warranty. Until recently, most HDDs had 3- or 5-year warranties. You can probably check whether it is still covered on the manufacturer’s website, which will probably also offer an HDD test tool; test the drive with that. If it is failing, you may still have to buy another drive, but you may eventually get the dead one replaced.

Hey, thanks for the reply!!!

Here’s the output of those commands:

htpc:~ # smartctl -i /dev/sdb
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.1.9-1.4-default] (SUSE RPM)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 5K3000
Device Model:     Hitachi HDS5C3020ALA632
Serial Number:    ML0220F313KDLD
LU WWN Device Id: 5 000cca 369cfb5f5
Firmware Version: ML6OA580
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Feb 27 17:28:58 2012 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

htpc:~ # 

htpc:~ # smartctl -H /dev/sdb
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.1.9-1.4-default] (SUSE RPM)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

htpc:~ # 
htpc:~ # smartctl -l selftest /dev/sdb
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.1.9-1.4-default] (SUSE RPM)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


htpc:~ #

So it looks as if it’s fine. Perhaps some weird file system glitch happened and running “fsck /dev/sdb1” fixed it.

On 2012-02-28 00:46, shane2943 wrote:

> Code:
> --------------------
> No self-tests have been logged. [To run self-tests, use: smartctl -t]
> --------------------
>
> So it looks as if it’s fine. Perhaps a weird file system glitch thing
> happened and running “fsck /dev/sdb1” fixed it up.

No, it /looks/ like no test was ever run. So do that now.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)
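That is the key point: "No self-tests have been logged" means the drive has never been asked to test itself, not that it passed. A bash sketch that reads `smartctl -l selftest` output and tells the three cases apart; `selftest_verdict` is a hypothetical helper, and it assumes the log layout shown in this thread (numbered "# 1", "# 2" rows with the status after the test description).

```shell
# Hypothetical helper: read "smartctl -l selftest" output on stdin and print
#   none  - no self-test has ever been logged
#   ok    - every logged test completed without error
#   check - at least one logged test did not (read the log yourself)
selftest_verdict() {
    awk '
        /^# *[0-9]+ / {
            rows++
            if ($0 !~ /Completed without error/) bad++
        }
        END {
            if (rows == 0)    print "none"
            else if (bad > 0) print "check"
            else              print "ok"
        }'
}

# Example (as root):
#   smartctl -l selftest /dev/sdb | selftest_verdict
```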

To actually run a test you would have to use:

smartctl -t short /dev/sdb

for a short test on the second drive. You’ll be told how long that will take (probably 1-2 minutes), and after that the outcome of your test will show up in the list of tests for that drive, which you get when you run:

smartctl -l selftest /dev/sdb
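The two steps can be scripted: start the test, sleep for the time smartctl announces ("Please wait N minutes for test to complete.", as in the output further down this thread), then print the log. A bash sketch assuming that message wording; `run_selftest` is a hypothetical helper. It accepts `long` as well as `short`, but the announced wait for a long test is far longer.

```shell
# Hypothetical helper: start a SMART self-test of the given type (short or
# long) on DEVICE, wait the announced time plus a minute, then print the log.
run_selftest() {
    local type=${1:-short} dev=$2 out minutes
    if [ -z "$dev" ]; then
        echo "usage: run_selftest short|long DEVICE" >&2
        return 2
    fi
    out=$(smartctl -t "$type" "$dev") || true
    printf '%s\n' "$out"
    # Pull N out of the line "Please wait N minutes for test to complete."
    minutes=$(printf '%s\n' "$out" | sed -n 's/^Please wait \([0-9][0-9]*\) minutes.*/\1/p')
    if [ -z "$minutes" ]; then
        echo "could not tell whether the $type test started on $dev" >&2
        return 1
    fi
    # One extra minute of margin, then show the self-test log
    sleep $(( (minutes + 1) * 60 ))
    smartctl -l selftest "$dev"
}

# Example (as root):
#   run_selftest short /dev/sdb
```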

A short test is just one type of test. There are others.

And instead of commands like the last one, which give you one specific piece of information, you can use:

smartctl -a /dev/sdb

to get all information about your drive.
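With -a, smartctl also summarizes what it found in its exit status, a bit mask whose meanings are listed in the RETURN VALUES section of the smartctl man page (paraphrased in the array below). A bash sketch that prints one line per set bit; `decode_smartctl_status` is a hypothetical helper.

```shell
# Hypothetical helper: explain a smartctl exit status, one line per set bit.
decode_smartctl_status() {
    local status=$1 bit
    local -a meaning=(
        "command line did not parse"
        "device open failed, did not identify itself, or is in a low-power mode"
        "some SMART or ATA command to the disk failed"
        "SMART status check returned DISK FAILING"
        "prefail attributes are at or below their threshold"
        "some attributes were at or below threshold in the past"
        "the device error log contains records of errors"
        "the device self-test log contains records of errors"
    )
    if (( status == 0 )); then
        echo "no problems reported"
        return 0
    fi
    for bit in 0 1 2 3 4 5 6 7; do
        (( status & (1 << bit) )) && echo "bit $bit: ${meaning[bit]}"
    done
    return 0
}

# Example (as root):
#   smartctl -a /dev/sdb > /dev/null; decode_smartctl_status $?
```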

Yes, it’s looking good, but I was encouraging you to run further tests. For peace of mind you should run the first two commands recommended by Lord_Emsworth. For example, here’s what I got for sda:

tumbleweed121:/home/john # smartctl -t short /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.7-9-desktop] (SUSE RPM)
Copyright (C) 2002-11 by Bruce Allen, smartmontools

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Tue Feb 28 18:56:46 2012

Use smartctl -X to abort test.
tumbleweed121:/home/john # smartctl -l selftest /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.7-9-desktop] (SUSE RPM)
Copyright (C) 2002-11 by Bruce Allen, smartmontools

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      6042         -
# 2  Short offline       Completed without error       00%      6042         -
# 3  Short offline       Completed without error       00%         2         -

On 2012-02-28 06:46, Lord Emsworth wrote:

> A short test is just one type of test. There are others.

Yes, he should run a long test, which typically also does a surface test.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

WHOOPS! You guys are right; I jumped the gun thinking everything was fine.

I ran the short test and it completed without error. I’m running the long test now and will post the results when it’s finished.

Thanks, y’all!

OK, HERE are the results of the self-tests:

htpc:~ # smartctl -l selftest /dev/sdb
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.1.9-1.4-default] (SUSE RPM)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       479         -
# 2  Short offline       Completed without error       00%       473         -

htpc:~ # 

Looks like the drive might be ok.

Yes, on the balance of probabilities it does look OK [LOL, now you won’t be able to sleep]

Mind blown