Yikes, did my HDD crash ?

openSUSE 12.3
KDE 4.10.5

When I booted up this morning, I got a lot of text messages about an AHCI Port 1 device error.
So I rebooted using my recently created openSUSE 12.3 recovery CD.
I ran GParted and tried to look at the partitions. Each one except sda4 has a yellow triangle with an exclamation point.
I saw that there is a feature to check and repair a file system, and this is what I got.

HELP !!!
What do I do now ?

GParted 0.14.1
Libparted 2.4

Check and repair file system (ntfs) on /dev/sda1  00:00:00    ( ERROR )
        
calibrate /dev/sda1  00:00:00    ( SUCCESS )
        
path: /dev/sda1
start: 2,048
end: 27,265,023
size: 27,262,976 (13.00 GiB)
check file system on /dev/sda1 for errors and (if possible) fix them  00:00:00    ( ERROR )
        
ntfsresize -P -i -f -v /dev/sda1
        
ntfsresize v2012.1.15 (libntfs-3g)
Error reading bootsector: Input/output error
ERROR(5): Opening '/dev/sda1' as NTFS failed: Input/output error
NTFS is inconsistent. Run chkdsk /f on Windows then reboot it TWICE!
The usage of the /f parameter is very IMPORTANT! No modification was
and will be made to NTFS by this software until it gets repaired.
========================================

From what I read, my best guess is that Windows did not shut down properly. You would have to start Windows and (as per the message) have it repair its filesystem(s).

I cannot start Windows since I cannot boot at all.
I have not run Windows in at least a week and I think it’s just coincidental that I picked the partition where Windows resides.

When I ran the same check on sda7, I got this:

It looks like everything from sda1 through sda7 except 4 & 5 has errors.

GParted 0.14.1
Libparted 2.4

Check and repair file system (ext4) on /dev/sda7  00:00:00    ( ERROR )

calibrate /dev/sda7  00:00:00    ( SUCCESS )

path: /dev/sda7
start: 831,526,912
end: 1,465,128,959
size: 633,602,048 (302.12 GiB)
check file system on /dev/sda7 for errors and (if possible) fix them  00:00:00    ( ERROR )

e2fsck -f -y -v /dev/sda7

Could this be a zero-length partition?
e2fsck 1.42.6 (21-Sep-2012)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sda7
========================================

Smartctl returned this:

linux:~ # smartctl /dev/sda -a
smartctl 6.0 2012-10-10 r3643 [x86_64-linux-3.7.10-1.1-desktop] (SUSE RPM)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org


Vendor:               /0:0:0:0
Product:              
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
linux:~ # 



And this doesn’t look good.
I ran badblocks and got this:

linux:~ # sudo badblocks -v /dev/sda1 >bb1
Checking blocks 0 to 13631487
Checking for bad blocks (read-only test): 
done                                                 
Pass completed, 13631488 bad blocks found. (13631488/0/0 errors)
linux:~ # 
linux:~ # 

Unless I am crazy, this is saying the whole darn thing is bad. I find this hard to believe and am going to check another partition. I suspect it is more likely that the HDD controller is getting flaky.
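
A quick sanity check on that count, as a rough sketch: GParted reported 27,262,976 sectors for sda1. Assuming 512-byte sectors (not shown in the log) and the badblocks default block size of 1024 bytes, the partition works out to exactly the number of blocks badblocks tested, so every single block failed to read.

```shell
# Size of /dev/sda1 in sectors, taken from the GParted log above.
sectors=27262976
sector_bytes=512   # assumption: 512-byte logical sectors
block_bytes=1024   # badblocks default block size when -b is not given

# Convert the partition size from sectors to badblocks-sized blocks.
blocks=$(( sectors * sector_bytes / block_bytes ))
echo "$blocks"     # prints 13631488
```

A drive with real surface damage usually shows scattered bad blocks, so a 100% failure rate fits a device that is not answering at all.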

On 2013-10-19 16:16, hextejas wrote:

> Unless I am crazy, this is saying the whole darn thing is bad. I find
> this hard to believe and am going to check another partition. I suspect
> it is more likely that the HDD controller is getting flaky.

I hate those failures :frowning:

Looks bad, sorry.

Did you try changing the SATA cable?


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

That’s an option I had not thought of, and will try it after I exhaust all else.

I was able to boot from Win 7 as part of the dual boot but it won’t let me run chkdsk with a /f.

Is there a Linux utility that would do the same as chkdsk that’s part of the openSUSE rescue disk, or elsewhere?

Booting from my Windows CD gives an unrecoverable error and tells me to remove any newly installed HDs or controllers. Sheesh.

What is very odd is that during the BIOS display at boot up, it looks like something is mixed up.

Auto detect AHCI Port 0 IDE HD
Auto detect AHCI Port 1 ATAPI CDROM
AHCI Port1 ST3750528AS CC44
  S.M.A.R.T. Capable and Status BAD
AHCI Port2 HL-DT-ST DVDRAM GH41N MN01

Why in the auto detect is the HD at port 0 yet at port1 at the 3rd line ? Is something confused ?
Also is that “BAD” really meaning that the HD is bad ?

And now I am totally confused and have lost all my faith in the world of PCs and operating systems.

It’s working and this is what I did.

I have a Precise Puppy Linux USB stick, so I booted from it, ran GParted and it fixed one of my drives and now all is well.

I still don’t get it. Why did the original GParted from openSUSE show everything in error, and why do the boot-up listings show Port 0 and Port 1 apparently switched?

I need a beer, and fast.

On 2013-10-19 17:26, hextejas wrote:
>
> robin_listas;2592261 Wrote:

>> Did you try changing the SATA cable?

> That’s an option I had not thought of, and will try it after I exhaust
> all else.

I would try that first :slight_smile:

> I was able to boot from Win 7 as part of the dual boot but it won’t let
> me run chkdsk with a /f.
>
> Is there a Linux utility that would do the same as chkdsk that’s part of
> the openSUSE rescue disk, or elsewhere?

Not really. NTFS is proprietary; Linux can access it, but it does
not support all the features because they are not published. Thus a true
fsck of an NTFS partition in Linux is impossible.

> Booting from my Windows CD gives an unrecoverable error and tells me to
> remove any newly installed HDs or controllers. Sheesh.
>
> What is very odd is that during the BIOS display at boot up, it looks
> like something is mixed up.
>
>
> Code:
> --------------------
> Auto detect AHCI Port 0 IDE HD
> Auto detect AHCI Port 1 ATAPI CDROM
> AHCI Port1 ST3750528AS CC44
> S.M.A.R.T. Capable and Status BAD
> AHCI Port2 HL-DT-ST DVDRAM GH41N MN01
> --------------------
>
>
> Why in the auto detect is the HD at port 0 yet at port1 at the 3rd line
> ? Is something confused ?

Dunno.

> Also is that “BAD” really meaning that the HD is bad ?

Or a bad cable :wink:

I see that it works now.

I have an old computer with a problematic cable or connector (parallel
or ribbon cable). The computer worked well for say, a week or a month,
but eventually a hard disk would fail completely. I had to remove the
cable and connect it again, and it worked fine for another three weeks.
I know it is not the cable itself, because I have replaced it several times.

I tell you this so that you see that a bad cable or connector can play
very weird tricks.


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

No kidding !!!
Do you recall the bug I filed about YaST not updating Mozilla Thunderbird Translations ?
Probably not, but after I fixed the disk, the YaST update finished successfully.
I should have listened to you earlier and replaced the cable.
I am off to the store now to get one and I hope that is all it is.

On 2013-10-20 20:46, hextejas wrote:

> No kidding !!!
> Do you recall the bug I filed about YaST not updating Mozilla
> Thunderbird Translations ?

Not really… you have had so many issues that I get lost.
I’m pleasantly surprised that you are still here, “fighting” to use
Linux :-))

> Probably not, but after I fixed the disk, the YaST update finished
> successfully.

Good!

Oops. In that case, close the bugzilla, too.

> I should have listened to you earlier and replaced the cable.
> I am off to the store now to get one and I hope that is all it is.

I happen to have a few of them around, so testing another cable is easy.


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

Well, I replaced the cable and I think it got worse.

The HDD won’t boot now, though I can see the data on it when I booted from a USB Precise Puppy distro.
What’s weird is that it has assigned sdg to it rather than what it was before, which was sda. Is that a function of the distro that I am using, Precise Puppy ?

Now a few questions:

  1. Is it most likely that the drive is bad now rather than the cable or could the controller be bad ?
  2. Do the SATA -1, -2, -3, -4 connectors all come together into one controller ?
  3. I plugged it into SATA-1 because that is where it was before.
  4. Could I use SATA-2 or -3 or -4 ? Is that the way CMOS will try to boot ? I know that IDE drives start at IDE-1, IDE-2, etc. Does SATA work the same way ?
  5. Should my next step be to replace the HDD ?
  6. And after replacement, can I just copy from my old HDD to the new one, \root\MyName*.* Not sure I have the path correct but basically wherever my stuff is kept ?

Wish me luck !

and thanks

One more thing I thought of:

  7. Is there a way I can boot from the LiveCD and restore just whatever is needed to make the HDD bootable ? Is it the MBR ?

Well, most of those questions can be ignored. They were more along the lines of trying to figure out what is wrong. I am going to assume it is the hard drive and replace it. If that does not work, that will be a different problem. One to solve on a full stomach.

On 2013-10-21 18:16, hextejas wrote:
>
> Well, I replaced the cable and I think it got worse.

Oh :frowning:

> The HDD won’t boot now, though I can see the data on it when I booted
> from a USB Precise Puppy distro.
> What’s weird is that it has assigned sdg to it rather than what it was
> before, which was sda. Is that a function of the distro that I am using,
> Precise Puppy ?

Humm… no.

The letters are assigned in the order the system sees the disks, for
whatever reason. The boot disk (Puppy) is seen earlier, so it typically
gets the sda letter.

On my machine, I normally boot from sda. However, when I power up the
external drives, they get there first, somehow, and my boot disk gets
labeled sdc or beyond. Worse, the boot disk, let's say the main disk, can
be sda on boot, and sdc after hibernating, because I powered up the
external drives in between. I.e., the names can change during the same
session!! (I’d have to re-verify this someday to be absolutely sure)
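
Since the sdX letters can move around like this, one way to tell which physical disk is which is to look at the persistent names udev creates under /dev/disk/by-id. The following is only a sketch: the helper name `list_disks_by_id` is made up for illustration, and it assumes a distro that populates that directory (most do).

```shell
# Print each persistent disk name and the device node it currently points to.
# list_disks_by_id is a hypothetical helper, not a standard command.
list_disks_by_id() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}

list_disks_by_id
```

The names contain the drive model and serial number, so the same disk can be recognised whether it shows up as sda or sdg on a given boot.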

> Now a few questions:
> 1) Is it most likely that the drive is bad now rather than the cable or
> could the controller be bad ?

I’d have to be there to be sure.

> 2) Do the SATA -1, -2, -3, -4 connectors all come together into one
> controller ?

It depends on the motherboard. Mine has three controllers, and I never
know which is which.

> 3) I plugged it into SATA-1 because that is where it was before.

Try another plug. All should work. You only need to choose in the BIOS
which one to boot first.

> 4) Could I use SATA-2 or -3 or -4 ? Is that the way CMOS will try to
> boot ? I know that IDE drives start at IDE-1, IDE-2, etc. Does SATA work
> the same way ?

Well, yes, you can choose any of them to boot first, but it depends on
the BIOS (CMOS config).

> 5) Should my next step be to replace the HDD ?

If you have another one available, I would certainly try it, even one
of a different brand.

> 6) And after replacement, can I just copy from my old HDD to the new
> one, \root\MyName*.* Not sure I have the path correct but basically
> wherever my stuff is kept ?

It is possible to clone one disk to another, byte by byte, if sizes
permit, and the new one will boot. However, some IDs may have to be
adjusted. And if the second disk is bigger, the partitions have to be
enlarged later (I think GParted can do that).
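
As a rough sketch of what "byte by byte" means here, the following copies one image file to another with dd and checks the result with cmp. The image files are stand-ins chosen so the example is safe to run (an assumption for illustration); on real hardware the source and target would be whole-disk device nodes, and swapping them would overwrite the good disk.

```shell
# Ordinary files stand in for the old and new disks in this demonstration.
# On real hardware these would be device nodes such as /dev/sdX and /dev/sdY
# (hypothetical names; check which disk is which before running anything).
workdir=$(mktemp -d)
SRC="$workdir/old-disk.img"
DST="$workdir/new-disk.img"

# Create a 4 MiB stand-in for the old disk.
head -c 4194304 /dev/urandom > "$SRC"

# Clone it. conv=noerror,sync makes dd continue past read errors and pad
# short or failed reads with zeros so later data stays at the same offset.
dd if="$SRC" of="$DST" bs=64K conv=noerror,sync 2>/dev/null

# Verify that the copy is identical to the source.
if cmp -s "$SRC" "$DST"; then
    clone_status="identical"
else
    clone_status="different"
fi
echo "$clone_status"

rm -rf "$workdir"
```

For a disk that is actually throwing read errors, a dedicated rescue tool such as GNU ddrescue is usually a better fit than plain dd, because it retries and keeps a record of the areas it could not read.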

On 2013-10-21 18:16, hextejas wrote:

> One more thing I thought of:
>
> 7) Is there a way I can boot from the LiveCD and restore just whatever
> is needed to make the HDD bootable ? Is it the MBR ?

I’m not sure I understand. :-?

On 2013-10-21 19:16, hextejas wrote:

> Well, most of those questions can be ignored. They were more along the
> lines of trying to figure out what is wrong. I am going to assume it is
> the hard drive and replace it. If that does not work, that will be a
> different problem. One to solve on a full stomach.

And a good rest.


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)