Possible hard disk failure imminent?

Hi,

[System: OpenSuSE 11.0 , kernel 2.6.25.20-0.7-pae, athlon i386]

I’ve recently begun to have problems with my SATA Hitachi disk drive, which stores all my documents, music, etc. I only began to notice these problems after I installed the Ext2 IFS driver for Windows so that I could access my data (read only) from my Windows disk.

However, the problem is not exclusive to Windows; Windows blue-screens when the SATA link goes down, whereas SuSE attempts to re-establish the link. Here’s the dmesg output (only includes output relevant to disk activity).

I’ve also run smartctl on the disk, here’s the output.

Note that the output says that it’s soft-resetting the link. Any clues as to what this may imply?

Also, the system has sometimes failed to boot because it sees the disk as corrupted and asks me to run fsck, which rewrites the journal and restores the disk to working order (although this may or may not have been down to the aforementioned driver not playing nicely with the journal, even though it is in read-only mode).

I can have periods of days where the disk works perfectly fine on both Windows and SuSE, and random events where the disk link goes down for some reason. The situation seems to be remedied by me physically pushing the SATA cable into the disk and motherboard, and it returns when the computer tower is subjected to a considerable vibration from a slight knock, et cetera.

On the other hand, this may be a power problem: some ends on a rail are faulty on my PSU and will only give power to my optical drives if they are positioned in a certain angle. This is probably less likely, though, as the disks don’t go offline, just the link.

Any help would be greatly appreciated; if you require any more information, please do not hesitate to ask. Also, I am going to replace the cable when I have time, as I feel that is what many of you will probably recommend.

You know that it is hard to guess from afar what may be going wrong with your hardware. You did not state how large this failing hard drive is. If it would fit, I would copy it all to your Windows disk at once, before it is too late. Next, power supplies are not that expensive; I just had my old 430 watt power supply go bad and I found a new 600 watt unit for just $50 US plus tax, and it made all of the difference in the world.

I must tell you that I would NOT try to use an ext2 partition/driver in Windows and hope the translation works without issues. Consider that openSUSE works great with NTFS partitions, with no issues at all. If I wanted to share a drive between Windows and openSUSE, I would back up my data, reformat the drive to NTFS at once and remove the Ext2 driver from Windows. Now openSUSE will by default attempt to enforce Linux user rights on the NTFS partition, which may not always allow you to write to it in Linux. I suggest changing the mount options to simply say Defaults, which opens it up to all Linux users. These are just my thoughts on the subject and the very way I do it at home on all of my Linux/Windows machines.

Thank You,

Always resolve hardware problems first. Maintaining good power is number one! The fact that you say the power cabling is intermittent means that you can have failed boots, failed reads and writes to the drives, and loss of functionality. Today you can pick up a new power supply (if you can find one that fits your case) or find a new case with a power supply pretty cheap. I would do this first. Next, as you say, replace the data cables, which can also go intermittent.
Next, you are trying to use a Windows ext2fs reader. This is not recommended, as Windows uses delayed read/write cycles which can corrupt your Linux system, not to mention what it may do to your Windows system. Linux has better methods for passing data between OSes. Check out VirtualBox, Samba, or mounting Windows partitions. Windows is not designed to regard other OSes with respect; in fact it can’t even respect other Windows apps.

Looking at the output of smartctl, it seems you have no Reallocated Sectors and none are pending so the disk may be OK. From what you have tried, I suspect faulty cables and/or connectors. I am a bit intrigued by this statement:

will only give power to my optical drives if they are positioned in a certain angle

What do you mean? For example, do you have to skew the connectors in the socket to get them to make contact, or what?

In general, the connectors for both data and power have to be a VERY NICE TIGHT FIT. If they are making intermittent contact then you will have no end of trouble. Can you rattle the cable ends in the sockets/ports? If so then you may be using cheap SATA cables between the MB and HDD and/or you have a cheap PSU with the connectors out of spec (slightly too large), in which case you would have to replace the PSU or replace/remake the connectors.
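Going back to the smartctl output for a moment: if you want to keep an eye on those two counters while you sort out the cabling, something like the sketch below may help. It only filters the attribute table that smartctl -A prints; the function name is made up for illustration, /dev/sda is an assumed device name, and it assumes the usual ten-column table layout with the raw value in the last column.

```shell
#!/bin/sh
# Sketch only: print the raw values of the reallocated and pending sector
# counters. Reads `smartctl -A` output on stdin. Assumes the usual attribute
# table, where column 2 is the attribute name and column 10 is the raw value.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
        print $2 "=" $10
    }'
}

# Usage (as root; /dev/sda is an assumed device name):
#   smartctl -A /dev/sda | smart_sector_counts
```

If either number starts climbing, that would point at the disk itself rather than the cabling.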

When you say:

The situation seems to be remedied by me physically pushing the SATA cable into the disk and motherboard

Have you confirmed that just pushing the SATA connectors in and out solves the problem, or is it just the cold restart that does? In other words, does a cold restart alone give the same relief?

Have you tried plugging the MB end of the SATA cable into another port (assuming there is more than one), in case the MB is failing?
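To compare before and after a cable or port swap, it may help to count the link resets in the kernel log rather than wait for the next blue-screen. A rough sketch follows; the function name is made up, and the pattern assumes the resets show up as "ataN: soft resetting link" or "ataN: hard resetting link", so adjust it to whatever your own dmesg output actually says.

```shell
#!/bin/sh
# Sketch only: count SATA link reset messages in kernel log text on stdin.
# The message wording in the pattern is an assumption; check it against
# your own dmesg output and edit it if it differs.
count_link_resets() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at zero when the count is 0.
    grep -cE 'ata[0-9]+: (soft|hard) resetting link' || true
}

# Usage:
#   dmesg | count_link_resets
```

Run it after a few hours of normal use on each cable/port combination; a count that stays at zero on one port but not the other would be a strong hint.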

Another source of spurious errors is badly seated or failing RAM memory chips. You should run the Memory Test on the installation DVD/CD Menu for 4 passes of 8 tests and see if any errors in red are reported. If so, try taking out the RAM and putting it back in a couple of times to brush the dirt/oxide off the contacts. If that doesn’t work, replace the faulty RAM.

When you are satisfied about connectors and RAM, you could try a badblocks test on the HDD. I.e. as root:

badblocks -sv /dev/sda # this is read only
badblocks -svnt random /dev/sda # this is a much longer non destructive read write test; unmount the disk first

or under Windows download and install the free HDTune utility which gives you a nice GUI for the same thing.
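Since the read-write test should not be run on a disk that is in use, here is a small guard you could put in front of it. This is only a sketch: the function name is made up, /dev/sda is an assumed device name, and it only looks at the mount table, so it will not notice swap partitions or LVM volumes sitting on the disk.

```shell
#!/bin/sh
# Sketch only: succeed (exit status 0) if no entry in the mount table read
# from stdin has a device name starting with the given disk, e.g. /dev/sda
# also covers /dev/sda1. Expects the /proc/mounts layout, where the device
# is the first field. Does not detect swap or LVM use of the disk.
disk_is_unmounted() {
    disk=$1
    ! awk -v d="$disk" 'index($1, d) == 1 { found = 1 } END { exit !found }'
}

# Usage (as root; /dev/sda is an assumed device name):
#   if disk_is_unmounted /dev/sda < /proc/mounts; then
#       badblocks -svnt random /dev/sda
#   else
#       echo "unmount the partitions on /dev/sda first" >&2
#   fi
```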

Hope that helps. Good luck.

Yes, I have to slightly skew the molex for the optical drive. It’s nothing major, it fits like a normal molex; it delivers power properly when connected properly, there is no intermittent nature to the power supply.

A cold restart does not always remedy the problem, but nudging the cable into the motherboard/disk always does.

As I have now replaced the cable, I shall proceed to run a badblocks check on the disk. It’s 320GB capacity, forgot to mention that, sorry. Also, the cable replacement seems to have fixed it, it’s been about four hours and no problems so far. I replaced the cable after I tried to read some data from the ext3 disk from Windows, which resulted in a blue-screen.

After replacement, I can browse all files on the disk; it would seem the driver isn’t doing any damage as I didn’t have to run repairs on the disk after the blue-screen, it mounted fine after I booted into Windows without booting into SuSE first (so the previous repairs may have been required as a result of SuSE/Windows trying to access the disk but failing at boot time; probably SuSE as the journal became corrupted, as Windows has no write access to the disk).

Thank you for all your suggestions and replies; I shall update this topic if the cable replacement did not work, after which I shall proceed to back up the contents. On that note, is there a way to rsync over a network path?