Hard disk failure? SMART and smartmontools

I have a brand-new Hitachi drive (250 GB and only 3 days old). I have installed a dual-boot system with XP and Suse 10.3. I am also a noob, so please bear with me…

I keep getting SMART error messages when I’m in Suse. The Hitachi diagnostic tool tells me the disk is fine, and smartmontools says “PASSED” when I run “smartctl -H /dev/sda”.

I have also run a couple of short and long self-tests, and no errors show up.

smartctl -l selftest /dev/sda

smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is smartmontools Home Page (last updated $Date: 2008/06/16 17:31:16 $)

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
1    Short offline     Completed without error  00%        43               -
2    Short offline     Completed without error  00%        41               -
3    Extended offline  Aborted by host          90%        37               -
4    Extended offline  Completed without error  00%        36               -
5    Short offline     Completed without error  00%        34               -
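For anyone who wants to check a self-test log like the one above without reading it row by row, here is a minimal Python sketch. It assumes the rows look like the pasted output (optionally with a leading `#`), and the names `parse_selftest_log` and `rows_with_error_lba` are made up for illustration, not part of smartmontools. A row whose last column is `-` did not record a failing LBA.

```python
import re

# One row of the self-test log as pasted above, e.g.
# "3 Extended offline Aborted by host 90% 37 -"
ROW = re.compile(
    r"^#?\s*(?P<num>\d+)\s+"
    r"(?P<test>Short offline|Extended offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)

def parse_selftest_log(text):
    """Return one dict per self-test row found in the text (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append({
                "num": int(m.group("num")),
                "test": m.group("test"),
                "status": m.group("status"),
                "remaining_percent": int(m.group("remaining")),
                "lifetime_hours": int(m.group("hours")),
                "lba_of_first_error": m.group("lba"),
            })
    return rows

def rows_with_error_lba(rows):
    """Rows where the drive recorded an LBA for the first error."""
    return [r for r in rows if r["lba_of_first_error"] != "-"]

# The five rows from the post above.
SAMPLE = """\
1 Short offline Completed without error 00% 43 -
2 Short offline Completed without error 00% 41 -
3 Extended offline Aborted by host 90% 37 -
4 Extended offline Completed without error 00% 36 -
5 Short offline Completed without error 00% 34 -
"""

rows = parse_selftest_log(SAMPLE)
print(len(rows), "tests,", len(rows_with_error_lba(rows)), "with a failing LBA")
```

On this log none of the tests recorded a failing LBA, which fits the "PASSED" health result; the one "Aborted by host" entry is an interrupted test, not a failed one.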

Yet the SMART error log shows the following:

Error 68 occurred at disk power-on lifetime: 43 hours (1 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
84 51 00 29 36 ba e8 Error: ICRC, ABRT at LBA = 0x08ba3629 = 146421289

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
25 00 a0 8a 32 ba e0 00 03:23:54.100 READ DMA EXT
27 00 00 00 00 00 e0 00 03:23:54.100 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 02 03:23:54.100 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 02 03:23:54.100 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 03:23:54.100 READ NATIVE MAX ADDRESS EXT

Error 67 occurred at disk power-on lifetime: 43 hours (1 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
84 51 00 29 36 ba e8 Error: ICRC, ABRT at LBA = 0x08ba3629 = 146421289

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
25 00 a0 8a 32 ba e0 00 03:23:53.900 READ DMA EXT
25 00 00 8a 2e ba e0 00 03:23:53.900 READ DMA EXT
25 00 00 8a 2a ba e0 00 03:23:53.900 READ DMA EXT
25 00 00 8a 26 ba e0 00 03:23:53.900 READ DMA EXT
c8 00 f8 92 25 ba e8 00 03:23:53.900 READ DMA

Error 66 occurred at disk power-on lifetime: 43 hours (1 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
84 51 00 49 18 2e e8 Error: ICRC, ABRT at LBA = 0x082e1849 = 137238601

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
c8 00 70 da 17 2e e8 00 03:23:32.200 READ DMA
c8 00 70 6a 17 2e e8 00 03:23:32.200 READ DMA
25 00 58 12 16 2e e0 00 03:23:32.200 READ DMA EXT
c8 00 18 ca 15 2e e8 00 03:23:32.200 READ DMA
c8 00 c0 5a 12 2e e8 00 03:23:32.200 READ DMA

Error 65 occurred at disk power-on lifetime: 43 hours (1 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
84 51 00 f1 c8 47 e9 Error: ICRC, ABRT at LBA = 0x0947c8f1 = 155699441

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
c8 00 48 aa c8 47 e9 00 03:23:31.600 READ DMA
c8 00 70 6a 3f 2c e8 00 03:23:31.600 READ DMA
c8 00 48 ca 18 2a e8 00 03:23:31.600 READ DMA
c8 00 20 82 a0 29 e8 00 03:23:31.600 READ DMA
c8 00 08 ca 7b 29 e8 00 03:23:31.600 READ DMA

Error 64 occurred at disk power-on lifetime: 40 hours (1 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
84 51 00 29 36 ba e8 Error: ICRC, ABRT at LBA = 0x08ba3629 = 146421289

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
25 00 a0 8a 32 ba e0 00 00:39:24.200 READ DMA EXT
25 00 00 8a 2e ba e0 00 00:39:24.200 READ DMA EXT
25 00 00 8a 2a ba e0 00 00:39:24.100 READ DMA EXT
25 00 00 8a 26 ba e0 00 00:39:24.100 READ DMA EXT
c8 00 f8 92 25 ba e8 00 00:39:24.100 READ DMA
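In case the register dumps above look opaque, here is a rough Python sketch of how the "ER ST SC SN CL CH DH" row maps to the text smartctl prints next to it. It assumes the 28-bit LBA layout (low nibble of DH, then CH, CL, SN), which reproduces the LBA values in this log, and it decodes only a subset of the error register bits. The function names are hypothetical. ICRC stands for an interface CRC error, meaning the data was corrupted in transit between drive and controller, so it often points at the cable or the controller rather than the disk surface.

```python
# Subset of ATA error register bits, using the names smartctl prints.
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}

def decode_error(er):
    """Return the names of the error bits that are set, highest bit first."""
    return [name for bit, name in sorted(ERROR_BITS.items(), reverse=True)
            if er & (1 << bit)]

def decode_lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the sector number, cylinder and device registers."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

def decode_row(row):
    """Decode one 'ER ST SC SN CL CH DH' row given as hex bytes separated by spaces."""
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in row.split()[:7])
    return decode_error(er), decode_lba28(sn, cl, ch, dh)

# Register row from Error 68 in the log above.
flags, lba = decode_row("84 51 00 29 36 ba e8")
print(flags, hex(lba), lba)
```

Errors 68, 67 and 64 all decode to the same LBA (146421289), and every entry carries the same ICRC, ABRT flags.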

Unfortunately this is all Chinese to me, and Hitachi support is closed. Could something be wrong with the SMART monitor itself, so that it sends me error messages although everything is fine?
Thanks,
g

PS: I don’t get any SMART messages in XP, btw.
PPS: I just got another message “Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sda, ATA error count increased from 66 to 68”

You might try booting to a live CD and running:

fsck /dev/sda1 (or whatever the drive is)

Do not fsck a mounted drive; you might fsck everything up.
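As a quick guard before running fsck, the Python sketch below checks whether a device shows up as a mount source. It assumes a Linux system that exposes `/proc/mounts`, and the helper names are made up for illustration. It compares device names literally, so a filesystem mounted under another name (for example `/dev/root` or a by-UUID path) would not be caught; treat it as a rough check only.

```python
def is_mounted(device, mounts_text):
    """Return True if `device` is listed as a mount source in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False

def current_mounts(path="/proc/mounts"):
    """Read the kernel's list of mounted filesystems (Linux only)."""
    with open(path) as f:
        return f.read()

# Usage on a live system (not run here):
#   if is_mounted("/dev/sda1", current_mounts()):
#       print("still mounted, do not run fsck on it")
```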

Also, check the BIOS. Many BIOSes have a SMART status you can check.

I used the GParted live disc, identified the part of my disk that is supposedly dodgy (the root partition), and ran fsck on it. It gave me the following reply:

/dev/hda6 primary superblock features different from backup, check forced

PASS 1: checking inodes, blocks & sizes
PASS 2: checking directory structure
PASS 3: checking directory connectivity
PASS 4: checking reference counts
PASS 5: checking group summary information

/dev/hda6: file system was modified
/dev/hda6: 173069/2626560 files (0.5% non-contiguous), 982273/524880 blocks

I have now started Suse again, and SMART tells me my other (backup) hard drive is now failing as well!

Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sdb, 1 Currently unreadable (pending) sectors
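The two warnings quoted in this thread are of different kinds, which is easy to miss because the desktop popup wraps both in the same "Your hard disk drive is failing!" text. One reports that the drive's ATA error log grew (here, the ICRC entries on /dev/sda); the other reports sectors the drive could not read and is waiting to remap (on /dev/sdb). A small Python sketch that tells them apart, assuming the message wording is exactly as quoted above and using a hypothetical `classify` helper:

```python
import re

# Patterns based only on the two message texts quoted in this thread.
ERROR_COUNT = re.compile(
    r"Device: (?P<dev>[^,\s]+), ATA error count increased from "
    r"(?P<old>\d+) to (?P<new>\d+)"
)
PENDING = re.compile(
    r"Device: (?P<dev>[^,\s]+), (?P<n>\d+) Currently unreadable \(pending\) sectors"
)

def classify(message):
    """Return a small dict describing the warning, or None if it is not recognized."""
    m = ERROR_COUNT.search(message)
    if m:
        return {
            "device": m.group("dev"),
            "kind": "ata_error_count",
            "new_errors": int(m.group("new")) - int(m.group("old")),
        }
    m = PENDING.search(message)
    if m:
        return {
            "device": m.group("dev"),
            "kind": "pending_sectors",
            "count": int(m.group("n")),
        }
    return None

print(classify("Device: /dev/sda, ATA error count increased from 66 to 68"))
print(classify("Device: /dev/sdb, 1 Currently unreadable (pending) sectors"))
```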

I had to replace my old hard disk because SMART told me it was failing.
Three different hard drives failing in as many days?
thanks,
g

That is odd. I have never seen those errors, and I’ve been using Linux since 2003. Maybe your motherboard is having issues with its IDE bus. Can you move the HD to another IDE port on your motherboard? Maybe that will help. Just a wild guess, though.
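One way to see whether moving the drive to another port or cable helps is to compare the drive's CRC error counter before and after. The Python sketch below pulls the raw value of one attribute out of `smartctl -A` output. It assumes the usual ten-column attribute table (ID first, raw value last) and that the drive reports attribute 199 (UDMA_CRC_Error_Count), which not every drive does. The function names are hypothetical.

```python
import subprocess

def attribute_raw(attributes_text, attr_id):
    """Return the raw value of a SMART attribute as an int, or None if absent or not numeric."""
    for line in attributes_text.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID and carry the raw value in column 10.
        if len(fields) >= 10 and fields[0] == str(attr_id):
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None

def read_attributes(device):
    """Run `smartctl -A` on the device and return its text output (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout

# Usage on a live system (not run here):
#   before = attribute_raw(read_attributes("/dev/sda"), 199)
#   ... swap the cable or port, use the machine for a while ...
#   after = attribute_raw(read_attributes("/dev/sda"), 199)
```

If the counter stops rising after the swap, that supports the bus or cable theory; if it keeps rising, the controller or the drive's own interface is the next suspect.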