Getting errors from hard drive!

In /var/log/messages, I’ve gotten the below errors that also coincide with delayed startup of the OS (it seems to freeze for a while after these errors):

linux kernel: [  447.000044] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
linux kernel: [  447.000050] ata1.00: failed command: WRITE DMA EXT
linux kernel: [  570.000117]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

What’s causing this? How do I fix it?

When you boot and press Esc to get the verbose output, where does it hang, and what do you see that you can report?

Those exact errors are what I see when I “Esc” on boot and it’s hanging. (And they’re in the exact order I printed them in)

What happens in Failsafe mode, or if you switch off AHCI in the BIOS?

OK, I rebooted again and noticed some more stuff:

  1. It first hangs after this message:
/dev/sda2: clean, 157855/1313280 Files, xxxx/xxxx Blocks (I didn't catch the number of blocks quickly enough, but it had actual numbers)

2. It actually prints FOUR messages:

linux kernel: 447.000044] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
linux kernel: 447.000050] ata1.00: failed command: WRITE DMA EXT
linux kernel: 570.000117] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY}



3. Those four messages appear FOUR times, with a long delay between each time

4. Then it prints: "fsck succeeded, mounting root device, read-write"

Back after me tea.

Enjoy! You like black or green teas?

OK, so after booting into “Fail Safe” mode, here’s what I got:

  1. The same four errors, BUT this time they were accompanied by some new lines:

link is slow to respond
device not ready (err no=-16), forcing hard reset
soft resetting link
configured for UDMA/133
configured for UDMA/100
device reported invalid CHS Sector 0

And then it repeated those combined 10 lines a few times before finally moving on.

Also, how do I turn off AHCI in BIOS? Is it “SATA Adapter Enabled/Disabled”? Or “Plug & Play OS YES/NO”?

None of those.
Every BIOS is different, and you might not even have the option.

(Green Tea)

Remind me what we were on yesterday?
Was it your partitions I reorganized?

It was mounting of a drive for all users (it was only readable by root), but this issue I’m 90% sure was occurring before that.

You could do with someone more knowledgeable than me on this.
Hang about, you are sure to get them…

I wouldn’t like to ‘Jeff’ it totally for you. :wink:

OK, thanks. This seems to indicate that it might be a software issue:

Of course, this seems to indicate it’s a cable issue, so who knows:
https://bugzilla.redhat.com/show_bug.cgi?id=474552

Any idea how I do this when starting openSUSE?

try booting with ‘pci=nomsi’ or ‘acpi=off’ or ‘noapic’
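
For anyone following along: on openSUSE releases of this era the boot menu usually offers a "Boot Options" line where extra parameters can be typed before pressing Enter (that is an assumption about the installed bootloader, so check your own menu). Once the system is up, a small sketch like the one below can confirm whether a parameter actually reached the kernel. The function name has_boot_param is made up for illustration; /proc/cmdline is where the kernel exposes the command line it was booted with.

```shell
# has_boot_param PARAM [CMDLINE_FILE]
# Succeeds (exit status 0) if PARAM appears as a whole word on the kernel
# command line. CMDLINE_FILE defaults to /proc/cmdline.
has_boot_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each word on its own line, then look for an exact, literal match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}
```

Example use after rebooting: has_boot_param pci=nomsi && echo "pci=nomsi is active"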

lspci -m

What does it return?


00:00.0 "Host bridge" "Silicon Integrated Systems [SiS]" "760/M760 Host" -r02 "Hewlett-Packard Company" "Device 2a04"
00:01.0 "PCI bridge" "Silicon Integrated Systems [SiS]" "SG86C202" "" ""
00:02.0 "ISA bridge" "Silicon Integrated Systems [SiS]" "SiS964 [MuTIOL Media IO]" -r36 "" ""
00:02.5 "IDE interface" "Silicon Integrated Systems [SiS]" "5513 [IDE]" -r01 -p80 "Hewlett-Packard Company" "Device 2a04"
00:02.7 "Multimedia audio controller" "Silicon Integrated Systems [SiS]" "AC'97 Sound Controller" -ra0 "Hewlett-Packard Company" "Device 2a05"
00:03.0 "USB Controller" "Silicon Integrated Systems [SiS]" "USB 1.1 Controller" -r0f -p10 "Hewlett-Packard Company" "Device 2a04"
00:03.1 "USB Controller" "Silicon Integrated Systems [SiS]" "USB 1.1 Controller" -r0f -p10 "Hewlett-Packard Company" "Device 2a04"
00:03.2 "USB Controller" "Silicon Integrated Systems [SiS]" "USB 1.1 Controller" -r0f -p10 "Hewlett-Packard Company" "Device 2a04"
00:03.3 "USB Controller" "Silicon Integrated Systems [SiS]" "USB 2.0 Controller" -p20 "Hewlett-Packard Company" "Device 2a04"
00:04.0 "Ethernet controller" "Silicon Integrated Systems [SiS]" "SiS900 PCI Fast Ethernet" -r90 "Hewlett-Packard Company" "Device 2a04"
00:05.0 "IDE interface" "Silicon Integrated Systems [SiS]" "RAID bus controller 180 SATA/PATA  [SiS]" -r01 -p85 "Hewlett-Packard Company" "Device 2a04"
00:08.0 "PCI bridge" "PLX Technology, Inc." "PEX8112 x1 Lane PCI Express-to-PCI Bridge" -raa "" ""
00:0a.0 "Communication controller" "Agere Systems" "V.92 56K WinModem" -r03 "Agere Systems" "Device 044c"
00:0b.0 "FireWire (IEEE 1394)" "VIA Technologies, Inc." "VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller" -r80 -p10 "Hewlett-Packard Company" "Device 2a04"
00:18.0 "Host bridge" "Advanced Micro Devices [AMD]" "K8 [Athlon64/Opteron] HyperTransport Technology Configuration" "" ""
00:18.1 "Host bridge" "Advanced Micro Devices [AMD]" "K8 [Athlon64/Opteron] Address Map" "" ""
00:18.2 "Host bridge" "Advanced Micro Devices [AMD]" "K8 [Athlon64/Opteron] DRAM Controller" "" ""
00:18.3 "Host bridge" "Advanced Micro Devices [AMD]" "K8 [Athlon64/Opteron] Miscellaneous Control" "" ""
01:00.0 "VGA compatible controller" "Silicon Integrated Systems [SiS]" "661/741/760 PCI/AGP or 662/761Gx PCIE VGA Display Adapter" "Hewlett-Packard Company" "Device 2a06"
02:00.0 "VGA compatible controller" "nVidia Corporation" "G98 [GeForce 8400 GS]" -ra1 "Unknown vendor 19f1" "Device 0a5e"


It looks like there is a problem connecting to the disks, since the kernel is stepping through various connection speeds from fast to slow. So are you using really old disks here? I think you mentioned in another post that you had flat cables; this indicates a PATA interface, not SATA. So it may be that either the controller or the disk is running at a slow speed, and the kernel must step through all the possible higher speeds first before it locks into the actual speed. This takes time, since the kernel must wait on the hardware response and finally time out before going to the next speed. This is happening for each partition. This could be a cable problem. If I recall, you have 2 drives; is one set for master and the other as slave? This is done with a jumper on the back in most cases.
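
To see the speed stepping described above in one place, something like the following rough sketch could tally the "configured for UDMA/..." lines per device. It assumes the kernel messages carry the usual "ataN.NN:" device prefix (the Fail Safe lines quoted earlier were posted without it), and the function name udma_modes is invented for this example.

```shell
# udma_modes: read kernel log text on stdin and count how many times each
# ATA device was configured for each UDMA mode.
udma_modes() {
    grep -oE 'ata[0-9]+\.[0-9]+: configured for UDMA/[0-9]+' | sort | uniq -c
}
```

Example use: dmesg | udma_modes. A device that shows up under more than one mode was renegotiated to a different speed at some point during boot.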

Both are set to cable select.

And yes, both are PATA. One of the hard disks is pretty old, and the other, which I just bought, is a Western Digital WD3200JBRTL.

So is there anything I can do to fix this?

(NOTE: this did not happen with openSUSE 11.2 and my last HD, which was also PATA)

It’s so long since I used these old drives. But I’m sure mine were both set to Master
(Possibly Master and Cable Select)

Are you the 6tr6tr of this thread?

Hard drive failure is imminent! How do I back everything up, including what repos/packages, etc?

If so, are any of

  • the disk
  • the data cable
  • the power cable
  • the motherboard

common with the same items in that thread?

And, I have to check there isn’t something obvious that we can’t see. It sounds a bit like (not necessarily the case, but we need to get it out of the way) the computer could be performing an fsck at every start-up. Provided that there is no reason for suspecting an unclean power-off (you are switching off from the menus or using shutdown -h now, aren’t you?), then it sounds as if something could be getting corrupted, causing an fsck.

Does anything untoward show up in ‘dmesg’?

What partition formats are you using? Primarily, are they journalling (ext3, ext4 probably)? Can you post /etc/fstab, and/or the output from df please?
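
If pasting the whole file is awkward, a short sketch like this pulls just the device, mount point and filesystem type out of /etc/fstab. The function name fstab_types is made up for the example, and it assumes the standard whitespace-separated fstab layout.

```shell
# fstab_types [FILE]
# Print device, mount point and filesystem type for each fstab entry,
# skipping blank lines and comment lines. FILE defaults to /etc/fstab.
fstab_types() {
    file=${1:-/etc/fstab}
    awk '!/^[[:space:]]*(#|$)/ { print $1, $2, $3 }' "$file"
}
```

For partitions that are already mounted, df -hT shows the filesystem type alongside the usual usage figures.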

And, on a very primitive level, when the computer is ‘stalled’ does the hard disk light come on (flash/flicker/on continuously)?

00:00.0 "Host bridge" "Silicon Integrated Systems [SiS]" "760/M760 Host" -r02 "Hewlett-Packard Company" "Device 2a04"
00:01.0 "PCI bridge" "Silicon Integrated Systems [SiS]" "SG86C202" "" ""
00:02.0 "ISA bridge" "Silicon Integrated Systems [SiS]" "SiS964 [MuTIOL Media IO]" -r36 "" ""
00:02.5 "IDE interface" "Silicon Integrated Systems [SiS]" "5513 [IDE]" -r01 -p80 "Hewlett-Packard Company" "Device 2a04"
00:02.7 "Multimedia audio controller" "Silicon Integrated Systems [SiS]" "AC'97 Sound Controller" -ra0 "Hewlett-Packard Company" "Device 2a05"
........
........
01:00.0 "VGA compatible controller" "Silicon Integrated Systems [SiS]" "661/741/760 PCI/AGP or 662/761Gx PCIE VGA Display Adapter" "Hewlett-Packard Company" "Device 2a06"
02:00.0 "VGA compatible controller" "nVidia Corporation" "G98 [GeForce 8400 GS]" -ra1 "Unknown vendor 19f1" "Device 0a5e"

Back in the day (when SiS still had a presence in the consumer motherboard chipset market) SiS chipsets were always alleged to be buggy (although how you could tell between ‘buggy’ and ‘supported by buggy drivers’, I don’t know). Running any of the accelerated 3D video drivers can be an ‘interesting’ undertaking, and you look to have a choice between SiS on-board video and an nVidia video card… you do like an interesting life, don’t you?

I see a new post

Both are set to cable select.

Both are on a single motherboard connector/cable?

This probably ought to work, but just in case (and, assuming that both are on the same cable), try setting one to master and the other one to slave.

Yes. I have one new hard drive (replacing the failing one) but everything else is the same.

I have no idea. I don’t know much about dmesg. What should I be looking/grep-ing for?

The /, swap and /home are all ext4 (and swap). The other HD is ext3.

Hmmm… not sure. It’s a desktop, so I’m not even sure where it would have that light, but I’ll try to look.

Yes.

OK, I’ll try that. But to do that I’ll have to remove the HDs from the computer to find out which pin setting that is.

dmesg (try typing it at the command line even when you don’t suspect anything is going wrong) gives several pages of output; I’d try looking through it all, there will be a lot of ‘found this core, found that core setting up this subsystem, doing this with USB, doing that with ethernet’ kind of messages, which are mostly ‘noise’ at this point. In this situation, anything that it says about disks is potentially interesting, although some of it will be just ‘normal’.

I was going to post mine, but it’s full of rubbish about wireless channels and only goes back to my last suspend.

You could grep on ‘warn’ or ‘fail’ on either dmesg or any log and that might be interesting, but you might also miss something, so I’d try looking manually in the first instance, unless and until there seems to be something that suggests it needs watching more closely.
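
As a starting point for that grep, here is a minimal sketch that keeps only lines mentioning an ATA port or containing ‘warn’ or ‘fail’ (case-insensitive). The function name disk_messages and the exact pattern are just one guess at what is worth seeing; as noted above, a filter like this can hide something relevant, so it is no substitute for reading the full output.

```shell
# disk_messages: filter kernel log text on stdin down to lines that mention
# an ATA port/device (ata1:, ata1.00:, ...) or contain "warn" or "fail".
disk_messages() {
    grep -iE 'ata[0-9]+(\.[0-9]+)?:|warn|fail'
}
```

Example use: dmesg | disk_messages | less, or disk_messages < /var/log/messages for the log file mentioned at the start of the thread.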