OpenSUSE 11 and IDE lockups

Hello!

I installed OpenSUSE 11 a few weeks ago and have some problems with its ATA/SATA device handling. I have an AMD Athlon 64 X2 3800+ CPU on an Asus M2V motherboard (VIA VT8237A south bridge), 1 ATA HDD, 1 ATA DVD-RW and 1 SATA HDD.

My minor problem is that all disk activity results in 100% utilization of 1 CPU core. Using UDMA5 for the ATA HDD and SATA, I’d expect almost no CPU utilization - which is the case under WinXP.

The major problem is that after some time I cannot access either my SATA HDD or my ATA DVD-RW at all. Sometimes this happens while reading/writing a large amount of data, other times it happens for no apparent reason (e.g. 5 minutes after boot, without any disk activity). When this happens, one core’s utilization is continuously at 100%. I have to restart the pc to be able to access the disk again. In the system log I can find the error messages below (see end of post).
After googling for similar problems I’ve found and tried the things below, to no avail:

  • disable smartd
  • disable and uninstall beagle

What seems to work is disabling APIC. If I start OpenSUSE with the noapic kernel boot parameter, the problem doesn’t happen and I can use my pc for hours. If I start the system without this parameter, the problem happens within 5-30 minutes for sure.
Needless to say, I have no problems under WinXP, I can use my pc for 10-16 hours without any problems.
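
To confirm that a boot actually picked up noapic, the kernel command line can be checked. The following is only a minimal sketch: has_boot_param is a hypothetical helper name, not an openSUSE tool, and the example command lines in the comments are made up for illustration.

```shell
#!/bin/sh
# has_boot_param is a hypothetical helper, not part of any distribution.
# Usage: has_boot_param PARAM [CMDLINE]
# CMDLINE defaults to the running kernel's /proc/cmdline.
has_boot_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    # Pad with spaces so only whole parameters match
    # ("noapic" must not match e.g. "noapicfoo").
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_boot_param noapic; then
    echo "noapic is active for this boot"
else
    echo "noapic is not active for this boot"
fi
```

Running cat /proc/interrupts afterwards shows which interrupt controller handles each IRQ line, which is a quick way to compare a boot with and without the parameter.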

Any ideas or suggestions?

I suspect the libata module. This is the first time I have used OpenSUSE, but I have used Linux for years. My previous distro had an older kernel and didn’t use libata, and I didn’t have any problems on this very same hardware.


Jul 17 18:15:17 macisuse kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jul 17 18:15:17 macisuse kernel: ata3.00: cmd 25/00:08:2d:2b:37/00:00:21:00:00/e0 tag 0 dma 4096 in
Jul 17 18:15:17 macisuse kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 17 18:15:17 macisuse kernel: ata3.00: status: { DRDY }
Jul 17 18:15:17 macisuse kernel: ata3: soft resetting link
Jul 17 18:15:41 macisuse su: (to root) maci on /dev/pts/3
Jul 17 18:15:48 macisuse kernel: ata3.00: qc timeout (cmd 0x27)
Jul 17 18:15:48 macisuse kernel: ata3.00: failed to read native max address (err_mask=0x4)
Jul 17 18:15:48 macisuse kernel: ata3.00: revalidation failed (errno=-5)
Jul 17 18:15:48 macisuse kernel: ata3: failed to recover some devices, retrying in 5 secs
... above rows repeated several times ...
Jul 17 18:16:59 macisuse kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Jul 17 18:16:59 macisuse kernel: end_request: I/O error, dev sdb, sector 557263661
Jul 17 18:16:59 macisuse kernel: EXT3-fs error (device sdb6): ext3_get_inode_loc: unable to read inode block - inode=26509313, block=53018626
Jul 17 18:16:59 macisuse kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Jul 17 18:16:59 macisuse kernel: end_request: I/O error, dev sdb, sector 133114653
Jul 17 18:16:59 macisuse kernel: Buffer I/O error on device sdb6, logical block 0
Jul 17 18:16:59 macisuse kernel: lost page write due to I/O error on sdb6
Jul 17 18:16:59 macisuse kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Jul 17 18:16:59 macisuse kernel: end_request: I/O error, dev sdb, sector 133127221
Jul 17 18:16:59 macisuse kernel: Buffer I/O error on device sdb6, logical block 1571
Jul 17 18:16:59 macisuse kernel: lost page write due to I/O error on sdb6
Jul 17 18:16:59 macisuse kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Jul 17 18:16:59 macisuse kernel: end_request: I/O error, dev sdb, sector 135211821
Jul 17 18:16:59 macisuse kernel: EXT3-fs error (device sdb6): ext3_get_inode_loc: unable to read inode block - inode=131073, block=262146
Jul 17 18:16:59 macisuse kernel:  [<c01071d9>] dump_trace+0x63/0x227
Jul 17 18:16:59 macisuse kernel:  [<c0107c8a>] show_trace+0x15/0x29
Jul 17 18:16:59 macisuse kernel:  [<c02e85e5>] _etext+0x5b/0x65
Jul 17 18:16:59 macisuse kernel:  [<c0125759>] warn_on_slowpath+0x41/0x67
Jul 17 18:16:59 macisuse kernel:  [<c0196e13>] mark_buffer_dirty+0x23/0x72
Jul 17 18:16:59 macisuse kernel:  [<f96311f9>] ext3_commit_super+0x40/0x53 [ext3]
Jul 17 18:16:59 macisuse kernel:  [<f96326c1>] ext3_handle_error+0x71/0x95 [ext3]
Jul 17 18:16:59 macisuse kernel:  [<f9632774>] ext3_error+0x39/0x43 [ext3]
Jul 17 18:16:59 macisuse kernel:  [<f962a736>] __ext3_get_inode_loc+0x293/0x2ba [ext3]
Jul 17 18:16:59 macisuse kernel:  [<f962a7b4>] ext3_iget+0x57/0x324 [ext3]
Jul 17 18:16:59 macisuse kernel:  [<f962fd10>] ext3_lookup+0x67/0xa2 [ext3]
Jul 17 18:16:59 macisuse kernel:  [<c0180284>] do_lookup+0xa1/0x140
Jul 17 18:16:59 macisuse kernel:  [<c0182256>] __link_path_walk+0x899/0xcf9
Jul 17 18:16:59 macisuse kernel:  [<c0182702>] path_walk+0x4c/0x9b
Jul 17 18:16:59 macisuse kernel:  [<c0182a4f>] do_path_lookup+0x181/0x1ca
Jul 17 18:16:59 macisuse kernel:  [<c01832b6>] __user_walk_fd+0x2f/0x43
Jul 17 18:16:59 macisuse kernel:  [<c017cbb3>] vfs_lstat_fd+0x16/0x3d
Jul 17 18:16:59 macisuse kernel:  [<c017cc45>] vfs_lstat+0x11/0x13
Jul 17 18:16:59 macisuse kernel:  [<c017cc5b>] sys_lstat64+0x14/0x28
Jul 17 18:16:59 macisuse kernel:  [<c01059e4>] sysenter_past_esp+0x6d/0xa9
Jul 17 18:16:59 macisuse kernel:  [<ffffe430>] 0xffffe430
Jul 17 18:16:59 macisuse kernel:  =======================
Jul 17 18:16:59 macisuse kernel: ---[ end trace 01a11084dbb38cf1 ]---
... below rows repeated several times ...
Jul 17 18:16:59 macisuse kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Jul 17 18:16:59 macisuse kernel: end_request: I/O error, dev sdb, sector 133114653
Jul 17 18:16:59 macisuse kernel: Buffer I/O error on device sdb6, logical block 0
Jul 17 18:16:59 macisuse kernel: lost page write due to I/O error on sdb6
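
For a long log like the one above, it can help to count how often each ATA port times out and which block device reports I/O errors. This is a rough sketch, assuming the syslog line format shown in the post; summarize_ata_errors is a hypothetical helper name, and the matching is deliberately simple.

```shell
#!/bin/sh
# summarize_ata_errors is a hypothetical helper.
# It reads kernel log text on stdin and prints one line per ATA port
# with a timeout and one line per device with block-layer I/O errors.
summarize_ata_errors() {
    awk '
        # e.g. "... kernel: ata3.00: qc timeout (cmd 0x27)"
        /timeout/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^ata[0-9]+(\.[0-9]+)?:$/) {
                    port = $i
                    sub(/:$/, "", port)
                    timeouts[port]++
                    break
                }
        }
        # e.g. "... kernel: end_request: I/O error, dev sdb, sector 557263661"
        /end_request: I\/O error, dev/ {
            for (i = 1; i < NF; i++)
                if ($i == "dev") {
                    dev = $(i + 1)
                    sub(/,$/, "", dev)
                    ioerrs[dev]++
                    break
                }
        }
        END {
            for (p in timeouts) printf "timeouts %s %d\n", p, timeouts[p]
            for (d in ioerrs)   printf "io_errors %s %d\n", d, ioerrs[d]
        }
    ' | sort
}
```

It could be fed from the system log, for example summarize_ata_errors < /var/log/messages (the path is an assumption; use wherever your syslog writes kernel messages).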

Have you tried booting with "hwprobe=-modules.pata"?
It’s my default response when you mix ATA/SATA and an Asus board.

Just tried it and it doesn’t work; the problem occurred right after logging in.

If you haven’t put the latest BIOS on that Asus MB, you may want to. I have yet to buy an Asus MB that didn’t need a BIOS update for SuSE 10.3 and 11.

The BIOS is the latest available version; that was one of the first things I checked :)
I think I’ll keep using “noapic” till a newer kernel (or a patch) comes out which fixes this issue.

noapic is pretty intrusive. … You could try to find a less intrusive boot code here:
SDB:Kernel Parameters for ACPI/APIC - openSUSE

If you wish to see a fix on this, IMHO you have a better chance of it happening if you write a bug report: Submitting Bug Reports - openSUSE

I have the same mainboard and the same problem with openSUSE 11.0 and 10.3, but not with 10.2, Windows XP or Ubuntu 8.04.

If you can, I’d try booting off an 11.1 CD and then using Ctrl-Alt-F2 to get a command line to investigate the system.

It may be that kernel updates resolve the issue.

libata uses reimplementations of the PATA drivers (the pata_ modules), and these are still experimental. Some of them work well, others work after patches, and some have trouble in certain configurations where the older non-pata_ driver does not.

It may be worth reporting the issue via Bugzilla (after checking whether there are already reports for similar hardware). One of the paid SuSE kernel hackers is active in the libata/pata_ area and has recently been resolving problems with some of these drivers.

Unfortunately this issue/bug exists in 11.1 too :( I have to use ‘noapic’ with this release as well. I have submitted a bug report.

The only way I have found to unlock the computer without restart is this:
rmmod -f pata_via
modprobe pata_via
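
The two commands above can be wrapped so they are easy to repeat as root. This is just a sketch: reload_module and the DRY_RUN variable are hypothetical names introduced here. Note that rmmod -f only works on kernels built with forced module unloading, and forcing out a driver that is in use is inherently risky.

```shell
#!/bin/sh
# run: execute a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reload_module is a hypothetical helper: force-unload a module, then load it again.
# The modprobe step is skipped if the unload fails.
reload_module() {
    mod=$1
    run rmmod -f "$mod" && run modprobe "$mod"
}
```

Setting DRY_RUN=1 before calling reload_module pata_via prints the two commands without touching the running kernel; without it, the function has to be run as root.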