device [8086:340a] error status

Hi.

Recently I’ve been experiencing random lock-ups with my openSUSE 11.4 GNU/Linux box. dmesg is filled with messages like this:

[38713.710597] pcieport 0000:00:03.0: device [8086:340a] error status/mask=00000001/00002000
[38713.710600] pcieport 0000:00:03.0:   [ 0] Receiver Error         (First)
[38713.712515] pcieport 0000:00:03.0: AER: Corrected error received: id=0018
[38713.712523] pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0018(Receiver ID)

These are generated continuously, and obviously the word “error” in there doesn’t indicate a happy situation.
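For anyone trying to read these lines, here is a rough Python sketch that decodes the status/mask pair and the id field. The function names (decode_aer_line, decode_source_id) are made up for illustration, and the bit table is my understanding of the PCIe AER Correctable Error Status register layout, so treat it as an assumption rather than a reference.

```python
import re

# Assumed bit layout of the PCIe AER Correctable Error Status register.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}

# Matches the "error status/mask" line as it appears in the dmesg output above.
LINE_RE = re.compile(
    r"pcieport (?P<port>[0-9a-f:.]+): device \[[0-9a-f]{4}:[0-9a-f]{4}\] "
    r"error status/mask=(?P<status>[0-9a-f]{8})/(?P<mask>[0-9a-f]{8})"
)


def decode_bits(value):
    """Return the names of all known bits set in a 32-bit register value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_aer_line(line):
    """Decode one status/mask line; returns None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "port": m.group("port"),
        "status": decode_bits(int(m.group("status"), 16)),
        "masked": decode_bits(int(m.group("mask"), 16)),
    }


def decode_source_id(id_hex):
    """Split a 16-bit id into bus:device.function (8, 5 and 3 bits)."""
    value = int(id_hex, 16)
    return "%02x:%02x.%x" % (value >> 8, (value >> 3) & 0x1F, value & 0x7)
```

Fed the first line above, this should report “Receiver Error” as the status bit that is set and “Advisory Non-Fatal Error” as the only masked one, and id=0018 decodes to 00:03.0, i.e. the root port itself.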

I looked up device 8086:340a on the PCI registry and found:

5520/5500/X58 I/O Hub PCI Express Root Port 3 (PCI Devices)

I’m not sure what this means in terms of what to do about it.

System info:

Linux 2.6.37.6-0.9-desktop i686
KDE 4.6.00 (4.6.0) “release 6”
NVIDIA GeForce GT 240
2D driver nvidia
3D driver NVIDIA 275.21

I haven’t had this issue in the past and as for updating, I’ve been patching the system since day one of the 11.4 install.

Thanks!

More info…

failsafe mode works

The random crashes happen after logging in to KDE through the GUI.

I’m beginning to suspect the kernel and NVidia drivers… (I downloaded the just released NVidia drivers today, btw.)

More info…

I can boot to the GUI login screen.

Session Type: IceWM, TWM, Failsafe: these all work; dmesg is normal, with no pcieport errors

Session Type: Default, KDE, “KDE Plasma Workspace”, “KDE Plasma Workspace (failsafe session)”: these all hang as the KDE desktop loads, while the progress bar is displayed

Under IceWM or TWM, if I run firefox or konqueror, for example, the program starts up, and moments later the pcieport errors start showing up in dmesg and the system hangs.

I tried installing the default version of the kernel, NVidia driver and preloader with the same result.

More info…

After seeing all the postings about KDE4 and NVidia drivers, I’m leaning toward the NVidia drivers being the root cause of the lock-ups. In my research, I’ve tried:

  1. nomodeset in /boot/grub/menu.lst - no effect
  2. NO_KMS_IN_INITRD=yes in Yast->System->/etc/sysconfig->System/Kernel - no effect
  3. Enabled=false in ~/.kde4/share/config/kwinrc, [Compositing] - allows KDE desktop to load, but executing firefox, konqueror results in instant lock

Any tips on how to revert the NVidia driver or switch to nouveau or nv easily?

You could try to downgrade the nvidia driver from Yast > Software manager, if there is an older version there, or remove it in the same way. Take a look at the excellent tutorials from oldcpu here for pre-installation, annoying issues and video drivers.

I’ve made “some” headway…

  1. nomodeset in /boot/grub/menu.lst
  2. NO_KMS_IN_INITRD=yes in Yast->System->/etc/sysconfig->System/Kernel
  3. Enabled=false in ~/.kde4/share/config/kwinrc, [Compositing]
  4. installed the latest NVIDIA driver, 290.10, the “hard way”
  5. blacklisted nouveau in /etc/modprobe.d/50-blacklist.conf

I can get into KDE4 without the instant crash, but after some random period of time, the screen goes blank and I lose keyboard and mouse controls. Sometimes, just launching firefox will result in a blank screen and loss of keyboard and mouse controls.

Any ideas or further debug methods?

P.S. Thanks brunomcl! I think I’ve maxed out all the NVIDIA related fixes…

Do these error messages in /var/log/messages have any meaning to anyone? (I’m only posting keywords, not the exact error messages, since I’m transcribing them to a netbook. The desktop gets hosed when I start up firefox, konqueror, etc.)

NVRM: GPU at 0000:0d:00.0 has fallen off the bus.
irq 16: nobody cared (try booting with the “irqpoll” option)
Pid: 1272, comm: Xorg Tainted: P 2.6.37.6-0.9-desktop #1
Call Trace:
try_stack_unwind
dump_trace
show_trace_log_lvl
show_trace
dump_stack
__report_bad_irq
note_interrupt
handle_fasteoi_irq
handle_irq
handlers:
(usb_hcd_irq)
(nv_kern_isr)
Disabling IRQ #16

I’ve looked up “NVRM GPU has fallen off the bus” and the issue does appear to be NVidia card/driver related, but none of the threads I found on the net have turned up a solution.

A side effect of this problem:

/var/log/messages and /var/log/warn get large very fast with pcieport errors, which eventually triggers the system to bzip2 the files and create new ones. Once /var/log fills with these files, it’s not possible to log in via the GUI because / is full (resolved by booting into runlevel 3 and deleting these messages* and warn* files).
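To get a feel for how fast the log is filling up, something like the following Python sketch could tally the corrected-error lines per root port. The function name count_corrected_errors is made up, and the pattern only assumes the “AER: Corrected error received” wording from the dmesg output earlier in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   pcieport 0000:00:03.0: AER: Corrected error received: id=0018
PORT_RE = re.compile(r"pcieport (\S+): AER: Corrected error received")


def count_corrected_errors(lines):
    """Count 'Corrected error received' lines per pcieport device."""
    counts = Counter()
    for line in lines:
        m = PORT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Possible usage (reading the log usually needs root):
#   with open("/var/log/messages") as log:
#       print(count_corrected_errors(log))
```

This only measures the flood; it does nothing to stop it.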

I’ve tried these versions of the Nvidia drivers, installing them the “hard way”:

NVIDIA-Linux-x86-270.41.19.run
NVIDIA-Linux-x86-275.21.run
NVIDIA-Linux-x86-280.13.run
NVIDIA-Linux-x86-290.10.run

and the system is still unstable, freezing shortly after firefox or glxgears runs. /var/log/messages will sometimes display pcieport error messages and/or “NVRM: GPU … has fallen off the bus”.

I’ve tried these kernel flags in /boot/grub/menu.lst in various combinations without any luck (this is after searching on “Disabling IRQ #16”):

noirqdebug
pci=nomsi
pci=noaer
pci=routeirq

Well, the problem is “solved”, though the solution shouldn’t be what it was, which is basically moving the NVidia graphics card to another PCIe slot. (I only tried this after reading in another thread about someone having freezing issues due to a faulty cable interconnecting multiple NVidia cards. I’m only using one card, so this solution didn’t match my situation, but it was worth a try and it did lead to a stable system.)

The possible change that is keeping things stable: cat /proc/interrupts shows IRQ16 with nvidia and usb3 instead of nvidia and usb2. (Perhaps I have something on usb2 that conflicts with the nvidia driver?)
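To check which drivers end up sharing an interrupt line, a small Python sketch like this can list the shared IRQs from /proc/interrupts. The function name shared_irqs and the SAMPLE text are invented for illustration (they are not my real output), and the parsing assumes the older layout where the controller name is a single column such as IO-APIC-fasteoi.

```python
def shared_irqs(text):
    """Return {irq: [handlers]} for IRQs with more than one handler.

    Assumes the layout: 'IRQ: <one count per CPU> <chip> <handler>, <handler>'.
    """
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip rows without a chip column and at least one handler.
        if not sep or len(fields) <= ncpus + 1:
            continue
        handlers = " ".join(fields[ncpus + 1:]).split(",")
        handlers = [h.strip() for h in handlers if h.strip()]
        if len(handlers) > 1:
            result[label.strip()] = handlers
    return result


# Hypothetical sample in the 2.6-era format, not real output from this box.
SAMPLE = """\
           CPU0       CPU1
  0:        125          0   IO-APIC-edge      timer
 16:      48213      47990   IO-APIC-fasteoi   ehci_hcd:usb2, nvidia
 23:       1502       1498   IO-APIC-fasteoi   ehci_hcd:usb1
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
    # On a live system:
    #   with open("/proc/interrupts") as f:
    #       print(shared_irqs(f.read()))
```

Comparing the result before and after moving the card would show whether the nvidia handler really changed partners on IRQ 16.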

System summary:

Linux 2.6.37.6-0.9-desktop i686
KDE 4.6.00 (4.6.0) “release 6”
NVIDIA GeForce GT 240
2D driver nvidia
3D driver NVIDIA 290.10 (latest driver installed via the “hard way”)

  1. nomodeset in /boot/grub/menu.lst
  2. NO_KMS_IN_INITRD=yes in Yast->System->/etc/sysconfig->System/Kernel
  3. Enabled=true in ~/.kde4/share/config/kwinrc, [Compositing] section, though XRender is being used instead of OpenGL (as OpenGL is not selectable)
  4. nouveau is NOT blacklisted in /etc/modprobe.d/50-blacklist.conf