dmesg flood

After installing openSUSE 11.1 I got endless repeats of these lines

irq 179, desc: c052a700, depth: 1, count: 0, unhandled: 0
->handle_irq(): c0160b24, handle_bad_irq+0x0/0x1e8
->chip(): c050991c, no_irq_chip+0x0/0x40
->action(): 00000000
IRQ_DISABLED set
unexpected IRQ trap at vector b3

in dmesg and /var/log/messages.
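
A note for anyone comparing their own logs: the vector in the last line of the report is hexadecimal, and 0xb3 is 179 decimal, the same number as the irq in the first line. Below is a minimal Python sketch that tallies how often each irq and vector is reported. It assumes the messages have exactly the format quoted above; the function name `count_irq_traps` is made up for illustration and is not part of any tool.

```python
import re
from collections import Counter

# First line of each report, e.g.
# "irq 179, desc: c052a700, depth: 1, count: 0, unhandled: 0"
IRQ_LINE = re.compile(r"\birq (\d+), desc: [0-9a-f]+, depth: \d+")
# Last line of each report, e.g. "unexpected IRQ trap at vector b3"
# (the vector is printed in hexadecimal)
TRAP_LINE = re.compile(r"unexpected IRQ trap at vector ([0-9a-f]+)")


def count_irq_traps(lines):
    """Return two Counters: reports per irq number, and traps per vector (as int)."""
    irqs, vectors = Counter(), Counter()
    for line in lines:
        m = IRQ_LINE.search(line)
        if m:
            irqs[int(m.group(1))] += 1
        m = TRAP_LINE.search(line)
        if m:
            vectors[int(m.group(1), 16)] += 1
    return irqs, vectors


if __name__ == "__main__":
    # Sample taken from the messages quoted in this thread.
    sample = [
        "irq 179, desc: c052a700, depth: 1, count: 0, unhandled: 0",
        "->handle_irq(): c0160b24, handle_bad_irq+0x0/0x1e8",
        "->chip(): c050991c, no_irq_chip+0x0/0x40",
        "->action(): 00000000",
        "IRQ_DISABLED set",
        "unexpected IRQ trap at vector b3",
    ]
    irqs, vectors = count_irq_traps(sample)
    print("reports per irq:", dict(irqs))
    print("traps per vector:", dict(vectors))
```

To run it on real data, the saved output of dmesg could be passed in line by line instead of the sample list.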

After a bit of googling I found out that it has something to do with interrupts, but that didn’t help me stop the flood.

cat /proc/interrupts

       CPU0       CPU1

0: 84 0 IO-APIC-edge timer
1: 11799 0 IO-APIC-edge i8042
3: 4 0 IO-APIC-edge
4: 4 0 IO-APIC-edge
8: 0 0 IO-APIC-edge rtc0
9: 0 0 IO-APIC-fasteoi acpi
14: 390128 0 IO-APIC-edge pata_via
15: 0 0 IO-APIC-edge pata_via
17: 345631 0 IO-APIC-fasteoi HDA Intel
19: 3 0 IO-APIC-fasteoi ohci1394
20: 0 0 IO-APIC-fasteoi uhci_hcd:usb2
21: 0 0 IO-APIC-fasteoi uhci_hcd:usb4
22: 1477086 0 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb3
23: 0 0 IO-APIC-fasteoi uhci_hcd:usb5
24: 1643393 0 IO-APIC-fasteoi nvidia
28: 0 0 IO-APIC-fasteoi eth0
179: 8715 0 none-<NULL>
187: 2 0 none-<NULL>
219: 511616 0 PCI-MSI-edge ahci
220: 0 0 PCI-MSI-edge aerdrv
221: 0 0 PCI-MSI-edge aerdrv
222: 0 0 PCI-MSI-edge aerdrv
223: 0 0 PCI-MSI-edge aerdrv
NMI: 0 0 Non-maskable interrupts
LOC: 2499147 2295708 Local timer interrupts
RES: 53977 67039 Rescheduling interrupts
CAL: 15527 25190 function call interrupts
TLB: 7188 10005 TLB shootdowns
TRM: 0 0 Thermal event interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0

Interrupt 179 points to a none-<NULL> entry.
One more thing that may be worth mentioning: my Intel Pro 1000 PT card is not functioning, but I’ve never got that device working on any Linux distro. These messages started appearing after I installed openSUSE 11.1.
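
The none-<NULL> rows can be picked out of the listing mechanically. Here is a small sketch, assuming the /proc/interrupts layout pasted above (irq number, per-CPU counters, then a chip name); `find_unhandled_irqs` is a hypothetical helper name, not an existing tool.

```python
def find_unhandled_irqs(text):
    """Return {irq: total count} for rows whose chip column is 'none-<NULL>'."""
    result = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header and the named rows (NMI, LOC, ERR, ...).
        if not sep or not head.strip().isdigit():
            continue
        fields = rest.split()
        # The leading numeric fields are the per-CPU counters.
        counts = []
        for field in fields:
            if not field.isdigit():
                break
            counts.append(int(field))
        chip = " ".join(fields[len(counts):])
        if chip.startswith("none-<NULL>"):
            result[int(head)] = sum(counts)
    return result


if __name__ == "__main__":
    # A few rows copied from the listing in this thread.
    sample = """\
       CPU0       CPU1
24: 1643393 0 IO-APIC-fasteoi nvidia
28: 0 0 IO-APIC-fasteoi eth0
179: 8715 0 none-<NULL>
187: 2 0 none-<NULL>
219: 511616 0 PCI-MSI-edge ahci
ERR: 0
"""
    print(find_unhandled_irqs(sample))
```

On a live system the text would come from reading /proc/interrupts; other kernel versions may print the chip column differently, so the string test would need adjusting.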

Any ideas on how to fix this, or at least stop the flooding?

Thanks,
Andy

A bit more info on the subject:

I tried pci=noacpi at boot. This stopped the messages, but X wouldn’t start anymore. Rebooting without this parameter brought me back to where I was before.

I also looked at the logs more closely, and it seems that the messages start appearing after ifup eth0. eth0 is the non-functioning Intel Pro 1000 PT board.

If I could somehow point the kernel from irq 179 to irq 28, perhaps both problems would go away.
Or am I completely on the wrong track here…

Presuming you can blacklist the ethernet module…

I found this, though I’m not sure it is still relevant. I notice they say to blacklist a module. I’m guessing wildly, but is the module that page tells you to blacklist perhaps the one causing it?

Network Connectivity - Linux* Base Driver overview and installation

Perhaps it’ll help.

I’m not sure if it’s the same thing you’re saying, but

rmmod e1000e

doesn’t make the messages go away.
So I doubt blacklisting this module will solve it.

Are you using this Intel Pro 1000? If not, why not disable it in the BIOS?

It’s where my ethernet cable is plugged in. I have a 100 Mbit ethernet port onboard; that one is disabled in the BIOS. My entire home network is gigabit. For now I use the onboard WLAN when I run openSUSE.

I sometimes need to boot to Windows, where the Intel pro 1000 works fine…

I could (although not comfortably) live with the fact that I’ll never use the Intel ethernet card with openSUSE; it’s the log flooding that bugs me.

Hmmm, the noapic boot option got the Intel Pro 1000 ethernet card working. I have no idea what the downside of this boot option is; I hope I’ll never find out.

The message flood is still happening. It’s a different irq now (27), but still pointing at an unhandled entry.

cat /proc/interrupts now shows:

      CPU0       CPU1

0: 74 0 XT-PIC-XT timer
1: 1619 0 XT-PIC-XT i8042
2: 0 0 XT-PIC-XT cascade
3: 2 0 XT-PIC-XT
4: 1 0 XT-PIC-XT
7: 5442 0 XT-PIC-XT uhci_hcd:usb2, ehci_hcd:usb5
8: 0 0 XT-PIC-XT rtc0
9: 0 0 XT-PIC-XT acpi
10: 16725 0 XT-PIC-XT nvidia, eth0
11: 740 0 XT-PIC-XT uhci_hcd:usb1, uhci_hcd:usb3, uhci_hcd:usb4, ohci1394, HDA Intel
14: 5864 0 XT-PIC-XT pata_via
15: 0 0 XT-PIC-XT pata_via
27: 1409 0 none-<NULL>
35: 2 0 none-<NULL>
219: 58503 0 PCI-MSI-edge ahci
220: 0 0 PCI-MSI-edge aerdrv
221: 0 0 PCI-MSI-edge aerdrv
222: 0 0 PCI-MSI-edge aerdrv
223: 0 0 PCI-MSI-edge aerdrv
NMI: 0 0 Non-maskable interrupts
LOC: 27148 22093 Local timer interrupts
RES: 1442 2087 Rescheduling interrupts
CAL: 462 554 function call interrupts
TLB: 440 925 TLB shootdowns
TRM: 0 0 Thermal event interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0

Multiple devices on one irq… that must be related to the noapic option.
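
To see at a glance which irq lines are shared in a listing like the one above, a sketch along these lines could be used. It assumes the layout pasted in this thread, where the chip name is a single token (XT-PIC-XT, IO-APIC-fasteoi, ...) and the devices follow as a comma-separated list; `shared_irqs` is an illustrative name only.

```python
def shared_irqs(text):
    """Return {irq: [devices]} for rows that list more than one device."""
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header and the named rows (NMI, LOC, ERR, ...).
        if not sep or not head.strip().isdigit():
            continue
        fields = rest.split()
        # Drop the per-CPU counters at the front of the row.
        while fields and fields[0].isdigit():
            fields.pop(0)
        # fields[0] is now the chip name (assumed to be one token);
        # everything after it is the comma-separated device list.
        device_part = " ".join(fields[1:])
        devices = [d.strip() for d in device_part.split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(head)] = devices
    return shared


if __name__ == "__main__":
    # Rows copied from the noapic listing in this thread.
    sample = """\
      CPU0       CPU1
7: 5442 0 XT-PIC-XT uhci_hcd:usb2, ehci_hcd:usb5
9: 0 0 XT-PIC-XT acpi
10: 16725 0 XT-PIC-XT nvidia, eth0
14: 5864 0 XT-PIC-XT pata_via
27: 1409 0 none-<NULL>
"""
    for irq, devices in sorted(shared_irqs(sample).items()):
        print(irq, devices)
```

Sharing an irq is not an error in itself; the sketch only makes the sharing visible so the two boots can be compared.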

Still, any hints on how to stop the message flood are very much appreciated.

I found this, but I’m not sure the shared IRQs are what is spamming the logs: Gentoo Linux Documentation – Linux hardware stability guide, Part 2

Thanks for the link, it was good reading.
It pointed me to a new thing I could try: reshuffling irqs, but I didn’t have much luck with that. My motherboard won’t let me assign irqs to devices, and it didn’t make any difference to the message flooding.

I did find out that a VGA boot (vga= boot option) is not possible either. As soon as I enter a value that is in the VESA range, I get the messages that flood my logs on screen and the boot stalls.
This would perhaps suggest something is wrong with my graphics card, but it runs my KDE 4.1 desktop nicely with 3D acceleration.

Still no significant progress :(

Does cat /proc/irq/27 give you any hint as to what it is?

One thing I did notice is that nvidia is sharing an irq with eth0; perhaps passing parameters is triggering something.

The problem I ran into when searching for none-<NULL> is that the search keeps stripping out the <>, so it doesn’t bring back an exact hit.

I think you have an irq conflict between nvidia and eth0 (though only when you boot with noapic, i.e. without the Advanced Programmable Interrupt Controller), and a separate issue where some device is spamming the log and the card isn’t working with the APIC enabled.

From a little googling, noapic is generally used with buggy or out-of-spec ACPI firmware, which can cause any amount of wackiness. That points toward updating the BIOS, and together with the Gentoo page it would imply it certainly is the hardware. Perhaps you’ll get better support over here (though not knowing what none-<NULL> means doesn’t help).

Support Community

irq 27 doesn’t really exist:

# l /proc/irq
totaal 0
dr-xr-xr-x  24 root root 0 apr 22 10:19 ./
dr-xr-xr-x 140 root root 0 apr 22  2009 ../
dr-xr-xr-x   2 root root 0 apr 22 10:19 0/
dr-xr-xr-x   3 root root 0 apr 22 10:19 1/
dr-xr-xr-x   4 root root 0 apr 22 10:19 10/
dr-xr-xr-x   6 root root 0 apr 22 10:19 11/
dr-xr-xr-x   2 root root 0 apr 22 10:19 12/
dr-xr-xr-x   2 root root 0 apr 22 10:19 13/
dr-xr-xr-x   3 root root 0 apr 22 10:19 14/
dr-xr-xr-x   3 root root 0 apr 22 10:19 15/
dr-xr-xr-x   2 root root 0 apr 22 10:19 2/
dr-xr-xr-x   2 root root 0 apr 22 10:19 218/
dr-xr-xr-x   3 root root 0 apr 22 10:19 219/
dr-xr-xr-x   3 root root 0 apr 22 10:19 220/
dr-xr-xr-x   3 root root 0 apr 22 10:19 221/
dr-xr-xr-x   3 root root 0 apr 22 10:19 222/
dr-xr-xr-x   3 root root 0 apr 22 10:19 223/
dr-xr-xr-x   2 root root 0 apr 22 10:19 3/
dr-xr-xr-x   3 root root 0 apr 22 10:19 4/
dr-xr-xr-x   2 root root 0 apr 22 10:19 5/
dr-xr-xr-x   2 root root 0 apr 22 10:19 6/
dr-xr-xr-x   4 root root 0 apr 22 10:19 7/
dr-xr-xr-x   3 root root 0 apr 22 10:19 8/
dr-xr-xr-x   3 root root 0 apr 22 10:19 9/
-rw-------   1 root root 0 apr 22 10:19 default_smp_affinity


So that won’t get me any further.
I’m about to pull out the ethernet card to see if the flooding stops and if I’m able to boot in VGA mode.
I’ll post the results later.
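
The comparison between the two listings can also be done mechanically. The sketch below, with the hypothetical helper `irqs_without_proc_dir`, takes the irq numbers seen in /proc/interrupts and the entry names from /proc/irq and reports which irqs have no directory. With the numbers from this thread it returns 27 and 35, the two none-<NULL> rows.

```python
def irqs_without_proc_dir(interrupt_irqs, proc_irq_entries):
    """Return the sorted irq numbers that have no directory under /proc/irq.

    interrupt_irqs   -- irq numbers taken from /proc/interrupts
    proc_irq_entries -- entry names as listed in /proc/irq (strings)
    """
    # Non-numeric entries such as default_smp_affinity are ignored.
    registered = {int(name) for name in proc_irq_entries if name.isdigit()}
    return sorted(set(interrupt_irqs) - registered)


if __name__ == "__main__":
    # Numbers copied from the noapic listings in this thread.
    from_interrupts = [0, 1, 2, 3, 4, 7, 8, 9, 10, 11, 14, 15,
                       27, 35, 219, 220, 221, 222, 223]
    from_proc_irq = ["0", "1", "10", "11", "12", "13", "14", "15", "2",
                     "218", "219", "220", "221", "222", "223", "3", "4",
                     "5", "6", "7", "8", "9", "default_smp_affinity"]
    print(irqs_without_proc_dir(from_interrupts, from_proc_irq))
```

On a running system the second argument could simply be the result of os.listdir("/proc/irq").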

Removing the Intel Pro 1000 PT card from the motherboard didn’t make any difference. The messages keep flooding my logs and a VGA boot still isn’t possible.

Funny detail: suspend to RAM with WinXP gives me a broken network link after wakeup. openSUSE doesn’t have this problem.

My next step will be to try and find out if there are any BIOS upgrades available for this machine. If not, I’m out of options.

Have you done a search on the board and its chipsets?

Do you have anything else in the PCI slots that might be causing it? I think you confirmed it isn’t the card; it does have its own issues, but it isn’t causing the spamming.

It’s a real shame we can’t find out what the error message is about. I’m no coder, but I think at this point it may be a case of looking at the source code to see what triggers the error.

The only other card I have in the PCI slots (besides the ethernet cards) is the nvidia graphics card (PCI Express x16). I have no other graphics card to exchange it with, and checking whether the flood goes away without a graphics card is a bit hard.

I did succeed in finding new firmware for the BIOS. It’s from August 2007, and I’m pretty sure it’s the latest one.
Finding a way to flash it was much harder (it needed a DOS boot), but I managed to flash it in the end.
The new BIOS did get me some new features, but nothing like controlling irqs, and it didn’t change anything in the logs or other behaviour.
I still have 2 unmanaged irqs in the list, which the kernel complains about on average 6 times per minute.

I don’t want to alter the syslogd rules on logging kernel messages, but I’m out of options and ideas :(

I’m only left with bugzilla or kernel lists.

Beyond the spamming you don’t seem to be noticing much, though, so you may get that kind of response, or be told that it’s proprietary software they can’t fix. But with nothing left to try, they should at least have the skills to track it down, hopefully.

Edit
One last thing: how about grepping dmesg, does that shed any light?

Hi
Is there no option for ESCD? If there is, enable it; after a reboot
it will go back to disabled. Also, if there is an option for PnP OS,
try with it disabled.


Cheers Malcolm °¿° (Linux Counter #276890)
openSUSE 11.1 (i586) Kernel 2.6.27.21-0.1-pae
up 19:18, 1 user, load average: 0.32, 0.41, 0.36
ASUS eeePC 1000HE ATOM N280 1.66GHz | GPU Mobile 945GM/GMS/GME

I do recognize both options from other boards, but this BIOS doesn’t have them. It’s a Phoenix Award BIOS, and quite limited.

Hi
I would then make notes of any customized BIOS settings,
then reset your BIOS and leave the battery out for a few minutes.
Have you checked that the battery voltage is OK?

Then go in and reconfigure as required.


Cheers Malcolm °¿° (Linux Counter #276890)
openSUSE 11.1 x86_64 Kernel 2.6.27.21-0.1-default
up 11 days 18:24, 2 users, load average: 0.25, 0.17, 0.15
GPU GeForce 8600 GTS Silent - Driver Version: 180.44

I just flashed the BIOS a few days ago. Afterwards all settings were back to factory defaults. Don’t you think flashing a BIOS would have the same effect as removing the battery for a while?
As for a weak battery, I’d expect the first signs to be a slow clock or a loss of settings. Nothing like that happened.

I am at a stage where I’ll try just about anything, so I’ll consider removing the battery.

I give up!

It might not be the bravest thing to do, but there’s just too little information available on this problem anywhere. It might be too rare and I’m just the lucky guy who got this message flood, but my guess is that it’s more of a loose end, like:

“Hmmm, interrupt went off and no device or module has subscribed to this irq. Let’s log it and ignore it.”

If anyone wants to try Kerneltrap or similar: go for it, but expect replies like “It’s not the kernel’s fault, you have a crappy BIOS”

I got the messages out of /var/log/messages by adjusting the /etc/syslog-ng/syslog-ng.conf file. If you want the same, you can try this:

In the Filter definition section of /etc/syslog-ng/syslog-ng.conf add:



filter f_irqtrap_a  { facility(kern) and match("unexpected IRQ trap"); };
filter f_irqtrap_b  { facility(kern) and match("irq") and match("desc") and match("depth"); };
filter f_irqtrap_c  { facility(kern) and match("->handle_irq"); };
filter f_irqtrap_d  { facility(kern) and match("->chip()") and match("no_irq_chip"); };
filter f_irqtrap_e  { facility(kern) and match("->action()"); };
filter f_irqtrap_f  { facility(kern) and match("IRQ_DISABLED set"); };


…and modify the line

filter f_messages   { not facility(news, mail) and not filter(f_iptables); };


to

filter f_messages   { not facility(news, mail) and not filter(f_iptables) and not filter(f_irqtrap_a) and not filter(f_irqtrap_b) and not filter(f_irqtrap_c) and not filter(f_irqtrap_d) and not filter(f_irqtrap_e) and not filter(f_irqtrap_f); };

It probably could be done in a shorter way, but this works… Be aware that this doesn’t solve the problem, it just hides the symptoms.
The flooding has stopped, but only in /var/log/messages. dmesg is still flooded, so that’s not usable anymore.
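
Before editing the syslog-ng configuration, the patterns can be tried against sample log lines. The following Python sketch mirrors the six filters above with plain substring tests. This is only an approximation: syslog-ng's match() uses regular expressions, the facility(kern) check is not modelled here, and the name `is_irq_trap_noise` is invented for this example.

```python
def is_irq_trap_noise(message):
    """Return True if any of the six f_irqtrap_* patterns would match."""
    checks = (
        # f_irqtrap_a
        "unexpected IRQ trap" in message,
        # f_irqtrap_b: all three words must be present
        all(word in message for word in ("irq", "desc", "depth")),
        # f_irqtrap_c
        "->handle_irq" in message,
        # f_irqtrap_d: both parts must be present
        "->chip()" in message and "no_irq_chip" in message,
        # f_irqtrap_e
        "->action()" in message,
        # f_irqtrap_f
        "IRQ_DISABLED set" in message,
    )
    return any(checks)


if __name__ == "__main__":
    # The six noise lines from this thread plus one made-up normal line.
    sample = [
        "irq 179, desc: c052a700, depth: 1, count: 0, unhandled: 0",
        "->handle_irq(): c0160b24, handle_bad_irq+0x0/0x1e8",
        "->chip(): c050991c, no_irq_chip+0x0/0x40",
        "->action(): 00000000",
        "IRQ_DISABLED set",
        "unexpected IRQ trap at vector b3",
        "usb 1-1: new high speed USB device",  # hypothetical unrelated message
    ]
    for line in sample:
        print("drop" if is_irq_trap_noise(line) else "keep", "|", line)
```

Note that the ->action() pattern is broad: it would also hide that line if it ever appeared in an unrelated kernel report.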

I’ll keep checking whether someone more patient than me has found a proper solution to this problem.

Thanks to all that tried to help.

Andy