Kernel panic

Hi,
I recently installed OpenSUSE 11.1, and my computer always freezes after it has been running for ~24h. I cannot reboot immediately after it freezes - the boot process stops with a kernel panic (see below for the full error message). After waiting for ~1h I can boot again normally.
I thought this was a heat-related hardware problem, so I tried the tools StressCPU 2.0 and Stress 1.0 to see whether I have overheating problems - but both tools ran fine for hours (even when I set all fans to “silent” instead of “standard”).
I also tried Memtest86+ and SeaTools to see whether there are problems with my RAM / hard drive; again, nothing…

This is the full error message I get when I reboot immediately after OpenSUSE has frozen (I replaced some hexadecimal parts of the message with “…” because it was quite long):

Console: switching to colour frame buffer device 156x60
fb0: VESA VGA frame buffer device
general protection fault: 0000 [1] SMP
last sysfs file:
CPU 3
Modules linked in:
Supported: Yes
Pid: 1, comm: swapper Not tainted 2.6.27.21-0.1-default #1
RIP: 0010:[<ffffffff802c3e78>] [<ffffffff802c3e78>] find_inode+0x47/0x6c
RSP: 0018:ffff88022f12dc20 EFLAGS: 00010206
RAX: 00000000000000eba RBX: 04000000000000000 RCX:
RDX: …
RBP: …
R10: …
R13: …
FS: …
CS: …
CR2: …
DR0: …
DR3: …
Process swapper (pid: 1, threadinfo ffff88022f12c000, task ffff88022f12a040)
Stack: …
Call Trace:
… ifind+0x34/0x8a
… sysfs_addrm_start+0x3f/0x97
… sysfs_add_file_mode+0x43/0x7f
… device_add+0x143/0x472
… device_create_vargs+0x9a/0xc6
… device_create+0x2c/0x34
… tty_register_device+0xd3/0xde
… tty_register_driver+0x1eb/0x209
… vty_init+0xed/0x10d
… tty_init+0x1d2/0x1d6
… _stext+0x41/0x110
… kernel_init+0x9b/0xea
… child_rip+0xa/0x11

Code: 1e eb 23 4c 89 e6 48 89 df 41 ff d5 85 c0 74 13 f6 830802 00 00 70 74 28 48 89 df e8 61 fe ff ff eb db 48 8b 1b 48 85 db 74 14 <4c> 39 bb f8 00 00 00 48 8b 03 48 89 dd 0f 18 08 75 e6 eb eb c4 31
RIP [<ffffffff802c3e78>] find_inode+0x47/0x6c
RSP <ffff88022f12dc20>
---[ end trace 63e114fe32182d78 ]---
Kernel panic - not syncing: Attempted to kill init!

I am really confused by this problem; any help would be greatly appreciated.

you can try to troubleshoot why it won’t boot smoothly until you wait
an hour, if you wish…

personally, i think it would be better to figure out why it freezes in
the first place, fix that, and avoid having to reboot…

since the most frequent cause of a kernel panic is defective or
incompatible RAM, i’d guess you do have a memory problem and that it is the
cause of both the freeze AND the reboot symptoms…

how long did you run Memtest86+??
since your freeze occurs about a day after boot, let it run 24 hours
AT LEAST…


deConficter

As deConficter suggested, you most likely have a memory-related problem. That does not necessarily mean that you have to replace your memory chips. It could be something in the address-decoding circuitry. If that is the problem, it cannot be solved by just replacing memory chips. The problem may not recur until one of your programs writes to some specific area of the memory.

As suggested by deConficter, you have to run the memory test for longer periods.

Typically, memory testing programs write some pattern to a certain area of the memory and then read it back to compare. However, that method cannot detect problems with the address-decoding circuitry. The written content will actually go to some other area of the memory (because the address is decoded wrongly), and when it is read back, the same wrong location is read, so the program still gets the correct pattern back. Some memory testing programs do a test called a “walk-through” test to find such errors.
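To make the point above concrete, here is a small Python simulation. It is only a sketch: the `FaultyMemory` class, the stuck address bit and the two test functions are hypothetical illustrations, not how any particular tool is implemented. It models a RAM in which one address line is stuck at 0, so that pairs of addresses end up on the same cell.

```python
class FaultyMemory:
    """Simulated RAM. If stuck_bit is given, that address line is stuck at 0
    (a hypothetical address-decoding fault), so two addresses share one cell."""

    def __init__(self, size, stuck_bit=None):
        self.cells = [0] * size
        # -1 keeps every address bit; otherwise the stuck bit is always cleared.
        self.mask = -1 if stuck_bit is None else ~(1 << stuck_bit)

    def _decode(self, addr):
        return addr & self.mask

    def write(self, addr, value):
        self.cells[self._decode(addr)] = value

    def read(self, addr):
        return self.cells[self._decode(addr)]


def naive_pattern_test(mem, size, pattern=0xA5):
    """Write a pattern to each address and read it straight back.
    Returns the list of addresses where the comparison failed."""
    bad = []
    for addr in range(size):
        mem.write(addr, pattern)
        if mem.read(addr) != pattern:
            bad.append(addr)
    return bad


def own_address_test(mem, size):
    """Write every address with its own address as data, then verify all
    addresses in a second pass. Aliased addresses overwrite each other,
    so the second pass sees a mismatch."""
    for addr in range(size):
        mem.write(addr, addr)
    return [addr for addr in range(size) if mem.read(addr) != addr]


if __name__ == "__main__":
    size = 16
    faulty = FaultyMemory(size, stuck_bit=2)
    print("naive test failures:      ", naive_pattern_test(faulty, size))
    print("own-address test failures:", own_address_test(faulty, size))
```

In this simulation the naive test reports no failures on the faulty memory, because each write and its read-back are misdirected to the same wrong cell. The own-address test reports failures, because every cell is supposed to hold a different value and the aliased addresses have overwritten each other.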

I ran Memtest86+ for ~5 hours (three complete passes). I'll let it run again for at least a day and see what happens.

@syampillai:
“It could be something in the address-decoding circuitry. If that is the problem, it cannot be solved by just replacing memory chips.”

Is there a way to find out whether the problem is with the address-decoding circuitry rather than with the memory itself? And what would I have to do if that turns out to be the problem - would it mean that my mainboard is defective?

@syampillai:
“The written content will really go to some other area of the memory (due to wrongly decoded memory location) and when it is read back, the program will get the correct pattern back. Some memory testing programs do a test called “walk-through” test to find such errors.”
Thanks for the help. Do you know of a free tool that can do such a test?

The Memtest86+ tool is very good for this purpose. Have you gone through its documentation and run all the tests? It has 10 different tests.