Hi,
I recently installed OpenSUSE 11.1 and my computer always freezes after it has been running for ~24h. I cannot reboot immediately after a freeze - the boot process stops with a kernel panic (see below for full error message). After waiting for ~1h I can boot again normally.
I thought this was a heat-related hardware problem, so I tried out the tools StressCPU 2.0 and Stress 1.0 to see whether I have overheating problems - but both tools ran fine for hours (even when I set all fans to “silent” instead of “standard”).
I also tried out Memtest86+ and SeaTools to see whether there are problems with my RAM / hard drive, again nothing…
This is the full error message I get when I reboot immediately after OpenSUSE froze (I replaced some hexadecimal parts of the message with “…” because it was quite long):
you can try to troubleshoot why it won’t boot smoothly until you wait
an hour, if you wish…
personally, i think it would be better to figure out why it freezes in
the first place, fix that, and avoid having to reboot…
since the most frequent cause of a kernel panic is defective or
incompatible RAM, i’d guess you do have a memory problem and that it is
the cause of both the freeze AND the reboot symptoms…
how long did you run Memtest86+??
since your freeze occurs about a day after boot, let it run 24 hours
AT LEAST…
As deConficter suggested, you definitely have a memory-related problem. It may not mean that you have to replace your memory chips. It could be something in the address-decoding circuitry. If that is the problem, it can not get solved by just replacing memory chips. The problem may not recur until one of your programs writes to some specific area of the memory.
As suggested by deConficter, you have to run the memory test for longer periods.
Typically, memory testing programs write a pattern to a certain area of the memory and then read it back to compare. However, this method cannot detect problems with the address-decoding circuitry. The written content will really go to some other area of the memory (due to wrongly decoded memory location) and when it is read back, the program will get the correct pattern back. Some memory testing programs do a test called “walk-through” test to find such errors.
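To make the point above concrete, here is a small Python simulation. It is only a sketch under stated assumptions, not how Memtest86+ or any real tester is implemented: the class SimulatedMemory, the stuck_low_bit fault model (one address line always read as 0) and both test functions are made up for illustration.

```python
class SimulatedMemory:
    """Toy memory model. Hypothetical: one address line may be stuck at 0."""

    def __init__(self, address_bits, stuck_low_bit=None):
        self.size = 1 << address_bits
        self.cells = [0] * self.size
        self.stuck_low_bit = stuck_low_bit  # index of the faulty address line

    def _decode(self, address):
        # A healthy decoder maps every address to its own cell.
        if self.stuck_low_bit is None:
            return address
        # Faulty decoder: the stuck line is always seen as 0, so two
        # different addresses end up selecting the same physical cell.
        return address & ~(1 << self.stuck_low_bit)

    def write(self, address, value):
        self.cells[self._decode(address)] = value

    def read(self, address):
        return self.cells[self._decode(address)]


def pattern_test(memory, pattern=0xA5):
    """Write a fixed pattern to each address and read it straight back."""
    for address in range(memory.size):
        memory.write(address, pattern)
        if memory.read(address) != pattern:
            return False
    return True


def address_test(memory):
    """Store each address as the cell's value, then verify in a second pass."""
    for address in range(memory.size):
        memory.write(address, address)
    return all(memory.read(a) == a for a in range(memory.size))


good = SimulatedMemory(address_bits=8)
bad = SimulatedMemory(address_bits=8, stuck_low_bit=3)

print("pattern test:", pattern_test(good), pattern_test(bad))
print("address test:", address_test(good), address_test(bad))
```

In this model the simple pattern test passes on both memories, because the wrongly decoded write and the wrongly decoded read hit the same cell. The address test fails on the faulty one, because two addresses share a cell and the second write overwrites the first.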
I ran Memtest86+ for ~5 hours (three complete passes). I'll let it run again for at least a day and see what happens.
@syampillai:
“It may not mean that you have to replace your memory chips. It could be something in the address-decoding circuitry. If that is the problem, it can not get solved by just replacing memory chips.”
Is there a way to find out whether the problem is not with the memory but with the address-decoding circuitry? And what would I have to do if I know that this is the problem - would this mean that my mainboard is defective?
@syampillai:
“The written content will really go to some other area of the memory (due to wrongly decoded memory location) and when it is read back, the program will get the correct pattern back. Some memory testing programs do a test called “walk-through” test to find such errors.”
Thanks for the help. Do you know a free tool that can do such a test?