openSUSE 11.1, intermittent hangs.

I’m running openSUSE 11.1, kernel 2.6.27.21-0.1-pae, and I’m getting intermittent hangs. These hangs seem to be getting more frequent, which made me think hardware. But I’ve run memory tests for hours and not had any failures. Also, the machine runs reliably under Windows XP.

I can avoid the hangs by booting with the “acpi=off” option, but one of the nvidia modules won’t load, which means no KDM desktop.

Can anyone give me some tips on tracking down the cause of the hangs?

Motherboard is GA-MA69G-S3H, CPU is AMD Athlon™ 64 X2 Dual Core Processor 4800+, BIOS settings are conservative/auto.

I’ve tried upgrading to OpenSUSE 11.3, 64 bit, but that kernel (2.6.34) refuses to boot (general protection fault, followed by a kernel panic, “tried to kill init”). With failsafe kernel options I managed to get that to boot, and installation started on an empty partition, but it crashes again as soon as it tries to kexec in the installed kernel.

Cheers,

Matt

Did you try 11.2?

> I’ve tried upgrading to OpenSUSE 11.3, 64 bit, but that kernel (2.6.34) refuses to boot (general protection fault, followed by a kernel panic, “tried to kill init”). With failsafe kernel options I managed to get that to boot, and installation started on an empty partition, but it crashes again as soon as it tries to kexec in the installed kernel.
This sounds crazy to me when your PC works OK with openSUSE version 11.1.

Two things I would ask: when was the last time you cleaned the dust and dirt out of the PC case? And have you tried booting this PC from a LiveCD yet?

Any computer, older than a year, should be cleaned out. I include the reseating of all cards, memory modules and plugs. I normally purchase a couple of cans of duster spray to blow out the CPU heatsink, Video card heatsink, Power Supply, CDROM/DVDROM/FLOPPY drives and all case fans. I normally disconnect all cables and take the PC outside where there is lots of light to see inside and the dust you blow out stays outside.

As for the LiveCD, if it does not work, then installing openSUSE is not likely to work any better. The exception is video, where you may need to add the kernel load option “nomodeset”. Before any kernel is loaded, the boot menu lets you type in kernel options, and nomodeset can help with certain types of video hardware on openSUSE 11.3. If openSUSE 11.2 works and 11.3 does not, nomodeset is very likely what 11.3 needs.
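Once a system does boot, it can be handy to confirm which options the running kernel was actually started with. Here is a minimal Python sketch, assuming a Linux system that exposes the boot command line in /proc/cmdline; the function names are made up for illustration:

```python
def has_kernel_option(cmdline: str, option: str) -> bool:
    """Return True if `option` appears as a whole token on the kernel command line."""
    # Options are separated by whitespace; compare whole tokens so that
    # e.g. "xnomodeset" does not count as "nomodeset".
    return option in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

Calling `has_kernel_option(read_cmdline(), "nomodeset")` (or passing "acpi=off") then tells you whether the option you typed at the boot prompt really took effect.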

Thank You,

Guys, thanks for your advice.

I don’t think it’s overheating or dust. I clean out my PC from time to time, and I did so recently, including blowing out the DIMM slots, etc. I also reseated my CPU heatsink, with no luck. I have also tried 11.2, and I think that refused to boot too, but I don’t remember for sure.

I looked out the 11.2 DVD, and was going to try it again, but just tonight my PC has started locking up even with acpi=off. So I’m really inclined to think I have some bizarre HW failure that Linux tickles, but not Windows, and I think I’ll order a new motherboard. Not yet sure if I’ll get the same one again, or try my luck with something else.

The live CD boot is a good idea - it would rule out anything to do with my disks. I’ll give that a go too.

-M

On 11/27/2010 04:36 PM, mattt-sgi wrote:
> I looked out the 11.2 DVD, and was going to try it again, but just
> tonight my PC has started locking up even with acpi=off. So I’m really
> inclined to think I have some bizarre HW failure that Linux tickles, but
> not Windows, and I think I’ll order a new motherboard. Not yet sure if
> I’ll get the same one again, or try my luck with something else.

I don’t see any indication that you have run memtest86+. Linux uses all of
memory, whereas Windows tends to use low memory and not touch the high stuff.
You need to run a longish test - overnight is good. The test can be found on
any of the installation media.

From post #1: “…I’ve run memory tests for hours and not had any failures.”

That was about 4 passes through memtest86+, with no errors.

However, I’ve given the memtest another go, and now it is showing errors, and quite a lot. So, looks like this is definitely hardware. Now I need to work out if it is a DIMM or something else. New motherboard is on order, so I’ll try that first.

Thanks for your help.

A suggestion I got from ASUS some years ago:

Hardware is seldom really broken. Many times disassembly, cleaning of contact points, reseating modules is enough.
I’m still glad about that. On numerous occasions, even where RAM reported corruption, cleaning and reseating brought things back to normal, sometimes for years.

For anyone actually reading this thread in the future: Situations where a
software failure causes Linux to freeze and/or hang are rare. The kernel will
either recover with appropriate messages in the logs, or it will panic if no
recovery is possible. You will know this situation by the Caps Lock and Scroll
Lock lights flashing at 1 Hz. A memory failure may generate the above; however, it is
just as likely to cause a situation where the CPU gets code that causes it to lock.

No, the fact that Windows works just fine does not mean your memory is OK. For
systems with a lot of memory, Windows wastes it and never uses it. Linux uses
all of memory. If it is not needed for code or user data, then it is used as
disk cache. The difference in the way memory is used is one of the reasons that
a box that can barely run XP is very snappy with Linux, particularly if a
light-weight desktop is used.
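
To see how much RAM Linux is currently using as disk cache, you can read /proc/meminfo. The following is only a rough Python sketch: the helper names are invented here, and the SAMPLE numbers are made up for illustration, not taken from the machine in this thread.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def cache_share(info: dict) -> float:
    """Fraction of total RAM currently used as page cache plus buffers."""
    return (info.get("Cached", 0) + info.get("Buffers", 0)) / info["MemTotal"]


# Illustrative values only (hypothetical 2 GB machine).
SAMPLE = """\
MemTotal:        2059516 kB
MemFree:          102976 kB
Buffers:          205952 kB
Cached:           823806 kB
"""

print(f"cache share: {cache_share(parse_meminfo(SAMPLE)):.0%}")
```

On a real box, pass in `open("/proc/meminfo").read()` instead of SAMPLE. A low MemFree alongside a large Cached value is normal and is exactly the behaviour described above.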

The thing to take away is that if you are getting unexplained crashes with no
trace in the dmesg output, run memtest86+. A few passes are not sufficient! I
always run it at least 12 hours for 1-2 GB of RAM.
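
As a quick first check for "no trace in the dmesg output", something like the Python sketch below can filter saved log text for obvious kernel trouble. The pattern list is an assumption and far from exhaustive; it only covers a few well-known kernel message strings, and the function name is made up.

```python
import re

# A few strings the kernel is known to print on serious errors.
# This list is an illustrative assumption, not a complete catalogue.
SUSPICIOUS = re.compile(
    r"Oops|Kernel panic|general protection fault|Machine Check Exception|\bBUG:",
    re.IGNORECASE,
)


def suspicious_lines(log_text: str) -> list:
    """Return the lines of a kernel log that match a known error pattern."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]
```

Feed it the text of `dmesg` or /var/log/messages saved after a hang. If it comes back empty and the box still locks up, that points away from software and toward a long memtest86+ run.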