openSUSE 11 freezes shortly after logon in a console

I am experiencing a weird problem with my system: everything works as expected, but if I switch to another console by pressing Ctrl+Alt+F1, the system freezes shortly after I log in and enter a command or two.

This is a “deep freeze” situation, switching back to the original screen by pressing Ctrl+Alt+F7 does not work, my keypresses are not echoed on the screen, and I cannot switch to another one (ex: Ctrl+Alt+F2). The only thing left to do is power off/on [since there is no restart button].

The funniest thing is that it happened while I was explaining to someone that “Linux is so high-tech, that even if something goes wrong, you can log in again in a different session and do some wizardry by typing certain commands. Regardless of what happens, the system is still usable in command line mode”.

It could be a number of things, but seeing you are switching video mode (console to and from GUI) I’m suspecting it has to do with your video drivers.
Is this something you can ‘reproduce’? That would make troubleshooting much easier.

Some extra information that would be of use:
What version of openSUSE are you using & which video card?
Also is this KDE 4.x by any chance?

And yes, Linux is also not Unbreakable… as Oracle would like to have us think :wink: Mainly with all this video craziness (3D) going on and some drivers out there not being as optimal as they could be…

Cheers,
Wj

openSUSE 11, not using a custom-built kernel or anything of that sort, consider this a typical default installation.

The video card is Intel GMA900, using the driver that was installed by default. The graphical environment is Gnome.

I was thinking about making a list of items for a test matrix, perhaps I could spot some patterns.

So far I have the following criteria (which I believe have something to do with this):

  • Compiz, on/off
  • Battery vs AC adapter
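A minimal sketch of how such a test matrix could be enumerated, assuming only the two criteria listed so far. The function name test_matrix and the "freeze?" column are made up for illustration; further criteria would simply become additional nested loops.

```shell
#!/bin/sh
# Print every combination of the test criteria, one row per test run.
# The "freeze?" column is left blank, to be filled in by hand after
# each run.
test_matrix() {
    printf '%-8s %-8s %s\n' "compiz" "power" "freeze?"
    for compiz in on off; do
        for power in battery ac; do
            printf '%-8s %-8s %s\n' "$compiz" "$power" "____"
        done
    done
}

test_matrix
```

With two criteria of two values each this gives four runs, which is still small enough to go through by hand.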

Which other things are relevant to this case?

Re “unbreakableness”, of course - it is an OS and it can fail, but what I am trying to achieve is to bring it to a state which is known to be rock solid and then freeze everything (i.e. no updates, no new software, etc). In such circumstances the system is very unlikely to throw any kind of surprises. I have successfully implemented such a scenario on my Linux desktop and on my Windows laptop. This is my first experience with Linux on a laptop so things are not yet as smooth as they could be.

Laptops can sure be a challenge, even doing a fresh Windows install (as opposed to the oem’s version). Unlike desktops, a lot is proprietary, strangeness in the bios, temperamental displays . . . ugh. Anyway, fwiw I think you’re going down the two most likely tracks, either power management or the graphics/display. With the former, try disabling all the power features in the bios as well as the OS; boot with acpi, apm, apic, and lapic all disabled. On the graphics side, take a look at /var/log/Xorg.0.log to see if X is throwing any errors. Of course, disable Xgl if using that, and I seem to recall that AIGLX is now enabled by default in X and that it is called for compositing with the Intel driver if Hardware Acceleration is enabled. Intel provides the source for the GMA chipsets driver, but there have been issues with dri and glx with X. You might experiment with a different driver set; you can get newer ones in the Xorg repository (but be sure you get the matching dri and glx drivers which AFAIK are in the Mesa package). Again, just fwiw . . .

mingus725 wrote:

>
> Laptops can sure be a challenge, even doing a fresh Windows install (as
> opposed to the oem’s version). Unlike desktops, a lot is proprietary,
> strangeness in the bios, temperamental displays . . . ugh. Anyway,
> fwiw I think you’re going down the two most likely tracks, either power
> management or the graphics/display. With the former, try disabling all
> the power features in the bios as well as the OS; boot with acpi, apm,
> apic, and lapic all disabled. On the graphics side, take a look at
> /var/log/Xorg.0.log to see if X is throwing any errors. Of course,
> disable Xgl if using that, and I seem to recall that AIGLX is now
> enabled by default in X and that it is called for compositing with the
> Intel driver if Hardware Acceleration is enabled. Intel provides the
> source for the GMA chipsets driver, but there have been issues with dri
> and glx with X. You might experiment with a different driver set; you
> can get newer ones in the Xorg repository (but be sure you get the
> matching dri and glx drivers which AFAIK are in the Mesa package).
> Again, just fwiw . . .
>
>
if you bring the laptop up in ‘runlevel 3’, text-mode, no gui… does it
exhibit the same “deep freeze” ?

Boot into this mode by entering a ‘3’ on the options line at the grub screen
when you choose ‘normal’ or ‘failsafe’ boot.

The command (as root) ‘reboot’ will restart the system properly when you’re
done.

You could also test if ‘single-user’ (maintenance) mode works as well.
Enter ‘1’ on the options line in grub during boot. (or ‘S’)

This would help determine if the video driver, used for gui-mode (runlevel
5) is involved in this freezing.

Also, is there any disk activity? Hard drive light flickering? If you
leave it alone, does it come back to life after some time has passed?

Loni

L R Nix
lornix@lornix.com

There is no HDD activity, and if the system is left alone (waited for ~5 min) nothing happens.

I also discovered another instance of this problem - the system freezes when I start Gnome Terminal. As in the previous case, I manage to type a few characters, or even execute a command if it is short - but that’s all.

If I restart in ‘fail safe’ mode, such a freeze does not occur.

It does not occur if I connect to this computer remotely via SSH.

I am sure this is video-related: I recently installed updates (I was notified about several important updates and pressed ‘install now’ without reviewing the list), and since that update all video playback shows a blue rectangle instead of the actual movie (the sound works though).

So, the prime suspect is the Intel graphics driver. Is it enough to simply disable hardware acceleration? Or is there a possibility to install another driver for the graphics card?

hmmm… you could try downgrading the xorg-x11-driver-video package back to 7.3-138.2, which is the original one from the install media.
This would ‘cure’ slow response, but I haven’t read of this fixing lockups… we’ll know if you give it a try.
After downgrading reboot or reload X to make the change take effect.

Also see this thread: OpenSuSE 11 and Xorg update - openSUSE Forums

Which kernel version are you running btw? What’s the output of ‘uname -a’?

The 2 primary differences in ‘fail safe’ mode are that it turns off power mgmt and adds “x11failsafe”, which is a new kernel parameter I hadn’t seen before and could not find any useful info about. If you haven’t already pinned it down to power vs the video, try booting in failsafe with x11failsafe taken out, and then again leaving it in but taking out the acpi and apm arguments. (You don’t have to change menu.lst to do this; hit escape at the grub boot menu and you’ll drop into a text menu, highlight a selection and type “e” and you’ll be able to edit the stanza on-the-fly - the commands are self-explanatory.) That may pin the problem down at least categorically.
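After editing the stanza on-the-fly it can help to confirm which parameters the kernel really booted with. A rough sketch, assuming a Linux system that exposes /proc/cmdline; the helper name has_boot_param is made up for illustration, and the three parameters checked are just the ones discussed in this thread.

```shell
#!/bin/sh
# Report whether a given parameter appears on a kernel command line.
# Usage: has_boot_param PARAM [CMDLINE_FILE]
# CMDLINE_FILE defaults to /proc/cmdline, which holds the parameters the
# running kernel was booted with.
has_boot_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Split the single line into one token per line, then match the
    # whole token exactly, so "acpi=off" does not match "noacpi=off".
    tr ' ' '\n' < "$file" | grep -Fqx -- "$param"
}

for p in acpi=off apm=off x11failsafe; do
    if has_boot_param "$p"; then
        echo "$p: present"
    else
        echo "$p: absent"
    fi
done
```

Note that this only shows what was passed explicitly at boot; anything left at its built-in default will not appear there.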

I looked back over a few threads where I helped with a problem similar to what you describe. Perhaps this will be of help:

In one case, we ultimately traced the problem to KDE. It was resolved by updating to what I think is the current 4.1 in the 11.0 Factory repository (NOT the stable or unstable). We never figured out exactly what was wrong, but kwin and plasma were the suspects. In any event, that fixed it.

In another instance - and this is a bit complicated, so bear with me - we looked at /var/log/Xorg.0.log and found that X was throwing AIGLX errors plus dri and glx errors, including files not found. In 11.0, AIGLX is enabled by default in X if hardware acceleration is enabled. I suggested stripping out all refs to AIGLX and compositing in xorg.conf and, if sax could be run, making sure that acceleration was disabled.

Secondly, there was clearly a mismatch between the Intel driver being used (installed with either a kernel and/or X update) and the dri and glx video drivers; this version of the Intel driver was calling files which are only installed in a newer version of X along with the matching Mesa package which provides the dri and glx drivers. The solution suggested was to either downgrade the kernel and X to the version installed with 11.0, or to add the X repository and upgrade X and the Mesa package. Unfortunately, I don’t know which, if either, of these solutions worked because the user did not report back (either it got fixed and he went his merry way, or he, well, you know . . . ).

So . . . if you haven’t yet, take a look at the /var/log/Xorg.0.log file as well as at .xsession-errors in your home directory; perhaps you’re experiencing an AIGLX and/or Intel driver version problem as above. If you see these errors, consider stripping out AIGLX, reverting to the previous kernel & X, or upgrading X & Mesa.

Hope this makes some sense - don’t hesitate to ask if it doesn’t - and helps.

I have downgraded the video card driver, as suggested by Magic31 (precisely that version). This seems to do the trick. Not only does Gnome Terminal work, but I also have no freezes when switching to another screen with Ctrl+Alt+F1. Also, there was another problem - all video playback would result in a blue window instead of the actual movie - this one is now solved too.

I guess I will just disable automatic updates, and apply them manually when I become aware of critical issues. My general conclusion (and advice for others) is to always read the changelogs of the updates, especially when it comes to drivers or other low-level components.

There are two general rules of system administration:

  1. always install the latest updates and patches
  2. if it is not broken, don’t fix it

For now, I’ll stick to #2 :slight_smile:

Thank you for your support, I greatly appreciate it.

These are the details requested earlier, in case someone else stumbles upon the same problem.

This is the output of uname:

Linux pazuzu 2.6.25.11-0.1-pae #1 SMP 2008-07-13 20:48:28 +0200 i686 i686 i386 GNU/Linux

Xorg.0.log has several error entries; one of them appears only once, the other appears 8 times:

[1] vm86() syscall generated signal 11.
[2] (EE) intel(0): Mode 1280x1024 does not fit virtual size 1024x1024 - internal error
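For anyone wanting to pull such entries out of their own log, here is a small sketch. It relies on the X server tagging error lines with "(EE)"; the function name xorg_error_summary is made up, and the default path is simply the log file mentioned earlier in the thread.

```shell
#!/bin/sh
# Summarise the error lines of an X server log.
# Usage: xorg_error_summary [LOGFILE]
# Lines tagged "(EE)" are errors; identical messages are collapsed and
# prefixed with the number of times they occur, most frequent first.
xorg_error_summary() {
    log=${1:-/var/log/Xorg.0.log}
    grep -F '(EE)' "$log" | sort | uniq -c | sort -rn
}

# Only run against the default log if it is actually readable here.
if [ -r /var/log/Xorg.0.log ]; then
    xorg_error_summary
fi
```

Messages without the "(EE)" tag, such as the vm86() line above, would not be caught by this and still need a look through the log by eye.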

Ah… my victory celebrations were premature - I now discovered that I cannot turn off the computer properly. It freezes when I try to reboot it or shut it down.

  • I tried booting it in fail safe mode - everything worked.
  • I then booted normally, disabling Compiz - no effect.
  • I rebooted to runlevel 3 - same thing
  • I rebooted to runlevel 3 with acpi=off
    and now the reboot and shutdown procedures are working as expected.

Of course, I don’t have the battery charge applet, which is a serious problem.

Now that I know that acpi is the culprit, where to dig?

Notes

  • I am not sure this is only a reboot/shutdown problem. I noticed that it freezes only when I do this, right after I enter the root password to confirm that I have the right to shutdown the machine. I then tried running other commands, waiting a bit - they worked.

  • I am sure this is a freeze, because I do not see “sending all processes the TERM signal” or “the system is about to be HALTED”, etc. If I am in runlevel 3 it freezes right after I enter the root password; and in runlevel 5 it freezes after I click the respective button (then the keyboard is inactive, the mouse does not move, etc)

Take a look in the bios to see what power management options are there. I have read of conflicts between the bios acpi and OS, and even IIRC being able to turn off the bios setting, leave it on in the OS and the kernel makes it work (sorta like the kernel taking over the IRQ’s). Worth checking into maybe.

The sad news is that there are no power related settings in the BIOS, nothing at all.

I have read somewhere that APM and ACPI are doing the same thing, but ACPI is newer and more development effort is dedicated to it. I also found out that these are mutually exclusive, i.e. you can’t have both of them enabled at the same time.

My next idea is to try acpi=off and apm=on. I will let you know how things go; until then, please let me know if you have other troubleshooting tips.

Usually if there is acpi in the bios, there simply is no apm. The former superseded the latter. The OS interacts with the bios to control the hardware, and that’s where things break. There have been problems with both the standard and with the bios implementations. Take a look at the Criticisms section in the Wikipedia article - what is especially nice was Gates’s effort to sabotage acpi for linux.

Usually if there is acpi in the bios, there simply is no apm

There are no power-related settings there, not ACPI, nor APM.

After booting with “acpi=off apm=on” the system did not freeze, but the battery applet did not work, and features such as ‘sleep’ were not available. In other words, the system behaved as if the setting was “acpi=off”

  1. How can I find out if the system supports APM?
  2. How to verify whether it is enabled or not?

Take a look at the Criticisms section in the Wikipedia article - what is especially nice was Gate’s effort to sabotage acpi for linux.

Indeed, that paragraph is not very encouraging.

My next idea is to try to downgrade the kernel to the version that was there immediately after the installation.

Which other components are worth downgrading in the context of this experiment?

On Fri, 15 Aug 2008 07:16:03 GMT
ralienpp <ralienpp@no-mx.forums.opensuse.org> wrote:

>
> - How can I find out if the system supports APM?
> - How to verify whether it is enabled or not?

To see if APM or ACPI is supported by your motherboard:

sudo dmidecode | egrep -i "acpi|apm"

Mine returns:
APM is supported
ACPI is supported

To browse all of the (amazing) info that dmidecode outputs:

sudo dmidecode | less

(press ‘q’ to exit less when you’re done)
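dmidecode reports what the firmware claims to support; whether the running kernel has an interface active is a separate question. A rough sketch for that second part, on the assumption that /proc/apm is present when the APM driver is active and /proc/acpi when ACPI is active (as I'd expect on 2.6-series kernels, but please verify on your own system). The function name pm_interfaces is made up for illustration.

```shell
#!/bin/sh
# Report which power-management interfaces the running kernel exposes.
# Usage: pm_interfaces [PROC_DIR]
# PROC_DIR defaults to /proc; the parameter exists so the function can
# be tried against a scratch directory.
# Assumption: /proc/apm exists when the APM driver is active and
# /proc/acpi exists when ACPI is active.
pm_interfaces() {
    proc=${1:-/proc}
    for name in apm acpi; do
        if [ -e "$proc/$name" ]; then
            echo "$name: active"
        else
            echo "$name: not active"
        fi
    done
}

pm_interfaces
```

Comparing this output after booting with and without acpi=off should show whether the parameter had the intended effect.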

Loni


L R Nix
lornix@lornix.com

On Thu, 14 Aug 2008 19:26:03 GMT
ralienpp <ralienpp@no-mx.forums.opensuse.org> wrote:

>
> The sad news is that there are no power related settings in the BIOS,
> nothing at all.
>
> I have read somewhere that APM and ACPI are doing the same thing, but
> ACPI is newer and more development effort is dedicated to it. I also
> found out that these are mutually exclusive, i.e. you can’t have both
> of them enabled at the same time.
>
> My next idea to try is acpi=off and apm=on. I will let you know how
> things went, until then please let me know if you have other
> troubleshooting tips.
>
>

Have you tried ‘acpi=force’ or ‘acpi=noirq’ ?

Read through all the posts… I don’t believe you’ve ever mentioned which
laptop you have, although it does have the Intel GMA900 video chipset.

That information might help. Please?

Loni


L R Nix
lornix@lornix.com

Loni, you are awesome! . . . :slight_smile:

@ralienpp -

The command hwinfo --bios will also give you the bios info. hwinfo is great for looking at the firmware in most of your hardware.

On Fri, 15 Aug 2008 15:36:03 GMT
mingus725 <mingus725@no-mx.forums.opensuse.org> wrote:

>
> Loni, you are awesome! . . . :slight_smile:
>
>

Whaaat? What’d I do? Did I do something? If it’s broke, I didn’t do it.
Nobody saw me… can’t prove it!

No, really… what?

Loni (Grin)

L R Nix
lornix@lornix.com

The laptop in question is an LG LS70 Express.

I did not yet get the chance to try it with acpi=noirq, but I found this list of ACPI settings, and will try them one after another to see whether there is an effect.

How can I find out the kernel’s current settings? (I cannot rely on the list I see when the system boots, because some settings will use default values if not explicitly declared)

I will let you know what the output of hwinfo is when I’m at that computer next time.