Unable to start X or shut down (gracefully) after upgrade 42.3 -> 15.0

Hi,

This morning I upgraded (or at least tried to upgrade) my laptop from openSUSE 42.3 to 15.0. Since then, the whole computer freezes when I try to log in.

The upgrade:
I backed up my repos, disabled all third-party repos, changed all 42.3 repos to 15.0 (using sed -i 's/42.3/15.0/g' /etc/zypp/repos.d/*) and ran zypper dup --download-in-advance in a tmux session. This took 2-3 hours. During the last half hour I suddenly had some trouble with my internet connection: my Wi-Fi disconnected, and when it finally reconnected, there seemed to be a DNS issue.
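The repo switch described above can be sketched as a small bash function. This is a minimal sketch, assuming the stock layout where every repo is a .repo file in /etc/zypp/repos.d; the function name switch_leap_release and the backup path are hypothetical. Unlike the sed one-liner above, it escapes the dot in the version string, since an unescaped "." matches any character (so "42.3" would also match "4203").

```bash
# Hypothetical helper: back up a zypper repo directory, then replace one
# Leap release number with another in every .repo file inside it.
switch_leap_release() {
    local repo_dir=$1 old=$2 new=$3 backup_dir=$4
    # Keep an untouched copy first, so the old repo set can be restored.
    mkdir -p "$backup_dir" || return 1
    cp -a "$repo_dir"/. "$backup_dir"/ || return 1
    # Escape the dots so "42.3" only matches the literal text "42.3".
    local old_re
    old_re=$(printf '%s' "$old" | sed 's/[.]/\\./g')
    local f
    for f in "$repo_dir"/*.repo; do
        [ -e "$f" ] || continue   # no .repo files: nothing to rewrite
        sed -i "s/${old_re}/${new}/g" "$f"
    done
}
```

Usage (as root): switch_leap_release /etc/zypp/repos.d 42.3 15.0 /root/repos.d.bak, then zypper ref and zypper dup --download-in-advance as in the post.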

Initial symptoms:
After zypper was done, I rebooted the computer. The graphical login screen shows, but when I log in, everything freezes (ctrl-alt-f1 doesn’t switch to a terminal, ctrl-alt-backspace (2x) doesn’t do anything). Sometimes the frozen screen is the login screen, sometimes it is the splash screen. Before logging in, I can move the mouse, click buttons and type in the password field, and everything works as expected until I enter a correct password. (An incorrect password is handled as expected.)

Initial investigation and other symptoms:
I booted to a console and looked through some logs, but I could not find an obvious error in there. When I run startx, some messages are shown, and then everything freezes again, unresponsive to anything I tried except holding the power button. When I try to shut down (shutdown -h now), a normal shutdown starts, but before the system powers off, the screen fills with messages and a stack trace that I can’t really make sense of. Every 20 seconds or so, new messages and stack traces appear. From what I understand, there is a CPU core that is not responding.

At one point I managed to boot into a state that I wasn’t able to reach again afterwards. I booted using the default recovery mode (no special options) and expected the login screen to show, but instead the terminal login messages appeared. Every 20 seconds or so, the screen filled with messages and a stack trace indicating that one CPU core wasn’t doing what it was supposed to do. I could run commands while this was happening (unlike during shutdown). In top I saw that one process, called simply “X”, was using 100% CPU. In htop, this process was shown with a much more descriptive name (that I can’t remember…).

Internet issues:
During the last half hour of the upgrade, there were some issues with my internet connection, as described before. I seem to have the same problem now in the terminal: I can ping IP addresses, but not domain names, and other network-dependent things don’t work (zypper can’t refresh repos, for example). This is with my network cable plugged in; I didn’t try to get Wi-Fi running.

I have some pictures of the messages and call traces I described, but it doesn’t seem like I can upload them here. I’ll try to add them later in some other way. I don’t know which log files are important, but I would happily retrieve them as plain text from the laptop and post them here if you tell me which.

At this point I’ve run out of ideas and phrases to google. The only next step I can think of is to give up on the 15.0 installation and try to revert to my previous 42.3 installation (hopefully making use of btrfs, because I didn’t make a proper backup…) or otherwise do a complete re-install :confused:

output of startx: http://kroppyer.nl/upgradeissue/startx.jpg
messages after attempt to shutdown: http://kroppyer.nl/upgradeissue/calltrace1.jpg http://kroppyer.nl/upgradeissue/calltrace2.jpg http://kroppyer.nl/upgradeissue/calltrace3.jpg

Maybe VirtualBox or nouveau are causing problems?

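When that happens, try Alt-SysRq-REISUB to reboot; it is more graceful than holding the power button. Hold Alt+SysRq and press R, E, I, S, U, B in turn, pausing a second or so between keys. Not all of those functions are enabled by default, but that does not matter: enough of them work that you can reboot.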

(Print Screen key, if SysRq key is not labelled)
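Which of the REISUB keys actually work depends on the bitmask in /proc/sys/kernel/sysrq. Per the kernel’s sysrq documentation: 1 enables everything, 4 covers keyboard control (unRaw), 16 Sync, 32 remount read-only (U), 64 signalling processes (tErm, kIll) and 128 reBoot. A sketch for decoding a given mask follows; the function name reisub_status is hypothetical.

```bash
# Hypothetical helper: given a kernel.sysrq value, report which of the
# R-E-I-S-U-B keys it allows. Bit values follow the kernel's sysrq docs.
reisub_status() {
    local mask=$1 entry bit
    for entry in R:4 E:64 I:64 S:16 U:32 B:128; do
        bit=${entry#*:}
        # A value of 1 enables every sysrq function; otherwise test the bit.
        if [ "$mask" -eq 1 ] || (( (mask & bit) != 0 )); then
            printf '%s on\n' "${entry%%:*}"
        else
            printf '%s off\n' "${entry%%:*}"
        fi
    done
}
```

Usage: reisub_status "$(cat /proc/sys/kernel/sysrq)". As long as S, U and B are on, the sync, read-only remount and reboot steps still work, which is why “enough works” even when R, E and I are off.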

When you get to the Grub screen, hit the “e” key, this will put you in Grub Edit mode.

Scroll down to the line that begins with “linux…”

Hit the “End” key to make sure you get to the end of the line, as it actually wraps.

Add a space.

Add:

nomodeset

Press F10 to continue booting.
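The Grub edit above only applies to that one boot. Once the system is up, /proc/cmdline shows the kernel command line that was actually used, so you can confirm nomodeset took effect. A minimal sketch; the function name has_kernel_param is hypothetical.

```bash
# Hypothetical helper: succeed if a kernel command line contains the given
# parameter as a whole word (e.g. "nomodeset" or "plymouth.enable=0").
# The command line defaults to that of the running kernel.
has_kernel_param() {
    local param=$1 cmdline=${2:-$(cat /proc/cmdline)}
    local -a words
    local word
    read -r -a words <<< "$cmdline"   # split on whitespace, no globbing
    for word in "${words[@]}"; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}
```

Usage: has_kernel_param nomodeset && echo "booted with nomodeset".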

Let us know if that helps.

Also, for questions like this, it is a good idea to tell us which Desktop you installed/are using and which Display Manager is in use, as well as the info about what graphics card you have, etc. Those things do make a difference.

IMO, Optimus users should be advised against online upgrades. There’s too much to go wrong. Did your 42.3 have NVidia drivers installed? NVidia repos?

To start with, I’d try booting with Plymouth disabled: press the e key at the Grub menu, work down to the linux line, remove quiet and splash=silent, then append plymouth.enable=0 before proceeding with the boot. Can you then log in to a normal session? If you can get into X, running

inxi -SGxx

might help us help you sort this out, as should

zypper lr -d

whether or not you can get into X.
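The Grub-line edit suggested above (remove quiet and splash=silent, append plymouth.enable=0) is just a string transformation, sketched below. The function name disable_plymouth_cmdline is hypothetical and nothing is written to disk; you still type the result at the Grub e prompt.

```bash
# Hypothetical helper: rewrite a kernel command line as described above:
# drop "quiet", "splash=silent" and any existing plymouth.enable=...,
# then append "plymouth.enable=0".
disable_plymouth_cmdline() {
    local -a words out=()
    local word
    read -r -a words <<< "$1"
    for word in "${words[@]}"; do
        case $word in
            quiet|splash=silent|plymouth.enable=*) ;;   # drop these
            *) out+=("$word") ;;
        esac
    done
    out+=("plymouth.enable=0")
    printf '%s\n' "${out[*]}"   # join with single spaces
}
```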

Use the command susepaste to upload /var/log/Xorg.0.log and images. You may need to first install susepaste and/or inxi.
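To gather what was asked for in one go (useful when, as later in this thread, the files have to be copied off the machine with scp before the network works), a sketch along these lines could help. The function names and output file names are hypothetical; tools that are not installed get a note instead of output.

```bash
# Hypothetical helper: run a command and save its output (stdout + stderr)
# to a file in the given directory; if the tool is missing, save a note.
save_output() {
    local dir=$1 file=$2; shift 2
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$dir/$file" 2>&1 || echo "(exit status $?)" >> "$dir/$file"
    else
        echo "$1: not installed" > "$dir/$file"
    fi
}

# Hypothetical helper: collect inxi, repo list and the X log in one place.
collect_diagnostics() {
    local out_dir=$1
    mkdir -p "$out_dir" || return 1
    save_output "$out_dir" inxi-SGxx.txt inxi -SGxx
    save_output "$out_dir" zypper-lr-d.txt zypper lr -d
    # Copy the X server log from the path named above, if it is readable.
    if [ -r /var/log/Xorg.0.log ]; then
        cp /var/log/Xorg.0.log "$out_dir/"
    fi
    echo "$out_dir"
}
```

Usage: collect_diagnostics ~/diag, then upload the files in ~/diag with susepaste or copy them to another machine.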

Thanks for the advice so far!

My laptop is an HP zbook with an NVidia graphics card (see http://paste.opensuse.org/view//60261086). I don’t believe I have proprietary NVidia software running, but I do remember mucking around a bit over a year ago (fresh 42.3 install) while trying to get a docking station to work; I think that’s why I was subscribed to the X11:Bumblebee and graphics repos (see below). But I believe I switched back to my initial configuration (the defaults for openSUSE 42.3 with KDE at the time). I’m not sure how to answer the question of which display manager I’m using, but /etc/sysconfig/displaymanager mentions “Xorg” and “sddm” (see http://paste.opensuse.org/view//34646305).

I booted with the nomodeset option as Fraser_Bell described. Now I was able to get past the graphical login and into KDE, and everything seems to work, except that my display repaints really slowly (I can see it refresh from top to bottom). Also, my DNS issue still exists. Maybe that’s unrelated; if so, I could try to fix it first so zypper and susepaste can get online. Right now I’m scp’ing files to and from my other laptop.

While booted with nomodeset, I ran inxi -SGxx (see http://paste.opensuse.org/view//60261086) and zypper lr -d (see http://paste.opensuse.org/view//17838889), and copied Xorg.0.log (see http://paste.opensuse.org/view//66951343) to my other laptop.

Then I shut down from KDE, which worked fine (although some unusual screens showed up; I’m guessing that’s just a consequence of the nomodeset option).

I then booted with Plymouth disabled, as described by mrmazda. This didn’t help; I had the same symptoms. alt-sysrq-reisub worked like a charm :slight_smile: I tried a couple of times, and there seems to be no way to exit the graphical login screen without crashing:

  • logging in correctly freezes the computer
  • ctrl-alt-F1 freezes the computer
  • ctrl-alt-backspace (2x) freezes the computer
  • clicking Shutdown from the login panel freezes the computer

Any action that keeps me on the login screen works fine:

  • Switch user
  • enter wrong password
  • show virtual keyboard
  • select different desktop session (options are: IceWM, Plasma, TWM, User/System default), but when I try to log in with a different option selected (e.g. IceWM) the computer freezes.

Update: it seems I’ve solved the DNS issue (by adding “nameserver 8.8.8.8” to /etc/resolv.conf and starting dnsmasq with sudo service dnsmasq start). Now I can easily use zypper and other tools in an effort to fix the broken system, and I feel more confident removing packages since I can reinstall them without trouble. But I’m still shooting in the dark as to what the cause might be…
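The symptom from earlier (ping by IP works, names don’t resolve) points at the resolver configuration. A minimal sketch for listing the name servers a resolv.conf-style file declares, assuming the plain “nameserver ADDRESS” line format; the function name is hypothetical.

```bash
# Hypothetical helper: print the nameserver addresses in a resolv.conf-style
# file (default /etc/resolv.conf). Commented-out lines are ignored because
# their first field is "#". No output means no DNS server is configured.
list_nameservers() {
    local file=${1:-/etc/resolv.conf}
    awk '$1 == "nameserver" { print $2 }' "$file"
}
```

After fixing the file, getent hosts download.opensuse.org should print an address if name resolution works again.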

If you have Optimus graphics, then you need Bumblebee to manage it. If you removed the Bumblebee repo before you upgraded, you are probably trying to run an older version. Also, if you installed NVIDIA drivers, there could be older stuff still installed. Clean up your video drivers.

I don’t think I’m making use of Optimus. In fact, I didn’t even know I had an Intel integrated graphics card. zypper se bumblebee and zypper se nvidia show I have no bumblebee or nvidia packages installed. zypper se nouveau does show some installed packages. I tried to reinstall them, but that broke things completely. I tried installing and removing a number of packages, but eventually I used snapper to undo today’s changes.

Then I went into the BIOS and changed the graphics option from “auto” to “discrete” (the third option was “hybrid”). After this everything was working again. However, GRUB is not using the optimal resolution for my screen, and after booting, my screen flickers somewhat. And I can’t get my external display to work (it is recognised and the correct resolution is set automatically, but the display doesn’t seem to receive any signal).

So it definitely seems to be a graphics card/driver issue, and the symptoms are now restricted to only graphical things: I haven’t had any crashes yet that prevent me from getting into a terminal.

I will spend some time later today or tomorrow trying to diagnose and fix this issue. If unsuccessful, I might post in a different subforum, as this no longer seems to be a boot/login issue. But of course, if you have any suggestions as to how to diagnose or fix this new problem, I’m happy to hear them :slight_smile:

When booting with nomodeset you are effectively selecting rescue mode. It disables all competent X drivers that could support your hardware. The inxi output and Xorg.0.log both show that the (fallback) X driver FBDEV is in use. FBDEV is SLOW, as you’ve seen. As gogalthorp suggested, you do need to focus on a Bumblebee path, starting by re-enabling the X11:Bumblebee repo. After doing so, what is the output of:

zypper se -si | grep 'tem Pac' | grep -v plication

Whatever it reports is likely to need to be upgraded or removed.

zypper dup --from X11:Bumblebee

might be all you need.
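The grep pipeline above picks out installed packages whose repository column reads “(System Packages)”, i.e. packages that no enabled repo provides any more, while dropping “application” entries. The same filter as an awk function, assuming zypper se -s prints the columns S | Name | Type | Version | Arch | Repository in that order; the function name list_orphans is hypothetical.

```bash
# Hypothetical helper: read `zypper se -si` table output on stdin and print
# the names of installed packages whose Repository column is
# "(System Packages)". Assumes columns: S | Name | Type | Version | Arch | Repository.
list_orphans() {
    awk -F'|' '
        {
            # Trim surrounding blanks from every column.
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
        }
        $3 == "package" && $NF == "(System Packages)" { print $2 }
    '
}
```

Usage: zypper se -si | list_orphans.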

See output here SUSE Paste

Running zypper dup --from X11:Bumblebee only affected one package, which was not in the list (dkms). I enabled the other repositories one by one and did the same trick. Then I removed packages I felt confident removing, and worked the list down to the following: SUSE Paste

From what’s left, I don’t think ipe is the problem, and I don’t really feel like it’s a good idea to run something like zypper rm kernel-default…

At no point during this process did I successfully boot with graphics set to “auto” in the BIOS (without the nomodeset option). I was able to boot with it set to “discrete” as described before, but not without an occasionally flickering screen as a result (couldn’t test external monitor, will try tomorrow).

So still a pretty significant problem. I’ll try a few things tomorrow. Would it make sense to remove the old kernel packages (4.4.165, 4.4.162) while leaving the new ones (4.12.14)? Or do you think I should focus on installing a graphics driver that works (with any of the BIOS options “auto”/“discrete”/“hybrid”)?

Is Bumblebee actually installed? If it is, and SDB:NVIDIA Bumblebee - openSUSE Wiki remains up to date, I’d try switching to the other method from the one currently applied. Before anything else, I’d check for a BIOS update and apply it. I’d not spend time on kernel removal before being satisfied with graphics performance, which I’d expect to be optimal only with Bumblebee properly installed, regardless of whether or not you wish to use only discrete NVidia graphics. IOW, focus on graphics. I don’t have any Optimus hardware, so consider this all opinion.

All of them are crashes in nouveau power management routines.

A quick search shows at least four different generations of HP zbook (G1 through G4), each with potentially wide variation in hardware, so the model name alone does not add much information. The same quick search shows similar reports for some zbook variants, with a possible workaround being to disable PCIe power management, which is somewhat consistent with the stack traces shown earlier.

My educated guess is that firmware/kernel attempts to power down unused nVidia card which is handled poorly by nouveau driver.

I don’t believe I have proprietary NVidia software running

Believe?!? This is your system, you are expected to know what you have installed.

No (unless you are trying to free some disk space)

Or do you think I should focus on installing a graphics driver that works (with any of the BIOS options “auto”/“discrete”/“hybrid”)?

You should first actually know what you have in your system and understand how it works (so far I have a feeling that you use the words “discrete” or “hybrid” like magic incantations). Then you need to decide what you want to use: iGPU alone, dGPU alone or Optimus (hybrid). Which options are available depends on your exact hardware and the settings it offers. Only then can you focus on “installing graphics driver that works”, because these three cases require rather different approaches. In any case, you need to make your actual hardware known (not just “HP zbook”) so that you, or someone else for you, can actually search for similar issues with this hardware.

With hybrid graphics disabled in BIOS?

There are a lot of similar reports, unfortunately without any resolution. One suggested workaround is to disable nouveau power management. Testing a current kernel from Kernel:HEAD or Kernel:stable would make sense. If the problem goes away, open a bug report for Leap.
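One way to try the “disable nouveau power management” workaround is nouveau’s runpm module option (0 disables runtime power management). Whether it avoids this particular lockup is untested here. For a one-off test, add nouveau.runpm=0 at the Grub e prompt the same way as nomodeset; to make it stick, a modprobe.d snippet like the one below could be used. The function name and file name 50-nouveau-runpm.conf are hypothetical.

```bash
# Hypothetical helper: write a modprobe.d snippet that disables nouveau's
# runtime power management (module option runpm=0). conf_dir defaults to
# /etc/modprobe.d; pass a scratch directory to try it out safely.
write_nouveau_runpm_conf() {
    local conf_dir=${1:-/etc/modprobe.d}
    local conf="$conf_dir/50-nouveau-runpm.conf"
    mkdir -p "$conf_dir" || return 1
    printf '%s\n' '# Disable nouveau runtime PM (workaround attempt)' \
                  'options nouveau runpm=0' > "$conf"
    echo "$conf"
}
```

Since nouveau may be loaded from the initrd, regenerate it (dracut -f) before rebooting so the option is picked up there too.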

The exact graphics card spec is given behind the susepaste link, and I would’ve given a more descriptive name for the laptop if it had one printed on it… But you’re right, let me dig a little deeper and give a more detailed description of my hardware for googlability:
HP ZBook Studio G4
Intel core i7-7700HQ @ 2.80GHz
Graphics:

  • NVIDIA GM107GLM [Quadro M1200 Mobile]
  • Intel Device 591b

Graphics driver is nouveau.

With discrete graphics selected in the BIOS, my laptop display seems unable to keep up with repainting the screen at times, and external display doesn’t work (as described before). With auto or hybrid selected, my laptop freezes when exiting the login screen (as described in the first post).

Based on your feedback, I will see what happens when I disable PCIe power management, and if that doesn’t help I will install Bumblebee.

As for the choice between integrated and discrete: I need an external screen and basic graphics (nothing fancy). If the integrated card can do that, I’m happy, but the BIOS doesn’t seem to offer an integrated-only option (and from experience with my other laptop, the integrated card may not support a second screen). I’m also happy to use the nVidia card exclusively, even if it drains my battery faster. And of course, if I get Optimus to work, I would be happy too.

I updated my BIOS (1.03 → 1.23) – no difference: freeze when graphics is set to auto in the BIOS, unless I boot with nomodeset. With graphics set to discrete in the BIOS, the laptop display flickers occasionally and the external monitor doesn’t work.
But there is a fourth graphics option now in the BIOS: UMA graphics, which I think is the option for integrated graphics that was missing previously. As it will turn out, selecting this option solves my problem (for now).

About disabling PCIe power management: I’m unsure if this is useful info, but running dmesg | grep aspm yields the following:


    [    0.194844] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
    [    0.293822] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
    [    0.294285] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
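Reading those lines: the firmware’s ACPI FADT tells the kernel that ASPM is unsupported, so Linux leaves PCIe link power management to the BIOS configuration. That suggests pcie_aspm=off has little left to switch off, which fits the results below. A sketch that classifies such dmesg text; the function name and verdict wording are hypothetical, and the match is case-insensitive because the messages spell it “ASPM”.

```bash
# Hypothetical helper: read dmesg text on stdin and report whether the
# firmware (ACPI FADT) told the kernel that PCIe ASPM is unsupported.
aspm_verdict() {
    if grep -Eqi 'FADT.*(ASPM.*unsupported|doesn.t support.*ASPM)'; then
        echo "firmware: ASPM unsupported, kernel leaves it to the BIOS"
    else
        echo "no FADT ASPM restriction found"
    fi
}
```

Usage: dmesg | aspm_verdict (run as root if dmesg access is restricted).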

Before trying the UMA option, I tried the following:

With graphics set to auto in the BIOS:

  • booting with pcie_aspm=off doesn’t seem to make a difference, computer still locks up as soon as I try to leave the login screen in any way.
  • booting with acpi=off seems to work fine, but there’s still no external display (it doesn’t even get recognised), and shutting down seems to be a problem: I get the following messages:

  [  611.138698] systemd-journald[442]: Failed to send stream file descriptor to service manager: Connection refused
  [  613.620689] reboot: System halted

and everything freezes again; now even alt-sysrq-reisub doesn’t do anything. A short press on the power button turns the laptop off.

With graphics set to discrete in BIOS:

  • booting with pcie_aspm=off doesn’t seem to make a difference: the display still flickers, and the external monitor gets recognised but doesn’t receive any signal.
  • booting with acpi=off: the screen remains at (very) low resolution with no more flickering, but the external monitor doesn’t get recognised. It shuts down without any problems.

With graphics set to UMA in BIOS:

  • booting with default options: everything seems to work now! No flickering laptop screen, and the external monitor works fine. Running inxi -SGxx tells me I’m indeed using only the Intel card now. YaST hardware information tells me the driver is i915. As this integrated card supports the external display, I’m perfectly happy not using the nVidia card.
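Besides inxi and YaST, lspci -k (from pciutils) shows the driver per device on its “Kernel driver in use” line. A sketch that filters lspci -k output down to the display controllers, assuming the usual layout (an unindented device line followed by indented detail lines); the function name gpu_drivers is hypothetical.

```bash
# Hypothetical helper: read `lspci -k` output on stdin and print one line per
# display controller: "<bus-id> <driver in use>", or "none" if no driver is bound.
gpu_drivers() {
    awk '
        /^[0-9a-f]/ {                      # an unindented line starts a new device
            if (dev != "") print dev, drv
            dev = ""; drv = "none"
            if ($0 ~ /VGA compatible controller|3D controller|Display controller/) dev = $1
            next
        }
        /Kernel driver in use:/ { if (dev != "") drv = $NF }
        END { if (dev != "") print dev, drv }
    '
}
```

Usage: lspci -k | gpu_drivers. In the working UMA setup described above, the Intel controller should report i915.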

So I’m guessing the problem I experienced was entirely nouveau-related, and now that I’ve found a way to avoid nouveau, I’m no longer experiencing it. I might see whether the nouveau problem can be fixed so that I can get the nVidia card to work (Optimus or not), but first I need to catch up on some work.

Your help was much appreciated, thanks!

Yes, that was the link I meant: [SOLVED] HP Zbook G4 and Integrated Graphics problem on Ubuntu 18.04
You may try suggested workaround next time.

To make this more useful in future searches, adding inxi -SGxx output (run while booted using nomodeset) copied from susepaste 60261086 which is destined to expire:

inxi -SGxx
System:    Host: nbwin1547 Kernel: 4.12.14-lp150.12.28-default x86_64 bits: 64 gcc: 7.3.1
           Console: tty 1 dm: sddm,sddm Distro: openSUSE Leap 15.0
Graphics:  Card-1: Intel Device 591b bus-ID: 00:02.0 chip-ID: 8086:591b
           Card-2: NVIDIA GM107GLM [Quadro M1200 Mobile] bus-ID: 01:00.0 chip-ID: 10de:13b6
           Display Server: X.org 1.19.6 drivers: fbdev (unloaded: modesetting,vesa)
           tty size: 240x67 Advanced Data: N/A out of X

Note that 8086:591b and i7-7700HQ Intel devices equate to keyword “Kaby Lake” and “Intel® HD Graphics 630”.

I have two different Kaby Lake/630 PCs, with no discrete gfxcards installed, that work flawlessly with both 15.0 and TW, and with up to 3 simultaneously connected displays.