Boot problem after update - must force INIT 3 and then INIT 5 to get graphical login

Hi everyone,
few weeks ago, after a normal update on LEAP 15.4, i got a nasty problem at boot: long story short, the boot process goes without errors but I get no graphical login.

To go graphical, I need to log in as root, revert to runlevel 3 and then go again to runlevel 5, issuing


# init 3
# init 5

By the way, if I issue directly init 5, I got a blank screen in another virtual console and must go back to the previous console, set runlevel 3 and then 5.

This tells me that my video card is correctly managed, right? I have a nVidia with proprietary driver.

I really don’t know how to solve the problem.

I also don’t know what log could be useful to attach.

Any help really appreciated :slight_smile:

Alberto

Reboot and before doing init 3/init 5 capture full output of “journalctl -b” as root and upload to https://susepaste.org. Then execute init 3/init 5 and provide the same output again, also upload.

Hi,
thanks for the prompt answer.

Here is the journal before the init commands: https://susepaste.org/46430267
And here is the journal after them: https://susepaste.org/8782108

Alberto

Well, the first time sddm attempts to start Xorg server this server crashes. Also NVIDIA kernel drivers are being loaded when Xorg server begins startup. The second time Xorg server is started normally. It sounds like some race condition between loading of NVIDIA drivers and Xorg. You should now have to files - /var/log/Xorg.0.log and /var/log/Xorg.0.log.old. Could you upload them for comparison?

Interesting that exactly the same problem on Tumbleweed was described earlier on NVIDIA forums (but without any real solution either): Ubuntu 21.10 - "Failed to grab modeset ownership" with 495.44 - #40 by plamponi - Linux - NVIDIA Developer Forums.

Here is Xorg.0.log: https://susepaste.org/61876622

And here is Xorg.0.log.old: https://susepaste.org/61199004

I will try myself to analyze the logs: maybe stopping X from starting before the nVidia driver is completely loaded could help? But being a dinosaur, and being grown up with the old initV I have no idea how to do it with systemd…

The sequence of events - Xorg starts, detects NVIDIA and unloads modesetting driver, then comes udev event that DRI device becomes available (as the result of loading nvidia-dri I assume), Xorg attempts to load modesetting driver again and crashes.

Here is Xorg.0.log: SUSE Paste

And here DRI device is available from the very beginning, so Xorg starts with preferred NVIDIA driver, it succeeds and modesetting driver is unloaded without initialization.

This sounds like upstream Xorg issue. You could try to report it to Xorg project.

maybe stopping X from starting before the nVidia driver is completely loaded could help?

Normally systemd-logind should wait until DRI device becomes available so display managers only attempt to start Xorg after this point, but using “nomodeset” disables this logic. Also I am not sure what triggers loading of NVIDIA kernel drivers.

You could try force loading of nvidia drivers early (see “man modules-load.d”) and removing “nomodeset” kernel command line.

Hi.
No news here. Tried a lot, with no luck.
I even tried to create a systemd service starting after the login message in the console to let it issue the init commands: nothing.

I tried removing nomodeset. Nothing.

I posted on Xorg issues board, no answer till now.

Still hoping to get a solution…

With systemd, the graphics driver should have been loaded before the GUI start job was queued – example with an AMD GPU and wait for Chrony to synchronise the system time –


 # journalctl -b 0 --output=short-monotonic --no-hostname --system | grep -iE 'amdgpu|graphic|chrony'
    0.225555] kernel: smpboot: CPU0: AMD Ryzen 5 3400G with Radeon Vega Graphics (family: 0x17, model: 0x18, stepping: 0x1)
    0.317792] kernel: ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
    4.949798] kernel: [drm] amdgpu kernel modesetting enabled.
    4.960435] kernel: amdgpu: Topology: Add APU node [0x0:0x0]
    4.960505] kernel: fb0: switching to amdgpu from EFI VGA
    4.960613] kernel: amdgpu 0000:06:00.0: vgaarb: deactivate vga console
    4.960668] kernel: amdgpu 0000:06:00.0: enabling device (0006 -> 0007)
    4.960716] kernel: amdgpu 0000:06:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
    4.960892] kernel: amdgpu 0000:06:00.0: amdgpu: Fetched VBIOS from VFCT
    4.960894] kernel: amdgpu: ATOM BIOS: 113-PICASSO-118
    4.961719] kernel: amdgpu 0000:06:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
    4.961722] kernel: amdgpu 0000:06:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
    4.961723] kernel: amdgpu 0000:06:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
    4.961806] kernel: [drm] amdgpu: 2048M of VRAM memory ready
    4.961808] kernel: [drm] amdgpu: 3072M of GTT memory ready.
    4.966900] kernel: amdgpu 0000:06:00.0: amdgpu: PSP runtime database doesn't exist
    4.977805] kernel: amdgpu: hwmgr_sw_init smu backed is smu10_smu
    4.998702] kernel: amdgpu 0000:06:00.0: amdgpu: Will use PSP to load VCN firmware
    5.082066] kernel: amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
    5.086804] kernel: amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
    5.137085] kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
    5.176696] kernel: amdgpu: HMM registered 2048MB device memory
    5.176774] kernel: amdgpu: Topology: Add APU node [0x15d8:0x1002]
    5.176778] kernel: kfd kfd: amdgpu: added device 1002:15d8
    5.177014] kernel: amdgpu 0000:06:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 11, active_cu_number 11
    5.180405] kernel: fbcon: amdgpudrmfb (fb0) is primary device
    5.262744] kernel: amdgpu 0000:06:00.0: [drm] fb0: amdgpudrmfb frame buffer device
    5.291304] kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
    5.291309] kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
    5.291311] kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
    5.291313] kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
    5.291315] kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
    5.291317] kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
    5.291323] kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
    5.291324] kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
    5.291326] kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
    5.291328] kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
    5.291329] kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
    5.291331] kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
    5.291333] kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
    5.291334] kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
    5.291336] kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
    5.295139] kernel: [drm] Initialized amdgpu 3.42.0 20150101 for 0000:06:00.0 on minor 0
    6.544340] systemd[1]: Queued start job for default target Graphical Interface.
    7.220235] kernel: snd_hda_intel 0000:06:00.1: bound 0000:06:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
   20.033886] chronyd[1812]: chronyd version 4.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 -DEBUG)
   20.075169] chronyd[1812]: Frequency 3.451 +/- 0.065 ppm read from /var/lib/chrony/drift
   20.158778] systemd[1]: Starting Wait for chrony to synchronize system clock...
   33.305508] chronyd[1812]: Selected source 50.7.124.24 (0.de.pool.ntp.org)
   33.306311] chronyd[1812]: System clock wrong by 1.511193 seconds
   33.306456] chronyd[1812]: System clock was stepped by 1.511193 seconds
   34.235322] systemd[1]: Finished Wait for chrony to synchronize system clock.
   35.170732] systemd[1]: Reached target Graphical Interface.

Some more information –


 # systemctl status graphical.target
● graphical.target - Graphical Interface
     Loaded: loaded (/usr/lib/systemd/system/graphical.target; indirect; vendor preset: disabled)
     Active: active since Wed 2022-09-07 13:23:49 CEST; 3h 55min ago
       Docs: man:systemd.special(7)

Sep 07 13:23:49 xxx systemd[1]: Reached target Graphical Interface.
 # 


 > cat /usr/lib/systemd/system/graphical.target
#  SPDX-License-Identifier: LGPL-2.1-or-later
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Graphical Interface
Documentation=man:systemd.special(7)
Requires=multi-user.target
Wants=display-manager.service
Conflicts=rescue.service rescue.target
After=multi-user.target rescue.service rescue.target display-manager.service
AllowIsolate=yes
 > 

As you can see, the Graphical Taget requires that, the Multi-User target has been reached, beforehand …
And, you can list the dependencies of the Graphical target –


 # systemctl list-dependencies graphical.target
graphical.target
● ├─ca-certificates.path
● ├─display-manager.service
○ ├─issue-generator.service
● ├─smartd_generate_opts.path
○ ├─systemd-update-utmp-runlevel.service
○ ├─YaST2-Firstboot.service
○ ├─YaST2-Second-Stage.service
● └─multi-user.target
○   ├─after-local.service
●   ├─apache2.service
●   ├─apparmor.service
○   ├─appstream-sync-cache.service
●   ├─auditd.service
●   ├─autofs.service
●   ├─avahi-daemon.service
●   ├─chrony-wait.service
●   ├─chronyd.service
●   ├─cron.service
●   ├─cups-browsed.service
●   ├─cups.path
●   ├─dbus.service
●   ├─firewalld.service
●   ├─irqbalance.service
●   ├─kbdsettings.service
●   ├─mcelog.service
●   ├─nfs-server.service
●   ├─nfs.service
●   ├─nscd.service
●   ├─plymouth-quit-wait.service
○   ├─plymouth-quit.service
●   ├─postfix.service
○   ├─purge-kernels.service
●   ├─rpcbind.service
●   ├─rsyslog.service
●   ├─smartd.service
●   ├─sshd.service
●   ├─systemd-logind.service
○   ├─systemd-update-utmp-runlevel.service
●   ├─systemd-user-sessions.service
●   ├─tpm2-abrmd.service
●   ├─vboxautostart-service.service
●   ├─vboxdrv.service
●   ├─wicked.service
●   ├─basic.target
●   │ ├─-.mount
 . 
 . 
 . 

Therefore, given systemd and, the way that systemd organises what starts when and, what a given unit needs before it is started, if, the Graphical target is reached before the NVIDIA drivers are properly loaded, there is something very wrong with the systemd Unit which loads the NVIDIA driver …

For the records, if I disable the nvidia driver everything works fine…

If plymouth and nvidia’s proprietary drivers are both installed, try removing plymouth, or at least disable it via linu line in Grub. I’m not sure which work, since I never have it installed:

  • plymouth=0
  • noplymouth
    *]plymouth.enable=0
    Maybe they all would work. :slight_smile:

In case it helps, I am also experiencing this problem starting today (I also have nvidia drivers from the nvidia repository installed). I have been applying updates daily, as they come.

I tried rebooting once again and everything seems to work no. Very odd.

Did you guys fix the problem … because starting today i have the problem … again. I had something like this last october or november and fixed it by getting rid of nvidia. fast forward i want to try again and problem still persists.

System boots into a black blank screen with an underscore in the top left corner and a mouse pointer. Switch to virtual console, login as root, init 3 and init 5. Voila … everything works.

The only interestings things in my log i found so far are:

Feb 14 15:06:07 tuxedo-xps systemd-udevd[642]: /usr/lib/udev/rules.d/99-bumblebee-nvidia-dev.rules:10 Invalid value “/bin/bash -c ‘/bin/mknod -m 666 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \ -f 1) 0;’” for RUN (char 51: invalid substitution type), ignoring.
Feb 14 15:06:14 tuxedo-xps systemd-udevd[669]: nvidia_uvm: Process ‘/bin/bash -c ‘/bin/mknod -m 666 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \ -f 1) 0;’’ failed with exit code 1.
Feb 14 15:16:35 tuxedo-xps systemd-udevd[8526]: nvidia_uvm: Process ‘/bin/bash -c ‘/bin/mknod -m 666 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \ -f 1) 0;’’ failed with exit code 1.

And

Feb 14 16:06:02 tuxedo-xps systemd-modules-load[291]: Failed to find module ‘bbswitch’
Feb 14 15:06:07 tuxedo-xps systemd-modules-load[631]: Module ‘bbswitch’ is deny-listed
Feb 14 15:15:18 tuxedo-xps kernel: bbswitch: version 0.8
Feb 14 15:15:18 tuxedo-xps kernel: bbswitch: Found integrated VGA device 0000:00:02.0: _SB_.PCI0.GFX0
Feb 14 15:15:18 tuxedo-xps kernel: bbswitch: Found discrete VGA device 0000:01:00.0: _SB_.PCI0.PEG0.PEGP
Feb 14 15:15:18 tuxedo-xps kernel: bbswitch: detected an Optimus _DSM function
Feb 14 15:15:18 tuxedo-xps kernel: bbswitch: device 0000:01:00.0 is in use by driver ‘nvidia’, refusing OFF
Feb 14 15:15:18 tuxedo-xps kernel: bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on

So the bbswitch module is there it just doesnt seem to be found on the first try for some reason.

I still have the problem. No clue how to solve it.

Reminds me of 19 months open bug 1188954:
black screen because dm tries to start before /dev/dri/card0 has been created during init

Does systemctl restart xdm have the same effect as

# init 3
# init 5

?