Sometimes Linux does not boot on the latest kernel

This has been plaguing me for years. I used to think it was somehow related to Secure Boot, which I had enabled on my old ThinkPad laptop, but it also happens on my new Dell Latitude laptop with Secure Boot off.

What happens is that I run zypper dup, and when there’s a kernel update it’s a gamble whether it’s going to boot or not. When it doesn’t boot, I get a black screen until I hard reset. When that happens I revert to the older kernel version in the advanced boot options.

Does this happen for anyone else? Is there anything I can do? Is this a known issue? Thoughts?

Currently I’m on kernel 6.9.9-1-default; if I try to boot 6.10.2-1-default I get the black screen I described.

@eshoe Hi and welcome to the Forum :smile:
I suspect graphics. Are you running the Nvidia driver? Can you post the output from inxi -GSaz?

Hello! Thanks. Yes it’s my first post, but long time lurker :slight_smile:

System:
  Kernel: 6.9.9-1-default arch: x86_64 bits: 64 compiler: gcc v: 13.3.0
    clocksource: tsc avail: acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.9.9-1-default
    root=/dev/mapper/system-root splash=silent mitigations=auto quiet
    security=apparmor mem_sleep_default=deep
  Desktop: KDE Plasma v: 6.1.3 tk: Qt v: N/A info: frameworks v: 6.4.0
    wm: kwin_wayland tools: avail: xscreensaver vt: 3 dm: SDDM Distro: openSUSE
    Tumbleweed 20240801
Graphics:
  Device-1: Intel Raptor Lake-P [Iris Xe Graphics] vendor: Dell driver: i915
    v: kernel alternate: xe arch: Gen-13 process: Intel 7 (10nm) built: 2022+
    ports: active: DP-5,eDP-1 empty: DP-1, DP-2, DP-3, DP-4, HDMI-A-1
    bus-ID: 0000:00:02.0 chip-ID: 8086:a7a1 class-ID: 0300
  Display: wayland server: X.org v: 1.21.1.12 with: Xwayland v: 24.1.1
    compositor: kwin_wayland driver: X: loaded: modesetting unloaded: fbdev,vesa
    alternate: intel dri: iris gpu: i915 d-rect: 3840x1200 display-ID: 0
  Monitor-1: DP-5 pos: right res: 1920x1080 size: N/A modes: N/A
  Monitor-2: eDP-1 pos: primary,left res: 1920x1200 size: N/A modes: N/A
  API: EGL v: 1.5 hw: drv: intel iris platforms: device: 0 drv: iris
    device: 1 drv: swrast surfaceless: drv: iris wayland: drv: iris x11:
    drv: iris inactive: gbm
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 24.1.3 glx-v: 1.4
    direct-render: yes renderer: Mesa Intel Graphics (RPL-U)
    device-ID: 8086:a7a1 memory: 14.99 GiB unified: yes display-ID: :1.0
  API: Vulkan v: 1.3.290 layers: 1 device: 0 type: integrated-gpu
    name: Intel Graphics (RPL-U) driver: N/A device-ID: 8086:a7a1
    surfaces: xcb,xlib,wayland

It happens for me too, on a system using an NVIDIA GPU:
it drops me to a raw terminal without a GUI.
However, I usually just log in, run sudo reboot now, and on the next boot it works.

@eshoe unusual for an Intel GPU to not play nice… you’re running Xorg there, not Wayland? Have you logged out and selected Wayland? If using Xorg, then maybe you created an xorg.conf file?

As soon as Grub menu is exited, strike ESC. The resulting unhidden messages during boot may provide error messages constituting clues to failure. If you remove the splash=silent and quiet entries from your GRUB_CMDLINE_LINUX_DEFAULT= line in /etc/default/grub, and regenerate grub.cfg, you’ll see those messages on every boot.
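
For reference, a minimal sketch of that edit as a shell function. It only prints the modified file, so the result can be checked before anything under /etc is touched. The function name verbose_grub_defaults is made up for illustration, and the sketch assumes the value on the GRUB_CMDLINE_LINUX_DEFAULT= line is double-quoted, as it normally is in /etc/default/grub.

```shell
# Print a grub defaults file with "splash=silent" and "quiet" removed from
# the GRUB_CMDLINE_LINUX_DEFAULT line. The input file is not modified.
# Hypothetical helper; assumes the value is wrapped in double quotes.
# Usage: verbose_grub_defaults /etc/default/grub
verbose_grub_defaults() {
    sed -E '
/^GRUB_CMDLINE_LINUX_DEFAULT=/ {
    :again
    # Drop one whole-word occurrence, keeping the delimiters around it.
    s/(["[:space:]])(splash=silent|quiet)(["[:space:]])/\1\3/
    t again
    # Tidy up the whitespace left behind by the removals.
    s/[[:space:]]+/ /g
    s/" /"/
    s/ "/"/
}' "$1"
}
```

To apply it, one would typically compare the output against the original (for example with diff), save it over /etc/default/grub as root after making a backup, and then regenerate the menu, which on Tumbleweed is normally done with grub2-mkconfig -o /boot/grub2/grub.cfg.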

@dotdotdot333 - different for me. In my case Linux doesn’t boot at all. I don’t even get to a TTY.
@malcolmlewis - I am using wayland…?
@mrmazda - I’ve done that. Nothing. When I select 6.10.2-1-default: black screen, no messages at all. When I select 6.9.9-1-default I can boot and I see the messages as expected.

Could it be some weird interaction with Dell’s UEFI? But I don’t see why it would happen on every other version or so of the kernel.

That means your problem occurs early in your boot process.

Have you tried a remote login after a 6.10.2 boot was given plenty of time to complete?

# inxi -GSaz --vs --zl --hostname
inxi 3.3.35-00 (2024-06-18)
System:
  Host: ab560 Kernel: 6.10.2-1-default arch: x86_64 bits: 64 compiler: gcc
    v: 13.3.0 clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz root=LABEL=<filter> noresume
    ipv6.disable=1 net.ifnames=0 consoleblank=0 preempt=full mitigations=auto
    xe.force_probe=4c8b i915.force_probe=!4c8b earlyprintk=efi
  Desktop: TDE (Trinity) v: R14.1.2 tk: Qt v: 3.5.0 wm: Twin v: 3.0
    with: kicker vt: 7 dm: 1: TDM 2: XDM Distro: openSUSE Tumbleweed 20240801
Graphics:
  Device-1: Intel RocketLake-S GT1 [UHD Graphics 730] vendor: ASUSTeK
    driver: xe v: kernel alternate: i915 arch: Gen-12.1 process: Intel 10nm
    built: 2020-21 ports: active: DP-1,HDMI-A-1,HDMI-A-2 empty: HDMI-A-3
    bus-ID: 00:02.0 chip-ID: 8086:4c8b class-ID: 0300
  Display: x11 server: X.Org v: 21.1.12 compositor: Twin v: 3.0 driver: X:
    loaded: modesetting unloaded: fbdev,vesa alternate: intel dri: iris gpu: xe
    display-ID: :0 screens: 1
  Screen-1: 0 s-res: 3600x2640 s-dpi: 120 s-size: 762x558mm (30.00x21.97")
    s-diag: 944mm (37.18")
  Monitor-1: DP-1 pos: primary,bottom-l model: Acer K272HUL serial: <filter>
    built: 2018 res: 2560x1440 hz: 60 dpi: 109 gamma: 1.2
    size: 598x336mm (23.54x13.23") diag: 686mm (27") ratio: 16:9 modes:
    max: 2560x1440 min: 720x400
  Monitor-2: HDMI-A-1 mapped: HDMI-1 pos: top-left model: NEC EA243WM
    serial: <filter> built: 2011 res: 1920x1200 hz: 60 dpi: 94 gamma: 1.2
    size: 519x324mm (20.43x12.76") diag: 612mm (24.1") ratio: 16:10 modes:
    max: 1920x1200 min: 640x480
  Monitor-3: HDMI-A-2 mapped: HDMI-2 pos: top-right model: Dell P2213
    serial: <filter> built: 2012 res: 1680x1050 hz: 60 dpi: 90 gamma: 1.2
    size: 473x296mm (18.62x11.65") diag: 558mm (22") ratio: 16:10 modes:
    max: 1680x1050 min: 720x400
  API: EGL v: 1.5 hw: drv: intel iris platforms: device: 0 drv: iris
    device: 1 drv: swrast gbm: drv: iris surfaceless: drv: iris x11: drv: iris
    inactive: wayland
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 24.1.3 glx-v: 1.4
    direct-render: yes renderer: Mesa Intel Graphics (RKL GT1)
    device-ID: 8086:4c8b memory: 14.74 GiB unified: yes
  API: Vulkan v: 1.3.290 layers: 2 device: 0 type: integrated-gpu name: Intel
    Graphics (RKL GT1) driver: N/A device-ID: 8086:4c8b surfaces: xcb,xlib
    device: 1 type: cpu name: llvmpipe (LLVM 18.1.8 256 bits) driver: N/A
    device-ID: 10005:0000 surfaces: xcb,xlib
#

I wonder if xe.force_probe=4c8b and i915.force_probe=!4c8b, as you can see I used above on my kernel cmdline, added to yours would be of any benefit. My GPU is a year or more older than yours, so requires at least the first two in order to use the Xe drivers instead of older.
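
For anyone trying this on different hardware: the value after force_probe= is the device half of the chip-ID that inxi prints (8086:4c8b in the output above), so it has to be taken from your own inxi output rather than copied. A small sketch that builds both parameters from a chip-ID; the function name force_probe_args is hypothetical, and the parameters move the GPU from i915 to the xe driver, which the kernel flags as a tainting option.

```shell
# Build the xe/i915 force_probe kernel parameters from an inxi "chip-ID".
# The chip-ID has the form vendor:device in hex; only the device part is
# passed to force_probe. Hypothetical helper for illustration.
# Usage: force_probe_args 8086:4c8b
force_probe_args() {
    chip_id=$1
    case $chip_id in
        8086:????) ;;   # Intel vendor ID followed by a 4-character device ID
        *)
            echo "expected an Intel chip-ID such as 8086:4c8b" >&2
            return 1
            ;;
    esac
    device=${chip_id#*:}   # strip everything up to and including the colon
    printf 'xe.force_probe=%s i915.force_probe=!%s\n' "$device" "$device"
}
```

The printed string can be appended to the kernel command line for a single boot from the GRUB edit screen before making it permanent.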

Hmm, Raptor Lake, is that not the Intel CPU that has known instabilities?

@eshoe @mrmazda it should be fine, I’m running it here on Aeon;

pinxi -GSaz --zu

System:
  Kernel: 6.10.2-1-default arch: x86_64 bits: 64 compiler: gcc v: 13.3.0
    clocksource: tsc avail: acpi_pm
    parameters: initrd=\opensuse-aeon\6.10.2-1-default\initrd-<filter>
    quiet loglevel=2 systemd.show_status=no console=ttyS0,115200 console=tty0
    vt.global_cursor_default=0 ignition.platform.id=metal security=selinux
    selinux=1 i915.force_probe="!46d1" xe.force_probe="46d1"
    root=UUID=<filter> rootflags=subvol=@/.snapshots/6/snapshot
    systemd.machine_id=<filter>
  Desktop: GNOME v: 46.3.1 tk: GTK v: 3.24.43 wm: gnome-shell
    tools: gsd-screensaver-proxy dm: GDM v: 46.2 Distro: Aeon
Graphics:
  Device-1: Intel Alder Lake-N [UHD Graphics] driver: xe v: kernel
    alternate: i915 arch: Gen-12.2 process: Intel 10nm built: 2021-22+ ports:
    active: HDMI-A-1 empty: HDMI-A-2 bus-ID: 00:02.0 chip-ID: 8086:46d1
    class-ID: 0300
  Display: wayland server: X.org v: 1.21.1.12 with: Xwayland v: 24.1.1
    compositor: gnome-shell driver: gpu: xe display-ID: 0
  Monitor-1: HDMI-A-1 model: AAA built: 2012 res: 1920x1080 dpi: 85
    size: 575x323mm (22.64x12.72") diag: 660mm (25.96") modes: max: 1920x1080
    min: 720x400
  API: EGL v: 1.5 hw: drv: intel iris platforms: device: 0 drv: iris
    device: 1 drv: swrast gbm: drv: iris surfaceless: drv: iris wayland:
    drv: iris x11: drv: iris
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 24.1.3 glx-v: 1.4
    direct-render: yes renderer: Mesa Intel Graphics (ADL-N)
    device-ID: 8086:46d1 memory: 7.52 GiB unified: yes display-ID: :0.0

Have you tried a remote login after a 6.10.2 boot was given plenty of time to complete?

@mrmazda - After about a minute of waiting (I did not time it), still without any printed message, the laptop resets itself. Is auto-reset a known behavior when boot is stuck?

I wonder if xe.force_probe=4c8b and i915.force_probe=!4c8b, as you can see I used above on my kernel cmdline, added to yours would be of any benefit. My GPU is a year or more older than yours, so requires at least the first two in order to use the Xe drivers instead of older.

No change, other than getting this scary looking log message now:
[ 4.239381] [ T519] Setting dangerous option force_probe - tainting kernel

@malcolmlewis - what’s the difference between the magic bytes 46d1 and 4c8b?

Hmm, Raptor Lake, is that not the Intel CPU that has known instabilities?

@gogalthorp - yes, it’s the code name. I’m crossing my fingers that I’m not affected by oxidation. I have a 3-year warranty from Dell in case I am. Regarding the instability, as far as I know it’s been observed mostly on desktop CPUs, not mobile. I hope that’s the case.

@eshoe that’s the chip id for my Intel GPU… :wink:

I just read today that Intel is in process of extending generation 13 & 14 CPU warranties from 3 years to 5.

@eshoe that’s the chip id for my Intel GPU… :wink:

Oh :sweat_smile: , thanks. Well using my correct chip id changed nothing for the not-working 6.10.2-1 boot process.

I just read today that Intel is in process of extending generation 13 & 14 CPU warranties from 3 years to 5.

That would be nice. I’m not sure how it works with Dell’s warranty though. It’s not like I’d get full warranty for all hardware and Dell support for an extra 2 years (though it would be cool!)

I suspect Intel will have a procedure allowing you to get a replacement if needed without involving or even notifying Dell.

The OP has a laptop. You can’t easily change a CPU on a laptop “without involving the manufacturer”.

“You” may not be the one doing the changing. Intel could have designated facilities for having a change made by a professional whether laptop or otherwise, possibly to include a Dell service center if one is nearby, and others are not.

Hi, I’m having a similar / related issue when booting into the latest 6.10.2-1 update: it does not recognize my displays and defaults to a really low resolution, with a big splash screen and then massive, blurry icons etc. There is no option to change this, as the displays are not recognized or shown in Settings. Currently I’m using the advanced options and booting into the previous 6.9… kernel, which works fine. I’m also using a dreaded Nvidia graphics card with X11, as it seemed to misbehave with Wayland before this. Any help or suggestions gratefully received. Here’s the output from inxi -GSaz:
System:
  Kernel: 6.9.9-1-default arch: x86_64 bits: 64 compiler: gcc v: 13.3.0
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.9.9-1-default
    root=/dev/mapper/system-root splash=silent resume=/dev/system/swap
    mitigations=auto quiet security=apparmor
  Desktop: GNOME v: 46.3.1 tk: GTK v: 3.24.43 wm: gnome-shell
    tools: gsd-screensaver-proxy avail: xscreensaver dm: GDM v: 46.2
    Distro: openSUSE Tumbleweed 20240801
Graphics:
  Device-1: NVIDIA GP106 [GeForce GTX 1060 3GB] vendor: Gigabyte
    driver: nvidia v: 550.67 alternate: nouveau,nvidia_drm non-free: 545.xx+
    status: current (as of 2024-06; EOL~2026-12-xx) arch: Pascal code: GP10x
    process: TSMC 16nm built: 2016-2021 pcie: gen: 2 speed: 5 GT/s lanes: 16
    link-max: gen: 3 speed: 8 GT/s ports: active: none off: DP-1,HDMI-A-1
    empty: DVI-D-1,DVI-D-2 bus-ID: 01:00.0 chip-ID: 10de:1c02 class-ID: 0300
  Display: x11 server: X.org v: 1.21.1.12 with: Xwayland v: 24.1.1
    compositor: gnome-shell driver: X: loaded: nvidia
    unloaded: fbdev,modesetting,vesa alternate: nouveau,nv
    gpu: nvidia,nvidia-nvswitch display-ID: :0 screens: 1
  Screen-1: 0 s-res: 5360x1440 s-size: <missing: xdpyinfo>
  Monitor-1: DP-1 mapped: DP-0 note: disabled pos: primary,top-left
    model: LG (GoldStar) HDR WQHD serial: built: 2023 res: 3440x1440
    hz: 60 dpi: 109 gamma: 1.2 size: 800x335mm (31.5x13.19")
    diag: 867mm (34.1") modes: max: 3440x1440 min: 640x480
  Monitor-2: HDMI-A-1 mapped: HDMI-0 note: disabled pos: bottom-r
    model: LG (GoldStar) 2D FHD TV serial: built: 2013 res: 1920x1080
    hz: 60 dpi: 96 gamma: 1.2 size: 509x286mm (20.04x11.26") diag: 584mm (23")
    ratio: 16:9 modes: max: 1920x1080 min: 640x480
  API: OpenGL v: 4.6.0 vendor: nvidia v: 550.67 glx-v: 1.4
    direct-render: yes renderer: NVIDIA GeForce GTX 1060 3GB/PCIe/SSE2
    memory: 2.93 GiB
  API: EGL Message: EGL data requires eglinfo. Check --recommends.

Tried the --recommends thing at the bottom of that output; maybe this section could be relevant amongst all the other stuff?

Test: recommended kernel modules:

GPU modules are only needed if applicable. NVMe drives do not need drivetemp
but other types do.

To load a module: modprobe - To permanently load add to
/etc/modules or /etc/modules-load.d/modules.conf (check your system paths for
exact file/directory names).

amdgpu: -s, -G AMD GPU sensor data (newer GPUs)… Missing
drivetemp: -Dx drive temperature (kernel >= 5.6)… Missing
nouveau: -s, -G Nvidia GPU sensor data (if using free driver)… Missing
radeon: -s, -G AMD GPU sensor data (older GPUs)… Missing

The following recommended kernel modules are missing:
amdgpu
drivetemp
nouveau
radeon

Not sure how I’d install those. Guess I could also try booting with the on board graphics connected instead of the Nvidia or try it with Wayland?
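
A note on those modules: as far as I can tell they ship with the kernel packages, so there is nothing to install, and the inxi text itself says the GPU ones are only needed if applicable. With the proprietary nvidia driver in use, nouveau would normally not be loaded at all. To see what is actually loaded, here is a small sketch that reads /proc/modules; the function name module_loaded is made up for illustration.

```shell
# Report (via exit status) whether a kernel module is currently loaded.
# $1: module name as listed by the kernel (underscores, e.g. nvidia_drm)
# $2: optional module list to read instead of /proc/modules
# Hypothetical helper for illustration.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Example use:
#   for m in nvidia nouveau drivetemp; do
#       if module_loaded "$m"; then echo "$m: loaded"; else echo "$m: not loaded"; fi
#   done
```

If nvidia shows as loaded on the working 6.9.9 kernel but not after booting 6.10.2, that would point at the driver module not being available for the new kernel rather than at the modules inxi lists.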

Update: Kernel 6.10.3-1 also doesn’t boot.

Same thing as before: black screen, no boot messages, and after about 1 minute the laptop resets.

I’m really lost and don’t have experience debugging the Linux kernel boot process.

Any new ideas?
Thanks.