NVIDIA crash after 20250811 update

I upgraded to Tumbleweed 20250811 (kernel 6.12.41 longterm), booting into runlevel 3 from GRUB and running:

zypper dup

Loading the NVIDIA driver now fails with errors. I was on the 570 series, and it seems to have switched to 580, but not everything was upgraded.

How can I reinstall or reload the correct nvidia drivers?

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 580.76

Here’s my inxi output:

$ inxi -Gxxx
Graphics:
  Device-1: Intel Meteor Lake-P [Intel Arc Graphics] vendor: Dell driver: N/A
    arch: Xe-LPG bus-ID: 00:02.0 chip-ID: 8086:7d55 class-ID: 0300
  Device-2: NVIDIA AD107M [GeForce RTX 4060 Max-Q / Mobile] vendor: Dell
    driver: nvidia v: 570.172.08 arch: Lovelace pcie: speed: 2.5 GT/s lanes: 8
    bus-ID: 01:00.0 chip-ID: 10de:28e0 class-ID: 0300
  Device-3: Realtek Integrated_Webcam_FHD driver: uvcvideo type: USB
    rev: 2.0 speed: 480 Mb/s lanes: 1 bus-ID: 3-9:7 chip-ID: 0bda:557c
    class-ID: fe01 serial: 200901010001
  Display: x11 server: X.org v: 1.21.1.15 compositor: xfwm4 v: 4.20.0
    driver: X: loaded: modesetting unloaded: vesa failed: nvidia
    alternate: fbdev,nouveau,nv gpu: nvidia display-ID: :0.0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-size: <missing: xdpyinfo>
  Monitor-1: Unknown-1 mapped: None-1 res: mode: 1920x1080 hz: 60
    scale: 100% (1) size: N/A modes: 1920x1080
  API: OpenGL v: 4.5 vendor: mesa v: 25.1.7 glx-v: 1.4 es-v: 3.2
    direct-render: yes renderer: llvmpipe (LLVM 20.1.8 256 bits)
    device-ID: ffffffff:ffffffff
  Info: Tools: api: glxinfo de: xfce4-display-settings
    gpu: nvidia-settings,nvidia-smi x11: xprop,xrandr

and here’s the journal boot errors:

Aug 13 10:59:34 dodoite kernel: 
Aug 13 10:59:35 dodoite kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
Aug 13 10:59:35 dodoite kernel: [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
Aug 13 10:59:35 dodoite nvidia-persistenced[1190]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 466>
Aug 13 10:59:35 dodoite systemd[1]: Failed to start NVIDIA Persistence Daemon.
Aug 13 10:59:35 dodoite nvidia-persistenced[1238]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 466>
Aug 13 10:59:35 dodoite systemd[1]: Failed to start NVIDIA Persistence Daemon.
Aug 13 10:59:35 dodoite nvidia-persistenced[1313]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 466>
Aug 13 10:59:35 dodoite systemd[1]: Failed to start NVIDIA Persistence Daemon.
Aug 13 10:59:36 dodoite nvidia-persistenced[1381]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 466>
Aug 13 10:59:36 dodoite systemd[1]: Failed to start NVIDIA Persistence Daemon.
Aug 13 10:59:36 dodoite nvidia-persistenced[1390]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 466>
Aug 13 10:59:36 dodoite systemd[1]: Failed to start NVIDIA Persistence Daemon.
Aug 13 10:59:36 dodoite systemd[1]: Failed to start NVIDIA Persistence Daemon.
Aug 13 10:59:43 dodoite lightdm[1587]: gkr-pam: unable to locate daemon control file

and here are the repos:

 $ zypper lr -u
Repository priorities in effect:                                                                                                            (See 'zypper lr -P' for details)
      70 (raised priority)  :  1 repository
      99 (default priority) :  5 repositories

# | Alias                      | Name              | Enabled | GPG Check | Refresh | URI
--+----------------------------+-------------------+---------+-----------+---------+---------------------------------------------------------
1 | NVIDIA:repo-non-free       | repo-non-free     | Yes     | (r ) Yes  | Yes     | https://download.nvidia.com/opensuse/tumbleweed
2 | openSUSE:repo-non-oss      | repo-non-oss      | Yes     | (r ) Yes  | Yes     | http://cdn.opensuse.org/tumbleweed/repo/non-oss
3 | openSUSE:repo-openh264     | repo-openh264     | Yes     | (r ) Yes  | Yes     | https://codecs.opensuse.org/openh264/openSUSE_Tumbleweed
4 | openSUSE:repo-oss          | repo-oss          | Yes     | (r ) Yes  | Yes     | http://cdn.opensuse.org/tumbleweed/repo/oss
5 | openSUSE:repo-oss-debug    | repo-oss-debug    | No      | ----      | ----    | http://cdn.opensuse.org/debug/tumbleweed/repo/oss
6 | openSUSE:repo-oss-source   | repo-oss-source   | No      | ----      | ----    | http://cdn.opensuse.org/source/tumbleweed/repo/oss
7 | openSUSE:update-tumbleweed | update-tumbleweed | Yes     | (r ) Yes  | Yes     | http://cdn.opensuse.org/update/tumbleweed
8 | packman                    | Packman           | Yes     | (r ) Yes  | Yes     | https://ftp.fau.de/packman/suse/openSUSE_Tumbleweed/

Please post the output of:
zypper se -si nvidia

Failed to initialize NVML: Driver/library version mismatch
NVML library version: 580.76
 Device-2: NVIDIA AD107M [GeForce RTX 4060 Max-Q / Mobile] vendor: Dell
    driver: nvidia v: 570.172.08

Hiya. This is a common issue on Tumbleweed when using the NVIDIA open kernel modules. They are often out of sync with the userspace packages for a little while after a driver update drops. The versions of nvidia-userspace-meta-G06 and nvidia-open-driver-G06-signed-kmp-longterm have to match for the driver to work. It should get “fixed” in snapshot 20250812 or newer.
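To make the "versions have to match" point concrete, here is a minimal sketch of a check. The helper names driver_version and versions_match are made up for illustration, and the commented usage assumes that rpm and modinfo report the versions in the form seen in this thread.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any NVIDIA or openSUSE tooling.

# Reduce an RPM version string to the bare driver release:
#   "570.172.08_k6.12.41_1-2.1" -> "570.172.08"
#   "580.76.05-11.1"            -> "580.76.05"
driver_version() {
    v=${1%%_k*}               # drop the kernel suffix used by kmp packages
    printf '%s\n' "${v%%-*}"  # drop the package release, if present
}

# Succeed only if both strings refer to the same driver release.
versions_match() {
    [ "$(driver_version "$1")" = "$(driver_version "$2")" ]
}

# Intended use on the affected machine (assumption: both commands succeed):
#   user=$(rpm -q --qf '%{VERSION}\n' nvidia-userspace-meta-G06)
#   kmod=$(modinfo -F version nvidia)
#   versions_match "$user" "$kmod" || echo "mismatch: $user vs $kmod"
```

With the versions from this thread, the userspace side (580.76.05) and the kernel module side (570.172.08) would be reported as a mismatch.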

For the time being you have 3 options:

  1. Roll back nvidia-userspace-meta-G06 and the related packages to 570.172.08 (sudo zypper in --oldpackage nvidia-userspace-meta-G06-570.172.08-10.1).
  2. Roll back the update using snapper.
  3. Use nvidia-driver-G06-kmp-longterm instead.

$ zypper se -si nvidia
Loading repository data...
Reading installed packages...

S  | Name                                       | Type    | Version                   | Arch   | Repository
---+--------------------------------------------+---------+---------------------------+--------+------------------
i  | kernel-firmware-nvidia                     | package | 20250516-4.1              | noarch | repo-oss
i  | libnvidia-egl-gbm1                         | package | 1.1.2-7.14                | x86_64 | repo-non-free
i  | libnvidia-egl-wayland1                     | package | 1.1.20-51.1               | x86_64 | repo-non-free
i  | libnvidia-egl-x111                         | package | 1.0.3-21.1                | x86_64 | repo-non-free
i  | libnvidia-gpucomp                          | package | 580.76.05-39.1            | x86_64 | repo-non-free
i+ | nvidia-common-G06                          | package | 580.76.05-39.1            | x86_64 | repo-non-free
i+ | nvidia-compute-G06                         | package | 580.76.05-39.1            | x86_64 | repo-non-free
i+ | nvidia-compute-utils-G06                   | package | 580.76.05-39.1            | x86_64 | repo-non-free
i+ | nvidia-gl-G06                              | package | 580.76.05-39.1            | x86_64 | repo-non-free
i+ | nvidia-libXNVCtrl                          | package | 580.76.05-41.1            | x86_64 | repo-non-free
i+ | nvidia-modprobe                            | package | 580.76.05-17.1            | x86_64 | repo-non-free
i  | nvidia-open-driver-G06-signed-kmp-default  | package | 570.172.08_k6.15.8_1-2.1  | x86_64 | repo-oss
i  | nvidia-open-driver-G06-signed-kmp-longterm | package | 570.172.08_k6.12.40_1-1.4 | x86_64 | (System Packages)
i  | nvidia-open-driver-G06-signed-kmp-longterm | package | 570.172.08_k6.12.39_1-1.3 | x86_64 | (System Packages)
i  | nvidia-open-driver-G06-signed-kmp-longterm | package | 570.172.08_k6.12.41_1-2.1 | x86_64 | repo-oss
i+ | nvidia-persistenced                        | package | 580.76.05-2.1             | x86_64 | repo-non-free
i+ | nvidia-settings                            | package | 580.76.05-41.1            | x86_64 | repo-non-free
i+ | nvidia-userspace-meta-G06                  | package | 580.76.05-11.1            | noarch | repo-non-free
i+ | nvidia-userspace-meta-G06                  | package | 580.76.05-11.1            | noarch | repo-non-free
i+ | nvidia-video-G06                           | package | 580.76.05-39.1            | x86_64 | repo-non-free
i+ | openSUSE-repos-Tumbleweed-NVIDIA           | package | 20250728.9adc675-1.1      | x86_64 | repo-oss

Would this be alleviated by switching from Tumbleweed to Slowroll?
I’ve been considering doing that early next month, once I finish some work I have in progress. Maybe I should do it earlier…

Switch to the KMPs from the NVIDIA repo…

I don’t know, since I haven’t used Slowroll. Before upgrading I usually check that the NVIDIA driver versions match; if they don’t, I pin the packages with zypper al and wait for nvidia-open-driver-G06-signed-kmp-longterm to be updated. For now, you can just roll back nvidia-userspace-meta-G06 to 570.172.08 to fix it.
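A rough sketch of that pre-upgrade check, in case it helps. The helpers info_version and kmp_matches are hypothetical, the parsing assumes zypper info prints a "Version : ..." line, and locking only the meta package is an assumption too (the other 580 userspace packages may need the same lock).

```shell
#!/bin/sh
# Hypothetical helpers for a check to run before `zypper dup`.

# Read `zypper info` output on stdin and print the Version field,
# assuming a line such as "Version        : 580.76.05-11.1".
info_version() {
    sed -n 's/^Version[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Succeed if the kmp version (second argument) was built for the same
# driver release as the userspace version (first argument).
kmp_matches() {
    user_release=${1%%-*}   # "580.76.05-11.1" -> "580.76.05"
    case "$2" in
        "$user_release"_k*) return 0 ;;
        *) return 1 ;;
    esac
}

# Intended use (assumption: both packages are visible in the enabled repos):
#   user=$(zypper info nvidia-userspace-meta-G06 | info_version)
#   kmp=$(zypper info nvidia-open-driver-G06-signed-kmp-longterm | info_version)
#   kmp_matches "$user" "$kmp" || sudo zypper al nvidia-userspace-meta-G06
# Once the kmp has caught up, remove the lock again:
#   sudo zypper rl nvidia-userspace-meta-G06
```

zypper ll lists the locks currently in place, which is handy for remembering to remove them later.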

Sorry, I don’t know how to do that (I’m an old dinosaur who knows enough to be dangerous, aka the Dunning-Kruger effect).

How do I change the repos?

sudo zypper in nvidia-driver-G06-kmp-default

I want to confirm because I get the following:

$ sudo zypper in nvidia-driver-G06-kmp-default
[sudo] password for root: 
Refreshing service 'NVIDIA'.
Refreshing service 'openSUSE'.
Looking for gpg keys in repository Packman.
  gpgkey=https://ftp.fau.de/packman/suse/openSUSE_Tumbleweed/repodata/repomd.xml.key
Retrieving repository 'Packman' metadata .......................................................................................................[done]
Building repository 'Packman' cache ............................................................................................................[done]
Loading repository data...
Reading installed packages...
Resolving package dependencies...

Problem: 1: the installed nvidia-open-driver-G06-signed-kmp-longterm-570.172.08_k6.12.39_1-1.3.x86_64 conflicts with 'nvidia-driver-G06-kmp' provided by the to be installed nvidia-driver-G06-kmp-default-580.76.05_k6.15.8_1-39.1.x86_64
 Solution 1: Following actions will be done:
  deinstallation of nvidia-open-driver-G06-signed-kmp-longterm-570.172.08_k6.12.39_1-1.3.x86_64
  deinstallation of nvidia-open-driver-G06-signed-kmp-longterm-570.172.08_k6.12.40_1-1.4.x86_64
  deinstallation of nvidia-open-driver-G06-signed-kmp-longterm-570.172.08_k6.12.41_1-2.1.x86_64
  deinstallation of nvidia-open-driver-G06-signed-kmp-default-570.172.08_k6.15.8_1-2.1.x86_64
 Solution 2: do not install nvidia-driver-G06-kmp-default-580.76.05_k6.15.8_1-39.1.x86_64

Do I use solution 1?

Yup. Solution 1 is what you want.


Perfect. Thank you very much for taking the time to help
