Nvidia drivers after kernel update

I think this might be the 20th time this year that the NVIDIA drivers have broken after an update.


This morning I was on kernel 6.17.9-1-default. As part of converting properly to the new openSUSE open driver for CUDA, I manually removed all NVIDIA- and CUDA-related packages that appeared to be associated with the meta package cuda-cloud-opengpu, and I also removed any associated with the 580 driver. What I did is fully described in Getting The CUDA Toolkit Through ZYPper - #3 by mchnz. Then, as noted in the referenced post, I installed:

# zypper in nvidia-open-driver-G06-signed-cuda-kmp-default
# version=$(rpm -qa --queryformat '%{VERSION}\n' nvidia-open-driver-G06-signed-cuda-kmp-default | cut -d "_" -f1 | sort -u | tail -n 1)
# zypper install cuda-cloud-opengpu = ${version}
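
For anyone wondering what the `version=` pipeline does: the kmp package's VERSION field combines the driver version with the kernel it was built for (for example `580.105.08_k6.12.59_1`, as in the install log further down), and `cut -d "_" -f1` keeps only the driver part. Here is a minimal sketch on a sample string. The helper name `driver_version` is made up for illustration.

```shell
# Hypothetical helper: reduce kmp VERSION strings (one per line on stdin)
# to a single bare driver version, the same way the pipeline above does.
# Note: plain sort is lexicographic, so "last" is not always "newest".
driver_version() {
    cut -d "_" -f1 | sort -u | tail -n 1
}

# Sample VERSION value in the format used by the kmp packages
printf '%s\n' '580.105.08_k6.12.59_1' | driver_version
```

If several driver versions were ever installed side by side, `sort -V` (GNU coreutils) would probably be the safer choice, since it compares version-wise rather than character by character.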

This afternoon I moved to kernel 6.18.0-1-default, rebooted, and sddm failed to start. The content of Xorg.0.log was:

[     9.055] (II) LoadModule: "glx"
[     9.105] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[     9.122] (II) Module glx: vendor="X.Org Foundation"
[     9.122]    compiled for 1.21.1.21, module version = 1.0.0
[     9.122]    ABI class: X.Org Server Extension, version 10.0
[     9.122] (II) LoadModule: "nvidia"
[     9.122] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[     9.135] (II) Module nvidia: vendor="NVIDIA Corporation"
[     9.135]    compiled for 1.6.99.901, module version = 1.0.0
[     9.135]    Module class: X.Org Video Driver
[     9.139] (II) NVIDIA dlloader X Driver  580.105.08  Wed Oct 29 22:16:45 UTC 2025
[     9.139] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[     9.139] (II) Loading sub module "fb"
[     9.139] (II) LoadModule: "fb"
[     9.139] (II) Module "fb" already built-in
[     9.139] (II) Loading sub module "wfb"
[     9.139] (II) LoadModule: "wfb"
[     9.139] (II) Loading /usr/lib64/xorg/modules/libwfb.so
[     9.140] (II) Module wfb: vendor="X.Org Foundation"
[     9.140]    compiled for 1.21.1.21, module version = 1.0.0
[     9.140]    ABI class: X.Org ANSI C Emulation, version 0.4
[     9.143] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[     9.143] (EE) NVIDIA:     system's kernel log for additional error messages and
[     9.143] (EE) NVIDIA:     consult the NVIDIA README for details.

Either I haven’t installed the driver and cuda-cloud-opengpu properly, or Tumbleweed 20251205 or 6.18.0-1-default has some kind of issue with the nvidia driver. I’ve booted back into 6.17.9-1-default, and all is well.

@mchnz because Nvidia CUDA has moved to the 590.44.01 driver…

So when using the RPMs, conflicts will arise if you don’t check the version numbers of a) what is installed and b) what is going to be installed. If they don’t match, then you’re going to have issues…
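
A rough way to do that check for what is already installed, as a sketch only: list the installed `nvidia*` packages and reduce them to their distinct driver versions. More than one line of output suggests a mismatch. The helper name `distinct_driver_versions` is hypothetical, and the `nvidia*` glob is an assumption about which packages matter on a given system.

```shell
# Hypothetical helper: read "NAME VERSION" lines on stdin and print the
# distinct driver versions, with any "_k<kernel>" suffix stripped.
distinct_driver_versions() {
    awk '{ split($2, v, "_"); print v[1] }' | sort -u
}

# Live check, only attempted where rpm is available.
if command -v rpm >/dev/null 2>&1; then
    rpm -qa --queryformat '%{NAME} %{VERSION}\n' 'nvidia*' | distinct_driver_versions
fi
```

For what is about to be installed, `zypper info <package>` shows the candidate version, which can be compared by eye against the output above.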

It takes but a few moments and a reboot or two for the run files… but it just works…
I have CUDA 13.1 and the 590.44.01 open driver running fine on Prime Render Offload…

I persist with the “recommended way” because I feel that might help the distribution become a bit better (if I give feedback in places like this).

I confess to having assumed/guessed/hoped that recent developments in the openness of the driver might have resolved the mismatch issues. It’s relatively cheap for me to try things, so I just plowed ahead without reading all the warnings, which I have now read:

SDB:NVIDIA drivers - openSUSE Wiki Troubleshooting:
WARNING: The prebuilt NVIDIA driver targets a specific kernel version, but is independently released by NVIDIA. Every time a new kernel version is released by openSUSE, NVIDIA's drivers lag in their updates and will not be available for the newer kernel. This does not cause an installation error; it simply fails to provide an NVIDIA driver for the new kernel. You will have to manually check on each update whether the NVIDIA driver was provided, or set up a health-checker script to do so for you (which will cause an update failure and rollback on each update until NVIDIA finally releases the updated driver).

So recent changes to the “recommended way” don’t solve all the issues. I still need to maintain my own awareness of versions and releases.

It would be good to develop a common solution for the mentioned health-checker script - I suppose figuring out in general which driver/hardware/kernel can cohabit might be quite tricky.
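
As a starting point, here is a minimal sketch of such a check. It is not the wiki's health-checker format, just a plain script that lists the installed kernels with no `nvidia.ko` anywhere in their module tree. The function name and the `/lib/modules` default are my assumptions. It matches `nvidia.ko` exactly so that unrelated in-tree modules whose names merely start with "nvidia" don't count.

```shell
# Hypothetical helper: list kernels under a modules root that have no
# nvidia.ko (plain or compressed, real file or symlink) in their tree.
kernels_without_nvidia() {
    root=${1:-/lib/modules}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        found=$(find "$dir" \( -name 'nvidia.ko' -o -name 'nvidia.ko.*' \) 2>/dev/null | head -n 1)
        if [ -z "$found" ]; then
            basename "$dir"
        fi
    done
}

missing=$(kernels_without_nvidia)
if [ -n "$missing" ]; then
    echo "No NVIDIA module found for: $missing" >&2
fi
```

Run after an update and before rebooting, this would at least warn that the new kernel has no driver yet. It says nothing about whether a module that is present will actually load.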

I will return to doing what I have been doing - which is always to be cautious around kernel bumps.


@mchnz I use the rpms as well on Leap 16.0, just not on Tumbleweed :wink:


Thanks. For now I am switching to the old kernel in GRUB. I will wait a few days for a fix. But if this repeats, I will have to try the NVIDIA run file.

I wonder whether this kernel issue occurs only with major updates like 6.17 to 6.18, or also with minor updates like 6.18.1 to 6.18.2?

@csrinivas they should probably be built for every kernel… not sure if they drop to weak-updates.

Historically, with Tumbleweed since 2018, I’ve very often gotten away with plowing ahead with minor version bumps, such as 6.18.1 to 6.18.2 - I will often apply them if I’m not too busy with other things. Major bumps such as 6.17 to 6.18 have had a higher probability of causing issues, so I’ve tended to hold back on them. Tumbleweed configured for multi-version-kernels reduces the fallout considerably.
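
For reference, the multi-version kernel behaviour is controlled by the `multiversion` settings in zypp.conf. A quick way to see what is active, as a sketch: the helper name is made up, and the `/etc/zypp/zypp.conf` path is an assumption that may not hold on every snapshot.

```shell
# Hypothetical helper: print the active (uncommented) multiversion
# settings from a zypp.conf-style file.
show_multiversion() {
    grep -E '^[[:space:]]*multiversion' "${1:-/etc/zypp/zypp.conf}"
}

show_multiversion 2>/dev/null || echo "no multiversion settings found"
```

On a typical setup I would expect lines such as `multiversion = provides:multiversion(kernel)` and `multiversion.kernels = latest,latest-1,running`, which keep the previous kernel around to boot into when the newest one has no working driver.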

Over the past year, with Wayland, Plasma 6, related driver changes from NVIDIA, and openSUSE NVIDIA packaging changes all coming along together, I did start to feel it was getting a bit much. But things seem to have shaken out and calmed down a bit of late.


Like that line from the movie Airplane!, “I picked the wrong time to install Tumbleweed” :joy:

Just when there is a major kernel update.

new update
I did a fresh install of Tumbleweed with the latest snapshot, and then used zypper's “install recommends” command. It installed the driver, but the one built for the 6.12 long-term kernel, and it also installed the 6.12 LTS kernel. Now NVIDIA works with 6.12, but still fails with 6.18.

Now what will happen in the future when a driver is available for 6.18? Will it replace this one?

This is the install log.

Preload finished. [success (5.7 MiB/s) ] ................................................................[done]
Retrieving: libOpenCL1-2.3.4-65.1.x86_64 (repo-non-free)                                  (1/18),  65.5 KiB
Retrieving: libnvidia-egl-gbm1-1.1.2-7.17.x86_64 (repo-non-free)                          (2/18),  19.6 KiB
Retrieving: libnvidia-egl-wayland1-1.1.20-52.3.x86_64 (repo-non-free)                     (3/18),  47.7 KiB
Retrieving: libnvidia-egl-x111-1.0.3-21.5.x86_64 (repo-non-free)                          (4/18),  54.4 KiB
Retrieving: libnvidia-gpucomp-580.105.08-44.1.x86_64 (repo-non-free)                      (5/18),  19.1 MiB
Retrieving: nvidia-modprobe-580.105.08-20.1.x86_64 (repo-non-free)                        (6/18),  29.4 KiB
Retrieving: nvidia-persistenced-580.105.08-2.1.x86_64 (repo-non-free)                     (7/18),  32.9 KiB
Retrieving: nvidia-gl-G06-580.105.08-44.1.x86_64 (repo-non-free)                          (8/18), 143.3 MiB
Retrieving: kernel-longterm-6.12.60-1.1.x86_64 (Main Repository (OSS))                    (9/18), 176.1 MiB
Retrieving: clinfo-3.0.25.02.14-1.1.x86_64 (Main Repository (OSS))                       (10/18),  63.2 KiB
Retrieving: mozilla-openh264-2.6.0-2.suse1699.10.x86_64 (repo-openh264)                  (11/18), 466.7 KiB
Retrieving: nvidia-common-G06-580.105.08-44.1.x86_64 (repo-non-free)                     (12/18),  72.5 MiB
Retrieving: nvidia-open-driver-G06-signed-kmp-longterm-580.105.08_k6.12.59_1-2.4.x86_64 (Main Repository (OSS))
                                                                                         (13/18),   9.5 MiB
Retrieving: nvidia-video-G06-580.105.08-44.1.x86_64 (repo-non-free)                      (14/18),   7.0 MiB
Retrieving: nvidia-open-driver-G06-signed-kmp-meta-580.105.08-25.1.x86_64 (repo-non-free)
                                                                                         (15/18),  53.8 KiB
Retrieving: nvidia-compute-G06-580.105.08-44.1.x86_64 (repo-non-free)                    (16/18),  59.4 MiB
Retrieving: nvidia-compute-utils-G06-580.105.08-44.1.x86_64 (repo-non-free)              (17/18), 564.7 KiB
Retrieving: nvidia-userspace-meta-G06-580.105.08-24.1.x86_64 (repo-non-free)             (18/18),  11.5 KiB

Checking for file conflicts: ............................................................................[done]
( 1/18) Installing: libOpenCL1-2.3.4-65.1.x86_64 ........................................................[done]
( 2/18) Installing: libnvidia-egl-gbm1-1.1.2-7.17.x86_64 ................................................[done]
( 3/18) Installing: libnvidia-egl-wayland1-1.1.20-52.3.x86_64 ...........................................[done]
( 4/18) Installing: libnvidia-egl-x111-1.0.3-21.5.x86_64 ................................................[done]
( 5/18) Installing: libnvidia-gpucomp-580.105.08-44.1.x86_64 ............................................[done]
( 6/18) Installing: nvidia-modprobe-580.105.08-20.1.x86_64 ..............................................[done]
( 7/18) Installing: nvidia-persistenced-580.105.08-2.1.x86_64 ...........................................[done]
( 8/18) Installing: nvidia-gl-G06-580.105.08-44.1.x86_64 ................................................[done]
( 9/18) Installing: kernel-longterm-6.12.60-1.1.x86_64 ..................................................[done]
(10/18) Installing: clinfo-3.0.25.02.14-1.1.x86_64 ......................................................[done]
(11/18) Installing: mozilla-openh264-2.6.0-2.suse1699.10.x86_64 .........................................[done]
(12/18) Installing: nvidia-common-G06-580.105.08-44.1.x86_64 ............................................[done]
(13/18) Installing: nvidia-open-driver-G06-signed-kmp-longterm-580.105.08_k6.12.59_1-2.4.x86_64 .........[done]
Created symlink '/etc/systemd/system/systemd-hibernate.service.wants/nvidia-hibernate.service' -> '/usr/lib/systemd/system/nvidia-hibernate.service'.
Created symlink '/etc/systemd/system/multi-user.target.wants/nvidia-powerd.service' -> '/usr/lib/systemd/system/nvidia-powerd.service'.
Created symlink '/etc/systemd/system/systemd-suspend.service.wants/nvidia-resume.service' -> '/usr/lib/systemd/system/nvidia-resume.service'.
Created symlink '/etc/systemd/system/systemd-hibernate.service.wants/nvidia-resume.service' -> '/usr/lib/systemd/system/nvidia-resume.service'.
Created symlink '/etc/systemd/system/systemd-suspend-then-hibernate.service.wants/nvidia-resume.service' -> '/usr/lib/systemd/system/nvidia-resume.service'.
Created symlink '/etc/systemd/system/systemd-suspend.service.wants/nvidia-suspend.service' -> '/usr/lib/systemd/system/nvidia-suspend.service'.
(14/18) Installing: nvidia-video-G06-580.105.08-44.1.x86_64 .............................................[done]
(15/18) Installing: nvidia-open-driver-G06-signed-kmp-meta-580.105.08-25.1.x86_64 .......................[done]
Created symlink '/etc/systemd/system/multi-user.target.wants/nvidia-persistenced.service' -> '/usr/lib/systemd/system/nvidia-persistenced.service'.
(16/18) Installing: nvidia-compute-G06-580.105.08-44.1.x86_64 ...........................................[done]
(17/18) Installing: nvidia-compute-utils-G06-580.105.08-44.1.x86_64 .....................................[done]
(18/18) Installing: nvidia-userspace-meta-G06-580.105.08-24.1.x86_64 ....................................[done]
%transfiletriggerin(systemd-257.9-3.1.x86_64) script output:
Creating group 'nvidia-persistenced' with GID 460.
Creating user 'nvidia-persistenced' (User for NVIDIA Persistenced Service) with UID 460 and GID 460.
Running post-transaction scripts ........................................................................[done]

That is unexpected.

andrei@tumbleweed:~> zypper info --requires nvidia-open-driver-G06-signed-kmp-meta
...
Requires       : nvidia-open-driver-G06-signed-kmp

andrei@tumbleweed:~> zypper search --provides -x nvidia-open-driver-G06-signed-kmp
Loading repository data...
Reading installed packages...

S  | Name                                            | Summary                                                 | Type
---+-------------------------------------------------+---------------------------------------------------------+--------
   | nvidia-open-driver-G06-signed-cuda-kmp-default  | NVIDIA open kernel module driver for GeForce 16 serie-> | package
   | nvidia-open-driver-G06-signed-cuda-kmp-longterm | NVIDIA open kernel module driver for GeForce 16 serie-> | package
   | nvidia-open-driver-G06-signed-kmp-default       | NVIDIA open kernel module driver for GeForce 16 serie-> | package
   | nvidia-open-driver-G06-signed-kmp-longterm      | NVIDIA open kernel module driver for GeForce 16 serie-> | package
andrei@tumbleweed:~>

There is something wrong with the dependencies, too.

andrei@tumbleweed:~> zypper info --supplements nvidia-open-driver-G06-signed-kmp-default
...
Supplements    : (kernel-default and nvidia-open-driver-G06-signed)

andrei@tumbleweed:~> zypper info --supplements nvidia-open-driver-G06-signed-kmp-longterm
...
Supplements    : (kernel-longterm and nvidia-open-driver-G06-signed)

andrei@tumbleweed:~>

But nvidia-open-driver-G06-signed does not exist:

andrei@tumbleweed:~> zypper info nvidia-open-driver-G06-signed
Loading repository data...
Reading installed packages...


package 'nvidia-open-driver-G06-signed' not found.
No matching items found.
andrei@tumbleweed:~> zypper search --provides -x nvidia-open-driver-G06-signed
Loading repository data...
Reading installed packages...
No matching items found.
andrei@tumbleweed:~>

which means zypper now needs to select one of the four packages providing nvidia-open-driver-G06-signed-kmp. There is no fixed rule for how zypper does this. In the past it commonly picked the first in alphabetical order. Maybe it now settles for the last.
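
One way to sidestep the arbitrary choice is to name the flavour explicitly instead of letting the solver pick. This is a sketch, assuming the kernel flavour is the last dash-separated field of `uname -r` (as in 6.18.0-1-default). The helper name is hypothetical.

```shell
# Hypothetical helper: map a kernel release such as "6.18.0-1-default"
# to the matching kmp package name from the provider list above.
kmp_for_kernel() {
    printf 'nvidia-open-driver-G06-signed-kmp-%s\n' "${1##*-}"
}

kmp_for_kernel "$(uname -r)"
# The result could then be installed explicitly, for example:
#   zypper in "$(kmp_for_kernel "$(uname -r)")"
```

This only removes the ambiguity between providers. It does not help if no kmp has been built for the running kernel yet.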

Oh, and - contrary to what I’d expect - the recommends all point to the single package:

andrei@tumbleweed:~> zypper info --supplements nvidia-open-driver-G06-signed-kmp-meta
Loading repository data...
Reading installed packages...


Information for package nvidia-open-driver-G06-signed-kmp-meta:
---------------------------------------------------------------
Repository     : nvidia
Name           : nvidia-open-driver-G06-signed-kmp-meta
Version        : 580.105.08-25.1
Arch           : x86_64
Vendor         : obs://build.suse.de/Proprietary:X11:Drivers
Installed Size : 0 B
Installed      : No
Status         : not installed
Source package : nvidia-userspace-meta-G06-580.105.08-25.1.src
Upstream URL   : https://build.opensuse.org/package/show/X11:XOrg/nvidia-userspace-meta-G06
Summary        : Meta package to select open nvidia driver in sync
Description    :
    Meta package to select open nvidia driver in sync, i.e. trigger
    installation of nvidia-open-driver-G06-signed-kmp. Hardware
    supplements moved to this meta package. Also require
    nvidia-userspace-meta-G06 in sync in case where --no-recommends
    is being used.
Supplements    : [580]
    (kernel-default and namespace:modalias(pci:v000010DEd00001E02sv*sd*bc03sc0[02]i00*))
    (kernel-default and namespace:modalias(pci:v000010DEd00001E04sv*sd*bc03sc0[02]i00*))
...
    (kernel-default and nvidia-open-driver-G06-signed-kmp-meta)
    (kernel-default and nvidia-open-driver-G06-signed-kmp-meta)
    (kernel-longterm and nvidia-open-driver-G06-signed-kmp-meta)
    (kernel-longterm and nvidia-open-driver-G06-signed-kmp-meta)
...
    (kernel-longterm and namespace:modalias(pci:v000010DEd00001E02sv*sd*bc03sc0[02]i00*))
    (kernel-longterm and namespace:modalias(pci:v000010DEd00001E04sv*sd*bc03sc0[02]i00*))
...

This needs a bug report, I would say.