The state of Nvidia driver packaging

sorfat · August 19, 2024, 11:52am

The main question of this post is: “How could we go about enacting a change in how the Nvidia drivers are packaged, or in which versions are packaged?”

The background:

As it stands, the policy seems to be to package only the production stable branch of the drivers. From what I understand, because the drivers are proprietary, openSUSE cannot distribute them directly in its repos, so they are hosted on Nvidia’s servers. That in itself is not a problem; however, I am not sure whether this agreement with Nvidia also requires that only the production stable branch be packaged.

At the time of writing:
production stable: 550.107
latest version in the Nvidia repo: 550.100
latest open modules in the repos: 550.107

So installing the drivers right now lands you on 550.100 with the proprietary module, and you cannot install the open modules because they are a different version (550.107).

The situation would not be as bad if the 550.100 drivers did not have widespread known issues, issues that have been fixed either in 550.107 or in the 555 new feature branch.
Beyond those fixes, the new feature branch also offers explicit sync.

The latest versions of both GNOME and KDE support explicit sync, but it is unusable on Nvidia because the drivers available in the repos lack that feature.

While I can see a reason to package only the production stable branch for Leap, I don’t understand why Tumbleweed, which is essentially a rolling release, doesn’t get the new feature branch.

Don’t get me wrong: I am not advocating for packaging beta drivers, but the stable new feature branch and the production stable branch should both be fine. The idea would be to package whichever of those two has the higher version.
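The “package whichever is higher” rule is simple to state, but driver versions have to be compared numerically rather than as text. A minimal Python sketch of that selection, assuming plain dotted version strings like the ones quoted above (the function names are hypothetical and not part of any openSUSE or Nvidia packaging tool):

```python
# Hypothetical helper, not part of any openSUSE or Nvidia tooling.
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted driver version such as '550.107' into (550, 107)."""
    return tuple(int(part) for part in version.split("."))


def pick_highest(production: str, new_feature: str) -> str:
    """Return whichever of the two branch versions is numerically higher."""
    return max(production, new_feature, key=parse_version)


if __name__ == "__main__":
    # Version numbers mentioned in this thread.
    print(pick_highest("550.107", "555.42"))  # 555.42
    # Illustrative pair: plain string comparison gets this wrong ('5' sorts after '1').
    print(max("550.54", "550.107"))  # 550.54
    print(pick_highest("550.54", "550.107"))  # 550.107
```

Comparing tuples of integers also copes with versions that have a different number of fields, since (550, 107) sorts before (550, 107, 2).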

If the argument is CUDA, the situation is even stranger: the open module in the repos is 555.42, while the latest CUDA download from Nvidia uses 560, which is the beta branch.

Yes, I am aware that I can install any version I want using the run file from Nvidia, but it’s not 2006, and since the wiki describes the run file as “the hard way”, I doubt a new user would go for it. (Even Nvidia suggests using the distro packages over the run file.)

So my questions are:

Is the current policy of packaging only the production stable branch imposed by Nvidia, or is it self-imposed?
Could the community influence this policy, and if so, how should we go about it?
Is there some limitation that prevents packaging different driver versions for Leap and Tumbleweed? (The repos Nvidia hosts are separate for the two, so there would be no mix-up there.)

*Apologies for any odd wording or grammatical errors; English is not my main language.

Sauerland · August 19, 2024, 12:01pm

I think an update is on its way.

groo · August 19, 2024, 2:04pm

I assume there will be a blog post, a wiki update, or both on how to use these.
Maybe Stefan Dirsch (sndirsch) could provide some information on how this will work?

Looking forward to these changes.

digigan · August 20, 2024, 11:26pm

That would be incredible. I look forward to using the open kernel module and running Xwayland flicker-free.

az · August 27, 2024, 3:51am

I just ran into the same problem. Since the proprietary kernel module causes kernel panics, I have to use the open kernel module. Are there any solutions or updates?

digigan · August 27, 2024, 4:18am

I haven’t heard anything official. According to this Reddit comment, you can replace the Tumbleweed repo with the CUDA repo (both are provided by Nvidia) and reinstall the packages.
I haven’t tried it yet. I’m used to the proprietary blob breaking on every kernel update if I forget to patch it, but that shouldn’t be necessary when it is installed through the repos. Hopefully someone with more knowledge can chime in.

sorfat · August 27, 2024, 9:39am

I tried that without success; the modules just don’t load for some reason.
At this stage we are more or less stuck with the “hard way”. That works, but I can’t see a good reason for the current state of the drivers in the repos (even the 550 ones are still outdated).

So the production stable driver is outdated, the open module is mismatched, and the CUDA open modules don’t match the current CUDA release…

I just can’t figure out for what or whom this is being packaged.

hui · August 27, 2024, 10:01am

If you follow Bugzilla, you can see that the latest stable version has been pushed; now Nvidia needs to publish it to the openSUSE repo.

digigan · August 27, 2024, 12:04pm

I was able to make it work by disabling Secure Boot and replacing the kernel module with the open one; without that, I was getting a black screen in SDDM.

Svyatko · August 29, 2024, 4:00am

The open kernel driver is not a panacea, because it is still an out-of-tree module. An LTS kernel might help.

malcolmlewis · August 29, 2024, 12:44pm

@Svyatko It runs fine here without issues, but I’m using the CUDA run file and the later driver run file… The newer run files now install the open driver by default if the hardware is Turing or newer.

digigan · August 29, 2024, 1:00pm

What’s the advantage of using the run file over just adding the official CUDA repo? The only tinkering I needed was to install the open kernel module instead of the default one (which left me with a black screen in SDDM).

There’s a package named nv-prefer-signed-open-driver in the TW OSS repo for signed kernels, but it was a few versions behind, so I went with the unsigned one.

malcolmlewis · August 29, 2024, 1:14pm

@digigan More control over the install options; my current Nvidia GPU is used for offload only. I can also run later versions of the driver.

thommierother · August 31, 2024, 8:51am

It would be nice to hear from @sndirsch and SUSE colleagues about the current status of deploying Nvidia 555.x with explicit sync. This (and the “flickering”) is, I assume, the last stumbling block preventing Wayland migration for many people… Maybe now that the holiday season is over?

Android_Gynous · September 1, 2024, 12:13am

For the record, I have not experienced a kernel panic in over a month using the proprietary drivers on X11.

mchnz · September 1, 2024, 8:16pm

Yesterday I tripped over a big reason to use the “hard way” installer.

I had a TW machine locked on kernel 6.9.9-1-default, but yesterday I dup’ed to TW 20240829 and also needed to update the G05 driver. Using the Nvidia repo failed somewhat silently, because 6.9.9-1-default was built with an older gcc than the 20240829 default (the only clues were in the logs). I eventually turned to the “hard way” installer; its pre-install checks pointed out both the problem and the solution (use CC to set the compiler, e.g. CC=gcc-13 sh NVIDIA-Linux-x86_64-470.239.06.run).

Presumably the same issue may affect those who choose the longterm kernel.

(I’ve since added this example to the wiki page for the “hard way”. My prose may be a bit clumsy, and perhaps the example should be moved to a common section before the installation method is chosen.)
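The compiler mismatch can be spotted before installing by reading /proc/version, which records the compiler the running kernel was built with. A rough Python sketch, assuming the openSUSE-style “gcc (SUSE Linux) X.Y.Z” wording in that file and that a matching gcc-&lt;major&gt; binary is installed (both helper names are hypothetical, and this does not replicate the installer’s real pre-install checks):

```python
import re
from typing import Optional

# Assumed /proc/version layout: "... (gcc (SUSE Linux) 13.3.0, GNU ld ...) ...".
# Only the major number after the parenthesised vendor tag is captured.
GCC_RE = re.compile(r"gcc \([^)]*\) (\d+)\.")


def kernel_gcc_major(proc_version: str) -> Optional[int]:
    """Return the gcc major version the running kernel was built with, if found."""
    match = GCC_RE.search(proc_version)
    return int(match.group(1)) if match else None


def installer_command(proc_version: str, runfile: str) -> str:
    """Build a run-file command line that pins CC to the kernel's gcc, if known."""
    major = kernel_gcc_major(proc_version)
    if major is None:
        # No gcc version found (unexpected format): leave the compiler unset.
        return f"sh {runfile}"
    return f"CC=gcc-{major} sh {runfile}"


if __name__ == "__main__":
    try:
        with open("/proc/version") as f:
            text = f.read()
    except OSError:
        text = ""
    print(installer_command(text, "NVIDIA-Linux-x86_64-470.239.06.run"))
```

On a kernel built with gcc 13, this suggests the same command the installer pointed to: CC=gcc-13 sh NVIDIA-Linux-x86_64-470.239.06.run.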

Prexy · September 2, 2024, 2:51pm

So, are you still using a 6.9 kernel? I followed forum advice and locked that kernel, but I also locked 6.10 and have been waiting for an Nvidia “fix” so I can use the newer kernel. The listed fixes are WAY beyond my level of competence. The 6.10 kernel doesn’t detect my second monitor, won’t let me adjust the resolution on the detected monitor, and is using a G04 driver. I’m afraid that updating it will break the 6.9 kernel, which works. Any advice?

malcolmlewis · September 2, 2024, 3:24pm

@Prexy That all depends on your GPU model… But I also don’t use the RPMs…

Prexy · September 2, 2024, 3:33pm

Forgot to mention: the kernel is 6.9.9-1-default (64-bit), and the GPU is an NVIDIA GeForce GT 730/PCIe/SSE2 using G05.

malcolmlewis · September 2, 2024, 3:49pm

@Prexy The driver should rebuild automatically on a kernel upgrade; if it doesn’t, you should file a bug report.