nvidia-gfxG04 from Zypper broken

Hi,
I performed a zypper dup today (from a November build). After rebooting, the GUI didn’t load correctly (the mouse cursor was visible and movable, but otherwise the screen was black). Since this is a fairly normal experience for me as an NVIDIA user, I rebooted into init 3, uninstalled nvidia-gfxG04 to revert to nouveau, rebooted again, and everything worked.

I then installed nvidia-gfxG04 manually, watched the build output, and noticed it was running wild trying to build against old kernel versions for some reason, throwing warnings by the bucketload because the kernel symbols for those kernels have been removed. So I uninstalled it again.

Having been bemused by this, I decided to see if the “Hard Way” using the NVIDIA installer works. I downloaded the latest NVIDIA installer (for the same driver version), rebooted into init 3 and ran it, and it worked (apart from the now-usual need to stomp libglvnd); after a reboot, everything works normally.

Can anyone help me understand what’s happening with nvidia-gfxG04, and possibly how to get it working again? The automatic updates are nice.

Hard to guess, in principle it should work.

What kernel packages do you have installed? Which kernel are you using?

rpm -qa | grep kernel
uname -a

What exact error messages do you get when installing nvidia-gfxG04?

There is a known “problem”, in that the kernel module is only compiled (successfully) for one kernel and therefore will only work with that kernel, even if several are installed.
See https://lists.opensuse.org/opensuse-factory/2018-02/msg01086.html and https://bugzilla.opensuse.org/show_bug.cgi?id=1082704
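A quick way to check that per kernel, as a rough sketch: the hypothetical function below walks a module root (assumed to be /lib/modules, one directory per kernel version, as on Tumbleweed) and reports whether a usable nvidia.ko exists in each tree. It assumes GNU find and only looks for the file name, not for the specific subdirectory the KMP package uses.

```shell
#!/bin/sh
# Hypothetical helper: for every kernel tree under a module root
# (default /lib/modules), report whether a usable nvidia.ko exists.
report_nvidia_modules() {
    root="${1:-/lib/modules}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        kver=$(basename "$dir")            # e.g. 4.15.7-1-default
        # -xtype f matches regular files and symlinks that resolve to one,
        # so a dangling symlink is NOT counted as a working module.
        if [ -n "$(find "$dir" -name 'nvidia.ko' -xtype f 2>/dev/null)" ]; then
            echo "$kver: nvidia.ko found"
        else
            echo "$kver: no nvidia.ko"
        fi
    done
}
```

Running report_nvidia_modules and comparing the line for the output of uname -r would show whether the kernel you actually boot is the one the module was built for.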

Sorry for the delayed response; I got caught up in other stuff:
kalanyr@linux-npzk:~> rpm -qa | grep kernel
kernel-default-4.15.5-1.2.x86_64
kernel-devel-4.13.11-1.2.noarch
kernel-default-devel-4.13.11-1.2.x86_64
kernel-default-devel-4.15.5-1.2.x86_64
kernel-source-4.13.11-1.2.noarch
kernel-source-4.15.5-1.2.noarch
kernel-syms-4.15.5-1.2.x86_64
kernel-macros-4.15.5-1.2.noarch
kernel-default-4.13.11-1.2.x86_6
kernel-devel-4.15.5-1.2.noarch
kernel-firmware-20180201-1.1.noarch
kalanyr@linux-npzk:~> uname -a
Linux linux-npzk 4.15.5-1-default #1 SMP PREEMPT Thu Feb 22 21:48:29 UTC 2018 (52ce732) x86_64 x86_64 x86_64 GNU/Linux

You can find my zypper log since I changed over to using nvidia-gfxG04 here (it was working until the February update; I recommend searching for ‘nvidia-gfx’ to find the relevant parts):
https://www.dropbox.com/s/mckso9gon3s0by8/zypper-extract.txt?dl=0

I don’t see a problem in there on a quick glance.

So, is it working now?
There was a general (permission-related) problem with the rpm packages, but it should have been fixed for a while now (actually, already by the time you started this thread).

You need to update to the latest nvidia packages though.

If it’s still not working, please provide more details, at least the Xorg log.

Btw, you can ignore the “errors” for the older kernel version…
And the “bug” I mentioned (which didn’t cause a problem for the latest kernel IIRC) should be fixed meanwhile anyway.

The package list looks fine, although outdated. The current Tumbleweed kernel is 4.15.7… :wink:

Okay, I removed the old kernels, uninstalled the manually installed NVIDIA drivers, reinstalled the vendor-neutral library, then reinstalled nvidia-gfxG04 as part of zypper dup. There’s definitely something still broken: it took way too long to build, and there are errors in the zypper log that I can’t make heads or tails of.

Zypper log extract from doing this can be found here:
https://www.dropbox.com/s/b1cwknx23k4lu8a/zypper-extractMarch.txt?dl=0

Whoops, sorry, you posted in between me refreshing and posting. All my Xorg logs (that I can find) are here: https://www.dropbox.com/sh/c6qtsjlmx4vg3fw/AADXWjA2XL4G3BmGp0HAwNiLa?dl=0

While I think of it:
kalanyr@linux-npzk:~> rpm -qa | grep kernel
kernel-default-4.15.5-1.2.x86_64
kernel-devel-4.15.7-1.7.noarch
kernel-source-4.15.7-1.7.noarch
kernel-default-devel-4.15.5-1.2.x86_64
kernel-default-4.15.7-1.7.x86_64
kernel-source-4.15.5-1.2.noarch
kernel-macros-4.15.7-1.7.noarch
kernel-syms-4.15.5-1.2.x86_64
kernel-syms-4.15.7-1.7.x86_64
kernel-devel-4.15.5-1.2.noarch
kernel-default-devel-4.15.7-1.7.x86_64
kernel-firmware-20180201-1.1.noarch
kalanyr@linux-npzk:~> uname -a
Linux linux-npzk 4.15.7-1-default #1 SMP PREEMPT Wed Feb 28 12:40:23 UTC 2018 (a36e160) x86_64 x86_64 x86_64 GNU/Linux
kalanyr@linux-npzk:~>

Sure, it takes longer, because the rpm packages also recreate the initrd on installation.
That doesn’t indicate that something is broken.

By “errors” you probably mean lines like this, right?

depmod: ERROR: fstatat(5, nvidia-drm.ko): No such file or directory

Might be caused by dangling symlinks, not necessarily a problem either though.
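If you want to see whether such dangling symlinks are actually there, something like the following hypothetical helper could list them. It is only a sketch, assuming GNU find and the default /lib/modules tree; the -xtype l test matches symlinks whose target no longer exists.

```shell
#!/bin/sh
# Hypothetical helper: list nvidia*.ko symlinks under a module root
# (default /lib/modules) whose target no longer exists.
find_dangling_nvidia_links() {
    root="${1:-/lib/modules}"
    # With find's default -P, "-xtype l" is true only for broken symlinks;
    # regular files and symlinks to existing files are skipped.
    find "$root" -name 'nvidia*.ko' -xtype l 2>/dev/null
}
```

Empty output would mean there are no broken nvidia module links, so the depmod message would have to come from somewhere else.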

Well, Xorg.0.log shows that the nvidia driver is not installed; you probably uninstalled it again…
Xorg.0.log.old (likely from the previous boot) does show that the nvidia driver is loaded and used, and also nvidia’s GLX module. I see no indication of a problem in there.

So what exactly doesn’t work? How exactly is it “broken”?
What problems do you have if the driver rpms are installed?

I have other packages that also need to recreate the initrd, and the nvidia build takes noticeably longer than those. You’re right that this isn’t necessarily indicative of a problem; it may just take an unusually long time to build the package, but I don’t remember it taking that long previously.

And yeah, I had to uninstall it to get a GUI (init 3 works, of course). The problem is that when the GUI should load, it only works partially: the mouse cursor appears on a black screen, but the rest of the login screen does not. The GUI works fine if I use nouveau instead, and the driver from NVIDIA works too. To be clear, the system is still responsive: the mouse cursor moves properly, and pressing the power key causes a normal shutdown from the black screen (though I get a grey screen for a bit in the process, which I suspect is supposed to be a GUI shutdown screen).

I suppose you use SDDM as login manager (and KDE Plasma as desktop), is that correct?

Try adding your user and the user “sddm” to the “video” group then.
It might still be a permission problem…
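For reference, a sketch of how to do that and check the result. usermod -a -G is the standard way to append a supplementary group; the in_group helper is hypothetical and only wraps id -nG.

```shell
#!/bin/sh
# Hypothetical helper: succeed if user $1 is a member of group $2.
# "id -nG USER" prints all of the user's group names, space-separated.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# The actual change (needs root; it only applies to sessions started
# afterwards, so log out or restart the display manager):
#   sudo usermod -a -G video "$USER"
#   sudo usermod -a -G video sddm
```

Afterwards, in_group sddm video && echo ok should print ok.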

You could also try to run “glxinfo” as user and post the output.

Hi, apologies for the slow responses.

I tried adding those two users to the video group and it doesn’t change anything.

I tried glxinfo and it’s not installed. I did a zypper se for glxinfo and it didn’t return anything either. A Google search only turned up a few things about it having been removed from Mesa/X a few years ago. Could you point me to the current place to get it?

Hm, the different permissions are basically the only difference between the RPM packages and the .run installer.

The driver is a closed-source binary blob anyway, both ways install the exact same files…

I tried glxinfo and it’s not installed. I did a zypper se for glxinfo and it didn’t return anything either. A Google search only turned up a few things about it having been removed from Mesa/X a few years ago. Could you point me to the current place to get it?

It’s in the package Mesa-demo-x.

It’s me again; I didn’t have much call to use Linux for a while. I installed glxinfo, but it’s not much use when the repo NVIDIA drivers are installed: I get “Error: can’t open display” and nothing else. I assume that’s because I can’t run it in a GUI, since the GUI hangs; even switching to Ctrl-Alt-F2 after startx and the inevitable hang doesn’t help.

The nouveau glxinfo log is https://www.dropbox.com/s/cqvl2ha3me0jhlj/noveau.log?dl=0

I can get you the glxinfo log with the official NVIDIA installer’s driver if you like. I just haven’t, because it always stomps a system library (the cross-vendor compatibility driver), and I have to fix that before I can run nouveau again.

I’m kind of tempted to just get the list of installed packages, nuke the system and start over (since my data is all separate), in case it’s just some weird error from a rolling update somewhere.

Yes, you’d need to run it inside the graphical session.

You may be able to get krunner open by pressing Alt+F2 in the broken session (in case you are using Plasma), or log in to IceWM, for example.
You should be able to get back to the login screen by pressing Ctrl+Alt+Backspace twice.

Although, if you are using SDDM (you didn’t answer that question), the login screen probably doesn’t work either.
In that case, try to switch to xdm by running “sudo update-alternatives --config default-displaymanager”

The nouveau glxinfo log is https://www.dropbox.com/s/cqvl2ha3me0jhlj/noveau.log?dl=0

The output with nouveau won’t help to find out why the nvidia driver doesn’t work, unfortunately.

It does show that you are using Mesa’s software OpenGL renderer, though that may just be because Mesa-dri-nouveau isn’t installed…

I can get you the glxinfo log with the official NVIDIA installer’s driver if you like. I just haven’t, because it always stomps a system library (the cross-vendor compatibility driver), and I have to fix that before I can run nouveau again.

It won’t help anyway, because the point was to find out what the problem with the driver installed via rpm packages is.

About your library problem, that should be “fixed” if you uninstall the driver because it will restore the old libraries.
OTOH, Tumbleweed has used libglvnd for a while now, which allows several GL implementations (i.e. Mesa’s and nvidia’s) to be installed at the same time, with the proper one chosen at runtime.

But maybe that’s your problem somehow?
Maybe the libGL.so (and others) got replaced earlier, and the nvidia uninstaller restores the wrong one (coming from Mesa)?
That may cause the problem with the nvidia rpms, because on TW those now depend on libglvnd and won’t work without it AFAIK.
The .run installer OTOH will just overwrite them.
What’s the output of “ls -l /usr/lib64/libGL*”? (after you uninstalled the nvidia driver)
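To make that listing more telling, one could also ask rpm which package owns each file; anything rpm doesn’t know about was most likely put there by the .run installer rather than by libglvnd or Mesa. A minimal sketch, assuming the 64-bit libraries live in /usr/lib64 (the helper name is made up):

```shell
#!/bin/sh
# Hypothetical helper: print each libGL* file in a directory
# (default /usr/lib64) together with the rpm package that owns it.
list_libgl_owners() {
    dir="${1:-/usr/lib64}"
    for f in "$dir"/libGL*; do
        [ -e "$f" ] || [ -L "$f" ] || continue   # glob matched nothing
        # "rpm -qf FILE" prints the owning package and exits non-zero
        # if no installed package owns the file.
        if owner=$(rpm -qf "$f" 2>/dev/null); then
            printf '%s: %s\n' "$f" "$owner"
        else
            printf '%s: NOT OWNED\n' "$f"
        fi
    done
}
```

No root is needed; any NOT OWNED line after uninstalling the driver would point at leftovers from the manual installation.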

I’m kind of tempted to just get the list of installed packages, nuke the system and start over (since my data is all separate), in case it’s just some weird error from a rolling update somewhere.

It may help if the manual nvidia driver installation via the .run installer overwrote some system libraries in the past and that’s what causes the problem.

Although, maybe just reinstalling libglvnd (after you removed nvidia) should help then:

sudo zypper in -f libglvnd

I’m not sure which display manager I’m using (I’ll check next time I’m using Linux). But you’re right that it breaks before login: it breaks pretty much straight after the mouse pointer is loaded, so I go from the boot environment to a black screen with a responsive mouse cursor, and it stays that way. I’ll try changing display managers and desktops and see how the repository driver works with them.

And yes, it’s libglvnd that gets stomped: NVIDIA’s official installer doesn’t like the openSUSE one (you have to allow the NVIDIA installer to stomp it), and nouveau doesn’t like NVIDIA’s version (you don’t get graphics until you use the command line to reinstall libglvnd from the Tumbleweed repository).

Actually, libglvnd is just a wrapper that loads the proper GL libraries (nvidia’s or Mesa’s) at runtime.
nvidia’s installer should support it AFAIK, although I have no experience with that (I don’t even have an nvidia card…).
It could be that there’s a specific command line switch to use libglvnd; I don’t know.

What you do allow is replacing libglvnd with nvidia’s libGL* which of course won’t work with nouveau.
In any case, uninstalling the nvidia driver should restore the previous ones though, unless something is already messed up on your system.

I did some more poking with this:

XDM as a display manager does get me to the login screen (as does LightDM), but neither proceeds past there (I’m just left with an image of the login screen). XDM doesn’t give any opportunity to run glxinfo that I could see, and running it from a console terminal while XDM was open just gives the usual “can’t open display” error.

It’s definitely libglvnd that the official NVIDIA installer doesn’t like: it says the installed libglvnd is incomplete and asks to overwrite it with its own “complete” one. I cross-checked whether the repo NVIDIA driver wanted that too (it doesn’t): it fails to load even with that replacement done.

You’re not alone: upgrading the nvidia drivers has taken my system down as well.
For the time being I’ve uninstalled them and done a fresh install “the hard way” using the NVIDIA installer.

Sorry for the delay…

As mentioned, you should run glxinfo when logged in to a graphical session.

It’s definitely libglvnd that the official NVIDIA installer doesn’t like: it says the installed libglvnd is incomplete and asks to overwrite it with its own “complete” one. I cross-checked whether the repo NVIDIA driver wanted that too (it doesn’t): it fails to load even with that replacement done.

I would recommend to uninstall all traces of the nvidia driver (including backups of libGL* and so on it may have made), then reinstall libglvnd, and only afterwards try to install nvidia again.

And I ask again:

What’s the output of “ls -l /usr/lib64/libGL*”? (after you uninstalled the nvidia driver)

I’m pretty sure that the repo packages do work fine in general…

To bring my part of this tale to a close:

My Tumbleweed install fell over this week after a dup (I could boot to a terminal, but nothing I could think of would successfully bring the GUI back up). It was installed in November 2015. I did a fresh install, and it’s using the Tumbleweed NVIDIA drivers quite happily, so it seems there was a problem somewhere in the old configuration.

Before I nuked the old system, I installed machinery and ran an inspect. Comparing that with the fresh install, you can see how things slowly get out of hand: there are packages that aren’t provided anymore (very old boost libraries, for example) that were never cleaned up, and updates that went undetected despite fairly regular use of dup (this seems to be a particular issue when moving back and forth between the official repositories and Packman, as is sometimes necessary for bleeding-edge stuff).

Thanks for all your help, wolfi323.