Nvidia driver coverage and Quadro RTX 5000 Mobile

Hi
Make sure you rebuild initrd with the mkinitrd command.

Ok I’ve blacklisted nouveau


cat /etc/modprobe.d/50-blacklist.conf 
blacklist nouveau

and ran mkinitrd

and rebooted. Nothing any different. I tried reinstalling suse-prime and enabled the suse-prime service (which was not enabled before…not really clear if I need it to be, but enabling it doesn’t seem to change anything).

When I prime-select nvidia, and reboot, it boots up into the graphical desktop fine but behaves as if I didn’t enable nvidia, intel is still being used. If I prime-select nvidia and instead of restarting, logout, it drops me to a text mode login prompt. No errors in xorg logs, just no mention of nvidia either. I’ve realized I can check “prime-select log-view” and see the error


PCI BusID of NVIDIA card could not be detected!

Also I continue to find the only mention of nvidia in journalctl is still

kernel: nvidia: disagrees about version of symbol module_layout

Does this indicate a problem?

I think I will try uninstalling the nvidia drivers, and then reinstalling from scratch.
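
On the module_layout message: the kernel prints "disagrees about version of symbol module_layout" when a module was built against a different kernel build than the one currently running, and it refuses to load such a module. One quick sanity check is to compare the module's vermagic string with the running kernel release. The following is a minimal sketch; the helper name vermagic_matches is made up for illustration, and a matching vermagic does not by itself rule out a symbol CRC mismatch.

```shell
# Hypothetical helper (not part of any package): succeeds when the kernel
# release recorded in a module's vermagic equals the given kernel release.
# A vermagic string looks like:
#   "5.16.15-1-default SMP preempt mod_unload modversions"
vermagic_matches() {
    # $1 = vermagic string, $2 = kernel release (for example from `uname -r`)
    release=${1%% *}    # keep only the first word of the vermagic string
    [ "$release" = "$2" ]
}

# Possible use on the affected machine:
#   vermagic_matches "$(/sbin/modinfo -F vermagic nvidia)" "$(uname -r)" \
#       && echo "vermagic matches" || echo "vermagic differs"
```

If the two differ, the nvidia module on disk was built for another kernel and would need rebuilding for the running one.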

Hi
I use the hard way here to install the nvidia driver, not had any issues…

Can you show the output from the following, maybe it (nvidia) is in weak-updates;


 /sbin/modinfo nvidia | grep filename:
filename:       /lib/modules/5.16.15-1-default/kernel/drivers/video/nvidia.ko

I’ve since uninstalled all the G06 drivers, rebooted, and then installed again (using zypper this time instead of yast) and rebooted again.

Now, I find two nvidia errors instead of just the one when I grep journalctl:


sudo journalctl -b 0| grep -i nvidia 
Mar 28 17:49:11 eric-hpzbookfury15g kernel: nvidia: disagrees about version of symbol module_layout 
Mar 28 17:49:20 eric-hpzbookfury15g ksystemstats[2287]: Could not retrieve information for NVidia GPU 0

However, for the first time since these updates, prime-select thinks nvidia is configured:

sudo prime-select get-current 
Driver configured: nvidia 
bbswitch not loaded 

If you want energy saving bbswitch should be loaded in intel mode. 
For this package 'bbswitch' needs to be installed on your system. 
Or make use of DynamicPowerManagement on Turing GPUs or later by 
switching to suse-prime's 'offload' or 'nvidia' mode.

But intel does appear to be the one in use:

glxinfo | grep 'OpenGL renderer string' 
OpenGL renderer string: Mesa Intel(R) UHD Graphics P630 (CML GT2)

Running the command you requested.

/sbin/modinfo nvidia | grep filename: 
filename:       /lib/modules/5.16.15-1-default/updates/nvidia.ko

I’ve tried several prime-select logout sequences. prime-select does seem to better report what it was last set to, but I get the same behavior with “prime-select nvidia”: logging out leaves me at a console prompt. prime-select log-view reports the failure:


 18:15:30 ] user_logout_waiter: X restart detected, preparing switch to nvidia 
 18:15:32 ] PCI BusID of NVIDIA card could not be detected! 
 18:15:32 ] Configuration failed

I think perhaps I should download the 460.91.03 driver and install the hard way.
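
The "PCI BusID of NVIDIA card could not be detected!" error suggests the card was not found when prime-select looked for it. How suse-prime does its detection is not shown in this thread, so the following is only a sketch of checking by hand: see whether lspci lists the card at all, then convert its slot address (hexadecimal) to the decimal form Xorg uses for BusID. The helper name to_xorg_busid is hypothetical, and the slot 01:00.0 is an illustrative value, not taken from this laptop.

```shell
# Hypothetical helper: convert an lspci slot such as "01:00.0" (hex fields)
# into the Xorg BusID form "PCI:1:0:0" (decimal fields).
to_xorg_busid() {
    bus=${1%%:*}        # text before the first ':'
    rest=${1#*:}        # text after the first ':'
    dev=${rest%%.*}     # device number, before the '.'
    fn=${rest#*.}       # function number, after the '.'
    # printf accepts C-style hex constants for %d
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# Possible use:
#   lspci | grep -i nvidia      # is the card visible on the bus at all?
#   to_xorg_busid 01:00.0       # slot copied by hand from the lspci line
```

If lspci shows no NVIDIA device, the problem lies below the driver level (firmware setting or power state) and no driver reinstall would be expected to help.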

Hi
Or the 510.60.02. You might also head over to the nvidia forums, as there may be some clues there…

The only other thought I have is the MOK thing mentioned here: https://en.opensuse.org/SDB:NVIDIA_drivers

I do recall some bios-looking screen flashing when I first rebooted after one of the updates. I didn’t get a good look at it, but it could have been the screen they show here. However I’ve got secure boot disabled in the bios…I went in and verified. So that doesn’t seem to make sense. I tried using the command they give to manually import the certificate but…

sudo mokutil --import /var/lib/nvidia-pubkeys/MOK-nvidia-gfxG06-510.60.02-6.1-default.der --root-pw  
Failed to get file status, /var/lib/nvidia-pubkeys/MOK-nvidia-gfxG06-510.60.02-6.1-default.der

There is no /var/lib/nvidia-pubkeys directory at all.

Hi
So what is the output from;


efibootmgr -v

If it booted from shim, then that would cause the issue…

BootCurrent: 000A 
Timeout: 0 seconds 
BootOrder: 000A,0009,0006,0001,0000,0005,0007,0003 
Boot0000* WDC WD10SPSX-60A6WT0  PciRoot(0x0)/Pci(0x17,0x0)/Sata(2,65535,0)N.....YM....R,Y.....ISPH 
Boot0001* Windows Boot Manager  HD(1,GPT,a566de49-4418-4ad8-a333-9fc4f2dc5e19,0x800,0x82000)/File(\EFI\Microsoft\Boot\bootmgfw.efi)WINDOWS.........x...B.C.D.O.B.J.E.C.T.=.{.9.d.e.a.8.6.2.c.-.5.c.d.d.-.4.e.7.0.-.a.c.c.1.-.f.3.2.b.3.4.4.d.4.7.9.5.}........................ISPH 
Boot0003* Wi-Fi IPV4 Network    PciRoot(0x0)/Pci(0x14,0x3)/MAC(f44ee3e922c1,1)/Wi-Fi(00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
:00:00:00)/IPv4(0.0.0.00.0.0.0,0,0)N.....YM....R,Y.....ISPH 
Boot0004* IPV6 Network - Intel(R) Ethernet Connection (10) I219-LM      PciRoot(0x0)/Pci(0x1f,0x6)/MAC(508140329d8b,0)/IPv6(::]:<->::]:,0,0)N.....YM....R,Y.....ISPH 
Boot0005* IPV4 Network - Intel(R) Ethernet Connection (10) I219-LM      PciRoot(0x0)/Pci(0x1f,0x6)/MAC(508140329d8b,0)/IPv4(0.0.0.00.0.0.0,0,0)N.....YM....R,Y.....ISPH 
Boot0006  USB:          PciRoot(0x0)/Pci(0x14,0x0)N.....YM....R,Y.....ISPH 
Boot0007* IPV4 Network  PciRoot(0x0)/Pci(0x1c,0x0)/Pci(0x0,0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/USB(3,0)/USB(3,0)/MAC(489ebde20607,0)/IPv4(0.0.0.00.0.0.0,0,0)N.....YM....R,Y.....ISPH 
Boot0008* IPV6 Network  PciRoot(0x0)/Pci(0x1c,0x0)/Pci(0x0,0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/USB(3,0)/USB(3,0)/MAC(489ebde20607,0)/IPv6(::]:<->::]:,0,0)N.....YM....R,Y.....ISPH 
Boot0009* geckolinux    HD(1,GPT,a566de49-4418-4ad8-a333-9fc4f2dc5e19,0x800,0x82000)/File(\EFI\geckolinux\grubx64.efi) 
Boot000A* geckolinux-secureboot HD(1,GPT,a566de49-4418-4ad8-a333-9fc4f2dc5e19,0x800,0x82000)/File(\EFI\geckolinux\shim.efi)

Looks like that bottom one is first in the boot order and I do see that word shim there…um…so now what?
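
Reading that listing by eye is error-prone, so here is a minimal sketch that pulls out the entry named by BootCurrent from `efibootmgr -v` output. The function name current_boot_entry is made up for illustration. Separately, `mokutil --sb-state` reports whether Secure Boot is actually enabled, which is a different question from which loader was used.

```shell
# Hypothetical helper: read `efibootmgr -v` output on stdin and print the
# full line of the entry that BootCurrent points at.
current_boot_entry() {
    awk '
        /^BootCurrent:/ { cur = $2 }
        # Entry lines start with "Boot" plus four hex digits, e.g. Boot000A
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            entries[substr($0, 5, 4)] = $0
        }
        END { if (cur in entries) print entries[cur] }
    '
}

# Possible use:
#   efibootmgr -v | current_boot_entry    # shows shim.efi or grubx64.efi
#   mokutil --sb-state                    # reports the Secure Boot state
```

With the output above, this would print the Boot000A line, the one that loads shim.efi.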

Hi
So this is geckolinux, not Tumbleweed…

Fire up YaST bootloader and uncheck secure boot, then check the efibootmgr output. As a test, before you do that run the following to change the boot temporarily and reboot.


efibootmgr -n 0009
systemctl reboot

Well, I think of Gecko rolling as Tumbleweed with a few tweaks one could just as well have made to a Tumbleweed install. I apologize if I glossed over a difference that would have been important to know up front.

I tried the


efibootmgr -n 0009 
systemctl reboot

And it took a strangely long time to boot…I thought it had hung, but it did finally finish booting after a few minutes.

Hi
Have no idea what tweaks they do, but it does make a big difference, just like third party repositories :wink:

So was there a difference in your issue?

I’ve not yet tried making the change via YaST bootloader, but after booting with efibootmgr -n 0009, I tried prime-select nvidia and logged out and got the same behavior as before: a text-mode login prompt, and prime-select gave me the same “PCI BusID of NVIDIA card could not be detected!”. I’m not sure whether you intended the nvidia test with the temporary efibootmgr change, or only after the YaST bootloader change.

(Also, FWIW, the slow boot did not recur. The laptop did not wake from sleep last night, and the next boot used 000A, as you might expect. Setting it back to 0009 and rebooting booted in a normal amount of time.)

Hi
You can inspect what’s happening to cause the delay (I suspect dracut rebuilding the nvidia modules…);


systemd-analyze blame

I would suggest booting the system from an openSUSE Live desktop to see if things start to work with just the default intel/nouveau (as in, whether it sees the nvidia card).

Maybe a geckolinux user can comment. Do they have a forum?

The boot delay only happened the first time, so what you suggest makes sense. This was several boots ago now, so I assume that the blame command doesn’t show it. Is there something I could grep for in the journalctl output from a given boot that would indicate the rebuild?

There is a gecko forum; I’ll see if there are any ideas there.
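
On looking at an earlier boot: journalctl can list past boots with `journalctl --list-boots` and show one of them with `journalctl -b -N`, where -1 is the previous boot, provided the journal is persistent. What a module rebuild would log is not established in this thread, so the pattern below (dracut, dkms, nvidia) is a guess, and the helper name rebuild_hints is hypothetical.

```shell
# Hypothetical filter: keep only log lines that mention tools which might
# be involved in rebuilding the initrd or the nvidia module.
# The pattern is an assumption, not a documented message format.
rebuild_hints() {
    grep -iE 'dracut|dkms|nvidia'
}

# Possible use:
#   journalctl --list-boots                 # find the offset of the slow boot
#   sudo journalctl -b -3 | rebuild_hints   # -3 is only an example offset
```

Comparing the timestamps of any matching lines against the start of that boot would show whether they account for the delay.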

I’d like to understand better the relationship between the nvidia drivers and the efiboot… Why might the shim cause this problem? And why would earlier nvidia drivers have worked fine with it previously until this update?

Hi
Signed vs unsigned kernel modules, secure vs unsecure boot. I don’t use it here…

Well, thanks for all your help, Malcolm. I’m going to have to put this problem down for a while and get some work done, but tonight I may try some of these last things you’ve suggested and / or a hard way install. I appreciate your efforts!

Hi
It’s all very strange IMHO, be interesting to see what happens with a live USB as to whether it sees your card or not.