GPU Unused: Nvidia (Help, desperate)

@helplps Can you remove the nvidia-conf file, rename 50-nvidia-default.conf.rpmsave to 50-nvidia-default.conf, and then run dracut -f --regenerate-all?
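A minimal sketch of the rename step for a POSIX shell run as root. The function name restore_rpmsave is hypothetical. The thread names neither the directory nor the exact "nvidia-conf" file, so the sketch only undoes the .rpmsave rename and takes the full path as an argument.

```shell
# Sketch only: move an rpm "*.rpmsave" file back to its original name.
# The caller passes the full path; no directory is assumed here.
restore_rpmsave() {
    local saved="$1" target
    target=${saved%.rpmsave}              # strip the .rpmsave suffix
    if [ "$saved" = "$target" ]; then
        echo "not an .rpmsave file: $saved" >&2
        return 1
    fi
    if [ ! -f "$saved" ]; then
        echo "not found: $saved" >&2
        return 1
    fi
    if [ -e "$target" ]; then
        # Never clobber a config that already exists under the original name.
        echo "refusing to overwrite existing $target" >&2
        return 1
    fi
    mv -- "$saved" "$target"
}
```

For example, if the file sits in /etc/modprobe.d (an assumption, not stated in the thread), run `restore_rpmsave /etc/modprobe.d/50-nvidia-default.conf.rpmsave && dracut -f --regenerate-all` from a root shell after removing the nvidia-conf file. The function refuses to overwrite an existing 50-nvidia-default.conf, so a second run cannot clobber it.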

Reboot, then force a reinstall of nvidia-open-driver-G06-signed-kmp-default and see if you can capture the errors.
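One way to capture the errors is to keep zypper's complete output in a file. This is a sketch: the function names and the log path are hypothetical, and the filter pattern is only a rough guess at which lines matter (it catches the %triggerin scriptlet warning and make's "Error" lines seen later in this thread).

```shell
# Sketch: force a reinstall and keep everything zypper prints in a log file.
reinstall_and_log() {
    local pkg="$1" log="$2" status
    # "install --force" reinstalls even if this version is already installed;
    # 2>&1 sends scriptlet errors (stderr) into the same log.
    sudo zypper --non-interactive install --force "$pkg" >"$log" 2>&1
    status=$?
    cat "$log"
    return "$status"
}

# Print only lines mentioning an error, a failure or an rpm scriptlet warning.
extract_errors() {
    grep -Ei 'error|failed|warning: %' "$1"
}
```

Usage: `reinstall_and_log nvidia-open-driver-G06-signed-kmp-default ~/kmp-reinstall.log`, then `extract_errors ~/kmp-reinstall.log`. The output appears only after zypper finishes, because it is written to the file first.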

@hui Not sure. I had no issues with the open driver (from the run file) on Wayland/GNOME, but that was a while back… I’m now using Intel Arc/NVIDIA; when I ran an AMD RX 550 as primary with the T400, it was flaky with respect to OpenCL…

Perhaps inxi’s report should be confirmed correct:

lsmod | egrep 'vid|veau|nv|open|amdg'

Possibly inxi isn’t yet fully up to speed with the open drivers?
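The egrep pattern above matches substrings, so "nv" also picks up unrelated modules such as nvme (the output posted below shows this). A stricter sketch compares whole module names instead. The function name is hypothetical, and the name list is an assumption covering only the drivers discussed in this thread.

```shell
# Sketch: list GPU-related modules found in lsmod-formatted text on stdin.
loaded_gpu_modules() {
    # Skip lsmod's "Module Size Used by" header; column 1 is the module name.
    awk 'NR > 1 && $1 ~ /^(nvidia|nvidia_drm|nvidia_modeset|nvidia_uvm|nouveau|amdgpu)$/ { print $1 }'
}
```

Usage: `lsmod | loaded_gpu_modules`. It prints nothing if none of these drivers is loaded.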

I did as asked, but the same failure message pops up when installing nvidia-open-driver-G06-signed-kmp-default:

warning: %triggerin(nvidia-open-driver-G06-signed-kmp-default-550.127.05_k6.11.6_2-1.4.x86_64) scriptlet failed, exit status 1


@mrmazda This is the output of lsmod | egrep 'vid|veau|nv|open|amdg':

nvme_fabrics           45056  0
nvme_keyring           20480  1 nvme_fabrics
amdgpu              15429632  24
amdxcp                 12288  1 amdgpu
i2c_algo_bit           20480  1 amdgpu
drm_ttm_helper         16384  2 amdgpu
ttm                   106496  2 amdgpu,drm_ttm_helper
drm_exec               12288  1 amdgpu
gpu_sched              69632  1 amdgpu
drm_suballoc_helper    12288  1 amdgpu
drm_buddy              20480  1 amdgpu
nvme                   65536  4
drm_display_helper    278528  1 amdgpu
nvme_core             245760  6 nvme,nvme_fabrics
video                  81920  2 amdgpu,ideapad_laptop
nvme_auth              24576  1 nvme_core
wmi                    32768  3 video,wmi_bmof,ideapad_laptop

@mrmazda Maybe for the open RPM… I’ve had no issues across multiple run file/kernel releases here.

/sbin/modinfo nvidia | grep license

license:        Dual MIT/GPL

cat /proc/driver/nvidia/gpus/0000\:02\:00.0/information | grep -E "Model|Firmware"

Model: 		 NVIDIA T400
GPU Firmware: 	 565.57.01
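The license field is a quick way to tell which flavor of the kernel module is installed: the open module reports "Dual MIT/GPL", as above, while the proprietary one reports "NVIDIA", as in the kernel taint message further down the thread. A small sketch; both function names are hypothetical.

```shell
# Sketch: map the nvidia module's license string to the driver flavor.
classify_nvidia_license() {
    case $1 in
        "Dual MIT/GPL") echo open ;;
        NVIDIA)         echo proprietary ;;
        "")             echo "not installed" ;;
        *)              echo "unknown ($1)" ;;
    esac
}

# modinfo -F prints only the requested field; errors (no module) are hidden.
nvidia_module_flavor() {
    classify_nvidia_license "$(/sbin/modinfo -F license nvidia 2>/dev/null)"
}
```

`nvidia_module_flavor` then prints open, proprietary, not installed, or unknown followed by the raw license string.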

@helplps can you switch to the proprietary driver from the repository?

Output of installing nvidia-open-driver-G06-signed-kmp-default (truncated). There were also some lines at the beginning saying that nvidia.ko, etc. weren’t found:

...
/usr/src/kernel-modules/nvidia-550.127.05-default/nvidia.o: warning: objtool: _nv012734rm+0x5d: 'naked' return found in MITIGATION_RETHUNK build
/usr/src/kernel-modules/nvidia-550.127.05-default/nvidia.o: warning: objtool: _nv040944rm+0x12f: 'naked' return found in MITIGATION_RETHUNK build
  MODPOST /usr/src/kernel-modules/nvidia-550.127.05-default/Module.symvers
  CC [M]  /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia.mod.o
  CC [M]  /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-uvm.mod.o
  CC [M]  /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-modeset.mod.o
  CC [M]  /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-drm.mod.o
  CC [M]  /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-peermem.mod.o
  LD [M]  /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-uvm.ko
  LD [M]  /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-modeset.ko
  LD [M]  /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-drm.ko
  LD [M]  /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-peermem.ko
  LD [M]  /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia.ko
  BTF [M] /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-peermem.ko
Skipping BTF generation for /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-peermem.ko due to unavailability of vmlinux
  BTF [M] /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-modeset.ko
Skipping BTF generation for /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-modeset.ko due to unavailability of vmlinux
  BTF [M] /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-drm.ko
Skipping BTF generation for /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-drm.ko due to unavailability of vmlinux
  BTF [M] /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-uvm.ko
Skipping BTF generation for /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia-uvm.ko due to unavailability of vmlinux
  BTF [M] /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia.ko
Skipping BTF generation for /usr/src/kernel-modules/nvidia-550.127.05-default/nvidia.ko due to unavailability of vmlinux
make[2]: Leaving directory '/usr/src/linux-6.11.5-2-obj/x86_64/default'
make[1]: Leaving directory '/usr/src/linux-6.11.5-2'
ld.bfd  -T /usr/src/linux-obj/x86_64/default/scripts/module.lds -r -o nv-linux.o \
  nvidia.mod.o nvidia/nv-interface.o
/
/usr/src/kernel-modules/nvidia-550.127.05-default /
rm -f -r conftest
make[1]: Entering directory '/usr/src/kernel-modules/nvidia-550.127.05-default'
make[1]: *** /lib/modules/6.11.6-2-default/build: No such file or directory.  Stop.
make[1]: Leaving directory '/usr/src/kernel-modules/nvidia-550.127.05-default'
make: *** [Makefile:89: clean] Error 2

I noticed that I don’t even have the modules (nvidia, nvidia_drm) now; there’s only nouveau. Not sure if that’s a good thing or not, but at least it’s different from what I’ve always seen after installing the NVIDIA drivers.

This is what I installed:
zypper se -i nvidia:

S  | Name                             | Summary                                                               | Type
---+----------------------------------+-----------------------------------------------------------------------+--------
i  | kernel-firmware-nvidia           | Kernel firmware files for Nvidia Tegra and graphics drivers           | package
i+ | kernel-firmware-nvidia-gspx-G06  | Kernel firmware file for open NVIDIA kernel module driver G06         | package
i  | libnvidia-egl-wayland1           | The EGLStream-based Wayland external platform                         | package
i  | nvidia-compute-G06               | NVIDIA driver for computing with GPGPU                                | package
i  | nvidia-compute-G06-32bit         | 32bit NVIDIA driver for computing with GPGPU                          | package
i  | nvidia-driver-G06-kmp-default    | NVIDIA graphics driver kernel module for GeForce 700 series and newer | package
i  | nvidia-gl-G06                    | NVIDIA OpenGL libraries for OpenGL acceleration                       | package
i  | nvidia-gl-G06-32bit              | 32bit NVIDIA OpenGL libraries for OpenGL acceleration                 | package
i+ | nvidia-utils-G06                 | NVIDIA driver tools                                                   | package
i+ | nvidia-video-G06                 | NVIDIA graphics driver for GeForce 700 series and newer               | package
i  | nvidia-video-G06-32bit           | 32bit NVIDIA graphics driver for GeForce 700 series and newer         | package
i+ | openSUSE-repos-Tumbleweed-NVIDIA | openSUSE NVIDIA repository definitions                                | package

Better to use
zypper se -si nvidia
to get more information.

Also post:
zypper se -si kernel-default

zypper se -si nvidia:

S  | Name                             | Type    | Version                   | Arch   | Repository
---+----------------------------------+---------+---------------------------+--------+----------------------
i  | kernel-firmware-nvidia           | package | 20241018-1.1              | noarch | Main Repository (OSS)
i+ | kernel-firmware-nvidia-gspx-G06  | package | 550.127.05-69.1           | x86_64 | NVIDIA
i  | libnvidia-egl-wayland1           | package | 1.1.16-3.1                | x86_64 | Main Repository (OSS)
i  | nvidia-compute-G06               | package | 550.127.05-27.1           | x86_64 | NVIDIA
i  | nvidia-compute-G06-32bit         | package | 550.127.05-27.1           | x86_64 | NVIDIA
i  | nvidia-driver-G06-kmp-default    | package | 550.127.05_k6.11.3_2-27.1 | x86_64 | NVIDIA
i  | nvidia-gl-G06                    | package | 550.127.05-27.1           | x86_64 | NVIDIA
i  | nvidia-gl-G06-32bit              | package | 550.127.05-27.1           | x86_64 | NVIDIA
i+ | nvidia-utils-G06                 | package | 550.127.05-27.1           | x86_64 | NVIDIA
i+ | nvidia-video-G06                 | package | 550.127.05-27.1           | x86_64 | NVIDIA
i  | nvidia-video-G06-32bit           | package | 550.127.05-27.1           | x86_64 | NVIDIA
i+ | openSUSE-repos-Tumbleweed-NVIDIA | package | 20240712.dd8c2eb-1.2      | x86_64 | Main Repository (OSS)

zypper se -si kernel-default:

S  | Name                 | Type    | Version    | Arch   | Repository
---+----------------------+---------+------------+--------+----------------------
i+ | kernel-default       | package | 6.11.5-2.1 | x86_64 | (System Packages)
i+ | kernel-default       | package | 6.11.6-2.1 | x86_64 | Main Repository (OSS)
i  | kernel-default-devel | package | 6.11.5-2.1 | x86_64 | (System Packages)
i  | kernel-default-devel | package | 6.11.5-1.1 | x86_64 | (System Packages)

Hm, you don’t have the latest matching kernel-default-devel package installed. You should have kernel-default-devel-6.11.6-2.1, but you only have 6.11.5-2.1.
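The failing make call earlier ("/lib/modules/6.11.6-2-default/build: No such file or directory") fits this. A sketch that lists every kernel under /lib/modules and whether its build directory resolves. The function name is hypothetical, and reading a missing build link as "matching kernel-default-devel not installed" is an assumption based on this thread.

```shell
# Sketch: report, per installed kernel, whether /lib/modules/<ver>/build exists.
check_build_links() {
    local moddir="${1:-/lib/modules}" dir
    for dir in "$moddir"/*/; do
        dir=${dir%/}
        [ -d "$dir" ] || continue             # glob matched nothing
        if [ -d "$dir/build" ]; then          # -d follows the symlink
            printf '%-8s%s\n' OK "${dir##*/}"
        else
            printf '%-8s%s\n' MISSING "${dir##*/}"
        fi
    done
}
```

Run `check_build_links` and compare any MISSING line with `uname -r`. For a missing kernel, installing the kernel-default-devel of that exact version (here 6.11.6-2.1) should provide the build tree.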

Can you perform a zypper dup and post zypper se -si kernel-default again?

S  | Name                 | Type    | Version    | Arch   | Repository
---+----------------------+---------+------------+--------+----------------------
i+ | kernel-default       | package | 6.11.5-2.1 | x86_64 | (System Packages)
i+ | kernel-default       | package | 6.11.6-2.1 | x86_64 | Main Repository (OSS)
i  | kernel-default-devel | package | 6.11.5-2.1 | x86_64 | (System Packages)
i  | kernel-default-devel | package | 6.11.5-1.1 | x86_64 | (System Packages)
i  | kernel-default-devel | package | 6.11.6-2.1 | x86_64 | Main Repository (OSS)

@helplps Do you have a monitor you can plug into the device? I wonder if some unusual hardware setup is powering the GPU off, since you can switch to just the dGPU in your BIOS. Is this dual boot with Windows?

This is not dual boot. I have also already tried switching to the dGPU only and using another monitor, but the TV doesn’t get a signal at all. I don’t think it’s a hardware issue, since I’ve tried live boots of both openSUSE and Fedora and the GPU works just fine there.

Hello, @malcolmlewis. Are there any updates on this problem? I’ve already updated to 6.11.7-1-default and the problem still persists. What solution would you recommend at this point?

@helplps I would look at removing all the NVIDIA RPMs, disabling the openSUSE NVIDIA repo/service, and installing the CUDA run file, which includes the driver… AKA the hard way… Just make sure kernel-default-devel and libglvnd-devel are installed…

https://en.opensuse.org/SDB:NVIDIA_the_hard_way

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=OpenSUSE&target_version=15&target_type=runfile_local

That’s what I use here, but bear in mind my NVIDIA GPU isn’t used for graphics at all; it’s purely for offload.
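Before running the .run installer, it may help to confirm that the two prerequisites named above are actually installed. A sketch with a hypothetical function name that relies only on the exit status of `rpm -q --quiet`:

```shell
# Sketch: print each named package that is NOT installed.
missing_packages() {
    local pkg
    for pkg in "$@"; do
        # rpm -q --quiet prints nothing; exit status 0 means installed.
        rpm -q --quiet "$pkg" || echo "$pkg"
    done
}
```

For example, `missing=$(missing_packages kernel-default-devel libglvnd-devel)` and, if it is non-empty, `sudo zypper install $missing`. This only checks that some version is installed; kernel-default-devel must still match the running kernel, which was the mismatch found earlier in this thread.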

As I mentioned in this Reddit post (also linked in my very first post here), I have tried doing it the hard way, but it doesn’t work, and it still doesn’t work now. This is the error I get when doing it the hard way, same as always.

ERROR: An error occurred while performing the step: "Checking to see whether the nvidia kernel module was successfully built". See /var/log/nvidia-installer.log for details.
-> The command `cd kernel; /usr/bin/make -k -j16  NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/6.11.3-1-default/source" SYSOUT="/lib/modules/6.11.3-1-default/build" NV_KERNEL_MODULES="nvidia"` failed with the following output:

make[1]: Entering directory '/usr/src/linux-6.11.3-1'
make[2]: Entering directory '/usr/src/linux-6.11.3-1-obj/x86_64/default'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: gcc (SUSE Linux) 14.2.1 20241007 [revision 4af44f2cf7d281f3e4f3957efce10e8b2ccb2ad3]
  You are using:           cc (SUSE Linux) 14.2.1 20241007 [revision 4af44f2cf7d281f3e4f3957efce10e8b2ccb2ad3]
  MODPOST /tmp/selfgz5709/NVIDIA-Linux-x86_64-550.127.05/kernel/Module.symvers
  LD [M]  /tmp/selfgz5709/NVIDIA-Linux-x86_64-550.127.05/kernel/nvidia.ko
  BTF [M] /tmp/selfgz5709/NVIDIA-Linux-x86_64-550.127.05/kernel/nvidia.ko
/bin/sh: line 1: ./tools/bpf/resolve_btfids/resolve_btfids: No such file or directory
make[4]: *** [/usr/src/linux-6.11.3-1/scripts/Makefile.modfinal:59: /tmp/selfgz5709/NVIDIA-Linux-x86_64-550.127.05/kernel/nvidia.ko] Error 127
make[4]: *** Deleting file '/tmp/selfgz5709/NVIDIA-Linux-x86_64-550.127.05/kernel/nvidia.ko'
make[4]: Target '__modfinal' not remade because of errors.
make[3]: *** [/usr/src/linux-6.11.3-1/Makefile:1882: modules] Error 2
make[2]: *** [/usr/src/linux-6.11.3-1/Makefile:224: __sub-make] Error 2
make[2]: Target 'modules' not remade because of errors.
make[2]: Leaving directory '/usr/src/linux-6.11.3-1-obj/x86_64/default'
make[1]: *** [Makefile:224: __sub-make] Error 2
make[1]: Target 'modules' not remade because of errors.
make[1]: Leaving directory '/usr/src/linux-6.11.3-1'
make: *** [Makefile:89: modules] Error 2
ERROR: The nvidia kernel module was not created.
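The failing step is the BTF pass, which runs ./tools/bpf/resolve_btfids/resolve_btfids from the kernel object directory, and that binary is not there. A sketch that checks for it, assuming (from the "Entering directory" lines above) that /lib/modules/<version>/build points at that object directory; the function name is hypothetical.

```shell
# Sketch: succeed only if the build tree ships an executable resolve_btfids.
has_resolve_btfids() {
    local build="${1:-/lib/modules/$(uname -r)/build}"
    [ -x "$build/tools/bpf/resolve_btfids/resolve_btfids" ]
}
```

If `has_resolve_btfids` fails for the running kernel, the module build will likely keep failing at the same step until a kernel-default-devel that ships the tool is installed; whether a newer package does is not confirmed in this thread.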

@malcolmlewis Actually, this is the correct one. I apologize.

-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[   12.913974] [   T1880] evm: overlay not supported
[   12.976178] [   T2010] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   12.977390] [   T2010] Bridge firewalling registered
[   13.068491] [   T2041] Initializing XFRM netlink socket
[  110.079587] [   T1939] wlp4s0: authenticate with f4:6f:ed:6d:5f:18 (local address=70:a6:cc:0f:6f:13)
[  110.083050] [   T1939] wlp4s0: send auth to f4:6f:ed:6d:5f:18 (try 1/3)
[  110.121660] [    T127] wlp4s0: authenticated
[  110.124173] [    T127] wlp4s0: associate with f4:6f:ed:6d:5f:18 (try 1/3)
[  110.126020] [    T127] wlp4s0: RX AssocResp from f4:6f:ed:6d:5f:18 (capab=0x11 status=0 aid=4)
[  110.191889] [    T127] wlp4s0: associated
[  155.079905] [   T2626] nvidia: loading out-of-tree module taints kernel.
[  155.079912] [   T2626] nvidia: module license 'NVIDIA' taints kernel.
[  155.079914] [   T2626] Disabling lock debugging due to kernel taint
[  155.079916] [   T2626] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[  155.079917] [   T2626] nvidia: module license taints kernel.
[  155.445067] [   T2626] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[  155.445077] [   T2626] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[  155.446052] [   T2626] NVRM: This can occur when another driver was loaded and 
                          NVRM: obtained ownership of the NVIDIA device(s).
[  155.446053] [   T2626] NVRM: Try unloading the conflicting kernel module (and/or
                          NVRM: reconfigure your kernel without the conflicting
                          NVRM: driver(s)), then try loading the NVIDIA kernel module
                          NVRM: again.
[  155.446054] [   T2626] NVRM: No NVIDIA devices probed.
[  155.446239] [   T2626] nvidia-nvlink: Unregistered Nvlink Core, major device number 236
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Please check if nouveau is loaded:

sudo lsmod | grep nouv

If it is, check whether it is blacklisted:

grep -i 'blacklist nouveau' /lib/modprobe.d/*
grep -i 'blacklist nouveau' /etc/modprobe.d/*

If grep finds something, you may have to rebuild the initrd and reboot:

sudo dracut --force --regenerate-all

I have already tried this. For some reason even nouveau can’t be loaded, even with no other drivers installed, even with the blacklist files deleted, and even after running dracut, restarting, and then modprobe’ing it.

@malcolmlewis I’m guessing there’s not much else I can do now, huh?

Just some notes from another NVIDIA victim who fixed his problem after reading your thread and others (the experts above might dismiss them in a second)…

For some weeks I had a problem with far too small fonts in some parts of windows (mostly menus).
It was probably fixed by the latest Tumbleweed update, since the problem is gone, but I noticed that it isn’t the NVIDIA driver running but Intel (I don’t know since when).

  • The reason was that MOK management had been skipped without my noticing: have you accepted the relevant keys?
  • There was a problem with compressed debug symbols that caused btfid-related errors like the ones in your output. I’m not sure how to follow up on this; maybe look up which package the tool comes from (binutils?) and check whether it has the required version.
  • There was a pahole problem (too old, using too much memory) that kept the NVIDIA driver from building.
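Two of these causes can be checked from a shell. A sketch with hypothetical function names, assuming mokutil and pahole are the tools in question: `mokutil --sb-state` reports whether Secure Boot is on, `mokutil --list-new` shows keys still waiting to be enrolled at the next boot, and `pahole --version` prints the installed pahole version. The "module verification failed: signature and/or required key missing" line in the kernel messages above is the kind of message a missing key produces.

```shell
# Sketch: succeed if mokutil reports Secure Boot as enabled.
secure_boot_enabled() {
    mokutil --sb-state 2>/dev/null | grep -q 'SecureBoot enabled'
}

report_signing_and_btf_tools() {
    if command -v mokutil >/dev/null 2>&1; then
        if secure_boot_enabled; then
            echo "Secure Boot is on: module signatures are likely enforced."
        fi
        # Keys queued for enrollment that still need confirming at boot.
        mokutil --list-new 2>/dev/null || true
    else
        echo "mokutil not installed"
    fi
    if command -v pahole >/dev/null 2>&1; then
        echo "pahole $(pahole --version)"
    else
        echo "pahole not installed"
    fi
}
```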

All of these are relatively unlikely because they should be fixed by now, but maybe you have some old stuff lying around, or problems sometimes resurface.

Have you considered installing a separate system from scratch in parallel? I usually reserve a partition for this. That might make it possible to find where the systems or their update/build processes diverge.

Good luck!

@helplps The only thing I can think of trying is to boot to multi-user mode (systemctl set-default multi-user.target, then reboot), log in as root on the VT, and force the RPMs to install (or reinstall the run file).

I’m assuming you’re now up to date and have the 6.11.8 kernel?

Then you can set back to a desktop boot with systemctl set-default graphical.target and reboot.
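The suggested sequence, split into the part before and the part after the reboot. A sketch to be sourced and run as root; the function names and the KMP variable are hypothetical, and the `modprobe nvidia` check before switching back to the graphical target is an extra safeguard that was not part of the reply.

```shell
KMP=nvidia-open-driver-G06-signed-kmp-default

# 1) From the desktop: make the next boot stop at a text console, then reboot.
boot_to_text_console() {
    systemctl set-default multi-user.target && systemctl reboot
}

# 2) On the VT as root: force the reinstall, make sure the module loads,
#    and only then switch back to a graphical boot.
reinstall_then_restore_desktop() {
    zypper --non-interactive install --force "$KMP" || return
    modprobe nvidia || return
    systemctl set-default graphical.target && systemctl reboot
}
```

Source the file again after the reboot, since shell functions do not survive it.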