GeForce RTX 3070

I am trying to get an RTX 3070 working under Tumbleweed. This is for compute use only.

I have added the NVIDIA Tumbleweed repo (Index of /opensuse/tumbleweed) and installed the G06 drivers. After the first reboot the screen remained black (the monitors are connected to the Intel graphics), but after a second reboot X11 came up. The drivers did load, but dmesg shows an error:

[   32.029125] nvidia: loading out-of-tree module taints kernel.
[   32.049323] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[   32.049849] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   32.091533] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  525.116.04  Release Build  (abuild@host)  Tue May  9 00:00:00 UTC 2023
[   32.168083] nvidia-uvm: Loaded the UVM driver, major device number 511.
[   32.193663] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  525.116.04  Release Build  (abuild@host)  Tue May  9 00:00:00 UTC 2023
[   32.195316] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   34.049250] NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.
[   34.049256] NVRM: To force use of Open nvidia.ko on other GPUs, see the
[   34.293904] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[   34.294040] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device

Solutions suggested revolve around switching to the nvidia drivers. Do I have to blacklist the nouveau driver?

lsmod | grep nvid
nvidia_drm             90112  0
nvidia_modeset       1363968  1 nvidia_drm
nvidia_uvm           3170304  0
nvidia               6086656  5 nvidia_uvm,nvidia_modeset
video                  73728  3 dell_wmi,i915,nvidia_modeset
uname -a
Linux myhost 6.3.2-1-default #1 SMP PREEMPT_DYNAMIC Mon May 15 15:59:38 UTC 2023 (70ea6f6) x86_64 x86_64 x86_64 GNU/Linux

Thank you very much in advance.

As far as I know (correct me if I’m wrong), the NVIDIA RPM install does the blacklisting automatically.
You only blacklist nouveau by hand when you are going to use the .run installer from nvidia.
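Whether nouveau is actually loaded can be checked directly instead of guessed. A minimal sketch, assuming a POSIX shell with awk; the function name nouveau_loaded is hypothetical and only inspects the lsmod text it is given on stdin:

```shell
# Reads `lsmod` output on stdin and succeeds (exit 0) only if a module
# named exactly "nouveau" is listed in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage on a live system (not executed here):
#   if lsmod | nouveau_loaded; then echo "nouveau is loaded"; else echo "nouveau is not loaded"; fi
```

Matching on the first column avoids false positives from lines where nouveau only appears in the "used by" list of another module.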

Edit for additional info:
I am also using NVIDIA, but installed from the .run file.
This is the output of my dmesg

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.3.2-1-default root=UUID=94073f5d-491e-4bbb-b173-402858702b34 splash=silent resume=/dev/disk/by-uuid/91198234-5dd1-4cc8-9618-61cbbd296164 quiet security=apparmor loglevel=3 nvidia-drm.modeset=1 video=1920x1080x32 mitigations=auto
[    0.022336] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.3.2-1-default root=UUID=94073f5d-491e-4bbb-b173-402858702b34 splash=silent resume=/dev/disk/by-uuid/91198234-5dd1-4cc8-9618-61cbbd296164 quiet security=apparmor loglevel=3 nvidia-drm.modeset=1 video=1920x1080x32 mitigations=auto
[    3.673711] audit: type=1400 audit(1684770250.614:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=545 comm="apparmor_parser"
[    3.673716] audit: type=1400 audit(1684770250.614:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=545 comm="apparmor_parser"
[    4.649938] nvidia: loading out-of-tree module taints kernel.
[    4.649954] nvidia: module license 'NVIDIA' taints kernel.
[    4.663791] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    4.895314] nvidia-nvlink: Nvlink Core is being initialized, major device number 242
[    4.899768] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    4.956492] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  530.41.03  Thu Mar 16 19:23:04 UTC 2023
[    4.967038] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    5.986130] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0

Here is the output of lsmod

nvidia_drm             90112  4
nvidia_modeset       1290240  8 nvidia_drm
nvidia              55857152  366 nvidia_modeset
video                  73728  1 nvidia_modeset

Maybe someone can spot what you are missing by comparing our two results.

This is an RTX 3050

@sboehringer that looks like the open G06 driver installed (nvidia-open-driver-G06-signed-kmp-default)…
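The open module announces itself in the kernel log, as in the "NVRM: loading NVIDIA UNIX Open Kernel Module" line of the first dmesg excerpt. A minimal sketch of that check, assuming a POSIX shell with grep; the function name nvidia_module_flavor is hypothetical, and "unknown" only means the open-module banner was not found in the text supplied:

```shell
# Reads kernel log text on stdin. Prints "open" if the banner of the
# NVIDIA open kernel module is present, otherwise "unknown".
nvidia_module_flavor() {
    if grep -q 'NVIDIA UNIX Open Kernel Module'; then
        echo open
    else
        echo unknown
    fi
}

# Usage on a live system (not executed here; dmesg may need root):
#   dmesg | nvidia_module_flavor
```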

This is a hybrid system, so NVIDIA Prime Render Offload should work for your needs. Can you post the output of:

zypper se -si nvidia
Loading repository data...
Reading installed packages...

S  | Name                                      | Type    | Version                          | Arch   | Repository
---+-------------------------------------------+---------+----------------------------------+--------+------------------------
i  | kernel-firmware-nvidia                    | package | 20230427-1.1                     | noarch | openSUSE-Tumbleweed-Oss
i  | kernel-firmware-nvidia-gsp-G06            | package | 525.116.04-1.1                   | x86_64 | openSUSE-Tumbleweed-Oss
i  | libnvidia-egl-wayland1                    | package | 1.1.11-1.2                       | x86_64 | openSUSE-Tumbleweed-Oss
i  | nvidia-compute-G06                        | package | 525.116.04-8.1                   | x86_64 | NVIDIA
i  | nvidia-compute-G06-32bit                  | package | 525.116.04-8.1                   | x86_64 | NVIDIA
i  | nvidia-compute-utils-G06                  | package | 525.116.04-8.1                   | x86_64 | NVIDIA
i+ | nvidia-drivers-G06                        | package | 525.116.04-8.1                   | x86_64 | NVIDIA
i+ | nvidia-gfxG05-kmp-default                 | package | 530.30.02_k4.12.14_lp150.12.82-0 | x86_64 | cuda-opensuse15-x86_64
i+ | nvidia-gl-G06                             | package | 525.116.04-8.1                   | x86_64 | NVIDIA
i  | nvidia-gl-G06-32bit                       | package | 525.116.04-8.1                   | x86_64 | NVIDIA
i+ | nvidia-open-driver-G06-signed-kmp-default | package | 525.116.04_k6.3.2_1-1.3          | x86_64 | openSUSE-Tumbleweed-Oss
i  | nvidia-utils-G06                          | package | 525.116.04-8.1                   | x86_64 | NVIDIA
i+ | nvidia-video-G06                          | package | 525.116.04-8.1                   | x86_64 | NVIDIA
i  | nvidia-video-G06-32bit                    | package | 525.116.04-8.1                   | x86_64 | NVIDIA

Thank you. If I install via the .run installer, will the kernel drivers be updated with DKMS?

No. See SDB:NVIDIA the hard way - openSUSE Wiki

But what Malcom already guessed is right: you have the open driver installed. As the open driver doesn’t play well with all cards, you need to uninstall it and install the right one.
Uninstall the open driver (and the wrong G05 driver):

sudo zypper rm nvidia-open-driver-G06-signed-kmp-default 
sudo zypper rm nvidia-gfxG05-kmp-default

And directly install the right driver afterwards:
sudo zypper in nvidia-driver-G06-kmp-default

This is how it looks on a working setup:

ich@rennsemmel:~> zypper se -si nvidia
Repository-Daten werden geladen...
Installierte Pakete werden gelesen...

S  | Name                          | Type  | Version                 | Arch   | Repository
---+-------------------------------+-------+-------------------------+--------+------------------------
i+ | kernel-firmware-nvidia        | Paket | 20230427-1.1            | noarch | Haupt-Repository (OSS)
i  | libnvidia-egl-wayland1        | Paket | 1.1.11-1.2              | x86_64 | Haupt-Repository (OSS)
i+ | nvidia-compute-G06            | Paket | 525.116.04-8.1          | x86_64 | nVidia Graphics Drivers
i+ | nvidia-compute-G06-32bit      | Paket | 525.116.04-8.1          | x86_64 | nVidia Graphics Drivers
i  | nvidia-compute-utils-G06      | Paket | 525.116.04-8.1          | x86_64 | nVidia Graphics Drivers
i+ | nvidia-driver-G06-kmp-default | Paket | 525.116.04_k6.3.1_1-8.1 | x86_64 | nVidia Graphics Drivers
i+ | nvidia-gl-G06                 | Paket | 525.116.04-8.1          | x86_64 | nVidia Graphics Drivers
i+ | nvidia-gl-G06-32bit           | Paket | 525.116.04-8.1          | x86_64 | nVidia Graphics Drivers
i+ | nvidia-texture-tools          | Paket | 2.1.2-2.9               | x86_64 | Haupt-Repository (OSS)
i  | nvidia-utils-G06              | Paket | 525.116.04-8.1          | x86_64 | nVidia Graphics Drivers
i+ | nvidia-video-G06              | Paket | 525.116.04-8.1          | x86_64 | nVidia Graphics Drivers
i+ | nvidia-video-G06-32bit        | Paket | 525.116.04-8.1          | x86_64 | nVidia Graphics Drivers
ich@rennsemmel:~> 

Thank you, this solved the problem!

Unfortunately, a new problem came up. When trying to start TensorFlow, the CUDA libraries were missing. I remember installing them before, but I repeated the installation following the instructions given here:

The cuda meta-package (zypper in cuda) essentially undoes the modifications above and zypper se -si nvidia gives

Loading repository data...
Reading installed packages...

S | Name                           | Type    | Version                          | Arch   | Repository
--+--------------------------------+---------+----------------------------------+--------+------------------------
i | kernel-firmware-nvidia         | package | 20230427-1.1                     | noarch | openSUSE-Tumbleweed-Oss
i | kernel-firmware-nvidia-gsp-G06 | package | 525.116.04-1.1                   | x86_64 | openSUSE-Tumbleweed-Oss
i | libnvidia-egl-wayland1         | package | 1.1.11-1.2                       | x86_64 | openSUSE-Tumbleweed-Oss
i | nvidia-computeG05              | package | 530.30.02-0                      | x86_64 | cuda-opensuse15-x86_64
i | nvidia-gfxG05-kmp-default      | package | 530.30.02_k4.12.14_lp150.12.82-0 | x86_64 | cuda-opensuse15-x86_64
i | nvidia-glG05                   | package | 530.30.02-0                      | x86_64 | cuda-opensuse15-x86_64
i | x11-video-nvidiaG05            | package | 530.30.02-0                      | x86_64 | cuda-opensuse15-x86_64

These instructions are for 15.4, BTW. Would I have to switch to Leap to get TensorFlow working?

Thank you in advance.

Do you need the CUDA toolkit, or do you just want to have CUDA working? If it is the latter, remove the unneeded repo:
https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/x86_64/cuda-opensuse15.repo
This repo does not work for Tumbleweed!
(And please check first via yast2-software or zypper if you can already install packages with the existing repos and without adding additional external ones…)

And you have broken your driver installation again, as you now have a wild mix of G05, G06 and 530-series packages installed.
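A mixed installation like this can be spotted mechanically from the package table. A minimal sketch, assuming a POSIX shell with awk and a grep that supports -o and -E (GNU and BSD grep do); the function name nvidia_generations is hypothetical. It reads the output of zypper se -si nvidia on stdin and prints each driver generation tag found in the names of installed packages:

```shell
# Reads the table printed by `zypper se -si nvidia` on stdin.
# Keeps rows whose status column is "i" or "i+", takes the Name column,
# and prints the distinct generation tags (G05, G06, ...) found there.
nvidia_generations() {
    awk -F'|' '$1 ~ /^i[+]? *$/ { print $2 }' | grep -oE 'G0[0-9]' | sort -u
}

# Usage on a live system (not executed here):
#   zypper se -si nvidia | nvidia_generations
```

More than one line of output means packages from different generations are installed side by side. Only the Name column is inspected, so version strings in other columns do not affect the result.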

The CUDA libraries are included in the following packages:

zypper in nvidia-compute-G06
zypper in nvidia-compute-utils-G06 
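After installing, it can be verified that the dynamic linker actually sees the driver-side CUDA library. A minimal sketch, assuming a POSIX shell with grep and that the library in question is libcuda.so.1 (the driver library; which package ships it on this system is an assumption based on the package list above). The function name has_libcuda is hypothetical. TensorFlow additionally needs toolkit libraries in versions matching its build, and this check does not cover those:

```shell
# Reads `ldconfig -p` output on stdin and succeeds (exit 0) only if
# libcuda.so.1 appears in the linker cache listing.
has_libcuda() {
    grep -q 'libcuda\.so\.1 '
}

# Usage on a live system (not executed here):
#   if /sbin/ldconfig -p | has_libcuda; then echo "libcuda found"; else echo "libcuda missing"; fi
```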

Show again

zypper se -si nvidia

so that we can work again on fixing your driver mess…

Indeed, the install did not work; there are errors during the kernel module build: openSUSE Paste

However, TensorFlow has specific requirements for the CUDA version. After going back to the previous install, TensorFlow didn’t find the CUDA libraries. I see that there are instructions to build TensorFlow from source

and also a containerized version of TensorFlow from NVIDIA:
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel_21-03.html

I will have a look into these.

Thank you for your help, it is greatly appreciated.

DKMS is unreliable on my side. Sometimes it works, more often not.
Could be I did something wrong but I just quit using it. Installing the run file is not hard to do.

Edit: Also, even when DKMS works, it makes me wait longer than installing the .run file does.
