I am trying to get an RTX 3070 working under Tumbleweed. This is for compute use only.
I have added the nvidia tumbleweed repo: Index of /opensuse/tumbleweed
and installed the G06 drivers. On the first reboot the screen remained black (the monitors are connected to the Intel graphics), but on the second reboot X11 came up. The drivers did load, but dmesg shows an error:
[ 32.029125] nvidia: loading out-of-tree module taints kernel.
[ 32.049323] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[ 32.049849] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 32.091533] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 525.116.04 Release Build (abuild@host) Tue May 9 00:00:00 UTC 2023
[ 32.168083] nvidia-uvm: Loaded the UVM driver, major device number 511.
[ 32.193663] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 525.116.04 Release Build (abuild@host) Tue May 9 00:00:00 UTC 2023
[ 32.195316] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 34.049250] NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.
[ 34.049256] NVRM: To force use of Open nvidia.ko on other GPUs, see the
[ 34.293904] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ 34.294040] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
Solutions suggested revolve around switching to the nvidia drivers. Do I have to blacklist the nouveau driver?
As far as I know (correct me if I’m wrong), the nvidia install rpm does the blacklisting automatically.
You only blacklist nouveau by hand when you are going to use the .run installer from nvidia.
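If you want to verify rather than trust the package, a quick check along these lines may help. This is only a sketch: the function name nouveau_blacklist_files is made up for illustration, and the default directories are the usual modprobe.d locations, which I am assuming apply to your install.

```shell
# Sketch: print the modprobe config files that blacklist nouveau.
# nouveau_blacklist_files is a hypothetical helper name.
nouveau_blacklist_files() {
    # Assumed default search path; pass other directories as arguments.
    [ "$#" -gt 0 ] || set -- /etc/modprobe.d /usr/lib/modprobe.d /run/modprobe.d
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -l prints only the names of files containing a matching line.
        grep -lE '^[[:space:]]*blacklist[[:space:]]+nouveau' "$dir"/*.conf 2>/dev/null
    done
    return 0
}
```

Calling nouveau_blacklist_files with no arguments searches the default directories. Empty output would mean no blacklist entry was found there.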
Edit for additional info:
I am using nvidia also but this is the .run file.
This is my lsmod output:
nvidia_drm 90112 4
nvidia_modeset 1290240 8 nvidia_drm
nvidia 55857152 366 nvidia_modeset
video 73728 1 nvidia_modeset
Maybe someone can comment on what you are missing by comparing our two results.
But what Malcom already guessed is right: you have the open driver installed. The open driver doesn’t play well with all cards, so you need to uninstall it and install the right one.
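The dmesg output earlier in this thread already shows which flavor loaded ("NVRM: loading NVIDIA UNIX Open Kernel Module"). A rough way to check for that line after a reinstall is sketched below; nvidia_module_flavor is a made-up helper name, and it only looks for that one message, so "other" can mean either the proprietary module or no nvidia module at all.

```shell
# Sketch: read dmesg text on stdin and report whether the open
# NVIDIA kernel module announced itself.
# nvidia_module_flavor is a hypothetical helper name.
nvidia_module_flavor() {
    if grep -q 'NVRM: loading NVIDIA UNIX Open Kernel Module'; then
        echo open
    else
        # Either the proprietary module or no nvidia module was loaded.
        echo other
    fi
}
```

Usage would be something like piping dmesg into nvidia_module_flavor (dmesg may need root on some setups).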
Uninstall the open driver (and the wrong G05 driver):
Unfortunately, a new problem came up. When trying to start tensorflow, cuda libraries were missing. I remember installing them before, but I repeated according to instructions given here:
The cuda meta-package (zypper in cuda) essentially undoes the modifications above, and zypper se -si nvidia now gives:
Loading repository data...
Reading installed packages...
S | Name | Type | Version | Arch | Repository
--+--------------------------------+---------+----------------------------------+--------+------------------------
i | kernel-firmware-nvidia | package | 20230427-1.1 | noarch | openSUSE-Tumbleweed-Oss
i | kernel-firmware-nvidia-gsp-G06 | package | 525.116.04-1.1 | x86_64 | openSUSE-Tumbleweed-Oss
i | libnvidia-egl-wayland1 | package | 1.1.11-1.2 | x86_64 | openSUSE-Tumbleweed-Oss
i | nvidia-computeG05 | package | 530.30.02-0 | x86_64 | cuda-opensuse15-x86_64
i | nvidia-gfxG05-kmp-default | package | 530.30.02_k4.12.14_lp150.12.82-0 | x86_64 | cuda-opensuse15-x86_64
i | nvidia-glG05 | package | 530.30.02-0 | x86_64 | cuda-opensuse15-x86_64
i | x11-video-nvidiaG05 | package | 530.30.02-0 | x86_64 | cuda-opensuse15-x86_64
These instructions are for 15.4, BTW. Would I have to switch to Leap to get tensorflow working?
Do you need the cuda toolkit or do you want to have cuda working? If it is the latter, remove the unneeded repo: https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/x86_64/cuda-opensuse15.repo
This repo does not work for Tumbleweed!
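One visible sign of the mismatch is the nvidia-gfxG05-kmp-default version in the listing: 530.30.02_k4.12.14_lp150.12.82-0. Assuming the usual KMP naming convention, the part after "_k" is the kernel the module was built against, here a 4.12.14 kernel, which is far older than anything Tumbleweed runs. A small sketch to pull that part out (kmp_kernel_version is a made-up helper name):

```shell
# Sketch: print the kernel version embedded in a KMP package version.
# Assumes the "<driver>_k<kernel>-<release>" naming convention.
# kmp_kernel_version is a hypothetical helper name.
kmp_kernel_version() {
    # Keep what follows "_k", then drop the trailing "-<release>" part.
    printf '%s\n' "$1" | sed -n 's/^.*_k\(.*\)-[^-]*$/\1/p'
}
```

Compare the result with what uname -r reports on the running system.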
(And please check first via yast2-software or zypper if you can already install packages with the existing repos and without adding additional external ones…)
And you have broken your driver installation again, as you now have a wild mix of G05, G06 and 530 series packages installed.
The cuda libraries are included in the following packages:
zypper in nvidia-compute-G06
zypper in nvidia-compute-utils-G06
Show again
zypper se -si nvidia
so that we can work again on fixing your driver mess…
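To spot a mixed installation quickly, the table that zypper se -si nvidia prints can be filtered for the driver generation tags. This is a sketch only: nvidia_generations is a made-up helper name, and it assumes the pipe-separated table layout shown earlier in this thread.

```shell
# Sketch: read "zypper se -si nvidia" output on stdin and list the
# distinct driver generations (G05, G06, ...) among installed packages.
# nvidia_generations is a hypothetical helper name.
nvidia_generations() {
    # Column 1 is the status flag, column 2 the package name.
    awk -F'|' '$1 ~ /^i/ { print $2 }' |
        grep -oE 'G0[0-9]' |
        sort -u
}
```

Pipe the zypper output into nvidia_generations; more than one line of output means packages from different generations are installed side by side.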
Indeed, the install did not work; there were errors during the kernel module build: openSUSE Paste
However, tensorflow has specific requirements for the cuda version. After going back to the previous install, tensorflow didn’t find the cuda libraries. I see that there are both instructions to build tensorflow from source
DKMS is unreliable here on my side. Sometimes it works, but more often it does not.
It could be that I did something wrong, but I just quit using it. Installing the .run file is not hard to do.
Edit: Also, even when DKMS works, it makes me wait longer than installing the .run file.