Hello everyone,
I have been facing this issue for a while now and finally decided to fix it.
I have two Tumbleweed machines: one that never had the NVIDIA driver installed manually, and one (my main machine) that had the driver installed in the past via the .run file provided by NVIDIA.
This was no big deal, since everything worked just fine on both machines:
Blender using CUDA in Cycles, and ffmpeg using NVENC.
However, on my main machine (the one that once had the .run-file driver), I can no longer use CUDA when running the repo driver.
With the .run-file driver everything just works, but I am a little tired of rebuilding the kernel module on every kernel update. It's Tumbleweed, so you can guess that happens quite often here.
I dug into the issue and checked everything I could think of.
Both machines have the same packages installed via YaST (including the NVIDIA compute package), and nvidia-smi reports:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:26:00.0  On |                  N/A |
| 23%   63C    P0    53W / 180W |    593MiB /  8116MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:27:00.0 Off |                  N/A |
|  0%   40C    P8     5W / 180W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2073      G   /usr/bin/X                                   209MiB |
|    0      2295      G   /usr/bin/kwin_x11                            112MiB |
|    0      2301      G   /usr/bin/plasmashell                         121MiB |
|    0      3134      G   /usr/lib64/firefox/firefox                     2MiB |
|    0      3191      G   /usr/lib64/firefox/firefox                   142MiB |
+-----------------------------------------------------------------------------+
So the driver reports CUDA 10.2.
However, every time I try to use NVENC or to render on the GPU in Blender (with the repo driver) on my main rig, it fails with CUDA initialization errors.
I also checked whether all symlinks in /usr/lib and /usr/lib64 are the same on both machines. They are.
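A further check along the same lines is to resolve each libcuda symlink and ask RPM which package owns the file it finally points to. The sketch below assumes bash and an RPM-based system such as Tumbleweed; the library paths are the ones that appear in the ldconfig listings, and the function names `owner_of` and `report_libs` are made up for this example. A file reported as not owned by any package would only be a candidate leftover from the .run installer, not proof that it causes the failure.

```shell
#!/usr/bin/env bash
# Sketch: for each libcuda file, print the resolved symlink target and the RPM
# package that owns it. Function names are hypothetical, chosen for this example.
shopt -s nullglob   # unmatched globs expand to nothing instead of themselves

owner_of() {
  local pkg
  # rpm -qf exits non-zero for files no package owns (or if rpm is unavailable).
  if pkg=$(rpm -qf "$1" 2>/dev/null); then
    echo "$pkg"
  else
    echo "not owned by any package"
  fi
}

report_libs() {
  local f target
  for f in "$@"; do
    target=$(readlink -f -- "$f")   # follow the whole symlink chain
    printf '%s -> %s [%s]\n' "$f" "$target" "$(owner_of "$target")"
  done
}

report_libs /usr/lib64/libcuda.so* /usr/lib/libcuda.so*
```

Running it on both machines and comparing the output shows whether the real libcuda files differ in version or ownership, which the symlink names alone do not reveal.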
Then, finally, I checked ldconfig and found some differences between the two machines.
The working installation:
sudo ldconfig -p | grep -i cuda
libicudata.so.66 (libc6,x86-64) => /usr/lib64/libicudata.so.66
libicudata.so.61.1 (libc6,x86-64) => /usr/lib64/libicudata.so.61.1
libicudata.so.suse62.1 (libc6,x86-64) => /usr/lib64/libicudata.so.suse62.1
libicudata.so.suse61.1 (libc6,x86-64) => /usr/lib64/libicudata.so.suse61.1
libicudata.so (libc6,x86-64) => /usr/lib64/libicudata.so
libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
libcuda.so.1 (libc6) => /usr/lib/libcuda.so.1
libcuda.so (libc6,x86-64) => /usr/lib64/libcuda.so
libcuda.so (libc6) => /usr/lib/libcuda.so
The broken installation:
sudo ldconfig -p | grep -i cuda
libicudata.so.66 (libc6,x86-64) => /usr/lib64/libicudata.so.66
libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
libcuda.so.1 (libc6) => /usr/lib/libcuda.so.1
libcuda.so (libc6,x86-64) => /usr/lib64/libcuda.so
libcuda.so (libc6) => /usr/lib/libcuda.so
So it looks like ldconfig got messed up by the NVIDIA installer, and installing the repo driver does not seem to fix it.
Can someone tell me how I can fix this myself?
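One caveat about the listings above: `grep -i cuda` is case-insensitive, so it also matches the `libicudata` files, which belong to the ICU Unicode library rather than the NVIDIA driver. Looking only at the `libcuda` lines, the two listings are identical; every difference is a `libicudata` entry. A stricter filter, sketched below in bash, keeps only NVIDIA's CUDA (`libcuda`), NVENC (`libnvidia-encode`) and NVDEC (`libnvcuvid`) entries before diffing two saved `ldconfig -p` outputs. The script name and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare only the NVIDIA driver entries of two saved `ldconfig -p` listings.
# A plain `grep -i cuda` also matches ICU's libicudata, so the pattern is anchored
# on the library name: libcuda (CUDA), libnvidia-encode (NVENC), libnvcuvid (NVDEC).
nvidia_libs() {
  grep -E '^[[:space:]]*lib(cuda|nvidia-encode|nvcuvid)\.so' | sort
}

# Usage (hypothetical names): bash compare.sh working.txt broken.txt
# where each file was produced with `ldconfig -p > FILE` on one machine.
if [ "$#" -eq 2 ]; then
  diff <(nvidia_libs < "$1") <(nvidia_libs < "$2") && echo "NVIDIA entries are identical"
fi
```

For example, save `ldconfig -p > working.txt` on one machine and `ldconfig -p > broken.txt` on the other, copy one file over, and run the script on both. If the NVIDIA entries match, the linker cache is probably not where the two machines differ.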