NVidia installer broke CUDA support for Repo driver

Hello everyone,

I have been facing this issue for a while now and have finally decided to fix it.

I have 2 Tumbleweed machines, one which never had the NVidia driver installed manually and one (my main machine) which had the driver installed via the run file provided by NVidia in the past.

No big deal, since everything worked just fine on both machines: Blender using CUDA in Cycles and ffmpeg using NVENC.

However on my main machine (which once had the driver installed via the run file) I am not able to use CUDA anymore if I run the repo driver.

If I run the run-file driver, everything just works. But I am a little bit tired of rebuilding the kernel module on each kernel update. It’s Tumbleweed, so you can guess that happens quite often here :smiley:

I dug into the issue and checked everything I could think of.

Both machines have the same packages installed via YaST (including the nvidia compute package), and nvidia-smi reports:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:26:00.0  On |                  N/A |
| 23%   63C    P0    53W / 180W |    593MiB /  8116MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:27:00.0 Off |                  N/A |
|  0%   40C    P8     5W / 180W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2073      G   /usr/bin/X                                   209MiB |
|    0      2295      G   /usr/bin/kwin_x11                            112MiB |
|    0      2301      G   /usr/bin/plasmashell                         121MiB |
|    0      3134      G   /usr/lib64/firefox/firefox                     2MiB |
|    0      3191      G   /usr/lib64/firefox/firefox                   142MiB |
+-----------------------------------------------------------------------------+


CUDA 10.2

However, every time I try to use NVENC or Blender to render on the GPU (repo driver) on my main rig, it just fails with CUDA initialization issues.

I also checked whether all symlinks in /etc/lib and /etc/lib64 are the same on both machines. They are.

Then, finally, I checked ldconfig and found some differences between the two machines.

The working installation:

sudo ldconfig -p | grep -i cuda
        libicudata.so.66 (libc6,x86-64) => /usr/lib64/libicudata.so.66
        libicudata.so.61.1 (libc6,x86-64) => /usr/lib64/libicudata.so.61.1
        libicudata.so.suse62.1 (libc6,x86-64) => /usr/lib64/libicudata.so.suse62.1
        libicudata.so.suse61.1 (libc6,x86-64) => /usr/lib64/libicudata.so.suse61.1
        libicudata.so (libc6,x86-64) => /usr/lib64/libicudata.so
        libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
        libcuda.so.1 (libc6) => /usr/lib/libcuda.so.1
        libcuda.so (libc6,x86-64) => /usr/lib64/libcuda.so
        libcuda.so (libc6) => /usr/lib/libcuda.so


The broken installation:

sudo ldconfig -p | grep -i cuda
        libicudata.so.66 (libc6,x86-64) => /usr/lib64/libicudata.so.66
        libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
        libcuda.so.1 (libc6) => /usr/lib/libcuda.so.1
        libcuda.so (libc6,x86-64) => /usr/lib64/libcuda.so
        libcuda.so (libc6) => /usr/lib/libcuda.so


So it looks like the ldconfig cache got messed up by the NVIDIA installer, and the repo packages do not seem to fix it.
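For a comparison that is not distracted by ICU's libicudata (which grep -i cuda also matches, although it is not a CUDA library), the cache listing could be narrowed down to the libcuda entries only. This is just a minimal sketch; cuda_entries is a hypothetical helper name and not part of any package.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` output on stdin and keep only the
# libcuda driver entries, dropping ICU's libicudata lines that a plain
# `grep -i cuda` would also match.
cuda_entries() {
    grep -E '^[[:space:]]*libcuda\.so' |   # only names starting with libcuda.so
        sed -E 's/^[[:space:]]+//' |       # strip the leading indentation
        sort                               # stable order so two machines can be diffed
}
```

Running ldconfig -p | cuda_entries > cuda-entries.txt on each machine and then diffing the two files should show whether the CUDA entries themselves really differ.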

Can someone tell me how I can fix this on my own?

Hi
I still install the hard way for the nvidia driver, cuda and cudnn. If I run the ldconfig command, I see way more cuda libs…


ldconfig -p | grep -i cuda |wc -l
60

I only use the CUDA cores here, but it all works in the likes of Blender etc. If you just run the ldconfig command as root to rebuild the cache, does that help? Are you up to date with zypper dup, e.g. the 5.6.11 kernel and the 20200508 release?
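After rebuilding the cache, it may also be worth checking which real file each libcuda entry ends up pointing to. The sketch below is only an illustration under that assumption; resolve_entries is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` output on stdin and, for every
# libcuda entry, print the cached path and the real file it resolves to.
resolve_entries() {
    grep -E '^[[:space:]]*libcuda\.so' |
        awk -F' => ' '{ print $2 }' |      # keep the path after "=>"
        sort -u |
        while IFS= read -r path; do
            # readlink -f follows the whole symlink chain to the final file
            printf '%s -> %s\n' "$path" "$(readlink -f "$path")"
        done
}
```

As root, run ldconfig first and then ldconfig -p | resolve_entries. Passing each resolved file to rpm -qf shows which package owns it; a file that rpm reports as not owned by any package might be a leftover from the run-file installer.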