Cannot get CUDA to work

I installed the CUDA libraries on my system using the .run file for Leap 15, as specified on this thread. I downloaded the 12.4.0_550.54.14 version because my current driver is 550, then unmarked the driver and module installation since I already have them installed from the package manager. I then checked PATH and ldconfig and added the missing paths where needed, and also downloaded cuDNN and copied the libs. However, nothing can detect it.
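The PATH and library-path check described above can be sketched roughly as follows. This is only an illustration: the prefix /usr/local/cuda is an assumption (it is the default of the .run installer, but the post does not say which prefix was chosen), and list_contains is a hypothetical helper.

```shell
#!/usr/bin/bash
# Assumed install prefix: the .run installer defaults to /usr/local/cuda-<version>
# with a /usr/local/cuda symlink. Adjust this if another prefix was chosen.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Hypothetical helper: succeeds if directory $1 is an entry of the
# colon-separated list $2.
list_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if list_contains "$CUDA_HOME/bin" "$PATH"; then
    echo "PATH has $CUDA_HOME/bin"
else
    echo "PATH is missing $CUDA_HOME/bin"
fi

if list_contains "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}"; then
    echo "LD_LIBRARY_PATH has $CUDA_HOME/lib64"
else
    echo "LD_LIBRARY_PATH is missing $CUDA_HOME/lib64 (fine if ldconfig already knows it)"
fi
```

The match is on whole entries, so a directory such as /usr/local/cuda/binx would not be mistaken for /usr/local/cuda/bin.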

TensorFlow was giving me CUDA_ERROR_NOT_INITIALIZED, while Blender was saying I had no compatible GPU. As mentioned in this thread, the open kernel module has to be the same version as the CUDA component, so I uninstalled the open-source module and installed the old proprietary one. However, CUDA still doesn’t work after that. Now TensorFlow gives me CUDA_ERROR_UNKNOWN and Blender still says I have no compatible GPU.
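Since the question turns on which flavour of the kernel module is loaded, a quick check might look like this. It is only a sketch: module_flavor is a hypothetical helper, and the license strings it matches ("Dual MIT/GPL" for the open module, "NVIDIA" for the proprietary one) are an assumption worth verifying against the modinfo output on your own system.

```shell
#!/usr/bin/bash
# Hypothetical helper: classify the nvidia module from its license string.
# Assumption: the open kernel module reports "Dual MIT/GPL" and the
# proprietary one reports "NVIDIA".
module_flavor() {
    case "$1" in
        *MIT*|*GPL*) echo "open" ;;
        NVIDIA)      echo "proprietary" ;;
        *)           echo "unknown" ;;
    esac
}

# modinfo -F prints a single field; an empty result means the module was not found.
license="$(modinfo -F license nvidia 2>/dev/null || true)"
echo "nvidia kernel module: $(module_flavor "$license")"
```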

Is there something I’m missing?

My system specs:

 (⌚qui jun-6 9:38:57)-(🦊joseskvolpe:~)-( 304K:62)
$ neofetch
              _aaaymQQmwaaa,                 joseskvolpe@ProtoFOX 
          ,wWQQQD????????$QQQQa,.            -------------------- 
       _wQQB?"              ??QQQa,          OS: openSUSE Tumbleweed-Slowroll x86_64 
     sQQD^                      ?QQ6\        Host: Nitro AN515-47 V1.14 
    yWW'                          4QQg       Kernel: 6.9.5-1-default 
  ,QQD          .aaaaaaaa          ^4Q6      Uptime: 55 mins 
 ,mQP        _wWQW?????YWWQa,        4Qm     Packages: 3449 (rpm), 44 (flatpak) 
 jQ@        wWW?'        ^4QQc       ^$QL    Shell: bash 5.2.26 
,QQ'       jWW'            )QW\       ]QQ    Resolution: 1920x1080 
|QQ       ,QW'              ]QQ       ^QQ|   DE: Plasma 6.0.5 
|QQ       |QQ               ]QQ        QQ|   WM: kwin 
|QQ        4Qg              ]QQ       .QQ|   Theme: [Plasma], X-Vulpus-DarkRed [GTK2/3] 
'QQ6       '$WQac.         _QQ(       jQQ    Icons: [Plasma], Vulpinity [GTK2/3] 
 ]QQw        "?QWQQf      _mQP       ,QQ(    Terminal: yakuake 
  4QQga                  wQQP       ,mQ?     CPU: AMD Ryzen 5 7535HS with Radeon Graphics (12) @ 4.603GHz 
   4QQQga,            saQWP'       jQQf      GPU: AMD ATI Radeon 680M 
    ?QQQQQQwaaaaaaaayWWW?'       _mQ@'       GPU: NVIDIA GeForce RTX 3050 Mobile 
      ?WQQQP?9VWUV???^        _amQP^         Memory: 6362MiB / 15171MiB 
        "4QQQaa,          ,awQQQ?^
           "?VQQQQQQQQQQQQQQP?'                                      
                                                                     

nvidia-smi:

$ nvidia-smi
Thu Jun 20 21:39:21 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   45C    P8              8W /   60W |      34MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      4694      G   /usr/bin/kwin_wayland                           2MiB |
+-----------------------------------------------------------------------------------------+
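The driver series matches, but the exact version reported here (550.90.07) is not the one bundled in the .run file (550.54.14). A minimal sketch for comparing the two is below. version_ge is a hypothetical helper, and the fallback value is simply copied from the output above. As far as I know, a newer driver in the same series is acceptable to the toolkit, so treat this as a sanity check rather than a diagnosis.

```shell
#!/usr/bin/bash
# Hypothetical helper: succeeds if version $1 is greater than or equal to
# version $2 (relies on GNU sort -V for version ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Driver version bundled with the 12.4.0_550.54.14 .run file from the first post.
bundled="550.54.14"

# Ask the driver for its version; fall back to the value shown in the output
# above if nvidia-smi is not available where this runs.
installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1 || true)"
installed="${installed:-550.90.07}"

if version_ge "$installed" "$bundled"; then
    echo "driver $installed is at least $bundled"
else
    echo "driver $installed is older than $bundled"
fi
```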

@JoseskVolpe Probably the AMD GPU fighting with CUDA…

So, is amdgpu active? Is this a desktop or a laptop?

Can you post the output from inxi -GSaxxz?

Yes, it is, using Prime. It’s a laptop.

Sure.

 (⌚qui jun-6 9:40:04)-(🦊joseskvolpe:~)-( 304K:62)
$ inxi -GSaxxz
System:
  Kernel: 6.9.5-1-default arch: x86_64 bits: 64 compiler: gcc v: 13.3.0
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/vmlinuz-6.9.5-1-default
    root=/dev/mapper/OpenSUSE-SYSTEM splash=silent resume=/dev/OpenSUSE/SWAP
    quiet pcie_aspm=force acpi_backlight=native security=apparmor rd.shell=0
    mitigations=auto
  Desktop: KDE Plasma v: 6.0.5 tk: Qt v: N/A info: frameworks v: 6.2.0
    wm: kwin_wayland tools: avail: xscreensaver vt: 2 dm: SDDM Distro: openSUSE
    Tumbleweed-Slowroll 20240605
Graphics:
  Device-1: NVIDIA GA107M [GeForce RTX 3050 Mobile]
    vendor: Acer Incorporated ALI driver: nvidia v: 550.90.07
    alternate: nouveau,nvidia_drm non-free: 550.xx+ status: current (as of
    2024-04; EOL~2026-12-xx) arch: Ampere code: GAxxx process: TSMC n7 (7nm)
    built: 2020-2023 pcie: gen: 1 speed: 2.5 GT/s lanes: 8 link-max: gen: 4
    speed: 16 GT/s lanes: 16 ports: active: none off: HDMI-A-1 empty: none
    bus-ID: 01:00.0 chip-ID: 10de:25a2 class-ID: 0300
  Device-2: AMD Rembrandt [Radeon 680M] vendor: Acer Incorporated ALI
    driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm)
    built: 2020-22 pcie: gen: 4 speed: 16 GT/s lanes: 16 ports:
    active: DP-3,eDP-1 empty: DP-1, DP-2, DP-4, DP-5, DP-6, DP-7, DP-8,
    Writeback-1 bus-ID: 75:00.0 chip-ID: 1002:1681 class-ID: 0300 temp: 47.0 C
  Device-3: Chicony ACER HD User Facing driver: uvcvideo type: USB rev: 2.0
    speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 5-1:2 chip-ID: 04f2:b76f
    class-ID: fe01 serial: <filter>
  Display: wayland server: X.org v: 1.21.1.12 with: Xwayland v: 24.1.0
    compositor: kwin_wayland driver: X: loaded: modesetting dri: radeonsi
    gpu: nvidia,amdgpu d-rect: 5376x3024 display-ID: 0
  Monitor-1: DP-3 pos: primary,top-left res: 1920x1080 size: N/A modes: N/A
  Monitor-2: HDMI-A-1 pos: bottom-c res: 1920x1080 size: N/A modes: N/A
  Monitor-3: eDP-1 pos: middle-r res: 1536x864 size: N/A modes: N/A
  API: EGL v: 1.5 hw: drv: nvidia drv: amd radeonsi platforms: device: 0
    drv: nvidia device: 1 drv: radeonsi device: 3 drv: swrast surfaceless:
    drv: nvidia wayland: drv: radeonsi x11: drv: radeonsi
    inactive: gbm,device-2
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: amd mesa v: 24.0.8 glx-v: 1.4
    direct-render: yes renderer: AMD Radeon 660M (radeonsi rembrandt LLVM
    18.1.6 DRM 3.57 6.9.5-1-default) device-ID: 1002:1681 memory: 500 MiB
    unified: no display-ID: :0.0
  API: Vulkan v: 1.3.283 layers: 2 device: 0 type: integrated-gpu name: AMD
    Radeon 660M (RADV REMBRANDT) driver: N/A device-ID: 1002:1681
    surfaces: xcb,xlib,wayland device: 1 type: discrete-gpu name: NVIDIA
    GeForce RTX 3050 Laptop GPU driver: N/A device-ID: 10de:25a2
    surfaces: xcb,xlib,wayland

@JoseskVolpe So it could be a Wayland issue, could be an AMD issue, could be a suse-prime issue…

In the system BIOS, can you disable the iGPU device?

Could you try switcherooctl instead?

If you download the Blender benchmark tool 3.1.0 from <Index of /release/BlenderBenchmark2.0/launcher/>

Extract it and run it as your user, e.g.:

 ./benchmark-launcher-cli 
? Choose a Blender version: 4.0.0
> Will render scenes: monster, junkshop, classroom
? No files need to be downloaded, continue? Yes
? Choose a device:  [Use arrows to move, type to filter]
> Intel Xeon CPU E5-2695 v4 @ 2.10GHz
  NVIDIA T400

Do you see the Nvidia GPU?

Switch back to the AMD gpu, or select offload and test again.

No. There’s no option for that.

No, it always launches the programs on the AMD iGPU, and it can’t list the GPUs either.

No.

 (⌚qui jun-6 10:12:14)-(🦊joseskvolpe:~/Desktop)-( 1,5G:11)
$ ./benchmark-launcher-cli 
? Choose a Blender version: 4.1.0
> Will render scenes: monster, junkshop, classroom
? Download size will be 796 MB, continue? Yes
787725120 / 796518709 [----------------------------------------------------------------------------------------->] 98.90% 2800950 p/s
? Choose a device:  [Use arrows to move, type to filter]
> AMD Ryzen 5 7535HS with Radeon Graphics

So yeah, it’s probably something with Prime. The last time I changed it, I set Prime to boot in AMD mode to diagnose that apocalyptic Mesa disaster. Since my NVIDIA GPU was still working fine, I kept that setting in the hope it would save battery (maybe?). I’ll go back to offload mode and test again.
I’m currently using the proprietary module.

I reverted to offload mode and tried again, with the same results.

@JoseskVolpe then I would look at switcherooctl…

Check whether switcheroo-control is installed. If it is, first uninstall suse-prime, then enable the switcheroo service with systemctl enable --now switcheroo-control.service.

Then fire up YaST Bootloader and add nosimplefb=1 to the kernel options. For a one-off test, you can instead press the e key at the GRUB menu, add it to the end of the linux (or linuxefi) line, and press F10 to boot.
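After rebooting, it may be worth confirming that the parameter actually reached the kernel. A small sketch, where cmdline_has is a hypothetical helper that reads /proc/cmdline unless a command line is passed in explicitly:

```shell
#!/usr/bin/bash
# Hypothetical helper: succeeds if kernel parameter $1 appears as a whole word
# in the command line $2 (defaults to the running kernel's /proc/cmdline).
cmdline_has() {
    local param="$1"
    local cmdline="${2:-$(cat /proc/cmdline 2>/dev/null)}"
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

if cmdline_has "nosimplefb=1"; then
    echo "nosimplefb=1 is active for this boot"
else
    echo "nosimplefb=1 is not on the kernel command line"
fi
```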

All going well, after the reboot run switcherooctl list as your user and you should see something like:

switcherooctl list

Device: 0
  Name:        Intel Corporation DG2 [Arc A380]
  Default:     yes
  Environment: DRI_PRIME=pci-0000_04_00_0

Device: 1
  Name:        NVIDIA Corporation TU117GLM [Quadro T400 Mobile]
  Default:     no
  Environment: __GLX_VENDOR_LIBRARY_NAME=nvidia __NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only

switcherooctl now lists my GPUs

$ switcherooctl list
Device: 0
  Name:        Advanced Micro Devices, Inc. [AMD®/ATI] Rembrandt [Radeon 680M]
  Default:     yes
  Environment: DRI_PRIME=pci-0000_75_00_0

Device: 1
  Name:        NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile]
  Default:     no
  Environment: __GLX_VENDOR_LIBRARY_NAME=nvidia __NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only
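The Environment line is what matters for offloading: those variables have to be set for whatever program should run on the NVIDIA GPU. A rough sketch of pulling them out of the listing and applying them by hand is below. gpu_env is a hypothetical helper, and glxinfo -B is used only as an example command (it may not be installed).

```shell
#!/usr/bin/bash
# Hypothetical helper: read `switcherooctl list` output on stdin and print the
# Environment value of the first device whose Name line contains the text $1.
gpu_env() {
    awk -v pat="$1" '
        /^[ \t]*Name:/ { match_dev = (index($0, pat) > 0) }
        /^[ \t]*Environment:/ && match_dev {
            sub(/^[ \t]*Environment:[ \t]*/, "")
            print
            exit
        }
    '
}

# Only attempt the real call where switcherooctl exists.
if command -v switcherooctl >/dev/null 2>&1; then
    nv_env="$(switcherooctl list | gpu_env NVIDIA)"
    if [ -n "$nv_env" ]; then
        # Left unquoted on purpose so each VAR=value becomes its own argument to env.
        env $nv_env glxinfo -B || true
    fi
fi
```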

Blender also detected my NVIDIA GPU, but the benchmark failed. After that, it stopped detecting it.

$ ./benchmark-launcher-cli 
? Choose a Blender version: 4.1.0
> Will render scenes: monster, junkshop, classroom
? No files need to be downloaded, continue? Yes
? Choose a device: NVIDIA GeForce RTX 3050 Laptop GPU
? Start benchmarking? Yes
Warming up monster
ERROR: An unexpected error occurred. Run with '--verbosity 3' for detailed logs.
ERROR: Did not receive Benchmark JSON Data.

TensorFlow gives the same results.

@JoseskVolpe But is it still visible with switcherooctl and nvidia-smi?

Yes. It’s still visible.

@JoseskVolpe If you start Blender with switcherooctl /path/to/blender, does that work? Is this the openSUSE version of Blender?

No, it doesn’t, with either the Flatpak or the openSUSE version.

@JoseskVolpe What about the tarball version of Blender?

Can you also log out, switch to Xorg rather than Wayland, and test again…

I tested X11 and the tarball Blender; they don’t work either.

@JoseskVolpe Hmmm, in my past experience an AMD RX550 combined with the Quadro T400 did not work for me… so I switched to Intel/Nvidia or Nvidia/Nvidia, without issues.

So there are no BIOS settings regarding either GPU?

Oh, in the Flatpak version, can you add the environment variables (via Flatseal) __GLX_VENDOR_LIBRARY_NAME=nvidia __NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only or __NV_PRIME_RENDER_OFFLOAD=1 __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0 __GLX_VENDOR_LIBRARY_NAME=nvidia __VK_LAYER_NV_optimus=NVIDIA_only and see if that helps?
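For anyone without Flatseal, the same variables can probably be set from the command line with flatpak override. The sketch below only prints the command instead of running it. override_cmd is a hypothetical helper, and org.blender.Blender is assumed to be the app ID (as on Flathub); check flatpak list for the real one.

```shell
#!/usr/bin/bash
# Offload variables taken from the first set suggested above.
OFFLOAD_VARS="__GLX_VENDOR_LIBRARY_NAME=nvidia __NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only"

# Hypothetical helper: print the flatpak override command for app ID $1.
# It only prints; nothing changes until the printed command is run by hand.
override_cmd() {
    local app="$1" var
    printf 'flatpak override --user'
    for var in $OFFLOAD_VARS; do
        printf ' --env=%s' "$var"
    done
    printf ' %s\n' "$app"
}

# Assumed app ID; replace it with the one shown by `flatpak list`.
override_cmd org.blender.Blender
```

Printing rather than executing keeps the sketch harmless; the override can later be undone with flatpak override --user --reset followed by the app ID.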

Well, I hope there’s something to make it work with AMD/NVIDIA. Changing the GPU is not really an option for me unless I replace the whole motherboard lol

No, there isn’t. It’s a really simple BIOS. I guess it got the same firmware as the Aspire models. It’s a budget gaming laptop.

It didn’t work.

@JoseskVolpe I’m really not sure. If you run clinfo, what does that show? I have no experience with AMD/Nvidia laptops; I have an HP laptop with dual AMD GPUs that runs fine…

It shows nothing.

$ clinfo
Number of platforms                               0
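Zero platforms usually means the OpenCL ICD loader found no vendor entries at all, for NVIDIA or for AMD. A small sketch for listing what is registered is below. list_icds is a hypothetical helper, and /etc/OpenCL/vendors is the usual location on Linux, though a given distribution or package may place the files elsewhere.

```shell
#!/usr/bin/bash
# Hypothetical helper: list the ICD files in directory $1 together with the
# library each one names (the first line of the file).
list_icds() {
    local dir="$1" f
    for f in "$dir"/*.icd; do
        [ -e "$f" ] || continue   # the glob did not match anything
        printf '%s -> %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
}

# Usual directory read by the ICD loader; adjust if your packages differ.
icds="$(list_icds /etc/OpenCL/vendors)"
if [ -n "$icds" ]; then
    echo "$icds"
else
    echo "no OpenCL ICD files found; clinfo would report 0 platforms"
fi
```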

@JoseskVolpe And what about using switcherooctl clinfo?

@JoseskVolpe I also have a check script for CUDA:

#!/usr/bin/bash

# Search the ldconfig listing (plus every LD_LIBRARY_PATH directory) for a library name.
function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /g' <<< "$LD_LIBRARY_PATH") 2>/dev/null | grep "$1"; }
function check() { lib_installed "$1" && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }

check libcuda
check libcudart
check libnccl
check libcudnn

# Print the CUDA compiler version
nvcc -V