Sorry for the swearing, but HOLY ■■■■, I GOT IT WORKING OMFGGGGGG (I’M SO HAPPY //>w<// )
Ok so… I went into YaST and removed EVERYTHING related to NVIDIA, every fricking thing. There’s no need to remove the NVIDIA repository. Reboot.
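Before the YaST cleanup, it can help to see what there actually is to remove. A read-only sketch, assuming the standard rpm and zypper tools on Tumbleweed; filter_nvidia is a hypothetical helper name, and nothing here uninstalls anything.

```shell
#!/usr/bin/env bash
# List installed NVIDIA-related packages and repos (read-only sketch;
# it removes nothing, so the actual removal still happens in YaST).

# filter_nvidia (hypothetical helper): keep lines mentioning nvidia,
# case-insensitively, sorted with duplicates dropped.
filter_nvidia() {
    grep -i 'nvidia' | sort -u
}

main() {
    if command -v rpm >/dev/null 2>&1; then
        echo "== Installed NVIDIA packages =="
        rpm -qa --qf '%{NAME}\n' | filter_nvidia
    fi
    if command -v zypper >/dev/null 2>&1; then
        echo "== Repositories mentioning NVIDIA (these can stay) =="
        zypper lr | grep -i 'nvidia' || echo "(none)"
    fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```

Everything listed under the first heading is a candidate for removal; the repository entries can stay.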
Next, I installed the drivers with the recommended command from the wiki, plus the driver and firmware packages:
sudo zypper install-new-recommends --repo NVIDIA:repo-non-free
sudo zypper in nvidia-drivers-G06 kernel-firmware-nvidia-gspx-G06
Reboot and enroll the keys.
Done. That’s it. No .run file needed.
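To confirm the install step took, a small check that both packages from the zypper command above are installed and whether Secure Boot is on (key enrollment only matters when Secure Boot is enabled, since that is when unsigned modules are refused). A sketch; missing_pkgs is a hypothetical helper, and the package names are the ones from the commands above.

```shell
#!/usr/bin/env bash
# Post-install sanity check (sketch): are the packages from the install
# step present, and is Secure Boot on?
REQUIRED_PKGS=(nvidia-drivers-G06 kernel-firmware-nvidia-gspx-G06)

# missing_pkgs (hypothetical helper): read installed package names on
# stdin and print each required package that is not among them.
missing_pkgs() {
    local installed pkg
    installed="$(cat)"
    for pkg in "${REQUIRED_PKGS[@]}"; do
        if ! grep -qxF "$pkg" <<<"$installed"; then
            echo "$pkg"
        fi
    done
}

main() {
    if command -v rpm >/dev/null 2>&1; then
        local missing
        missing="$(rpm -qa --qf '%{NAME}\n' | missing_pkgs)"
        if [[ -z "$missing" ]]; then
            echo "All required NVIDIA packages are installed."
        else
            echo "Missing packages:"
            echo "$missing"
        fi
    fi
    if command -v mokutil >/dev/null 2>&1; then
        # Prints "SecureBoot enabled" or "SecureBoot disabled".
        mokutil --sb-state
    fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```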
I don’t know why that worked. But for the record, before that I had tried importing the Leap repository and installing the drivers and CUDA libraries while ignoring file conflicts, without success, so I later uninstalled everything and removed the repositories.
FINALLY AAAAAAAAA
OptiX is also detected (but Blender crashes with it, same as on Windows).
Now AMD HIP is next…
Awww c’mon, I had to reboot because of the suspend glitch the NVIDIA driver has (it freezes the system), and now it isn’t working anymore. WHY?
@JoseskVolpe Check the output of
journalctl -b
Are you running Xorg or Wayland? Did something like suse-prime get installed?
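The full boot journal is long, so a filter for GPU-related lines makes it easier to spot anything relevant. A sketch, assuming the NVIDIA kernel module's usual "NVRM:" message prefix; gpu_lines is a hypothetical helper. Reading the whole system journal may need sudo or membership in the systemd-journal group.

```shell
#!/usr/bin/env bash
# Narrow the current boot's journal to GPU-related lines (sketch).

# gpu_lines (hypothetical helper): keep lines mentioning nvidia, NVRM
# (the NVIDIA kernel module's log prefix), cuda or nouveau, any case.
gpu_lines() {
    grep -iE 'nvidia|nvrm|cuda|nouveau'
}

main() {
    if command -v journalctl >/dev/null 2>&1; then
        # -b = current boot only; --no-pager = plain output for the pipe.
        journalctl -b --no-pager | gpu_lines || echo "(no GPU-related lines)"
    fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```

Adding -k to the journalctl call restricts it to kernel messages, which is where driver load and suspend/resume errors usually appear.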
There’s nothing relevant to NVIDIA or CUDA.
Wayland, and when it was working I was using Wayland as well.
No.
But c’mon, it was working TwT
@JoseskVolpe So is the nvidia driver loaded?
/sbin/lspci -nnk | grep -EA3 "VGA|Display|3D"
Has nouveau been blacklisted?
/sbin/modprobe -c | grep -E "blacklist nouveau"
Does the nvidia driver match the running kernel?
/sbin/modinfo nvidia | grep filename
uname -a
Check that Wayland is actually running:
echo $XDG_SESSION_TYPE
Check that suse-prime is not installed (it has a habit of showing up on dual-graphics setups…):
zypper se suse-prime
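The five checks above can be run in one go. This sketch uses the same commands, plus a direct comparison of the kernel release the nvidia module was installed for against uname -r instead of eyeballing the two. module_kernel is a hypothetical helper, and it assumes the module lives under a .../lib/modules/<release>/ path, which is the usual layout.

```shell
#!/usr/bin/env bash
# Run the driver checks in one go (sketch). lspci/modprobe/modinfo
# live in /sbin on openSUSE, which may not be on a normal user's PATH.
PATH="$PATH:/sbin:/usr/sbin"

# module_kernel (hypothetical helper): from a module path such as
# /lib/modules/6.9.5-1-default/updates/nvidia.ko, print the kernel
# release directory it was installed for.
module_kernel() {
    sed -n 's|.*/lib/modules/\([^/]*\)/.*|\1|p' <<<"$1"
}

main() {
    echo "== GPUs and bound drivers =="
    lspci -nnk | grep -EA3 "VGA|Display|3D" || echo "(no match)"
    echo "== nouveau blacklist =="
    modprobe -c | grep -E "blacklist nouveau" || echo "(not blacklisted)"
    echo "== nvidia module vs running kernel =="
    local mod_rel
    # modinfo -n prints only the module's file name.
    mod_rel="$(module_kernel "$(modinfo -n nvidia 2>/dev/null)")"
    if [[ "$mod_rel" == "$(uname -r)" ]]; then
        echo "OK: nvidia module matches $(uname -r)"
    else
        echo "MISMATCH: module for '${mod_rel:-none}', running $(uname -r)"
    fi
    echo "== session type =="
    echo "${XDG_SESSION_TYPE:-unset}"
    echo "== suse-prime =="
    zypper se suse-prime || echo "(not found)"
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```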
So you went into Windows and tested Blender, it crashed, then you booted back into Tumbleweed and NVIDIA was not working?
Yes
Yes
Yes
Yes
Not installed
I got it working in Tumbleweed, switched to OptiX, Blender crashed, so I switched back to CUDA. I unplugged the laptop to move it, but it had frozen while suspending after I closed the lid, so I had to reboot (the NVIDIA driver and its power management are crappy). I plugged it back in and CUDA wasn’t working anymore.
@JoseskVolpe Any response on your NVIDIA thread? Maybe set persistence mode, or add the power options?
cat /etc/systemd/system/nvidia-persistence.service
# /etc/systemd/system/nvidia-persistence.service
#
[Unit]
Description=Systemd service for enabling persistence mode on gpus
[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced --user <your_username>
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced
[Install]
WantedBy=multi-user.target
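To try the unit above for one session without making it permanent, it only needs a daemon-reload and a start; enable would also start it on every boot, which is where the battery cost comes in. A sketch, assuming the file was saved at the path shown with <your_username> filled in; persistence_on is a hypothetical helper that reads nvidia-smi's persistence_mode query column.

```shell
#!/usr/bin/env bash
# Try the persistence unit for this session only (sketch).
UNIT=nvidia-persistence.service

# persistence_on (hypothetical helper): read the persistence_mode
# column (one line per GPU) on stdin; succeed only if every GPU
# reports Enabled.
persistence_on() {
    local line seen=0
    while IFS= read -r line; do
        line="${line//[[:space:]]/}"   # drop stray spaces / CR
        [[ -n "$line" ]] || continue
        seen=1
        [[ "$line" == "Enabled" ]] || return 1
    done
    (( seen ))
}

main() {
    command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found"; return 0; }
    sudo systemctl daemon-reload       # pick up the new unit file
    sudo systemctl start "$UNIT"       # now only; 'enable' would add every boot
    if nvidia-smi --query-gpu=persistence_mode --format=csv,noheader |
        persistence_on; then
        echo "Persistence mode: on (stop the service with: sudo systemctl stop $UNIT)"
    else
        echo "Persistence mode: off; check: systemctl status $UNIT"
    fi
}

# Changes system state, so only act when asked explicitly.
if [[ "${1:-}" == "--apply" ]]; then
    main
else
    echo "usage: $0 --apply"
fi
```

Saved as, say, persistence-test.sh (hypothetical name), it is run as bash persistence-test.sh --apply.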
Yes, they asked me to send debug logs.
Isn’t persistence mode bad for power efficiency? Like, it’ll quickly drain my battery even when I’m not using the GPU. Is it mandatory for CUDA?
@JoseskVolpe No, just run it to test and see if that helps… you can stop and start it as required…
I enabled and started it, and it didn’t work, so I rebooted, and now it works. Something seems to be blocking CUDA from starting; that’s why it wasn’t working.
I won’t consider it solved, since I’d have to keep persistence mode enabled and that has a power cost.
When the GPU is not being used for any purpose (i.e. it is idle, technically: no contexts of any kind are instantiated on the GPU) and persistence mode is not enabled, the GPU, in concert with the GPU driver, will automatically reduce its power state to a very low level, sometimes including a complete power-off scenario.
What does "persistence mode" actually do which reduces CUDA startup time? - Stack Overflow
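The quote above says an idle GPU without persistence mode drops to a very low power state. On Linux, the kernel's runtime power management reports whether that actually happened through sysfs. A sketch, assuming the dGPU sits at PCI address 0000:01:00.0 (bus ID 01:00.0 is from the inxi output in this thread; the 0000: domain is an assumption, though it is the usual one on laptops). runtime_state is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Is the dGPU actually powered down while idle? (sketch)
GPU_SYSFS=/sys/bus/pci/devices/0000:01:00.0

# runtime_state (hypothetical helper): print the kernel's runtime-PM
# status for a PCI device directory ("active", "suspended", ...),
# or "unknown" if the file is not readable.
runtime_state() {
    local f="$1/power/runtime_status"
    if [[ -r "$f" ]]; then
        cat "$f"
    else
        echo "unknown"
    fi
}

main() {
    echo "runtime_status: $(runtime_state "$GPU_SYSFS")"
    # "auto" means runtime PM is allowed; "on" keeps the device powered.
    if [[ -r "$GPU_SYSFS/power/control" ]]; then
        echo "control: $(cat "$GPU_SYSFS/power/control")"
    fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```

Reading these files is preferable to running nvidia-smi when judging idle behaviour, since nvidia-smi can wake the GPU just to answer the query.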
Ok, that’s weird. I was testing in Blender, but then I suddenly noticed it had stopped using the GPU and was stressing the CPU instead, so I restarted Blender and it wasn’t detecting CUDA anymore.
It seems like the CUDA server is also crashing.
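One thing worth checking when CUDA disappears but the desktop keeps running: CUDA apps rely on more than the display driver. Assumption, not confirmed in this thread: they also need the nvidia_uvm module and the /dev/nvidia-uvm device node, and these can go missing after a failed suspend/resume. A sketch; missing_nodes is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Check the runtime pieces CUDA needs besides the display driver (sketch).

# missing_nodes (hypothetical helper): print each expected NVIDIA
# device node that is absent from DIR.
missing_nodes() {
    local dir="$1" node
    for node in nvidiactl nvidia0 nvidia-uvm; do
        [[ -e "$dir/$node" ]] || echo "$node"
    done
}

main() {
    local missing
    missing="$(missing_nodes /dev)"
    if [[ -n "$missing" ]]; then
        echo "Missing device nodes:"
        echo "$missing"
    else
        echo "All expected /dev/nvidia* nodes are present."
    fi
    if [[ -d /sys/module/nvidia_uvm ]]; then
        echo "nvidia_uvm is loaded."
    else
        echo "nvidia_uvm is NOT loaded (sudo modprobe nvidia_uvm would load it)."
    fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```

Run it right after Blender fails to find CUDA: before any CUDA app has started, /dev/nvidia-uvm may legitimately not exist yet, so its absence alone is not conclusive.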
@JoseskVolpe Are you sure it’s not a real hardware issue? I’ve never had CUDA stop working, although there were issues with Blender and my AMD/NVIDIA setup…
If you install nvtop and monitor the GPUs in a separate terminal, you can see what is happening. Is it overheating?
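A text-mode companion to nvtop, useful for leaving a log running while Blender renders. A sketch: temperature.gpu, utilization.gpu and pstate are standard nvidia-smi query fields, but the 85 °C warning mark is an arbitrary assumption (not an NVIDIA limit), and it assumes a single NVIDIA GPU. Note that polling keeps the dGPU awake. check_sample is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sample temperature, utilisation and performance state every 2 s (sketch).
WARN_AT=85
SAMPLES="${SAMPLES:-30}"

# check_sample (hypothetical helper): format one "temp, util, pstate"
# line (csv,noheader,nounits) and flag it with WARN at or above WARN_AT.
check_sample() {
    local temp util pstate
    IFS=', ' read -r temp util pstate <<<"$1"
    if ! [[ "$temp" =~ ^[0-9]+$ ]]; then
        echo "no reading: $1"
    elif (( temp >= WARN_AT )); then
        echo "WARN ${temp}C util=${util}% ${pstate}"
    else
        echo "ok   ${temp}C util=${util}% ${pstate}"
    fi
}

main() {
    command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found"; return 0; }
    local i
    for (( i = 0; i < SAMPLES; i++ )); do
        check_sample "$(nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,pstate \
            --format=csv,noheader,nounits)"
        sleep 2
    done
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```

The sample count is bounded on purpose; SAMPLES=300 bash script.sh (hypothetical file name) gives a ten-minute log.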
Rebooted, and CUDA isn’t working anymore even though persistence mode is enabled.
It seems to be a software issue. The temperature right now is 42°C but CUDA still doesn’t work, and it never went above 86°C. Also, CUDA never stopped working while testing on Windows.
@JoseskVolpe If you run the test script to check all the CUDA components, is that working?
What does inxi -GSaz show?
Yes
$ ./detect_cuda.sh
libcudart.so.12 -> libcudart.so.12.4.99
libcuda.so.1 -> libcuda.so.550.90.07
libcudadebugger.so.1 -> libcudadebugger.so.550.90.07
libcuda.so.1 -> libcuda.so.550.90.07
libcuda is installed
libcudart.so.12 -> libcudart.so.12.4.99
libcudart is installed
ERROR: libnccl is NOT installed
libcudnn.so.9 -> libcudnn.so.9.2.0
libcudnn_ops.so.9 -> libcudnn_ops.so.9.2.0
libcudnn_heuristic.so.9 -> libcudnn_heuristic.so.9.2.0
libcudnn_graph.so.9 -> libcudnn_graph.so.9.2.0
libcudnn_engines_runtime_compiled.so.9 -> libcudnn_engines_runtime_compiled.so.9.2.0
libcudnn_engines_precompiled.so.9 -> libcudnn_engines_precompiled.so.9.2.0
libcudnn_cnn.so.9 -> libcudnn_cnn.so.9.2.0
libcudnn_adv.so.9 -> libcudnn_adv.so.9.2.0
libcudnn is installed
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
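For comparison, a hypothetical library check in the spirit of detect_cuda.sh (that script isn't shown in this thread; this is not it). It asks the dynamic linker cache which sonames resolve. The sonames come from the output above, except libnccl.so.2, which is assumed. NCCL is NVIDIA's multi-GPU communication library; as far as I know Blender's Cycles doesn't use it, so that ERROR line is probably harmless here. has_lib is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Check which CUDA-related sonames the linker cache can resolve (sketch).
PATH="$PATH:/sbin:/usr/sbin"   # ldconfig lives in /sbin on openSUSE
LIBS=(libcuda.so.1 libcudart.so.12 libcudnn.so.9 libnccl.so.2)

# has_lib (hypothetical helper): succeed if SONAME is the first field
# of any line of `ldconfig -p` output read on stdin.
has_lib() {
    awk -v lib="$1" '$1 == lib { found = 1 } END { exit !found }'
}

main() {
    local cache lib
    cache="$(ldconfig -p 2>/dev/null)"
    for lib in "${LIBS[@]}"; do
        if has_lib "$lib" <<<"$cache"; then
            echo "found:   $lib"
        else
            echo "MISSING: $lib"
        fi
    done
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```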
$ inxi -GSaz
System:
Kernel: 6.9.5-1-default arch: x86_64 bits: 64 compiler: gcc v: 13.3.0
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/vmlinuz-6.9.5-1-default
root=/dev/mapper/OpenSUSE-SYSTEM splash=silent resume=/dev/OpenSUSE/SWAP
quiet pcie_aspm=force acpi_backlight=native security=apparmor rd.shell=0
mitigations=auto
Desktop: KDE Plasma v: 6.0.5 tk: Qt v: N/A info: frameworks v: 6.2.0
wm: kwin_wayland tools: avail: xscreensaver vt: 2 dm: SDDM Distro: openSUSE
Tumbleweed-Slowroll 20240605
Graphics:
Device-1: NVIDIA GA107M [GeForce RTX 3050 Mobile]
vendor: Acer Incorporated ALI driver: nvidia v: 550.90.07
alternate: nouveau,nvidia_drm non-free: 550.xx+ status: current (as of
2024-04; EOL~2026-12-xx) arch: Ampere code: GAxxx process: TSMC n7 (7nm)
built: 2020-2023 pcie: gen: 1 speed: 2.5 GT/s lanes: 8 link-max: gen: 4
speed: 16 GT/s lanes: 16 ports: active: none empty: HDMI-A-1
bus-ID: 01:00.0 chip-ID: 10de:25a2 class-ID: 0300
Device-2: AMD Rembrandt [Radeon 680M] vendor: Acer Incorporated ALI
driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm)
built: 2020-22 pcie: gen: 4 speed: 16 GT/s lanes: 16 ports: active: eDP-1
empty: DP-1, DP-2, DP-3, DP-4, DP-5, DP-6, DP-7, DP-8, Writeback-1
bus-ID: 75:00.0 chip-ID: 1002:1681 class-ID: 0300 temp: 43.0 C
Device-3: Chicony ACER HD User Facing driver: uvcvideo type: USB rev: 2.0
speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 5-1:2 chip-ID: 04f2:b76f
class-ID: fe01 serial: <filter>
Display: wayland server: X.org v: 1.21.1.12 with: Xwayland v: 24.1.0
compositor: kwin_wayland driver: X: loaded: amdgpu,nvidia
unloaded: fbdev,modesetting,vesa alternate: nouveau,nv dri: radeonsi
gpu: nvidia,amdgpu display-ID: 0
Monitor-1: eDP-1 res: 1536x864 size: N/A modes: N/A
API: EGL v: 1.5 hw: drv: nvidia drv: amd radeonsi platforms: device: 0
drv: nvidia device: 1 drv: radeonsi device: 3 drv: swrast surfaceless:
drv: nvidia wayland: drv: radeonsi x11: drv: radeonsi
inactive: gbm,device-2
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: amd mesa v: 24.0.8 glx-v: 1.4
direct-render: yes renderer: AMD Radeon 660M (radeonsi rembrandt LLVM
18.1.6 DRM 3.57 6.9.5-1-default) device-ID: 1002:1681 memory: 500 MiB
unified: no display-ID: :0.0
API: Vulkan v: 1.3.283 layers: 2 device: 0 type: integrated-gpu name: AMD
Radeon 660M (RADV REMBRANDT) driver: N/A device-ID: 1002:1681
surfaces: xcb,xlib,wayland device: 1 type: discrete-gpu name: NVIDIA
GeForce RTX 3050 Laptop GPU driver: N/A device-ID: 10de:25a2
surfaces: xcb,xlib,wayland
@JoseskVolpe And what’s the reason for the kernel option pcie_aspm=force? On hardware that doesn’t support ASPM, forcing it can cause the system to stop responding…
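To confirm what the running kernel actually got, the command line and the active ASPM policy can be read back. A read-only sketch; kernel_param is a hypothetical helper, and /sys/module/pcie_aspm/parameters/policy shows the active policy as the bracketed entry.

```shell
#!/usr/bin/env bash
# Show whether pcie_aspm is on the running kernel's command line and
# which ASPM policy is active (read-only sketch).

# kernel_param (hypothetical helper): print the value of NAME in a
# kernel command line, "(flag)" if NAME appears without a value;
# return 1 if it is absent.
kernel_param() {
    local name="$1" word
    local -a words
    read -ra words <<<"$2"
    for word in "${words[@]}"; do
        case "$word" in
            "$name") echo "(flag)"; return 0 ;;
            "$name"=*) echo "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

main() {
    local value
    if value="$(kernel_param pcie_aspm "$(cat /proc/cmdline 2>/dev/null)")"; then
        echo "pcie_aspm=$value is set on this boot"
    else
        echo "pcie_aspm is not set"
    fi
    if [[ -r /sys/module/pcie_aspm/parameters/policy ]]; then
        echo "policy: $(cat /sys/module/pcie_aspm/parameters/policy)"
    fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```

To test without the option, it can be removed from the kernel parameters in YaST's Boot Loader module.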
I had to use this option to make the display brightness adjustment work.
@JoseskVolpe acer_wmi or the ACPI backlight should take care of that… or the AMD one is interfering…
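To see which of acer_wmi, ACPI video or amdgpu is actually providing the backlight, list what the kernel registered under /sys/class/backlight. Each entry's type file roughly says who controls it: raw (a GPU driver such as amdgpu), platform (a vendor driver such as acer-wmi) or firmware (ACPI). A sketch; list_backlights is a hypothetical helper.

```shell
#!/usr/bin/env bash
# List registered backlight interfaces with their type (sketch).

# list_backlights (hypothetical helper): for each device under DIR
# (default /sys/class/backlight), print "<name> <type>".
list_backlights() {
    local dir="${1:-/sys/class/backlight}" dev
    for dev in "$dir"/*; do
        [[ -e "$dev" ]] || continue          # empty dir: glob stays literal
        echo "$(basename "$dev") $(cat "$dev/type" 2>/dev/null || echo unknown)"
    done
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    list_backlights "$@"
fi
```

If more than one entry shows up (for example an ACPI one next to amdgpu_bl0), the brightness keys may be driving a different interface than the one that actually controls the panel.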