How to roll back Nvidia driver version until issues pass

iqgrande · October 2, 2021, 7:24pm

Greetings:

The latest Nvidia driver updates seem to have caused some issues on my system (I have tickets 1190840 and 1190871 open for them). For situations like this, it would be nice to be able to roll back to the previous version and freeze it there until updated packages come out for testing. I am quite confident Zypper can accommodate this; however, I am new to Zypper and not sure how to do it. It would also help to know what other steps (e.g., manual kernel driver compilation) such a temporary solution would require. Is there any guidance the community can provide? I looked for past threads on this topic but couldn’t find anything (and apologize in advance if I missed some). Thank you for your help with this.

Kind regards,
Anthony

malcolmlewis · October 2, 2021, 7:43pm

Hi
Install the Nvidia driver the hard way? All good here, however I only use it for offload…


inxi -Gxxz
Graphics:  Device-1: AMD Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] driver: amdgpu v: kernel bus-ID: 02:00.0 
           chip-ID: 1002:699f 
           Device-2: NVIDIA GP108 [GeForce GT 1030] vendor: eVga.com. driver: nvidia v: 470.74 bus-ID: 03:00.0 
           chip-ID: 10de:1d01 
           Display: x11 server: X.Org 1.20.13 compositor: gnome-shell driver: loaded: amdgpu,nvidia 
           unloaded: fbdev,modesetting,vesa alternate: ati,nouveau,nv resolution: 1: 1920x1080~60Hz 2: 1920x1080~60Hz 
           3: 1920x1080~60Hz s-dpi: 96 
           OpenGL: renderer: Radeon RX550/550 Series (POLARIS12 DRM 3.42.0 5.14.6-1-default LLVM 12.0.1) v: 4.6 Mesa 21.2.2 
           direct render: Yes 

switcherooctl launch -g 1 inxi -Gxxz

Graphics:  Device-1: AMD Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] driver: amdgpu v: kernel bus-ID: 02:00.0 
           chip-ID: 1002:699f 
           Device-2: NVIDIA GP108 [GeForce GT 1030] vendor: eVga.com. driver: nvidia v: 470.74 bus-ID: 03:00.0 
           chip-ID: 10de:1d01 
           Display: x11 server: X.Org 1.20.13 compositor: gnome-shell driver: loaded: amdgpu,nvidia 
           unloaded: fbdev,modesetting,vesa alternate: ati,nouveau,nv resolution: 1: 1920x1080~60Hz 2: 1920x1080~60Hz 
           3: 1920x1080~60Hz s-dpi: 96 
           OpenGL: renderer: NVIDIA GeForce GT 1030/PCIe/SSE2 v: 4.6.0 NVIDIA 470.74 direct render: Yes

You must be setting something somewhere for the power setting issue; by default, Tumbleweed uses schedutil these days…


cpupower frequency-info 

analyzing CPU 0:
  driver: intel_cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 20.0 us
  hardware limits: 1.20 GHz - 3.50 GHz
  available cpufreq governors: ondemand performance schedutil
  current policy: frequency should be within 1.20 GHz and 3.50 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 1.45 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes

iqgrande · October 2, 2021, 10:51pm

I only have the Nvidia graphics card to use.

> inxi -Gxxz 
Graphics:  Device-1: NVIDIA TU106 [GeForce RTX 2070 Rev. A] vendor: eVga.com. driver: nvidia v: 470.74
           bus-ID: 01:00.0 chip-ID: 10de:1f07
           Device-2: AVerMedia Live Streamer CAM 313 type: USB driver: snd-usb-audio,uvcvideo bus-ID: 1-7:4
           chip-ID: 07ca:313a
           Display: x11 server: X.org 1.20.13 compositor: kwin_x11 driver: loaded: nvidia
           unloaded: fbdev,modesetting,nouveau,vesa alternate: nv resolution: <missing: xdpyinfo>
           OpenGL: renderer: NVIDIA GeForce RTX 2070/PCIe/SSE2 v: 4.6.0 NVIDIA 470.74 direct render: Yes

I don’t recall modifying the power settings; however, the governor appears to be set to “powersave”.

> cpupower frequency-info 
analyzing CPU 0: 
  driver: intel_pstate 
  CPUs which run at the same hardware frequency: 0 
  CPUs which need to have their frequency coordinated by software: 0 
  maximum transition latency:  Cannot determine or is not supported. 
  hardware limits: 800 MHz - 4.70 GHz 
  available cpufreq governors: performance powersave 
  current policy: frequency should be within 800 MHz and 4.70 GHz. 
                  The governor "powersave" may decide which speed to use 
                  within this range. 
  current CPU frequency: Unable to call hardware 
  current CPU frequency: 800 MHz (asserted by call to kernel) 
  boost state support: 
    Supported: yes 
    Active: yes

> cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq
powersave

If I did want to install the Nvidia drivers the “hard” way, how do I uninstall the ones I currently have (installed the recommended way) and then install them manually? Thank you for your help with this.

malcolmlewis · October 2, 2021, 11:31pm

Hi
To uninstall, look further down the page here: SDB:NVIDIA drivers - openSUSE Wiki

For the ‘Hard Way’ see SDB:NVIDIA the hard way - openSUSE Wiki

For your power issue, disable the intel_pstate driver, run intel_cpufreq instead and see if that helps. Fire up YaST Bootloader and add intel_pstate=disable to the kernel command line options.

Also check the system BIOS for any power settings before adding the above.
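If you would rather make the same change from a shell than through YaST, here is a minimal sketch, assuming a GRUB2 setup where /etc/default/grub contains a GRUB_CMDLINE_LINUX_DEFAULT="..." line. The helper add_kernel_param is hypothetical, not an openSUSE tool. One caveat from the kernel's intel_pstate documentation: intel_cpufreq is the name intel_pstate reports when it runs in passive mode (intel_pstate=passive), while intel_pstate=disable typically hands frequency scaling to acpi-cpufreq instead, so pick the parameter that matches the driver you actually want to try.

```shell
#!/bin/sh
# Sketch only: add a kernel parameter without YaST on an openSUSE/GRUB2 system.
# Assumes FILE contains a line of the form GRUB_CMDLINE_LINUX_DEFAULT="..."
# add_kernel_param is a hypothetical helper for illustration.

# add_kernel_param PARAM FILE
# Inserts PARAM right after the opening quote of GRUB_CMDLINE_LINUX_DEFAULT
# in FILE, unless PARAM already appears on that line.
add_kernel_param() {
  param=$1
  file=$2
  if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
    return 0   # already present, nothing to do
  fi
  # "&" re-inserts the matched text (GRUB_CMDLINE_LINUX_DEFAULT=") before PARAM
  sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"/&$param /" "$file"
}
```

Usage, as root: back up /etc/default/grub, run add_kernel_param intel_pstate=passive /etc/default/grub (or intel_pstate=disable), regenerate the menu with grub2-mkconfig -o /boot/grub2/grub.cfg and reboot. Afterwards, cat /proc/cmdline should show the parameter and the driver: line of cpupower frequency-info shows which scaling driver is active.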

iqgrande · October 3, 2021, 6:23am

There’s a lot of good information in the previous post. Thanks! Which is more accurate: the CPU frequency from lscpu or the frequency from cpupower frequency-info? The former reports a higher value than the latter.
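Note that the cpupower output earlier in the thread says "analyzing CPU 0": it reports one core at one instant, and under the powersave governor each core's frequency keeps changing. A rough way to see every core at once, for comparison with lscpu, is to read each CPU's scaling_cur_freq from sysfs (values are in kHz). The function names below are hypothetical; the script only reads sysfs and needs no root.

```shell
#!/bin/sh
# Sketch: print the current frequency of every CPU as reported by the
# kernel's cpufreq sysfs files, to compare with lscpu / cpupower output.

# Convert a kHz value to whole MHz (integer division).
khz_to_mhz() {
  echo $(( $1 / 1000 ))
}

show_cpu_freqs() {
  found=0
  for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq; do
    [ -r "$f" ] || continue          # glob did not match, or file unreadable
    cpu=${f#/sys/devices/system/cpu/}
    cpu=${cpu%%/*}                   # e.g. "cpu0"
    echo "$cpu: $(khz_to_mhz "$(cat "$f")") MHz"
    found=1
  done
  [ "$found" -eq 1 ] || echo "no cpufreq data in sysfs" >&2
}
```

Running show_cpu_freqs a few times in a row makes it obvious how much the per-core values move, which is one reason two tools sampled at different moments can disagree.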

mrmazda · October 3, 2021, 1:27pm

Instead of having to roll back, don’t upgrade the kernel until you’re sure the available driver matches the current kernel. Run zypper al kernel-de*. That will lock the kernel, so it won’t be upgraded automatically. When you wish to upgrade the kernel, run zypper in kernel-default. Zypper will offer to “remove” the lock and install the kernel, but because of the wildcard in the lock, zypper will merely ignore the lock and proceed to install the new kernel.
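The same workflow as a small shell sketch, using the long forms of zypper’s al and in shorthands (addlock, install), plus locks and removelock to inspect and drop the lock. The wrapper functions and the ZYPPER override are hypothetical conveniences for illustration, not part of zypper; the pattern is quoted so the shell cannot expand it against files in the current directory.

```shell
#!/bin/sh
# Sketch of the kernel-lock workflow (openSUSE zypper).
# ZYPPER can be overridden, e.g. ZYPPER=echo for a dry run that only
# prints the arguments; by default the commands run through sudo.
ZYPPER=${ZYPPER:-"sudo zypper"}

# Lock every package whose name matches kernel-de*
# (kernel-default, kernel-default-devel, kernel-devel, ...).
lock_kernel()    { $ZYPPER addlock 'kernel-de*'; }

# Show the locks currently in place.
show_locks()     { $ZYPPER locks; }

# When the driver is known to work with the new kernel, install it explicitly.
upgrade_kernel() { $ZYPPER install kernel-default; }

# Or remove the lock altogether.
unlock_kernel()  { $ZYPPER removelock 'kernel-de*'; }
```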

malcolmlewis · October 3, 2021, 2:05pm

Hi
Yes, that is possible; however, reviewing what is changing (via the announcement) and then choosing whether to update is a far better option… for me it takes only a few moments to rebuild the modules and reboot…

iqgrande · October 5, 2021, 4:18pm

There’s a lot of good information here. I really appreciate everyone’s posts. My (probably) last question builds on some of the previous responses. In general, when TW pushes something that causes havoc on my system, I’d like to have 2 states that I can readily boot into: (1) the safe state where everything works and (2) the current state, so I can test whether things have been fixed and also provide ticket feedback (I am trying to do my part and raise awareness of issues I find early and often). Sometimes an issue may take several months to rectify. Since openSUSE builds on technologies like btrfs, I can imagine several ways someone could handle (1) and (2). In my mind, creating a manual snapshot for (1) that can be accessed easily would likely be the easiest approach. Is there a better way, or more detail on how best to manage the snapshots for (1) and (2)? The disadvantage of the snapshot approach is that the system does not get updates during that time unless just the offending packages are frozen (per the earlier suggestion). Thank you for any insight you have into this; it is really helping me understand the preferred ways of managing a rolling TW system.

malcolmlewis · October 5, 2021, 4:32pm

Hi
I just roll with every release… the glibc rebuild caused a few issues, but no showstoppers for me (I use Slack, so I just reverted to the web UI). VS Code still doesn’t work on Nvidia, but my default is AMD, which it works with fine.

awerlang · October 6, 2021, 4:52am

In a default installation, zypper creates a pair of snapshots on each update, snapper prunes old snapshots on a schedule, and GRUB offers booting from older snapshots in its menu. If you find that an update broke a package, you can boot from a previous snapshot, roll back to it, reboot, lock the package against updates, and update the system. Repeat.
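A minimal sketch of that cycle, plus the manual “known good” snapshot asked about earlier, assuming a default Tumbleweed btrfs/snapper setup. The wrapper functions and the SUDO override are hypothetical; the snapper and zypper subcommands are the standard ones. Booting the older snapshot itself is done interactively from the read-only snapshot entry in the GRUB menu, and the package to lock is whatever update turned out to be the culprit.

```shell
#!/bin/sh
# Sketch of the snapshot/rollback cycle on a default Tumbleweed install.
# SUDO can be set to "" when already root, or to "echo" for a dry run.
SUDO=${SUDO-sudo}

# State (1): mark the known-good system with a named manual snapshot.
# Created without --cleanup-algorithm, it is not removed by snapper's
# scheduled cleanup.
mark_good() { $SUDO snapper create --description "${1:-known good}"; }

# List snapshots, including the pre/post pairs created around each update.
list_snapshots() { $SUDO snapper list; }

# After booting a read-only snapshot from GRUB, make it the new default
# system; reboot afterwards.
rollback_to_booted() { $SUDO snapper rollback; }

# Hold back the offending package, then keep updating everything else.
hold_and_update() { $SUDO zypper addlock "$1" && $SUDO zypper dup; }
```

Because only the locked package is held back, the rest of the system keeps receiving updates, which addresses the drawback of staying on an old snapshot; removing the lock later (zypper removelock) lets you test the fixed package again.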