Hi all, I have a graphics card (AMD RX580) that is prone to crashing when its VRAM runs at the maximum clock of 2120 MHz, as detailed in this bug report: "RX580 GPU crash on maximum VRAM clockspeed (kernel 6.8 and later)" (issue #3761, drm/amd on GitLab).
I’m no longer sure whether this is a driver bug or defective VRAM, but either way, the only thing that has prevented the crash is limiting the VRAM clock to 1000 MHz. On SUSE I’d prefer to do this with a script and a systemd unit instead of CoreCtrl, since I don’t trust the OPI repo for CoreCtrl (it’s clearly built by a third party).
I have the following script as /usr/local/bin/fix-amdgpu.sh:
#!/bin/sh -
# Lock VRAM clock to 1000 MHz to prevent the dreaded amdgpu crash
/usr/bin/echo "manual" > /sys/class/drm/card1/device/power_dpm_force_performance_level
/usr/bin/echo "1" > /sys/class/drm/card1/device/pp_dpm_mclk
Running this directly works:
~> cat /sys/class/drm/card1/device/pp_dpm_mclk
0: 300Mhz
1: 1000Mhz *
2: 2120Mhz
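For debugging, a slightly more defensive variant of the script can help, since the original gives no output if a write fails. The following is only a sketch: the function name lock_mclk and its arguments are made up for illustration, and it assumes the same two sysfs files as the script above. Anything it prints to stderr ends up in the journal when it is run from a systemd unit.

```shell
#!/bin/sh
# Sketch: pin the VRAM clock to one DPM level, with basic error reporting.
# lock_mclk and its arguments are illustrative names, not part of amdgpu.
lock_mclk() {
    dev=$1          # device directory, e.g. /sys/class/drm/card1/device
    level=${2:-1}   # index into pp_dpm_mclk (1 = 1000 MHz on this card)

    # Bail out loudly if either control file is missing or not writable.
    for f in power_dpm_force_performance_level pp_dpm_mclk; do
        if [ ! -w "$dev/$f" ]; then
            echo "fix-amdgpu: $dev/$f is missing or not writable" >&2
            return 1
        fi
    done

    # Switch to manual mode first, then select the memory clock level.
    echo manual > "$dev/power_dpm_force_performance_level" || return 1
    echo "$level" > "$dev/pp_dpm_mclk" || return 1
    echo "fix-amdgpu: pinned mclk level $level on $dev" >&2
}
```

The script would then end with a call such as `lock_mclk /sys/class/drm/card1/device 1`. Passing the directory as an argument also makes it easier to adjust if the card does not always show up as card1.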
I have also enabled a custom service, fix-amdgpu.service, to rerun the script after resuming from suspend:
[Unit]
After=suspend.target
[Service]
Type=simple
ExecStart=/usr/local/bin/fix-amdgpu.sh
[Install]
WantedBy=suspend.target
However, after suspend and resume, it’s clear that this service has not run at the correct time:
~> cat /sys/class/drm/card1/device/pp_dpm_mclk
0: 300Mhz *
1: 1000Mhz
2: 2120Mhz
What gives? Every modification I’ve tried has failed. As far as I can tell, the script never executes after resume.
Edit: I’m aware the unit I’ve copied here won’t run on boot. I’ll add that after I figure out the resume part.
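To narrow down whether the unit never starts, or starts and has its settings overwritten afterwards, the checks below may be useful. This is a sketch, not a fix: check_fix_unit is a hypothetical helper name, and the unit name and sysfs path are the ones from this post.

```shell
#!/bin/sh
# Sketch: collect evidence on whether the unit ran after the last resume.
# check_fix_unit is an illustrative helper name, not an existing tool.
check_fix_unit() {
    unit=${1:-fix-amdgpu.service}
    dev=${2:-/sys/class/drm/card1/device}

    # Is the unit enabled at all?
    systemctl is-enabled "$unit"

    # Most recent run: state, exit status and timestamps.
    systemctl status --no-pager "$unit"

    # Everything the unit logged during the current boot.
    journalctl -b --no-pager -u "$unit"

    # Current state of the card, to compare against the timestamps above.
    cat "$dev/power_dpm_force_performance_level" "$dev/pp_dpm_mclk"
}
```

Running `check_fix_unit` right after a resume shows whether the journal has an entry for the unit at the resume time. If it does, and the clock level is still not pinned, the script ran but its writes did not stick.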