Making a service run on boot and after resume

Hi all, I have a graphics card (AMD RX580) that is prone to crashing when its VRAM runs at the maximum clock of 2120 MHz, as detailed in this bug report: “RX580 GPU crash on maximum VRAM clockspeed (kernel 6.8 and later)” (issue #3761 in drm / amd on GitLab).

I’m not actually sure if this is a driver bug anymore as opposed to defective VRAM, but in any case, the only way to prevent it has been limiting the VRAM clock to 1000 MHz. On SUSE I’d prefer to do this using a script and systemd unit instead of Corectrl, since I don’t trust the OPI repo for Corectrl (it’s clearly built by a third party).

I have the following script as /usr/local/bin/fix-amdgpu.sh:

#!/bin/sh -
# Lock VRAM clock to 1000 MHz to prevent the dreaded amdgpu crash
/usr/bin/echo "manual" > /sys/class/drm/card1/device/power_dpm_force_performance_level
/usr/bin/echo "1" >  /sys/class/drm/card1/device/pp_dpm_mclk

Running this directly works:

~> cat /sys/class/drm/card1/device/pp_dpm_mclk
0: 300Mhz 
1: 1000Mhz *
2: 2120Mhz 
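Both the script and the check above assume the GPU is card1. As a minimal sketch, assuming the RX580 is the only device with AMD's PCI vendor ID (0x1002), the device directory could be looked up instead of hardcoded. The function name find_amdgpu_dev and the DRM_ROOT override are hypothetical and exist only for illustration and testing; on a system with a second AMD device this would simply return the first match.

```shell
#!/bin/sh
# Print the sysfs device directory of the first DRM card whose PCI
# vendor ID is 0x1002 (AMD). Returns non-zero if none is found.
# DRM_ROOT is a hypothetical override so the lookup can be tried
# against a fake directory tree.
find_amdgpu_dev() {
    root="${DRM_ROOT:-/sys/class/drm}"
    for vendor_file in "$root"/card[0-9]*/device/vendor; do
        # Skip unmatched globs and entries without a readable vendor file
        [ -r "$vendor_file" ] || continue
        if [ "$(cat "$vendor_file")" = "0x1002" ]; then
            dirname "$vendor_file"
            return 0
        fi
    done
    return 1
}
```

A script could then use dev=$(find_amdgpu_dev) and write to "$dev/pp_dpm_mclk" rather than a fixed card1 path.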

I also have a custom service, fix-amdgpu.service, which I’ve enabled to reapply the script after suspend:

[Unit]
After=suspend.target

[Service]
Type=simple
ExecStart=/usr/local/bin/fix-amdgpu.sh

[Install]
WantedBy=suspend.target

However, after suspend and resume, it’s clear that this service has not run at the correct time:

~> cat /sys/class/drm/card1/device/pp_dpm_mclk
0: 300Mhz *
1: 1000Mhz 
2: 2120Mhz 
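For repeated testing, the active level can be read programmatically instead of by eye. This is a rough sketch that assumes the file format shown above, where the active level's line ends in an asterisk; the function name active_mclk_level is made up for illustration.

```shell
#!/bin/sh
# Print the index of the active memory clock level, i.e. the line
# marked with a trailing "*". The file path defaults to the one used
# in this thread and can be overridden with the first argument.
active_mclk_level() {
    file="${1:-/sys/class/drm/card1/device/pp_dpm_mclk}"
    # Keep only the leading index of the line that ends in "*"
    sed -n 's/^\([0-9][0-9]*\):.*\*[[:space:]]*$/\1/p' "$file"
}
```

For example, [ "$(active_mclk_level)" = 1 ] || echo "mclk not locked" could be run after each resume to see whether the setting survived.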

What gives? Every modification I’ve tried has failed. As far as I can tell, it never executes after resume.

Edit: I’m aware the unit I’ve copied here won’t run on boot. I’ll add that after I figure out the resume part.

Ahahaha, solved immediately after posting. What is actually needed is this:

[Unit]
After=suspend.target

[Service]
Type=simple
ExecStart=exec /usr/local/bin/fix-amdgpu.sh

[Install]
WantedBy=suspend.target

Note the exec.

Edit: however, it does not work at all if graphical.target is added to WantedBy. Is it possible to have the script exec’d both during boot AND after resume in one systemd unit? Or are separate ones necessary?

Edit 2: and now the “working” version is once again not working after suspend. So I’m at a complete loss. Maybe this is a systemd bug? I can’t see how else it would work once and then not at all.

The line ExecStart=exec /usr/local/bin/fix-amdgpu.sh cannot work. exec is a shell builtin, not an external command, and systemd is not a shell. Nor is it needed; just use ExecStart=/usr/local/bin/fix-amdgpu.sh.

That also does not work, unfortunately.

Never found a real solution, or any indication if this is an OpenSUSE bug, systemd bug, or just something obscure and not well documented. I wound up installing Corectrl from Dead_Mozay’s repo; it looks like Dead_Mozay also maintains some packages in the OpenSUSE official repos, so I think they can be assumed trustworthy.

I have only scanty knowledge of the systemd details, so I can only suggest things that may point you to something (I doubt it, but I can try).

I have a service running “at boot”, but I have

Type=oneshot

I do not know if it matters, but you can try.

But what puzzles me here is that you want this to run on suspend.target. I assume you found somewhere that this target exists; I would be curious where.
Especially what it means. Reading it, I would conclude that it means “when going into suspend”, not when recovering from it. And I assume you want this to run on recovery.

After=suspend.target

or its alternatives are supposed to fix that - running after the target has been reached, thus, after suspend has ended.

I tried oneshot as well, didn’t see any difference.
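On the earlier question of one unit for both boot and resume: WantedBy= accepts a space-separated list of targets, so a single unit can in principle be pulled in by both. The following is an untested sketch combining the suggestions so far (Type=oneshot, plain ExecStart=); it was not confirmed to work in this thread, where adding graphical.target was reported to fail. The choice of multi-user.target, the Description text, and the helper name write_fix_amdgpu_unit are assumptions for illustration.

```shell
#!/bin/sh
# Write a candidate fix-amdgpu.service into the given directory
# (defaults to the current directory, so nothing under /etc is touched
# until the file is copied there deliberately).
write_fix_amdgpu_unit() {
    unit_dir="${1:-.}"
    cat > "$unit_dir/fix-amdgpu.service" <<'EOF'
[Unit]
Description=Lock amdgpu VRAM clock
After=suspend.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/fix-amdgpu.sh

[Install]
WantedBy=multi-user.target suspend.target
EOF
}
```

After copying the file to /etc/systemd/system/, run systemctl daemon-reload and systemctl reenable fix-amdgpu.service so that the changed [Install] section takes effect; symlinks created by an earlier enable are not updated automatically.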

From man systemctl:

suspend
Suspend the system. This will trigger activation of the special target unit suspend.target. This command is asynchronous, and will return after the suspend operation is successfully enqueued. It will not wait for the suspend/resume cycle to complete.

The bold is mine.

Ouch, thank you and good catch. I’ll have to see if there’s anything that actually does return only after suspend is over.

Okay, according to the systemd-sleep man page (sorry for missing this), scripts or symlinks in /usr/lib/systemd/system-sleep/ should run both before and after suspend with different arguments. My fix-amdgpu.sh script should run the same regardless of arguments (I’ve checked). And when I add a logging command and symlink the script into that directory, I can see in journalctl very clearly that it runs both before and after suspend. But, the GPU memory clock is still set wrong, which I guess means it is being reset elsewhere.
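For reference, systemd-sleep passes "pre" or "post" as the first argument and the sleep action (such as "suspend") as the second, so a hook can limit itself to the resume side. Below is a minimal sketch of such a hook, assuming the same card1 paths as the original script. The AMDGPU_DEV override and the function names are hypothetical, and as noted above, running at the right time did not by itself keep the clock locked.

```shell
#!/bin/sh
# Hypothetical sleep hook, e.g. /usr/lib/systemd/system-sleep/fix-amdgpu
# systemd-sleep calls it as: <hook> pre|post <action>

# Write the same two values as fix-amdgpu.sh. AMDGPU_DEV is an
# illustrative override; by default the card1 path from this thread is used.
apply_mclk_lock() {
    dev="${AMDGPU_DEV:-/sys/class/drm/card1/device}"
    echo manual > "$dev/power_dpm_force_performance_level" &&
        echo 1 > "$dev/pp_dpm_mclk"
}

# Act only after resume; do nothing on the way into sleep.
sleep_hook() {
    case "$1" in
        post) apply_mclk_lock ;;
        *) : ;;
    esac
}

sleep_hook "$@"
```

Skipping the "pre" call avoids touching the card just before it powers down, though the original script reportedly ran fine in both phases.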

I’m marking this as solved - apparently while /usr/lib/systemd/system-sleep/ is considered a hack, it should not get overwritten by package managers or installers. However the actual amdgpu issue seems harder to solve without dedicated overclocking software, as it’s difficult to tell where else the memory speed is being set.
