System Freeze with AMD Vega GPUs After Mesa 24.3.x Update & Kernel Logs Reveal Multiple AMD Driver Issues

Environment:
Distribution: openSUSE Tumbleweed
GPU: AMD Vega (Picasso architecture)
DE/WM: KDE Plasma (Wayland)

Detailed Problem Description:
Since updating to Mesa 24.3.x (24.3.0 and later), my system with an AMD GPU has become critically unstable. It freezes completely after just a few minutes of use and stops responding to any input. The only way to recover is a hard reboot.

This issue is consistent across both X11 and Wayland environments and primarily affects Chromium-based browsers.

Symptoms:
The system experiences a complete freeze after a short period of use, typically within minutes, especially when using Chromium-based browsers.
Apart from a Chromium-based browser usually being open, there is no apparent trigger or consistent pattern for the freeze.
The system becomes unresponsive, requiring a hard reboot to recover.

Driver and GPU Information:
└─[$] vainfo

Trying display: wayland
libva info: VA-API version 1.22.0
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_22
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.22 (libva 2.22.0)
vainfo: Driver version: Mesa Gallium driver 24.3.1 for AMD Radeon Vega 8 Graphics (radeonsi, raven, LLVM 19.1.5, DRM 3.59, 6.11.8-1-default)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            :    VAEntrypointVLD
      VAProfileMPEG2Main              :    VAEntrypointVLD
      VAProfileJPEGBaseline           :    VAEntrypointVLD
      VAProfileVP9Profile0            :    VAEntrypointVLD
      VAProfileVP9Profile2            :    VAEntrypointVLD
      VAProfileNone                   :    VAEntrypointVideoProc

Kernel Logs Reveal Multiple AMD Driver Issues:

  1. PSP (Platform Security Processor) Failures:
  • Failed PSP commands: LOAD_TA and INVOKE_CMD
  • Secure display generic failure
  • PSP-related command responses returning error status
  2. Missing Critical GPU Functionalities:
  • RAS (Reliability, Availability, and Serviceability) Trusted Application unavailable
  • RAP Trusted Application not available
  3. Power Management Limitations:
  • Runtime Power Management (PM) not available

└─[$] sudo journalctl -b -1 -g amdgpu

Dec 18 17:59:39 tumbleweed-msi kernel: [drm] amdgpu kernel modesetting enabled.
Dec 18 17:59:39 tumbleweed-msi kernel: amdgpu: Virtual CRAT table created for CPU
Dec 18 17:59:39 tumbleweed-msi kernel: amdgpu: Topology: Add CPU node
Dec 18 17:59:39 tumbleweed-msi kernel: amdgpu 0000:30:00.0: enabling device (0006 -> 0007)
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: Fetched VBIOS from VFCT
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu: ATOM BIOS: 113-PICASSO-118
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: vgaarb: deactivate vga console
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
Dec 18 17:59:40 tumbleweed-msi kernel: [drm] amdgpu: 2048M of VRAM memory ready
Dec 18 17:59:40 tumbleweed-msi kernel: [drm] amdgpu: 6950M of GTT memory ready.
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu: hwmgr_sw_init smu backed is smu10_smu
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: reserve 0x400000 from 0xf47fc00000 for PSP TMR
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: RAS: optional ras ta ucode is not available
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: RAP: optional rap ta ucode is not available
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: psp gfx command LOAD_TA(0x1) failed and response status is (0x7)
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4)
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: Secure display: Generic Failure.
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: SECUREDISPLAY: query securedisplay TA failed. ret 0x0
Dec 18 17:59:40 tumbleweed-msi kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Dec 18 17:59:40 tumbleweed-msi kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu: Virtual CRAT table created for GPU
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu: Topology: Add dGPU node [0x15d8:0x1002]
Dec 18 17:59:40 tumbleweed-msi kernel: kfd kfd: amdgpu: added device 1002:15d8
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 11, active_cu_number 8
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Dec 18 17:59:40 tumbleweed-msi kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Dec 18 17:59:40 tumbleweed-msi
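
To make the failure lines easier to spot in a long log like the one above, a small filter can help. This is only a sketch: the function name `amdgpu_failures` and the keyword list are assumptions chosen to match the messages quoted here, not an official set of amdgpu error strings.

```shell
# Hypothetical helper: read log lines on stdin and keep only amdgpu
# messages that look like failures. The keyword list is an assumption
# based on the messages quoted in this post.
amdgpu_failures() {
    grep -iE 'amdgpu.*(fail|not available|error|timeout)'
}

# Usage against the previous boot's kernel log:
#   sudo journalctl -k -b -1 --no-pager | amdgpu_failures
```

The filter is deliberately broad, so it also catches messages the kernel itself marks as optional, such as the RAS and RAP lines.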

The link to your bug report:
https://bugzilla.opensuse.org/show_bug.cgi?id=1234732

Possibly fixed in Mesa 24.3.2:

https://docs.mesa3d.org/relnotes/24.3.2.html

If you don't want to use the hardware-specific drivers in the meantime, you can fall back to Mesa's software renderers: llvmpipe (OpenGL) or lavapipe (Vulkan).
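
A rough sketch of the llvmpipe fallback for a single program: Mesa honors the `LIBGL_ALWAYS_SOFTWARE` environment variable, which asks it to use software rendering for OpenGL. The wrapper name `run_with_llvmpipe` is hypothetical. Selecting lavapipe for Vulkan depends on the Vulkan loader's ICD files on your system, so it is not shown here.

```shell
# Hypothetical wrapper: run one command with Mesa software rendering
# requested for OpenGL. LIBGL_ALWAYS_SOFTWARE=1 is a Mesa environment
# variable; it is set only for the command being launched.
run_with_llvmpipe() {
    LIBGL_ALWAYS_SOFTWARE=1 "$@"
}

# Example (glxinfo comes from Mesa-demos and may not be installed):
#   run_with_llvmpipe glxinfo -B
# The reported renderer should then mention llvmpipe instead of the AMD GPU.
```

Software rendering is much slower than the GPU, so this is only a workaround for checking whether the freezes are tied to the radeonsi driver.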