GPU malfunction or driver error

My monitor goes black.

The LED indicator on the monitor shows that no input signal is available.

But since everything is plugged in and I can hear that the PC is still running, I assumed that either the GPU is dying, the monitor is dying, or there is a GPU driver error.

So I logged in via ssh, ran dmesg, and here is what I think is the interesting part:

[  471.638352] [     T42] amdgpu 0000:01:00.0: [drm] *ERROR* [CRTC:60:crtc-0] flip_done timed out
[  471.638385] [    T507] amdgpu 0000:01:00.0: amdgpu: Dumping IP State
[  471.638393] [    T507] amdgpu 0000:01:00.0: amdgpu: Dumping IP State Completed
[  471.648477] [    T507] amdgpu 0000:01:00.0: amdgpu: ring gfx timeout, signaled seq=79034, emitted seq=79036
[  471.648486] [    T507] amdgpu 0000:01:00.0: amdgpu: Process information: process systemsettings pid 4943 thread systemsett:cs0 pid 4947
[  471.648491] [    T507] amdgpu 0000:01:00.0: amdgpu: Starting gfx ring reset
[  471.648496] [    T507] amdgpu 0000:01:00.0: amdgpu: Ring gfx reset failure
[  471.648501] [    T507] amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
[  471.867646] [    T507] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110)
[  471.868022] [    T507] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[  472.055622] [    T507] amdgpu: cp is busy, skip halt cp
[  472.242828] [    T507] amdgpu: rlc is busy, skip halt rlc
[  472.243893] [    T507] amdgpu 0000:01:00.0: amdgpu: BACO reset
[  472.771190] [    T507] amdgpu 0000:01:00.0: amdgpu: GPU reset succeeded, trying to resume
[  472.771593] [    T507] [drm] PCIE GART of 256M enabled (table at 0x000000F400800000).
[  472.771606] [    T507] [drm] VRAM is lost due to GPU reset!
[  473.002129] [    T507] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[  473.002485] [    T507] amdgpu 0000:01:00.0: amdgpu: resume of IP block <gfx_v8_0> failed -110
[  473.002498] [    T507] amdgpu 0000:01:00.0: amdgpu: GPU reset(2) failed
[  473.002502] [    T507] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
[  473.015606] [    T507] amdgpu 0000:01:00.0: amdgpu: GPU reset end with ret = -110
[  473.015614] [    T507] amdgpu 0000:01:00.0: amdgpu: GPU Recovery Failed: -110
[  483.414443] [    T506] amdgpu 0000:01:00.0: amdgpu: Dumping IP State
[  483.414454] [    T506] amdgpu 0000:01:00.0: amdgpu: Dumping IP State Completed
[  483.414469] [    T506] amdgpu 0000:01:00.0: amdgpu: ring gfx timeout, signaled seq=79036, emitted seq=79036
[  483.414479] [    T506] amdgpu 0000:01:00.0: amdgpu: Process information: process systemsettings pid 4943 thread systemsett:cs0 pid 4947
[  483.414484] [    T506] amdgpu 0000:01:00.0: amdgpu: Starting gfx ring reset
[  483.414489] [    T506] amdgpu 0000:01:00.0: amdgpu: Ring gfx reset failure
[  483.414494] [    T506] amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
x@x-desktop:~>

Now I’m a bit confused.

I can see there are some errors, but I can’t tell whether the hardware is dying or this is a driver error.
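
To make a log like the one above easier to compare between incidents, here is a minimal Python sketch that counts the key amdgpu events and collects the process names the driver reports. The patterns are copied from the messages in this particular log; other kernel versions may word them differently, and the names `PATTERNS`, `summarize` and `SAMPLE` are only illustrative.

```python
import re
from collections import Counter

# Patterns taken from the dmesg excerpt above (assumption: the wording
# is the same on the kernel being checked).
PATTERNS = {
    "flip_timeout": re.compile(r"flip_done timed out"),
    "ring_timeout": re.compile(r"ring \S+ timeout"),
    "reset_begin": re.compile(r"GPU reset begin!"),
    "reset_succeeded": re.compile(r"GPU reset succeeded"),
    "vram_lost": re.compile(r"VRAM is lost"),
    "recovery_failed": re.compile(r"GPU Recovery Failed"),
}
PROC_RE = re.compile(r"Process information: process (\S+) pid (\d+)")


def summarize(lines):
    """Count matching events and collect the process names blamed by the driver."""
    counts = Counter()
    processes = set()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
        match = PROC_RE.search(line)
        if match:
            processes.add(match.group(1))
    return counts, processes


# A few lines from the log above, used as sample input.
SAMPLE = [
    "[  471.638352] [     T42] amdgpu 0000:01:00.0: [drm] *ERROR* [CRTC:60:crtc-0] flip_done timed out",
    "[  471.648477] [    T507] amdgpu 0000:01:00.0: amdgpu: ring gfx timeout, signaled seq=79034, emitted seq=79036",
    "[  471.648486] [    T507] amdgpu 0000:01:00.0: amdgpu: Process information: process systemsettings pid 4943 thread systemsett:cs0 pid 4947",
    "[  471.648501] [    T507] amdgpu 0000:01:00.0: amdgpu: GPU reset begin!",
    "[  473.015614] [    T507] amdgpu 0000:01:00.0: amdgpu: GPU Recovery Failed: -110",
]

counts, processes = summarize(SAMPLE)
print(dict(counts), processes)
```

On a live system the same function can be given the lines of saved dmesg output instead of `SAMPLE`. The counts do not tell hardware and driver faults apart on their own, but they show whether every incident follows the same sequence and involves the same process.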

Edit:
Kernel: 6.13.0-1-default
OS: Tumbleweed
GPU: AMD RX 570
Driver: open source amdgpu

First install?

Test the hardware with a Leap Live ISO:
https://download.opensuse.org/download/distribution/openSUSE-stable/live/

Post the output of:

inxi -aFz
x@x-desktop:~> inxi -aFz
System:
  Kernel: 6.13.0-1-default arch: x86_64 bits: 64 compiler: gcc v: 14.2.1
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.13.0-1-default
    root=/dev/mapper/system-root splash=silent resume=/dev/system/swap quiet
    security=apparmor mitigations=auto,nosmt
  Desktop: KDE Plasma v: 6.2.5 tk: Qt v: N/A info: frameworks v: 6.10.0
    wm: kwin_wayland tools: avail: xscreensaver vt: 2 dm: SDDM Distro: openSUSE
    Tumbleweed 20250124
Machine:
  Type: Desktop Mobo: Gigabyte model: P67A-UD3-B3 serial: <superuser required>
    uuid: <superuser required> BIOS: Award v: F9 date: 03/21/2012
CPU:
  Info: model: Intel Core i5-2500 bits: 64 type: MCP arch: Sandy Bridge
    gen: core 2 level: v2 built: 2010-12 process: Intel 32nm family: 6
    model-id: 0x2A (42) stepping: 7 microcode: 0x2F
  Topology: cpus: 1x dies: 1 clusters: 4 cores: 4 smt: <unsupported> cache:
    L1: 256 KiB desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB
    L3: 6 MiB desc: 1x6 MiB
  Speed (MHz): avg: 1600 min/max: 1600/3700 scaling: driver: intel_cpufreq
    governor: schedutil cores: 1: 1600 2: 1600 3: 1600 4: 1600 bogomips: 26340
  Flags: avx ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities:
  Type: gather_data_sampling status: Not affected
  Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT
    disabled
  Type: mds mitigation: Clear CPU buffers; SMT disabled
  Type: meltdown mitigation: PTI
  Type: mmio_stale_data status: Unknown: No mitigations
  Type: reg_file_data_sampling status: Not affected
  Type: retbleed status: Not affected
  Type: spec_rstack_overflow status: Not affected
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: Retpolines; IBPB: conditional; IBRS_FW;
    STIBP: disabled; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not
    affected
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX
    470/480/570/570X/580/580X/590] vendor: ASUSTeK driver: amdgpu v: kernel
    arch: GCN-4 code: Arctic Islands process: GF 14nm built: 2016-20 pcie:
    gen: 2 speed: 5 GT/s lanes: 16 link-max: gen: 3 speed: 8 GT/s ports:
    active: DVI-D-2 empty: DP-1,DVI-D-1,HDMI-A-1 bus-ID: 01:00.0
    chip-ID: 1002:67df class-ID: 0300 temp: 49.0 C
  Display: wayland server: X.org v: 1.21.1.15 with: Xwayland v: 24.1.4
    compositor: kwin_wayland driver: X: loaded: modesetting unloaded: vesa
    alternate: fbdev dri: radeonsi gpu: amdgpu display-ID: 0
  Monitor-1: DVI-D-2 model: BenQ GW2470 serial: <filter> built: 2017 res:
    mode: 1920x1080 hz: 60 scale: 100% (1) dpi: 93 gamma: 1.2
    size: 527x296mm (20.75x11.65") diag: 604mm (23.8") ratio: 16:9 modes:
    max: 1920x1080 min: 720x400
  API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
    device: 1 drv: swrast gbm: drv: kms_swrast surfaceless: drv: radeonsi
    wayland: drv: radeonsi x11: drv: radeonsi
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.3.4 glx-v: 1.4
    direct-render: yes renderer: AMD Radeon RX 570 Series (radeonsi polaris10
    LLVM 19.1.7 DRM 3.59 6.13.0-1-default) device-ID: 1002:67df
    memory: 3.91 GiB unified: no display-ID: :0.0
  API: Vulkan v: 1.4.304 layers: 5 device: 0 type: discrete-gpu name: AMD
    Radeon RX 570 Series (RADV POLARIS10) driver: N/A device-ID: 1002:67df
    surfaces: xcb,xlib,wayland
  Info: Tools: api: eglinfo, glxinfo, vulkaninfo
    de: kscreen-console,kscreen-doctor wl: wayland-info
    x11: xdpyinfo, xprop, xrandr
Audio:
  Device-1: Intel 6 Series/C200 Series Family High Definition Audio
    vendor: Gigabyte driver: snd_hda_intel v: kernel bus-ID: 00:1b.0
    chip-ID: 8086:1c20 class-ID: 0403
  Device-2: Advanced Micro Devices [AMD/ATI] Ellesmere HDMI Audio [Radeon
    RX 470/480 / 570/580/590] vendor: ASUSTeK driver: snd_hda_intel v: kernel
    pcie: gen: 2 speed: 5 GT/s lanes: 16 link-max: gen: 3 speed: 8 GT/s
    bus-ID: 01:00.1 chip-ID: 1002:aaf0 class-ID: 0403
  API: ALSA v: k6.13.0-1-default status: kernel-api with: aoss
    type: oss-emulator tools: alsactl,alsamixer,amixer
  Server-1: PipeWire v: 1.2.7 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
    vendor: Gigabyte driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s
    lanes: 1 port: de00 bus-ID: 04:00.0 chip-ID: 10ec:8168 class-ID: 0200
  IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  Info: services: NetworkManager,sshd
Bluetooth:
  Device-1: Cambridge Silicon Radio Bluetooth Dongle (HCI mode) driver: btusb
    v: 0.8 type: USB rev: 2.0 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 2-1.2:3
    chip-ID: 0a12:0001 class-ID: e001
  Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 4.0
    lmp-v: 6 status: discoverable: no pairing: no class-ID: 7c0104
Drives:
  Local Storage: total: 4.18 TiB used: 1.03 TiB (24.7%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 840 EVO 120GB
    size: 111.79 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    tech: SSD serial: <filter> fw-rev: DB6Q scheme: GPT
  ID-2: /dev/sdb maj-min: 8:16 vendor: Kingston model: SA400S37480G
    size: 447.13 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    tech: SSD serial: <filter> fw-rev: K1B3 scheme: GPT
  ID-3: /dev/sdc maj-min: 8:32 vendor: Toshiba model: HDWD240 size: 3.64 TiB
    block-size: physical: 4096 B logical: 512 B speed: 3.0 Gb/s tech: HDD
    rpm: 5400 serial: <filter> fw-rev: 0A scheme: GPT
Partition:
  ID-1: / raw-size: 100.09 GiB size: 100.09 GiB (100.00%)
    used: 30.05 GiB (30.0%) fs: btrfs dev: /dev/dm-2 maj-min: 254:2
    mapped: system-root
  ID-2: /home raw-size: 100.09 GiB size: 100.09 GiB (100.00%)
    used: 30.05 GiB (30.0%) fs: btrfs dev: /dev/dm-2 maj-min: 254:2
    mapped: system-root
  ID-3: /opt raw-size: 100.09 GiB size: 100.09 GiB (100.00%)
    used: 30.05 GiB (30.0%) fs: btrfs dev: /dev/dm-2 maj-min: 254:2
    mapped: system-root
  ID-4: /var raw-size: 100.09 GiB size: 100.09 GiB (100.00%)
    used: 30.05 GiB (30.0%) fs: btrfs dev: /dev/dm-2 maj-min: 254:2
    mapped: system-root
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default) zswap: no
  ID-1: swap-1 type: partition size: 11.67 GiB used: 0 KiB (0.0%)
    priority: -2 dev: /dev/dm-1 maj-min: 254:1 mapped: system-swap
Sensors:
  System Temperatures: cpu: 38.0 C mobo: N/A gpu: amdgpu temp: 49.0 C
  Fan Speeds (rpm): N/A gpu: amdgpu fan: 1233
Info:
  Memory: total: 12 GiB available: 11.67 GiB used: 5.76 GiB (49.4%)
  Processes: 357 Power: uptime: 1h 19m states: freeze,mem,disk suspend: deep
    avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
    suspend, test_resume image: 4.64 GiB services: org_kde_powerdevil,
    power-profiles-daemon, upowerd Init: systemd v: 256 default: graphical
    tool: systemctl
  Packages: pm: rpm pkgs: N/A note: see --rpm tools: yast,zypper pm: flatpak
    pkgs: 23 Compilers: N/A Shell: Bash v: 5.2.37 running-in: konsole
    inxi: 3.3.37
x@x-desktop:~> 

Well, it’s a one-month-old installation. But this happened to me before as well, only very rarely (about once every 3 months), so I’ve been ignoring it.

But yesterday it happened 3 times in 15 minutes, so I knew it was time to get some logs and investigate.
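
To keep track of how often this happens, a rough sketch like the following could group the "ring gfx timeout" lines of a kernel log into separate incidents by their timestamps. The 60-second gap and the function names are arbitrary assumptions, and dmesg timestamps are seconds since boot, so this only works within a single boot.

```python
import re

# Matches the leading "[  471.648477]" timestamp of a dmesg line.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timeout_times(lines):
    """Return the timestamps (seconds since boot) of gfx ring timeout lines."""
    times = []
    for line in lines:
        if "ring gfx timeout" not in line:
            continue
        match = TS_RE.match(line.lstrip())
        if match:
            times.append(float(match.group(1)))
    return times


def group_incidents(times, gap=60.0):
    """Group timestamps that lie within `gap` seconds of the previous one."""
    incidents = []
    for t in sorted(times):
        if incidents and t - incidents[-1][-1] <= gap:
            incidents[-1].append(t)
        else:
            incidents.append([t])
    return incidents


# The two timeout lines from the log above belong to one incident.
sample = [
    "[  471.648477] [    T507] amdgpu 0000:01:00.0: amdgpu: ring gfx timeout, signaled seq=79034, emitted seq=79036",
    "[  483.414469] [    T506] amdgpu 0000:01:00.0: amdgpu: ring gfx timeout, signaled seq=79036, emitted seq=79036",
]
print(len(group_incidents(timeout_times(sample))))
```

Kernel messages from an earlier boot can usually be read with `journalctl -k -b -1` if the journal is persistent, but those lines carry a different timestamp format, so `TS_RE` would need adapting.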

Today is fine, no issues.

For me, the most important thing is to know whether my hardware is dying after 14+ years or whether it is a kernel / driver issue.

Edit:
The logs above are from the running system as is, not from the ISO.

IMHO hardware failure, but which part? Video card? GPU slot? PSU?
Try another video card or another slot.
Cleaning the contacts might also help.
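
Before swapping cards or slots, one cheap check is the PCIe link the kernel has negotiated with the card. This is only a sketch: it assumes the GPU sits at bus ID 0000:01:00.0 as in the inxi output above, and it reads the standard sysfs link attributes. The helper names are made up for illustration.

```python
from pathlib import Path

ATTRS = (
    "current_link_speed",
    "max_link_speed",
    "current_link_width",
    "max_link_width",
)


def read_pcie_link(bdf="0000:01:00.0", root="/sys/bus/pci/devices"):
    """Read the PCIe link attributes of one device; missing files become None."""
    dev = Path(root) / bdf
    info = {}
    for attr in ATTRS:
        try:
            info[attr] = (dev / attr).read_text().strip()
        except OSError:
            info[attr] = None
    return info


def width_is_reduced(info):
    """True if fewer lanes are in use than the device supports, None if unknown."""
    current, maximum = info["current_link_width"], info["max_link_width"]
    if current is None or maximum is None:
        return None
    return int(current) < int(maximum)


info = read_pcie_link()
print(info, width_is_reduced(info))
```

A link running at gen 2 is expected on this board, since the Sandy Bridge CPU only provides PCIe 2.0, and the speed can also drop at idle because of power management. A card that trains at fewer lanes than it supports (inxi above reports 16) might point to a contact or slot problem, although that alone proves nothing.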

BTW, you can install up to a Core i7-3770K.