Amdgpu instability on Tumbleweed 20240911

Anyone getting frequent amdgpu crashes on Tumbleweed 20240911?

I can reliably reproduce a crash using Moonlight. Within 30 minutes of streaming, the graphics stack crashes and fails to recover, bringing down plasmashell and the rest of the graphical session with it.

Sep 12 21:04:41 WMS-Mini kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_unified_0 timeout, signaled seq=54751, emitted seq=54751
Sep 12 21:04:41 WMS-Mini kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process moonlight pid 3317 thread moonlight:cs0 pid 3452
Sep 12 21:04:41 WMS-Mini kernel: amdgpu 0000:c4:00.0: amdgpu: GPU reset begin!
Sep 12 21:04:41 WMS-Mini kernel: ------------[ cut here ]------------
Sep 12 21:04:41 WMS-Mini kernel: WARNING: CPU: 10 PID: 115 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:630 amdgpu_irq_put+0x46/0x70 [amdgpu]
Sep 12 21:04:41 WMS-Mini kernel: Modules linked in: xpad ff_memless rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns snd_seq nf_conntrack_broadcast snd_seq_device af_packet uhid cmac algif_hash algif_skcipher af_alg nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip6table_filter ip6_tables qrtr bnep nf_tables iptable_filter snd_ctl_led snd_ps_pdm_dma snd_soc_dmic nls_iso8859_1 snd_soc_ps_mach snd_sof_amd_acp63 nls_cp437 snd_sof_amd_vangogh vfat snd_sof_amd_rembrandt fat snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_amd_sdw_acpi soundwire_amd soundwire_generic_allocation soundwire_bus snd_soc_core snd_hda_codec_realtek snd_compress iwlmvm snd_pcm_dmaengine snd_hda_codec_generic snd_rpl_pci_acp6x snd_hda_codec_hdmi
Sep 12 21:04:41 WMS-Mini kernel:  snd_hda_scodec_component snd_acp_pci intel_rapl_msr snd_acp_legacy_common mac80211 amd_atl snd_pci_acp6x libarc4 intel_rapl_common btusb snd_pci_acp5x snd_hda_intel snd_intel_dspcfg btrtl snd_rn_pci_acp3x snd_intel_sdw_acpi snd_hda_codec btintel joydev btbcm snd_hda_core btmtk edac_mce_amd bluetooth snd_hwdep snd_pcm snd_acp_config snd_timer hid_generic snd_soc_acpi snd soundcore snd_pci_acp3x r8169 iwlwifi kvm_amd realtek mdio_devres cfg80211 libphy kvm usbhid thunderbolt amd_pmc tiny_power_button pcspkr k10temp i2c_piix4 rfkill thermal button nvme_fabrics loop fuse dm_mod configfs efi_pstore nfnetlink dmi_sysfs ip_tables x_tables amdgpu crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel amdxcp i2c_algo_bit sha512_ssse3 drm_ttm_helper sha256_ssse3 ttm xhci_pci sha1_ssse3 xhci_pci_renesas drm_exec nvme gpu_sched xhci_hcd drm_suballoc_helper drm_buddy aesni_intel drm_display_helper nvme_core crypto_simd cec usbcore cryptd ccp rc_core nvme_auth sp5100_tco t10_pi video
Sep 12 21:04:41 WMS-Mini kernel:  wmi i2c_hid_acpi i2c_hid serio_raw btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq msr i2c_dev efivarfs
Sep 12 21:04:41 WMS-Mini kernel: CPU: 10 PID: 115 Comm: kworker/u64:2 Not tainted 6.10.7-1-default #1 openSUSE Tumbleweed f8092fbba2996175fc5c99a37a8fe2c3d5d37ff6
Sep 12 21:04:41 WMS-Mini kernel: Hardware name: Micro Computer (HK) Tech Limited Venus series/F7BSC, BIOS 1.09 11/20/2023
Sep 12 21:04:41 WMS-Mini kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Sep 12 21:04:41 WMS-Mini kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
Sep 12 21:04:41 WMS-Mini kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 2a f9 fe c5 e9 6a fd ff ff <0f> 0b b8 ea ff ff ff e9 19 f9 fe c5 b8 ea ff ff ff e9 0f f9 fe c5
Sep 12 21:04:41 WMS-Mini kernel: RSP: 0018:ffffa0930055bca0 EFLAGS: 00010246
Sep 12 21:04:41 WMS-Mini kernel: RAX: ffff8b48c0c4e220 RBX: ffff8b48f0c00000 RCX: 0000000000000000
Sep 12 21:04:41 WMS-Mini kernel: RDX: 0000000000000000 RSI: ffff8b48f0c254e0 RDI: ffff8b48f0c00000
Sep 12 21:04:41 WMS-Mini kernel: RBP: ffff8b48f0c00000 R08: 0000000000042640 R09: 0000000000000006
Sep 12 21:04:41 WMS-Mini kernel: R10: ffffa0930055bc58 R11: 0000000000000000 R12: 0000000000001050
Sep 12 21:04:41 WMS-Mini kernel: R13: ffff8b48f0c44928 R14: ffff8b49a96ca000 R15: ffff8b48f0c105e8
Sep 12 21:04:41 WMS-Mini kernel: FS:  0000000000000000(0000) GS:ffff8b4f01f00000(0000) knlGS:0000000000000000
Sep 12 21:04:41 WMS-Mini kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 12 21:04:41 WMS-Mini kernel: CR2: 000055c24aa5d008 CR3: 00000004f0236000 CR4: 0000000000750ef0
Sep 12 21:04:41 WMS-Mini kernel: PKRU: 55555554
Sep 12 21:04:41 WMS-Mini kernel: Call Trace:
Sep 12 21:04:41 WMS-Mini kernel:  <TASK>
Sep 12 21:04:41 WMS-Mini kernel:  ? amdgpu_irq_put+0x46/0x70 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  ? __warn.cold+0x8e/0xe8
Sep 12 21:04:41 WMS-Mini kernel:  ? amdgpu_irq_put+0x46/0x70 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  ? report_bug+0xff/0x140
Sep 12 21:04:41 WMS-Mini kernel:  ? handle_bug+0x3c/0x80
Sep 12 21:04:41 WMS-Mini kernel:  ? exc_invalid_op+0x17/0x70
Sep 12 21:04:41 WMS-Mini kernel:  ? asm_exc_invalid_op+0x1a/0x20
Sep 12 21:04:41 WMS-Mini kernel:  ? amdgpu_irq_put+0x46/0x70 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  gfx_v11_0_hw_fini+0x1b/0xf0 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  gfx_v11_0_suspend+0xe/0x20 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  amdgpu_device_ip_suspend_phase2+0x10c/0x1a0 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  ? amdgpu_device_ip_suspend_phase1+0x70/0xd0 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  amdgpu_device_ip_suspend+0x40/0x70 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  amdgpu_device_pre_asic_reset+0xd0/0x290 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  amdgpu_device_gpu_recover.cold+0x46f/0xab8 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  amdgpu_job_timedout+0x186/0x1d0 [amdgpu dd130bd7b264316f48a7589a9d9cb949d1081904]
Sep 12 21:04:41 WMS-Mini kernel:  drm_sched_job_timedout+0x67/0x100 [gpu_sched be71deab3171dd2e4b0461c45fe34dede824ab5d]
Sep 12 21:04:41 WMS-Mini kernel:  process_one_work+0x168/0x320
Sep 12 21:04:41 WMS-Mini kernel:  worker_thread+0x32a/0x470
Sep 12 21:04:41 WMS-Mini kernel:  ? __pfx_worker_thread+0x10/0x10
Sep 12 21:04:41 WMS-Mini kernel:  kthread+0xcf/0x100
Sep 12 21:04:41 WMS-Mini kernel:  ? __pfx_kthread+0x10/0x10
Sep 12 21:04:41 WMS-Mini kernel:  ret_from_fork+0x31/0x50
Sep 12 21:04:41 WMS-Mini kernel:  ? __pfx_kthread+0x10/0x10
Sep 12 21:04:41 WMS-Mini kernel:  ret_from_fork_asm+0x1a/0x30
Sep 12 21:04:41 WMS-Mini kernel:  </TASK>
Sep 12 21:04:41 WMS-Mini kernel: ---[ end trace 0000000000000000 ]---
Sep 12 21:04:41 WMS-Mini kernel: ------------[ cut here ]------------

Running openSUSE Tumbleweed 20240911 x86_64 on kernel 6.10.9-1.

The output shows that you are still running an older kernel (6.10.7-1). Did the upgrade complete? Do you have package locks? Is this the current error output, or is it from a prior boot?

I didn’t even see that. Good catch.

It seems that zypper dup did not remove the previous kernel: it installed 6.10.9-1 and created a boot entry named 6.10.9-1, but the system actually booted 6.10.7-1.1. Interesting.

I just did a sudo zypper rm kernel-default-6.10.7-1.1 && sudo dracut -f and rebooted. I’ll test again to see if I can replicate the crash.
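For anyone hitting the same mismatch: the trace header itself names the kernel that produced it (the "Not tainted 6.10.7-1-default" line above), so it can be compared against the running kernel. This is only a rough sketch; the function names are made up for illustration, and it assumes an untainted kernel, since a tainted one prints "Tainted: ..." instead.

```shell
# Extract the kernel release from a trace header line such as
# "... Not tainted 6.10.7-1-default #1 openSUSE Tumbleweed ...".
# Assumption: the kernel is untainted; tainted kernels use a different format.
kernel_from_trace() {
    sed -n 's/.*Not tainted \([^ ]*\) .*/\1/p'
}

# Read one trace header line on stdin and compare it with the running kernel.
trace_matches_running_kernel() {
    traced=$(kernel_from_trace)
    running=$(uname -r)
    if [ "$traced" = "$running" ]; then
        echo "trace is from the running kernel ($running)"
    else
        echo "trace is from $traced, but $running is running"
    fi
}

# Possible usage against the kernel log of the current boot:
# journalctl -k -b | grep 'Not tainted' | head -n 1 | trace_matches_running_kernel
```

If the two differ, the log is either from an earlier boot or the boot entry is loading a different kernel than its name suggests.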

Still present on 6.10.9-1.

Sep 13 10:57:07 WMS-Mini kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 13 10:57:07 WMS-Mini kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_unified_0 timeout, signaled seq=340507, emitted seq=340507
Sep 13 10:57:07 WMS-Mini kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process moonlight pid 22515 thread moonlight:cs0 pid 22656
Sep 13 10:57:07 WMS-Mini kernel: amdgpu 0000:c4:00.0: amdgpu: GPU reset begin!
Sep 13 10:57:07 WMS-Mini flatpak[22515]: 00:49:45 - SDL Info (0): Video decode unit queue overflow
Sep 13 10:57:07 WMS-Mini flatpak[22515]: 00:49:45 - SDL Info (0): IDR frame request sent
Sep 13 10:57:07 WMS-Mini flatpak[22515]: 00:49:45 - SDL Info (0): Waiting for IDR frame
Sep 13 10:57:07 WMS-Mini kernel: ------------[ cut here ]------------
Sep 13 10:57:07 WMS-Mini kernel: WARNING: CPU: 5 PID: 22860 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:630 amdgpu_irq_put+0x46/0x70 [amdgpu]
Sep 13 10:57:07 WMS-Mini kernel: Modules linked in: xpad ff_memless nf_conntrack_netbios_ns nf_conntrack_broadcast rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device af_packet nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip6table_filter ip6_tables uhid cmac algif_hash algif_skcipher af_alg qrtr nf_tables bnep iptable_filter nls_iso8859_1 nls_cp437 vfat fat joydev snd_ctl_led snd_soc_dmic snd_ps_pdm_dma snd_soc_ps_mach snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp iwlmvm snd_sof snd_sof_utils snd_pci_ps snd_amd_sdw_acpi soundwire_amd snd_hda_codec_realtek soundwire_generic_allocation mac80211 snd_hda_codec_generic soundwire_bus hid_generic libarc4 snd_hda_scodec_component snd_hda_codec_hdmi snd_soc_core amd_atl
Sep 13 10:57:07 WMS-Mini kernel:  intel_rapl_msr snd_hda_intel intel_rapl_common snd_intel_dspcfg snd_compress snd_intel_sdw_acpi btusb edac_mce_amd snd_pcm_dmaengine snd_rpl_pci_acp6x btrtl snd_hda_codec btintel snd_acp_pci snd_acp_legacy_common btbcm snd_hda_core snd_pci_acp6x kvm_amd btmtk iwlwifi snd_hwdep r8169 usbhid bluetooth snd_pci_acp5x snd_pcm kvm snd_timer snd_rn_pci_acp3x realtek cfg80211 snd snd_acp_config snd_soc_acpi mdio_devres soundcore thunderbolt snd_pci_acp3x pcspkr k10temp i2c_piix4 libphy rfkill thermal tiny_power_button button amd_pmc nvme_fabrics fuse loop dm_mod efi_pstore configfs nfnetlink dmi_sysfs ip_tables x_tables amdgpu crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul amdxcp i2c_algo_bit drm_ttm_helper ghash_clmulni_intel sha512_ssse3 ttm drm_exec sha256_ssse3 xhci_pci gpu_sched xhci_pci_renesas sha1_ssse3 drm_suballoc_helper nvme drm_buddy aesni_intel xhci_hcd nvme_core crypto_simd drm_display_helper nvme_auth cryptd usbcore cec ccp rc_core sp5100_tco t10_pi video wmi i2c_hid_acpi
Sep 13 10:57:07 WMS-Mini kernel:  i2c_hid serio_raw btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq msr i2c_dev efivarfs
Sep 13 10:57:07 WMS-Mini kernel: CPU: 5 PID: 22860 Comm: kworker/u64:0 Not tainted 6.10.9-1-default #1 openSUSE Tumbleweed 14a7a7da264eabf52ab7b4ce03df2738ac9068cb
Sep 13 10:57:07 WMS-Mini kernel: Hardware name: Micro Computer (HK) Tech Limited Venus series/F7BSC, BIOS 1.09 11/20/2023
Sep 13 10:57:07 WMS-Mini kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Sep 13 10:57:07 WMS-Mini kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
Sep 13 10:57:07 WMS-Mini kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 0a e5 02 d9 e9 6a fd ff ff <0f> 0b b8 ea ff ff ff e9 f9 e4 02 d9 b8 ea ff ff ff e9 ef e4 02 d9
Sep 13 10:57:07 WMS-Mini kernel: RSP: 0018:ffffb225c11bfca0 EFLAGS: 00010246
Sep 13 10:57:07 WMS-Mini kernel: RAX: ffff93c3e2f0c020 RBX: ffff93c3d2a80000 RCX: 0000000000000000
Sep 13 10:57:07 WMS-Mini kernel: RDX: 0000000000000000 RSI: ffff93c3d2aa54e0 RDI: ffff93c3d2a80000
Sep 13 10:57:07 WMS-Mini kernel: RBP: ffff93c3d2a80000 R08: 0000000000042640 R09: 0000000000000006
Sep 13 10:57:07 WMS-Mini kernel: R10: ffffb225c11bfc58 R11: 0000000000000000 R12: 0000000000001050
Sep 13 10:57:07 WMS-Mini kernel: R13: ffff93c3d2ac4930 R14: ffff93c3cd3f4400 R15: ffff93c3d2a905e8
Sep 13 10:57:07 WMS-Mini kernel: FS:  0000000000000000(0000) GS:ffff93ca01c80000(0000) knlGS:0000000000000000
Sep 13 10:57:07 WMS-Mini kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 13 10:57:07 WMS-Mini kernel: CR2: 00007f5c90057000 CR3: 000000021f3d6000 CR4: 0000000000750ef0
Sep 13 10:57:07 WMS-Mini kernel: PKRU: 55555554
Sep 13 10:57:07 WMS-Mini kernel: Call Trace:
Sep 13 10:57:07 WMS-Mini kernel:  <TASK>
Sep 13 10:57:07 WMS-Mini kernel:  ? amdgpu_irq_put+0x46/0x70 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  ? __warn.cold+0x8e/0xe8
Sep 13 10:57:07 WMS-Mini kernel:  ? amdgpu_irq_put+0x46/0x70 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  ? report_bug+0xff/0x140
Sep 13 10:57:07 WMS-Mini kernel:  ? handle_bug+0x3c/0x80
Sep 13 10:57:07 WMS-Mini kernel:  ? exc_invalid_op+0x17/0x70
Sep 13 10:57:07 WMS-Mini kernel:  ? asm_exc_invalid_op+0x1a/0x20
Sep 13 10:57:07 WMS-Mini kernel:  ? amdgpu_irq_put+0x46/0x70 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  gfx_v11_0_hw_fini+0x1b/0xf0 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  gfx_v11_0_suspend+0xe/0x20 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  amdgpu_device_ip_suspend_phase2+0x10c/0x1a0 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  ? amdgpu_device_ip_suspend_phase1+0x72/0xe0 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  amdgpu_device_ip_suspend+0x40/0x70 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  amdgpu_device_pre_asic_reset+0xd0/0x290 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  amdgpu_device_gpu_recover.cold+0x50b/0xad3 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  amdgpu_job_timedout+0x186/0x1d0 [amdgpu 93f004b05b6349dc62d73233d15ec9bc1761ca11]
Sep 13 10:57:07 WMS-Mini kernel:  drm_sched_job_timedout+0x67/0x100 [gpu_sched 5ebb1d8ded82493bbea8ca1fae7d764e50e74e60]
Sep 13 10:57:07 WMS-Mini kernel:  process_one_work+0x168/0x320
Sep 13 10:57:07 WMS-Mini kernel:  worker_thread+0x32a/0x470
Sep 13 10:57:07 WMS-Mini kernel:  ? __pfx_worker_thread+0x10/0x10
Sep 13 10:57:07 WMS-Mini kernel:  kthread+0xcf/0x100
Sep 13 10:57:07 WMS-Mini kernel:  ? __pfx_kthread+0x10/0x10
Sep 13 10:57:07 WMS-Mini kernel:  ret_from_fork+0x31/0x50
Sep 13 10:57:07 WMS-Mini kernel:  ? __pfx_kthread+0x10/0x10
Sep 13 10:57:07 WMS-Mini kernel:  ret_from_fork_asm+0x1a/0x30
Sep 13 10:57:07 WMS-Mini kernel:  </TASK>
Sep 13 10:57:07 WMS-Mini kernel: ---[ end trace 0000000000000000 ]---
Sep 13 10:57:07 WMS-Mini kernel: ------------[ cut here ]------------

As with my reply in another thread from months ago (post follows), this is a long shot. At the time, I was running TW … since then, we have moved to Leap 15.6, and the problem persists without the fix.

The issue: random reboots and such. The fix worked instantly, and we have never had another problem.

Anyway, for our AMD CPU and GPU desktop machine, we have to add these two arguments to the GRUB boot line:

amdgpu.ppfeaturemask=0xffffbffb amdgpu.dpm=0
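As a side note on what that mask value does: it is a 32-bit bitmask with all bits set except two. A small sketch to list the cleared bit positions follows. The helper name is hypothetical, and which power feature each bit maps to is defined in the kernel's amdgpu headers, so it may differ between kernel versions.

```shell
# Print the bit positions (0-31) that are cleared in a 32-bit mask.
# cleared_bits is an illustrative helper, not part of any tool.
cleared_bits() {
    mask=$1
    i=0
    out=""
    while [ "$i" -lt 32 ]; do
        if [ $(( ($mask >> $i) & 1 )) -eq 0 ]; then
            out="$out $i"
        fi
        i=$((i + 1))
    done
    echo "${out# }"
}

cleared_bits 0xffffbffb   # prints: 2 14
```

So the parameter turns off the two features behind bits 2 and 14 and leaves everything else enabled.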

The details of my reply in the old thread are here, with references to the sources where I found the fix:

The complete thread is this:

These do not sound like the symptoms I’m having at all. I’ll try, but I doubt it will make a difference.

It would help us help you if we knew more about your system than just amdgpu. Please paste the input and output of: inxi -GSaz

It’s a simple five minute test.
Boot the machine.

At the GRUB boot menu, highlight the boot entry, then press ‘E’ to edit.
Use the arrow keys to move down to the line beginning with ‘linux’.

Append the entries to the end of that line using GRUB’s editor. (This change is not permanent and will not persist to the next boot, so it is a simple temporary test.)
Now boot with the changes (Ctrl+X or F10).

If it doesn’t fix the problem, the next boot will be back to normal, with no permanent changes.
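After booting with the edited entry, it is worth confirming that the parameters actually reached the kernel before drawing conclusions from the test. A minimal sketch, assuming the two parameters suggested earlier in the thread; the function names are made up for illustration.

```shell
# Succeed if the given kernel command line contains the exact parameter.
has_param() {
    cmdline=$1
    param=$2
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check both suggested parameters against the running system.
check_amdgpu_params() {
    cmdline=$(cat /proc/cmdline)
    for p in amdgpu.ppfeaturemask=0xffffbffb amdgpu.dpm=0; do
        if has_param "$cmdline" "$p"; then
            echo "$p: active"
        else
            echo "$p: not set"
        fi
    done
}

# Possible usage after rebooting with the edited GRUB entry:
# check_amdgpu_params
```

On the following normal boot both should report "not set" again, since the GRUB edit is temporary.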

I am aware of how to apply module options. Thank you.

The reason why I haven’t replied yet is because I don’t have time to test it at the moment.

System:
  Kernel: 6.10.9-1-default arch: x86_64 bits: 64 compiler: gcc v: 14.2.0
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.10.9-1-default
    root=UUID=41dc7024-cde8-4cdb-ab8e-c73bc5a08427 splash=silent
    mitigations=auto quiet security=apparmor
  Desktop: KDE Plasma v: 6.1.5 tk: Qt v: N/A info: frameworks v: 6.5.0
    wm: kwin_wayland tools: avail: xscreensaver vt: 2 dm: SDDM Distro: openSUSE
    Tumbleweed 20240911
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Phoenix1 driver: amdgpu v: kernel
    arch: RDNA-3 code: Phoenix process: TSMC n4 (4nm) built: 2023+ pcie: gen: 4
    speed: 16 GT/s lanes: 16 ports: active: HDMI-A-1 empty: DP-1, DP-2, DP-3,
    DP-4, DP-5, DP-6, DP-7, HDMI-A-2, Writeback-1 bus-ID: c4:00.0
    chip-ID: 1002:15bf class-ID: 0300 temp: 28.0 C
  Display: wayland server: X.org v: 1.21.1.12 with: Xwayland v: 24.1.2
    compositor: kwin_wayland driver: X: loaded: modesetting unloaded: fbdev,vesa
    dri: radeonsi gpu: amdgpu display-ID: 0
  Monitor-1: HDMI-A-1 res: 1920x1080 size: N/A modes: N/A
  API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
    device: 1 drv: swrast surfaceless: drv: radeonsi wayland: drv: radeonsi x11:
    drv: radeonsi inactive: gbm
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.1.3 glx-v: 1.4
    direct-render: yes renderer: AMD Radeon 780M (radeonsi gfx1103_r1 LLVM
    18.1.8 DRM 3.57 6.10.9-1-default) device-ID: 1002:15bf memory: 3.91 GiB
    unified: no display-ID: :0.0
  API: Vulkan v: 1.3.290 layers: 6 device: 0 type: integrated-gpu name: AMD
    Radeon 780M (RADV GFX1103_R1) driver: N/A device-ID: 1002:15bf
    surfaces: xcb,xlib,wayland

It unfortunately didn’t solve the issue. The display doesn’t work with amdgpu.dpm=0.

I suggest checking whether the crashes persist when running Plasma in an Xorg session instead of Wayland, or whether you can reproduce the Moonlight crashes in an IceWM session.

It does. I ended up disabling video decoding accel for now. Software decoding isn’t ideal for a 4K 120 FPS stream, but at least it doesn’t die.

“It” being which? Does it crash in both of the other two sessions as well, not just Wayland/Plasma?

The crash occurs regardless of the type of session used (X Server Plasma, Wayland Plasma, or IceWM).

It seems to be related to video decoding/encoding acceleration, as I can also replicate it in Firefox when viewing video content on YouTube, for example.
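Both traces above point the same way: the ring that times out is vcn_unified_0, and VCN is the GPU's video engine. A rough sketch for pulling the timed-out ring names out of a kernel log, assuming the message format shown in the traces in this thread; the function name is hypothetical.

```shell
# List the distinct amdgpu rings that reported a job timeout.
# Reads kernel log lines on stdin; the pattern matches lines like
# "... *ERROR* ring vcn_unified_0 timeout, signaled seq=..., emitted seq=...".
timed_out_rings() {
    sed -n 's/.*\*ERROR\* ring \([^ ]*\) timeout.*/\1/p' | sort -u
}

# Possible usage against the kernel log of the previous boot:
# journalctl -k -b -1 | timed_out_rings
```

If only a vcn ring ever shows up, that fits the observation that the crash follows hardware video decoding rather than the desktop session.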

Another report of the same issue was posted on Reddit.

Given how new your GPU is, I suggest you report a bug. You may also want to read up on kernel bug reports and consider trying a vanilla kernel first.

From inxi output:

Desktop: KDE Plasma v: 6.1.5 tk: Qt v: N/A info: frameworks v: 6.5.0
wm: kwin_wayland tools: avail: xscreensaver vt: 2 dm: SDDM Distro: openSUSE
Tumbleweed 20240911

Display: wayland server: X.org v: 1.21.1.12 with: Xwayland v: 24.1.2
compositor: kwin_wayland driver: X: loaded: modesetting unloaded: fbdev,vesa
dri: radeonsi gpu: amdgpu display-ID: 0

API: Vulkan v: 1.3.290 layers: 6 device: 0 type: integrated-gpu name: AMD
Radeon 780M (RADV GFX1103_R1) driver: N/A device-ID: 1002:15bf
surfaces: xcb,xlib,wayland

It looks like the system is missing some drivers.

So, after some spelunking…

https://bugzilla.redhat.com/show_bug.cgi?id=2299241

This seems to be a known issue with Mesa < 24.1.6. It was fixed in the Mesa merge request “radeonsi/vcn: Add decode DPB buffers as CS dependency” (!30510).

Unfortunately, openSUSE is currently shipping Mesa 24.1.3. Looks like I just need to be patient.
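For checking an installed Mesa against the fixed release, a version-aware comparison avoids eyeballing the numbers. A minimal sketch, assuming a sort that supports -V (GNU coreutils does); the function name is made up, and the rpm query in the comment assumes the package is named Mesa, as on the openSUSE package page.

```shell
# Succeed if version $1 is strictly older than version $2.
# Relies on version-aware sorting via "sort -V".
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

if version_lt 24.1.3 24.1.6; then
    echo "Mesa 24.1.3 predates the 24.1.6 fix"
fi

# Possible usage with the installed package version:
# version_lt "$(rpm -q --qf '%{VERSION}' Mesa)" 24.1.6 && echo "still affected"
```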

Mesa 3D 24.1.7 is available from the X11:XOrg repo.

https://software.opensuse.org/package/Mesa


Why doesn’t openSUSE Leap 15.6 have any experimental packages for Mesa 3D?