Hi folks,
Ever since configuring a new desktop machine with a Ryzen 3400G and openSUSE Leap 15.2,
I’ve been plagued by system crashes for which rebooting is the only remedy I’ve
found. I can still ssh into the machine from elsewhere, but the jobs that were running
have gone idle, and even killing the idle processes doesn’t unhang the machine. Typically
the problem occurs while running Firefox, often with more than one tab open. A sure way
to hit it within a few seconds is to start a parallel job (something compiled with
gcc that has some OpenMP loops in it) and then start Firefox.
I have looked in /var/log/messages, and the signature of the crash appears to be this:
2021-03-30T21:10:39.631908-06:00 lindblad kernel: 9373.307201] simd exception: 0000 [#1] SMP NOPTI
2021-03-30T21:10:39.631920-06:00 lindblad kernel: 9373.307206] CPU: 6 PID: 1625 Comm: X Not tainted 5.3.18-lp152.66-default #1 openSUSE Leap 15.2
2021-03-30T21:10:39.631920-06:00 lindblad kernel: 9373.307209] Hardware name: Micro-Star International Co., Ltd MS-7B86/B450 GAMING PLUS MAX (MS-7B86), BIOS H.70 06/17/2020
2021-03-30T21:10:39.631922-06:00 lindblad kernel: 9373.307284] RIP: 0010:mode_support_and_system_configuration+0x2881/0x4b20 [amdgpu]
2021-03-30T21:10:39.631922-06:00 lindblad kernel: 9373.307287] Code: 17 00 00 0f 28 c3 e8 6e d1 ff ff f3 41 0f 11 87 40 19 00 00 e9 2d fd ff ff 83 bd a8 00 00 00 06 75 9a f3 0f 10 85 40 1b 00 00 <f3> 0f 5e 85 f8 17 00 00 e8 42 d1 ff ff 41 8b 97 80 04 00 00 0f 28
2021-03-30T21:10:39.631927-06:00 lindblad kernel: 9373.307292] RSP: 0018:ffffb1790173f790 EFLAGS: 00010246
2021-03-30T21:10:39.631927-06:00 lindblad kernel: 9373.307295] RAX: 0000000000000000 RBX: ffff915d9229afa0 RCX: 0000000000000004
2021-03-30T21:10:39.631928-06:00 lindblad kernel: 9373.307297] RDX: 0000000000000006 RSI: ffff915d9229ac58 RDI: 0000000000000001
2021-03-30T21:10:39.631929-06:00 lindblad kernel: 9373.307300] RBP: ffff915d9229abac R08: ffff915d9229bdb4 R09: 0000000000000120
2021-03-30T21:10:39.631929-06:00 lindblad kernel: 9373.307302] R10: ffff915d9229b558 R11: 0000000000000004 R12: ffff915d9229aa14
2021-03-30T21:10:39.631930-06:00 lindblad kernel: 9373.307304] R13: ffff915d9229c28c R14: ffff915d9229aa14 R15: ffff915d9229aa14
2021-03-30T21:10:39.631930-06:00 lindblad kernel: 9373.307307] FS: 00007f4d141e6ec0(0000) GS:ffff915e80780000(0000) knlGS:0000000000000000
2021-03-30T21:10:39.631931-06:00 lindblad kernel: 9373.307310] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2021-03-30T21:10:39.631931-06:00 lindblad kernel: 9373.307312] CR2: 00007f4d0412e000 CR3: 00000007f8d0e000 CR4: 00000000003406e0
2021-03-30T21:10:39.631932-06:00 lindblad kernel: 9373.307314] Call Trace:
2021-03-30T21:10:39.631932-06:00 lindblad kernel: 9373.307387] dcn_validate_bandwidth+0xd7a/0x1f80 [amdgpu]
2021-03-30T21:10:39.631933-06:00 lindblad kernel: 9373.307451] dc_commit_updates_for_stream+0x92c/0x1410 [amdgpu]
2021-03-30T21:10:39.631933-06:00 lindblad kernel: 9373.307502] ? amdgpu_display_get_crtc_scanoutpos+0x85/0x170 [amdgpu]
2021-03-30T21:10:39.631933-06:00 lindblad kernel: 9373.307567] amdgpu_dm_atomic_commit_tail+0x10da/0x1e10 [amdgpu]
2021-03-30T21:10:39.631934-06:00 lindblad kernel: 9373.307580] ? commit_tail+0x3d/0x80 [drm_kms_helper]
2021-03-30T21:10:39.631934-06:00 lindblad kernel: 9373.307588] commit_tail+0x3d/0x80 [drm_kms_helper]
2021-03-30T21:10:39.631935-06:00 lindblad kernel: 9373.307596] drm_atomic_helper_commit+0x107/0x130 [drm_kms_helper]
2021-03-30T21:10:39.631936-06:00 lindblad kernel: 9373.307612] drm_mode_obj_set_property_ioctl+0x24d/0x2e0 [drm]
2021-03-30T21:10:39.631936-06:00 lindblad kernel: 9373.307616] ? mutex_lock+0xe/0x30
2021-03-30T21:10:39.631936-06:00 lindblad kernel: 9373.307630] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
2021-03-30T21:10:39.631937-06:00 lindblad kernel: 9373.307644] drm_ioctl_kernel+0xac/0xf0 [drm]
2021-03-30T21:10:39.631937-06:00 lindblad kernel: 9373.307658] drm_ioctl+0x2eb/0x3b0 [drm]
2021-03-30T21:10:39.631938-06:00 lindblad kernel: 9373.307674] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
2021-03-30T21:10:39.631938-06:00 lindblad kernel: 9373.307678] ? do_iter_write+0xf2/0x1a0
2021-03-30T21:10:39.631939-06:00 lindblad kernel: 9373.307728] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
2021-03-30T21:10:39.631939-06:00 lindblad kernel: 9373.307731] do_vfs_ioctl+0xa0/0x680
2021-03-30T21:10:39.631939-06:00 lindblad kernel: 9373.307735] ? __sys_recvmsg+0x8a/0xa0
2021-03-30T21:10:39.631940-06:00 lindblad kernel: 9373.307737] ksys_ioctl+0x70/0x80
2021-03-30T21:10:39.631940-06:00 lindblad kernel: 9373.307740] __x64_sys_ioctl+0x16/0x20
2021-03-30T21:10:39.631940-06:00 lindblad kernel: 9373.307743] do_syscall_64+0x65/0x1f0
2021-03-30T21:10:39.631941-06:00 lindblad kernel: 9373.307745] entry_SYSCALL_64_after_hwframe+0x44/0xa9
2021-03-30T21:10:39.631941-06:00 lindblad kernel: 9373.307748] RIP: 0033:0x7f4d11ad59e7
2021-03-30T21:10:39.631942-06:00 lindblad kernel: 9373.307751] Code: b3 66 90 48 8b 05 b1 14 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 14 2c 00 f7 d8 64 89 01 48
2021-03-30T21:10:39.631942-06:00 lindblad kernel: 9373.307755] RSP: 002b:00007ffc56f609b8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
2021-03-30T21:10:39.631943-06:00 lindblad kernel: 9373.307758] RAX: ffffffffffffffda RBX: 000055f7c687d800 RCX: 00007f4d11ad59e7
2021-03-30T21:10:39.631943-06:00 lindblad kernel: 9373.307760] RDX: 00007ffc56f609f0 RSI: 00000000c01864ba RDI: 000000000000000d
2021-03-30T21:10:39.631943-06:00 lindblad kernel: 9373.307762] RBP: 00007ffc56f609f0 R08: 000000000000005a R09: 000055f7c687e0c0
2021-03-30T21:10:39.631944-06:00 lindblad kernel: 9373.307764] R10: 000055f7c795e284 R11: 0000000000003246 R12: 00000000c01864ba
2021-03-30T21:10:39.631944-06:00 lindblad kernel: 9373.307766] R13: 000000000000000d R14: 0000000000000fff R15: 0000000000000003
2021-03-30T21:10:39.631945-06:00 lindblad kernel: 9373.307769] Modules linked in: fuse af_packet xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter usblp dmi_sysfs msr edac_mce_amd ccp kvm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio irqbypass snd_hda_codec_hdmi snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul snd_hda_core ghash_clmulni_intel snd_hwdep aesni_intel snd_pcm aes_x86_64 ppdev joydev snd_timer crypto_simd sp5100_tco cryptd glue_helper r8169 snd wmi_bmof soundcore parport_pc realtek parport libphy i2c_piix4 k10temp pcspkr gpio_amdpt gpio_generic button hid_microsoft ff_memless hid_generic usbhid xfs libcrc32c amdgpu amd_iommu_v2 gpu_sched i2c_algo_bit ttm
2021-03-30T21:10:39.631945-06:00 lindblad kernel: 9373.307797] drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel drm xhci_pci sr_mod xhci_hcd cdrom usbcore wmi video sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
2021-03-30T21:10:39.631946-06:00 lindblad kernel: 9373.307822] ---[ end trace f7e31fd7e1ab6046 ]---
2021-03-30T21:10:39.631946-06:00 lindblad kernel: 9373.307892] RIP: 0010:mode_support_and_system_configuration+0x2881/0x4b20 [amdgpu]
2021-03-30T21:10:39.631947-06:00 lindblad kernel: 9373.307897] Code: 17 00 00 0f 28 c3 e8 6e d1 ff ff f3 41 0f 11 87 40 19 00 00 e9 2d fd ff ff 83 bd a8 00 00 00 06 75 9a f3 0f 10 85 40 1b 00 00 <f3> 0f 5e 85 f8 17 00 00 e8 42 d1 ff ff 41 8b 97 80 04 00 00 0f 28
2021-03-30T21:10:39.631947-06:00 lindblad kernel: 9373.307903] RSP: 0018:ffffb1790173f790 EFLAGS: 00010246
2021-03-30T21:10:39.631947-06:00 lindblad kernel: 9373.307908] RAX: 0000000000000000 RBX: ffff915d9229afa0 RCX: 0000000000000004
2021-03-30T21:10:39.631948-06:00 lindblad kernel: 9373.307913] RDX: 0000000000000006 RSI: ffff915d9229ac58 RDI: 0000000000000001
2021-03-30T21:10:39.631948-06:00 lindblad kernel: 9373.307917] RBP: ffff915d9229abac R08: ffff915d9229bdb4 R09: 0000000000000120
2021-03-30T21:10:39.631948-06:00 lindblad kernel: 9373.307921] R10: ffff915d9229b558 R11: 0000000000000004 R12: ffff915d9229aa14
2021-03-30T21:10:39.631949-06:00 lindblad kernel: 9373.307925] R13: ffff915d9229c28c R14: ffff915d9229aa14 R15: ffff915d9229aa14
2021-03-30T21:10:39.631949-06:00 lindblad kernel: 9373.307929] FS: 00007f4d141e6ec0(0000) GS:ffff915e80780000(0000) knlGS:0000000000000000
2021-03-30T21:10:39.631950-06:00 lindblad kernel: 9373.307934] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2021-03-30T21:10:39.631950-06:00 lindblad kernel: 9373.307939] CR2: 00007f4d0412e000 CR3: 00000007f8d0e000 CR4: 00000000003406e0
2021-03-30T21:11:04.768518-06:00 lindblad tracker-store[14225]: OK
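For anyone who wants to check their own logs for the same signature, here is a minimal Python sketch that pulls the key fields out of a trace like the one above. It is only an illustration: the function names (find_oops_signatures, name_faulting_opcode) are made up for this post, the regular expressions assume the log layout shown above, and the opcode table covers only the SSE divide family.

```python
import re

# Header of the oops, e.g. "simd exception: 0000 [#1] SMP NOPTI".
# The optional "[" copes with logs where the bracket was lost in pasting.
HEADER_RE = re.compile(
    r"kernel:\s*\[?\s*[\d.]+\]\s*(?P<kind>[\w ]+?): \d{4} \[?#\d+\]")
# "CPU: 6 PID: 1625 Comm: X Not tainted <release> #1 ..."
RELEASE_RE = re.compile(
    r"CPU: \d+ PID: \d+ .*?(?P<release>\d+\.\d+\.\d+\S*) #\d+")
# Kernel-mode RIP lines only (code segment 0010), with an optional [module].
RIP_RE = re.compile(
    r"RIP: 0010:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?")
# In a "Code:" line the byte in <..> is the first byte of the faulting instruction.
CODE_RE = re.compile(
    r"Code: .*?<(?P<first>[0-9a-f]{2})>(?P<rest>(?: [0-9a-f]{2})*)")

# Tiny lookup table (assumption: only the SSE divide opcodes are of interest here).
SSE_DIVIDE = {
    "0f 5e": "divps",
    "66 0f 5e": "divpd",
    "f3 0f 5e": "divss",
    "f2 0f 5e": "divsd",
}


def name_faulting_opcode(code_bytes):
    """Return a mnemonic if the leading bytes are in SSE_DIVIDE, else None."""
    for length in (3, 2):
        name = SSE_DIVIDE.get(" ".join(code_bytes[:length]))
        if name:
            return name
    return None


def find_oops_signatures(lines):
    """Collect one dict per oops header found in an iterable of log lines."""
    signatures = []
    current = None
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            current = {"kind": header.group("kind"), "release": None,
                       "symbol": None, "module": None,
                       "fault_bytes": None, "opcode": None}
            signatures.append(current)
            continue
        if current is None:
            continue
        release = RELEASE_RE.search(line)
        if release and current["release"] is None:
            current["release"] = release.group("release")
        rip = RIP_RE.search(line)
        if rip and current["symbol"] is None:
            current["symbol"] = rip.group("symbol")
            current["module"] = rip.group("module")
        code = CODE_RE.search(line)
        # Only take the first Code: line after the kernel RIP line.
        if code and current["symbol"] and current["fault_bytes"] is None:
            code_bytes = [code.group("first")] + code.group("rest").split()
            current["fault_bytes"] = " ".join(code_bytes[:3])
            current["opcode"] = name_faulting_opcode(code_bytes)
    return signatures


# Four lines copied from the trace above, used as sample input.
SAMPLE = [
    "2021-03-30T21:10:39.631908-06:00 lindblad kernel: 9373.307201] simd exception: 0000 #1] SMP NOPTI",
    "2021-03-30T21:10:39.631920-06:00 lindblad kernel: 9373.307206] CPU: 6 PID: 1625 Comm: X Not tainted 5.3.18-lp152.66-default #1 openSUSE Leap 15.2",
    "2021-03-30T21:10:39.631922-06:00 lindblad kernel: 9373.307284] RIP: 0010:mode_support_and_system_configuration+0x2881/0x4b20 [amdgpu]",
    "2021-03-30T21:10:39.631922-06:00 lindblad kernel: 9373.307287] Code: 17 00 00 0f 28 c3 e8 6e d1 ff ff f3 41 0f 11 87 40 19 00 00 e9 2d fd ff ff 83 bd a8 00 00 00 06 75 9a f3 0f 10 85 40 1b 00 00 <f3> 0f 5e 85 f8 17 00 00 e8 42 d1 ff ff 41 8b 97 80 04 00 00 0f 28",
]

if __name__ == "__main__":
    for signature in find_oops_signatures(SAMPLE):
        print(signature)
```

On the sample lines this reports a "simd exception" in mode_support_and_system_configuration (module amdgpu) on kernel 5.3.18-lp152.66-default. The marked bytes f3 0f 5e decode as divss, a scalar single-precision divide, so the fault is a floating-point divide inside the amdgpu display code. To scan a real log, pass the open file object for /var/log/messages (usually readable only by root) instead of SAMPLE.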
Googling some of the strings in the output above leads me to
threads like this one:
https://gitlab.freedesktop.org/drm/amd/-/issues/1154
which suggests that my problem has something to do with an exception-masking inconsistency (or something like that).
It also seems to indicate that the fixes for the problem (if it is indeed the same one) appear in Linux 5.8 or so. Not being terribly experienced with kernel
patching/hacking/substitution, I’m not in a position to swap out kernels to try this myself, however.
So my questions are these:
- Am I right in my diagnosis that my problem is likely the same one as in the referenced thread?
- If so, are the patches involved scheduled for (or already in) the update stream for openSUSE 15.2? How would
I know whether my OS has had them applied?
- If not, is this error signature familiar to anyone, and does it have a fix that I can try out on my machine?
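As a rough first check on the second question, the upstream base version of the running kernel can be compared against the "5.8 or so" mentioned in the thread. The sketch below does only that. The helper names and the (5, 8, 0) baseline are assumptions made for illustration. A distribution kernel can carry backported fixes, so a base version below the baseline does not prove the fix is missing. The package changelog (for example `rpm -q --changelog kernel-default`) would be the more reliable place to look.

```python
import platform
import re

# Assumption: "5.8 or so" from the referenced thread, taken as 5.8.0.
FIX_BASELINE = (5, 8, 0)


def parse_kernel_release(release):
    """Return (major, minor, patch) from e.g. '5.3.18-lp152.66-default'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def upstream_base_at_least(release, minimum):
    """True if the upstream base version of `release` is >= `minimum`."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    if upstream_base_at_least(running, FIX_BASELINE):
        print(f"{running}: base version is at or above the assumed baseline")
    else:
        print(f"{running}: base version is older than the assumed baseline; "
              "check the package changelog for backports")
```

For the kernel in the trace above, 5.3.18-lp152.66-default, the comparison comes out below the baseline, which is why the changelog check matters.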
Thanks,
andy271828