Kernel Panic Causing KDE to Freeze (AMD Build)

Hello Fellow OpenSUSE users!

I’ve recently switched to Tumbleweed and had some trouble with my hardware. Most issues were fixed (enabling IOMMU on the MOBO fixed USB registration problems). However, I am now facing what I guess is a kernel panic that causes the system to freeze and become unresponsive.

My current setup is as follows:

  • CPU: AMD FX8350
  • MOBO: Gigabyte 970A-DS3P
  • Wireless/BT adapter: Gigabyte GC-WB1833D-I
  • RAM: 2x Patriot Viper 3 8GB DDR3-1600
  • SSD: Kingston A400 256GB (used for OS)
  • HD: WD Blue 2TB (used for data)
  • Video: MSI Radeon HD 7850 2GB

OS Configuration: OpenSUSE Tumbleweed (version 20230108)

Current Diagnosis

  • I have noticed that system failure can occur anywhere between 3 and 12 hours of runtime
  • The only log that I have witnessed in journalctl is the following

win_x11[1783]: kwin_core: XCB error: 152 (BadDamage), sequence: 47449, resource id: 10576624, major code: 143 (DAMAGE), minor code: 3 (Subtract)

Symptoms

  • Screen freezes, and nothing related is logged in dmesg or journalctl
  • Mouse is unresponsive (unplugging and plugging back in yields no change)
  • Keyboard is unresponsive (unplugging and plugging back in yields no change)
  • Ctrl+Alt+F1 will not escape to console
  • “Raising the Elephant” (magic SysRq) key combos are not possible
  • After reboot, no further log entries are present in /var/log/ logs

I’m out of troubleshooting ideas. If anyone has an idea about what further troubleshooting methods I could try or any suggestions about a fix, I would be very thankful!
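One possible reason that no entries survive the reboot is that the journal is only being kept in memory; that is an assumption, not something established in this thread. A minimal Python sketch that checks whether the persistent journal directory (/var/log/journal) has any content, and builds the journalctl command for reading the kernel messages of the boot before the current one. The function names are made up for illustration:

```python
from pathlib import Path


def journal_is_persistent(journal_dir="/var/log/journal"):
    """True if the persistent journal directory exists and contains something."""
    path = Path(journal_dir)
    try:
        return path.is_dir() and any(path.iterdir())
    except OSError:
        return False  # unreadable directory: treat as "not confirmed"


def previous_boot_command(boots_back=1, kernel_only=True):
    """Build a journalctl argument list for an earlier boot (1 = the boot before this one)."""
    cmd = ["journalctl", "--no-pager", f"--boot=-{boots_back}"]
    if kernel_only:
        cmd.append("-k")  # kernel messages only, similar to dmesg
    return cmd


print("persistent journal found:", journal_is_persistent())
print(" ".join(previous_boot_command()))
```

If the first line prints False, messages written just before a freeze would not be expected to show up after a reboot.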

Certainly not an expert here, so I may be suggesting very naive things, but some thoughts:

When is this occurring? Any particular sequence of steps/software that you’re running? Is it on boot?

Do you mean it happens between 3 and 12 hours of runtime, or that it does not fail until after 12 hours? Is it possibly a temperature issue? My brother had an AMD processor that froze like this at times, and he eventually traced it to temperature.

Could maybe try booting into a simpler graphical interface (icewm?) or just to text? Maybe it’s a graphics card issue?

Some thoughts to get you started, hopefully.

I agree. Also try switching between Wayland and Xorg, or vice versa, for Plasma.

Please post here using code tags

	like here
		to preserve original
	command output
					layout

input/output from inxi -Faz.

Try switching display drivers between radeon and modesetting, or vice versa.

Hello @SisterPenguin and @mrmazda ,

Thank you very much for such fast replies.

I will take both of your advice to form my next action plan.
Now, I will do the following:

  1. Monitor temperatures until the next system freeze
  2. Switch display drivers from radeon to modesetting, or vice versa
  3. Switch from Wayland to Xorg
  4. (If 2 and 3 fail) Switch display drivers and use Xorg
  5. (If 4 fails) Boot into simpler GUI (I will use IceWM as suggested) - I do prefer to avoid this because I want to use Plasma, but I guess we will see

After all of this I will report back with progress updates. This might take one week due to the wait times for system freeze.

I can’t determine the conditions for the failure. That is part of my issue.
I have seen that freezes occur both when many applications are running and when none are. Freezing never occurs after boot and usually occurs after 3 hours (fastest recorded).

You may try to kill graphics by holding down Ctrl + Alt and pressing the Backspace key twice.

Both introduced around 2012 – possibly, hardware ageing could be an issue –

  • Memory/RAM beginning to display ageing errors?

You have 2 memory modules. Please try swapping them between the mainboard slots you’re using.

  • If the issue doesn’t disappear, run first with only one (8 GB) memory module and then with the other one. If the issue narrows down to a single memory module, replace that module …

Hello All,

Quick update: I ran my system today and watched sensors. The system ran for 9 hours until freezing. I don’t think temperature is the issue as I received the following results:

Max CPU temp: 24.0°C
Max GPU temp: 24.0°C
Max WiFi adapter temp: 26.0°C
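Since the machine freezes hard, the last reading before the freeze is the interesting one. A minimal Python sketch, assuming the sensors are exposed through the kernel's hwmon sysfs interface (temp*_input files holding millidegrees Celsius), that appends each reading to a file and forces it to disk so it survives the freeze. The function names and log format are hypothetical:

```python
import os
import time
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"chip/sensor": degrees_C} for every temp*_input file under root."""
    temps = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors refuse to be read; skip them
            label = sensor.name[: -len("_input")]
            temps[f"{chip_name}/{label}"] = millideg / 1000.0
    return temps


def append_reading(log_path, temps, now=None):
    """Append one timestamped line and fsync it so it is on disk before a freeze."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    line = stamp + " " + " ".join(f"{k}={v:.1f}" for k, v in sorted(temps.items()))
    with open(log_path, "a") as fh:
        fh.write(line + "\n")
        fh.flush()
        os.fsync(fh.fileno())
    return line
```

Calling append_reading("temps.log", read_hwmon_temps()) once a minute in a loop would leave the final pre-freeze temperatures as the last line of the file.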

@karlmistelberger , thank you for the suggestion. Unfortunately, those keystrokes didn’t work. The keyboard doesn’t respond anymore after system freeze.

@dcurtisfra , the CPU was purchased in 2013, but the MOBO was purchased only in 2019 (somehow they still sell this one). Nevertheless, the RAM could definitely be a culprit. I will add this test to my action plan. I do suspect that my issue is hardware-related.

My updated action plan is now as follows:

  1. [COMPLETED - no result] Monitor temperatures until the next system freeze
  2. Switch display drivers from radeon to modesetting, or vice versa
  3. Switch from Wayland to Xorg
  4. Switch display drivers and use Xorg
  5. Test RAM sticks
  6. Boot into simpler GUI (I will use IceWM as suggested)

More updates to come in the following days

Hmm, the fact that it never occurs before 3 hours initially gave me temperature theory fuel, but that appears to be bust. I’m kind of inclined to think that dcurtisfra might be onto something with the RAM.

The relatively long functioning duration tends to make me think hardware, or maybe a memory leak someplace. Are there any suspect background processes that are running? (top may help to see if there are things that are running unexpectedly or that you don’t recognize)
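As an alternative to watching top by hand, here is a rough Python sketch that lists the processes with the largest resident memory by reading the VmRSS field from /proc/<pid>/status. The function name is hypothetical, and a single snapshot proves nothing on its own; a leak would show up as one entry growing steadily across snapshots taken over several hours:

```python
from pathlib import Path


def top_memory_processes(proc_root="/proc", limit=10):
    """Return [(rss_kib, pid, name)] sorted by resident memory, largest first."""
    rows = []
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            text = status.read_text()
        except OSError:
            continue  # the process exited while we were reading
        fields = dict(
            line.split(":", 1) for line in text.splitlines() if ":" in line
        )
        rss = fields.get("VmRSS")  # kernel threads have no VmRSS line
        if rss is None:
            continue
        name = fields.get("Name", "?").strip()
        rows.append((int(rss.split()[0]), int(status.parent.name), name))
    return sorted(rows, reverse=True)[:limit]


for rss_kib, pid, name in top_memory_processes(limit=5):
    print(f"{rss_kib:>10} KiB  {pid:>7}  {name}")
```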

Keep us posted; looks like your action plan is great.

Hello All,

A quick update regarding my stability testing progress.

Short Summary: Of all tests tried so far, switching display servers yielded the most change. It turns out that I was booting into X11 and not Wayland. After switching to Wayland, my system remained stable for 24 hours (longest yet). When booting into IceWM, the system froze again after 13h (upper limit of time before crashes so far).
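Mixing up X11 and Wayland is easy, because the login screen remembers the last session choice. A small Python sketch for checking which kind of session is actually running, based on the XDG_SESSION_TYPE, WAYLAND_DISPLAY and DISPLAY environment variables. The function name is made up, and the fallback order is an assumption: WAYLAND_DISPLAY is checked first because DISPLAY is normally also set under Wayland for Xwayland clients:

```python
import os


def session_type(env=None):
    """Return 'wayland', 'x11', or 'unknown' from session environment variables."""
    env = os.environ if env is None else env
    declared = env.get("XDG_SESSION_TYPE", "").lower()
    if declared in ("wayland", "x11"):
        return declared
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


print(session_type())
```

Run from a terminal inside the desktop session; run over SSH or from a text console it will usually report something else.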

Detailed Summary
Firstly, to respond to @mrmazda’s missed request for system information using inxi -Faz.
Note: I have booted into Wayland and not X11 at the point of this output.

System:
  Kernel: 6.1.3-1-default arch: x86_64 bits: 64 compiler: gcc v: 12.2.1
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.1.3-1-default
    root=UUID=a2985ff3-8f15-48d0-b76a-020afc4957d3 splash=silent quiet
    security=apparmor mitigations=auto
  Desktop: KDE Plasma v: 5.26.5 tk: Qt v: 5.15.7 wm: kwin_wayland vt: 2
    dm: SDDM Distro: openSUSE Tumbleweed 20230110
Machine:
  Type: Desktop System: Gigabyte product: N/A v: N/A
    serial: <superuser required> Chassis: type: 3 serial: <superuser required>
  Mobo: Gigabyte model: 970A-DS3P v: x.x serial: <superuser required>
    UEFI: American Megatrends v: FD date: 02/26/2016
CPU:
  Info: model: AMD FX-8350 bits: 64 type: MT MCP arch: Piledriver level: v2
    built: 2012-13 process: GF 32nm family: 0x15 (21) model-id: 2 stepping: 0
    microcode: 0x6000852
  Topology: cpus: 1x cores: 8 smt: enabled cache: L1: 384 KiB
    desc: d-8x16 KiB; i-4x64 KiB L2: 8 MiB desc: 4x2 MiB L3: 8 MiB desc: 1x8 MiB
  Speed (MHz): avg: 1403 high: 1406 min/max: 1400/4000 boost: enabled
    scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 1406 2: 1400
    3: 1406 4: 1406 5: 1406 6: 1400 7: 1405 8: 1400 bogomips: 64300
  Flags: avx ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities:
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data status: Not affected
  Type: retbleed mitigation: untrained return thunk; SMT vulnerable
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, STIBP:
    disabled, RSB filling, PBRSB-eIBRS: Not affected
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: AMD Pitcairn PRO [Radeon HD 7850 / R7 265 R9 270 1024SP]
    vendor: Micro-Star MSI driver: radeon v: kernel alternate: amdgpu
    arch: GCN-1 code: Southern Islands process: TSMC 28nm built: 2011-20 pcie:
    gen: 2 speed: 5 GT/s lanes: 16 link-max: gen: 3 speed: 8 GT/s ports:
    active: DVI-I-1 empty: DP-1,DP-2,HDMI-A-1 bus-ID: 01:00.0
    chip-ID: 1002:6819 class-ID: 0300 temp: 30.0 C
  Device-2: Huawei UVC Camera type: USB driver: snd-usb-audio,uvcvideo
    bus-ID: 5-3:4 chip-ID: 12d1:4321 class-ID: 0102 serial: <filter>
  Display: wayland server: X.org v: 1.21.1.6 with: Xwayland v: 22.1.7
    compositor: kwin_wayland driver: X: loaded: modesetting unloaded: fbdev,vesa
    dri: radeonsi gpu: radeon display-ID: 0
  Monitor-1: DVI-I-1 res: 1920x1080 size: N/A modes: N/A
  API: OpenGL v: 4.5 Mesa 22.3.2 renderer: PITCAIRN ( LLVM 15.0.6 DRM 2.50
    6.1.3-1-default) direct render: Yes
Audio:
  Device-1: AMD SBx00 Azalia vendor: Gigabyte driver: snd_hda_intel
    bus-ID: 5-3:4 v: kernel chip-ID: 12d1:4321 bus-ID: 00:14.2 class-ID: 0102
    chip-ID: 1002:4383 serial: <filter> class-ID: 0403
  Device-2: AMD Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000
    Series] vendor: Micro-Star MSI driver: snd_hda_intel v: kernel pcie:
    gen: 2 speed: 5 GT/s lanes: 16 link-max: gen: 3 speed: 8 GT/s
    bus-ID: 01:00.1 chip-ID: 1002:aab0 class-ID: 0403
  Device-3: Huawei UVC Camera type: USB driver: snd-usb-audio,uvcvideo
  Sound API: ALSA v: k6.1.3-1-default running: yes
  Sound Server-1: PulseAudio v: 16.1 running: no
  Sound Server-2: PipeWire v: 0.3.63 running: yes
Network:
  Device-1: Intel Wireless-AC 9260 driver: iwlwifi v: kernel pcie: gen: 1
    speed: 2.5 GT/s lanes: 1 link-max: gen: 2 speed: 5 GT/s bus-ID: 03:00.0
    chip-ID: 8086:2526 class-ID: 0280
  IF: wlp3s0 state: up mac: <filter>
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    vendor: Gigabyte driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s
    lanes: 1 port: d000 bus-ID: 04:00.0 chip-ID: 10ec:8168 class-ID: 0200
  IF: enp4s0 state: down mac: <filter>
  Device-3: Realtek RTL8192CE PCIe Wireless Network Adapter vendor: ASUSTeK
    driver: rtl8192ce v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1 port: c000
    bus-ID: 05:00.0 chip-ID: 10ec:8178 class-ID: 0280
  IF: wlp5s0 state: down mac: <filter>
Bluetooth:
  Device-1: Intel Wireless-AC 9260 Bluetooth Adapter type: USB driver: btusb
    v: 0.8 bus-ID: 7-1:2 chip-ID: 8087:0025 class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 0 state: up address: see --recommends
Drives:
  Local Storage: total: 5.79 TiB used: 701.84 GiB (11.8%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 840 Series
    size: 111.79 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    type: SSD serial: <filter> rev: 8B0Q scheme: MBR
  ID-2: /dev/sdb maj-min: 8:16 vendor: Toshiba model: DT01ACA200
    size: 1.82 TiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
    type: HDD rpm: 7200 serial: <filter> rev: ABB0 scheme: MBR
  ID-3: /dev/sdc maj-min: 8:32 vendor: Kingston model: SA400S37240G
    size: 223.57 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    type: SSD serial: <filter> rev: T1A3 scheme: GPT
  ID-4: /dev/sdd maj-min: 8:48 vendor: Western Digital
    model: WD20EZBX-00AYRA0 size: 1.82 TiB block-size: physical: 4096 B
    logical: 512 B speed: 6.0 Gb/s type: HDD rpm: 7200 serial: <filter>
    rev: 1A01 scheme: GPT
  ID-5: /dev/sde maj-min: 8:64 type: USB vendor: Western Digital
    model: WD My Passport 0820 size: 1.82 TiB block-size: physical: 512 B
    logical: 512 B type: N/A serial: <filter> rev: 1012 scheme: MBR
Partition:
  ID-1: / raw-size: 223.07 GiB size: 223.07 GiB (100.00%)
    used: 20.12 GiB (9.0%) fs: btrfs dev: /dev/sdc2 maj-min: 8:34
  ID-2: /boot/efi raw-size: 512 MiB size: 511 MiB (99.80%)
    used: 5.1 MiB (1.0%) fs: vfat dev: /dev/sdc1 maj-min: 8:33
  ID-3: /home raw-size: 1.82 TiB size: 1.82 TiB (100.00%)
    used: 681.72 GiB (36.6%) fs: btrfs dev: /dev/sdd1 maj-min: 8:49
  ID-4: /opt raw-size: 223.07 GiB size: 223.07 GiB (100.00%)
    used: 20.12 GiB (9.0%) fs: btrfs dev: /dev/sdc2 maj-min: 8:34
  ID-5: /var raw-size: 223.07 GiB size: 223.07 GiB (100.00%)
    used: 20.12 GiB (9.0%) fs: btrfs dev: /dev/sdc2 maj-min: 8:34
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default)
  ID-1: swap-1 type: partition size: 2 GiB used: 0 KiB (0.0%) priority: -2
    dev: /dev/sdd2 maj-min: 8:50
Sensors:
  System Temperatures: cpu: 36.0 C mobo: 27.0 C gpu: radeon temp: 30.0 C
  Fan Speeds (RPM): cpu: 675 fan-1: 2327 fan-3: 1169 fan-4: 0 fan-5: 0
  Power: 12v: N/A 5v: N/A 3.3v: N/A vbat: 3.10
Info:
  Processes: 314 Uptime: 0h 35m wakeups: 0 Memory: 15.58 GiB
  used: 4.23 GiB (27.1%) Init: systemd v: 252 default: graphical
  tool: systemctl Compilers: gcc: 12.2.1 alt: 12 Packages: pm: rpm pkgs: N/A
  note: see --rpm tools: yast,zypper pm: flatpak pkgs: 9 Shell: Bash v: 5.2.15
  running-in: konsole inxi: 3.3.23

Now for an update from my action plan.

  1. [COMPLETED - system crashed] Monitor temperatures until the next system freeze

Time before system crash: 11 hours
System performance at crash:

  • Max CPU temp: 24.0°C
  • Max GPU temp: 24.0°C
  • Max WiFi adapter temp: 26.0°C

  2. [COMPLETED - system crashed] Switch display drivers from radeon to modesetting, or vice versa

Time before system crash: 9.5 hours

Notes

  • The following relevant packages were already installed
xorg-x11-driver-video
xf86-video-fbdev
xf86-video-mach64
xf86-video-r128
xf86-video-vesa
  • xf86-video-mach64 and xf86-video-r128 were uninstalled and dmesg | grep drm gave the following output:
[    0.388280] ACPI: bus type drm_connector registered
[    0.390599] [drm] Initialized simpledrm 1.0.0 20200625 for simple-framebuffer.0 on minor 0
[    0.397026] simple-framebuffer simple-framebuffer.0: [drm] fb0: simpledrmdrmfb frame buffer device
[    2.358760] [drm] radeon kernel modesetting enabled.
[    2.420922] [drm] initializing kernel modesetting (PITCAIRN 0x1002:0x6819 0x1462:0x2730 0x00).
[    2.421056] [drm] Detected VRAM RAM=2048M, BAR=256M
[    2.421057] [drm] RAM width 256bits DDR
[    2.421074] [drm] radeon: 2048M of VRAM memory ready
[    2.421076] [drm] radeon: 2048M of GTT memory ready.
[    2.421082] [drm] Loading pitcairn Microcode
[    2.425093] [drm] Internal thermal controller with fan control
[    2.433455] [drm] radeon: dpm initialized
[    2.446963] [drm] Found VCE firmware/feedback version 50.0.1 / 17!
[    2.446969] [drm] GART: num cpu pages 524288, num gpu pages 524288
[    2.448428] [drm] PCIE gen 2 link speeds already enabled
[    2.457201] [drm] PCIE GART of 2048M enabled (table at 0x00000000001D6000).
[    2.478521] [drm] radeon: irq initialized.
[    2.667684] [drm] ring test on 0 succeeded in 4 usecs
[    2.667689] [drm] ring test on 1 succeeded in 1 usecs
[    2.667693] [drm] ring test on 2 succeeded in 1 usecs
[    2.667703] [drm] ring test on 3 succeeded in 6 usecs
[    2.667716] [drm] ring test on 4 succeeded in 6 usecs
[    2.843614] [drm] ring test on 5 succeeded in 2 usecs
[    2.843621] [drm] UVD initialized successfully.
[    2.952867] [drm] ring test on 6 succeeded in 21 usecs
[    2.952882] [drm] ring test on 7 succeeded in 4 usecs
[    2.952883] [drm] VCE initialized successfully.
[    2.953025] [drm] ib test on ring 0 succeeded in 0 usecs
[    2.953106] [drm] ib test on ring 1 succeeded in 0 usecs
[    2.953156] [drm] ib test on ring 2 succeeded in 0 usecs
[    2.953205] [drm] ib test on ring 3 succeeded in 0 usecs
[    2.953254] [drm] ib test on ring 4 succeeded in 0 usecs
[    3.612150] [drm] ib test on ring 5 succeeded
[    4.124128] [drm] ib test on ring 6 succeeded
[    4.636126] [drm] ib test on ring 7 succeeded
[    4.636660] [drm] Radeon Display Connectors
[    4.636660] [drm] Connector 0:
[    4.636661] [drm]   DP-1
[    4.636662] [drm]   HPD4
[    4.636663] [drm]   DDC: 0x6530 0x6530 0x6534 0x6534 0x6538 0x6538 0x653c 0x653c
[    4.636665] [drm]   Encoders:
[    4.636665] [drm]     DFP1: INTERNAL_UNIPHY2
[    4.636666] [drm] Connector 1:
[    4.636667] [drm]   DP-2
[    4.636667] [drm]   HPD5
[    4.636668] [drm]   DDC: 0x6540 0x6540 0x6544 0x6544 0x6548 0x6548 0x654c 0x654c
[    4.636669] [drm]   Encoders:
[    4.636670] [drm]     DFP2: INTERNAL_UNIPHY2
[    4.636671] [drm] Connector 2:
[    4.636671] [drm]   HDMI-A-1
[    4.636672] [drm]   HPD1
[    4.636672] [drm]   DDC: 0x6550 0x6550 0x6554 0x6554 0x6558 0x6558 0x655c 0x655c
[    4.636674] [drm]   Encoders:
[    4.636674] [drm]     DFP3: INTERNAL_UNIPHY1
[    4.636675] [drm] Connector 3:
[    4.636675] [drm]   DVI-I-1
[    4.636676] [drm]   HPD6
[    4.636676] [drm]   DDC: 0x6580 0x6580 0x6584 0x6584 0x6588 0x6588 0x658c 0x658c
[    4.636678] [drm]   Encoders:
[    4.636678] [drm]     DFP4: INTERNAL_UNIPHY
[    4.636679] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[    4.731661] [drm] fb mappable at 0xC05E1000
[    4.731664] [drm] vram apper at 0xC0000000
[    4.731664] [drm] size 8294400
[    4.731665] [drm] fb depth is 24
[    4.731666] [drm]    pitch is 7680
[    4.731769] fbcon: radeondrmfb (fb0) is primary device
[    4.793282] radeon 0000:01:00.0: [drm] fb0: radeondrmfb frame buffer device
[    4.808165] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 0
[    7.283904] [drm] amdgpu kernel modesetting enabled.
[    8.179264] systemd[1]: Starting Load Kernel Module drm...
[    8.190962] systemd[1]: modprobe@drm.service: Deactivated successfully.
[    8.191100] systemd[1]: Finished Load Kernel Module drm.

I assume that this indicates that modesetting was used successfully.

Output from sudo lspci -nnk | grep -A3 VGA:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn PRO [Radeon HD 7850 / R7 265 / R9 270 1024SP] [1002:6819]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:2730]
        Kernel driver in use: radeon
        Kernel modules: radeon, amdgpu
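The lspci output above reports the kernel driver bound to the card (radeon), which is a separate thing from the Xorg display driver (modesetting). A short Python sketch that pulls the kernel driver in use and the available kernel modules for each VGA device out of lspci -nnk text; the function name is hypothetical and the parsing assumes the indented layout shown above:

```python
import re


def gpu_kernel_drivers(lspci_text):
    """Return {pci_address: {"in_use": str or None, "modules": [str]}} for VGA devices."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        if line.strip() and not line[0].isspace():
            # An unindented line starts a new device; only track VGA controllers.
            if "VGA compatible controller" in line:
                current = line.split(" ", 1)[0]
                result[current] = {"in_use": None, "modules": []}
            else:
                current = None
        elif current:
            m = re.match(r"\s*Kernel driver in use:\s*(\S+)", line)
            if m:
                result[current]["in_use"] = m.group(1)
            m = re.match(r"\s*Kernel modules:\s*(.+)", line)
            if m:
                result[current]["modules"] = [s.strip() for s in m.group(1).split(",")]
    return result


sample = """01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn PRO [1002:6819]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:2730]
        Kernel driver in use: radeon
        Kernel modules: radeon, amdgpu
"""
print(gpu_kernel_drivers(sample))
```

A card listing both radeon and amdgpu under modules but radeon under "in use" matches the situation in this thread.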
  3. [COMPLETED - system remained stable] Switch from Xorg to Wayland

Time before system crash: None. System remained stable for 24 hours

Notes

  • I discovered that I was booting in X11 and not Wayland by default
  4. [SKIPPED - AP3 succeeded] Switch display drivers and use Wayland
  5. [SKIPPED - AP3 succeeded] Test RAM sticks
  6. [COMPLETED - system crashed] Boot into IceWM

Time before system crash: 14h

I will now proceed to run my system on Wayland continuously with the hopes that this issue is resolved. Even if the issue is not, I’d like to extend a personal thanks to everyone who offered help on this issue and provided support: I couldn’t have gotten this far without you :slightly_smiling_face:

This is a first generation GCN GPU. You’re currently running on the radeon kernel driver with the modesetting display driver.

If it turns out the freezes are not gone, have a re-read here, and then consider a switch to the amdgpu kernel and display drivers. According to an openSUSE X maintainer, for GCN1 these drivers are considered “experimental” and inadequately tested. I use them with my GCN1 card with no apparent issues. OTOH, older cards are pretty old, more than about a decade, and thus ostensibly little used by driver developers and maintainers, making radeon drivers susceptible to unexposed regressions capable of causing crashing.

Hello All,

An update on my system stability status: my system has only remained stable up to 24h of uptime.

  • Using Wayland caused compatibility issues with certain apps (known issue with SUSE and KDE - X11 is the preferred WM)
  • I switched back to X11 and switched my driver to amdgpu using this guide: SDB:AMDGPU.

Stability still has not improved. To further my investigation, I looked into whether my kernel was tainted, which it is, by VirtualBox.

dmesg | grep -i taint outputs:

vboxdrv: loading out-of-tree module taints kernel.

Output of the kernel-chktaint shell script, acquired from the Linux tainted kernels guide:

Kernel is "tainted" for the following reasons:
 * externally-built ('out-of-tree') module was loaded  (#12)
For a more detailed explanation of the various taint flags see
 Documentation/admin-guide/tainted-kernels.rst in the Linux kernel sources
 or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html
Raw taint value as int/string: 4096/'G           O      '
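The raw value 4096 is a bitmask, and 4096 = 2^12, which is why the script reports flag #12. A minimal Python sketch of that decoding; the table covers only four of the documented bits (0, 1, 12, 13), and the function name is made up for illustration:

```python
# Partial table of taint bits; the kernel's tainted-kernels guide has the full list.
TAINT_FLAGS = {
    0: "proprietary module was loaded",
    1: "module was force loaded",
    12: "externally-built ('out-of-tree') module was loaded",
    13: "unsigned module was loaded",
}


def decode_taint(value):
    """Return [(bit, description)] for every bit set in the taint value."""
    return [
        (bit, TAINT_FLAGS.get(bit, "not in this partial table"))
        for bit in range(value.bit_length())
        if value & (1 << bit)
    ]


print(decode_taint(4096))
```

The same integer can be read directly from /proc/sys/kernel/tainted on a running system.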

I feel this should not be a problem, so I am continuing by following the SUSE support article “Configure crashkernel memory for kernel core dump analysis” in order to gather more diagnostic information.
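Whether a crashkernel reservation is active can be read off the kernel command line. A small Python sketch, with a hypothetical function name, that extracts the crashkernel= value. The first sample is the command line from the inxi output earlier in this thread; the 256M in the second sample is purely illustrative and not a recommended size:

```python
def crashkernel_setting(cmdline):
    """Return the value of crashkernel= from a kernel command line, or None if absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


before = (
    "BOOT_IMAGE=/boot/vmlinuz-6.1.3-1-default "
    "root=UUID=a2985ff3-8f15-48d0-b76a-020afc4957d3 splash=silent quiet "
    "security=apparmor mitigations=auto"
)
after = before + " crashkernel=256M"  # illustrative value only

print(crashkernel_setting(before))
print(crashkernel_setting(after))
```

On a live system the input would come from /proc/cmdline. A result of None means no memory has been reserved for a crash kernel, so no dump could be captured.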

Additionally, I have also enabled the magic SysRq key presses in order to provide myself with further help.
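Which SysRq functions are allowed is controlled by the integer in /proc/sys/kernel/sysrq: 0 disables everything, 1 enables everything, and larger values are a bitmask. A minimal Python sketch of decoding it, using the bit meanings from the kernel's SysRq documentation; the function name is hypothetical, and 184 is used only as an example value:

```python
# Bit meanings as listed in the kernel's SysRq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """Return the list of SysRq function groups enabled by the given setting."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    return [desc for bit, desc in SYSRQ_BITS.items() if value & bit]


print(decode_sysrq(184))
```

For an emergency shutdown, the sync, remount and reboot keys fall under bits 16, 32 and 128, while unraw and the term/kill signals need bits 4 and 64.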

Further updates will follow.

Correct – the “Kernel tainted” is only a warning which Oracle’s VirtualBox triggers.

  • It’s a reminder that virtual machines can, and do, affect the kernel, but their influence on the kernel’s behaviour is well known and is neither damaging nor destructive …

Hello All,

Quick update. My system has been running for two weeks now and I am no longer experiencing crashes. The only two major changes that I have made to my system are using the amdgpu video card driver and using Wayland as my display server. I don’t know if one of the two has more effect than the other, but I’m just happy that my desktop is now running stable.