Kernel Hardware errors reported after upgrade from 15.3

Low-end HP Laptop 15s-eq1516sa (AMD Ryzen3, 4GB RAM, Realtek RTL8821CE WiFi)

No problems running 15.3 (apart from sleep/resume), but since upgrading to 15.4 Beta (Build 236.1) I am seeing MCE (machine check exception) errors, especially when I initiate network activity. They appear several times in a session, but otherwise the machine works pretty well, including sleep/resume.

Message from syslogd@localhost at May 25 19:37:16 ...
 kernel: [ 8554.577754][ T9294] [Hardware Error]: Deferred error, no action required.

Message from syslogd@localhost at May 25 19:37:16 ...
 kernel: [ 8554.577855][ T9294] [Hardware Error]: CPU:0 (17:18:1) MC21_STATUS[Over|-|-|AddrV|-|-|SyndV|UECC|Deferred|-|-]: 0xd42030000001082b

Message from syslogd@localhost at May 25 19:37:16 ...
 kernel: [ 8554.577882][ T9294] [Hardware Error]: Error Addr: 0x00007ffcffffff00

Message from syslogd@localhost at May 25 19:37:16 ...
 kernel: [ 8554.577892][ T9294] [Hardware Error]: IPID: 0x0000002e00000001, Syndrome: 0x000000005b240205

Message from syslogd@localhost at May 25 19:37:16 ...
 kernel: [ 8554.577931][ T9294] [Hardware Error]: Coherent Slave Ext. Error Code: 1, Address Violation.

Message from syslogd@localhost at May 25 19:37:16 ...
 kernel: [ 8554.577945][ T9294] [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: WR, part-proc: SRC (no timeout)

I suspect this might be related to the WiFi module rtw_8821ce (in 15.3 I was using a community-built one; now it is built in), but I am unsure how to proceed or diagnose this. Also, is it worth a bug report?

Any suggestions gratefully received.
Richard
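
One way to start diagnosing is to count how often each bank/status pair recurs, so it becomes easier to see whether the same error repeats during network activity. The following is only a minimal sketch: the function name `summarize_mce` and the regular expressions are illustrative assumptions based on the message layout quoted above, not part of any existing tool.

```python
import re
from collections import Counter

# Matches the kernel's "[Hardware Error]: ..." payload, whatever precedes it.
HW_ERR = re.compile(r"\[Hardware Error\]:\s*(.*)")
# Matches e.g. "MC21_STATUS[Over|-|...]: 0xd42030000001082b" (layout assumed
# from the log lines quoted in this thread).
BANK_STATUS = re.compile(r"MC(\d+)_STATUS\[[^\]]*\]:\s*(0x[0-9a-fA-F]+)")


def summarize_mce(lines):
    """Count occurrences of each (bank number, status value) pair."""
    counts = Counter()
    for line in lines:
        hw = HW_ERR.search(line)
        if not hw:
            continue  # not a hardware error line
        bank = BANK_STATUS.search(hw.group(1))
        if bank:
            counts[(int(bank.group(1)), bank.group(2).lower())] += 1
    return counts


sample = [
    "kernel: [Hardware Error]: Deferred error, no action required.",
    "kernel: [Hardware Error]: CPU:0 (17:18:1) "
    "MC21_STATUS[Over|-|-|AddrV|-|-|SyndV|UECC|Deferred|-|-]: 0xd42030000001082b",
    "kernel: [Hardware Error]: Error Addr: 0x00007ffcffffff00",
]
for (bank, status), n in summarize_mce(sample).items():
    print(f"bank {bank} status {status}: {n} time(s)")
```

The same function could be fed the lines of saved kernel log output (for example from `journalctl -k`) instead of the sample list.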

Post:

zypper se -si rtw 8821
uname -a

PS
It's working here in Leap 15.4 with rtw88 from the kernel…

LANG=C zypper se -si rtw88 8821
Loading repository data...
Reading installed packages...
No matching items found.

uname -a
Linux laptop 5.14.21-150400.22-default #1 SMP PREEMPT_DYNAMIC Wed May 11 06:57:18 UTC 2022 (49db222) x86_64 x86_64 x86_64 GNU/Linux

modinfo rtw88_8821ce
filename: /lib/modules/5.14.21-150400.22-default/kernel/drivers/net/wireless/realtek/rtw88/rtw88_8821ce.ko.zst
license: Dual BSD/GPL
description: Realtek 802.11ac wireless 8821ce driver
author: Realtek Corporation
suserelease: SLE15-SP4
srcversion: 937569CB329370F6305F581
alias: pci:v000010ECd0000C821sv*sd*bc*sc*i*
depends: rtw88_pci,rtw88_8821c
supported: yes
retpoline: Y
intree: Y
name: rtw88_8821ce
vermagic: 5.14.21-150400.22-default SMP preempt mod_unload modversions
sig_id: PKCS#7
signer: SUSE Linux Enterprise Secure Boot CA
sig_key: ED:87:85:B7:8F:FC:12:7E
sig_hashalgo: sha256

$~> zypper se -si rtw 8821
Loading repository data...
Reading installed packages...

S  | Name                 | Type    | Version               | Arch   | Repository
---+----------------------+---------+-----------------------+--------+------------------
i  | libKPimImportWizard5 | package | 21.12.3-bp154.1.19    | x86_64 | (System Packages)
i+ | rtl8821ce-ueficert   | package | git20211120-lp153.2.9 | x86_64 | (System Packages)
i+ | rtw88-ueficert       | package | git20220514-lp153.4.1 | x86_64 | (System Packages)

$~> uname -a
Linux HP_Laptop 5.14.21-150400.22-default #1 SMP PREEMPT_DYNAMIC Wed May 11 06:57:18 UTC 2022 (49db222) x86_64 x86_64 x86_64 GNU/Linux

$~> sudo modinfo rtw88_8821ce
filename:       /lib/modules/5.14.21-150400.22-default/kernel/drivers/net/wireless/realtek/rtw88/rtw88_8821ce.ko.zst
license:        Dual BSD/GPL
description:    Realtek 802.11ac wireless 8821ce driver
...

Looks like I have the same module.

Thanks
Richard

i+ | rtl8821ce-ueficert   | package | git20211120-lp153.2.9 | x86_64 | (System Packages)
i+ | rtw88-ueficert       | package | git20220514-lp153.4.1 | x86_64 | (System Packages)

You can delete them; they are not needed anymore.

I think your problem has nothing to do with the WiFi.

Thanks, removed.

sudo zypper rm rtl8821ce-ueficert rtw88-ueficert

And yes, you are likely correct that it is not related to the WiFi, though that is one area where I know there is a material change in 15.4. I never once saw MCE errors in the year this machine ran 15.3.

So the question becomes a slightly different one: please can anyone help me decode these MCE errors? I note that “mcelog” is limited to Intel chips and so is useless here. Is there an AMD equivalent?

Possibly this thread would be better relocated under “Hardware”?

Cheers
Richard
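
On the decoding question: the lines quoted in the first post already look like the kernel's own decode of the raw registers, and the raw values can be cross-checked by hand. Below is a minimal sketch of that arithmetic. The bit positions and field tables are assumptions taken from the Linux kernel's `MCI_STATUS_*` definitions and its AMD bus-error decode tables, and the function names `decode_status` and `decode_ipid` are hypothetical; treat the result as a reading aid, not an authoritative diagnosis.

```python
# Status bit positions, assumed from the Linux kernel's MCI_STATUS_* macros.
STATUS_FLAGS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60, "MiscV": 59, "AddrV": 58,
    "PCC": 57, "TCC": 55, "SyndV": 53, "CECC": 46, "UECC": 45,
    "Deferred": 44, "Poison": 43, "Scrub": 40,
}

# Field tables for the "bus error" format of the 16-bit error code
# (0000 1PPT RRRR IILL), assumed from the kernel's AMD decoder.
LL = ["RESV", "L1", "L2", "L3/GEN"]
II = ["MEM", "RESV", "IO", "GEN"]
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]
PP = ["SRC", "RES", "OBS", "GEN"]


def decode_status(status):
    """Split a raw MCx_STATUS value into flags and error-code fields."""
    flags = [name for name, bit in STATUS_FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "ext_error_code": (status >> 16) & 0x3F,  # bits 21:16
        "error_code": code,                       # bits 15:0
    }
    if (code & 0xF800) == 0x0800:  # bus-error format
        rrrr = (code >> 4) & 0xF
        result["bus_error"] = {
            "cache_level": LL[code & 0x3],
            "mem_io": II[(code >> 2) & 0x3],
            "mem_tx": RRRR[rrrr] if rrrr < len(RRRR) else "-",
            "part_proc": PP[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 0x1),
        }
    return result


def decode_ipid(ipid):
    """Split a raw MCA_IPID value into hardware ID, MCA type and instance."""
    return {
        "hwid": (ipid >> 32) & 0xFFF,
        "mca_type": (ipid >> 48) & 0xFFFF,
        "instance_id": ipid & 0xFFFFFFFF,
    }


print(decode_status(0xD42030000001082B))
print(decode_ipid(0x0000002E00000001))
```

Run against the values from the first post, the flags come out as Val, Over, En, AddrV, SyndV, UECC and Deferred, with extended error code 1 and a bus error of L3/GEN, IO, WR, SRC, no timeout, which agrees with the kernel's printed lines. The IPID yields hardware ID 0x2e with MCA type 0, which the kernel labels "Coherent Slave" in the same log.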