soft lockup during boot with 6.0.3 kernel

Edit: I just noticed that my problem duplicates this post: https://forums.opensuse.org/showthread.php/577225-Kernel-6-0-3-1-hangs-after-SMART-but-6-0-2-1-boots-fine

I updated my system (Tumbleweed) yesterday with zypper dup. When I tried to boot it today, it got stuck with strange watchdog soft lockup errors right after “Starting Security Auditing Service…”, like these:

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks....
rcu: blocking rcu_node structures (internal RCU debug):
watchdog: BUG: soft lockup - CPU#2 stuck for 22s!..

I immediately restarted the system, selected the older kernel (6.0.2-1), and had no problems. I have not yet tried the newer kernel (6.0.3) again, so I cannot rule out that the cause is failing hardware rather than the kernel.

Does anyone have a clue what the problem is here?

My CPU: Intel Core i7-6850K

https://imgur.com/a/iAW8MxW


Hi
Have a read here: https://www.suse.com/support/kb/doc/?id=000018705

OK, I guess a change to the watchdog_thresh file under /proc would be lost after a reboot, right?

Hi
Set up a sysctl file containing your setting, for example:


/sbin/sysctl kernel.watchdog_thresh


/sbin/sysctl -w kernel.watchdog_thresh=15
kernel.watchdog_thresh = 15


cat /proc/sys/kernel/watchdog_thresh 
15


vi /etc/sysctl.d/98-watchdog.conf


kernel.watchdog_thresh=<your_value>
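
If you would rather script that last step than edit the file by hand, here is a minimal sketch. The function name write_watchdog_conf is made up for illustration and is not a distro tool; the file name 98-watchdog.conf is the one from the example above. The optional second argument exists only so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch only: write a sysctl drop-in for kernel.watchdog_thresh.
# write_watchdog_conf is a hypothetical helper name, not an existing tool.
# Usage: write_watchdog_conf VALUE [DIR]   (DIR defaults to /etc/sysctl.d)
write_watchdog_conf() {
    value="$1"
    dir="${2:-/etc/sysctl.d}"

    # Accept only a non-negative integer (seconds).
    case "$value" in
        ''|*[!0-9]*)
            echo "watchdog_thresh must be a non-negative integer" >&2
            return 1
            ;;
    esac

    # The target directory must already exist.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }

    printf 'kernel.watchdog_thresh = %s\n' "$value" > "$dir/98-watchdog.conf"
}

# Example (as root): write_watchdog_conf 15
# Then reload all sysctl configuration files with: sysctl --system
```

Writing to /etc/sysctl.d needs root, and the setting is only applied at the next boot unless you reload the sysctl configuration yourself.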

Thanks, I will try that. But do you know whether a higher threshold could have any negative impact, for example on performance? Also, do you think the problem could resolve itself over time with newer kernel updates?

I had seen those messages but dismissed them as a temporary hardware issue: I switched cables from the iGPU to the NVIDIA card and back again, and the problem was gone.
Incidentally, an issue prevents GRUB from remembering the default kernel to boot; for some reason it always boots 6.0.1, which might be why the problem doesn’t reproduce now. I can ask it to boot a specific kernel once, just not permanently (I should create a thread or bug report for this).

If I read this KB article correctly, raising the threshold would only hide the message instead of fixing the issue. In any case, the default of 10 seconds seems like a long time to be stuck while booting the system. It might have to do with the messages before that one. Thoughts?
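
A note on the numbers, as far as I understand the kernel's sysctl documentation: the soft lockup threshold is twice kernel.watchdog_thresh, so the default of 10 means a CPU has to be stuck for about 20 seconds before the warning fires. That would fit the "stuck for 22s" in the first post. A small sketch of the arithmetic (the function name soft_lockup_threshold is made up for illustration):

```shell
#!/bin/sh
# Sketch only: derive the soft lockup threshold (in seconds) from
# watchdog_thresh, using the documented relation soft = 2 * watchdog_thresh.
# soft_lockup_threshold is a hypothetical helper name.
# With no argument it reads the running kernel's current value.
soft_lockup_threshold() {
    thresh="${1:-$(cat /proc/sys/kernel/watchdog_thresh)}"
    echo $((2 * thresh))
}

# Example: soft_lockup_threshold 10   (default watchdog_thresh)
# Example: soft_lockup_threshold 15   (the value used earlier in this thread)
```

So raising watchdog_thresh from 10 to 15 moves the soft lockup warning from roughly 20 to roughly 30 seconds; it does not change whatever is keeping the CPU busy.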

$ systemd-analyze
Startup finished in 8.296s (firmware) + 1.982s (loader) + 774ms (kernel) + 2.490s (initrd) + 3.132s (userspace) = 16.675s
graphical.target reached after 3.080s in userspace.

I increased kernel.watchdog_thresh and it didn’t seem to fix the issue. I upgraded to 6.0.5 and it boots OK. I’m still keeping a 6.0.2 version as a backup.