Tumbleweed boot fails: rcu_preempt detected stalls on CPUs/tasks on kernel 6.3.9-1-default

I have now hit this error a couple of times on my fully updated Tumbleweed/KDE system. The first time, I rebooted into kernel 6.3.7, which then booted without problems for several days. I have now switched back to 6.3.9 and get:

Jul 04 09:34:15 Tumbleweed kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Jul 04 09:34:15 Tumbleweed kernel: rcu:         Tasks blocked on level-1 rcu_node (CPUs 0-15): P1082/1:b.el
Jul 04 09:34:15 Tumbleweed kernel: rcu:         (detected by 6, t=15002 jiffies, g=433, q=25625 ncpus=8)
Jul 04 09:34:15 Tumbleweed kernel: task:irqbalance      state:D stack:0     pid:1082  ppid:1      flags:0x00000002
Jul 04 09:34:15 Tumbleweed kernel: Call Trace:
Jul 04 09:34:15 Tumbleweed kernel:  <TASK>
Jul 04 09:34:15 Tumbleweed kernel:  __schedule+0x439/0x1490
Jul 04 09:34:15 Tumbleweed kernel:  ? update_load_avg+0x7e/0x780
Jul 04 09:34:15 Tumbleweed kernel:  schedule+0x5e/0xd0
Jul 04 09:34:15 Tumbleweed kernel:  schedule_preempt_disabled+0x15/0x30
Jul 04 09:34:15 Tumbleweed kernel:  __mutex_lock.constprop.0+0x403/0x710
Jul 04 09:34:15 Tumbleweed kernel:  ? check_preempt_curr+0x61/0x70
Jul 04 09:34:15 Tumbleweed kernel:  synchronize_rcu_expedited+0x432/0x740
Jul 04 09:34:15 Tumbleweed kernel:  ? xas_load+0xe/0x50
Jul 04 09:34:15 Tumbleweed kernel:  ? wake_up_q+0x4e/0x90
Jul 04 09:34:15 Tumbleweed kernel:  ? rwsem_wake.isra.0+0x69/0x90
Jul 04 09:34:15 Tumbleweed kernel:  namespace_unlock+0xd2/0x1a0
Jul 04 09:34:15 Tumbleweed kernel:  put_mnt_ns+0x6d/0x90
Jul 04 09:34:15 Tumbleweed kernel:  free_nsproxy+0x1b/0x1b0
Jul 04 09:34:15 Tumbleweed kernel:  do_exit+0x334/0xa70
Jul 04 09:34:15 Tumbleweed kernel:  make_task_dead+0x81/0x170
Jul 04 09:34:15 Tumbleweed kernel:  rewind_stack_and_make_dead+0x17/0x20
Jul 04 09:34:15 Tumbleweed kernel: RIP: 0033:0x7ffa73a3f091
Jul 04 09:34:15 Tumbleweed kernel: RSP: 002b:00007fff176c0478 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Jul 04 09:34:15 Tumbleweed kernel: RAX: ffffffffffffffda RBX: 000055831cc232a0 RCX: 00007ffa73a3f091
Jul 04 09:34:15 Tumbleweed kernel: RDX: 0000000000000400 RSI: 000055831cc23500 RDI: 0000000000000003
Jul 04 09:34:15 Tumbleweed kernel: RBP: 00007ffa73b20660 R08: 0000000000000008 R09: 0000000000000001
Jul 04 09:34:15 Tumbleweed kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000002
Jul 04 09:34:15 Tumbleweed kernel: R13: 0000000000000a68 R14: 00007ffa73b1fd60 R15: 0000000000000a68
Jul 04 09:34:15 Tumbleweed kernel:  </TASK>
Jul 04 09:34:15 Tumbleweed kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P1082 } 15134 jiffies s: 777 root: 0x1/.
Jul 04 09:34:15 Tumbleweed kernel: rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x0/T

The issue is intermittent, as I have since booted 6.3.9 fine again.

Can I have some advice as to how to find the cause please?

Stuart

What is process 1082?
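
One way to answer this from the journal itself: the stall report's `task:` line pairs a PID with a task name, and so does the `CPU: ... PID: ... Comm: ...` header that the kernel prints with an oops or warning. Below is a minimal Python sketch that scans log lines for those two shapes. The function name `pid_names` and the regular expressions are my own and only cover these two line formats; task names containing spaces would be cut short.

```python
import re

# Two line shapes that pair a PID with a task name in kernel logs:
#   "task:irqbalance      state:D stack:0     pid:1082  ppid:1 ..."
#   "CPU: 5 PID: 1082 Comm: irqbalance Not tainted ..."
TASK_RE = re.compile(r"task:(\S+)\s+state:\S+.*?\bpid:(\d+)")
COMM_RE = re.compile(r"\bPID: (\d+) Comm: (\S+)")


def pid_names(lines):
    """Map each PID found in the given kernel log lines to its task name."""
    names = {}
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            names[int(m.group(2))] = m.group(1)
            continue
        m = COMM_RE.search(line)
        if m:
            names[int(m.group(1))] = m.group(2)
    return names


# The task line from the stall report in the first post.
sample = [
    "Jul 04 09:34:15 Tumbleweed kernel: task:irqbalance      state:D "
    "stack:0     pid:1082  ppid:1      flags:0x00000002",
]
print(pid_names(sample))
```

To run it over a real boot, the output of `journalctl -k -b -1` (kernel messages from the previous boot) could be saved to a file and its lines passed to `pid_names`.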

It appears to be irqbalance, which shows different output when the failure happens:

Jul 04 09:33:15 Tumbleweed kernel: BUG: kernel NULL pointer dereference, address: 000000000000005a
Jul 04 09:33:15 Tumbleweed kernel: #PF: supervisor read access in kernel mode
Jul 04 09:33:15 Tumbleweed kernel: #PF: error_code(0x0000) - not-present page
Jul 04 09:33:15 Tumbleweed kernel: PGD 0 P4D 0 
Jul 04 09:33:15 Tumbleweed kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jul 04 09:33:15 Tumbleweed kernel: CPU: 5 PID: 1082 Comm: irqbalance Not tainted 6.3.9-1-default #1 openSUSE Tumbleweed 4b767630dbc263131e96e89ef291fd4fd2951892
Jul 04 09:33:15 Tumbleweed kernel: Hardware name: Micro-Star International Co., Ltd MS-7B86/B450-A PRO MAX (MS-7B86), BIOS M.C0 02/03/2021
Jul 04 09:33:15 Tumbleweed kernel: RIP: 0010:show_interrupts+0x24c/0x340
Jul 04 09:33:15 Tumbleweed kernel: Code: 85 d2 74 0f 48 c7 c6 c1 1c fa 99 48 89 ef e8 db 0a 2c 00 49 8b 5c 24 70 48 85 db 74 29 48 8b 53 50 48 c7 c6 c7 1c fa 99 eb 0b <48> 8b 53 50 48 c7 c6 9c 49 fd 99 48 89 ef e8 b1 0a 2c 00 48 8b 5b
Jul 04 09:33:15 Tumbleweed kernel: RSP: 0018:ffffbcef01987ce8 EFLAGS: 00010006
Jul 04 09:33:15 Tumbleweed kernel: RAX: 0000000000000000 RBX: 000000000000000a RCX: ffff0a00ffffff04
Jul 04 09:33:15 Tumbleweed kernel: RDX: 0000000000001000 RSI: 0000000000000004 RDI: 0000000048eb01e0
Jul 04 09:33:15 Tumbleweed kernel: RBP: ffff947c603ab3c0 R08: 0000000000000004 R09: ffff947d48eb01e1
Jul 04 09:33:15 Tumbleweed kernel: R10: ffffffffffffffff R11: 0000000000000000 R12: ffff947c40180a00
Jul 04 09:33:15 Tumbleweed kernel: R13: 0000000000000246 R14: ffff947c40180aa4 R15: 0000000000000000
Jul 04 09:33:15 Tumbleweed kernel: FS:  00007ffa73890780(0000) GS:ffff9482e0080000(0000) knlGS:0000000000000000
Jul 04 09:33:15 Tumbleweed kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 04 09:33:15 Tumbleweed kernel: CR2: 000000000000005a CR3: 000000010972e000 CR4: 00000000003506e0
Jul 04 09:33:15 Tumbleweed kernel: Call Trace:
Jul 04 09:33:15 Tumbleweed kernel:  <TASK>
Jul 04 09:33:15 Tumbleweed kernel:  ? __die+0x23/0x70
Jul 04 09:33:15 Tumbleweed kernel:  ? page_fault_oops+0x14d/0x490
Jul 04 09:33:15 Tumbleweed kernel:  ? number+0x320/0x3b0
Jul 04 09:33:15 Tumbleweed kernel:  ? exc_page_fault+0x6e/0x150
Jul 04 09:33:15 Tumbleweed kernel:  ? asm_exc_page_fault+0x26/0x30
Jul 04 09:33:15 Tumbleweed kernel:  ? show_interrupts+0x24c/0x340
Jul 04 09:33:15 Tumbleweed kernel:  ? show_interrupts+0x25f/0x340
Jul 04 09:33:15 Tumbleweed kernel:  seq_read_iter+0x2af/0x480
Jul 04 09:33:15 Tumbleweed kernel:  proc_reg_read_iter+0x51/0x90
Jul 04 09:33:15 Tumbleweed kernel:  vfs_read+0x1f8/0x2d0
Jul 04 09:33:15 Tumbleweed kernel:  ksys_read+0x67/0xe0
Jul 04 09:33:15 Tumbleweed kernel:  do_syscall_64+0x60/0x90
Jul 04 09:33:15 Tumbleweed kernel:  ? kmem_cache_free+0x19/0x360
Jul 04 09:33:15 Tumbleweed kernel:  ? do_sys_openat2+0x81/0x150
Jul 04 09:33:15 Tumbleweed kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Jul 04 09:33:15 Tumbleweed kernel:  ? do_syscall_64+0x6c/0x90
Jul 04 09:33:15 Tumbleweed kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Jul 04 09:33:15 Tumbleweed kernel:  ? do_syscall_64+0x6c/0x90
Jul 04 09:33:15 Tumbleweed kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Jul 04 09:33:15 Tumbleweed kernel: RIP: 0033:0x7ffa73a3f091
Jul 04 09:33:15 Tumbleweed kernel: Code: 00 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 90 90 80 3d ed 2a 0f 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 48 83 ec 28 48 89 54
Jul 04 09:33:15 Tumbleweed kernel: RSP: 002b:00007fff176c0478 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Jul 04 09:33:15 Tumbleweed kernel: RAX: ffffffffffffffda RBX: 000055831cc232a0 RCX: 00007ffa73a3f091
Jul 04 09:33:15 Tumbleweed kernel: RDX: 0000000000000400 RSI: 000055831cc23500 RDI: 0000000000000003
Jul 04 09:33:15 Tumbleweed kernel: RBP: 00007ffa73b20660 R08: 0000000000000008 R09: 0000000000000001
Jul 04 09:33:15 Tumbleweed kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000002
Jul 04 09:33:15 Tumbleweed kernel: R13: 0000000000000a68 R14: 00007ffa73b1fd60 R15: 0000000000000a68
Jul 04 09:33:15 Tumbleweed kernel:  </TASK>
Jul 04 09:33:15 Tumbleweed kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat iwlmvm mac80211 snd_hda_codec_realtek snd_hda_codec_generic libarc4 ledtrig_audio snd_usb_audio(+) snd_hda_codec_hdmi uvcvideo btusb intel_rapl_msr btrtl snd_hda_intel intel_rapl_common btbcm videobuf2_vmalloc snd_usbmidi_lib uvc snd_intel_dspcfg videobuf2_memops btintel snd_intel_sdw_acpi edac_mce_amd btmtk videobuf2_v4l2 r8169 snd_rawmidi snd_hda_codec snd_seq_device bluetooth videodev iwlwifi snd_hda_core kvm_amd snd_hwdep snd_pcm videobuf2_common realtek snd_timer kvm cfg80211 mdio_devres mc pcspkr irqbypass snd wmi_bmof ecdh_generic efi_pstore k10temp soundcore i2c_piix4 libphy rfkill joydev tiny_power_button gpio_amdpt gpio_generic acpi_cpufreq button fuse configfs dmi_sysfs ip_tables x_tables ext4 mbcache jbd2 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 i2c_algo_bit drm_ttm_helper ttm iommu_v2 drm_buddy xhci_pci
Jul 04 09:33:15 Tumbleweed kernel:  gpu_sched xhci_pci_renesas drm_display_helper nvme xhci_hcd aesni_intel crypto_simd cryptd cec firewire_ohci nvme_core ccp usbcore rc_core sp5100_tco firewire_core sr_mod cdrom crc_itu_t video wmi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs
Jul 04 09:33:15 Tumbleweed kernel: CR2: 000000000000005a
Jul 04 09:33:15 Tumbleweed kernel: ---[ end trace 0000000000000000 ]---
Jul 04 09:33:15 Tumbleweed kernel: RIP: 0010:show_interrupts+0x24c/0x340
Jul 04 09:33:15 Tumbleweed kernel: Code: 85 d2 74 0f 48 c7 c6 c1 1c fa 99 48 89 ef e8 db 0a 2c 00 49 8b 5c 24 70 48 85 db 74 29 48 8b 53 50 48 c7 c6 c7 1c fa 99 eb 0b <48> 8b 53 50 48 c7 c6 9c 49 fd 99 48 89 ef e8 b1 0a 2c 00 48 8b 5b
Jul 04 09:33:15 Tumbleweed kernel: RSP: 0018:ffffbcef01987ce8 EFLAGS: 00010006
Jul 04 09:33:15 Tumbleweed kernel: RAX: 0000000000000000 RBX: 000000000000000a RCX: ffff0a00ffffff04
Jul 04 09:33:15 Tumbleweed kernel: RDX: 0000000000001000 RSI: 0000000000000004 RDI: 0000000048eb01e0
Jul 04 09:33:15 Tumbleweed kernel: RBP: ffff947c603ab3c0 R08: 0000000000000004 R09: ffff947d48eb01e1
Jul 04 09:33:15 Tumbleweed kernel: R10: ffffffffffffffff R11: 0000000000000000 R12: ffff947c40180a00
Jul 04 09:33:15 Tumbleweed kernel: R13: 0000000000000246 R14: ffff947c40180aa4 R15: 0000000000000000
Jul 04 09:33:15 Tumbleweed kernel: FS:  00007ffa73890780(0000) GS:ffff9482e0080000(0000) knlGS:0000000000000000
Jul 04 09:33:15 Tumbleweed kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 04 09:33:15 Tumbleweed kernel: CR2: 000000000000005a CR3: 000000010972e000 CR4: 00000000003506e0
Jul 04 09:33:15 Tumbleweed kernel: note: irqbalance[1082] exited with irqs disabled
Jul 04 09:33:15 Tumbleweed kernel: note: irqbalance[1082] exited with preempt_count 1
Jul 04 09:33:15 Tumbleweed kernel: ------------[ cut here ]------------
Jul 04 09:33:15 Tumbleweed kernel: Voluntary context switch within RCU read-side critical section!
Jul 04 09:33:15 Tumbleweed kernel: WARNING: CPU: 5 PID: 1082 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x5e6/0x640
Jul 04 09:33:15 Tumbleweed kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat iwlmvm mac80211 snd_hda_codec_realtek snd_hda_codec_generic libarc4 ledtrig_audio snd_usb_audio(+) snd_hda_codec_hdmi uvcvideo btusb intel_rapl_msr btrtl snd_hda_intel intel_rapl_common btbcm videobuf2_vmalloc snd_usbmidi_lib uvc snd_intel_dspcfg videobuf2_memops btintel snd_intel_sdw_acpi edac_mce_amd btmtk videobuf2_v4l2 r8169 snd_rawmidi snd_hda_codec snd_seq_device bluetooth videodev iwlwifi snd_hda_core kvm_amd snd_hwdep snd_pcm videobuf2_common realtek snd_timer kvm cfg80211 mdio_devres mc pcspkr irqbypass snd wmi_bmof ecdh_generic efi_pstore k10temp soundcore i2c_piix4 libphy rfkill joydev tiny_power_button gpio_amdpt gpio_generic acpi_cpufreq button fuse configfs dmi_sysfs ip_tables x_tables ext4 mbcache jbd2 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 i2c_algo_bit drm_ttm_helper ttm iommu_v2 drm_buddy xhci_pci
Jul 04 09:33:15 Tumbleweed kernel:  gpu_sched xhci_pci_renesas drm_display_helper nvme xhci_hcd aesni_intel crypto_simd cryptd cec firewire_ohci nvme_core ccp usbcore rc_core sp5100_tco firewire_core sr_mod cdrom crc_itu_t video wmi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs
Jul 04 09:33:15 Tumbleweed kernel: CPU: 5 PID: 1082 Comm: irqbalance Tainted: G      D            6.3.9-1-default #1 openSUSE Tumbleweed 4b767630dbc263131e96e89ef291fd4fd2951892
Jul 04 09:33:15 Tumbleweed kernel: Hardware name: Micro-Star International Co., Ltd MS-7B86/B450-A PRO MAX (MS-7B86), BIOS M.C0 02/03/2021
Jul 04 09:33:15 Tumbleweed kernel: RIP: 0010:rcu_note_context_switch+0x5e6/0x640
Jul 04 09:33:15 Tumbleweed kernel: Code: 00 00 00 00 0f 85 31 fd ff ff 49 89 84 24 a0 00 00 00 e9 24 fd ff ff 48 c7 c7 88 86 01 9a c6 05 a3 ef f2 01 01 e8 3a af f4 ff <0f> 0b e9 6d fa ff ff c6 43 11 00 48 8b 73 20 ba 01 00 00 00 48 8b
Jul 04 09:33:15 Tumbleweed kernel: RSP: 0018:ffffbcef01987cf0 EFLAGS: 00010082
Jul 04 09:33:15 Tumbleweed kernel: RAX: 0000000000000000 RBX: ffff9482e00bab40 RCX: 0000000000000027
Jul 04 09:33:15 Tumbleweed kernel: RDX: ffff9482e00a74c8 RSI: 0000000000000001 RDI: ffff9482e00a74c0
Jul 04 09:33:15 Tumbleweed kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffbcef01987b98
Jul 04 09:33:15 Tumbleweed kernel: R10: 0000000000000003 R11: ffff9482fefbbee8 R12: ffff9482e00b9d00
Jul 04 09:33:15 Tumbleweed kernel: R13: ffff947cc2300000 R14: 0000000000000002 R15: 0000000000000000
Jul 04 09:33:15 Tumbleweed kernel: FS:  00007ffa73890780(0000) GS:ffff9482e0080000(0000) knlGS:0000000000000000
Jul 04 09:33:15 Tumbleweed kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 04 09:33:15 Tumbleweed kernel: CR2: 000000000000005a CR3: 00000003fea36000 CR4: 00000000003506e0
Jul 04 09:33:15 Tumbleweed kernel: Call Trace:
Jul 04 09:33:15 Tumbleweed kernel:  <TASK>
Jul 04 09:33:15 Tumbleweed kernel:  ? rcu_note_context_switch+0x5e6/0x640
Jul 04 09:33:15 Tumbleweed kernel:  ? __warn+0x81/0x130
Jul 04 09:33:15 Tumbleweed kernel:  ? rcu_note_context_switch+0x5e6/0x640
Jul 04 09:33:15 Tumbleweed kernel:  ? report_bug+0x171/0x1a0
Jul 04 09:33:15 Tumbleweed kernel:  ? handle_bug+0x3c/0x80
Jul 04 09:33:15 Tumbleweed kernel:  ? exc_invalid_op+0x17/0x70
Jul 04 09:33:15 Tumbleweed kernel:  ? asm_exc_invalid_op+0x1a/0x20
Jul 04 09:33:15 Tumbleweed kernel:  ? rcu_note_context_switch+0x5e6/0x640
Jul 04 09:33:15 Tumbleweed kernel:  __schedule+0xb0/0x1490
Jul 04 09:33:15 Tumbleweed kernel:  ? mt_destroy_walk.isra.0+0x2d0/0x330
Jul 04 09:33:15 Tumbleweed kernel:  ? kmem_cache_free+0x19/0x360
Jul 04 09:33:15 Tumbleweed kernel:  schedule+0x5e/0xd0
Jul 04 09:33:15 Tumbleweed kernel:  schedule_preempt_disabled+0x15/0x30
Jul 04 09:33:15 Tumbleweed kernel:  rwsem_down_write_slowpath+0x259/0x600
Jul 04 09:33:15 Tumbleweed kernel:  ? __slab_free+0xc4/0x300
Jul 04 09:33:15 Tumbleweed kernel:  ? mntput_no_expire+0x4a/0x250
Jul 04 09:33:15 Tumbleweed kernel:  down_write+0x5b/0x60
Jul 04 09:33:15 Tumbleweed kernel:  put_mnt_ns+0x38/0x90
Jul 04 09:33:15 Tumbleweed kernel:  free_nsproxy+0x1b/0x1b0
Jul 04 09:33:15 Tumbleweed kernel:  do_exit+0x334/0xa70
Jul 04 09:33:15 Tumbleweed kernel:  make_task_dead+0x81/0x170
Jul 04 09:33:15 Tumbleweed kernel:  rewind_stack_and_make_dead+0x17/0x20
Jul 04 09:33:15 Tumbleweed kernel: RIP: 0033:0x7ffa73a3f091
Jul 04 09:33:15 Tumbleweed kernel: Code: Unable to access opcode bytes at 0x7ffa73a3f067.
Jul 04 09:33:15 Tumbleweed kernel: RSP: 002b:00007fff176c0478 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Jul 04 09:33:15 Tumbleweed kernel: RAX: ffffffffffffffda RBX: 000055831cc232a0 RCX: 00007ffa73a3f091
Jul 04 09:33:15 Tumbleweed kernel: RDX: 0000000000000400 RSI: 000055831cc23500 RDI: 0000000000000003
Jul 04 09:33:15 Tumbleweed kernel: RBP: 00007ffa73b20660 R08: 0000000000000008 R09: 0000000000000001
Jul 04 09:33:15 Tumbleweed kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000002
Jul 04 09:33:15 Tumbleweed kernel: R13: 0000000000000a68 R14: 00007ffa73b1fd60 R15: 0000000000000a68
Jul 04 09:33:15 Tumbleweed kernel:  </TASK>
Jul 04 09:33:15 Tumbleweed kernel: ---[ end trace 0000000000000000 ]---

None of this shows when it boots OK.
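
For anyone reading the oops: the `Code:` line marks the faulting instruction with `<..>`, and its bytes can be checked against the registers. The sketch below (Python, my own helper names, and a decoder that deliberately handles only the one instruction form seen here, `REX.W 8B /r` with an 8-bit displacement) suggests the fault is a load from `RBX + 0x50`. With `RBX = 0xa` that gives `0x5a`, which matches `CR2`. This only shows the crash is consistent with dereferencing a bogus near-NULL pointer in `show_interrupts`; it does not say why the pointer was bad.

```python
# Values copied from the oops above.
CODE_LINE = (
    "Code: 85 d2 74 0f 48 c7 c6 c1 1c fa 99 48 89 ef e8 db 0a 2c 00 49 8b "
    "5c 24 70 48 85 db 74 29 48 8b 53 50 48 c7 c6 c7 1c fa 99 eb 0b <48> "
    "8b 53 50 48 c7 c6 9c 49 fd 99 48 89 ef e8 b1 0a 2c 00 48 8b 5b"
)
RBX = 0x000000000000000A
CR2 = 0x000000000000005A

# Register numbering for the ModRM r/m field when no REX.B bit is set.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the instruction bytes starting at the <..>-marked byte."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_mov_disp8(insn):
    """Decode only 'mov r64, [base + disp8]' (bytes 48 8b /r disp8, no SIB).

    Returns (base register name, signed displacement).
    """
    rex, opcode, modrm, disp = insn[0], insn[1], insn[2], insn[3]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not the instruction form this sketch handles")
    mod, rm = modrm >> 6, modrm & 0x7
    if mod != 1 or rm == 4:  # mod 01 = disp8; rm 100 would mean a SIB byte
        raise ValueError("not a plain base + disp8 operand")
    if disp >= 0x80:  # sign-extend the 8-bit displacement
        disp -= 0x100
    return REGS[rm], disp


base, disp = decode_mov_disp8(faulting_bytes(CODE_LINE))
print(base, hex(disp), hex(RBX + disp), RBX + disp == CR2)
```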

Stuart

PS Just found Bug 1212833 in bugzilla which appears to be at least very similar.

Just to add the link to the bug: https://bugzilla.opensuse.org/show_bug.cgi?id=1212833

Stuart

Did you read the reply in the bug report?

Takashi Iwai 2023-07-04 07:25:21 UTC

And, as TW is already moving to 6.4.x, it makes more sense to test with 6.4.x kernel at first.  (We won't commit bug fixes to 6.3.x any more.)
So, please test with the latest kernel in OBS Kernel:stable repo.

Yes, I saw that, and I may try tomorrow if I have time. Interestingly, I only see this intermittently on my AMD desktop, and not at all on my i7 laptop running the same TW/KDE system.

Stuart
