BUG: workqueue lockup

Sporadically I see errors like this:

2019-11-01T14:41:21.930340+01:00 linux-2kgy kernel:    62.531947] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 54s!
2019-11-01T14:41:21.930351+01:00 linux-2kgy kernel:    62.531968] Showing busy workqueues and worker pools:
2019-11-01T14:41:21.930352+01:00 linux-2kgy kernel:    62.531969] workqueue events: flags=0x0
2019-11-01T14:41:21.930352+01:00 linux-2kgy kernel:    62.531970]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=5/256
2019-11-01T14:41:21.930354+01:00 linux-2kgy kernel:    62.531972]     in-flight: 157:snd_hdac_bus_process_unsol_events [snd_hda_core] snd_hdac_bus_process_unsol_events [snd_hda_core]
2019-11-01T14:41:21.930364+01:00 linux-2kgy kernel:    62.531993]     pending: mei_timer [mei], push_to_pool, check_corruption
2019-11-01T14:41:21.930364+01:00 linux-2kgy kernel:    62.532016]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
2019-11-01T14:41:21.930365+01:00 linux-2kgy kernel:    62.532018]     pending: vmstat_shepherd
2019-11-01T14:41:21.930374+01:00 linux-2kgy kernel:    62.532024] workqueue events_freezable: flags=0x4
2019-11-01T14:41:21.930375+01:00 linux-2kgy kernel:    62.532026]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
2019-11-01T14:41:21.930375+01:00 linux-2kgy kernel:    62.532027]     pending: pci_pme_list_scan
2019-11-01T14:41:21.930376+01:00 linux-2kgy kernel:    62.532030] workqueue events_power_efficient: flags=0x80
2019-11-01T14:41:21.930377+01:00 linux-2kgy kernel:    62.532031]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
2019-11-01T14:41:21.930377+01:00 linux-2kgy kernel:    62.532032]     pending: fb_flashcursor, neigh_periodic_work
2019-11-01T14:41:21.930378+01:00 linux-2kgy kernel:    62.532037] workqueue mm_percpu_wq: flags=0x8
2019-11-01T14:41:21.930378+01:00 linux-2kgy kernel:    62.532038]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
2019-11-01T14:41:21.930379+01:00 linux-2kgy kernel:    62.532039]     pending: vmstat_update
2019-11-01T14:41:21.930379+01:00 linux-2kgy kernel:    62.532042] workqueue writeback: flags=0x4e
2019-11-01T14:41:21.930380+01:00 linux-2kgy kernel:    62.532043]   pwq 16: cpus=0-7 flags=0x4 nice=0 active=1/256
2019-11-01T14:41:21.930380+01:00 linux-2kgy kernel:    62.532044]     pending: wb_workfn
2019-11-01T14:41:21.930381+01:00 linux-2kgy kernel:    62.532059] workqueue e1000e: flags=0x8
2019-11-01T14:41:21.930385+01:00 linux-2kgy kernel:    62.532061]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
2019-11-01T14:41:21.930386+01:00 linux-2kgy kernel:    62.532062]     pending: e1000_watchdog_task [e1000e]
2019-11-01T14:41:21.930393+01:00 linux-2kgy kernel:    62.532072] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=54s workers=7 idle: 19 220 221 222 234 224
2019-11-01T14:41:28.202535+01:00 linux-2kgy systemd[1]: systemd-localed.service: Succeeded.
2019-11-01T14:41:52.650330+01:00 linux-2kgy kernel:    93.251273] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 85s!
2019-11-01T14:41:52.650342+01:00 linux-2kgy kernel:    93.251282] Showing busy workqueues and worker pools:
2019-11-01T14:41:52.650343+01:00 linux-2kgy kernel:    93.251283] workqueue events: flags=0x0
2019-11-01T14:41:52.650343+01:00 linux-2kgy kernel:    93.251284]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=5/256
2019-11-01T14:41:52.650344+01:00 linux-2kgy kernel:    93.251286]     in-flight: 157:snd_hdac_bus_process_unsol_events [snd_hda_core] snd_hdac_bus_process_unsol_events [snd_hda_core]
2019-11-01T14:41:52.650346+01:00 linux-2kgy kernel:    93.251293]     pending: mei_timer [mei], push_to_pool, check_corruption
2019-11-01T14:41:52.650346+01:00 linux-2kgy kernel:    93.251300]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
2019-11-01T14:41:52.650347+01:00 linux-2kgy kernel:    93.251301]     pending: vmstat_shepherd
2019-11-01T14:41:52.650381+01:00 linux-2kgy kernel:    93.251305] workqueue events_freezable: flags=0x4
2019-11-01T14:41:52.650384+01:00 linux-2kgy kernel:    93.251306]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
2019-11-01T14:41:52.650385+01:00 linux-2kgy kernel:    93.251307]     pending: pci_pme_list_scan
2019-11-01T14:41:52.650385+01:00 linux-2kgy kernel:    93.251310] workqueue events_power_efficient: flags=0x80
2019-11-01T14:41:52.650386+01:00 linux-2kgy kernel:    93.251311]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
2019-11-01T14:41:52.650386+01:00 linux-2kgy kernel:    93.251312]     pending: fb_flashcursor, neigh_periodic_work
2019-11-01T14:41:52.650387+01:00 linux-2kgy kernel:    93.251316] workqueue mm_percpu_wq: flags=0x8
2019-11-01T14:41:52.650387+01:00 linux-2kgy kernel:    93.251317]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
2019-11-01T14:41:52.650387+01:00 linux-2kgy kernel:    93.251318]     pending: vmstat_update
2019-11-01T14:41:52.650388+01:00 linux-2kgy kernel:    93.251322] workqueue writeback: flags=0x4e
2019-11-01T14:41:52.650388+01:00 linux-2kgy kernel:    93.251323]   pwq 16: cpus=0-7 flags=0x4 nice=0 active=1/256
2019-11-01T14:41:52.650389+01:00 linux-2kgy kernel:    93.251327]     pending: wb_workfn
2019-11-01T14:41:52.650407+01:00 linux-2kgy kernel:    93.251343] workqueue e1000e: flags=0x8
2019-11-01T14:41:52.650407+01:00 linux-2kgy kernel:    93.251344]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
2019-11-01T14:41:52.650408+01:00 linux-2kgy kernel:    93.251345]     pending: e1000_watchdog_task [e1000e]
2019-11-01T14:41:52.650408+01:00 linux-2kgy kernel:    93.251353] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=85s workers=7 idle: 19 220 221 222 234 224


top shows a kworker process at 100% CPU, and the system can no longer be shut down cleanly.
After a reboot the problem is usually gone.
How do I interpret this? Is this a problem with snd_hda_core? Is there anything I can do to debug it further? Or is this enough to open a report at bugzilla.opensuse.org?
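
As a rough aid for reading such a report, here is a minimal Python sketch that pulls the key facts out of the lockup message: which CPU's pool is stuck, for how long, and which work function is in flight. The sample text is copied from the log above with the syslog prefix trimmed; the function name `summarize` and the two regular expressions are my own and only cover the line shapes shown here. I am assuming the number before the colon in the in-flight entry is the worker's PID.

```python
import re

# Lines copied from the log above (syslog prefix trimmed for brevity).
LOG = """\
BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 54s!
workqueue events: flags=0x0
  pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=5/256
    in-flight: 157:snd_hdac_bus_process_unsol_events [snd_hda_core] snd_hdac_bus_process_unsol_events [snd_hda_core]
    pending: mei_timer [mei], push_to_pool, check_corruption
"""

# Hypothetical patterns, written only against the lines shown above.
LOCKUP = re.compile(r"BUG: workqueue lockup - pool cpus=(\S+) .* stuck for (\d+)s!")
IN_FLIGHT = re.compile(r"in-flight: (\d+):(\w+)")


def summarize(text):
    """Return (cpus, stuck_seconds, [(worker_id, function), ...])."""
    cpus, stuck, running = None, None, []
    for line in text.splitlines():
        m = LOCKUP.search(line)
        if m:
            cpus, stuck = m.group(1), int(m.group(2))
        m = IN_FLIGHT.search(line)
        if m:
            # Only the first in-flight entry on a line is captured.
            running.append((int(m.group(1)), m.group(2)))
    return cpus, stuck, running


print(summarize(LOG))
```

For the excerpt above this reports the pool on CPU 1, stuck for 54 seconds, with `snd_hdac_bus_process_unsol_events` as the only in-flight item. Everything listed as pending is queued behind it on the same CPU, which fits the single kworker at 100% CPU.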

Which version of openSUSE and the kernel are you running?
If it is kernel 5.3.* it could be:
https://bugzilla.kernel.org/show_bug.cgi?id=203317

It’s an up-to-date Tumbleweed with kernel 5.3.7. Since I use neither an ext4 filesystem nor the i915 driver, it can’t be https://bugzilla.kernel.org/show_bug.cgi?id=203317

Hi
I’m running Intel GPU and sound as well as NVIDIA, with btrfs/xfs filesystems, and don’t see any bugs or lockups here. It could be your particular CPU/GPU.

Can you show the output from:


inxi -GASzz

System:    Host: grover Kernel: 5.3.7-1-default x86_64 bits: 64 Desktop: Gnome 3.34.1 Distro: openSUSE Tumbleweed 20191101 
Graphics:  Device-1: Intel Xeon E3-1200 v2/3rd Gen Core processor Graphics driver: i915 v: kernel 
           Device-2: NVIDIA GK208B [GeForce GT 710] driver: nvidia v: 440.26 
           Device-3: NVIDIA GK208B [GeForce GT 710] driver: vfio-pci v: 0.2 
           Display: x11 server: X.Org 1.20.5 driver: modesetting,nouveau unloaded: fbdev,vesa resolution: 1920x1080~60Hz 
           OpenGL: renderer: Mesa DRI Intel Ivybridge Server v: 4.2 Mesa 19.2.1 
Audio:     Device-1: Intel 7 Series/C216 Family High Definition Audio driver: snd_hda_intel 
           Device-2: NVIDIA GK208 HDMI/DP Audio driver: snd_hda_intel 
           Device-3: NVIDIA GK208 HDMI/DP Audio driver: vfio-pci 
           Sound Server: ALSA v: k5.3.7-1-default 

 inxi -GASzz
System:    Host: linux-2kgy Kernel: 5.3.7-1-default x86_64 bits: 64 Console: tty 0 Distro: openSUSE Tumbleweed 20191101 
Graphics:  Device-1: NVIDIA GM206 [GeForce GTX 960] driver: nvidia v: 440.26 
           Display: server: X.Org 1.20.5 driver: nvidia unloaded: fbdev,modesetting,nouveau,vesa resolution: 3840x2160~60Hz 
           OpenGL: renderer: GeForce GTX 960/PCIe/SSE2 v: 4.6.0 NVIDIA 440.26 
Audio:     Device-1: Intel 100 Series/C230 Series Family HD Audio driver: snd_hda_intel 
           Device-2: NVIDIA GM206 High Definition Audio driver: snd_hda_intel 
           Sound Server: ALSA v: k5.3.7-1-default 


The system is installed completely on one btrfs LV:


mount
....
/dev/mapper/system--nvme-root_factory on / type btrfs (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=18077)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
/dev/mapper/system--nvme-root_factory on /opt type btrfs (rw,relatime,ssd,space_cache,subvolid=263,subvol=/opt)
/dev/mapper/system--nvme-root_factory on /boot/grub2/x86_64-efi type btrfs (rw,relatime,ssd,space_cache,subvolid=264,subvol=/boot/grub2/x86_64-efi)
/dev/mapper/system--nvme-root_factory on /root type btrfs (rw,relatime,ssd,space_cache,subvolid=262,subvol=/root)
/dev/mapper/system--nvme-root_factory on /boot/grub2/i386-pc type btrfs (rw,relatime,ssd,space_cache,subvolid=265,subvol=/boot/grub2/i386-pc)
/dev/mapper/system--nvme-root_factory on /srv type btrfs (rw,relatime,ssd,space_cache,subvolid=261,subvol=/srv)
/dev/mapper/system--nvme-root_factory on /home type btrfs (rw,relatime,ssd,space_cache,subvolid=257,subvol=/home)
/dev/mapper/system--nvme-root_factory on /tmp type btrfs (rw,relatime,ssd,space_cache,subvolid=260,subvol=/tmp)
/dev/mapper/system--nvme-root_factory on /usr/local type btrfs (rw,relatime,ssd,space_cache,subvolid=259,subvol=/usr/local)
/dev/mapper/system--nvme-root_factory on /var type btrfs (rw,relatime,ssd,space_cache,subvolid=258,subvol=/var)
/dev/nvme0n1p2 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
...

Hi
So it’s an nvme device, what brand? I would suggest a bug report might be in order…

openSUSE:Submitting bug reports - openSUSE

I opened https://bugzilla.opensuse.org/show_bug.cgi?id=1155836
The SSD is a Samsung SSD 950 PRO 512GB and the filesystem seems to be OK:

2019-11-01T14:41:12.111858+01:00 linux-2kgy btrfs-scrub.sh[1915]: scrub device /dev/mapper/system--nvme-root_factory (id 1) done
2019-11-01T14:41:12.111891+01:00 linux-2kgy btrfs-scrub.sh[1915]: Scrub started:    Fri Nov  1 14:40:57 2019
2019-11-01T14:41:12.111933+01:00 linux-2kgy btrfs-scrub.sh[1915]: Status:           finished
2019-11-01T14:41:12.111955+01:00 linux-2kgy btrfs-scrub.sh[1915]: Duration:         0:00:15
2019-11-01T14:41:12.111976+01:00 linux-2kgy btrfs-scrub.sh[1915]: Total to scrub:   30.04GiB
2019-11-01T14:41:12.112028+01:00 linux-2kgy btrfs-scrub.sh[1915]: Rate:             1.56GiB/s
2019-11-01T14:41:12.112058+01:00 linux-2kgy btrfs-scrub.sh[1915]: Error summary:    no errors found


The problem seems to be related to the sound driver, because the in-flight work item is:

2019-11-01T14:41:21.930354+01:00 linux-2kgy kernel:    62.531972]     in-flight: 157:snd_hdac_bus_process_unsol_events [snd_hda_core] snd_hdac_bus_process_unsol_events [snd_hda_core]

And about 54s before the first lockup message the log shows:

2019-11-01T14:40:27.176299+01:00 linux-2kgy kernel:     6.596506] snd_hda_codec_ca0132 hdaudioC0D0: autoconfig for Recon3Di: line_outs=3 (0xb/0x11/0x10/0x0/0x0) type:line
2019-11-01T14:40:27.176301+01:00 linux-2kgy kernel:     6.596507] snd_hda_codec_ca0132 hdaudioC0D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
2019-11-01T14:40:27.176301+01:00 linux-2kgy kernel:     6.596509] snd_hda_codec_ca0132 hdaudioC0D0:    hp_outs=1 (0xf/0x0/0x0/0x0/0x0)
2019-11-01T14:40:27.176302+01:00 linux-2kgy kernel:     6.596510] snd_hda_codec_ca0132 hdaudioC0D0:    mono: mono_out=0x0
2019-11-01T14:40:27.176302+01:00 linux-2kgy kernel:     6.596511] snd_hda_codec_ca0132 hdaudioC0D0:    dig-out=0xc/0xd
2019-11-01T14:40:27.176302+01:00 linux-2kgy kernel:     6.596512] snd_hda_codec_ca0132 hdaudioC0D0:    inputs:
2019-11-01T14:40:27.176302+01:00 linux-2kgy kernel:     6.596513] snd_hda_codec_ca0132 hdaudioC0D0:      Mic=0x12
2019-11-01T14:40:27.176304+01:00 linux-2kgy kernel:     6.596515] snd_hda_codec_ca0132 hdaudioC0D0:      Line=0x13
2019-11-01T14:40:27.176305+01:00 linux-2kgy kernel:     6.688821] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
2019-11-01T14:40:27.176305+01:00 linux-2kgy kernel:     6.756911] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 40:8d:5c:e6:b1:4a
2019-11-01T14:40:27.176305+01:00 linux-2kgy kernel:     6.756913] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
2019-11-01T14:40:27.176306+01:00 linux-2kgy kernel:     6.756996] e1000e 0000:00:1f.6 eth0: MAC: 12, PHY: 12, PBA No: FFFFFF-0FF
2019-11-01T14:40:27.176306+01:00 linux-2kgy kernel:     6.801902] e1000e 0000:00:1f.6 enp0s31f6: renamed from eth0
2019-11-01T14:40:27.176306+01:00 linux-2kgy kernel:     7.043737] FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
2019-11-01T14:40:27.176308+01:00 linux-2kgy kernel:     7.047310] audit: type=1400 audit(1572615626.438:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="ping" pid=985 comm="apparmor_parser"
2019-11-01T14:40:27.176309+01:00 linux-2kgy kernel:     7.078124] audit: type=1400 audit(1572615626.470:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="ghostscript" pid=1011 comm="apparmor_parser"
2019-11-01T14:40:27.176309+01:00 linux-2kgy kernel:     7.078126] audit: type=1400 audit(1572615626.470:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="ghostscript///usr/bin/hpijs" pid=1011 comm="apparmor_parser"
2019-11-01T14:40:27.176310+01:00 linux-2kgy kernel:     7.184696] snd_hda_codec_ca0132 hdaudioC0D0: ca0132 DSP downloaded and running
2019-11-01T14:40:27.176310+01:00 linux-2kgy kernel:     7.238650] intel_rapl_common: Found RAPL domain package
2019-11-01T14:40:27.176310+01:00 linux-2kgy kernel:     7.238651] intel_rapl_common: Found RAPL domain core
2019-11-01T14:40:27.176312+01:00 linux-2kgy kernel:     7.238652] intel_rapl_common: Found RAPL domain dram
2019-11-01T14:40:27.176313+01:00 linux-2kgy kernel:     7.409911] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input21
2019-11-01T14:40:27.176313+01:00 linux-2kgy kernel:     7.409958] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input22
2019-11-01T14:40:27.176314+01:00 linux-2kgy kernel:     7.411197] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input23
2019-11-01T14:40:27.176314+01:00 linux-2kgy kernel:     7.411244] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input24
2019-11-01T14:40:27.176314+01:00 linux-2kgy kernel:     7.484662] input: HDA Intel PCH Mic as /devices/pci0000:00/0000:00:1f.3/sound/card0/input25
2019-11-01T14:40:27.176315+01:00 linux-2kgy kernel:     7.484703] input: HDA Intel PCH Line Out Surround as /devices/pci0000:00/0000:00:1f.3/sound/card0/input26
2019-11-01T14:40:27.176317+01:00 linux-2kgy kernel:     7.484741] input: HDA Intel PCH Line Out CLFE as /devices/pci0000:00/0000:00:1f.3/sound/card0/input27
2019-11-01T14:40:27.176317+01:00 linux-2kgy kernel:     7.484778] input: HDA Intel PCH Front Headphone as /devices/pci0000:00/0000:00:1f.3/sound/card0/input28
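
To make the "about 54s before" estimate explicit, here is a small Python sketch of the arithmetic. It subtracts the "stuck for" value from the kernel uptime of each BUG line and compares the result with the uptime of the "ca0132 DSP downloaded and running" message. All numbers are copied from the logs in this thread; the names `reports`, `dsp_ready` and `stall_start` are just for illustration. Since "stuck for" is printed in whole seconds, treat the result as accurate to about one second at best.

```python
# (kernel uptime of the BUG line in seconds, reported "stuck for" seconds)
reports = [(62.531947, 54), (93.251273, 85)]

# Kernel uptime of "ca0132 DSP downloaded and running" from the log above.
dsp_ready = 7.184696


def stall_start(uptime, stuck_for):
    """Estimate the uptime at which the pool stopped making progress."""
    return uptime - stuck_for


starts = [round(stall_start(u, s), 2) for u, s in reports]
print(starts)                                  # estimated stall start per report
print([round(s - dsp_ready, 2) for s in starts])  # seconds after the DSP message
```

Both reports point to a stall that began roughly 8.3 to 8.5 seconds after boot, about one second after the ca0132 DSP came up. That is only a correlation in time, but it is consistent with the stuck work item belonging to snd_hda_core.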