Results 1 to 5 of 5

Thread: NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]

  1. #1

    Default NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]

    I have had a crash.
    I couldn't access remotely, and after 4 days I could access to the computer, it was irresponsive, I did a hardware reboot and when I lloked at the logs I found It has beeen logging during the full 4 days.

    Code:
    2018-10-31T21:50:02.338301+01:00 tutatis systemd[10365]: Stopped target Basic System.
    2018-10-31T21:50:02.338593+01:00 tutatis systemd[10365]: Stopped target Sockets.
    2018-10-31T21:50:02.338874+01:00 tutatis systemd[10365]: Stopped target Timers.
    2018-10-31T21:50:02.339190+01:00 tutatis systemd[10365]: Stopped target Paths.
    2018-10-31T21:50:02.348416+01:00 tutatis systemd[10365]: Received SIGRTMIN+24 from PID 10401 (kill).
    2018-10-31T21:50:02.351498+01:00 tutatis systemd: pam_unix(systemd-user:session): session closed for user wwwrun
    2018-10-31T21:50:02.353029+01:00 tutatis systemd[1]: Stopped User Manager for UID 30.
    2018-10-31T21:50:02.353899+01:00 tutatis systemd[1]: Removed slice User Slice of wwwrun.
    
    2018-10-31T21:50:05.696180+01:00 tutatis CRON[10348]: pam_unix(crond:session): session closed for user fperal
    2018-10-31T21:56:02.500682+01:00 tutatis kernel: [221711.648107] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]
    
    2018-10-31T21:56:02.500699+01:00 tutatis kernel: [221711.648113] Modules linked in: parport_pc lp cdc_ether usbnet r8152 st nls_iso8859_1 nls_cp437 vfat fat uas usb_storage joydev fuse ctr ccm af_packet iscsi_ibft iscsi_boot_sysfs hwmon_vid msr arc4 rt2800usb rt2x00usb rt2800lib rt2x00lib mac80211 cfg80211 crc_ccitt rfkill snd_hda_codec_hdmi snd_hda_codec_realtek kvm snd_hda_codec_generic snd_hda_intel snd_hda_codec irqbypass snd_hda_core snd_hwdep snd_pcm snd_seq snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng snd aesni_intel aes_x86_64 lrw r8169 soundcore gf128mul glue_helper mii ablk_helper ppdev shpchp cryptd i2c_piix4 acpi_cpufreq fjes parport processor pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ext4 crc16 jbd2 mbcache raid1 raid10 raid0 md_mod sr_mod cdrom sd_mod hid_generic usbhid nouveau(O) crc32c_intel video mxm_wmi i2c_algo_bit drm_kms_helper(O) syscopyarea sysfillrect sysimgblt fb_sys_fops ttm(O) xhci_pci xhci_hcd drm(O) r8168(O) ahci usbcore libahci usb_common wmi button ata_generic pata_atiixp libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 [last unloaded: parport_pc]
    2018-10-31T21:56:02.500701+01:00 tutatis kernel: [221711.648208] CPU: 7 PID: 10435 Comm: mysqld Tainted: G       O 4.4.155-68-default #1
    2018-10-31T21:56:02.500702+01:00 tutatis kernel: [221711.648211] Hardware name: MSI MS-7A33/X370 GAMING PRO (MS-7A33), BIOS 4.00 03/20/2017
    2018-10-31T21:56:02.500703+01:00 tutatis kernel: [221711.648214] task: ffff8800d8678280 ti: ffff8800dc9c0000 task.ti: ffff8800dc9c0000
    2018-10-31T21:56:02.500704+01:00 tutatis kernel: [221711.648217] RIP: 0010:[<ffffffff81112875>]  [<ffffffff81112875>] smp_call_function_many+0x205/0x260
    2018-10-31T21:56:02.500705+01:00 tutatis kernel: [221711.648225] RSP: 0018:ffff8800dc9c3ce8  EFLAGS: 00000202
    2018-10-31T21:56:02.500706+01:00 tutatis kernel: [221711.648228] RAX: 0000000000000003 RBX: ffffffff810714e0 RCX: ffffe8ffffd82080
    2018-10-31T21:56:02.500721+01:00 tutatis kernel: [221711.648230] RDX: 000000000000000e RSI: 0000000000000010 RDI: ffff8801862de0a0
    2018-10-31T21:56:02.500722+01:00 tutatis kernel: [221711.648232] RBP: ffff8800dc9c3d30 R08: 0000000000004100 R09: 0000000000000000
    2018-10-31T21:56:02.500723+01:00 tutatis kernel: [221711.648234] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000010
    2018-10-31T21:56:02.500724+01:00 tutatis kernel: [221711.648236] R13: 0000000000000001 R14: 00000000000175c0 R15: ffff88041edd7600
    2018-10-31T21:56:02.500725+01:00 tutatis kernel: [221711.648240] FS:  00007f2f8bf24700(0000) GS:ffff88041edc0000(0000) knlGS:00000000f7541940
    2018-10-31T21:56:02.500726+01:00 tutatis kernel: [221711.648242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    2018-10-31T21:56:02.500727+01:00 tutatis kernel: [221711.648244] CR2: 00007fcdea9ab000 CR3: 0000000401d6e000 CR4: 00000000003406f0
    2018-10-31T21:56:02.500728+01:00 tutatis kernel: [221711.648246] Stack:
    2018-10-31T21:56:02.500729+01:00 tutatis kernel: [221711.648248]  000000018bf1f000 0000000000000007 00007f2f8bf1f000 00007f2f8bf1f000
    2018-10-31T21:56:02.500730+01:00 tutatis kernel: [221711.648251]  00007f2f8bf18000 ffff880407cc8040 ffff880407cc83b0 00007f2f8bf1f000
    2018-10-31T21:56:02.500731+01:00 tutatis kernel: [221711.648255]  ffffffff810719c9 ffff880407cc8040 00007f2f8bf18000 00007f2f8bf1f000
    2018-10-31T21:56:02.500732+01:00 tutatis kernel: [221711.648258] Call Trace:
    2018-10-31T21:56:02.500733+01:00 tutatis kernel: [221711.648274]  [<ffffffff810719c9>] native_flush_tlb_others+0x119/0x1a0
    2018-10-31T21:56:02.500734+01:00 tutatis kernel: [221711.648283]  [<ffffffff81071af6>] flush_tlb_mm_range+0xa6/0x190
    2018-10-31T21:56:02.500735+01:00 tutatis kernel: [221711.648291]  [<ffffffff811c87d6>] tlb_flush_mmu_tlbonly+0x66/0xc0
    2018-10-31T21:56:02.500736+01:00 tutatis kernel: [221711.648299]  [<ffffffff811c96bf>] tlb_finish_mmu+0xf/0x40
    2018-10-31T21:56:02.500737+01:00 tutatis kernel: [221711.648305]  [<ffffffff811cb511>] zap_page_range+0xc1/0xf0
    2018-10-31T21:56:02.500737+01:00 tutatis kernel: [221711.648313]  [<ffffffff811df97a>] SyS_madvise+0x35a/0x770
    2018-10-31T21:56:02.500738+01:00 tutatis kernel: [221711.648322]  [<ffffffff816455c7>] entry_SYSCALL_64_fastpath+0x22/0xbb
    2018-10-31T21:56:02.500739+01:00 tutatis kernel: [221711.651597] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x22/0xbb
    2018-10-31T21:56:02.500740+01:00 tutatis kernel: [221711.651598]
    2018-10-31T21:56:02.500741+01:00 tutatis kernel: [221711.651600] Leftover inexact backtrace:
    
    2018-10-31T21:56:02.500742+01:00 tutatis kernel: [221711.651600]
    2018-10-31T21:56:02.500743+01:00 tutatis kernel: [221711.651604] Code: 24 00 3b 05 b2 61 e5 00 89 c2 0f 8d 8f fe ff ff 48 98 49 8b 0f 48 03 0c c5 40 43 f5 81 8b 41 18 a8 01 74 09 f3 90 8b 41 18 a8 01 <75> f7 eb bd 0f b6 4c 24 04 48 83 c4 10 48 89 de 5b 48 89 ea 44
    2018-10-31T21:56:30.500664+01:00 tutatis kernel: [221739.648562] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]
    2018-10-31T21:56:30.500681+01:00 tutatis kernel: [221739.648568] Modules linked in: parport_pc lp cdc_ether usbnet r8152 st nls_iso8859_1 nls_cp437 vfat fat uas usb_storage joydev fuse ctr ccm af_packet iscsi_ibft iscsi_boot_sysfs hwmon_vid msr arc4 rt2800usb rt2x00usb rt2800lib rt2x00lib mac80211 cfg80211 crc_ccitt rfkill snd_hda_codec_hdmi snd_hda_codec_realtek kvm snd_hda_codec_generic snd_hda_intel snd_hda_codec irqbypass snd_hda_core snd_hwdep snd_pcm snd_seq snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng snd aesni_intel aes_x86_64 lrw r8169 soundcore gf128mul glue_helper mii ablk_helper ppdev shpchp cryptd i2c_piix4 acpi_cpufreq fjes parport processor pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ext4 crc16 jbd2 mbcache raid1 raid10 raid0 md_mod sr_mod cdrom sd_mod hid_generic usbhid nouveau(O) crc32c_intel video mxm_wmi i2c_algo_bit drm_kms_helper(O) syscopyarea sysfillrect sysimgblt fb_sys_fops ttm(O) xhci_pci xhci_hcd drm(O) r8168(O) ahci usbcore libahci usb_common wmi button ata_generic pat


    The problem begins at 2018-10-31T21:56:02 and ...... is related to mysql ?

    Code:
    
    
    Code:
    tutatis:~ # cat /var/log/messages |grep NMI
    2018-10-31T21:56:02.500682+01:00 tutatis kernel: [221711.648107] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]
    2018-10-31T21:56:30.500664+01:00 tutatis kernel: [221739.648562] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]
    2018-10-31T21:56:58.500672+01:00 tutatis kernel: [221767.649018] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]
    2018-10-31T21:57:26.500671+01:00 tutatis kernel: [221795.649474] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]
    2018-10-31T21:57:54.500664+01:00 tutatis kernel: [221823.649930] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:10435]
    
    
    


    have found documentation of similar errors, and this explanation
    https://www.suse.com/es-es/support/kb/doc/?id=7017652


    I don't know if it something related with mysqld or just a problem with the kernel and mysqld whas just running in thet moment.

    I have increase the watchdog threshols to 25:
    Code:
    tutatis:~ # sysctl -a |grep watchdog
    
    kernel.nmi_watchdog = 1
    
    kernel.soft_watchdog = 1
    
    kernel.watchdog = 1
    
    kernel.watchdog_cpumask = 0-15
    
    kernel.watchdog_thresh = 10
    
    tutatis:~ # sysctl -w kernel.watchdog_thresh=25
    
    kernel.watchdog_thresh = 25
    
    tutatis:~ #

    And I'm waiting... but I don't really know the cause of the error or if it will repeat or not.


    any help?
    thanks

  2. #2

    Default Re: NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]

    The hardware is


    AMD RYZEN 7 1700 ( I have checked, it is not one of the pre-25 week)
    G.Skill Ripjaws V Series F4-3200C16D-16GVR 16 GB (2x8GB) DDR4
    MSI X370 Gaming Pro

  3. #3

    Default Re: NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]

    It seems it is related to a problem with Ryzen processors and C6 states.

    I'm using this script

    And I'm testing two ways:

    First, I added a line in /etc/crontab

    Code:
    * * * * * root /usr/bin/stress-ng --cpu 2 --timeout 10
    It makes 2 of the cores of the CPU (it has 16) work intensely 10 seconds each minute (I know is a quick fix, if it works)

    Second, I tested the C6 states
    Code:
    tutatis:~/ZenStates-Linux # ./zenstates.py -l
    P0 - Enabled - FID = 78 - DID = 8 - VID = 3A - Ratio = 30.00 - vCore = 1.18750
    P1 - Enabled - FID = 87 - DID = A - VID = 50 - Ratio = 27.00 - vCore = 1.05000
    P2 - Enabled - FID = 7C - DID = 10 - VID = 6C - Ratio = 15.50 - vCore = 0.87500
    P3 - Disabled
    P4 - Disabled
    P5 - Disabled
    P6 - Disabled
    P7 - Disabled
    C6 State - Package - Enabled
    C6 State - Core - Enabled


    and then I disabled C6
    Code:
    tutatis:~/ZenStates-Linux # ./zenstates.py --c6-disable
    Disabling C6 state
    tutatis:~/ZenStates-Linux # ./zenstates.py -l
    P0 - Enabled - FID = 78 - DID = 8 - VID = 3A - Ratio = 30.00 - vCore = 1.18750
    P1 - Enabled - FID = 87 - DID = A - VID = 50 - Ratio = 27.00 - vCore = 1.05000
    P2 - Enabled - FID = 7C - DID = 10 - VID = 6C - Ratio = 15.50 - vCore = 0.87500
    P3 - Disabled
    P4 - Disabled
    P5 - Disabled
    P6 - Disabled
    P7 - Disabled
    C6 State - Package - Disabled
    C6 State - Core - Disabled
    tutatis:~/ZenStates-Linux #

    And I added in /etc/crontab
    Code:
    @reboot /usr/bin/python /root/ZenStates-Linux/zenstates.py --c6-disable &

    Let's see if it works

  4. #4

    Default Re: NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]

    It's working
    By the way. Maybe it's not Ryzen fault but PSU's fault.
    See this article.
    I'm using a Mars MPII750 Gaming.

  5. #5

    Default Re: NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [mysqld:10435]

    It has been working 24/7 almost 3 weeks now so I can say I found the solution.
    I remove the stress test in the cron and now it's only working with the C6 disabled.
    So I don't really now if it is a problem of the CPU, of the PSU, or both, but disabling C6 solves the problem.

    best regards

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •