Snapper freezes Leap 15.1

Hello!

Occasionally (about once every few days) my system freezes, and I think snapper is the cause. While it is frozen I cannot do anything (switch TTY, move the mouse, anything). A freeze typically lasts about 5 minutes. This is a laptop with an SSD and Btrfs as the main system partition for openSUSE. It is a dual-boot machine, but I don’t know if that affects anything.

Unfortunately I don’t have any journal logs to share at the moment, but I will be sure to capture them the next time it happens (I only had time to look at the journal briefly after the last freeze). In the meantime, how can I invoke snapper manually to provoke the freeze, or otherwise troubleshoot what is going wrong? What are common causes of this issue?

Here is some info:

# btrfs scrub start -Bd /
scrub device /dev/mapper/system-root (id 1) done
        scrub started at Wed Jun 10 15:00:25 2020 and finished after 00:03:26
        total bytes scrubbed: 68.57GiB with 0 errors
# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       CT1000MX500SSD1                         
        Serial Number:      1818E139D4A9        
        Firmware Revision:  M3CR020 
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
        Used: unknown (minor revision code 0x006d) 
        Supported: 10 9 8 7 6 5 
        Likely used: 10
Configuration:
        Logical         max     current
        cylinders       16383   0
        heads           16      0
        sectors/track   63      0
        --
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors:  1953525168
        Logical  Sector size:                   512 bytes
        Physical Sector size:                   512 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:      953869 MBytes
        device size with M = 1000*1000:     1000204 MBytes (1000 GB)
        cache/buffer size  = unknown
        Form Factor: 2.5 inch
        Nominal Media Rotation Rate: Solid State Device
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 1   Current = 1
        Advanced power management level: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4 
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
           *    Advanced Power Management feature set
           *    48-bit Address feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
                unknown 119[8]
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Gen3 signaling speed (6.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Phy event counters
           *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
           *    DMA Setup Auto-Activate optimization
                Device-initiated interface power management
           *    Software settings preservation
                Device Sleep (DEVSLP)
           *    SMART Command Transport (SCT) feature set
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
           *    SANITIZE_ANTIFREEZE_LOCK_EXT command
           *    SANITIZE feature set
           *    CRYPTO_SCRAMBLE_EXT command
           *    BLOCK_ERASE_EXT command
           *    reserved 69[3]
           *    reserved 69[4]
           *    reserved 69[7]
           *    DOWNLOAD MICROCODE DMA command
           *    WRITE BUFFER DMA command
           *    READ BUFFER DMA command
           *    Data Set Management TRIM supported (limit 8 blocks)
Logical Unit WWN Device Identifier: 500a0751e139d4a9
        NAA             : 5
        IEEE OUI        : 00a075
        Unique ID       : 1e139d4a9
Device Sleep:
        DEVSLP Exit Timeout (DETO): 100 ms (drive)
        Minimum DEVSLP Assertion Time (MDAT): 10 ms (drive)
Checksum: correct

Thanks!

Be sure to look at the journal as soon as the system unfreezes next time it happens:

sudo journalctl -eb

And so we know which scheduler is in use:


grep '\.*\]' /sys/block/*/queue/scheduler
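
If you only want the active scheduler per device, here is a minimal sketch. The helper name active_scheduler is my own, not a standard tool. It extracts the name inside the square brackets, which is how the kernel marks the scheduler currently in use:

```shell
# Read one scheduler line on stdin (e.g. "noop [deadline] cfq")
# and print only the name inside the brackets.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Print "device: scheduler" for every block device that exposes the file.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # skip if the glob matched nothing
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_scheduler < "$f")"
done
```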

Do you notice a pattern? Like X minutes after system boot up, every X days, etc?
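
If the freezes also happen while you are away, one way to find a pattern is a heartbeat log: a loop writes a timestamp every second, and any long gap between timestamps marks a stall (the loop itself stalls during a full freeze). This is only a sketch; the function name report_gaps and the log path are hypothetical choices:

```shell
# report_gaps THRESHOLD: read one epoch timestamp per line on stdin and
# print every gap between consecutive timestamps larger than THRESHOLD seconds.
report_gaps() {
    awk -v max="$1" '
        NR > 1 && $1 - prev > max {
            printf "stall of %d s ending at %d\n", $1 - prev, $1
        }
        { prev = $1 }'
}

# Logger, run in the background (the file path is a hypothetical choice):
#   while :; do date +%s >> "$HOME/heartbeat.log"; sleep 1; done &
# Later, list stalls longer than 10 seconds:
#   report_gaps 10 < "$HOME/heartbeat.log"
```

The reported end times can be converted with date -d @TIMESTAMP and compared against the journal.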

Will do! Thank you!

$ grep '\.*\]' /sys/block/*/queue/scheduler
/sys/block/loop0/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/loop1/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/loop2/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/loop3/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/loop4/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/loop5/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/loop6/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/loop7/queue/scheduler:mq-deadline kyber [bfq] none
/sys/block/sda/queue/scheduler:noop [deadline] cfq 
/sys/block/sdb/queue/scheduler:noop deadline [cfq]

I think every other day. Though it may be happening more frequently when I am not at my machine. I typically leave my computer on and plugged in all day.

Filter all activity related to btrfs maintenance:

erlangen:~ # journalctl -o short-monotonic -u btrfs*
-- Logs begin at Tue 2020-06-09 18:47:46 CEST, end at Thu 2020-06-11 13:11:55 CEST. --
[43217.541586] erlangen systemd[1]: btrfs-balance.timer: Succeeded.
[43217.541625] erlangen systemd[1]: Stopped Balance block groups on a btrfs filesystem.
[43217.541678] erlangen systemd[1]: btrfs-defrag.timer: Succeeded.
[43217.541721] erlangen systemd[1]: Stopped Defragment file data and/or directory metadata.
[43217.541769] erlangen systemd[1]: btrfs-scrub.timer: Succeeded.
[43217.541805] erlangen systemd[1]: Stopped Scrub btrfs filesystem, verify block checksums.
[43218.297180] erlangen systemd[1]: btrfsmaintenance-refresh.path: Succeeded.
[43218.304427] erlangen systemd[1]: Stopped Watch /etc/sysconfig/btrfsmaintenance.
-- Reboot --
[    3.679188] erlangen systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
[    4.145077] erlangen systemd[1]: Started Balance block groups on a btrfs filesystem.
[    4.145118] erlangen systemd[1]: Started Defragment file data and/or directory metadata.
[    4.145168] erlangen systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
[35915.017277] erlangen systemd[1]: btrfs-balance.timer: Succeeded.
[35915.017394] erlangen systemd[1]: Stopped Balance block groups on a btrfs filesystem.
[35915.017479] erlangen systemd[1]: btrfs-defrag.timer: Succeeded.
[35915.017522] erlangen systemd[1]: Stopped Defragment file data and/or directory metadata.
[35915.017563] erlangen systemd[1]: btrfs-scrub.timer: Succeeded.
[35915.017652] erlangen systemd[1]: Stopped Scrub btrfs filesystem, verify block checksums.
[35917.414179] erlangen systemd[1]: btrfsmaintenance-refresh.path: Succeeded.
[35917.421175] erlangen systemd[1]: Stopped Watch /etc/sysconfig/btrfsmaintenance.
-- Reboot --
[    4.184560] erlangen systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
[    4.729975] erlangen systemd[1]: Started Balance block groups on a btrfs filesystem.
[    4.730678] erlangen systemd[1]: Started Defragment file data and/or directory metadata.
[    4.731326] erlangen systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
[25824.258050] erlangen systemd[1]: btrfs-balance.timer: Succeeded.
[25824.258122] erlangen systemd[1]: Stopped Balance block groups on a btrfs filesystem.
[25824.258191] erlangen systemd[1]: btrfs-defrag.timer: Succeeded.
[25824.258258] erlangen systemd[1]: Stopped Defragment file data and/or directory metadata.
[25824.258325] erlangen systemd[1]: btrfs-scrub.timer: Succeeded.
[25824.258391] erlangen systemd[1]: Stopped Scrub btrfs filesystem, verify block checksums.
[25827.497249] erlangen systemd[1]: btrfsmaintenance-refresh.path: Succeeded.
[25827.512614] erlangen systemd[1]: Stopped Watch /etc/sysconfig/btrfsmaintenance.
-- Reboot --
[    3.658503] erlangen systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
[    4.052803] erlangen systemd[1]: Started Balance block groups on a btrfs filesystem.
[    4.053076] erlangen systemd[1]: Started Defragment file data and/or directory metadata.
[    4.053832] erlangen systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
[  223.829297] erlangen systemd[1]: btrfs-balance.timer: Succeeded.
[  223.829339] erlangen systemd[1]: Stopped Balance block groups on a btrfs filesystem.
[  223.829389] erlangen systemd[1]: btrfs-defrag.timer: Succeeded.
[  223.829434] erlangen systemd[1]: Stopped Defragment file data and/or directory metadata.
[  223.829480] erlangen systemd[1]: btrfs-scrub.timer: Succeeded.
[  223.829530] erlangen systemd[1]: Stopped Scrub btrfs filesystem, verify block checksums.
[  226.016180] erlangen systemd[1]: btrfsmaintenance-refresh.path: Succeeded.
[  226.028786] erlangen systemd[1]: Stopped Watch /etc/sysconfig/btrfsmaintenance.
-- Reboot --
[    3.630321] erlangen systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
[    4.088465] erlangen systemd[1]: Started Balance block groups on a btrfs filesystem.
[    4.088544] erlangen systemd[1]: Started Defragment file data and/or directory metadata.
[    4.088583] erlangen systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
erlangen:~ # 

If the output is too long to post here, use susepaste.
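
To see when these maintenance jobs are due, and to trigger one by hand in an attempt to reproduce the freeze, something like the following sketch may help. The btrfs-* timer names are taken from the output above; the snapper-* timer names and the helper service_for_timer are assumptions on my part, so check them with systemctl list-unit-files first:

```shell
# Timer names as they appear in the journal output above.
maintenance_timers="btrfs-balance.timer btrfs-defrag.timer btrfs-scrub.timer"
# Assumption: snapper ships these timers on your release; verify before relying on them.
snapper_timers="snapper-timeline.timer snapper-cleanup.timer"

# Map a timer unit to the service it triggers
# (systemd default: same base name with a .service suffix).
service_for_timer() {
    printf '%s\n' "${1%.timer}.service"
}

if command -v systemctl >/dev/null 2>&1; then
    # Show when each timer last ran and when it will fire next.
    systemctl list-timers --all $maintenance_timers $snapper_timers || true
fi

# To reproduce on demand, start one job by hand while watching the journal, e.g.:
#   sudo systemctl start "$(service_for_timer btrfs-balance.timer)"
```

If the system stalls while one of these jobs runs, that narrows things down considerably.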

The recommended command (journalctl -eb) displays the last 1000 lines of the current boot.

I’d instead recommend running the following in a terminal window and leaving it open on your desktop; it displays system events in real time.

journalctl -f

I suspect that snapper is not your problem since by default snapper snapshots your system on bootup, shutdown and whenever a package is installed or removed.

The more common cause of a system freezing intermittently is file system indexing.
What Desktop are you running?
I’d recommend disabling whatever file index app your Desktop is using if it’s causing problems. Your file lookups might take a bit longer but should not cause your system to become unresponsive.
The more elegant solution would be to lower the priority of the indexing app. In the old days you’d modify its “nice” value, but today the priority might be managed or set by a systemd service instead.
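
As a rough sketch of the “nice” approach, assuming the indexer is KDE's Baloo (process name baloo_file; substitute your own desktop's indexer). The function name deprioritize is just for illustration. As far as I know, ionice's idle class is only honored by the cfq and bfq I/O schedulers, so it may have no effect on a disk that uses deadline:

```shell
# Lower the CPU and I/O priority of every running process with the given name.
deprioritize() {
    name=$1
    for pid in $(pgrep -x "$name" 2>/dev/null); do
        renice -n 19 -p "$pid" >/dev/null   # lowest CPU priority
        ionice -c 3 -p "$pid"               # idle I/O class (cfq/bfq only)
        echo "deprioritized $name (pid $pid)"
    done
}

# Assumption: the indexer process is called baloo_file (KDE Plasma).
deprioritize baloo_file
```

The change only lasts until the process restarts, so this is a test rather than a permanent fix.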

Another possibility, though it shouldn’t happen very often, is a desktop auto-update applet.
You can disable that and update manually instead.

TSU

It hasn’t happened today, but here you go:

# journalctl -o short-monotonic -u btrfs*
-- Logs begin at Thu 2020-06-11 17:10:01 EDT, end at Thu 2020-06-11 19:40:03 EDT. --
[    9.224805] mycomputer systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
[    9.307369] mycomputer systemd[1]: Starting Update cron periods from /etc/sysconfig/btrfsmaintenance...
[    9.328079] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh script btrfs-scrub.sh for uninstall
[    9.353114] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh script btrfs-defrag.sh for uninstall
[    9.363776] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh script btrfs-balance.sh for uninstall
[    9.374556] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh script btrfs-trim.sh for uninstall
[    9.380668] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh timer btrfs-scrub for monthly
[   10.561184] mycomputer systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
[   10.562227] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh timer btrfs-defrag for none
[   10.564805] mycomputer systemd[1]: Started Balance block groups on a btrfs filesystem.
[   11.303647] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh timer btrfs-balance for weekly
[   11.510640] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh timer btrfs-trim for none
[   11.842651] mycomputer systemd[1]: Started Update cron periods from /etc/sysconfig/btrfsmaintenance.

I will start doing that!

KDE Plasma 5.12.8
KDE Frameworks 5.55.0

The only reason I suspected snapper was that I saw it had run when I hastily checked the logs after the last freeze. I was not managing packages at the time, though.

I did not think about that! Baloo usually crashes when I log in, and I guess I have just never cared. When I open the file indexer monitor GUI, it says it is not running. I will try to disable it entirely. I ran balooctl disable as both my user and root. Hopefully that does the trick.

OK, I just uninstalled plasma5-pk-updates, though I am not sure if that is the issue. It is set NOT to check for updates on battery power, but it froze on battery power once before (had been unplugged for a while).

Fingers crossed: either it won’t happen again (the indexer or updater was the issue) or we will see what it was!

OK, the problem has returned! But maybe a different problem, I don’t know. Here is the sequence of events:

  1. I walk up to my computer (plugged into power). KDE is locked. I unplug it and move it to another location and plug in power, Ethernet, a USB keyboard, a USB mouse, and two monitors (VGA and mini-DP). I do this frequently and usually without issue.
  2. It works for like 15 seconds and then begins to slow a bit. The keyboard stops working (both built-in and USB). Mouse and interacting with apps work, but not the taskbar/panel. I first thought it was just global shortcuts that weren’t working, so I switched to tty1 (this worked) and killed plasma shell.
  3. When I returned to X’s tty, I opened a Konsole and realized the keyboard wasn’t working at all: I was unable to type anything.
  4. Pretty soon clicking on anything stops working.
  5. I return to tty1 and try to login again and it freezes after typing my username, but it lets me switch back to tty7.
  6. And now I wait. Eventually all the keys and clicks I pressed happen and I have a terminal with a lot of random gunk typed in.
  7. I eventually unplugged all the peripherals from my computer.

Here is the systemd log. I unplugged my computer initially right around 17:00. https://paste.opensuse.org/60cb9701

One thing that I guess stands out in the logs is the CPU issue. My fans did not ramp up as they normally do under load, so I am not sure if it was under load. I am not sure anymore if it is accurate/what it means, as I have seen this issue on every distro I have installed on here since I first bought this machine. I get this warning 100% of the time when the computer boots up. I suppose I can re-seat the CPU and reapply the heatsink if I need to (I am able to do all of that with this machine).

There are a few failures, timeouts, and crashes in the log. I’d:

  • Disable kscreen (if not guilty, at least remove some noise from the logs)
  • Uninstall kdeconnect-kde (disabling it in the system tray is not enough; you can reinstall it later, of course)
  • Create a new user, or reset current user’s Plasma profile
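
For the last item, here is a minimal sketch of resetting the Plasma profile by moving config files aside instead of deleting them. The file list is an assumption based on common Plasma 5 file names and is not exhaustive, and reset_plasma_profile is a hypothetical helper name. Run it from a TTY while logged out of Plasma:

```shell
# Move Plasma's main per-user config files into a timestamped backup directory
# so Plasma recreates defaults at the next login.
# Usage: reset_plasma_profile [CONFIG_DIR]   (default: $HOME/.config)
reset_plasma_profile() {
    cfg=${1:-$HOME/.config}
    backup="$cfg/plasma-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup" || return 1
    # Assumed file names; add or remove entries as needed.
    for f in plasma-org.kde.plasma.desktop-appletsrc plasmashellrc plasmarc kwinrc kscreenlockerrc; do
        if [ -e "$cfg/$f" ]; then
            mv "$cfg/$f" "$backup/" && echo "moved $f"
        fi
    done
    echo "backup directory: $backup"
}
```

To undo the reset, move the files back from the backup directory. Creating a fresh user is the cleaner test, since it rules out every per-user setting at once.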