I occasionally (once every few days, it seems) have an issue with what I think is snapper freezing my system. When it freezes I cannot do anything (switch TTY, move the mouse, anything). A freeze typically lasts about 5 minutes. This is a laptop with an SSD and Btrfs as the main system partition for openSUSE. It is a dual-boot machine, but I don’t know if that affects anything.
I unfortunately don’t have any journal logs to share at the moment, but I will be sure to capture them when it happens next (I only had time to look at the journal briefly the last time it happened). In the meantime, how can I invoke snapper manually to coerce it into freezing, or otherwise troubleshoot what is going wrong? What are common sources of this issue?
Here is some info:
# btrfs scrub start -Bd /
scrub device /dev/mapper/system-root (id 1) done
scrub started at Wed Jun 10 15:00:25 2020 and finished after 00:03:26
total bytes scrubbed: 68.57GiB with 0 errors
# hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
Model Number: CT1000MX500SSD1
Serial Number: 1818E139D4A9
Firmware Revision: M3CR020
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x006d)
Supported: 10 9 8 7 6 5
Likely used: 10
Configuration:
Logical max current
cylinders 16383 0
heads 16 0
sectors/track 63 0
--
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 1953525168
Logical Sector size: 512 bytes
Physical Sector size: 512 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 953869 MBytes
device size with M = 1000*1000: 1000204 MBytes (1000 GB)
cache/buffer size = unknown
Form Factor: 2.5 inch
Nominal Media Rotation Rate: Solid State Device
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 1 Current = 1
Advanced power management level: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
* Power Management feature set
* Write cache
* Look-ahead
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
* Advanced Power Management feature set
* 48-bit Address feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
unknown 119[8]
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Phy event counters
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Software settings preservation
Device Sleep (DEVSLP)
* SMART Command Transport (SCT) feature set
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
* SANITIZE_ANTIFREEZE_LOCK_EXT command
* SANITIZE feature set
* CRYPTO_SCRAMBLE_EXT command
* BLOCK_ERASE_EXT command
* reserved 69[3]
* reserved 69[4]
* reserved 69[7]
* DOWNLOAD MICROCODE DMA command
* WRITE BUFFER DMA command
* READ BUFFER DMA command
* Data Set Management TRIM supported (limit 8 blocks)
Logical Unit WWN Device Identifier: 500a0751e139d4a9
NAA : 5
IEEE OUI : 00a075
Unique ID : 1e139d4a9
Device Sleep:
DEVSLP Exit Timeout (DETO): 100 ms (drive)
Minimum DEVSLP Assertion Time (MDAT): 10 ms (drive)
Checksum: correct
I think every other day. Though it may be happening more frequently when I am not at my machine. I typically leave my computer on and plugged in all day.
The recommended command (journalctl -eb) displays the last 1000 lines of the current boot.
Instead, I’d recommend running the following in a windowed console and leaving it open on your Desktop; it displays your system events in real time.
journalctl -f
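If the freeze clears before you get back to the live window, a query over the last few minutes may be quicker than scrolling. A minimal sketch, assuming a systemd journal is available; the function name `recent_warnings` is made up for illustration, while `--since`, `--priority` and `--no-pager` are standard journalctl options:

```shell
# Show warnings and errors logged in the last N minutes (default 15).
recent_warnings() {
    minutes="${1:-15}"
    journalctl --since "${minutes}min ago" --priority warning --no-pager
}
```

For example, `recent_warnings 10` run right after a freeze would list warnings and errors from the previous ten minutes.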
I suspect that snapper is not your problem since by default snapper snapshots your system on bootup, shutdown and whenever a package is installed or removed.
The more common cause of a system freezing intermittently is file system indexing.
What Desktop are you running?
I’d recommend disabling whatever file index app your Desktop is using if it’s causing problems. Your file lookups might take a bit longer but should not cause your system to become unresponsive.
The more elegant solution would be to modify the priority of the indexing app… In the old days you’d modify the “nice” value, but today the priority might be managed or set by a systemd service or setting instead.
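For the priority approach, something along these lines might work. It is only a sketch: `deprioritize` is a hypothetical helper name, `baloo_file` is assumed to be the indexer's process name (that is KDE's; other desktops use a different one), and the idle I/O class only has an effect with I/O schedulers that honor I/O priorities, such as BFQ.

```shell
# Lower the CPU and I/O priority of every running process with the given name.
deprioritize() {
    name="$1"
    for pid in $(pgrep -x "$name"); do
        renice -n 19 -p "$pid"   # lowest CPU priority
        ionice -c 3 -p "$pid"    # idle I/O class
    done
}
```

`deprioritize baloo_file` would then affect only the indexer processes running at that moment; the change is lost when the process restarts.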
Another possibility, though it shouldn’t happen very often, is that your Desktop has an auto-update applet.
You can disable that and manually update instead.
# journalctl -o short-monotonic -u btrfs*
-- Logs begin at Thu 2020-06-11 17:10:01 EDT, end at Thu 2020-06-11 19:40:03 EDT. --
[    9.224805] mycomputer systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
[    9.307369] mycomputer systemd[1]: Starting Update cron periods from /etc/sysconfig/btrfsmaintenance...
[    9.328079] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh script btrfs-scrub.sh for uninstall
[    9.353114] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh script btrfs-defrag.sh for uninstall
[    9.363776] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh script btrfs-balance.sh for uninstall
[    9.374556] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh script btrfs-trim.sh for uninstall
[    9.380668] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh timer btrfs-scrub for monthly
[   10.561184] mycomputer systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
[   10.562227] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh timer btrfs-defrag for none
[   10.564805] mycomputer systemd[1]: Started Balance block groups on a btrfs filesystem.
[   11.303647] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh timer btrfs-balance for weekly
[   11.510640] mycomputer btrfsmaintenance-refresh-cron.sh[1464]: Refresh timer btrfs-trim for none
[   11.842651] mycomputer systemd[1]: Started Update cron periods from /etc/sysconfig/btrfsmaintenance.
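The log above shows a weekly btrfs-balance timer being set up, and a balance can generate a lot of I/O, so it may be worth checking whether a freeze lines up with one of these periodic jobs. A rough sketch, assuming the unit names `btrfs-balance.service` (from btrfsmaintenance) and `snapper-cleanup.service` (from snapper) match what is installed; `maintenance_timers` and `run_job` are made-up helper names:

```shell
# List when the periodic btrfs and snapper jobs last ran and are next due.
maintenance_timers() {
    systemctl list-timers --all 'btrfs-*' 'snapper-*'
}

# Start one maintenance job by hand (needs root) and follow its log output.
# Press Ctrl+C to stop following; the job itself keeps running.
run_job() {
    unit="$1"
    systemctl start --no-block "$unit"
    journalctl --follow --unit "$unit"
}
```

Running `run_job btrfs-balance.service` or `run_job snapper-cleanup.service` as root while watching the desktop might show whether either job is enough to reproduce the stall.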
I will start doing that!
KDE Plasma 5.12.8
KDE Frameworks 5.55.0
The only reason I suspected snapper was that I saw it had run when I hastily checked the logs after the last freeze. I was not managing packages then, though.
I did not think about that! Baloo usually crashes when I login, and I just have never cared I guess. When I open the file indexer monitor GUI, it says it is not running. I will try to disable it entirely. I ran balooctl disable as my user and root. Hopefully that does the trick.
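To confirm the indexer really is gone after disabling it, a quick process check could look like this. A sketch only: `indexer_state` is a hypothetical helper, and `baloo_file` is assumed to be the name of Baloo's indexing process.

```shell
# Report whether a process with exactly this name is currently running.
indexer_state() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "running"
    else
        echo "not running"
    fi
}

echo "baloo_file: $(indexer_state baloo_file)"
```

`balooctl status` should report the same thing from Baloo's side.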
OK, I just uninstalled plasma5-pk-updates, though I am not sure if that is the issue. It is set NOT to check for updates on battery power, but it froze on battery power once before (had been unplugged for a while).
Fingers crossed it either won’t happen again (meaning the indexer or the updater was the issue) or we will see what it was!
OK, the problem has returned! But maybe a different problem, I don’t know. Here is the sequence of events:
I walk up to my computer (plugged into power). KDE is locked. I unplug it and move it to another location and plug in power, Ethernet, a USB keyboard, a USB mouse, and two monitors (VGA and mini-DP). I do this frequently and usually without issue.
It works for like 15 seconds and then begins to slow a bit. The keyboard stops working (both built-in and USB). Mouse and interacting with apps work, but not the taskbar/panel. I first thought it was just global shortcuts that weren’t working, so I switched to tty1 (this worked) and killed plasma shell.
When I returned to X’s tty, I realized the keyboard wasn’t working at all: I opened a Konsole and was unable to type anything.
Pretty soon clicking on anything stops working.
I return to tty1 and try to login again and it freezes after typing my username, but it lets me switch back to tty7.
And now I wait. Eventually all the keys and clicks I pressed happen and I have a terminal with a lot of random gunk typed in.
I eventually unplugged all the peripherals from my computer.
One thing that I guess stands out in the logs is the CPU issue. My fans did not ramp up as they normally do under load, so I am not sure if it was under load. I am not sure anymore if it is accurate/what it means, as I have seen this issue on every distro I have installed on here since I first bought this machine. I get this warning 100% of the time when the computer boots up. I suppose I can re-seat the CPU and reapply the heatsink if I need to (I am able to do all of that with this machine).