Freeze-ups due to Snapper?

I’ve been here before…but I’ll take the liberty of raising this question again:

Snapper seems to jump in with its snapshot activities at such high priority that my computer is blocked on occasions for more than just a few seconds.

What snapper configuration parameters would be optimal for a home desktop, so that snapper can do its job and I can do mine without too much interference between us?

No replies on this post yet. Has no-one else noticed the problem?

I have changed the main snapper config parameters in /etc/snapper/configs/root to

# run daily number cleanup
NUMBER_CLEANUP="yes"

# limit for number cleanup
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="2-10"
NUMBER_LIMIT_IMPORTANT="4-10"


# create hourly snapshots
TIMELINE_CREATE="no"

# cleanup hourly snapshots after some time
TIMELINE_CLEANUP="yes"

# limits for timeline cleanup
TIMELINE_MIN_AGE="1800"
TIMELINE_LIMIT_HOURLY="12"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="4"
TIMELINE_LIMIT_MONTHLY="12"
TIMELINE_LIMIT_YEARLY="2"


# cleanup empty pre-post-pairs
EMPTY_PRE_POST_CLEANUP="yes"

# limits for empty pre-post-pair cleanup
EMPTY_PRE_POST_MIN_AGE="1800"


Nevertheless, there are times when btrfs activity is high (up to 50% CPU time); even entering the commands to find that out is difficult. For example, after booting this morning (Monday), I had to wait what felt like 10 minutes before normal keyboard/mouse operation was possible.

If you really believe that snapper (or a process managed by snapper) has too high a priority, go ahead and try modifying its process priority (aka nice).

The following describes settings you can try in your systemd unit file.
One is “LimitNICE=” in the Process Properties section, another is “Nice=” in the Scheduling section:

https://www.freedesktop.org/software/systemd/man/systemd.exec.html
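As a rough sketch of how the “Nice=” suggestion could be applied without editing the packaged unit file, the function below writes a systemd drop-in. The unit name you pass in, the drop-in file name 50-low-priority.conf and the extra “IOSchedulingClass=idle” line are assumptions for illustration; check which snapper units actually exist on your system (for example with "systemctl list-timers") before using it.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that lowers CPU and I/O priority for one unit.
# Arguments: $1 = unit name (which unit to target is an assumption),
#            $2 = systemd config directory (defaults to /etc/systemd/system).
write_low_priority_dropin() {
    unit="$1"
    base="${2:-/etc/systemd/system}"
    dropin_dir="$base/$unit.d"
    mkdir -p "$dropin_dir" || return 1
    # Nice=19 is the lowest CPU priority. IOSchedulingClass=idle asks the
    # kernel to serve this unit's disk I/O only when nothing else is waiting;
    # whether that is honoured depends on the I/O scheduler in use.
    cat > "$dropin_dir/50-low-priority.conf" <<'EOF'
[Service]
Nice=19
IOSchedulingClass=idle
EOF
}
```

Run as root it might be used as "write_low_priority_dropin snapper-cleanup.service", followed by "systemctl daemon-reload" so that systemd picks up the change. The unit may already ship with similar settings, in which case the drop-in changes nothing.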

I’d also recommend you consider why and when your snapshots are being made. It looks like you’ve disabled hourly snapshots, so in general your system should be creating snapshots only when you’re running zypper (installing or uninstalling something) and at bootups and shutdowns. You shouldn’t be seeing snapshot activity at any other time.

Could you be mistaking the cause of your disk activity? For example, if you’re installed on an SSD and haven’t enabled TRIM, your disk could be desperately trying to erase stale blocks on demand to enable new writes.

TSU

Thanks for replying!

No, I am only guessing that the problem arises directly from the Snapper processes, having been advised similarly in the past.

Aahhh…I am installed on an SSD. Presumably, I have to change the SSD mount command in the boot-up sequence. Where do I find that? Note that I am only using the standard sequences provided in the openSUSE Leap 15 installation.

The following is probably one of the best references for explaining what you need to do.
Follow the instructions to add the “discard” option to your first disk in your fstab:

https://wiki.archlinux.org/index.php/Solid_state_drive

TSU

I hope I’ve got this right.

lsblk --discard 

returns non-zero values for DISC-GRAN and DISC-MAX on sda. Therefore, my SSD supports TRIM.
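The same check can be scripted. This is only a sketch: the function name trim_support is made up for illustration, and it expects three whitespace-separated columns (device name, DISC-GRAN, DISC-MAX) with the sizes in bytes.

```shell
#!/bin/sh
# Sketch: read "NAME DISC-GRAN DISC-MAX" lines (sizes in bytes) on stdin and
# report, per device, whether both discard values are non-zero.
trim_support() {
    awk '{
        status = ($2 + 0 > 0 && $3 + 0 > 0) ? "trim" : "no-trim"
        print $1, status
    }'
}

# Intended use on a real system (whole disks only, no header line):
#   lsblk --discard -b -d -n -o NAME,DISC-GRAN,DISC-MAX | trim_support
```

A device reported as "no-trim" here has 0 in at least one of the two columns, which is the case where discard is not supported.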

Thereupon, using the /etc/sysconfig editor under YaST, I updated the value of BTRFS_TRIM_MOUNTPOINTS from / to /sda.

BTRFS_TRIM_PERIOD remains at ‘weekly’

What you describe may work, too.
The setting I recommended configures the system to run TRIM as a background process whenever your system has idle resources… That’s usually better than guessing when your system should run TRIM, and then run the operation at normal priority.

TSU

I think you are recommending the continuous TRIM (which, according to your link, is not the preferred one for several distros). I assume openSUSE has periodic TRIM as standard, since the parameter BTRFS_TRIM_PERIOD is set to ‘weekly’ in /etc/sysconfig. I assume, therefore, that I have to set that to ‘none’ to get a continuous TRIM.

In addition, the line you refer to in the link seems to be

/dev/sda1  /           ext4  defaults,discard   0  1

In my /etc/fstab file there is a line

UUID=9f991f1b-d552-4907-a6a6-c6a8e2e93fcb / btrfs defaults 0 0

I assume that this is the line into which I must insert the ‘discard’ value, since this is the UUID of my /sda. Is there a reason why the /sda would not be checked for errors?
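For illustration, here is a sketch of what that change to the options field would look like if scripted. The function name add_discard_to_root is hypothetical, and the function only prints the modified fstab to standard output, so nothing is changed until you review the result and copy it into place yourself.

```shell
#!/bin/sh
# Sketch: print the given fstab with ",discard" appended to the mount options
# of the root ("/") entry. The file itself is not modified.
add_discard_to_root() {
    awk '
        # Skip comments; act only on the entry mounted at "/" that does not
        # already carry a discard option.
        $1 !~ /^#/ && $2 == "/" && ("," $4 ",") !~ /,discard[,=]/ {
            $4 = $4 ",discard"   # note: awk re-joins the fields with single spaces
        }
        { print }
    ' "$1"
}

# Example: add_discard_to_root /etc/fstab > /tmp/fstab.new
```

With the line quoted above, the output would read "UUID=9f991f1b-d552-4907-a6a6-c6a8e2e93fcb / btrfs defaults,discard 0 0". Keep a backup of the original /etc/fstab before replacing it, since a broken fstab can prevent the system from booting.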

Ever since TRIM first became available,
how to implement it has been a matter of discussion, which is why the ArchWiki describes all possible methods.

If you run TRIM periodically, that more or less assumes that you know your system activity and that your write activity is predictable. Otherwise, an extreme number of writes could use up the supply of already-erased blocks, and then you’d have the same experience of your system pausing while blocks are erased on demand. If you run TRIM on a schedule, though, the upside is of course that your system won’t be encumbered by disk operations, however slight, during your normal activity.

But the way most people use their systems is a pattern of infrequent, intermittent activity: most disk activity is reads rather than writes, and most of the time there is no disk activity at all. Those times when your disk isn’t being used at all could be used to do the disk maintenance of erasing stale blocks. When this is done in idle moments, it shouldn’t be noticeable, and your disk should always be ready for writes.

TSU

As you know, I am still trying to put my finger on freeze-ups that seem to occur shortly after I have logged in, for example. I may be wrong, but Monday seems to be a bad day for that :sarcastic:. However, I have noticed the same behaviour, perhaps not as extreme (15 seconds or more), while I am working on, say, a presentation. By freeze-up, I mean that no mouse or keyboard activity is possible, i.e. what I perceive. Who knows what is going on in the background! In the case of the presentation work, I could, perhaps, attribute the freeze-up to temporary back-up activity of LibreOffice, but it is a wee bit annoying.

So would ‘discard’ mitigate that, I wonder?

Discard (which issues TRIM) only makes freed space on your disk available for new writes.
The ArchWiki should describe why this is necessary on an SSD: unlike on an HDD, when something is deleted the previous data cannot simply be overwritten immediately. On an SSD, when something is deleted, it’s only marked for deletion and not yet available to be re-used. At some point (and that’s the big difference to consider between scheduled and continuous background TRIM) the old data has to be erased, and only then can new data be written to that storage location.

If you never run discard/TRIM at all, your disk fills up with locations where old deleted data hasn’t yet been erased, and when that happens new writes cannot proceed immediately. Your system will strain to find some place, anywhere, to perform the write operation you’ve commanded, which can have effects like what you describe.

Go back to the ArchWiki and look for the command to do a TRIM on demand.
Hopefully this has to be performed only this one time, for your current extreme case.
Then, whether you set up a schedule or run TRIM as a continuous background process, if whichever you choose is sufficient you shouldn’t see this problem again.

Or at least this probable cause…
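A one-off TRIM of this kind is normally run as root with "fstrim -v /", where -v makes fstrim report how much space it discarded. As a small sketch, the hypothetical helper below pulls the byte count out of that report so it can be logged or compared between runs. The output format it expects ("/: 7.5 GiB (8054112256 bytes) trimmed") is what util-linux prints as far as I know, so treat the pattern as an assumption and adjust it if your version differs.

```shell
#!/bin/sh
# Sketch: extract the number of trimmed bytes from "fstrim -v" output on stdin.
# Prints one number per matching line and nothing for lines that do not match.
trimmed_bytes() {
    sed -n 's/^.*(\([0-9][0-9]*\) bytes) trimmed.*$/\1/p'
}

# Intended use on a real system (needs root):
#   fstrim -v / | trimmed_bytes
```

A large number on the first run followed by much smaller numbers on later runs would suggest that a backlog of unused blocks had built up.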

TSU

I don’t really understand this rationale. As I understand it, according to the fstrim(8) man page

fstrim / 

will discard by default all deleted cells. Presumably, this is what happens when fstrim is run periodically as well. What am I missing here?

Also, looking at the settings in /etc/sysconfig, there are also 3 parameters for BTRFS_SCRUB_ (MOUNTPOINTS, PERIOD, PRIORITY). They are all set (_PERIOD = monthly). Assuming that SCRUB is precisely this discard process, what is the difference between this and a discard/TRIM on demand?

Sorry for the penetrating questions. As a layman, I am trying to be as precise as I can to explain what I mean.
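To compare the TRIM and SCRUB parameters mentioned above side by side, a sketch like the following may help. It assumes they live in /etc/sysconfig/btrfsmaintenance, which is where the btrfsmaintenance package keeps them on openSUSE as far as I know; the function name is made up. Note that a btrfs scrub reads the data back and verifies checksums, so it is a separate maintenance task from discard/TRIM.

```shell
#!/bin/sh
# Sketch: list the TRIM and SCRUB settings from a btrfsmaintenance-style
# sysconfig file. $1 = file to read (the default path is an assumption).
show_btrfs_maintenance() {
    file="${1:-/etc/sysconfig/btrfsmaintenance}"
    grep -E '^BTRFS_(TRIM|SCRUB)_[A-Z]+=' "$file"
}

# Example: show_btrfs_maintenance
```

Commented-out lines and unrelated settings are filtered out, so only the active BTRFS_TRIM_* and BTRFS_SCRUB_* values are shown.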