My btrfs root partition ran out of space today. So I deleted some snapshots and did some btrfs balancing. But I’m puzzled as to why the system hadn’t automatically done that maintenance itself. I’ve never touched any of the default configs, so my understanding is that snapper should sort out deleting snapshots when there’s not enough disk space, and btrfs should do a balance every week - or something like that?!
How can I find out whether btrfs and snapper maintenance is happening? I think they’re no longer cron jobs?
If it helps, I’ve done online upgrades from Leap 15.2 onwards. If further info is required, let me know what. Thanks in advance.
It should, if it is configured. You show neither your snapper configuration nor the actual space usage of your filesystem, so it is rather difficult to guess what is happening.
systemctl status snapper-cleanup.timer
systemctl list-timers snapper-cleanup.timer
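If you want to post them, something like this should show both (I am assuming the standard config name “root” here):
sudo snapper -c root get-config
sudo btrfs filesystem usage /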
Thanks for pointing me to the systemd timers. I hadn’t come across those before. Looks like they’re all running OK:
❯ systemctl status snapper-cleanup.timer
● snapper-cleanup.timer - Daily Cleanup of Snapper Snapshots
Loaded: loaded (/usr/lib/systemd/system/snapper-cleanup.timer; enabled; vendor preset: enabled)
Active: active (waiting) since Tue 2024-02-20 15:27:24 GMT; 1 week 0 days ago
Trigger: Wed 2024-02-28 14:42:25 GMT; 23h left
Triggers: ● snapper-cleanup.service
Docs: man:snapper(8)
man:snapper-configs(5)
❯ systemctl list-timers snapper-cleanup.timer
NEXT LEFT LAST PASSED UNIT ACTIVATES
Wed 2024-02-28 14:42:25 GMT 23h left Tue 2024-02-27 14:42:25 GMT 58min ago snapper-cleanup.timer snapper-cleanup.service
❯ systemctl list-timers btrfs-balance
NEXT LEFT LAST PASSED UNIT ACTIVATES
Fri 2024-03-01 00:00:00 GMT 2 days left Fri 2024-02-02 16:32:55 GMT 3 weeks 3 days ago btrfs-balance.timer btrfs-balance.service
So why did my / partition suddenly fill up? I recently moved /var/log to a separate partition, so it wasn’t logs suddenly filling it, and as far as I’m aware the only thing I installed was starship, which is only 9.5 MB. Is there a problem with my snapper config above?
Thanks all. Here are the various outputs suggested:
❯ sudo journalctl -u snapper-cleanup
Feb 20 15:37:26 prodesk systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Feb 20 15:37:26 prodesk systemd-helper[6818]: running cleanup for 'root'.
Feb 20 15:37:26 prodesk systemd-helper[6818]: running number cleanup for 'root'.
Feb 20 15:39:31 prodesk systemd-helper[6818]: running timeline cleanup for 'root'.
Feb 20 15:39:31 prodesk systemd-helper[6818]: running empty-pre-post cleanup for 'root'.
Feb 20 15:39:31 prodesk systemd[1]: snapper-cleanup.service: Deactivated successfully.
Feb 27 14:42:25 prodesk systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Feb 27 14:42:26 prodesk systemd-helper[32153]: running cleanup for 'root'.
Feb 27 14:42:26 prodesk systemd-helper[32153]: running number cleanup for 'root'.
Feb 27 14:44:09 prodesk systemd-helper[32153]: running timeline cleanup for 'root'.
Feb 27 14:44:09 prodesk systemd-helper[32153]: running empty-pre-post cleanup for 'root'.
Feb 27 14:44:11 prodesk systemd[1]: snapper-cleanup.service: Deactivated successfully.
❯ sudo journalctl -u btrfs*
Feb 20 15:27:24 prodesk systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
Feb 20 15:27:25 prodesk systemd[1]: Started Balance block groups on a btrfs filesystem.
Feb 20 15:27:25 prodesk systemd[1]: Started Defragment file data and/or directory metadata.
Feb 20 15:27:25 prodesk systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
Feb 20 15:27:25 prodesk systemd[1]: Started Discard unused blocks on a mounted filesystem.
❯ sudo snapper --iso list
# | Type | Pre # | Date | User | Used Space | Cleanup | Description | Userdata
-----+--------+-------+---------------------+------+------------+---------+-----------------------+--------------
0 | single | | | root | | | current |
1* | single | | 2022-11-23 16:53:18 | root | 2.00 MiB | | first root filesystem |
779 | pre | | 2024-02-20 09:17:49 | root | 698.22 MiB | number | zypp(packagekitd) | important=yes
780 | post | 779 | 2024-02-20 09:38:09 | root | 2.64 MiB | number | | important=yes
781 | pre | | 2024-02-20 15:24:09 | root | 4.17 MiB | number | zypp(zypper) | important=yes
782 | post | 781 | 2024-02-20 15:26:11 | root | 10.80 MiB | number | | important=yes
785 | pre | | 2024-02-21 15:53:56 | root | 3.72 MiB | number | zypp(zypper) | important=no
786 | post | 785 | 2024-02-21 15:54:21 | root | 384.00 KiB | number | | important=no
787 | pre | | 2024-02-21 16:41:30 | root | 144.00 KiB | number | zypp(zypper) | important=no
788 | pre | | 2024-02-21 16:46:01 | root | 1.55 MiB | number | zypp(packagekitd) | important=no
789 | post | 788 | 2024-02-21 16:47:55 | root | 15.34 MiB | number | | important=no
790 | pre | | 2024-02-23 20:26:25 | root | 8.72 MiB | number | zypp(packagekitd) | important=no
791 | post | 790 | 2024-02-23 20:30:15 | root | 10.56 MiB | number | | important=no
794 | pre | | 2024-02-26 20:25:47 | root | 4.58 MiB | number | zypp(packagekitd) | important=no
795 | post | 794 | 2024-02-26 20:26:28 | root | 4.97 MiB | number | | important=no
796 | pre | | 2024-02-28 13:05:39 | root | 1.20 MiB | number | zypp(zypper) | important=no
❯ sudo btrfs subvolume list /
ID 256 gen 30 top level 5 path @
ID 257 gen 644705 top level 256 path @/var
ID 258 gen 644694 top level 256 path @/usr/local
ID 259 gen 644704 top level 256 path @/tmp
ID 260 gen 644252 top level 256 path @/srv
ID 261 gen 644700 top level 256 path @/root
ID 262 gen 644697 top level 256 path @/opt
ID 263 gen 644253 top level 256 path @/boot/grub2/x86_64-efi
ID 264 gen 644252 top level 256 path @/boot/grub2/i386-pc
ID 265 gen 644699 top level 256 path @/.snapshots
ID 266 gen 644705 top level 265 path @/.snapshots/1/snapshot
ID 933 gen 644252 top level 257 path @/var/lib/machines
ID 1127 gen 643797 top level 265 path @/.snapshots/779/snapshot
ID 1128 gen 643797 top level 265 path @/.snapshots/780/snapshot
ID 1129 gen 643797 top level 265 path @/.snapshots/781/snapshot
ID 1130 gen 643797 top level 265 path @/.snapshots/782/snapshot
ID 1133 gen 643797 top level 265 path @/.snapshots/785/snapshot
ID 1134 gen 643797 top level 265 path @/.snapshots/786/snapshot
ID 1135 gen 643797 top level 265 path @/.snapshots/787/snapshot
ID 1136 gen 643797 top level 265 path @/.snapshots/788/snapshot
ID 1137 gen 643797 top level 265 path @/.snapshots/789/snapshot
ID 1138 gen 643797 top level 265 path @/.snapshots/790/snapshot
ID 1139 gen 643797 top level 265 path @/.snapshots/791/snapshot
ID 1145 gen 644240 top level 265 path @/.snapshots/794/snapshot
ID 1146 gen 644242 top level 265 path @/.snapshots/795/snapshot
ID 1149 gen 644699 top level 265 path @/.snapshots/796/snapshot
Everything seems to be set up with the defaults and running. So I guess the snapshots you had before simply took up too much room on your not-so-big root partition.
To prevent that in the future, you can reduce the value of SPACE_LIMIT in /etc/snapper/configs/root.
I quote from the documentation for better understanding:
" `SPACE_LIMIT`
Limit of space snapshots are allowed to use in fractions of 1 (100%). Valid values range from 0 to 1 (0.1 = 10%, 0.2 = 20%, ...)."
At the moment it’s set to 0.5 (50%), which equals 14.35 GiB of your 28.7 GiB. If you set it to, say, 0.2, you would limit snapshots to 5.74 GiB.
and
" If, for example, NUMBER_LIMIT=5-20 is set, Snapper will perform a first clean-up run and reduce the number of regular numbered snapshots to 20. In case these 20 snapshots exceed the quota, Snapper will delete the oldest ones in a second run until the quota is met. A minimum of five snapshots will always be kept, regardless of the amount of space they occupy."
You can also reduce NUMBER_LIMIT. Just make sure you keep a range, so that the quota rules can still apply. That way you reduce the number of snapshots that are kept.
Alternatively you can drop the quota rule and set fixed values, e.g. 2 snapshots flagged important and 2 without, or something like that.
Of course those measures will reduce the snapshots available for rollbacks etc. But if you need to save space, you need to save space, I guess.
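As a rough sketch, you could either edit /etc/snapper/configs/root directly or let snapper write the values for you; the numbers below are only an example, not a recommendation:
sudo snapper -c root set-config "SPACE_LIMIT=0.2"
sudo snapper -c root set-config "NUMBER_LIMIT=2-10" "NUMBER_LIMIT_IMPORTANT=4-10"
sudo snapper -c root get-config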
OK thanks. That makes sense and I had wondered if that was what I should do, but didn’t understand enough to be confident it was right.
Why, if FREE_LIMIT=0.2, didn’t snapper keep 5.74 GiB free?
I use backintime to take daily snapshots to a NAS, and I’ve never had to use snapper to do a rollback, so I think reducing SPACE_LIMIT and NUMBER_LIMIT would be fine.
Having adjusted the snapper config, can I run the snapper-cleanup service manually (and if so, how?), or do I have to wait for the next systemd timer run?
Ah true, FREE_LIMIT is set; I overlooked that.
But anyway: once the lower value of NUMBER_LIMIT is reached, the quota limits are ignored. So it does seem necessary to reduce at least the lower value of the NUMBER_LIMIT and NUMBER_LIMIT_IMPORTANT ranges to make sure some more snapshots get cleaned up.
Or you drop the quota rules altogether and set NUMBER_LIMIT and NUMBER_LIMIT_IMPORTANT to fixed values.
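As for triggering the cleanup manually: you don’t have to wait for the timer. Either of these should work (standard systemd/snapper calls, again assuming the config name “root”):
sudo systemctl start snapper-cleanup.service
sudo snapper -c root cleanup number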
How do you expect anybody to guess what happened?
Snapper runs periodically and something may have filled your filesystem(s) in between. You would have needed to investigate when it happened, not after you had cleaned up and destroyed the evidence.
Anyway, 1.5 GiB in /tmp and 4+ GiB in /var sounds excessive for such a small partition. You may want to look into it (see the du sketch at the end of this post).
Your snapshots consume 4 GiB in total, which is far below any configured limit, so it is very unlikely they were the reason for the lack of space. Besides, snapshots “grow” when you write into your currently active root, so they are never the root cause of an “out of space” condition. Snapshots just reduce the available space, making “out of space” more likely, but you need to actively fill your filesystem to cause it.
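To see what is actually eating the space there, plain du is usually enough; something like this (GNU du options; -x stays on one filesystem, so it skips nested subvolumes and your separately mounted /var/log):
sudo du -xh -d1 /var | sort -h
sudo du -xh -d1 /tmp | sort -h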
Thanks again, @arvidjaar and @404_UsernameNotFound. I realise it’s pretty tricky to troubleshoot issues like these in retrospect, and remotely. However, your answers and suggestions have helped, both in tackling the specific problem I was facing and in my own general learning of openSUSE.
(And yes, I did find some further temp and cached items in both /tmp and /var that could be deleted, which were exacerbating the situation.)