btrfs, snapper, and recovering lost space after a rollback?

I’m running Tumbleweed with Snapper/snapshots. X.org broke at one point and I used the snapshots to recover my system (and waited until the NVIDIA drivers were fixed). However, I now seem to have a lot of space missing on my main partition, and I’m trying to work out how to recover it.

I think what has happened is that my original “/” subvolume and the snapshot subvolume that I’m now booting from have diverged a lot (because it’s Tumbleweed) but I don’t know if there’s a way to recover the space.

$ mount | grep ^/dev
/dev/mapper/main-root on / type btrfs (rw,relatime,ssd,space_cache,subvolid=1017,subvol=/@/.snapshots/685/snapshot)
/dev/mapper/main-root on /srv type btrfs (rw,relatime,ssd,space_cache,subvolid=259,subvol=/@/srv)
/dev/mapper/main-root on /boot/grub2/i386-pc type btrfs (rw,relatime,ssd,space_cache,subvolid=260,subvol=/@/boot/grub2/i386-pc)
/dev/mapper/main-root on /.snapshots type btrfs (rw,relatime,ssd,space_cache,subvolid=272,subvol=/@/.snapshots)
/dev/mapper/main-root on /opt type btrfs (rw,relatime,ssd,space_cache,subvolid=258,subvol=/@/opt)
/dev/mapper/main-root on /boot/grub2/x86_64-efi type btrfs (rw,relatime,ssd,space_cache,subvolid=261,subvol=/@/boot/grub2/x86_64-efi)
/dev/sdb4 on /boot/efi type vfat (rw,relatime,fmask=0002,dmask=0002,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro)
/dev/sdb3 on /var type ext4 (rw,relatime,data=ordered)
/dev/sdc1 on /mnt/backup type ext4 (rw,relatime,stripe=32750,data=ordered)
/dev/sdb2 on /mnt/media type ext4 (rw,relatime,stripe=32747,data=ordered)
/dev/sdb5 on /tmp type ext4 (rw,relatime,data=ordered)
/dev/mapper/main-home on /home type ext4 (rw,noatime,discard,stripe=32692,data=ordered)


$ df -h /
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/main-root   25G   16G  9.1G  63% /


$ sudo snapper list
Type   | #   | Pre # | Date                         | User | Cleanup  | Description | Userdata
-------+-----+-------+------------------------------+------+----------+-------------+---------
single | 0   |       |                              | root |          | current     |         
single | 685 |       | Tue 13 Feb 2018 21:10:48 GMT | root |          |             |         
single | 770 |       | Fri 02 Mar 2018 09:00:01 GMT | root | timeline | timeline    |         
single | 771 |       | Fri 02 Mar 2018 10:00:01 GMT | root | timeline | timeline    |    
     

$ sudo btrfs qgroup show -p /
qgroupid         rfer         excl parent               
--------         ----         ---- ------               
0/5          16.00KiB     16.00KiB ---                  
0/257         9.61GiB      5.06GiB ---                  
0/258       184.82MiB    184.82MiB ---                  
0/259        16.00KiB     16.00KiB ---                  
0/260        16.00KiB     16.00KiB ---                  
0/261         3.44MiB      3.44MiB ---                  
0/272        16.00KiB     16.00KiB ---                  
0/1017        9.71GiB     16.00KiB ---                  
0/1107        9.71GiB      1.58MiB ---                  
0/1108        9.71GiB    144.00KiB ---                  
0/1109        9.71GiB     16.00KiB ---                  
1/0           9.71GiB      1.83MiB 0/1107,0/1108,0/1109


$ sudo btrfs subvolume list  /
ID 257 gen 29387 top level 5 path @
ID 258 gen 29300 top level 257 path @/opt
ID 259 gen 29166 top level 257 path @/srv
ID 260 gen 29332 top level 257 path @/boot/grub2/i386-pc
ID 261 gen 29332 top level 257 path @/boot/grub2/x86_64-efi
ID 272 gen 29392 top level 257 path @/.snapshots
ID 1017 gen 29391 top level 272 path @/.snapshots/685/snapshot
ID 1107 gen 29360 top level 272 path @/.snapshots/770/snapshot
ID 1108 gen 29379 top level 272 path @/.snapshots/771/snapshot
ID 1109 gen 29391 top level 272 path @/.snapshots/772/snapshot


$ sudo btrfs sub get-default /
ID 1017 gen 29391 top level 272 path @/.snapshots/685/snapshot

The quotas say that I’m using <10GB, but df says I’m using 16GB. Subvolume 0/257 is my old “/”, but 0/1017 is my new “/”. 0/257 now shows >5GB exclusive data. I’d like to recover that space, but I didn’t want to experiment and break anything! I thought I might be able to delete 0/257, but all of my subvolumes have it as their top level (directly or indirectly).

Is there a correct way to safely recover that 5GB of space? I’m happy to assume that I’ll never roll back to 0/257!

Thanks.

Note 1: I’ve got SSD and HDDs with partitions across them, hence the lack of some normal subvolumes.
Note 2: I’ve manually deleted most of the snapshots to try to see where the disk usage is and save what space I can, hence the short list.

I do not recommend that you follow these instructions - this is merely me exploring an option while understanding how btrfs works

I’ve done some experimental changes and I think I have a safe-but-unapproved way to recover space.

Check your disk usage for btrfs:

sudo btrfs qgroup show -p /

This showed ~5GB used by 0/257 as “excl” (i.e. no other subvolume shares that data).

Mount the old subvolume (I used “/media” because nothing else was mounted there):

$ sudo mount -o subvolid=257 /dev/mapper/main-root /media

Then find something that hasn’t changed and delete it (e.g. the Tango icons):

$ sudo rm -rf /media/usr/share/icons/Tango

Check the quotas again and nothing has gone down, because that data hadn’t changed: the blocks are shared with the current snapshot rather than exclusive, so deleting them from the old subvolume doesn’t free anything.

So, what we need to do is find what changed. Someone made a btrfs-diff script, but it assumes that you haven’t touched your old snapshot since it was taken, which isn’t the case for me because a) it’s the old read/write root rather than a read-only snapshot, and b) I’ve been poking it and deleting things, which updated its “generation”!
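
(For reference, you can see how far a subvolume’s generation has moved on with “btrfs subvolume show”, e.g. for the old root mounted at /media:)

$ sudo btrfs subvolume show /media | grep -i generation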

Assuming you’ve not added files to the old snapshot, run:

$ sudo btrfs subvolume find-new /media 0 | cut -f 14 -d" "|sort -nr|head -n1

This gives the last transaction ID/generation at which something was added to the old subvolume (mounted at “/media”).

Now we use that to find what changed in our current snapshot since then.

First, find the current base snapshot:

sudo btrfs subvolume get-default / | sed 's/.*@//'

Then use that path and the last transaction ID from earlier to find the files that have changed in the current snapshot since the last addition in the old snapshot, e.g.:

sudo btrfs subvolume find-new /.snapshots/774/snapshot 27955  | cut -f17 -d" " | sort | uniq

(Field #17 is the file name, and we sort and uniq in case a file was updated multiple times)

From that list, find a large file or directory of files that have changed; e.g. I had upgraded Visual Studio Code. Preferably pick something not critical to the system that you can reinstall if necessary, in case you get something wrong! Then delete it from the old subvolume:

$ sudo rm -rf /media/usr/share/code

Now check the quotas again and the “excl” for 0/257 should have gone down.

It should be possible to take all of the output from the “btrfs subvolume find-new” command and programmatically delete those paths from the old subvolume, but remember that a) these are files that are new in the newer snapshot, so they could be updates/replacements (which will exist in the old subvolume) or brand-new files (which won’t), and b) scripting deletions from unknown output is dangerous!
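
If you do want to script it, a relatively safe sketch is to only ever print the candidate commands into a file and review them by hand before running anything. Roughly (reusing the example snapshot path, the transaction ID and the /media mount from above; candidate-deletes.txt is just an illustrative file name):

sudo btrfs subvolume find-new /.snapshots/774/snapshot 27955 \
  | cut -f17- -d" " | sort -u \
  | while read -r f; do
      [ -n "$f" ] || continue           # skip blank lines and the trailing "transid marker" line
      [ -e "/media/$f" ] || continue    # only paths that still exist in the old subvolume
      echo sudo rm -rf -- "/media/$f"   # print the command, don't run it
    done > candidate-deletes.txt
less candidate-deletes.txt              # review by hand before running any of it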

I do not recommend that you follow these instructions - this is merely me exploring an option while understanding how btrfs works

There must be a better way to do this and free up the space from subvolume 0/257 when I’ve done a rollback and have a new default!

Do you really store Visual Studio code on BTRFS? :slight_smile:

I wonder if you might have been able to recover space safely by simply removing snapshots using Snapper.
Snapper should always be tried and/or used first, because that way you can be sure that anything you do is done safely, e.g. merging when necessary.
If you don’t use Snapper, you might end up with orphaned snapshots and could affect the integrity of your Snapper directory.
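
For example, something along these lines (adjust the snapshot numbers to whatever your own “snapper list” shows):

$ sudo snapper list
$ sudo snapper delete 770-771
$ sudo snapper cleanup number

“snapper delete” accepts single numbers or a range, and “snapper cleanup number” runs the normal number-based cleanup algorithm straight away.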

TSU

What makes you think that there is “missing” space? I have limited vision and cannot follow all of your post, but it seems mostly irrelevant. The “df” command is not very applicable to Btrfs. Snapshots in Tumbleweed tend to be enormous because most of the root filesystem gets replaced on version upgrade, but you only have two.

What output do you get from:

 # btrfs fi show /

 # btrfs fi df /

If there is a large difference between the “Used” values try:

 # btrfs balance start /

(this can take a while)
then repeat the “btrfs fi” commands.
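
If a full balance is too heavy, it can also be limited to mostly-empty data chunks with a usage filter, for example:

 # btrfs balance start -dusage=50 /
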
Check out the Btrfs Wiki, e.g. https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_get_.22No_space_left_on_device.22_errors.2C_but_df_says_I.27ve_got_lots_of_space

Yes. VS Code is an IDE. It installs as an RPM. So it does the Right Thing and puts all of its files under /usr.

I already tried that, and it won’t recover all of the space. That’s why I’m asking how to recover the space that subvolume 0/257 is using now that I’ve got a different default subvolume due to a rollback.

“df” says I’ve used 16GB, while the quotas say I’m using ~10GB on the live snapshot and ~5GB is “exclusive” to the old base 0/257 subvolume. That 5GB of exclusive space is lost, wasted space. I’m never going to go back to it (because I have a new default snapshot) but it is taking up space. And because it isn’t a snapshot, Snapper can’t clean it up.

The commands in the first post were to show my setup, where my system is using space, and what the current state of snapshots is on my machine. I’ve seen people berated for giving too little information, so I didn’t want to describe my problem in the abstract.

I only have two snapshots (which are small, according to the qgroups) because I’ve been deleting them to see how low I can get the disk space usage and whether I can get close to just the actual size used.

$ sudo btrfs fi show /
Label: none  uuid: 84c68236-6fcb-4647-a744-18a738e57fa5
    Total devices 1 FS bytes used 15.14GiB
    devid    1 size 25.00GiB used 24.91GiB path /dev/mapper/main-root


$ sudo btrfs fi df /
Data, single: total=23.50GiB, used=14.53GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=1.38GiB, used=624.84MiB
GlobalReserve, single: total=53.55MiB, used=0.00B

So they’re close but not perfect.

That’s not quite the problem that I have. The problem I have is that I actually run out of space because there are too many old files in old subvolumes. As you said, that is because of the big Tumbleweed updates. I want to do what I can to minimise that.

I let Snapper auto-clean and I do “snapper delete” when I need to, but there is still ~5GB of space used by files that will never be used again because they’re in the original “/” subvolume, which will never get mounted because I’ve done a rollback and now have a different default subvolume.

As I understand it, a root filesystem with 10GB of data and no other snapshots should take up about 10GB on Btrfs. Mine doesn’t: I’m using ~16GB. I want to try to get it back down to ~10GB so that I hit problems with disk usage less frequently.

Thanks.

That seems to be the problem. Normally on current openSUSE, if you enable snapshots during installation, your root will be @/.snapshots/1/snapshot, i.e. the very first snapshot created, and the @ subvolume will be nearly empty. When did you install your system? Did you choose to enable snapshots during installation?

I do not think it is possible to move subvolumes, so what is left is carefully removing everything from @ except the actual subvolumes (like @/.snapshots, @/srv, etc.).
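
If you go down that route, a rough way to see what under @ is an actual subvolume before deleting anything (assuming your @ is still subvolume 257 and you temporarily mount it somewhere like /mnt) is:

$ sudo mount -o subvolid=257 /dev/mapper/main-root /mnt
$ sudo btrfs subvolume list -o /mnt

Anything listed there, and the directories containing it, belongs to a live subvolume and should be left alone; subvolume roots also show up with inode number 256 if you check with “stat”. Everything else under @ is just the stale copy of the old root.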

https://en.opensuse.org/SDB:Disk_space

Based on Zypper’s history, it was installed back in late October. I think I enabled snapshots at install time (because I left it as Btrfs rather than the ext4 that I’m more used to so that I could use snapshots for recovery), but I do remember enabling quotas later.

Maybe snapshots didn’t get enabled because I used custom partitioning (putting /var and /tmp on a HDD instead of the SSD)?

I’ll see what I can delete manually and hopefully not lose something important!

At least now that I’ve rolled back, I’m on a new default snapshot and the problem won’t happen again in the future :slightly_smiling_face:

Thanks.

I ran:

sudo rm -rf /media/{bin,etc,lib,lib64,sbin,usr}
sudo rm /media/boot/{b,c,i,s,S,v}*
sudo rm -rf /media/boot/grub2/{backgrounds,fonts,grub.cfg,grubenv,locale,themes}

Note: This was probably only safe on my system because some directories that are normally subvolumes are separate partitions here! Understand what you’re doing before you run these commands!
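
As a rough sanity check afterwards, something like:

$ sudo btrfs qgroup show -p / | grep '^0/257'   # the "excl" figure for the old root should now be tiny
$ sudo du -xsh /media                           # what's left in the old @ subvolume itself (-x doesn't descend into nested subvolumes)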

This has taken subvolume 0/257 down to under 1MB rfer and excl in the qgroup list.

I now have a ~10GB root partition and one 300MB snapshot (plus btrfs metadata) taking up 11GB according to the main df command. That should leave me plenty of space for snapshots for a while. Hopefully there won’t be issues down the line due to not having the correct setup initially (since my default subvolume is now snapshot 774, which looks like it mirrors the setup that I was supposed to have, but with a different snapshot number).

Thanks!

Removing snapshots with normal Linux file commands may break snapper.

Yes, it probably will. But no-one ever suggested that in this thread. All I said was:

If using Snapper to delete Snapper snapshots breaks Snapper then something is wrong!

Hi!
I had the same issue: the subvolume 5 of / was orphaned and contained ~17GB, which was no longer available, reducing my free space on the root file system. The origin of the problem is not clear; I had rolled back 2-3 times in Tumbleweed in the past, and btrfs get-default seems to point at snapshot 436 now.

Following your proposal I managed to get it down to a residual 300MB or so, recovering more than 16GB of free space! The tricky part seems to be that subvolumes of the same btrfs filesystem that are mounted somewhere else (such as /var/log, visible as /media/var/log) do not appear as duplicated data, so deleting them deletes the data directly in /var/log as well. I therefore had to proceed very carefully when deleting files/directories on this orphaned subvolume/snapshot. I proceeded like this: 1) back up the directory, 2) count the files in the original place, 3) rename one file and cross-check that the original location is unchanged, 4) delete the directory, 5) count the files in the original place again to make sure nothing got deleted.

Compared to your list, I could not delete /usr/local, since that was another mounted subvolume, so everything in /usr could go except /usr/local.
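
A less invasive check than renaming files, for anyone else trying this, is to compare the device and inode numbers of the two paths, e.g.:

$ stat -c '%d %i %n' /var/log /media/var/log

If both lines show the same device and inode numbers then the two paths are literally the same directory, and a delete in one is a delete in the other.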

I have not understood btrfs well enough … why can this tedious manual procedure for “lost” snapshots not be resolved automatically?
Could I re-create some snapshot description and mount the orphaned subvolume at /.snapshots/<some-id>/snapshot … and then delete it? How would I need to proceed?

Or is there any other way to recover from this situation and clean out the occupied space?

Based on arvidjaar’s earlier answer, this tedious process of “lost” snapshots isn’t handled automatically because it only happens when something wasn’t set up correctly. It’s effectively expected but undesirable behaviour. There’s not supposed to be anything in subvolume 5, because the installer was supposed to set up a snapshot (which would then get automatically deleted as normal) before it installed anything.

It is important to note that my deletes were based on remounting the old root subvolume somewhere else (under /media in my case). When I did that, it didn’t seem to mount the nested subvolumes, so I found it safe to delete everything else. You can check by doing “df /path” and seeing what it says under “Mounted on”: “df /” reports “/”, “df /srv” reports “/srv” (because it is a subvolume), but after remounting an old root under /media, “df /media/srv” reports “/media”, because the subvolume hasn’t been remounted under the remounted root.

[Edit] Actually, maybe not. I just went to see if there was more that I could clear up and while /media/srv isn’t remounted, /media/boot/grub2/x86_64-efi does appear to be an auto-mounted subvolume with up-to-date files! Presumably that’s why I carefully cleaned up around some directories![/Edit]

If you want to recover more space then mount your old subvolume and run “du -hx /path/to/old/subvolume | sort -h” and it’ll show you where the biggest remaining directories are.

Deleting a file that is in two subvolumes (and hence included under “rfer” as shared data) should not result in the loss of files. It may result in it becoming “excl” data, because it is then only in one subvolume and btrfs can no longer save space by sharing the blocks, but the two subvolumes are separate. However, you’re right that subvolumes are loaded with the mount, so if you mount “/” at /media then /media/srv and /srv will be the same content! That’s why I carefully selected the files I wanted to delete and left the whole of /var alone, because deleting from /media/var/log is actually deleting from the same volume as /var/log.

As I mount /usr/local from elsewhere, it was safe for me to delete the whole of /usr. This may not be true for other people!
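
(A quick way to check whether a given path is really a separate mount on your system:)

$ findmnt /usr/local          # prints a line only if this exact path is a mount point
$ findmnt -T /media/var/log   # -T/--target shows which mounted filesystem contains the path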

And to reiterate: it is only safe to do any of this if you have rolled back at least once and have a new default snapshot reported by “btrfs subvolume get-default”!
