snapper/btrfs fails to adjust the subvolume id after a rollback

uliw · May 20, 2016, 3:47pm

Hi there,

I noticed the following strange behavior with Btrfs and Snapper (openSUSE Tumbleweed with kernel 4.5.2-1-default):

If I select a previous read-only snapshot to boot and then initiate a rollback to that state (say, snapshot ID 465),

btrfs sub get-default / correctly points to snapshot ID 465 after the next reboot.

Subsequent snapper snapshots are numbered 465+i as expected, but btrfs sub get-default / still points to snapshot 465.

On closer inspection, I found that the subvolume actually mounted is the correct one. However, snapper is unable to delete snapshot 465 itself. As a result, snapper's cleanup algorithm fails, and snapshots accumulate until the disk is full.

This problem only happens after a rollback operation. It is reproducible every time.

So I have two questions here:

Is there a way to adjust the subvolume numbering manually?
How can I report this bug?

Thanks

Uli

tsu2 · May 20, 2016, 4:13pm

When you roll back to a snapshot, a new snapshot instance is created from that point forward.
No snapshots are ever deleted or removed, except manually or by a script.

This means the following. Let's say you rolled back to a snapshot but found it was too early. You can still roll back to any other existing snapshot, forwards or backwards from that point, and every time you do so, a new snapshot is created. You may have rolled back, but you are actually at the equivalent state, with the current date/time and a new ID.

TSU

uliw · May 20, 2016, 4:31pm

Hi tsu2

Thanks for getting back to me.

I get your point (I think), but that does not explain (unless I'm missing the obvious) why the snapshot ID pointed to by btrfs sub get-default is out of sync with the snapshot ID listed by snapper list.

But for the sake of argument, let's assume that this is the intended behavior. It still completely breaks snapper's cleanup algorithm, since snapper is unable to delete snapshot ID 465 (the one reported by btrfs sub get-default) while it happily keeps creating new snapshots. So within a few months you have hundreds of new snapshots, because snapper fails to clean up anything newer than snapshot ID 465.

Cheers

Uli

tsu2 · May 20, 2016, 7:36pm

I don't remember the snapper auto-delete algorithm off the top of my head, but it's supposed to be time-based and to preserve something like one snapshot a day and one every 10 days, or something like that.

In any case, you can manually remove any snapshots you don't want. Are you using the YaST GUI or the console command?
I find running snapper in a console far more powerful and useful than the GUI, because snapper commands are so simple to run, and the displayed info (like the snapshot list) is very informative.

Removing snapshots using snapper is very easy and ensures that you can’t make a mistake. When you remove a snapshot, all your remaining snapshots are re-compiled(? - Is that the right word?) to preserve each snapshot’s integrity.

TSU

uliw · May 20, 2016, 8:29pm

Ah, that's precisely my point. I discovered this problem because the automatic cleanup script failed. Next I tried to delete the snapshot manually (snapper delete XXX), but this fails too, with the message that snapper cannot delete snapshot XXX (and it shouldn't, since it is the actual working copy).

So snapper list gives you

whereas btrfs sub get-default / yields

ID 868 gen 144978 top level 257 path .snapshots/475/snapshot

So snapper delete results in “Deleting snapshot failed.” Basically, btrfs sub get-default points to 475 as the youngest snapshot, whereas snapper thinks that 467 is the youngest snapshot (based on ID and date).
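To compare the two numbers without reading them off by eye, the snapshot number can be pulled out of a get-default line like the one shown above. This is a minimal sketch, assuming the default subvolume path ends in the .snapshots/<N>/snapshot form seen in this thread; the function name default_snapshot_number is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the snapper snapshot number from one line of
# "btrfs subvolume get-default /" output, e.g.
#   ID 868 gen 144978 top level 257 path .snapshots/475/snapshot
# Assumption: the path ends in .snapshots/<N>/snapshot (a leading "@/" is tolerated).
# Prints nothing if the line does not match that layout.
default_snapshot_number() {
    printf '%s\n' "$1" | sed -n 's|.*\.snapshots/\([0-9][0-9]*\)/snapshot$|\1|p'
}

# Possible use on a live system (as root), then compare with "snapper list":
#   default_snapshot_number "$(btrfs subvolume get-default /)"
```

If the printed number is not the snapshot snapper lists as current, the two views are out of sync in the way described above.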

Cheers

Uli

arvidjaar · May 20, 2016, 8:43pm

When you ask about system behavior, you should provide information that is as precise as possible: ideally, copy and paste the exact command(s) you type and the output you get from them, or the exact log lines that demonstrate the issue. Your XXX cannot be associated with any subvolume you listed and is useless for troubleshooting.

Go to https://bugzilla.opensuse.org/enter_bug.cgi?product=openSUSE%20Tumbleweed using the same account as on these forums, and select Basesystem as the component. Give as precise a description as possible: attach the actual log files that include the error, show the actual command with real numbers and its output, and state the exact number of the snapshot in question. Reporting a bug with all those XXX and 465+i is just a waste of time and bandwidth.

uliw · May 20, 2016, 8:52pm

Thanks, will do.

Uli

uliw · June 13, 2016, 4:52pm

OK, so it turns out that this is a problem with systemd. Please see the following discussion:

https://bugzilla.suse.com/show_bug.cgi?id=980962.

Provided you do not use /var/lib/machines, the workaround is simply to delete var/lib/machines from the snapshot, as in

btrfs subvolume delete .snapshots/1/snapshot/var/lib/machines

Afterwards, snapper cleanup behaves as expected.
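The workaround can be wrapped so that it only prints the command unless explicitly told to run it, which gives a chance to double-check the path first. This is a sketch under assumptions: the helper name remove_machines_subvolume and the APPLY switch are hypothetical, the absolute path assumes /.snapshots is mounted (as the relative path in the command above suggests), and snapshot 1 is used as the default only because that is the one in this thread.

```shell
#!/bin/sh
# Hypothetical helper for the workaround above; dry-run unless APPLY=1.
# Only use it if you do not use /var/lib/machines (storage for
# systemd-nspawn/machinectl containers).
remove_machines_subvolume() {
    snap="${1:-1}"   # snapper snapshot number; the thread used snapshot 1
    target="/.snapshots/$snap/snapshot/var/lib/machines"
    if [ "${APPLY:-0}" = "1" ]; then
        # Really delete the nested subvolume (needs root).
        btrfs subvolume delete "$target"
    else
        # Default: only show what would be run.
        echo "would run: btrfs subvolume delete $target"
    fi
}
```

For example, run remove_machines_subvolume 1 to preview, then APPLY=1 remove_machines_subvolume 1 as root, and finally snapper cleanup number to re-run the number-based cleanup.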

Cheers

Uli