LEAP 42.2 btrfs root filesystem subvolume structure

mchnz · November 26, 2016, 4:00am

I wanted to share some notes I made while trying to understand the subvolume structure of the Leap 42.2 btrfs root filesystem and how I might go about copying it. The notes below include sequences of commands that might prove useful in creating a root filesystem from scratch. **Before using any of them, please make sure you fully understand them, and check that each prerequisite command has worked before running the next one; if you fail to check for errors you may wind up targeting the wrong filesystem.** (I use a sacrificial VirtualBox VM to conduct my experiments with btrfs.)

The default Leap 42.2 root filesystem supports rollback by employing btrfs snapshots (openSUSE includes the snapper utility for managing these snapshots). The rootfs is divided into separately mounted btrfs subvolumes according to whether their content needs to be included in snapshots. Only one subvolume is subject to snapshots; it contains the relatively static content of the rootfs, such as /etc, /bin, and /usr. Another twenty or so subvolumes are not subject to snapshots; they hold the more volatile content, such as /tmp and /var/tmp (it's neither practical nor useful to snapshot rapidly changing logs, working data, or datastores). Beyond these, the number of subvolumes keeps growing, because each snapshot is also a subvolume.

The following command can be used to list the subvolumes participating in a root filesystem when mounted as / (but what about when / isn’t mounted?):

    linux-s7mu:~ # btrfs subvolume list /
    ID 257 gen 33 top level 5 path @
    ID 258 gen 62 top level 257 path @/.snapshots
    ID 259 gen 155 top level 258 path @/.snapshots/1/snapshot
    ID 260 gen 61 top level 257 path @/boot/grub2/i386-pc
    ID 261 gen 16 top level 257 path @/boot/grub2/x86_64-efi
    ID 262 gen 43 top level 257 path @/opt
    ID 263 gen 47 top level 257 path @/srv
    ID 264 gen 154 top level 257 path @/tmp
    ID 265 gen 66 top level 257 path @/usr/local
    ID 266 gen 155 top level 257 path @/var/cache
    ID 267 gen 36 top level 257 path @/var/crash
    ID 268 gen 24 top level 257 path @/var/lib/libvirt/images
    ID 269 gen 65 top level 257 path @/var/lib/machines
    ID 270 gen 25 top level 257 path @/var/lib/mailman
    ID 271 gen 27 top level 257 path @/var/lib/mariadb
    ID 272 gen 28 top level 257 path @/var/lib/mysql
    ID 273 gen 28 top level 257 path @/var/lib/named
    ID 274 gen 30 top level 257 path @/var/lib/pgsql
    ID 275 gen 155 top level 257 path @/var/log
    ID 276 gen 36 top level 257 path @/var/opt
    ID 277 gen 155 top level 257 path @/var/spool
    ID 278 gen 154 top level 257 path @/var/tmp
    ID 283 gen 61 top level 258 path @/.snapshots/2/snapshot

The subvolume called “@” holds the content of the root filesystem that is subject to snapshots. The @ subvolume is also the parent of the additional non-snapshotted subvolumes such as @/tmp and @/var/tmp. (I think “@” is just a name and it could easily be named “rootfs”, but there is a cross-distro convention to use “@”.)
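The parent/child relationship is visible in the "top level" column of the listing: every non-snapshotted subvolume has top level 257, the ID of @ on this machine. As a minimal sketch, assuming the field layout shown above (ID, gen, top level, path) and paths without spaces, the listing can be filtered down to the children of one subvolume. `subvol_children` is a hypothetical helper name, not a btrfs command:

```shell
# Print the paths of the subvolumes whose parent ("top level") is ID $1.
# Reads `btrfs subvolume list` output on stdin, one line per subvolume:
#   ID <id> gen <gen> top level <parent> path <path>
# ASSUMPTION: that field layout, and paths without spaces.
subvol_children() {
    awk -v parent="$1" '$1 == "ID" && $7 == parent { print $NF }'
}
```

For example, `btrfs subvolume list / | subvol_children 257` should print @/.snapshots plus the non-snapshotted subvolumes; the ID of @ may well differ on another system.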

Be aware that the @ subvolume is not the top of the filesystem. It is a child of the top-level subvolume, which is the initial (and only) subvolume present after filesystem creation. Internally the top-level subvolume has ID 5 (hence “top level 5” against @ in the listing above); the mount option subvolid=0 is accepted as an alias for it, as is subvolid=5. The top-level subvolume is not normally mounted, but can be mounted explicitly when creating or maintaining the overall filesystem.

Also, it's important to note that the subvolume actually mounted on / at boot is not the one named @, but one of the snapshot subvolumes in @/.snapshots. Each btrfs filesystem internally records which subvolume to mount by default; in Leap 42.2 this is set to a snapshot subvolume. The default can be queried by issuing the following command:

    linux-s7mu:~ # btrfs subvolume get-default /
    ID 259 gen 177 top level 258 path @/.snapshots/1/snapshot

Although this is the default snapshot to mount, other snapshots could be selected instead. For example, boot time rollback is easily achieved by choosing from a selection of rootfs snapshots. For a running system, the snapper utility can be used to compare and manage changes across the available snapshots.
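Before picking a snapshot to boot or roll back to, it helps to know which ones exist. `snapper list` is the normal way to see them; as a rough cross-check, the sketch below pulls the snapshot numbers out of `btrfs subvolume list` output, assuming snapper's @/.snapshots/N/snapshot naming seen above. The helper name `snapshot_numbers` is hypothetical:

```shell
# Print the snapper snapshot numbers (the N in @/.snapshots/N/snapshot)
# found in `btrfs subvolume list` output on stdin, in numeric order.
# ASSUMPTION: snapper's directory naming as shown in the listing above.
snapshot_numbers() {
    awk '$NF ~ /^@\/\.snapshots\/[0-9]+\/snapshot$/ {
        split($NF, parts, "/")
        print parts[3]
    }' | sort -n
}
```

Run against the listing at the top of this post (`btrfs subvolume list / | snapshot_numbers`), it would print 1 and 2.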

At this date Leap 42.2 appears to lack a post-install tool for recreating the subvolume structure of the root filesystem. If you need to replicate an existing root filesystem you will either have to run the installer and create a dummy installation, or manually create the filesystem using btrfs commands. The following commands replicate a typical Leap 42.2 root filesystem volume structure onto the partition /dev/sdb2:

    
    # Create a new btrfs filesystem
    mkfs.btrfs /dev/sdb2

    # Mount subvolid=0 subvolume
    mkdir -p /mnt/tmp_rootsv
    mount /dev/sdb2 -o subvolid=0 /mnt/tmp_rootsv

    # Create the main snapshotted subvolume of the root filesystem 
    btrfs subvolume create /mnt/tmp_rootsv/@

    # Create the non-snapshotted subvolumes
    mkdir -p /mnt/tmp_rootsv/@ && btrfs subvolume create /mnt/tmp_rootsv/@/.snapshots
    mkdir -p /mnt/tmp_rootsv/@/boot/grub2 && btrfs subvolume create /mnt/tmp_rootsv/@/boot/grub2/i386-pc
    mkdir -p /mnt/tmp_rootsv/@/boot/grub2 && btrfs subvolume create /mnt/tmp_rootsv/@/boot/grub2/x86_64-efi
    mkdir -p /mnt/tmp_rootsv/@ && btrfs subvolume create /mnt/tmp_rootsv/@/opt
    mkdir -p /mnt/tmp_rootsv/@ && btrfs subvolume create /mnt/tmp_rootsv/@/srv
    mkdir -p /mnt/tmp_rootsv/@ && btrfs subvolume create /mnt/tmp_rootsv/@/tmp
    mkdir -p /mnt/tmp_rootsv/@/usr && btrfs subvolume create /mnt/tmp_rootsv/@/usr/local
    mkdir -p /mnt/tmp_rootsv/@/var && btrfs subvolume create /mnt/tmp_rootsv/@/var/cache
    mkdir -p /mnt/tmp_rootsv/@/var && btrfs subvolume create /mnt/tmp_rootsv/@/var/crash
    mkdir -p /mnt/tmp_rootsv/@/var/lib/libvirt && btrfs subvolume create /mnt/tmp_rootsv/@/var/lib/libvirt/images
    mkdir -p /mnt/tmp_rootsv/@/var/lib && btrfs subvolume create /mnt/tmp_rootsv/@/var/lib/machines
    mkdir -p /mnt/tmp_rootsv/@/var/lib && btrfs subvolume create /mnt/tmp_rootsv/@/var/lib/mailman
    mkdir -p /mnt/tmp_rootsv/@/var/lib && btrfs subvolume create /mnt/tmp_rootsv/@/var/lib/mariadb
    mkdir -p /mnt/tmp_rootsv/@/var/lib && btrfs subvolume create /mnt/tmp_rootsv/@/var/lib/mysql
    mkdir -p /mnt/tmp_rootsv/@/var/lib && btrfs subvolume create /mnt/tmp_rootsv/@/var/lib/named
    mkdir -p /mnt/tmp_rootsv/@/var/lib && btrfs subvolume create /mnt/tmp_rootsv/@/var/lib/pgsql
    mkdir -p /mnt/tmp_rootsv/@/var && btrfs subvolume create /mnt/tmp_rootsv/@/var/log
    mkdir -p /mnt/tmp_rootsv/@/var && btrfs subvolume create /mnt/tmp_rootsv/@/var/opt
    mkdir -p /mnt/tmp_rootsv/@/var && btrfs subvolume create /mnt/tmp_rootsv/@/var/spool
    mkdir -p /mnt/tmp_rootsv/@/var && btrfs subvolume create /mnt/tmp_rootsv/@/var/tmp

    # Create a place to keep snapshots and create an initial snapshot
    mkdir /mnt/tmp_rootsv/@/.snapshots/1
    btrfs subvolume snapshot /mnt/tmp_rootsv/@ /mnt/tmp_rootsv/@/.snapshots/1/snapshot

    # Find the ID of the initial snapshot and set it to be the default to be mounted
    defaultsv="$(btrfs subvolume list -o /mnt/tmp_rootsv/@/.snapshots/1 | gawk '$NF ~ "/1/snapshot$" {print $2}')"
    [ -n "$defaultsv" ] && btrfs subvolume set-default "$defaultsv" /mnt/tmp_rootsv/@/.snapshots/1/snapshot

    # Finished - unmount complete filesystem
    umount /mnt/tmp_rootsv
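Given the warning at the top about checking each step, one option is to generate the mkdir/subvolume-create pairs as text first, review them, and only then run them with `sh -e` so execution stops at the first failure. A minimal sketch; `print_subvol_plan` is a hypothetical helper, and it assumes the paths contain no spaces (the printed commands are not quoted):

```shell
# Print, rather than run, the commands that create each nested subvolume
# under MNT/@ (creating missing parent directories first).
# Usage: print_subvol_plan MNT PATH...   e.g. PATH = var/lib/mysql
# ASSUMPTION: paths contain no spaces; the printed commands are unquoted.
print_subvol_plan() {
    mnt="$1"
    shift
    for sv in "$@"; do
        target="$mnt/@/$sv"
        printf 'mkdir -p %s && btrfs subvolume create %s\n' "$(dirname "$target")" "$target"
    done
}
```

For example, `print_subvol_plan /mnt/tmp_rootsv .snapshots opt srv tmp usr/local var/log > plan.sh`, then read plan.sh and run it with `sh -e plan.sh`.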

Having replicated the subvolume structure you could then use any of the conventional commands such as rsync, tar, or cp to replicate content from one rootfs to another. You might also use btrfs utilities to replicate a snapshot of a running system to a snapshot on a new root filesystem. For example, I could use these commands to snapshot a running system and send the snapshot to a new filesystem in the partition /dev/sdb2:

    # Create a snapshot of the source root filesystem
    btrfs sub snap -r / /.snapshots/snapforcopy
    mkdir /mnt/newfs/
    mount /dev/sdb2 /mnt/newfs/
    mount /dev/sdb2 -o subvol=@/.snapshots /mnt/newfs/.snapshots

    # Create a folder on the destination to hold the snapshot
    mkdir /mnt/newfs/.snapshots/initial

    # Copy the snapshot to the destination filesystem
    btrfs send /.snapshots/snapforcopy | btrfs receive /mnt/newfs/.snapshots/initial

    # Rename the received snapshot to something more useful
    mv /mnt/newfs/.snapshots/initial/snapforcopy /mnt/newfs/.snapshots/initial/snapshot

    # Enable read/write on the copy
    btrfs property set -ts /mnt/newfs/.snapshots/initial/snapshot ro false

    # Create the snapshots folder (won't have been in the snapshot)
    mkdir /mnt/newfs/.snapshots/initial/snapshot/.snapshots
 
    # Find the ID of the snapshot
    defaultsv="$(btrfs subvolume list -o /mnt/newfs/.snapshots/ | gawk '$NF ~ "/initial/snapshot$" {print $2}')"

    # Set the snapshot to be the default to be mounted at next mount/boot
    [ -n "$defaultsv" ] && btrfs subvolume set-default "$defaultsv" /mnt/newfs/.snapshots/initial/snapshot

    # Finished
    umount /mnt/newfs/.snapshots /mnt/newfs
    mount /dev/sdb2 /mnt/newfs/
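The gawk pattern above matches the end of the path as a regular expression, which is easy to get subtly wrong (an unanchored or too-short pattern can match more than one snapshot). A stricter sketch compares the whole path field exactly and fails if nothing matches; `subvol_id_by_path` is a hypothetical name, and it assumes paths are printed relative to the top-level subvolume, as in the listing at the top of this post, and contain no spaces:

```shell
# Print the ID of the subvolume whose path exactly equals $1, reading
# `btrfs subvolume list` output on stdin. Exits non-zero if none matches.
# ASSUMPTION: paths relative to the top-level subvolume, without spaces.
subvol_id_by_path() {
    awk -v want="$1" '$1 == "ID" && $NF == want { print $2; found = 1 } END { exit !found }'
}
```

In a script, for example: `defaultsv="$(btrfs subvolume list /mnt/newfs | subvol_id_by_path @/.snapshots/initial/snapshot)" || exit 1`, so that set-default is never run with an empty ID.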

After the above commands have completed, I would still have to figure out what non-snapshot subvolume content is required and then use rsync or similar to copy it over to the new filesystem.
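The set of non-snapshotted mount points can be derived from the subvolume listing rather than typed by hand. A minimal sketch, assuming the @ naming convention shown above and paths without spaces; `nonsnapshot_mountpoints` is a hypothetical helper. Each printed mount point would then be a candidate for an rsync copy, once the matching subvolume is mounted under the new root:

```shell
# Turn `btrfs subvolume list` output (stdin) into the mount points of the
# non-snapshotted subvolumes, e.g. "@/var/log" -> "/var/log".
# ASSUMPTION: the "@" layout shown above; paths without spaces.
nonsnapshot_mountpoints() {
    awk '$1 == "ID" {
        p = $NF
        # Skip the snapshotted root subvolume and snapper snapshots
        if (p == "@" || p == "@/.snapshots" || p ~ /^@\/\.snapshots\//) next
        sub(/^@/, "", p)
        print p
    }'
}
```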

Another thing to note is that the output of du and df no longer presents the full picture; you should also consult the btrfs utility commands:

    btrfs filesystem df /
    btrfs filesystem du -s /

The totality of the above makes a btrfs root filesystem quite a bit more complex than the old alternatives. The structure of the subvolumes and snapshotting is built on a set of conventions and features that are complex, layered, and not always documented. If you’re going to switch to btrfs you need to become familiar with much of the above and its implications for your recovery and backup procedures.

I’m unsure whether standards would have permitted a different approach to the subvolume layout, but I would have preferred to see just two subvolumes, static and volatile, along with liberal use of symbolic links to conform to the established Linux/UNIX layout.

Personally, I’ve decided I won’t benefit much from root filesystem snapshots. The number of subvolumes and the complexity of their layout are too high a price to pay for something I’m not using. If the layout were split over just two subvolumes, I would definitely use it. In order to continue using my existing backup/recovery procedures, I’ve found it simpler to stick with ext4.

glistwan · December 1, 2016, 12:11pm

Thank you for posting this. A really good practical insight into BTRFS.

Fraser_Bell · December 10, 2016, 6:53am

Yes, one of my main reasons for sticking with ext4 and not using BTRFS at this time.

Thanks, also, from me for the information.

hcvv · December 10, 2016, 12:31pm

Thanks a lot. This will give many people better insight, so they can make an informed choice between btrfs and ext4.

ndc33 · January 2, 2017, 9:26am

Warning: isn't setting the nodatacow attribute with chattr on some of the subvolumes missing from your recipe?

mchnz · January 2, 2017, 11:32am

Thanks, you’re right, it totally slipped my mind.

It’s another reason why I think btrfs might be a little too complex for my purposes - perhaps it will be OK when the tooling is in place to automate what has been discussed in this thread.

mchnz · January 2, 2017, 10:46pm

The original post could be extended/corrected as follows…

Some applications self-manage efficient random writes to their own data-files. Typically these are database or indexed file applications. Such applications may not benefit from btrfs’s copy-on-write (COW). Btrfs uses the file attribute C to flag folders and files that should not be subject to COW.

If /dev/sda1 is the original rootfs (the one being copied), you can identify the folders with the C attribute by doing:

    (find $(btrfs subvolume list / | awk '$NF !~ "@/[.]snapshots" { sub("@/", "/", $NF); print $NF }') -xdev -type d | xargs lsattr -d | awk '$1 ~ /C/ {print $NF}') 2> /dev/null
    /var/lib/libvirt/images
    /var/lib/mariadb
    /var/lib/mysql
    /var/lib/pgsql
    /var/log/journal
    /var/log/journal/03256cb72a0fd6173eb2071d582c1ef3
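The last stage of that pipeline prints `$NF`, which would cut off a path containing spaces. A small variant, sketched under the assumption that `lsattr -d` prints the flag string, a space, then the path (as the e2fsprogs lsattr does), keeps the whole path; `nocow_paths` is a hypothetical name:

```shell
# Print the paths from `lsattr -d` output (stdin) whose flag field
# contains C (no copy-on-write), keeping paths with spaces intact.
# ASSUMPTION: each line is "FLAGS PATH", flags first, then the path.
nocow_paths() {
    awk '$1 ~ /C/ { sub(/^[^ ]+ +/, ""); print }'
}
```

It would slot in as the final stage, replacing `awk '$1 ~ /C/ {print $NF}'`.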

You would then need to mount the corresponding subvolumes in the new filesystem and use chattr +C to set the attribute:

    mount -o subvol=@/var/lib/libvirt/images /dev/sdb2 /mnt/newfs/var/lib/libvirt/images
    mount -o subvol=@/var/lib/mariadb /dev/sdb2 /mnt/newfs/var/lib/mariadb
    mount -o subvol=@/var/lib/pgsql /dev/sdb2 /mnt/newfs/var/lib/pgsql
    mount -o subvol=@/var/lib/mysql /dev/sdb2 /mnt/newfs/var/lib/mysql
    mount -o subvol=@/var/log /dev/sdb2 /mnt/newfs/var/log && mkdir /mnt/newfs/var/log/journal

    chattr +C /mnt/newfs/var/lib/libvirt/images /mnt/newfs/var/lib/mariadb /mnt/newfs/var/lib/pgsql /mnt/newfs/var/lib/mysql /mnt/newfs/var/log/journal

Note that /var/log/journal is below the actual mount point /var/log, so I had to create the journal folder as well. Note also that I didn’t copy the actual journal subfolder; that should probably be copied by a backup utility.
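Working out which subvolume a no-COW directory lives in (and therefore whether it is itself a mount point or, like /var/log/journal, an ordinary folder that has to be created inside one) is a longest-prefix match. A sketch; `containing_mountpoint` is hypothetical, and "/" is assumed to stand for the snapshotted root when nothing else matches:

```shell
# Print the mount point (from the list after $1) that contains directory
# $1, choosing the longest matching prefix. Falls back to "/", ASSUMED
# here to mean the snapshotted root subvolume.
# Usage: containing_mountpoint DIR MOUNTPOINT...
containing_mountpoint() {
    dir="$1"
    shift
    best="/"
    for mp in "$@"; do
        case "$dir/" in
            "$mp"/*)
                # Keep the most specific (longest) match
                if [ "${#mp}" -gt "${#best}" ]; then best="$mp"; fi
                ;;
        esac
    done
    printf '%s\n' "$best"
}
```

For example, `containing_mountpoint /var/log/journal /var/log /var/lib/mysql` prints /var/log, which says the journal directory must be created inside the /var/log subvolume before `chattr +C` can be applied to it.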

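Another alternative would be to copy the attributes using the relevant option of a utility such as rsync (-X). One caveat: -X copies extended attributes, and the chattr C flag appears to be an inode flag rather than an extended attribute, so it would probably still have to be set with chattr +C on the destination directories before copying (new files created in such a directory inherit it). My own ext4 rootfs backup is based around the following rsync command, which could probably be adapted for this situation: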

    rsync -ax -HAX --delete --devices --sparse / /mnt/osbackup

So it's as easy as that.

(If the btrfs tools have a way of listing items not subject to copy-on-write, some of the above could be simplified.)

aaccioly · June 25, 2017, 5:47am

Thanks for the great insights on how openSUSE uses BTRFS.

There are still rough edges for new users like me. For instance, I was not aware that / is actually mounted to a subvolume in @/.snapshots.
About a month ago I tried to install the NVIDIA drivers + Bumblebee and something went wrong. After a restart I was greeted with a kernel panic. Armed with Leap’s Reference Guide, I booted from a snapshot and ran:

    sudo snapper rollback

Job well done, snapper saved the day.

Today I was deleting some old snapshots with the YaST Snapper GUI. While trying to delete a random snapshot I received the following error: http://i.imgur.com/iSAzyyf.png

I jumped to the command line and received another error message that wasn’t really helpful:

    SAT-SIG-SUSE:~ # snapper delete 151
    Deleting snapshot failed.

After hours of trying to understand what was happening, I finally stumbled across your post. The random old snapshot is actually mounted as the default subvolume!

    SAT-SIG-SUSE:~ # btrfs subvolume get-default /
    ID 450 gen 4782 top level 258 path @/.snapshots/151/snapshot

It took me a while to connect the dots and figure out how it happened. From snapper’s man page:

rollback [options] [number]: Creates two new snapshots and sets the default subvolume. Per default the system boots from the default subvolume of the root file system. The exact actions depend on whether a number is provided or not:

Without a number, a first read-only snapshot of the default subvolume is created. A second read-write snapshot of the current system is created. The system is set to boot from the second snapshot.

With a number, a first read-only snapshot of the current system is created. A second read-write snapshot is created of number. The system is set to boot from the second snapshot.

OK, fair enough: rollback created a new read-write snapshot, and since then it has been my new root. @/.snapshots/1/snapshot is now just another random snapshot.
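The sequence in the man page quote explains how an old-looking snapshot number ends up as the default. The sketch below only models that order of operations; it assumes the two new snapshots take the next two free numbers (snapper actually decides the numbers itself), and `rollback_plan` is a hypothetical name, not a snapper command:

```shell
# Illustrative model of the `snapper rollback` sequence quoted above.
# ASSUMPTION: the two new snapshots receive numbers NEXT and NEXT+1;
# in reality snapper picks the numbers itself.
# Usage: rollback_plan NEXT [TARGET]
rollback_plan() {
    next="$1"
    target="${2:-}"
    second=$((next + 1))
    if [ -z "$target" ]; then
        # Without a number: read-only copy of the old default first
        echo "snapshot $next: read-only copy of the default subvolume"
        echo "snapshot $second: read-write copy of the current system"
    else
        # With a number: read-only copy of the running system first
        echo "snapshot $next: read-only copy of the current system"
        echo "snapshot $second: read-write copy of snapshot $target"
    fi
    # Either way, the second (read-write) snapshot becomes the boot default
    echo "new default subvolume: @/.snapshots/$second/snapshot"
}
```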

BTRFS and snapper rock. But the whole “default root subvolume” deal should have been mentioned in the Reference Guide and in the error messages. It would also be great if snapper could “mark” the root snapshot somehow: just setting the rollback snapshot’s description to “default subvolume for / since [ISO 8601 date and time]” and displaying a sane error message like “Snapshot 151 is currently mounted on / and can’t be deleted” would greatly improve things.
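Until snapper reports this itself, a pre-delete check along those lines can be scripted around the get-default output shown above. A minimal sketch, assuming snapper's @/.snapshots/N/snapshot naming; `is_default_snapshot` is a hypothetical helper:

```shell
# Succeed if snapshot number $1 is the subvolume named in
# `btrfs subvolume get-default /` output read from stdin.
# ASSUMPTION: snapper's @/.snapshots/N/snapshot naming, as shown above.
is_default_snapshot() {
    awk -v n="$1" '$NF == "@/.snapshots/" n "/snapshot" { found = 1 } END { exit !found }'
}
```

For example, `btrfs subvolume get-default / | is_default_snapshot 151 && echo 'snapshot 151 is the default subvolume for /'` would flag the case above before `snapper delete 151` is attempted.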

Looks like I’m not the only one who struggled with this (check out this thread, for example).

I’m sticking with BTRFS and snapper for now, but I reckon it has a steep learning curve and can be very intimidating to a novice sysadmin.

nietgiftig · June 25, 2017, 9:48am

Too steep for me, IMHO.
Thanks to very good threads like this one, I've come to realise that BTRFS is not for me.

It adds a level of difficulty that not every inexperienced user is ready for.
And at a time when many Windows users are looking for a way to escape their OS, it creates an unwanted extra level of difficulty for people who are willing to make the jump.

After choosing a distribution AND a desktop as a novice, you also get the more or less hidden, on-by-default BTRFS file system when you choose openSuse.

I think the openSuse ecosystem is very good, but the choice of this “default” is not.
An extra paragraph explaining the feature, with its pros and cons, on the choose/download or startup page would be nice.
Or an easy-to-make choice in the installer.
At least novice users should be made aware of this on-by-default feature so they can make a choice.

glistwan · June 26, 2017, 6:39am

I’m with you on this. I believe the default should be ext4 or XFS. I don’t think I will ever be using BTRFS on a personal PC.
I can see it being a nice choice for a personal NAS, for example, but I don’t think the features would ever be useful for me on a desktop.

hcvv · June 26, 2017, 10:08am

Please everybody,

This is a technical How To for those who use Btrfs. This is NOT a discussion thread about the merits of Btrfs in general or about it being the default in particular. Please stay on topic. If somebody wants to discuss the pros and cons of Btrfs, please start a thread in General Chitchat.

shwaybotx · January 17, 2018, 6:06am

Wow. This is a post that keeps getting read and keeps helping people like me! Here I was doing all this btrfs studying and wondering “Should I switch?” Then I came across this article and I was thinking, “Gee, btrfs seems to be an unneeded hassle.”

Thank you so much for taking the time to share your research, experience and conclusion to stick with ext4. That takes a huge burden off my shoulders and I’ll just happily go on in my ext4 world and worry about other stuff now. I don’t have to worry about switching to btrfs anymore.