Leap 15.1 does not boot, btrfs corrupted

Hardware: Lenovo Thinkpad X1 Tablet 3rd generation, 500GB SSD
Software: Dual boot with preinstalled Windows 10 and openSUSE Leap 15.1

The device was bought 3 months ago and Linux was installed 2 months ago, so it is a recent installation. Since the date of purchase, Lenovo has updated the BIOS and firmware frequently.

Linux is installed on partition /dev/nvme0n1p5, formatted with btrfs; swap space is on /dev/nvme0n1p6.
/dev/nvme0n1p1 to /dev/nvme0n1p4 are for Windows.

How the failure occurred:
While working under Linux, some error messages suddenly appeared, complaining that it was impossible to write data to the /home directory.
After shutdown and reboot, Linux could no longer be started. The boot manager starts normally and it is possible to boot Windows. However, if Linux is selected, booting stops with the following error messages:

BTRFS critical (device nvme0n1p5): corrupt leaf: root=2 block=24961368064 slot=192, bad key order, prev (19392573440 168 4096) current (2038972416 0 4096)

BTRFS error (device nvme0n1p5): failed to read block groups: -5

BTRFS error (device nvme0n1p5): open_ctree failed
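
For readers unfamiliar with the first message: btrfs sorts the items in a tree leaf by a three-part key (objectid, type, offset), and "bad key order" means that the key in slot 192 does not sort after the key in the slot before it. root=2 is the extent tree. The following is only a minimal Python sketch of that comparison, using the numbers from the log line above. The helper names parse_keys and keys_in_order are made up for illustration, and the regular expression assumes nothing beyond the message format shown here.

```python
import re

# Matches the "prev (...) current (...)" part of the kernel message.
# The three numbers in each group are objectid, type and offset.
KEY_RE = re.compile(
    r"prev \((\d+) (\d+) (\d+)\) current \((\d+) (\d+) (\d+)\)"
)


def parse_keys(message):
    """Return (prev, current) as (objectid, type, offset) integer tuples."""
    match = KEY_RE.search(message)
    if match is None:
        raise ValueError("no 'prev (...) current (...)' pair found")
    numbers = [int(n) for n in match.groups()]
    return tuple(numbers[:3]), tuple(numbers[3:])


def keys_in_order(prev, current):
    """Keys in a leaf must be strictly increasing, compared field by field."""
    return prev < current


message = (
    "BTRFS critical (device nvme0n1p5): corrupt leaf: root=2 "
    "block=24961368064 slot=192, bad key order, "
    "prev (19392573440 168 4096) current (2038972416 0 4096)"
)
prev_key, current_key = parse_keys(message)
print(prev_key, current_key, keys_in_order(prev_key, current_key))
```

The previous key has the larger objectid, so the comparison fails. This only illustrates what the check complains about; it says nothing about what caused the damage.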

The output of journalctl contains the following error message:

sysroot.mount: Mount process exited, code=exited status=32
failed to mount /sysroot

I booted a rescue system from the installation DVD and tried the following, but without success:

btrfs rescue super-recover /dev/nvme0n1p5
and
btrfs rescue chunk-recover /dev/nvme0n1p5

Both ran without any error messages, but neither helped; booting Linux is still impossible.

Then I executed the following command:

btrfs check --repair /dev/nvme0n1p5

This did not help either and terminated with the following error message:
bad key ordering 191 192
ERROR: Cannot open file system
This means that not even the maintenance programs can access the partition.

I am really at the end of my knowledge. Does anybody have an idea what else I can try to get things running again (besides a complete reinstallation)?

Many thanks in advance.

It looks to me as if the BTRFS file system is corrupted beyond recovery. I'd boot from live media and run smartctl on the device to see if it is somehow damaged.

IMO NVMe devices are a bit wobbly.

Hi, and welcome to the forums!

I can only guess, because I don’t have any experience with btrfs. I have ext4 for / and /home.

What you describe seems to be that things worked well for a while and then, at a certain point, while the system was still up, things stopped working.

One reason for that may be that you ran out of disk space for /home, i.e. that your /home filled up.

Could you check this?

Besides, /home isn’t formatted with btrfs by default, as far as I have seen during fresh installs of openSUSE.

I think the default install with BTRFS now puts /home on a subvolume of the main BTRFS root tree, not on a separate partition. You can of course override the defaults, and I think if you use EXT4 for root the defaults may go back to separate root and home partitions. But I always take control of the install and put what I want where I want it.

And checking for disk usage is possible in two quite easy ways:

  1. use gparted, which can be installed from the openSUSE repos or run from a live gparted

  2. use the command “df” (no sudo required) after you have mounted your /home, even if you booted from an external device.
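
If you prefer a script to reading the df output, the same figures can be read from Python. This is only a rough sketch: usage_report is a made-up helper name, and "/" is a placeholder that you would replace with the directory where you mounted the partition. On btrfs these numbers can be misleading, and "btrfs filesystem usage" on the mount point gives a more detailed picture.

```python
import shutil


def usage_report(path):
    """Return total, used and free space of the file system holding `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    # Avoid a division by zero on pseudo file systems that report size 0.
    percent = 100 * usage.used / usage.total if usage.total else 0.0
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
        "percent_used": percent,
    }


# "/" is a placeholder; use the mount point of /home instead.
report = usage_report("/")
print(f"{report['used_gib']:.1f} of {report['total_gib']:.1f} GiB used "
      f"({report['percent_used']:.0f}%)")
```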

Generally, BTRFS is self-healing.
But when experiencing problems like the one described, you are strongly encouraged to run, don’t walk, to the BTRFS support channels, which are a mailing list and IRC, both listed on the following page:

https://btrfs.wiki.kernel.org/index.php/Main_Page

Many people have written that the help you get directly from the BTRFS team is responsive and excellent.

TSU

Well, I have not received any response so far and I have the distinct impression that there will be no answer in the future. It seems that even the luminaries on the btrfs mailing list don’t have any idea how to access the broken btrfs partition.

Anyway, among all the expert discussions about code for implementing new features and fixing bugs, there was one mail that in some way addresses the problem I encountered. I quote it here:

There are quite a lot btrfs extent tree corruption report in the mail
list.
Since btrfs will do mount time block group item search, one corrupted
leaf containing block group item will prevent the whole fs to be
mounted.

This patchset will try to address the problem by introducing a new mount
option, “rescue=skipbg”, as a last-resort rescue.
With “rescue=skipbg”, the whole extent tree will be skipped if we hit
some problems at mount time.
This brings some side effect that for super large fs, the mount time can
be hugely reduced by this mount option.

Of course this option will have a lot of restrictions to prevent further
screwing up the fs, including:

  • Permanent RO
    No remount rw is allowed
  • No dirty log
    Either clean the log or use rescue=nologreplay mount option

This “rescue=skipbg” has some advantage compared to user space tool
like “btrfs-restore”:

  • Unified recovery tool
    User can use any tool they’re familiar with, as long as the kernel
    doesn’t panic.
  • More info for subvolume.
    “btrfs subv list” can work now!

Also move the following mount options to “rescue=” group:

  • nologreplay
    to rescue=nologreplay
  • usebackuproot
    to rescue=usebackuproot

Old options are still available for compatibility purpose, but they are
deprecated in favor of new ‘rescue=’ super option.

Different rescue sub options can be separated by ‘:’, like:
“rescue=nologreplay:skipbg:usebackuproot”.
Or the traditional but longer way like:
“rescue=nologreplay,rescue=skipbg”

The separation character is chosen by:

  • No conflicts with existing character
    Especially no conflict with ‘,’.
  • No extra escaping/quota
    Original plan is ‘;’, but since it’ll be interpreted by bash, it’s
    changed to current ‘:’.
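
(End of the quoted mail.) As an aside, the option syntax it describes can be sketched in a few lines of Python. This is only an illustration of the ':'-separated format from the quote. The function name build_rescue_option is made up, and "skipbg" comes from a proposed patchset, so whether a given kernel accepts it has to be checked against that kernel's documentation.

```python
def build_rescue_option(sub_options):
    """Join rescue sub-options into one 'rescue=' mount option.

    Follows the syntax described in the quoted mail: sub-options are
    separated by ':' because ',' already separates mount options.
    """
    if not sub_options:
        raise ValueError("at least one rescue sub-option is required")
    for name in sub_options:
        # A ':' or ',' inside a name would break the option string.
        if ":" in name or "," in name:
            raise ValueError(f"invalid sub-option name: {name!r}")
    return "rescue=" + ":".join(sub_options)


# Sub-option names as given in the quoted mail (assumed, not verified
# against any released kernel).
option = build_rescue_option(["nologreplay", "skipbg", "usebackuproot"])
print("ro," + option)
```

The "ro," prefix reflects the restriction in the mail that such a mount is read-only.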

It seems I have to do a complete reinstall. But this time I am going to use ext4 instead of btrfs. btrfs still seems to be too buggy.

Yes to ext4. While some regular posters here report positive experiences with btrfs, there are just too many cases of btrfs problems with high load, memory usage, sudden unmounts while booting, haywire kernel tasks like btrfs daemons/helpers/scrubbers and whatnot, all the way to total data loss. You just don’t notice ext4; it has been working away silently and with next to no overhead in servers, desktop PCs and millions (if not billions) of Android smartphones.

I admire the advanced concepts in filesystems like JFS, XFS, ZFS and btrfs, but I still choose trusty ol’ ext4 unless I absolutely, positively and unequivocally need said advanced features. Cheers!