Btrfs crash

Hi folks,

I have a crashed Btrfs root file system after a hard power-off. No data are at stake (I have backups), and I will probably end up wiping and reinstalling from scratch, but I am curious to learn how I might go about diagnosing and fixing something like this, and to understand what went wrong. It might be of interest to other Btrfs learners out there, too.

Background info: it’s a Toshiba Satellite CL10-B-100 netbook running openSUSE 13.2 and KDE. Internal storage is an eMMC device partitioned as /boot, /boot/efi, and then a LUKS encrypted partition containing an LVM2 volume group, which in turn contains a root (Btrfs) and a swap volume. I know what I’m doing with LVM2-on-LUKS, and that part is running fine.

Also as background info: the eMMC device throws these, starting at boot time and then onwards from time to time:

    [    5.147237] mmc0: Got command interrupt 0x00000001 even through no command operation was in progress.

These have appeared for many months now with the system working stably (prior to today!), so I have been disregarding them. But if anyone can explain to me what these mean…?

What happens: I boot the laptop, GRUB2 opens fine, I boot into openSUSE, I get the prompt for the LUKS passphrase, then the screen goes black and the thing hangs, hard power-off being the only way out. If I select recovery mode from GRUB2, then it likewise prompts for LUKS, then continues up to “Starting Show Plymouth Boot Screen…” and crashes there.

So I boot from liveusb, open LUKS, activate LVM2, and try to mount the root filesystem:

# cryptsetup luksOpen /dev/mmcblk0p3 mycrypt
# lvm vgchange -a y vg0
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
# mkdir /mnt/root
# mount -t btrfs /dev/vg0/lvroot /mnt/root
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg0-lvroot,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

(Disregard the lvmetad warning.)
So that’s not working. But, there is a Btrfs file system still there:

# btrfs fi show /dev/vg0/lvroot
Label: none  uuid: 50f6c28f-f2db-4d0a-99a5-17ce7df3b053
        Total devices 1 FS bytes used 15.11GiB
        devid    1 size 24.43GiB used 23.54GiB path /dev/mapper/vg0-lvroot

I ran a “btrfs check /dev/vg0/lvroot > btrfscheck.txt 2>&1”, and I really needed to capture that into a file because it spits out 15MB of errors (ouch).

checking extents
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
...
Csum didn't match
...
ref mismatch on [20987904 16384] extent item 0, found 1
Backref 20987904 parent 3 root 3 not found in extent tree
backpointer mismatch on [20987904 16384]
owner ref check failed [20987904 16384]
...
...
...
Errors found in extent allocation tree or chunk allocation
checking free space
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
...
Csum didn't match
...
cache appears valid but isnt 29360128
Checking filesystem on /dev/vg0/lvroot
UUID: 50f6c28f-f2db-4d0a-99a5-17ce7df3b053
found 726515712 bytes used err is -22
total csum bytes: 0
total tree bytes: 8175616
total fs tree bytes: 7979008
total extent tree bytes: 98304
btree space waste bytes: 3970461
file data blocks allocated: 0
 referenced 0
Btrfs v3.16.2+20141003

The “…” indicate more lines similar to those quoted. The bulk of the 15MB of output from btrfs check is made up of those quartets that start with “ref mismatch” and end with “owner ref check failed”.
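For anyone else wading through output of this size, here is a rough sketch of how one might tally it instead of scrolling. It assumes the log was captured to btrfscheck.txt as above and that the message prefixes are the ones quoted in this post; summarize_check and failing_blocks are names I made up, not part of btrfs-progs.

```shell
#!/bin/sh
# Rough summary of a saved "btrfs check" log.
# Assumption: message prefixes are the ones quoted in the post above.

# Count how many lines of the log contain each known message prefix.
summarize_check() {
    log="$1"
    for pattern in \
        "checksum verify failed" \
        "Csum didn't match" \
        "ref mismatch" \
        "Backref" \
        "backpointer mismatch" \
        "owner ref check failed"
    do
        # grep -c prints the match count; it exits non-zero on 0 matches,
        # hence the "|| true" so this also works under "set -e".
        count=$(grep -cF -- "$pattern" "$log" || true)
        printf '%8d  %s\n' "$count" "$pattern"
    done
}

# List the distinct block addresses named in checksum failures
# (field 5 of "checksum verify failed on <bytenr> found ... wanted ...").
failing_blocks() {
    grep -F "checksum verify failed on" "$1" | awk '{print $5}' | sort -u
}

# Usage (not run automatically):
#   summarize_check btrfscheck.txt
#   failing_blocks btrfscheck.txt
```

If failing_blocks prints only a handful of addresses, the checksum errors are concentrated in a few metadata blocks rather than spread across the whole volume, which is a useful thing to know before trying anything destructive.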

I’m aware of Marc Merlin’s excellent blog. Anyone looking for how to recover their stuff might have a look here:
http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html

I seem to be able to get most of everything back with:

btrfs restore -vv /dev/vg0/lvroot /mnt/root

But, I’m still stuck on:

  1. How did this happen? Can anyone point at anything stupid I might have done to cause this? Surely Btrfs should not die from a hard power-off?
  2. Is there a way of bringing this file system back to life? As the data are backed up, I’m open to any attempts, including potentially destructive ones.

All thoughts, ideas & advice are welcome!
K.

Hi
Have a read through here:
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

Did you try to zero the btrfs log (btrfs-zero-log)?

# btrfs-zero-log /dev/vg0/lvroot
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found D761F540 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
Csum didn't match
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found D761F540 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
Csum didn't match
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found D761F540 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
Csum didn't match
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found D761F540 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
Csum didn't match
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found D761F540 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
Csum didn't match
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found D761F540 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
Csum didn't match
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
checksum verify failed on 701562880 found D761F540 wanted F35A8BA5
checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
Csum didn't match
extent-tree.c:2711: alloc_reserved_tree_block: Assertion `ret` failed.
btrfs-zero-log[0x40c589]
btrfs-zero-log[0x40d9c7]
btrfs-zero-log[0x40da87]
btrfs-zero-log[0x411139]
btrfs-zero-log[0x4036dd]
btrfs-zero-log[0x403d1f]
btrfs-zero-log[0x4087fe]
btrfs-zero-log[0x40a2d7]
btrfs-zero-log[0x402842]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fc54ce50b05]
btrfs-zero-log[0x402953]

Doesn’t look too good; I subsequently tried mounting with “-o recovery,ro” but no luck.

Any other ideas?

Hi
Since you have a backup, try the “What if I don’t have wipefs at hand?” section of the Btrfs wiki page linked above.

But first go through the device and each partition and check what wipefs reports:


wipefs /dev/sdX
wipefs /dev/sdXn

Then, in the wiki example, change /dev/sda to the /dev/sdXn that matches your Btrfs partition.

wipefs reported what it was supposed to (“btrfs filesystem” and the UUID). I then wiped the superblock “magic strings” using dd as per those instructions, but it made no difference.
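In case it helps someone check the same thing without wipefs, here is a read-only sketch that looks for the Btrfs magic directly. It assumes the usual on-disk layout (primary superblock at 64 KiB, with the 8-byte magic string "_BHRfS_M" 0x40 bytes into it); btrfs_magic_at and has_btrfs_magic are hypothetical helper names, not existing tools.

```shell
#!/bin/sh
# Read-only check for the Btrfs superblock magic on a device or image file.
# Assumptions: superblock starts at 64 KiB (65536), the magic sits 0x40 (64)
# bytes into it, and the magic is the ASCII string "_BHRfS_M".

# Print the 8 bytes found where the magic would be.
# $1 = device or image, $2 = superblock offset in bytes (default 65536).
btrfs_magic_at() {
    dev="$1"
    sb_offset="${2:-65536}"
    # tr strips NUL bytes so a zeroed-out region yields an empty string.
    dd if="$dev" bs=1 skip=$((sb_offset + 64)) count=8 2>/dev/null | tr -d '\0'
}

# Succeed (exit 0) only if the magic string is present at that offset.
has_btrfs_magic() {
    [ "$(btrfs_magic_at "$@")" = "_BHRfS_M" ]
}

# Usage (as root, not run automatically):
#   has_btrfs_magic /dev/vg0/lvroot && echo "magic present"
#   has_btrfs_magic /dev/vg0/lvroot 67108864   # first mirror copy at 64 MiB
```

This only reads from the device, so it is safe to run before and after the dd step to confirm whether the magic was actually overwritten.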

I’m calling it a night… thanks for the help, I don’t know how much more time I’ll have for this but will post back if I come across anything interesting.

Cheers, have a good 2016!
K.