I have a Btrfs root file system that crashed after a hard power-off. No data are at stake (I have backups), and I will probably end up wiping and reinstalling from scratch, but I am curious to learn how I might go about diagnosing and fixing something like this, and to understand what went wrong. This might be of interest to other Btrfs learners out there, too.
Background info: it’s a Toshiba Satellite CL10-B-100 netbook running openSUSE 13.2 and KDE. Internal storage is an eMMC device partitioned as /boot, /boot/efi, and then a LUKS encrypted partition containing an LVM2 volume group, which in turn contains a root (Btrfs) and a swap volume. I know what I’m doing with LVM2-on-LUKS, and that part is running fine.
Also as background info: the eMMC device logs messages like the following, starting at boot time and recurring from time to time afterwards:
    [    5.147237] mmc0: Got command interrupt 0x00000001 even through no command operation was in progress.
These have been appearing for many months, with the system working stably (until today!), so I have been ignoring them. But can anyone explain to me what they mean?
What happens: I boot the laptop, GRUB2 opens fine, I boot into openSUSE, I get the prompt for the LUKS passphrase, then the screen goes black and the thing hangs, hard power-off being the only way out. If I select recovery mode from GRUB2, then it likewise prompts for LUKS, then continues up to “Starting Show Plymouth Boot Screen…” and crashes there.
So I boot from liveusb, open LUKS, activate LVM2, and try to mount the root filesystem:
    # cryptsetup luksOpen /dev/mmcblk0p3 mycrypt
    # lvm vgchange -a y vg0
      WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
    # mkdir /mnt/root
    # mount -t btrfs /dev/vg0/lvroot /mnt/root
    mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg0-lvroot,
           missing codepage or helper program, or other error
           In some cases useful info is found in syslog - try
           dmesg | tail or so.
(Disregard the lvmetad warning.)
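The live-USB steps above can be collected into one script that tries read-only mounts before anything else, so nothing is written to the damaged file system. This is a minimal sketch, assuming the device, volume group and mount point names used above; the function names `run` and `open_and_mount` are hypothetical. The `recovery` mount option is the one 3.x kernels (such as openSUSE 13.2's) understand for falling back to a backup tree root; kernels 4.6 and later call it `usebackuproot`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the steps shown above (names assumed from this post).
open_and_mount() {
  local part=/dev/mmcblk0p3 name=mycrypt vg=vg0 lv=/dev/vg0/lvroot mnt=/mnt/root
  run cryptsetup luksOpen "$part" "$name"
  run lvm vgchange -a y "$vg"
  run mkdir -p "$mnt"
  # First attempt: plain read-only mount, which writes nothing to the device.
  if ! run mount -t btrfs -o ro "$lv" "$mnt"; then
    # Second attempt: 'recovery' asks a 3.x kernel to try older tree roots
    # if the current one is unreadable (renamed 'usebackuproot' in 4.6+).
    if ! run mount -t btrfs -o ro,recovery "$lv" "$mnt"; then
      # mount(8)'s message is generic; the kernel log usually says why it failed.
      dmesg | tail -n 20
      return 1
    fi
  fi
}
```

Run it as root from the live system, or as `DRY_RUN=1 open_and_mount` to print the commands first.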
So that’s not working. But, there is a Btrfs file system still there:
    # btrfs fi show /dev/vg0/lvroot
    Label: none uuid: 50f6c28f-f2db-4d0a-99a5-17ce7df3b053
            Total devices 1 FS bytes used 15.11GiB
            devid 1 size 24.43GiB used 23.54GiB path /dev/mapper/vg0-lvroot
I ran a “btrfs check /dev/vg0/lvroot > btrfscheck.txt 2>&1”, and I really needed to capture that into a file because it spits out 15MB of errors (ouch).
    checking extents
    checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
    ...
    Csum didn't match
    ...
    ref mismatch on [20987904 16384] extent item 0, found 1
    Backref 20987904 parent 3 root 3 not found in extent tree
    backpointer mismatch on [20987904 16384]
    owner ref check failed [20987904 16384]
    ...
    ...
    ...
    Errors found in extent allocation tree or chunk allocation
    checking free space
    checksum verify failed on 701562880 found 1C888086 wanted F35A8BA5
    ...
    Csum didn't match
    ...
    cache appears valid but isnt 29360128
    Checking filesystem on /dev/vg0/lvroot
    UUID: 50f6c28f-f2db-4d0a-99a5-17ce7df3b053
    found 726515712 bytes used err is -22
    total csum bytes: 0
    total tree bytes: 8175616
    total fs tree bytes: 7979008
    total extent tree bytes: 98304
    btree space waste bytes: 3970461
    file data blocks allocated: 0
     referenced 0
    Btrfs v3.16.2+20141003
The “…” marks indicate more lines similar to those quoted. The bulk of the 15MB of output from btrfs check is made up of those quartets that start with “ref mismatch” and end with “owner ref check failed”.
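Because the output is so repetitive, a tally of message types makes the 15MB easier to reason about. This is a minimal sketch, assuming the captured file `btrfscheck.txt` from above and the message wording of btrfs-progs v3.16 as quoted here (other versions may word things differently); the function name `summarize_check` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count each kind of btrfs check message in a log file,
# most frequent first. Patterns match the lines quoted in this post.
summarize_check() {
  local log="$1"
  awk '
    /^ref mismatch on/                      { c["ref mismatch"]++ }
    /^Backref .* not found in extent tree/  { c["backref not found"]++ }
    /^backpointer mismatch on/              { c["backpointer mismatch"]++ }
    /^owner ref check failed/               { c["owner ref check failed"]++ }
    /^checksum verify failed/               { c["checksum verify failed"]++ }
    /^Csum didn.t match/                    { c["csum mismatch"]++ }
    END { for (k in c) printf "%8d  %s\n", c[k], k }
  ' "$log" | sort -rn
}
```

Usage: `summarize_check btrfscheck.txt`. Matching on line starts keeps the counts from being inflated by repeated block numbers elsewhere in the log.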
I’m aware of Marc Merlin’s excellent blog; anyone looking for how to recover their data might have a look there.
I seem to be able to get most of my data back with:
    # btrfs restore -vv /dev/vg0/lvroot /mnt/root
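To see what “most of” actually leaves out, one option is to compare the restored tree against the backup and list what exists only in the backup. This is a sketch using GNU `diff -rq`; the function name `missing_from_restore` and the backup location are hypothetical, and files that were restored but differ in content are deliberately not reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print entries present in the backup but absent from
# the restored tree. diff exits 1 whenever differences exist, which is
# expected here, so '|| true' keeps an empty result from counting as failure.
missing_from_restore() {
  local restored="$1" backup="$2"
  diff -rq "$restored" "$backup" | grep -F "Only in $backup" || true
}
```

For example, `missing_from_restore /mnt/root /path/to/backup` (backup path assumed). A missing directory is reported once, without its contents.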
But, I’m still stuck on:
- How did this happen? Can anyone point to anything stupid I might have done to cause this? Surely Btrfs should not die from a hard power-off?
- Is there a way of bringing this file system back to life? As the data are backed up, I’m open to any attempts, including potentially destructive ones.
All thoughts, ideas & advice are welcome!