Kernel panic in BTRFS - what should I do?

vovams · October 9, 2017, 2:07pm

I have a strange problem with my BTRFS filesystem that I can reliably reproduce, and I need advice on what to do about it.

First, here is my setup. openSUSE Tumbleweed 20171006 (except kernel version), BTRFS with Snapper on / (80GiB, 36GiB free), a second BTRFS partition mounted in a /mnt sub-directory, Samsung SSD 750 EVO 500GB. I also have a second HDD with multiple filesystems, including BTRFS, but it is not used and no partitions are mounted.

The problem: whenever I try to upgrade the kernel using zypper dup, the system crashes (kernel panic). If there are multiple packages to upgrade, they install normally until installation of the kernel-default package starts, at which point the kernel panic occurs. When I block kernel updates (zypper addlock "kernel-*"), all other updates install normally. When I re-enable kernel updates and try to update, the kernel panic occurs immediately. Here is what the crash message looks like:

https://drive.google.com/file/d/0B7WOH-iVHUA1WEg5N1RudklFSzg/view?usp=sharing
https://drive.google.com/file/d/0B7WOH-iVHUA1cFFRRkFrV2lPdHc/view?usp=sharing
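
For reference, the lock-and-update sequence described above can be sketched as a small shell script. This is only a dry-run sketch: the `run` helper, the `DRY_RUN` variable and the two function names are hypothetical and introduced here for illustration. By default the commands are printed rather than executed; running them for real needs root and DRY_RUN=0.

```shell
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: hold back all kernel packages and update everything else.
update_without_kernel() {
    run zypper addlock 'kernel-*'
    run zypper dup
}

# Step 2: release the lock and retry the upgrade.
# This is the step at which the panic described above occurs.
update_kernel() {
    run zypper removelock 'kernel-*'
    run zypper dup
}

update_without_kernel
```

Calling update_kernel afterwards corresponds to re-enabling kernel updates; zypper locks lists the locks currently in place.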

As you can see, I am on Linux 4.13.3-1-default. The problem happened for the first time when I was upgrading from the last 4.12.x to 4.13.x (i.e. when I was still running 4.12). At that time I tried again several times and was able to complete the upgrade. Starting the update process from a tty (not from a terminal emulator inside X) seemed to help at that point. But every new kernel upgrade required more and more retries, and now I am stuck on 4.13.3 and cannot upgrade to 4.13.4, which is now available.

Whenever this happens I do a snapper rollback to the last pre snapshot, and the system seems to be fully stable after that (except that I can't upgrade the kernel, of course). Before I do the rollback, the system may crash randomly with a similar message (also in BTRFS), even when it is not doing anything.

The first time it happened, I tried to check the root filesystem with btrfs check --check-data-csum, but it did not report any problems except for some quota-related mismatches, which are normal as far as I understand (or am I wrong?). Also, the SSD is only around 2 months old and, again, everything except kernel updates works fine, so I do not think the drive is the problem.

So, what should I do? I think I can make the problem go away if I back up the whole content of the root partition, recreate it and restore the backup. But that is some work, and I am not sure whether it will actually help, because btrfs check does not report any problems. Also, since it is a kernel bug, I would be glad if I could help to fix it.

malcolmlewis · October 9, 2017, 2:23pm

Hi and welcome to the Forum
First off, you need to remove the stuff that is tainting the kernel (e.g. vbox) to exclude any issue with that; if a bug gets raised against a tainted kernel, it may be declined. Is the SSD firmware all up to date? Then I suggest moving to the latest kernel (4.13.4) and seeing if it still occurs.
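
Whether the running kernel is tainted can be read from the taint mask in /proc/sys/kernel/tainted. Below is a minimal sketch, assuming a POSIX shell; `decode_taint` is a hypothetical helper name, and it decodes only three of the bits listed in the kernel's tainted-kernels documentation. Out-of-tree modules such as the VirtualBox ones would normally be expected to show up as the O flag, but that is an assumption to verify on the actual system.

```shell
# Decode a few bits of the kernel taint mask.
# Only bits 0 (P), 12 (O) and 13 (E) are handled; other bits are ignored.
decode_taint() {
    value=$1
    if [ "$value" -eq 0 ]; then
        echo "not tainted"
        return 0
    fi
    if [ $(( value & 1 )) -ne 0 ]; then
        echo "P: proprietary module loaded"
    fi
    if [ $(( (value >> 12) & 1 )) -ne 0 ]; then
        echo "O: out-of-tree module loaded"
    fi
    if [ $(( (value >> 13) & 1 )) -ne 0 ]; then
        echo "E: unsigned module loaded"
    fi
    return 0
}

# On a running Linux system the mask can be read from procfs.
if [ -r /proc/sys/kernel/tainted ]; then
    decode_taint "$(cat /proc/sys/kernel/tainted)"
fi
```

If the output lists a flag, unloading or uninstalling the responsible module and rebooting should bring the mask back to 0 before the crash is reproduced for a bug report.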

xorbe · October 11, 2017, 8:13pm

That’s his problem, 4.13.3 dies when trying to install 4.13.4 …

vovams · October 11, 2017, 9:01pm

Yeah, that is a problem. I know I can use a recent Tumbleweed live USB to boot my system with the 4.13.4 kernel, but I am not sure how. I can pass root= to the live USB kernel, but I do not know what to do about the initrd.

As for the VirtualBox modules, OK, I will try disabling them temporarily. There is also the ecryptfs module; I am not sure how official it is. I have my /home/vovams protected with it.

eng-int · October 14, 2017, 7:15pm

The latest kernel is now 4.13.5-1-default from tumbleweed:20171010.
I suggest trying "zypper dup" again and making sure that you use the current version.
Apart from your latest working kernel (4.13.3), you should be able to choose the previous working kernel from grub. What happens if you try to upgrade from that one?