I have a strange problem with my BTRFS filesystem that I can reliably reproduce, and I need advice on what to do about it.
First, here is my setup: openSUSE Tumbleweed 20171006 (except for the kernel version), BTRFS with Snapper on / (80GiB, 36GiB free), a second BTRFS partition mounted in a subdirectory of /mnt, and a Samsung SSD 750 EVO 500GB. I also have a second drive, an HDD with multiple filesystems, including BTRFS, but it is not in use and none of its partitions are mounted.
The problem: whenever I try to upgrade the kernel using zypper dup, the system crashes with a kernel panic. If there are multiple packages to upgrade, they install normally until installation of the kernel-default package starts, at which point the kernel panic occurs. When I block kernel updates (zypper addlock "kernel-*"), all other updates install normally. When I re-enable kernel updates and try again, the kernel panic occurs immediately. Here is what the crash message looks like:
https://drive.google.com/file/d/0B7WOH-iVHUA1WEg5N1RudklFSzg/view?usp=sharing
https://drive.google.com/file/d/0B7WOH-iVHUA1cFFRRkFrV2lPdHc/view?usp=sharing
As you can see, I am on Linux 4.13.3-1-default. The problem first happened when I was upgrading from the last 4.12.x to 4.13.x (i.e. while I was still running 4.12). At that time I retried several times and was eventually able to complete the upgrade. Starting the update from a tty (not from a terminal emulator inside X) seemed to help at that point. But every new kernel upgrade has required more and more retries, and now I am stuck on 4.13.3 and cannot upgrade to 4.13.4, which is now available.
Whenever this happens I do a snapper rollback to the last pre snapshot, and the system seems to be fully stable after that (except that I can't upgrade the kernel, of course). Until I do the rollback, the system may crash at random with a similar message (also in BTRFS), even when idle.
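For reference, here is a minimal sketch of the lock, update, and rollback sequence described so far. It is not a transcript of my session: the run helper, the RUN switch, and the SNAPSHOT variable are placeholders made up for illustration, and by default the script only prints the commands instead of executing them (the kernel update step is the one that triggers the panic).

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command unless RUN=1 is set.
# With RUN=1 the commands are executed and need root privileges.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Hold back all kernel-* packages, then update everything else.
run zypper addlock 'kernel-*'
run zypper dup

# Remove the lock and retry; this is the step where the panic occurs.
run zypper removelock 'kernel-*'
run zypper dup

# After a crash: list snapshots, pick the last "pre" one, roll back to it.
run snapper list
# SNAPSHOT is a placeholder for the number shown by "snapper list".
# A reboot is needed afterwards to boot into the rolled-back snapshot.
run snapper rollback "${SNAPSHOT:-N}"
```

Run as is, it prints the command sequence; running it with RUN=1 and SNAPSHOT set to a real snapshot number would execute it.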
The first time it happened, I checked the root filesystem with btrfs check --check-data-csum, but it did not report any problems apart from some quota-related mismatches, which are normal as far as I understand (or am I wrong?). Also, the SSD is only around 2 months old and, again, everything except kernel updates works fine, so I do not think the drive is the problem.
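The check mentioned above can be sketched as follows. The device path is a placeholder and not my real partition name, the run helper and RUN switch are hypothetical, and the sketch assumes the filesystem is unmounted at the time (for example, when booted from a rescue system), since btrfs check is meant for unmounted filesystems.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# DEVICE is a placeholder; substitute the partition holding the root filesystem.
DEVICE="${DEVICE:-/dev/sdXN}"

# btrfs check is read-only by default; --check-data-csum additionally
# verifies the checksums of data blocks, not only the metadata.
run btrfs check --check-data-csum "$DEVICE"
```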
So, what should I do? I think I could make the problem go away by backing up the whole content of the root partition, recreating the filesystem, and restoring the backup. But that is some work, and I am not sure whether it would actually help, because btrfs check does not report any problems. Also, since this looks like a kernel bug, I would be glad to help get it fixed.