Hard crash - root partition corrupted?

Hi all, this morning I had a nasty surprise.
I got to work and powered up my laptop (connected to the docking station) as usual. It booted normally and everything was well.
After a couple of minutes, however, the computer froze completely. No key combination would wake it up; the mouse pointer still moved, but the system did not respond to clicks. I’ve never seen Linux crash like this, except when there are serious hardware problems.
I had to forcibly power off the laptop by holding the power button down (more than 4 seconds). After restarting, the system could not reach a usable state.
It consistently failed to mount several filesystems. I tried several times, using emergency mode, previous snapshots, etc., but in my heart I already knew the verdict: failing SSD :(.
So I downloaded the Tumbleweed recovery image, wrote it to a USB drive, then booted from it.

Now the strange part: after booting from the USB drive, I can mount the /home partition (XFS) and read it without problems.
But as soon as I mount the root partition (btrfs), I get this from dmesg:

  +0.010535] ------------[ cut here ]------------
  +0.000004] kernel BUG at fs/btrfs/relocation.c:1413!
  +0.000005] invalid opcode: 0000 [#1] SMP PTI
  +0.000007] CPU: 2 PID: 2573 Comm: btrfs-balance Not tainted 5.1.7-1-default #1 openSUSE Tumbleweed (unreleased)
  +0.000003] Hardware name: LENOVO 20B7S43B00/20B7S43B00, BIOS GJET80WW (2.30 ) 10/20/2014
  +0.000037] RIP: 0010:create_reloc_root+0x1e8/0x1f0 [btrfs]
  +0.000004] Code: c7 85 dc 00 00 00 00 00 00 00 48 c7 85 e4 00 00 00 00 00 00 00 c6 85 ec 00 00 00 00 c6 85 ed 00 00 00 00 e9 17 ff ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 44 00 00 49 89 f9 48 89 d7 49 8b 01
  +0.000005] RSP: 0018:ffffb2d581377a00 EFLAGS: 00010282
  +0.000003] RAX: 00000000ffffffef RBX: ffff9b8890cc7000 RCX: ffff9b8882de2287
  +0.000003] RDX: 000000000000000c RSI: ffff9b8890c65e00 RDI: 0000000000000286
  +0.000002] RBP: ffff9b88946ebe00 R08: ffff9b888e5765e0 R09: 0000000000000001
  +0.000003] R10: 0000000000000002 R11: 0000000000000000 R12: ffff9b882299de38
  +0.000003] R13: fffffffffffffff7 R14: ffff9b888e58c000 R15: ffff9b888e58c000
  +0.000003] FS:  0000000000000000(0000) GS:ffff9b8897480000(0000) knlGS:0000000000000000
  +0.000003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  +0.000002] CR2: 00007fb63000c3b8 CR3: 000000019920e003 CR4: 00000000001606e0
  +0.000003] Call Trace:
  +0.000034]  btrfs_init_reloc_root+0x5b/0xb0 [btrfs]
  +0.000028]  record_root_in_trans+0xae/0xe0 [btrfs]
  +0.000026]  btrfs_record_root_in_trans+0x4f/0x70 [btrfs]
  +0.000024]  start_transaction+0xa5/0x480 [btrfs]
  +0.000026]  __btrfs_prealloc_file_range+0xaa/0x460 [btrfs]
  +0.000022]  ? generic_bin_search.constprop.0+0xdf/0x180 [btrfs]
  +0.000028]  btrfs_prealloc_file_range+0x10/0x20 [btrfs]
  +0.000030]  prealloc_file_extent_cluster+0x115/0x220 [btrfs]
  +0.000031]  relocate_file_extent_cluster+0x9a/0x570 [btrfs]
  +0.000031]  relocate_data_extent+0x81/0xd0 [btrfs]
  +0.000039]  relocate_block_group+0x26c/0x620 [btrfs]
  +0.000026]  btrfs_relocate_block_group+0x156/0x2f0 [btrfs]
  +0.000026]  btrfs_relocate_chunk+0x31/0xa0 [btrfs]
  +0.000025]  __btrfs_balance+0x3ef/0x9d0 [btrfs]
  +0.000038]  btrfs_balance+0x279/0x460 [btrfs]
  +0.000028]  ? btrfs_balance+0x460/0x460 [btrfs]
  +0.000035]  balance_kthread+0x35/0x50 [btrfs]
  +0.000004]  kthread+0x117/0x130
  +0.000003]  ? kthread_associate_blkcg+0x90/0x90
  +0.000005]  ret_from_fork+0x3a/0x50
  +0.000003] Modules linked in: xfs fuse af_packet xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter msr snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel intel_rapl uvcvideo iTCO_wdt x86_pkg_temp_thermal intel_powerclamp coretemp arc4 mei_hdcp iTCO_vendor_support kvm_intel iwlmvm kvm btusb mac80211 rmi_smbus rmi_core snd_hda_codec snd_hda_core videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev btrtl iwlwifi snd_hwdep irqbypass videobuf2_common btbcm snd_pcm thinkpad_acpi btintel cdc_acm snd_timer bluetooth ledtrig_audio cfg80211 rtsx_pci_ms snd ecdh_generic mei_me i2c_i801 pcspkr soundcore memstick mei rfkill joydev wmi_bmof
  +0.000030]  intel_rst lpc_ich thermal battery ac pcc_cpufreq overlay nls_iso8859_1 nls_cp437 vfat fat squashfs btrfs xor cdc_mbim cdc_ncm usbnet cdc_wdm mii uas usb_storage raid6_pq crct10dif_pclmul libcrc32c crc32_pclmul crc32c_intel ghash_clmulni_intel rtsx_pci_sdmmc mmc_core i915 aesni_intel i2c_algo_bit drm_kms_helper aes_x86_64 crypto_simd cryptd glue_helper syscopyarea sysfillrect xhci_pci sysimgblt fb_sys_fops serio_raw xhci_hcd scsi_transport_iscsi ehci_pci ehci_hcd drm e1000e rtsx_pci usbcore ptp pps_core wmi video button sunrpc dm_mirror dm_region_hash dm_log loop sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
  +0.000070] ---[ end trace 8f686a26df6f7435 ]---
  +0.000029] RIP: 0010:create_reloc_root+0x1e8/0x1f0 [btrfs]
  +0.000005] Code: c7 85 dc 00 00 00 00 00 00 00 48 c7 85 e4 00 00 00 00 00 00 00 c6 85 ec 00 00 00 00 c6 85 ed 00 00 00 00 e9 17 ff ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 44 00 00 49 89 f9 48 89 d7 49 8b 01
  +0.000016] RSP: 0018:ffffb2d581377a00 EFLAGS: 00010282
  +0.000004] RAX: 00000000ffffffef RBX: ffff9b8890cc7000 RCX: ffff9b8882de2287
  +0.000003] RDX: 000000000000000c RSI: ffff9b8890c65e00 RDI: 0000000000000286
  +0.000002] RBP: ffff9b88946ebe00 R08: ffff9b888e5765e0 R09: 0000000000000001
  +0.000003] R10: 0000000000000002 R11: 0000000000000000 R12: ffff9b882299de38
  +0.000004] R13: fffffffffffffff7 R14: ffff9b888e58c000 R15: ffff9b888e58c000
  +0.000003] FS:  0000000000000000(0000) GS:ffff9b8897480000(0000) knlGS:0000000000000000
  +0.000004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  +0.000003] CR2: 00007fb63000c3b8 CR3: 000000019920e003 CR4: 00000000001606e0

The partition appears to mount correctly, but as soon as I try to access it, the process trying to read it hangs and cannot be revived or killed.
If, instead of mounting it, I run “btrfs check /dev/sda5”, I get a normal report (i.e. no errors). A SMART short self-test reports no problems either.

Does anybody have any clue?
Should I report a bug somewhere? If so, where? The openSUSE bug tracker or the btrfs bug tracker (provided there is one)?

Does anybody have an idea how to recover the data? /etc would be more than enough for me.

Thank you in advance
Cris

I had a USB drive lying around with SystemRescueCd 5.3.1 on it.
I tried booting from that USB drive and then mounting the root partition. Everything appears to work smoothly.
The kernel on that ISO image is v4.14.70.

I am now able to extract the data I need to build a new disk.

However, the question remains: what caused the problem on Tumbleweed?

Thank you in advance
Cris

When you had it mounted during rescue, did you check the free space? Snapshotting could have filled it up.
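
A quick way to check this from a rescue environment is sketched below in Python. The helper name `free_space_gib` and the mount point in the usage comment are assumptions for illustration only. On btrfs the number reported this way can be optimistic, because space is allocated in chunks, so `btrfs filesystem usage` on the mounted partition remains the more reliable check.

```python
import shutil


def free_space_gib(mount_point: str) -> float:
    """Return the free space of the filesystem containing mount_point, in GiB."""
    # disk_usage returns a named tuple (total, used, free), all in bytes.
    usage = shutil.disk_usage(mount_point)
    return usage.free / 2**30


# Usage, assuming the root partition is mounted at the hypothetical path /mnt/root:
#   print(f"{free_space_gib('/mnt/root'):.1f} GiB free")
```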

Hi mrmazda!

No, I always keep a close eye on the free space of the root partition, ever since I was bitten by this a few years ago.
That partition has 12 GB free.

Thank you!
Cris

Hi
Is it a Samsung SSD? If so, there was a trim bug with these devices and the earlier 5.1 kernel… maybe it’s still present…
http://forums.opensuse.org/showthread.php?t=536060

Hi Malcolm

Yes, it is a Samsung!! You could actually be right!!
I remember seeing that message, which initially scared me quite a lot (I also have a Samsung SSD in my desktop computer).
But after reading the complete thread I thought I would not have any problem because I do not use LVM and I do not have encrypted btrfs disks.
But maybe this is not enough?

Thank you
Cris

Hi
Maybe a regression??? Might be worth a bug report?
openSUSE:Submitting bug reports - openSUSE

“Invalid opcode: 0000” is always a code bug. This needs a bug report with at least the complete stack trace.
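
For the bug report, the key lines of the trace can be pulled out of a saved dmesg dump with a few lines of Python. This is only a sketch: `summarize_oops` is a hypothetical helper name, and the patterns assume the format of the excerpt quoted earlier in this thread. The full, unedited dmesg output should still be attached to the report.

```python
import re

# Patterns modelled on the dmesg excerpt quoted earlier in this thread.
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
FRAME_RE = re.compile(r"\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(dmesg_text: str) -> dict:
    """Extract the BUG location and the call-trace function names."""
    bug = None
    frames = []
    in_trace = False
    for line in dmesg_text.splitlines():
        match = BUG_RE.search(line)
        if match:
            bug = (match.group(1), int(match.group(2)))
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if match is None:
                in_trace = False  # first non-frame line ends the trace
            elif match.group(1) is None:
                frames.append(match.group(2))  # frames marked "?" are skipped
    return {"bug": bug, "frames": frames}
```

Frames that the kernel prefixes with "?" are left out because they are marked as unreliable in the trace.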

I opened a bug report. I hope it’s complete enough.

Cris

Here is an explanation and a workaround.
Well… it explains why the system is not booting anymore, but it does not explain the reason for the hard lockup. Maybe there is some other bug at play here.

Cris

Hi
I just think you may have been hit by the described bug… best wait and see if one of the kernel folks responds…

Adding a link to this commit to your bug report may expedite it.

Did you mean “comment”?
This is exactly how I found out about it in the first place: someone added a comment about the Reddit thread to my bug report.

Cris