tumbleweed kernel going wild - btrfs problem?

Hi,

After the upgrade a few moments ago (Tumbleweed) something went wrong. journalctl is filling up with messages like the one below, one per CPU, dozens of them per second. It looks btrfs-related. I have an SSD.
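
For reference, this is roughly how I have been watching them come in (nothing special, just following the kernel log and filtering for btrfs):

# follow kernel messages as they arrive and keep only btrfs-related lines
journalctl -k -f | grep -i btrfs
# or, equivalently
dmesg -w | grep -i btrfs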

Anyone any clue?

Nov 26 23:30:52 otux kernel: ------------[ cut here ]------------
Nov 26 23:30:52 otux kernel: WARNING: CPU: 0 PID: 109 at …/fs/btrfs/delayed-inode.c:1410 btrfs_commit_transaction.part.21+0x432/0x9f0 [btrfs]
Nov 26 23:30:52 otux kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat bnep bluetooth fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat bridge stp llc nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 dns_resolver nfs lockd sunrpc fscache af_packet cfg80211 rfkill ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables ppdev iTCO_wdt iTCO_vendor_support e1000e x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
Nov 26 23:30:52 otux kernel: aes_x86_64 ptp snd_hda_codec_realtek snd_hda_codec_generic dm_mod lrw snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep gf128mul snd_pcm joydev glue_helper ablk_helper cryptd pcspkr ie31200_edac serio_raw snd_timer pps_core parport_pc tpm_infineon parport edac_core battery snd tpm_tis mei_me lpc_ich mei i2c_i801 mfd_core thermal soundcore tpm processor fan btrfs xor raid6_pq uas usb_storage ata_generic crc32c_intel ata_piix i915 i2c_algo_bit drm_kms_helper drm xhci_hcd video button sg
Nov 26 23:30:52 otux kernel: CPU: 0 PID: 109 Comm: kworker/u16:1 Tainted: G W 3.17.2-1-desktop #1
Nov 26 23:30:52 otux kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q77M-D2H, BIOS F1 04/19/2012
Nov 26 23:30:52 otux kernel: Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
Nov 26 23:30:52 otux kernel: 0000000000000009 ffffffff8162cd5b 0000000000000000 ffffffff8105e157
Nov 26 23:30:52 otux kernel: ffff8800c21caa20 ffff88003712c800 ffff8800b9cc74b0 0000000000000000
Nov 26 23:30:52 otux kernel: 0000000000000000 ffffffffa0281b42 ffff8800c21caa20 ffff88003712c800
Nov 26 23:30:52 otux kernel: Call Trace:
Nov 26 23:30:52 otux kernel: [<ffffffff8100518e>] dump_trace+0x8e/0x350
Nov 26 23:30:52 otux kernel: [<ffffffff810054f6>] show_stack_log_lvl+0xa6/0x190
Nov 26 23:30:52 otux kernel: [<ffffffff81006bf1>] show_stack+0x21/0x50
Nov 26 23:30:52 otux kernel: [<ffffffff8162cd5b>] dump_stack+0x49/0x6a
Nov 26 23:30:52 otux kernel: [<ffffffff8105e157>] warn_slowpath_common+0x77/0x90
Nov 26 23:30:52 otux kernel: [<ffffffffa0281b42>] btrfs_commit_transaction.part.21+0x432/0x9f0 [btrfs]
Nov 26 23:30:52 otux kernel: [<ffffffffa02689a9>] flush_space+0x89/0x4b0 [btrfs]
Nov 26 23:30:52 otux kernel: [<ffffffffa0268ef3>] btrfs_async_reclaim_metadata_space+0x123/0x1a0 [btrfs]
Nov 26 23:30:52 otux kernel: [<ffffffff81074d43>] process_one_work+0x143/0x400
Nov 26 23:30:52 otux kernel: [<ffffffff81075114>] worker_thread+0x114/0x470
Nov 26 23:30:52 otux kernel: [<ffffffff81079f3d>] kthread+0xbd/0xe0
Nov 26 23:30:52 otux kernel: [<ffffffff8163397c>] ret_from_fork+0x7c/0xb0
Nov 26 23:30:52 otux kernel: ---[ end trace 06361d762ec2f330 ]---

Thanks, Ivan

Hi
Wrong forum, we have a separate one for this :wink:
https://forums.opensuse.org/forumdisplay.php/677-Tumbleweed-Evergreen-and-PreRelease-Beta

Will close and move, nntp users hold off until it’s moved…!

Hi
Thread moved and open for consumption…

Malcolm, thanks for the move. I had to write fast because I presumed something would crash soon, and it did: the OS simply became unresponsive. This may not be directly connected to Tumbleweed, but rather to btrfs. There are other reports of similar 'bugs' involving btrfs_async_reclaim_metadata_space on the internet, but without resolution. It could be that my SSD is worn out in some sectors, but btrfs shouldn't be affected by that. Or should it? The SSD is 2 years old, with /tmp and /var/log mounted on tmpfs.
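
For anyone hitting the same thing, checks along these lines might help narrow it down (a rough sketch only, assuming the btrfs root is mounted at /):

# run a full scrub and wait for it to finish (-B = foreground)
btrfs scrub start -B /
# per-device error counters (read/write/corruption)
btrfs device stats /
# how much data vs. metadata space is allocated
btrfs filesystem df /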

Further investigation…
If this is the same bug as "Btrfs: reclaim the reserved metadata space at background", then either the kernel is not patched or the bug has been reintroduced. (I'm not much of a kernel developer :confused: )
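
I am not sure how to properly verify whether that particular patch is in the shipped kernel; the best guess I have is to grep the package changelog, roughly like this (assuming the openSUSE kernel-desktop package; the grep pattern is simply the patch title):

# which kernel is currently running
uname -r
# look for the patch title in the package changelog
rpm -q --changelog kernel-desktop | grep -i "reclaim the reserved metadata space"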

Thanks, Ivan

Hi
The first thing you need to do is remove the drivers that are tainting the kernel, then reproduce the problem; otherwise no one will look at a bug report if you file one…

What does smartctl say about the SSD?
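
Something along these lines, adjusting the device name (I'm just assuming /dev/sda here):

# device and firmware info
smartctl -i /dev/sda
# full health report plus the SMART attribute table
smartctl -a /dev/sda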

Hi again.

Yesterday the system halted, and I left it off until now. Now the errors are gone. I don't know why they disappeared.

Furthermore, @malcomlewis, your 'tainted' tip was something new to me; I had never faced such an issue, so I went to learn what it means. Proprietary drivers? I have NO proprietary drivers; I don't need any. So I checked the dmesg output with "dmesg | grep -i taint" and with "journalctl --dmesg | grep -i taint" and there is nothing in the current kernel! So how come the kernel was 'tainted' yesterday? Well, I checked yesterday's logs and voilà…

journalctl -m --dmesg -b -1 | grep -i taint
Nov 26 23:25:58 otux kernel: CPU: 0 PID: 315 Comm: mount Not tainted 3.17.2-1-desktop #1
Nov 26 23:25:58 otux kernel: CPU: 0 PID: 315 Comm: mount Tainted: G        W      3.17.2-1-desktop #1
Nov 26 23:25:59 otux kernel: CPU: 2 PID: 109 Comm: kworker/u16:1 Tainted: G        W      3.17.2-1-desktop #1
Nov 26 23:25:59 otux kernel: CPU: 5 PID: 109 Comm: kworker/u16:1 Tainted: G        W      3.17.2-1-desktop #1
Nov 26 23:25:59 otux kernel: CPU: 1 PID: 109 Comm: kworker/u16:1 Tainted: G        W      3.17.2-1-desktop #1
Nov 26 23:25:59 otux kernel: CPU: 3 PID: 109 Comm: kworker/u16:1 Tainted: G        W      3.17.2-1-desktop #1

It was just yesterday! I checked all 33 boots I have had since 2014-11-05. So yesterday something flagged as 'G W' tainted the kernel?!?!
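
From what I read while learning about this, the taint state is just a bitmask in /proc, so I checked it like this (as far as I understand it, the value 512 is bit 9, the 'W' flag, meaning the kernel has issued a WARN(); the 'G' only says that no proprietary modules are loaded):

# taint flags as a bitmask; 0 means not tainted
cat /proc/sys/kernel/tainted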

Here is the smartctl output:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       12398
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       239
177 Wear_Leveling_Count     0x0013   093   093   000    Pre-fail  Always       -       73
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   067   059   000    Old_age   Always       -       33
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       175
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       7365082920

And here are contents of journal folder.

otux:/var/log # l journal/ab68fbd510094e19b691b95e4b3038e7/
total 598016
drwxr-sr-x  1 root systemd-journal      3002 Nov 27 18:53 ./
drwxr-sr-x  1 root systemd-journal        64 Sep 29 05:59 ../
-rwxr-xr-x  1 root systemd-journal  16777216 Nov 26 23:26 system@000508ca83c66567-64ffe61187efdb2c.journal~*
-rwxr-xr-x  1 root systemd-journal  83886080 Nov 27 18:53 system@000508dad24119b6-3d2e2d367d2d3e99.journal~*
-rwxr-xr-x  1 root systemd-journal   8388608 Nov 26 23:26 system@7e708eaedb8743b0a7740d6890c62773-0000000000000001-000508ca82edb947.journal*
-rwxr-xr-x  1 root systemd-journal   8388608 Nov 26 23:26 system@7e708eaedb8743b0a7740d6890c62773-000000000007ef8e-000508ca841b09d3.journal*
-rwxr-xr-x  1 root systemd-journal   8388608 Nov 26 23:27 system@7e708eaedb8743b0a7740d6890c62773-00000000000814b0-000508ca85afb7cf.journal*
-rwxr-xr-x  1 root systemd-journal  33554432 Nov 22 18:23 system@b0e6378fe7984d8db16b988a3dfc3014-0000000000000001-00050720c2a5e70a.journal*
-rwxr-xr-x  1 root systemd-journal  16777216 Nov 24 21:12 system@b0e6378fe7984d8db16b988a3dfc3014-000000000000d218-00050875d08b88bf.journal*
-rwxr-xr-x  1 root systemd-journal   8388608 Nov 26 21:56 system@b0e6378fe7984d8db16b988a3dfc3014-000000000003c4ad-000508a06cfbc855.journal*
-rw-r-----  1 root systemd-journal   8388608 Nov 27 18:54 system.journal
-rwxr-xr-x+ 1 root systemd-journal  50331648 Nov 26 23:26 user-1000@000508ca85ba488f-53961f53e488b991.journal~*
-rwxr-xr-x+ 1 root systemd-journal   8388608 Nov 26 23:27 user-1000@5d7adf742dd44ad4a978d1ecde0904d3-000000000008156c-000508ca85bd6091.journal*
-rwxr-xr-x+ 1 root systemd-journal  16777216 Nov 22 18:22 user-1000@c5b80ffd6fd74f8081ba05dd35c171a4-00000000000003b5-00050720c41311d5.journal*
-rwxr-xr-x+ 1 root systemd-journal 134217728 Nov 24 21:12 user-1000@c5b80ffd6fd74f8081ba05dd35c171a4-000000000000d227-00050875d08ba5cf.journal*
-rwxr-xr-x+ 1 root systemd-journal 134217728 Nov 26 21:56 user-1000@c5b80ffd6fd74f8081ba05dd35c171a4-000000000003bf39-000508a06baddd29.journal*
-rw-r-----+ 1 root systemd-journal   8388608 Nov 27 18:55 user-1000.journal
-rwxr-xr-x+ 1 root systemd-journal   8388608 Nov 26 23:31 user-478@000508dad26468c5-7c9df47e50091476.journal~*
-rwxr-xr-x+ 1 root systemd-journal   8388608 Nov 24 21:12 user-478@6706f7e5e2954fe3ab82e8c7cb0dd6c8-0000000000015c7e-0005087d008cacb6.journal*
-rwxr-xr-x+ 1 root systemd-journal   8388608 Nov 26 23:26 user-478@6706f7e5e2954fe3ab82e8c7cb0dd6c8-000000000007c4b9-000508c9eabb0830.journal*
-rwxr-xr-x+ 1 root systemd-journal   8388608 Nov 26 23:26 user-478@6706f7e5e2954fe3ab82e8c7cb0dd6c8-000000000007f2bd-000508ca843b7d10.journal*
-rwxr-xr-x+ 1 root systemd-journal   8388608 Nov 26 23:27 user-478@6706f7e5e2954fe3ab82e8c7cb0dd6c8-000000000008313d-000508ca86d34878.journal*
-rw-r-----+ 1 root systemd-journal   8388608 Nov 27 18:54 user-478.journal
-rwxr-xr-x+ 1 root systemd-journal   8388608 Nov 22 18:23 user-487@5f00ce1d2d4f417981370b9846cc9355-0000000000000374-00050720c2d8da04.journal*
-rwxr-xr-x+ 1 root systemd-journal   8388608 Nov 23 02:57 user-487.journal*



That is 584 MB of logs. And it's a fresh system, installed when 13.2 came out. I skimmed through the logs and tracker is the no. 1 log killer. It seems to me that these logs are going to tmpfs :frowning:
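
If it stays this noisy I will probably just cap the journal size; as far as I understand it that is done in /etc/systemd/journald.conf, roughly like this (the values are only an example):

# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=100M
SystemMaxFileSize=10M

# then restart journald to apply it
systemctl restart systemd-journald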

So, although it's OK now, I see this as a BIG problem, since btrfs is THE filesystem on 13.2.

Best regards and thanks for help, Ivan

Hi
Interesting…

So, your SMART data:

  • 177 Wear_Leveling_Count: AFAIK (and it may be different for your device) when it gets down to 20, it's time to look at backing up your data and budgeting for a new drive…

  • 235 POR_Recovery_Count: this is remapping after a power failure; it seems you have had a few?

Or is btrfs having an issue with your SSD? Is the firmware on the SSD up to date?

Note, I have 3 systems that have been running btrfs on SSDs for over a year, with no issues to date.

What I/O scheduler are you running? Any other tweaks to the default mount options?
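
For example, something like this would show both (assuming the SSD is sda):

# the scheduler shown in brackets is the active one
cat /sys/block/sda/queue/scheduler
# mount options actually in effect for the btrfs filesystems
findmnt -t btrfs -o TARGET,OPTIONS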