Leap 42.1 XFS corrupted file problem

I booted my ASUS M5A78L-M/USB3 this morning (16 GB RAM and FX-8320 CPU) and got the following message:

Welcome to emergency mode! After logging in, type “journalctl -xb” to view
system logs, “systemctl reboot” to reboot, “systemctl default” to try again
to boot into default mode
Give root password for maintenance
(or press Control-D to continue):
XFS (sda3): Metadata corruption detected at_write_verify+0xd5/0xe0 [xfs], block 0x113e04a8
XFS (sda3): Unmount and run xfs_repair
XFS (sda3): First 64 bytes of corrupted metadata
ffff88041858d000: 00 00 00 00 00 00 00 00 3d f1 00 00 6c 5b c2 e6
ffff88041858d010: 00 00 00 00 11 3e 04 a8 00 00 00 08 00 03 98 6b
ffff88041858d020: fe 46 ee 45 ce 25 48 31 ae 52 c0 55 27 cf 6d 44
ffff88041858d030: 00 00 00 00 10 00 00 42 00 52 c0 03 00 00 00 00
XFS (sda3): Corruption of in-memory data detected. Shutting down filesystem
XFS (sda3): Please unmount the filesystem and rectify the problem(s)

I’m new to XFS and don’t know how to handle the corrupted partition without reinstalling the system. Some guidance would be appreciated. I checked all the parts of the machine that I could and found no faults. The boot partition seems to work fine, since the machine does boot at least to the point outlined above.

If you are in a terminal, run fsck /dev/sda3

If that does not correct it you may need to restore from backup.

Also run smartctl /dev/sda to see if there is a problem with the disk itself.

Ran fsck /dev/sda3

result was:
fsck from util-linux 2.25
If you wish to check the consistency of an XFS file system or repair a damaged filesystem, see xfs_repair(8)

I’d like to know what the (8) points to.

Saw the list of switches on the help screen

Ran smartctl /dev/sda3

result was:

smartctl 6.2 2013-11-07 r3856 [x86_64-linux-4.1.27-24-default] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
ATA device successfully opened
Use ‘smartctl -a’ (or ‘-x’) to print SMART (and more) information
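
As the output says, a bare smartctl call only confirms that the device can be opened. A minimal sketch of a follow-up is below. It assumes classic sdX naming, where the whole disk is the partition name without its trailing digits; the helper name disk_of is made up for illustration, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: strip the trailing partition number, e.g. /dev/sda3 -> /dev/sda.
# Assumption: classic sdX naming; NVMe names such as nvme0n1p3 need different handling.
disk_of() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

DISK=$(disk_of /dev/sda3)

# Print the commands rather than running them, so they can be reviewed first.
echo "smartctl -H $DISK   # overall health self-assessment"
echo "smartctl -a $DISK   # full SMART information, as the output above suggests"
```

Both commands need root and are best pointed at the whole disk rather than a single partition.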

Please follow the dancing ball and sing along with us:

Which means:

  1. Type the password of the “root” user; blind (no stars or other visual help); of course <Carriage-Return> at the end but, absolutely nothing else in between!!!
  2. Unmount sda3.
  3. Run “xfs_repair /dev/sda3”.
  4. Reboot.
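
Steps 2 to 4 above could be sketched as the script below (step 1, the root password, is typed interactively at the emergency prompt). It assumes the damaged filesystem is /dev/sda3, as named in the kernel messages. The check-only pass with xfs_repair -n is an addition, not part of the list above. The run helper and the DRY_RUN variable are made up for illustration; by default the script only prints what it would do.

```shell
#!/bin/sh
# Assumption: the damaged filesystem is /dev/sda3, as named in the kernel messages.
DEV="${DEV:-/dev/sda3}"
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_steps() {
    run umount "$DEV"          # xfs_repair will not work on a mounted filesystem
    run xfs_repair -n "$DEV"   # -n: check only, report problems without modifying anything
    run xfs_repair "$DEV"      # the actual repair
    run systemctl reboot
}

repair_steps
```

In emergency mode the filesystem may not be mounted at all, in which case the umount step simply reports an error and can be ignored.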

My situation may be related, though I’m not certain. BTW, in reference to the post from dcurtisfra: I tried the steps you listed without any luck.

While looking around to see if I could find details, I tripped across this thread as well as one on Canonical’s bug list site.

Not sure if (or how much) this thread is related to another bug referenced on Ubuntu/Canonical, but as it might be, here is what I ran into. I had complaints that the squid proxy was not working for several machines on the network, so I investigated the Leap 42.1 box I have handling proxy services. It would ping, and ports would respond to Nagios TCP checks (service checks were faulting), but I couldn’t log in and none of the services on the box were responsive. I power cycled the machine and it came up to the emergency/maintenance recovery login.

Once in maintenance mode, I determined that the /var file system was corrupted. Any attempt to mount it, run xfs_repair, etc. hung the system indefinitely, requiring another power cycle to recover. Booting from the Leap USB stick got me a little further, but I was still unsuccessful in getting /var back. I tried mounting read-only with norecovery, but it still refused. Flushing (zeroing) the log with the -L option to xfs_repair was the only way to get past the problem. (I have backups of the logs from the night before, so I just lost a little syslog data from some other systems; not a big issue here.)
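
For anyone following along, the order of attempts described above, least destructive first, might look like the sketch below. The device path is hypothetical (the post only identifies the volume as dm-9 holding /var), the function name recovery_plan is made up, and the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical device path; the post only identifies the volume as dm-9 holding /var.
DEV="${DEV:-/dev/mapper/example-var}"
MNT="${MNT:-/mnt}"

# Print the escalation order, least destructive first. Nothing is executed.
recovery_plan() {
    echo "mount -o ro,norecovery $DEV $MNT   # read-only, skip log replay; copy data off if it mounts"
    echo "umount $MNT"
    echo "xfs_repair -n $DEV   # report problems without changing anything"
    echo "xfs_repair $DEV      # normal repair; refuses to proceed if the log is dirty"
    echo "xfs_repair -L $DEV   # last resort: zeroes the log, recent changes may be lost"
}

recovery_plan
```

The -L step discards whatever was still sitting in the log, so it only makes sense once the gentler options have failed and a backup exists.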

Once I had /var mounted, I was able to look at the messages in the log, which referred to _xfs_buf_find: Block out of range errors. These occurred when logrotate was trying to swap logs around. The system was still writing logs to my main system messages file (I use syslog-ng rather than systemd journal logging) up until the point I power cycled it, so I guess that file had sufficient space allocated without requesting more. As squid had stopped working, along with logins, etc., I expect the /var file system failure was preventing other logs from being opened and/or written (thus the box appeared locked up).

Best I can tell based on the SMART results (I have smartd running) and lack of any other kernel warnings of disk failures, this does not appear to have been a disk failure. The Call Trace in the Ubuntu bug listed a grow_inode call, which was not present in the Call Trace from my crash, so while they both list the block out of range fault, they may not be related…

Not sure if this is an XFS bug, bad karma, etc, but figured I’d share what I have.

Here’s the dump information from one entry. There were a total of six dumps within several seconds, all referencing the same PID (logrotate).

Oct 29 01:00:05 shadows kernel: XFS (dm-9): _xfs_buf_find: Block out of range: block 0x7fffffff8, EOFS 0x1000000
Oct 29 01:00:05 shadows kernel: [665260.471535] XFS (dm-9): _xfs_buf_find: Block out of range: block 0x7fffffff8, EOFS 0x1000000
Oct 29 01:00:05 shadows kernel: [665260.471581] ------------[ cut here ]------------
Oct 29 01:00:05 shadows kernel: [665260.471626] WARNING: CPU: 3 PID: 4863 at …/fs/xfs/xfs_buf.c:473 _xfs_buf_find+0x2a1/0x2f0 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471627] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop af_packet iscsi_ibft iscsi_boot_sysfs joydev hid_generic usbhid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl snd_hda_intel snd_hda_controller i915 snd_hda_codec snd_hda_core x86_pkg_temp_thermal snd_hwdep intel_powerclamp coretemp video snd_pcm drm_kms_helper iTCO_wdt iTCO_vendor_support snd_timer gpio_ich snd soundcore i2c_i801 drm mei_me kvm mei e1000e lpc_ich ptp mfd_core pps_core serio_raw i2c_algo_bit crct10dif_pclmul ppdev parport_pc tpm_tis tpm 8250_fintek pcspkr processor parport wmi button crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd xfs libcrc32c crc32c_intel sr_mod cdrom ehci_pci ehci_hcd usbcore usb_common dm_mod sg
Oct 29 01:00:05 shadows kernel: [665260.471680] CPU: 3 PID: 4863 Comm: logrotate Not tainted 4.1.31-30-default #1
Oct 29 01:00:05 shadows kernel: [665260.471682] Hardware name: LENOVO 7005AK8/ , BIOS 9HKT46AUS 12/15/2011
Oct 29 01:00:05 shadows kernel: [665260.471684] 0000000000000286 0000000000000000 ffffffff8165ef0d 0000000000000000
Oct 29 01:00:05 shadows kernel: [665260.471687] 0000000000000000 ffffffffa016c24c ffffffff81068961 ffff88042a2b3340
Oct 29 01:00:05 shadows kernel: [665260.471689] 0000000000000008 00000007fffffff8 0000000000000000 0000000000000001
Oct 29 01:00:05 shadows kernel: [665260.471692] Call Trace:
Oct 29 01:00:05 shadows kernel: [665260.471704] [<ffffffff810055cc>] dump_trace+0x8c/0x340
Oct 29 01:00:05 shadows kernel: [665260.471709] [<ffffffff8100597c>] show_stack_log_lvl+0xfc/0x1a0
Oct 29 01:00:05 shadows kernel: [665260.471712] [<ffffffff81006ec1>] show_stack+0x21/0x50
Oct 29 01:00:05 shadows kernel: [665260.471717] [<ffffffff8165ef0d>] dump_stack+0x5d/0x79
Oct 29 01:00:05 shadows kernel: [665260.471722] [<ffffffff81068961>] warn_slowpath_common+0x81/0xb0
Oct 29 01:00:05 shadows kernel: [665260.471747] [<ffffffffa012ab91>] _xfs_buf_find+0x2a1/0x2f0 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471773] [<ffffffffa012ac07>] xfs_buf_get_map+0x27/0x2c0 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471800] [<ffffffffa0158871>] xfs_trans_get_buf_map+0x131/0x1e0 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471826] [<ffffffffa0102abc>] xfs_btree_get_bufs+0x4c/0x60 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471845] [<ffffffffa00ebaa9>] xfs_alloc_fix_freelist+0x179/0x410 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471863] [<ffffffffa00ec4e8>] xfs_free_extent+0x88/0x110 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471886] [<ffffffffa0126d27>] xfs_bmap_finish+0x137/0x190 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471912] [<ffffffffa013e174>] xfs_itruncate_extents+0x184/0x330 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471935] [<ffffffffa013e3aa>] xfs_inactive_truncate+0x8a/0x110 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471957] [<ffffffffa013f218>] xfs_inactive+0x128/0x150 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471964] [<ffffffff811f9e00>] evict+0xb0/0x170
Oct 29 01:00:05 shadows kernel: [665260.471968] [<ffffffff811f5b70>] __dentry_kill+0x170/0x1e0
Oct 29 01:00:05 shadows kernel: [665260.471973] [<ffffffff811f5d66>] dput+0x186/0x240
Oct 29 01:00:05 shadows kernel: [665260.471982] [<ffffffff811e0bc0>] __fput+0x150/0x1c0
Oct 29 01:00:05 shadows kernel: [665260.471987] [<ffffffff81085057>] task_work_run+0xa7/0xe0
Oct 29 01:00:05 shadows kernel: [665260.471991] [<ffffffff81002f59>] do_notify_resume+0x69/0x90
Oct 29 01:00:05 shadows kernel: [665260.471997] [<ffffffff816658c1>] int_signal+0x12/0x17
Oct 29 01:00:05 shadows kernel: [665260.472004] [<00007f4b210f52d0>] 0x7f4b210f52d
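
To put the two hex values from the first line of the dump side by side, here is a small sketch. The function name out_of_range is made up for illustration. The size estimate assumes that both numbers count 512-byte units, which is my reading and is not stated anywhere in the log.

```shell
#!/bin/sh
# Values copied from the kernel message above.
BLOCK=0x7fffffff8
EOFS=0x1000000

# Succeeds (exit status 0) when block number $1 is at or beyond end-of-filesystem $2.
out_of_range() {
    [ $(($1)) -ge $(($2)) ]
}

echo "block=$(($BLOCK)) eofs=$(($EOFS))"

if out_of_range "$BLOCK" "$EOFS"; then
    echo "requested block lies past the end of the filesystem"
fi

# Assumption: both values count 512-byte units, which would make the filesystem 8 GiB.
echo "filesystem size: $(( $EOFS * 512 / 1024 / 1024 / 1024 )) GiB"
```

The requested block is roughly two thousand times larger than the end-of-filesystem value, which looks more like a bad block number being computed than a slightly oversized request.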

Hmmm . . .
Looking at your report and the Ubuntu Bug Report, it could be that we have a Leap 42.1 Kernel issue with respect to XFS here.

It may be a good idea to raise a Bug Report <https://bugzilla.opensuse.org/> containing everything that you’ve found.

The Ubuntu folks are suggesting that the latest Kernel version may alleviate this issue, but that Kernel only appears in the openSUSE distribution with Leap 42.2, which is still in the Release Candidate testing phase.

Posted bug as requested.

Bugzilla – Bug 1008107
Potential XFS Kernel bug - _xfs_buf_find: Block out of range