Thread: Tumbleweed after Update to Kernel 3.17.1-52.1 has corrupted Btrfs Root Partition

  1. #1
    Join Date
    Jun 2008
    Location
    UK
    Posts
    5,500

    Earlier this week my Tumbleweed installation, whose root partition has been on btrfs from openSUSE 12.3 through 13.1, finally failed after the update "kernel-desktop-3.17.1-52.1.g5c4d099-x86_64" did its worst. This is the first major Tumbleweed or btrfs failure I've had, and unfortunately it happens close to the end of Tumbleweed's current life cycle.

    Previously, with kernel-desktop-3.17.1-51.1, the system ran trouble-free for several hours, with no relevant error messages in /var/log/messages. After rebooting into the updated kernel (3.17.1-52.1), startup seemed to proceed normally right through to the KDE4 desktop, but GUI applications such as YaST, Dolphin and web browsers were all unusable: desktop error messages reported that system files were not writable. In other words, the root file system (including /home) had become read-only. Command-line access through Konsole or a tty was possible but limited to query or display commands, e.g. rpm -q, zypper search or listing repos, whereas zypper remove failed and zypper refresh failed repo by repo. The system was effectively rendered useless and unmaintainable!
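
A quick way to confirm the read-only state from a terminal is to look at the mount options recorded for / in /proc/mounts. The following is only a minimal sketch, assuming a POSIX shell with awk; the function name mount_state and its optional file argument are made up for illustration and are not part of any openSUSE tool.

```shell
# Print "ro" or "rw" for a mount point, based on a /proc/mounts-style table.
# Usage: mount_state MOUNTPOINT [TABLE_FILE]   (TABLE_FILE defaults to /proc/mounts)
# Prints nothing if the mount point is not listed.
mount_state() {
    awk -v mp="$1" '
        $2 == mp {                      # field 2 is the mount point
            n = split($4, opts, ",")    # field 4 is the comma-separated option list
            state = "rw"
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") state = "ro"
        }
        # If a mount point is listed more than once, the last entry wins.
        END { if (state != "") print state }
    ' "${2:-/proc/mounts}"
}
```

Running "mount_state /" on a system in the state described above would be expected to print "ro", while a healthy system prints "rw".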

    Mounting the btrfs partition from a standard openSUSE 13.1 system made investigation easier using its KDE4 desktop, but superuser Dolphin and similar tools still could not get any write access to the partition. Direct chroot access just confirmed the read-only status.

    /var/log/messages contained many entries like this one after the initial "BTRFS info" message:
    Code:
    ...kernel: [   22.259911] BTRFS info (device sda8): disk space caching is enabled
    ...kernel: [   23.318003] parent transid verify failed on 949858304 wanted 186937 found 186939
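
To get a feel for how widespread these messages are, the log can be summarised rather than read line by line. This is a rough sketch, assuming the messages have exactly the wording shown above and that the log lives in /var/log/messages; the function names count_transid_errors and transid_blocks are hypothetical.

```shell
# Count "parent transid verify failed" lines in a kernel log file.
# Usage: count_transid_errors [LOGFILE]   (defaults to /var/log/messages)
count_transid_errors() {
    grep -c 'parent transid verify failed' "${1:-/var/log/messages}"
}

# List the distinct block addresses named in those lines, in numeric order.
# Usage: transid_blocks [LOGFILE]
transid_blocks() {
    sed -n 's/.*parent transid verify failed on \([0-9][0-9]*\) wanted.*/\1/p' \
        "${1:-/var/log/messages}" | sort -un
}
```

A small number of distinct addresses repeated many times would suggest a few damaged metadata blocks rather than damage spread across the whole file system, though that is only an inference from the log, not a diagnosis.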
    Running "btrfsck /dev/sda8" (from 13.1) on the unmounted partition reported many errors. However, 13.1 doesn't have the latest version of btrfsprogs. Since my Tumbleweed partition holds no important user data, I took the last resort and ran btrfsck --repair. It eventually aborted with this:
    Code:
    Extent back ref already exists for 998006784 parent 24822198272 root 0 
    Well this shouldn't happen, extent record overlaps but is metadata? [998006784, 4096]
    Aborted
    I have since found a relevant bug report at http://bugzilla.opensuse.org/show_bug.cgi?id=897774, and a thread, somewhat strangely posted in our Applications forum, at https://forums.opensuse.org/showthre...ns-to-readonly.

    Apparently this was all a known issue for kernel 3.17 and read-only snapshots. It's certainly catastrophic for users of btrfs and snapper. Rebooting with previous kernels e.g. 3.16.3 doesn't solve it.

    I still have the corrupted Tumbleweed installed, in case anyone can suggest a repair. Otherwise I will have to reinstall it, probably over the 13.2 release.

    "The Tumbleweed is dead. Long live the Tumbleweed (to be regenerated on 4 November)"!
    Leap 42.3 (ext4, KDE Plasma 5.8.7) ~ stable
    Manjaro (ext4, Xfce) ~ rolling updates
    Tumbleweed (ext4, KDE Plasma5) ~ managed updates via "Tumbleweed Snapshots" service.

  2. #2
    Join Date
    Jul 2008
    Location
    Toronto, Canada
    Posts
    1,253

    Ref: http://bugzilla.opensuse.org/show_bug.cgi?id=897774
    Copied from David Sterba's response.

    This was caused by a patch in 3.17 and the error is persistent on the image. There's a fsck fix on the way.

    It should be resolved in time for the release of openSUSE 13.2
    My Linux Box
    OS:
    openSUSE 51.1 - Plasma 5.12.8
    OS:
    Tumbleweed Plasma 5.16.4
    ASUS P5Q | Intel Quad 6600 @3.02 GHz | 8GB G.SKILL RAM | Nvidia GeForce 750 Ti

  3. #3
    Join Date
    Jun 2008
    Location
    UK
    Posts
    5,500

    Quote (originally posted by Romanator):
      Ref: http://bugzilla.opensuse.org/show_bug.cgi?id=897774
      Copied from David Sterba's response.

      This was caused by a patch in 3.17 and the error is persistent on the image. There's a fsck fix on the way.

      It should be resolved in time for the release of openSUSE 13.2

    That's the same link I already provided, so it's safe to assume I also read the comments in the bug report.

    It doesn't need to be resolved in time for 13.2, which is releasing with kernel 3.16.6 and so shouldn't have the offending patch. Kulow's comments on the Factory mailing list, at http://lists.opensuse.org/opensuse-f.../msg00622.html, confirm that kernel version.

    Many btrfs updates were on the way for 3.17, but apparently they didn't make it in time and will be merged into 3.18 instead; see this from Phoronix: http://www.phoronix.com/scan.php?pag...tem&px=MTc2MzU.

    I somehow doubt the fix will actually repair any corruption of extents on the file system (as seen by btrfsck), but who knows?

  4. #4
    Join Date
    Jul 2008
    Location
    Toronto, Canada
    Posts
    1,253

    Quote (originally posted by consused):
      That's the same link I already provided, so it's safe to assume I also read the comments in the bug report.

      It doesn't need to be resolved in time for 13.2, which is releasing with kernel 3.16.6 and so shouldn't have the offending patch. Kulow's comments on the Factory mailing list, at http://lists.opensuse.org/opensuse-f.../msg00622.html, confirm that kernel version.

      Many btrfs updates were on the way for 3.17, but apparently they didn't make it in time and will be merged into 3.18 instead; see this from Phoronix: http://www.phoronix.com/scan.php?pag...tem&px=MTc2MzU.

    That's what happens when you don't submit your patches on time. Linus has to be strict about the dates because of the large number of people contributing code.

    Quote (originally posted by consused):
      I somehow doubt the fix will actually repair any corruption of extents on the file system (as seen by btrfsck), but who knows?

    Argh! A lot of people (including me) are trying out btrfs for the first time. Let's hope it doesn't happen to them before 3.18 is available for download.

    Check out this link: http://www.spinics.net/lists/linux-btrfs/msg38372.html
    The author suggests switching to writable snapshots instead of read-only snapshots.

    Maybe the openSUSE kernel devs can backport the fsck patch.

  5. #5
    Join Date
    Jun 2008
    Location
    UK
    Posts
    5,500

    Quote (originally posted by Romanator):
      Argh! A lot of people (including me) are trying out btrfs for the first time. Let's hope it doesn't happen to them before 3.18 is available for download.

      Check out this link: http://www.spinics.net/lists/linux-btrfs/msg38372.html
      The author suggests switching to writable snapshots instead of read-only snapshots.

    Yes, it is a very serious bug affecting btrfs users who blunder into the 3.17 kernel, but fortunately it doesn't affect those on standard 13.1 and hopefully won't affect the new standard release.

    I had seen that "linux-btrfs" link in the other thread (in Applications), but the workarounds are only useful on a system where the read-only state of the file system has already been fixed. Without that, the only part of Tumbleweed I have write access to is /boot, separately partitioned as ext2. I could manually delete the 3.17 kernel(s), but that won't solve anything unless I can regenerate the root partition's btrfs, either from a previously cloned image (>20GB) or, as in my case, by reinstalling.

    On the other hand, the author's belief that writable snapshots don't trigger the issue could imply that any real corruption is limited to the read-only snapshots in /.snapshots. I had already noticed some more recent snapshots there than those last reported by "snapper list". I need to investigate the timing and content of those differences. That may also pinpoint exactly when the problem occurred, e.g. after the first 3.17.1 update rather than the second. Also, /var/log/snapper.log has no entries after rebooting from the second 3.17.1 update.

  6. #6
    Join Date
    Jun 2008
    Location
    UK
    Posts
    5,500

    Quote (originally posted by Romanator):
      Maybe the openSUSE kernel devs can backport the fsck patch.

    Well, even if they do, I have already run "btrfsck --repair", which could mean the patched fsck can't fix my corrupted snapshots, as stated in the last posting of the thread you linked to. However, there were no indications from btrfsck that it made any changes before aborting, so anything is possible.

    I've now located corrupted snapshots on the file system under /.snapshots, named 3688, and 3691 through 3697. All were created on the system while running the 3.17.1-51.1 kernel. They are best viewed as directories under /.snapshots from the command line, for example:
    Code:
    /.snapshots # ls -l 3688
    ls: cannot access 3688/snapshot: Cannot allocate memory
    total 356
    -rw------- 1 root root 356950 Oct 20 18:36 filelist-3687.txt
    -rw------- 1 root root    187 Oct 20 18:33 info.xml
    d????????? ? ?    ?         ?            ? snapshot
    The "snapshot" directory with the "?" marks identifies corruption (whereas Dolphin does not provide that clue, with no message/directory displayed). Compare that to a normal one (3687) created a few minutes before the now corrupted 3688:
    Code:
    /.snapshots # ls -l 3687
    total 8
    -rw------- 1 root root 202 Oct 20 18:30 info.xml
    drwxr-xr-x 1 root root 186 Oct 20 18:14 snapshot
    In fact 3687 is the "pre" snapshot of an update to "bash" (via zypper dup) and the missing one, 3688, is the "post" snapshot (it has the additional filelist to show changed files).

    I say "missing snapshot" because when I now run "snapper list" on the system updated with 3.17.1-52.1, it fails to list snapshots 3688, and 3691 through 3697, as they are all corrupted with an inaccessible "snapshot" directory.

    I hope this provides a useful illustration of what to look for if anyone else has similar problems in future.
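
For anyone who wants to scan all snapshots at once instead of listing them one by one, the check above can be looped. This is just a sketch under the assumption that a damaged snapshot shows up as a "snapshot" entry that cannot even be stat'ed, as in the 3688 listing; a snapshot directory that is simply missing is reported too. The function name find_unreadable_snapshots is made up for the example.

```shell
# Print the numbers of snapshots whose "snapshot" entry cannot be accessed.
# Usage: find_unreadable_snapshots [SNAPSHOT_ROOT]   (defaults to /.snapshots)
find_unreadable_snapshots() {
    root="${1:-/.snapshots}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue       # skip if the glob matched nothing
        # "ls -d" only needs to stat the entry; for a damaged snapshot
        # that already fails, which is what produces the "?" columns.
        if ! ls -d "${dir}snapshot" >/dev/null 2>&1; then
            basename "$dir"
        fi
    done
}
```

Run as root (the directories are not world-readable), it prints one snapshot number per line; on my system I would expect 3688 and 3691 through 3697.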

  7. #7
    Join Date
    Jun 2008
    Location
    UK
    Posts
    5,500

    Quote (originally posted by Romanator):
      Maybe the openSUSE kernel devs can backport the fsck patch.

    I've just seen a recent bug report about 3.17.1/btrfs for Arch Linux at https://bugs.archlinux.org/task/4256...ened&sort=desc, with comments about a potential workaround/fix using btrfsprogs 3.17, but that version is not in Tumbleweed yet and must first appear in Factory. A new comment has appeared in openSUSE's bug report, also requesting btrfsprogs 3.17.

    The Arch Linux report comments on a fix in kernel 3.18-rc2.

  8. #8
    Join Date
    Jun 2008
    Location
    UK
    Posts
    5,500

    Quote (originally posted by consused):
      I've just seen a recent bug report about 3.17.1/btrfs for Arch Linux at https://bugs.archlinux.org/task/4256...ened&sort=desc, with comments about a potential workaround/fix using btrfsprogs 3.17

    It seems clearer now that this bug is fixed in kernel 3.17.2, or it will be once that kernel arrives in Factory, and more specifically in the new Tumbleweed. That Arch Linux report now has a comment at the end, and there is also this external thread at http://www.spinics.net/lists/linux-btrfs/msg38877.html. It confirms that "Data corruption occurs when creating RO snapshots", and it seems to only affect the snapshots.

    For repairing my old Tumbleweed: btrfsprogs 3.17 will first be needed to correct the metadata, and kernel-desktop 3.17.2 to avoid the issue again. Neither is in Factory as of today, although 3.17.2 is in Kernel:stable (OBS, 1-click). We wait in anticipation...

  9. #9

    I just fixed my Tumbleweed installation that was affected by this read-only snapshot bug. Here's how I did it:

    1. Note: I've heard this only works if you haven't yet run the btrfs repair command. Keep your backups at the ready.
    2. Upgrade to kernel-desktop 3.17.2 and btrfsprogs 3.17 via software.opensuse.org.
    3. Note: I didn't subscribe to their associated repos. The key is that from now on you don't downgrade to the old buggy versions.
    4. Reboot and confirm both upgrades are running/active. Hint: use
      Code:
      uname -r
      in the terminal. Also check YaST's software manager.
    5. Grab yourself another Linux system running kernel 3.17.2 and btrfsprogs 3.17. This is necessary because the btrfs command we will run cannot be performed on a mounted filesystem.
    6. I used a live USB of ArchBang from November 3rd. It came with Linux 3.17.2, and once booted I used
      Code:
      sudo pacman -Syyu
      to bring the system up to date and
      Code:
      sudo pacman -S btrfs-progs
      to upgrade btrfs-progs to version 3.17.
    7. From your extra Linux system that is now also fully upgraded, run
      Code:
      sudo btrfs check --repair /dev/sdxy
      with x and y corresponding to your corrupted root btrfs partition. (Encryption obviously makes this more complicated, but I managed to stumble through it. You can too.)
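
Since the steps above depend on both systems being at kernel 3.17.2 and btrfs tools 3.17 or later, a small pre-flight check may save a wasted repair attempt. This is a hedged sketch, not part of the procedure I actually ran: it assumes a "sort" that supports -V (GNU coreutils does) and that "btrfs --version" prints a string containing the version number; the function names are made up.

```shell
# True (exit status 0) if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Pull the first dotted version number out of a string such as "Btrfs v3.17".
extract_version() {
    printf '%s\n' "$1" | sed -n 's/[^0-9]*\([0-9][0-9.]*\).*/\1/p'
}

# Compare the running kernel and the installed btrfs tools against the
# minimum versions named in the steps above (3.17.2 and 3.17).
check_repair_prereqs() {
    kernel=$(extract_version "$(uname -r)")
    progs=$(extract_version "$(btrfs --version 2>/dev/null)")
    version_ge "$kernel" 3.17.2 || { echo "kernel $kernel is older than 3.17.2"; return 1; }
    [ -n "$progs" ] || { echo "btrfs tools not found"; return 1; }
    version_ge "$progs" 3.17 || { echo "btrfs tools $progs are older than 3.17"; return 1; }
    echo "kernel $kernel and btrfs tools $progs look new enough"
}
```

Calling check_repair_prereqs on the rescue system before step 7, and again on the repaired Tumbleweed after rebooting, covers both places where an old version could sneak back in.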


    With any luck, you can boot into your Tumbleweed install and the corrupted snapshots will be gone. The snapper GUI and its command-line equivalent should both look nice and tidy. If this isn't the case, or if you have experienced other file system corruption, you might need to nuke and pave. Remember those backups from earlier?

    Now you need to keep kernel 3.17.2 and btrfsprogs 3.17 around until the Tumbleweed repos match or exceed those versions. Of course, it would have been a little easier to wait for the Tumbleweed repos to update to those versions, but where's the fun in that?

  10. #10

    Looks like I missed the 10min edit window.

    After running the btrfs check command, it should report 0 errors. If it reports 8 or any other non-zero number, that is not good and needs to be looked at.
