BTRFS file system problem -- unable to boot, unable to mount -> stuck!

Dear all,

My laptop has been unusable since yesterday evening because of a file system problem.

Here is the problem. I boot into my Leap 42.1, and I get

btrfs (/dev/sda2) parent transid verify failed on 109973766144 wanted 31781 found 31780
btrfs (/dev/sda2) parent transid verify failed on 109973766144 wanted 31781 found 31780
btrfs (/dev/sda2) parent transid verify failed on 109973766144 wanted 31781 found 31780
btrfs (/dev/sda2) parent transid verify failed on 109973766144 wanted 31781 found 31780

(…and a few more times). Then the logo appears, and it loads but never gets anywhere: the loading stops at


[ OK ] Reached Target Initrd File Systems
[ OK ] Reached Target Initrd Default Target

and that’s it: it does not move on from there.

The malfunction could be due to one of two things that happened just before the problem appeared (I am not sure which):

  • I installed google-drive-ocamlfuse (on an ext4 filesystem) -> maybe a stupid idea?
  • I had a kernel panic during a system update, so the system rebooted

But whatever the reason, the problem is that now I cannot boot.

Solutions tried so far:

  1. Burned Leap to a USB key, booted from the key, went into the rescue system and tried to mount the former root (/) filesystem, in order to apply one of the fixes found on the web for btrfs journal/filesystem mismatches -> NO: the root filesystem does not mount, reporting a general I/O error.
  2. Tried a fresh install of Leap (yes, I would lose all the data, but I have it all in the cloud) -> I can install only on /dev/sda3 and higher-numbered partitions; /dev/sda2 is locked and the partitioner will not touch it in any way, no matter what.
  3. Booted a live USB with Lubuntu and tried to chroot into the /dev/sda2 partition -> apparently I can mount it, but as soon as the command is entered Lubuntu freezes solid and I cannot do anything on the partition.

Any hints would be much appreciated, since I have to travel next week and I would be without a laptop…
Please note: at this point I do not even care about the data, and I would be happy with a fresh install that wipes everything. The point is that I cannot even do that, since I would not be able to use /dev/sda2 anyway, which means losing about 40% of the drive space without solving the problem.

thanks!

P

Did you use smartctl to see if the drive is damaged?

Nope. Do you think I can do it from the Lubuntu live USB? I will try and report the results.

Sure, just point it at the drive.

Ok, the disk does not seem damaged.

smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-21-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SanDisk based SSDs
Device Model:     SanDisk SSD U100 128GB
Serial Number:    121283300402
LU WWN Device Id: 5 001b44 748a14832
Firmware Version: 10.01.04
User Capacity:    128.035.676.160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      1.8 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jun 17 14:34:44 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (  120) seconds.
Offline data collection
capabilities:              (0x11) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  32) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       1
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       5388
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       3980
171 Program_Fail_Count      0x0002   100   100   000    Old_age   Always       -       1
172 Erase_Fail_Count        0x0002   100   100   000    Old_age   Always       -       0
173 Avg_Write/Erase_Count   0x0002   100   100   000    Old_age   Always       -       83
174 Unexpect_Power_Loss_Ct  0x0002   100   100   000    Old_age   Always       -       274
187 Reported_Uncorrect      0x0002   100   100   000    Old_age   Always       -       0
230 Perc_Write/Erase_Count  0x0002   100   100   000    Old_age   Always       -       276
232 Perc_Avail_Resrvd_Space 0x0003   096   100   005    Pre-fail  Always       -       0
234 Perc_Write/Erase_Ct_BC  0x0002   100   100   000    Old_age   Always       -       512
241 Total_LBAs_Written      0x0002   100   100   000    Old_age   Always       -       8520655052
242 Total_LBAs_Read         0x0002   100   100   000    Old_age   Always       -       20457265918

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
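
The overall verdict above is PASSED, but a few raw counters in the attribute table are not zero. As a minimal sketch (not part of smartmontools), the following Python pulls the attribute rows out of a report in the format shown above and lists the non-zero ones from a watch list. The function names and the choice of attributes in WATCH are assumptions made for illustration, and the sample contains only a handful of rows copied from the report.

```python
def parse_smart_attributes(report):
    """Parse the attribute table (the rows after the 'ID#' header) of a smartctl report."""
    rows = []
    in_table = False
    for line in report.splitlines():
        if line.startswith("ID#"):
            in_table = True
            continue
        if in_table:
            # Ten columns; the raw value is last and may itself contain spaces.
            fields = line.split(None, 9)
            if len(fields) < 10 or not fields[0].isdigit():
                break  # a blank line or the next section ends the table
            rows.append({
                "id": int(fields[0]),
                "name": fields[1],
                "value": int(fields[3]),
                "thresh": int(fields[5]),
                "raw": fields[9].strip(),
            })
    return rows


# Illustrative watch list (an assumption, not an official list of "bad" attributes).
WATCH = {
    "Reallocated_Sector_Ct",
    "Program_Fail_Count",
    "Erase_Fail_Count",
    "Reported_Uncorrect",
    "Unexpect_Power_Loss_Ct",
}


def nonzero_watched(rows):
    """Return {name: raw} for watched attributes whose raw counter is above zero."""
    return {
        r["name"]: int(r["raw"])
        for r in rows
        if r["name"] in WATCH and r["raw"].isdigit() and int(r["raw"]) > 0
    }


# A few rows copied from the report in this thread.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       1
171 Program_Fail_Count      0x0002   100   100   000    Old_age   Always       -       1
172 Erase_Fail_Count        0x0002   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0002   100   100   000    Old_age   Always       -       274
187 Reported_Uncorrect      0x0002   100   100   000    Old_age   Always       -       0
"""

flagged = nonzero_watched(parse_smart_attributes(sample))
print(flagged)
```

Whether any of these counters is related to the btrfs errors is not something the report can answer by itself; the sketch only makes the non-zero values easier to spot.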


Another update: with some tinkering in the partitioner, I can use the whole disk and make a fresh install on the whole machine. Still, the Leap installer does not see the existing mount points (the Lubuntu USB drive does), so it does not allow me to just reset the / partition and keep /home (where all the data sits) untouched. I would like to try that or, even better, fix the btrfs problem and not have to reinstall at all.

Is it possible that mounting the /dev/sda2 partition (30 GB) fails because my pendrive is ~1 GB and I have only 4 GB of RAM? I was wondering whether it gets stuck because mounting somehow copies the partition to the USB key and/or RAM, and of course there is not enough space.

This means some writes have been lost on this disk. The last transaction number recorded on disk is lower than the last transaction number recorded in the journal. The only way to actually fix it is to drop the journal, which means some more writes will be lost. After getting access to the filesystem I would actually revert to the last good snapshot and start over :)
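
To make the numbers in those messages concrete: "wanted" is the transaction (generation) number the parent block expects, and "found" is the one actually read from disk. The following is a small Python sketch, with a made-up function name, that extracts both from messages in the format quoted in this thread and reports how far behind the on-disk block is. The sample lines are copied from the posts.

```python
import re

# Matches the message format quoted in this thread.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def parse_transid_errors(log_text):
    """Return one entry per distinct (block address, wanted, found) triple."""
    counts = {}
    for match in TRANSID_RE.finditer(log_text):
        key = tuple(int(group) for group in match.groups())
        counts[key] = counts.get(key, 0) + 1
    return [
        {
            "bytenr": bytenr,          # logical address of the block
            "wanted": wanted,          # generation the parent expects
            "found": found,            # generation actually on disk
            "behind": wanted - found,  # how many transactions are missing
            "repeats": repeats,        # how often the message appeared
        }
        for (bytenr, wanted, found), repeats in counts.items()
    ]


sample = """\
btrfs (/dev/sda2) parent transid verify failed on 109973766144 wanted 31781 found 31780
btrfs (/dev/sda2) parent transid verify failed on 109973766144 wanted 31781 found 31780
parent transid verify failed on 130367488 wanted 31780 found 31778
"""

for entry in parse_transid_errors(sample):
    print(entry)
```

For the messages in this thread the gap is only one or two transactions, which fits the idea of a few lost writes just before the crash.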

See http://ram.kossboss.com/btrfs-transid-issue-explained-fix/ for some more details and suggestions.

Bottom line: this is either a software bug or a hardware issue. I would recommend opening a bug report to verify that this is not a known software issue. The worst case is that your drive lies to you about having committed data to stable storage. Hopefully it is a software bug.

Hmm, this is an SSD. Generally they either work or they don’t.

I agree with arvidjaar; try his suggestions.

So,

I did what arvidjaar suggested. Strangely enough, I got absolutely no change. This is odd, since the linked site says that this amounts to hitting btrfs with a hammer and should definitely result in something…

So I went on and decided to do a fresh install.

  • The YaST installer proposed not to use the corrupted partition (/dev/sda2) and to install the whole system into the old /home partition (/dev/sda3, not recognized as such). This being a useless waste of space, I overrode the default and asked it to set up three partitions across the whole disk.
  • Then, when the installation was about to start with the partitioning, the installer spat out an error:
Failure occurred during the following action: Deleting partition /dev/sda2
DISK_REMOVE_PARTITION_PARTED_FAILED

error code: -1014
  • Same error for /dev/sda1 and /dev/sda3.
  • I asked YaST to ignore the error and continue, to be able to stare into the abyss, and (of course) I got a further error when it tried to create the new partitions, this time with code -1007.

So it seems that not only can I not use the filesystem, I cannot even delete it!

What I tried next was to reboot from the USB key (I am writing from there now) and work on the btrfs partition /dev/sda2 again to see if I could do it some harm, i.e. delete it completely. At this point I do not even care about the data anymore; it is backed up anyway. But this time fdisk -l does not see the disk at all, and of course if I try btrfsck and other tools they tell me they are not being pointed at a btrfs disk.

How can I delete a corrupted partition to start anew?

thanks!

Me again.

So I succeeded in making the live Lubuntu USB ‘see’ the btrfs partition.

  • btrfsck returns a never-ending loop of errors: the same errors appear to be repeated over and over. I did not succeed in saving the log to a text file (weird).
  • btrfs-zero-log results in this
root@lubuntu:~# btrfs-zero-log /dev/sda2
WARNING: this utility is deprecated, please use 'btrfs rescue zero-log'

Clearing log on /dev/sda2, previous log_root 0, level 0
parent transid verify failed on 256704512 wanted 31781 found 31780
parent transid verify failed on 256704512 wanted 31781 found 31780
Ignoring transid failure
  • btrfs restore also does not work at all, claiming the exact same error / stopping point

parent transid verify failed on 130367488 wanted 31780 found 31778
parent transid verify failed on 130367488 wanted 31780 found 31778
Ignoring transid failure
leaf parent key incorrect 130367488
Error searching -1
Error searching /USB/@/tmp/gpg-Wq6Duw
Error searching /USB/@/tmp/gpg-Wq6Duw
Error searching /USB/@/tmp/gpg-Wq6Duw
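
On the log that would not go to a text file: one plausible explanation (an assumption, not verified on this system) is that the errors are written to standard error, which a plain `>` redirect does not capture. Below is a minimal Python sketch that sends both output streams of a command into one file. The helper name run_and_log is made up, and the demonstration uses a harmless child process that only prints a line to standard error instead of a real btrfs tool.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(argv, log_path):
    """Run argv with stdout and stderr both written to log_path; return the exit code."""
    with open(log_path, "w") as log:
        proc = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


# Stand-in for the real tool: a child process that writes only to stderr.
demo = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('parent transid verify failed\\n')",
]

with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "check.log")
    code = run_and_log(demo, log_file)
    with open(log_file) as fh:
        captured = fh.read()

print(code, repr(captured))
```

For the real case the argument list would be something like `["btrfs", "check", "/dev/sda2"]`, run as root from the live system. The shell equivalent is to append `2>&1` after the `> logfile` redirect.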

…and, finally:

I can now mount the filesystem from the Lubuntu USB key. It mounts fine, and I can access the files etc. But the problem with the installer remains if I want to wipe the partition.

And what did you do to achieve it?

I followed the advice in your link and mounted with ro,recovery options.
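
For anyone following along, the mount described here can be written out as below. This is only a sketch: the helper name and the /mnt mount point are assumptions, and the function just builds the command without running it. Newer kernels call the same option usebackuproot, so the option name is a parameter.

```python
def recovery_mount_argv(device, mountpoint, option="recovery"):
    """Build (but do not run) a read-only btrfs mount command line."""
    return ["mount", "-t", "btrfs", "-o", f"ro,{option}", device, mountpoint]


# /mnt is a hypothetical mount point; /dev/sda2 is the partition from this thread.
print(" ".join(recovery_mount_argv("/dev/sda2", "/mnt")))
print(" ".join(recovery_mount_argv("/dev/sda2", "/mnt", option="usebackuproot")))
```

Mounting read-only means nothing is written to the damaged filesystem while files are copied off it.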

Still, I cannot do anything with it: it does not boot and it cannot be deleted. Do you think it is safe to do a memory cell clearing?

P

I do not know what “memory cell clearing” is. But can you mount it normally from a live distro? What errors do you get when you boot?

Deleting a partition should not involve the file system or the partition's data area at all; it happens in the partition table. Not being able to modify the partition table indicates a hardware problem. Perhaps the drive has run out of spare space for swapping modified blocks. IMHO I would no longer trust that drive.
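
To illustrate the point about the partition table, here is a minimal Python sketch that assumes a classic MBR layout (the disk in this thread may well use GPT instead, which is laid out differently). It works on an in-memory copy of a 512-byte first sector and never touches a real device. The partition sizes in the example are made up, and the helper names are hypothetical.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446        # the four 16-byte partition entries start here
ENTRY_SIZE = 16
SIGNATURE = b"\x55\xaa"   # boot signature in bytes 510-511
ENTRY_FORMAT = "<B3xB3xII"  # status, (CHS), type, (CHS), start LBA, sector count


def read_entries(sector):
    """Decode the four primary partition entries of an MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not an MBR sector")
    entries = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        _status, ptype, lba, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE]
        )
        entries.append({"type": ptype, "start": lba, "sectors": count})
    return entries


def delete_entry(sector, index):
    """Return a copy of the sector with partition entry `index` (0-3) zeroed."""
    if not 0 <= index < 4:
        raise IndexError("MBR has only four primary entries")
    buf = bytearray(sector)
    start = TABLE_OFFSET + index * ENTRY_SIZE
    buf[start:start + ENTRY_SIZE] = bytes(ENTRY_SIZE)
    return bytes(buf)


def make_entry(ptype, lba, count):
    """Pack one partition entry (CHS fields left at zero)."""
    return struct.pack(ENTRY_FORMAT, 0, ptype, lba, count)


# Made-up three-partition layout: swap, root, home.
disk = (
    bytes(TABLE_OFFSET)
    + make_entry(0x82, 2048, 4194304)
    + make_entry(0x83, 4196352, 62914560)
    + make_entry(0x83, 67110912, 180000000)
    + bytes(ENTRY_SIZE)
    + SIGNATURE
)

after = delete_entry(disk, 1)  # "delete" the second partition
print(read_entries(after))
```

Only 16 bytes in the first sector change; the blocks that hold the filesystem are not read or written at all. That is why a damaged btrfs should not, in principle, be able to block the deletion.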