KDE, error on bootup, BTRFS corrupted

When I boot the Linux system, I get a message suggesting I try these commands.

btrfs check --readonly /dev/sda1

btrfs check --repair /dev/sda1

Both of these commands show many pages of errors. At the end, the repair command prints “well, this shouldn’t happen”.

I can’t get my flash drive to mount, so I can’t copy a file with the command output onto it.
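If the rescue shell can see the stick at all, one workaround is to mount it by hand and redirect the check output to a file on it. This is only a minimal sketch; the device name /dev/sdb1 and the mount point are assumptions, so check lsblk first:

# find the flash drive's device name
lsblk
# mount it somewhere temporary (assuming it shows up as /dev/sdb1)
mkdir -p /mnt/usb
mount /dev/sdb1 /mnt/usb
# capture everything the read-only check prints
btrfs check --readonly /dev/sda1 2>&1 | tee /mnt/usb/btrfs-check.log
umount /mnt/usb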

btrfsck /dev/sda1 --repair

I can’t find the original openSUSE Leap 42.3 DVD, so I will download and burn another copy and then try mounting the flash drive again, so that I can use the DVD’s boot and rescue features. I printed out several pages, including this command, which won’t work under this boot setup.

badblocks -n /dev/sda1

The hard drive may have bad blocks that the openSUSE installer DVD isn’t detecting.
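For the record, badblocks in -n (non-destructive read-write) mode must be run on an unmounted partition, and it is worth logging the result. A minimal sketch, assuming the partition is unmounted and a USB stick is mounted at /mnt/usb (both assumptions):

# -n: non-destructive read-write test, -s: show progress, -v: verbose,
# -o: write the list of bad blocks to a file
badblocks -nsv -o /mnt/usb/badblocks-sda1.txt /dev/sda1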

I need help here. FYI, this is the third time this year that a Linux boot has failed.

I used the openSUSE DVD’s rescue system to get this data. Can someone help interpret what it means? This is the second time BTRFS has been corrupted.

btrfsck /dev/sda3 --readonly

checking extents
parent transid verify failed on 14842216448 wanted 1035 found 950
parent transid verify failed on 14842216448 wanted 1035 found 950
parent transid verify failed on 14842216448 wanted 1035 found 950
parent transid verify failed on 14842216448 wanted 1035 found 950
Ignoring transid failure
Extent back ref already exists for 12042878976 parent 12542935040 root 0 
Extent back ref already exists for 12042878976 parent 12486524928 root 0 
Extent back ref already exists for 12042878976 parent 12482904064 root 0 
Extent back ref already exists for 12042878976 parent 12354355200 root 0 
Extent back ref already exists for 12042878976 parent 12276563968 root 0 
Extent back ref already exists for 12042878976 parent 12261195776 root 0 
Extent back ref already exists for 12042895360 parent 12542935040 root 0 
Extent back ref already exists for 12042895360 parent 12486524928 root 0 
Extent back ref already exists for 12042895360 parent 12482904064 root 0 
Extent back ref already exists for 12042895360 parent 12354355200 root 0 
Extent back ref already exists for 12042895360 parent 12276563968 root 0 
Extent back ref already exists for 12042895360 parent 12261195776 root 0 
Extent back ref already exists for 12042911744 parent 12542935040 root 0 

Repeating…

parent transid verify failed on 14913044480 wanted 1035 found 946
parent transid verify failed on 14913044480 wanted 1035 found 946
parent transid verify failed on 14913044480 wanted 1035 found 946
parent transid verify failed on 14913044480 wanted 1035 found 946
Ignoring transid failure
parent transid verify failed on 14913060864 wanted 1035 found 946
parent transid verify failed on 14913060864 wanted 1035 found 946
parent transid verify failed on 14913060864 wanted 1035 found 946
parent transid verify failed on 14913060864 wanted 1035 found 946
Ignoring transid failure
parent transid verify failed on 14913077248 wanted 1035 found 946
parent transid verify failed on 14913077248 wanted 1035 found 946
parent transid verify failed on 14913077248 wanted 1035 found 946
parent transid verify failed on 14913077248 wanted 1035 found 946
Ignoring transid failure

parent transid verify failed on 14925119488 wanted 1035 found 951
parent transid verify failed on 14925119488 wanted 1035 found 951
parent transid verify failed on 14925135872 wanted 1035 found 951
parent transid verify failed on 14925135872 wanted 1035 found 951
parent transid verify failed on 14929018880 wanted 1035 found 951
parent transid verify failed on 14929018880 wanted 1035 found 951
parent transid verify failed on 14929018880 wanted 1035 found 951
parent transid verify failed on 14929018880 wanted 1035 found 951
Ignoring transid failure
parent transid verify failed on 14929543168 wanted 1035 found 951
parent transid verify failed on 14929543168 wanted 1035 found 951
parent transid verify failed on 14929559552 wanted 1035 found 951
parent transid verify failed on 14929559552 wanted 1035 found 951
parent transid verify failed on 14913060864 wanted 1035 found 946
Ignoring transid failure
parent transid verify failed on 14913077248 wanted 1035 found 946
Ignoring transid failure
parent transid verify failed on 14913044480 wanted 1035 found 946
Ignoring transid failure

leaf parent key incorrect 14913044480
parent transid verify failed on 14913044480 wanted 1035 found 946
Ignoring transid failure
leaf parent key incorrect 14913044480
parent transid verify failed on 14913060864 wanted 1035 found 946
Ignoring transid failure
leaf parent key incorrect 14913060864
parent transid verify failed on 14913060864 wanted 1035 found 946
Ignoring transid failure
leaf parent key incorrect 14913060864
parent transid verify failed on 14913060864 wanted 1035 found 946
Ignoring transid failure
leaf parent key incorrect 14913060864
parent transid verify failed on 14913077248 wanted 1035 found 946
Ignoring transid failure

parent transid verify failed on 14913093632 wanted 1035 found 946
Ignoring transid failure
parent transid verify failed on 14913077248 wanted 1035 found 946
Ignoring transid failure
parent transid verify failed on 14913093632 wanted 1035 found 946
Ignoring transid failure
parent transid verify failed on 14913077248 wanted 1035 found 946
Ignoring transid failure


leaf parent key incorrect 14913093632
parent transid verify failed on 14913142784 wanted 1035 found 946
Ignoring transid failure


parent transid verify failed on 14929543168 wanted 1035 found 951
parent transid verify failed on 14929543168 wanted 1035 found 951
parent transid verify failed on 14929543168 wanted 1035 found 951
parent transid verify failed on 14929543168 wanted 1035 found 951

ref mismatch on [11239489536 8192] extent item 0, found 1
Backref 11239489536 root 273 owner 296 offset 45056 num_refs 0 not found in extent tree
Incorrect local backref count on 11239489536 root 273 owner 296 offset 45056 found 1 wanted 0 back 0x3b5c4f0
backpointer mismatch on [11239489536 8192]
ref mismatch on [11239497728 8192] extent item 0, found 1
Backref 11239497728 root 273 owner 296 offset 49152 num_refs 0 not found in extent tree
Incorrect local backref count on 11239497728 root 273 owner 296 offset 49152 found 1 wanted 0 back 0x3b5c3c0
backpointer mismatch on [11239497728 8192]
ref mismatch on [11239505920 8192] extent item 0, found 1


Backref 12015255552 parent 14940717056 root 14940717056 not found in extent tree
backpointer mismatch on [12015255552 16384]
ref mismatch on [12015271936 16384] extent item 0, found 10
Backref 12015271936 parent 259 root 259 not found in extent tree
Backref 12015271936 parent 12118261760 root 12118261760 not found in extent tree
Backref 12015271936 parent 12118228992 root 12118228992 not found in extent tree

parent transid verify failed on 14925135872 wanted 1035 found 951
parent transid verify failed on 14929018880 wanted 1035 found 951
Ignoring transid failure
Error: could not find btree root extent for root 302


https://www.suse.com/support/kb/doc/?id=7018181

I tried all of the commands from that article. All of them failed to restore the system.
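For anyone who ends up in the same spot: before wiping and reinstalling, the usual last resort is to copy data off the damaged filesystem without writing to it. This is only a sketch under assumptions, not the SUSE article’s exact procedure; /dev/sda3 is the btrfs root here, and /mnt/recovery is a healthy, already-mounted disk with enough space:

# try mounting read-only with an older tree root
# (on older kernels the option is spelled "recovery" instead of "usebackuproot")
mkdir -p /mnt/broken
mount -o ro,usebackuproot /dev/sda3 /mnt/broken
# if even that fails, btrfs restore can pull files out without mounting at all
btrfs restore -v /dev/sda3 /mnt/recovery/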

I’m stuck. My only remaining choice is to restore from the latest backup and then install openSUSE 42.3 from scratch (again, in the same month).

Any more ideas? Help.

  • btrfs comes with a nice wiki: btrfs Wiki

  • btrfs is reliable on reliable hardware only and requires maintenance. Repair is cumbersome.

  • High recoverability is affordable: Mirror your system on a budget SSD with ext4 and keep it reasonably up to date.

Currently I am watching Leap 15 with regard to btrfs and am considering switching Tumbleweed from ext4 to btrfs. This only makes sense when the hardware is rock solid and a backup system is available.

BTRFS has now failed twice this month and once the month before; it is not working well for me. If this happens again, I’m formatting and reinstalling the system without it. Hopefully the latest kernel update fixed this error; otherwise, the hard drive could be failing.

Good news: by combining two backup locations, I was able to recover all known files. There are no files missing! :slight_smile:

Next, I need to do a complete reinstall. Thanks to all for the help in both forum topics.

Backups would be a must.

It’s relatively easy to mirror an ext4 root to another partition using something like rsync. For example, I’ve just switched to Tumbleweed (ext4 root) with the safety net that I can revert to Leap 15 (ext4 root) by using rsync to overwrite Tumbleweed (and then sort out the booting afterwards).
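Roughly what I mean, as a sketch rather than a recipe; the target partition /dev/sdb2 and its mount point are made up, and the exclude list is only the usual pseudo-filesystems and mount points:

# mount the partition that will receive the mirror
mkdir -p /mnt/mirror
mount /dev/sdb2 /mnt/mirror
# copy the running ext4 root, preserving ACLs, xattrs and hard links
rsync -aAXHv --delete \
  --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' --exclude='/tmp/*' \
  --exclude='/run/*' --exclude='/mnt/*' --exclude='/media/*' --exclude='/lost+found' \
  / /mnt/mirror/
# afterwards: adjust /mnt/mirror/etc/fstab and sort out the bootloader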

It’s not clear to me how to achieve a similarly simple approach with btrfs. Even with good knowledge of the subvolume layout and the copy-on-write exceptions, I’m still unsure of an easy way to get a backup that is simple to reinstate (and that can be held offline/offsite).

However, it’s possible that the new, simpler btrfs root layout in openSUSE may make it easier than before.

When I was previously pondering this issue I wrote up some notes here:

https://forums.opensuse.org/showthread.php/521277-LEAP-42-2-btrfs-root-filesystem-subvolume-structure

They were written prior to the recent improvements/simplifications to the btrfs root made in Leap-15 and Tumbleweed, but a lot of the info still applies.

It seems to me that for simple DIY systems, where the offline/offsite root backup is based on something like tar or rsync, it’s difficult to switch to btrfs without revisiting your approach to root backups. With respect to Tumbleweed, though, btrfs’s ability to roll back could well be very handy when an update breaks something. For the moment I’m keeping Tumbleweed on an ext4 root and plan to rsync an offline backup of root whenever it seems stable and bug-free.
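One option I keep eyeing, but haven’t adopted, is btrfs’s own snapshot-and-send mechanism for offline copies. A minimal sketch only, assuming the external drive is itself formatted btrfs and mounted at /mnt/external (a made-up path), and ignoring the extra subvolumes in the openSUSE layout:

# take a read-only snapshot of the root subvolume
btrfs subvolume snapshot -r / /root-backup-ro
# stream it to the external btrfs drive; the received copy can be browsed later
btrfs send /root-backup-ro | btrfs receive /mnt/external/
# remove the local snapshot when done
btrfs subvolume delete /root-backup-ro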

With respect to reliable hardware: pulling the power out from under ext4 seems recoverable; I’ve never seen my desktop fail to restart due to a corrupt ext4 root (except when a disk was faulty). I’ve read that xfs requires reliable hardware, including a UPS, but I’ve not read that btrfs is as picky. With btrfs it does seem like you have to manage the snapshot space well.

It would be interesting to hear other people’s experiences with some of the above.

I really don’t think btrfs on an SSD is a good idea; the whole point of btrfs is not needing separate backups, since the filesystem itself is one big backup.
Personally I don’t use btrfs, but this is a personal machine, and if any root corruption appears I have no problem with a clean reinstall; as long as the data in my home is safe, all is good.
But as far as I understand it, btrfs is pretty mature, and issues like the above usually mean hardware faults. I think you should keep a live USB with smartctl installed (it’s part of the smartmontools package).
Boot from a live USB or rescue DVD and run

sudo smartctl -i /dev/sdX

where /dev/sdX is the device to be checked.
If the drive is fine and mountable from the live USB, try running

btrfsck /dev/sdX --repair

If the drive is dead, there’s no point in fixing the filesystem.
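A couple of other smartctl calls can help decide whether the drive itself is at fault; these are standard smartctl options, with /dev/sdX again standing in for the real device:

# overall health self-assessment (PASSED/FAILED)
sudo smartctl -H /dev/sdX
# start a short self-test, then read its result a few minutes later
sudo smartctl -t short /dev/sdX
sudo smartctl -l selftest /dev/sdX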

The SSD in my very old laptop holds my backup Linux system. The standard magnetic drive holds my main Linux system, and it is the main system that keeps getting corrupted. FYI, I’m using the openSUSE rescue system. After the first failure, I restored from a backup and then installed onto the whole drive, erasing all the partitions. openSUSE booted successfully and worked through many bootups, then the drive was corrupted again.

I use a manual backup to a flash drive and have Unison as another backup app. I was also using an MS Windows-formatted drive to store backups; now my backups are on a flash drive formatted as ext4.

Hardware faults? As in a failing magnetic drive?

I have the latest version of the BIOS for the motherboard. So, a software bug in the BIOS?

btrfsck /dev/sdX --repair

I did this; it had no effect.

I’ll try that new command: smartctl -i /dev/sdX

SSDs have a limited lifespan, and a write-intensive filesystem like btrfs can shorten the lifespan of an SSD (especially on older models).
It was my understanding that the root was on an SSD with btrfs.
smartctl can check the state of SSDs too, not just magnetic drives, and it can show the remaining lifetime of the SSD.

sudo smartctl -a /dev/sdX

will show the remaining lifetime of your SSD, counting down from 100 to 0 (100 being brand new and 0 being a dead SSD).
https://scottlinux.com/2014/07/15/determine-remaining-ssd-life-in-linux/
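The attribute name varies by vendor, so as a rough sketch you can print the full attribute table and pick out the wear-related entries (the grep patterns below are just common examples):

# show only the vendor SMART attributes and filter for wear/lifetime counters
sudo smartctl -A /dev/sdX | grep -iE 'wear|life|media_wearout'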

I can only boot the openSUSE 42.3 DVD rescue system. That won’t allow ‘zypper in smartmontools’. The second command fails to repair. The complete reinstall worked last time; it should work this time.

You can download the smartmontools RPM on a different PC, unpack the smartctl app from the RPM (use 7-Zip on Windows or Ark on Linux), put the unpacked binary on a USB stick and run it from the live media.
You can find smartctl in

smartmontools-6.5-8.1.x86_64.rpm\smartmontools-6.5-8.1.x86_64.cpio\.\usr\sbin\smartctl

The above path was copied from 7-Zip running on Windows.
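On a Linux PC there is also a command-line way to do the same unpacking; rpm2cpio and cpio are standard tools, though the exact RPM file name and the USB mount point below are just examples:

# extract just the smartctl binary from the downloaded package
rpm2cpio smartmontools-6.5-8.1.x86_64.rpm | cpio -idmv ./usr/sbin/smartctl
# copy it onto the USB stick (mount point is an example)
cp ./usr/sbin/smartctl /mnt/usb/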
But as you have a second hard drive, why not install an OS on it temporarily so you can check the condition of your SSD?
Or try GParted, as it should allow you to check and repair your filesystem if the problem isn’t hardware related.

PS: according to the smartctl developers
https://www.smartmontools.org/wiki/LiveCDs
the openSUSE rescue ISO does come with smartctl (I’m not sure about the KDE/GNOME live ISOs), and it should also be included on the full install DVD when booted as a rescue system. That information is from some time ago, though, so I’m not sure Leap still has it (it should, but I don’t install with a DVD/USB, so I can’t tell).

I reinstalled openSUSE 42.3, erasing all previous partitions. I had over 400 software updates. It booted successfully, with one major software issue (see the separate post).

This is another topic for BTRFS snapshots on laptops. Please follow me to the new post.

On Sun 24 Jun 2018 09:06:04 PM CDT, I A wrote:

ssd’s have a limited life span, a write intensive filesystem like btrfs
can shorten the life span of an ssd (especially on older models)
it was my understanding that the root was on an ssd drive with btrfs
smartctl can check the state of ssd’s too not just magnetic drives
smartctl can show the remaining lifetime of the ssd

Code:

sudo smartctl -a /dev/sdX

will show the remaining lifetime of your ssd counting down from 100 to 0
(100 being a brand new and 0 being a dead ssd)
http://tinyurl.com/oy2bbmq

Hi
My own experience with OCZ drives refutes this :wink: I have been running btrfs/xfs for many years on multiple machines. Based on posts in these forums, I would surmise the SSD issues have mostly involved Samsung devices… perhaps because they are more popular? AFAIK some Samsung models still have TRIM blacklisted by the kernel…

This is my oldest one… 45K plus hours…


=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     OCZ-AGILITY3
.....

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate     0x000f   100   100   050    Pre-fail  Always       -       0/3159420
5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       1
9 Power_On_Hours_and_Msec 0x0032   048   048   000    Old_age   Always       -       45862h+53m+59.880s
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       550
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       352
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       4
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   030   030   000    Old_age   Always       -       30 (Min/Max 30/30)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/3159420
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       1
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/3159420
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/3159420
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   097   097   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       11757
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       24279
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       24279
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age Always       -       28745


Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
SLES 15 | GNOME Shell 3.26.2 | 4.12.14-23-default

Thanks to all for the help. I seem to be in good shape with my SSD. I have all the info printed out for future reference, and I also saved the pages to a flash drive.

If it happens again on my main system, I will use the smartmontools when needed. Hopefully it won’t happen, and I won’t need to post on here again. :wink: