Btrfs disk issues - please advise on repair

i have a single /dev/sdc as a btrfs volume. recently the volume got corrupted and i cannot find a way to restore it using any of the internet howtos.
i tried the standard ‘safe’ options but have not run the --repair option yet, for fear of killing the data completely.
also, if i check with btrfs tools v4 and v5 (from a tumbleweed live usb) i get totally different results.
with v4 i get simply:

and with v5 i get a log file of 32 MB with over 500k lines stating the following:


parent transid verify failed on 3646998413312 wanted 3065 found 2747 
parent transid verify failed on 3646998413312 wanted 3065 found 2747
parent transid verify failed on 3646998413312 wanted 3065 found 2747
Ignoring transid failure
Opening filesystem to check...
Checking filesystem on /dev/sdc
UUID: bc677067-effd-430a-90fa-ab1c6cd51de1
[1/7] checking root items                      (0:00:00 elapsed, 144 items checked)
[1/7] checking root items                      (0:00:01 elapsed, 51113 items checked)
[1/7] checking root items                      (0:00:02 elapsed, 132156 items checked)
[1/7] checking root items                      (0:00:03 elapsed, 201203 items checked)
[1/7] checking root items                      (0:00:04 elapsed, 317426 items checked)
[2/7] checking extents                         (0:00:00 elapsed)
[2/7] checking extents                         (0:00:01 elapsed, 1940 items checked)
[2/7] checking extents                         (0:00:02 elapsed, 2355 items checked)
parent transid verify failed on 3646998642688 wanted 3065 found 2747
parent transid verify failed on 3646998642688 wanted 3065 found 2747
parent transid verify failed on 3646998642688 wanted 3065 found 2747
Ignoring transid failure
[2/7] checking extents                         (0:00:03 elapsed, 9973 items checked)
[2/7] checking extents                         (0:00:04 elapsed, 25702 items checked)
[2/7] checking extents                         (0:00:05 elapsed, 39437 items checked)
[2/7] checking extents                         (0:00:06 elapsed, 56114 items checked)
[2/7] checking extents                         (0:00:07 elapsed, 71956 items checked)
[2/7] checking extents                         (0:00:08 elapsed, 87850 items checked)
[2/7] checking extents                         (0:00:09 elapsed, 103417 items checked)
[2/7] checking extents                         (0:00:10 elapsed, 118146 items checked)
[2/7] checking extents                         (0:00:11 elapsed, 133337 items checked)
[2/7] checking extents                         (0:00:12 elapsed, 148490 items checked)
[2/7] checking extents                         (0:00:13 elapsed, 163390 items checked)
[2/7] checking extents                         (0:00:14 elapsed, 177768 items checked)
[2/7] checking extents                         (0:00:15 elapsed, 192416 items checked)
[2/7] checking extents                         (0:00:16 elapsed, 202829 items checked)
[2/7] checking extents                         (0:00:17 elapsed, 212968 items checked)
[2/7] checking extents                         (0:00:18 elapsed, 219076 items checked)
ref mismatch on [13631488 16384] extent item 1, found 0
incorrect local backref count on 13631488 root 5 owner 36919 offset 0 found 0 wanted 1 back 0x55c1895eaca0
backref disk bytenr does not match extent record, bytenr=13631488, ref bytenr=0
backpointer mismatch on [13631488 16384]
owner ref check failed [13631488 16384]
ref mismatch on [13647872 4096] extent item 1, found 0
incorrect local backref count on 13647872 root 5 owner 37225 offset 0 found 0 wanted 1 back 0x55c1895eae50
backref disk bytenr does not match extent record, bytenr=13647872, ref bytenr=0
backpointer mismatch on [13647872 4096]
owner ref check failed [13647872 4096]

...skipped many many lines

ref mismatch on [61423616 16384] extent item 1, found 0
backref 61423616 root 5 not referenced back 0x55c186e74550
incorrect global backref count on 61423616 found 1 wanted 0
backpointer mismatch on [61423616 16384]
owner ref check failed [61423616 16384]
ref mismatch on [268746752 16384] extent item 1, found 0
backref 268746752 root 5 not referenced back 0x55c1875bcd10
incorrect global backref count on 268746752 found 1 wanted 0
backpointer mismatch on [268746752 16384]
owner ref check failed [268746752 16384]
ref mismatch on [284639232 16384] extent item 1, found 0
backref 284639232 root 7 not referenced back 0x55c186ff9130
incorrect global backref count on 284639232 found 1 wanted 0
backpointer mismatch on [284639232 16384]
owner ref check failed [284639232 16384]

...skipping again many many lines

ref mismatch on [1104150528 134217728] extent item 1, found 0
incorrect local backref count on 1104150528 root 5 owner 260 offset 0 found 0 wanted 1 back 0x55c187a09e30
backref disk bytenr does not match extent record, bytenr=1104150528, ref bytenr=0
backpointer mismatch on [1104150528 134217728]
owner ref check failed [1104150528 134217728]
ref mismatch on [1238368256 134217728] extent item 1, found 0
incorrect local backref count on 1238368256 root 5 owner 260 offset 134217728 found 0 wanted 1 back 0x55c187a09fe0
backref disk bytenr does not match extent record, bytenr=1238368256, ref bytenr=0
backpointer mismatch on [1238368256 134217728]
owner ref check failed [1238368256 134217728]
ref mismatch on [1372585984 134217728] extent item 1, found 0
incorrect local backref count on 1372585984 root 5 owner 260 offset 268435456 found 0 wanted 1 back 0x55c187a0a110
backref disk bytenr does not match extent record, bytenr=1372585984, ref bytenr=0
backpointer mismatch on [1372585984 134217728]
owner ref check failed [1372585984 134217728]

...skipping again bundle

ref mismatch on [3671060365312 15777792] extent item 1, found 0
incorrect local backref count on 3671060365312 root 5 owner 37267 offset 18724372480 found 0 wanted 1 back 0x55c1895ea950
backref disk bytenr does not match extent record, bytenr=3671060365312, ref bytenr=0
backpointer mismatch on [3671060365312 15777792]
owner ref check failed [3671060365312 15777792]
ref mismatch on [3671076143104 16777216] extent item 1, found 0
incorrect local backref count on 3671076143104 root 5 owner 37267 offset 19730006016 found 0 wanted 1 back 0x55c1895eaa80
backref disk bytenr does not match extent record, bytenr=3671076143104, ref bytenr=0
backpointer mismatch on [3671076143104 16777216]
owner ref check failed [3671076143104 16777216]
[2/7] checking extents                         (0:00:20 elapsed, 220934 items checked)
ERROR: errors found in extent allocation tree or chunk allocation
cache and super generation don't match, space cache will be invalidated
[3/7] checking free space cache                (0:00:00 elapsed)
[3/7] checking free space cache                (0:00:00 elapsed)
[4/7] checking fs roots                        (0:00:00 elapsed)
root 5 root dir 256 not found
[4/7] checking fs roots                        (0:00:00 elapsed, 2 items checked)
ERROR: errors found in fs roots
found 3508584407040 bytes used, error(s) found
total csum bytes: 3417906432
total tree bytes: 3619651584
total fs tree bytes: 16384
total extent tree bytes: 26083328
btree space waste bytes: 135962918
file data blocks allocated: 880803840
 referenced 880803840

i can attach a full log if needed, but generally it comes down to types of errors above.
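
With over 500k lines, a per-type count of the messages may be more useful than the raw log. A minimal sketch, assuming the btrfs check output was saved to a text file (the file name check.log and the function name are made up for illustration); it only reads the log and never touches the disk:

```shell
#!/usr/bin/env bash
# Summarise a saved `btrfs check` log by message type.
# Assumption: the check output was redirected to a plain-text file.
summarize_check_log() {
    # 1. drop blank lines and progress lines such as "[2/7] checking extents ..."
    # 2. replace hex pointers and decimal numbers with N so equal messages group together
    # 3. count each distinct message, most frequent first
    grep -Ev '^(\[[0-9]+/[0-9]+\]|$)' "$1" \
        | sed -E 's/0x[0-9a-f]+/N/g; s/[0-9]+/N/g' \
        | sort | uniq -c | sort -rn
}
# Usage (hypothetical file name): summarize_check_log check.log
```

Each output line is a count followed by one message type, so it is easy to see whether the 500k lines are really only a handful of distinct errors.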

can this disk still be repaired or should i just reformat it and be done with it?
i did a chunk scan (took over 10h) which came up with nothing.
zero-log and other options abort; i cannot do btrfs-image either (the image appears to be much, much smaller than i would expect).
volume info:



 Label: 'sum5'  uuid: xxx
        Total devices 1 FS bytes used 3.19TiB
        devid    1 size 7.28TiB used 3.30TiB path /dev/sdc

image dumped is just a couple of gigs, and -d says that data dump is not supported.

some other output of ‘safe’ commands i ran based on internet feedback:


thumlivex:~ # btrfs scrub start /dev/sdc
ERROR: '/dev/sdc' is not a mounted btrfs device
thumlivex:~ # btrfs scrub start /dev/sdc
ERROR: '/dev/sdc' is not a mounted btrfs device
thumlivex:~ # btrfs fi sh
Label: 'sum5'  uuid: bc677067-effd-430a-90fa-ab1c6cd51de1
        Total devices 1 FS bytes used 3.19TiB
        devid    1 size 7.28TiB used 3.30TiB path /dev/sdc

thumlivex:~ # mount -o ro,usebackuproot /dev/sdc /mnt/3/
mount: /mnt/3: can't read superblock on /dev/sdc.
thumlivex:~ # btrfs restore /dev/sdc /mnt/1/temp/00/
parent transid verify failed on 3646998413312 wanted 3065 found 2747
parent transid verify failed on 3646998413312 wanted 3065 found 2747
parent transid verify failed on 3646998413312 wanted 3065 found 2747
Ignoring transid failure
thumlivex:~ # btrfs rescue super-recover /dev/sdc
All supers are valid, no need to recover
thumlivex:~ # btrfs rescue zero-log /dev/sdc
parent transid verify failed on 3646998413312 wanted 3065 found 2747
parent transid verify failed on 3646998413312 wanted 3065 found 2747
ERROR: could not open ctree
thumlivex:~ # btrfs rescue fix-device-size /dev/sdc
parent transid verify failed on 3646998413312 wanted 3065 found 2747
parent transid verify failed on 3646998413312 wanted 3065 found 2747
ERROR: could not open btrfs
thumlivex:~ # 

i also ran chunk-recover with no result.
any ideas would be welcome, it would be a pity to lose 3.3T of data because of a power outage… never had this kind of issues with ext4.

would be thankful for any insights on how to investigate more or fix partially as well.

thanks in advance!
Egor

By introducing the obligatory version prefix to the thread title some time ago, we hoped to get rid of threads where people forgot (out of excitement?) to post the most important information about what they are running. We added OTHER VERSION as a choice in the hope that people who apparently run an unsupported version would then be triggered to mention first and foremost in their thread what they use. Obviously we failed to get that message across to you.

Can you please tell us what version of openSUSE you are using on that system where you have problems?

As a general remark (not knowing what you are doing), you posted this in Hardware, thus you seem to think that the disk is broken. Then I would think that trying to repair the broken disk by e.g. doing things on a file system on it, or by re-creating a new file system on it, will fail. The best advice IMHO is then to recover as many files as you can ASAP (but I assume you already have a rather recent backup).

hi, thanks for these comments.

frankly i did not know which forum to select; since this is disk related i chose this one. there is no hw issue on my side, i checked everything.

would be happy for a suggestion on where to move this thread.

thanks!

opensuse 15.2

but i really thought topic is not version-related :slight_smile:

Since this is not hardware related but concerns system-level software, I moved it to Install/Boot/Login.

And I changed the prefix to 15.2. You may think it is not version related, but when we ask for it, why would one ignore that question on purpose?

And it is thus not the disk that you want repaired, but the file system.

While I am not a btrfs expert, some things I see puzzle me.

ERROR: '/dev/sdc' is not a mounted btrfs device

Which seems rather clear to me, but you carry on trying the same command (with the same result; that is what computers are good at, doing the same thing many times).

Then you try to mount:

thumlivex:~ # mount -o ro,usebackuproot /dev/sdc /mnt/3/
mount: /mnt/3: can't read superblock on /dev/sdc.

It is always better to tell the mount command the type of file system you are trying to mount, else you depend on the “trial and error” that it will do.
In any case, what the “trial and error” resulted in is an unreadable superblock. And that is normally the end of the story for a file system.

In any case I am interested in that disk. Can you please post:

fdisk -l

and

lsblk -f

Is it HDD or SSD? What exact model?

parent transid verify failed on 3646998413312 wanted 3065 found 2747

That is usually indication of hardware misbehavior. Kernel log from before the problem would be nice, although if it is on the same filesystem … did you have any power outage when this happened?

any ideas would be welcome, it would be a pity to lose 3.3T of data because of a power outage…

So there was power outage.

You can try “btrfs restore” to see if it is able to recover something. Otherwise you had better post to the btrfs mailing list, where there are better chances of getting a developer response. As far as I know the btrfs IRC channel is also pretty alive.

But with such an amount of lost writes I would not hold my breath.
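
To put a number on that: as far as I understand the message, “wanted” is the generation the parent block expects and “found” is the generation actually stored in the block on disk, so the difference says how many committed transactions that block is behind. For the lines quoted here that is 3065 - 2747 = 318. A small sketch to pull this out of a saved log (the function name and the log file are assumptions for illustration, not part of btrfs-progs):

```shell
#!/usr/bin/env bash
# List each distinct block address with a transid mismatch and how many
# generations the block on disk lags behind what its parent expects.
# Assumption: $1 is a text file holding saved btrfs check/restore output.
transid_gaps() {
    sed -nE 's/^parent transid verify failed on ([0-9]+) wanted ([0-9]+) found ([0-9]+).*$/\1 \2 \3/p' "$1" \
        | sort -u \
        | while read -r bytenr wanted found; do
              echo "$bytenr behind by $((wanted - found))"
          done
}
# Usage (hypothetical file name): transid_gaps check.log
```

If every affected block reports the same gap, that would fit a single event (such as the power outage) rather than damage accumulated over time, though that is an interpretation and not something the tool states.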

firstly, i tried the ‘safe’ commands in different variations (also with -t btrfs for mount, same result), so the above was just a transcript of one of my tries.

regarding outputs:

dtsrv:~ # fdisk -l
Disk /dev/sda: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk model: ST10000NM0016-1T
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: xxxx

Device        Start         End     Sectors  Size Type
/dev/sda1      2048    33556479    33554432   16G Linux swap
/dev/sda2  33556480    75499519    41943040   20G Linux filesystem
/dev/sda3  75499520 19532873694 19457374175  9.1T Linux filesystem


Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EZRX-00S
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: xxxx

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 7814035455 7814033408  3.7T Linux filesystem


Disk /dev/sdc: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000VN004-2M21
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdd: 4.6 TiB, 5000981078016 bytes, 9767541168 sectors
Disk model: WDC WD50EZRX-00M
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: xxxx

Device     Start        End    Sectors  Size Type
/dev/sdd1   2048 9767540735 9767538688  4.6T Linux filesystem
dtsrv:~ # lsblk -f
NAME   FSTYPE LABEL   UUID                                 FSAVAIL FSUSE% MOUNTPOINT
sda
├─sda1 swap   swap15  488546a8-7b60-419f-b06b-2673226e58bf                [SWAP]
├─sda2 ext4   leap15  deb04518-7639-410b-b4ec-423fd703b9b5   11.6G    35% /
└─sda3 ext4   sum1    93fc994a-6200-432d-9f7f-d649fa9327b4    1.5T    78% /sum1
sdb
└─sdb1 ext4   sum4    e1df6c9d-f840-4c16-b503-ef2bf68b66e4    1.4T    56% /sum4
sdc    btrfs  sum5    bc677067-effd-430a-90fa-ab1c6cd51de1
sdd
└─sdd1 ext4   sum5old 72e70bc1-1b5e-4f9b-b4c7-666d2f8fce33      3T    33% /sum5old

hi, it's a ‘normal’ hdd connected through sata6. the exact model is listed in the output above as well:

Disk /dev/sdc: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000VN004-2M21
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes 

yes, tried that as well:

dtsrv:~ # btrfs restore -xmSvi /dev/sdc /sum1/temp/00/
parent transid verify failed on 3646998413312 wanted 3065 found 2747
parent transid verify failed on 3646998413312 wanted 3065 found 2747
parent transid verify failed on 3646998413312 wanted 3065 found 2747
parent transid verify failed on 3646998413312 wanted 3065 found 2747
Ignoring transid failure
Done searching

as you can see not much. also if i just try to list stuff - it shows nothing.

ok will try to look for mailing lists, thanks!

haven’t been on irc for 15 years maybe :smiley: will install a client, seems like worthy thing to do.

thanks!

https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list

At least lsblk still thinks it is btrfs. So something must be recognizable even with a broken superblock.
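
For what it is worth, lsblk and blkid recognise btrfs by a short magic string in the superblock, not by the whole file system being usable. Also, btrfs rescue super-recover reported all supers as valid earlier in this thread, so the “can't read superblock” from mount may just be a generic mount failure message rather than proof that the superblock itself is damaged. A minimal read-only sketch that prints the magic of the primary superblock (the function name is made up; the offsets are the primary superblock at 64 KiB and the magic field 64 bytes into it):

```shell
#!/usr/bin/env bash
# Print the 8-byte magic of the primary btrfs superblock.
# The primary superblock starts at byte 65536 (64 KiB) and the magic field
# sits 64 bytes (0x40) into it. Only reads from the device or image given as $1.
btrfs_magic() {
    dd if="$1" bs=1 skip=$((65536 + 64)) count=8 2>/dev/null
}
# Usage as root: btrfs_magic /dev/sdc
# A btrfs superblock carries the magic string _BHRfS_M
```

Seeing the magic only shows that the superblock is still in place; it says nothing about the state of the trees the superblock points to, which is where the transid errors come from.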