Any way to pin metadata to SSD in btrfs RAID1 profile?

Hey everyone :wave:

I was wondering if it’s possible to pin metadata to faster SSDs in a btrfs RAID1 pool while keeping the data on HDDs? :thinking:

My pool is quite unusual: as of now it’s made up of 4+2+1+1 TB external HDDs.
Lately, though, I’ve been getting write errors: the smaller drives are too slow, get bogged down, and end up timing out on write commands. Just to make sure it wasn’t an enclosure or HDD problem, I swapped both with new ones, but it didn’t make things any better.

Aug 03 17:18:14 suse-pc kernel: sd 5:0:0:0: [sdd] tag#0 timing out command, waited 180s
Aug 03 17:18:14 suse-pc kernel: I/O error, dev sdd, sector 324666880 op 0x1:(WRITE) flags 0x100000 phys_seg 64 prio class 2
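
(Those lines are straight from the kernel log; on a systemd distro something like this digs them back out of the journal:)

sudo journalctl -k --since yesterday | grep -E 'timing out command|I/O error'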

Layout and usage:

pavin@suse-pc:~> sudo btrfs filesystem show /usbraid/
Please enter the PIN: 
Please touch the device.
Label: 'usbraid'  uuid: 378f24e8-6537-4a27-b827-b133f902b3f4
	Total devices 4 FS bytes used 1.39TiB
	devid    1 size 1.82TiB used 1.08TiB path /dev/mapper/wd2t
	devid    4 size 3.64TiB used 1.41TiB path /dev/mapper/seagate4t
	devid    5 size 931.50GiB used 173.03GiB path /dev/mapper/seagate_red
	devid    6 size 931.50GiB used 165.03GiB path /dev/mapper/seagate_black

pavin@suse-pc:~> sudo btrfs filesystem usage -T /usbraid/
Overall:
    Device size:		   7.28TiB
    Device allocated:		   2.82TiB
    Device unallocated:		   4.46TiB
    Device missing:		     0.00B
    Device slack:		     0.00B
    Used:			   2.78TiB
    Free (estimated):		   2.25TiB	(min: 2.25TiB)
    Free (statfs, df):		   1.51TiB
    Data ratio:			      2.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)
    Multiple profiles:		        no

                             Data      Metadata System                               
Id Path                      RAID1     RAID1    RAID1     Unallocated Total     Slack
-- ------------------------- --------- -------- --------- ----------- --------- -----
 1 /dev/mapper/wd2t            1.08TiB  1.00GiB         -   757.97GiB   1.82TiB     -
 4 /dev/mapper/seagate4t       1.41TiB  2.00GiB  64.00MiB     2.23TiB   3.64TiB     -
 5 /dev/mapper/seagate_red   173.00GiB        -  32.00MiB   758.46GiB 931.50GiB     -
 6 /dev/mapper/seagate_black 164.00GiB  1.00GiB  32.00MiB   766.46GiB 931.50GiB     -
-- ------------------------- --------- -------- --------- ----------- --------- -----
   Total                       1.41TiB  2.00GiB  64.00MiB     4.46TiB   7.28TiB 0.00B
   Used                        1.39TiB  1.49GiB 240.00KiB                            
pavin@suse-pc:~> 

I have ordered two additional SSDs to act as a cache, planning to use them with bcachefs, but I’m kind of apprehensive after reading this.
Bcachefs also doesn’t currently support scrub, and I would rather make do with btrfs if possible. Searching the web turns up conflicting information: there are GitHub issues asking for this feature in btrfs, while Synology’s website says they support it. IIRC they use btrfs on top of bcache as a caching layer. I’m not too fond of that idea either, as my recent device replace/rebuild after moving to a new drive/enclosure for troubleshooting took 3 whole days! :flushed:
Watching iostat while it was running, the slowdown seems to have been caused by metadata update delays.
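
(Nothing fancy on the iostat side, by the way: just extended per-device stats at a short interval, keeping an eye on the write latency, roughly like this.)

# extended device stats in MB/s, refreshed every 5 seconds; watch w_await and %util for each member disk
iostat -xm 5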

P.S. I have run btrfs check and no errors were found, but I do get correctable errors during scrub after hitting write errors (which is unfortunately common now). There are also no SMART errors on any drive.
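
(Roughly what I ran for those checks, in case anyone wants to compare:)

# offline, read-only consistency check with the filesystem unmounted
sudo btrfs check --readonly /dev/mapper/seagate4t
# scrub the mounted pool in the foreground with per-device stats
sudo btrfs scrub start -Bd /usbraid
# SMART health of each underlying disk
sudo smartctl -a /dev/sdd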

No, it is not possible.


Sigh, I will try my luck with bcachefs :pleading_face:

Update: I switched to bcachefs, with the 2 small SSDs as metadata/foreground targets in front of a group of mismatched HDDs, and it’s working quite well as far as performance and stability are concerned :crossed_fingers:. Bcachefs still doesn’t support full-filesystem rebalance or scrub, though patches in Linux kernel 6.11 should fix errors on the fly when reading.
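
For anyone wanting to replicate this, the format command was along these lines (reconstructed from memory, so take it as a sketch rather than the exact invocation):

# two SSDs take metadata and incoming writes, five HDDs hold the bulk data, everything with 2 replicas
sudo bcachefs format \
    --replicas=2 \
    --label=ssd.ssd_green     /dev/sdb \
    --label=ssd.ssd_orange    /dev/sdc \
    --label=hdd.hgst1t        /dev/sdd \
    --label=hdd.seagate4t     /dev/sde \
    --label=hdd.seagate_red   /dev/sdf \
    --label=hdd.seagate_black /dev/sdg \
    --label=hdd.wd2t          /dev/sdh \
    --metadata_target=ssd \
    --foreground_target=ssd \
    --promote_target=ssd \
    --background_target=hdd

The idea being that btree (metadata) replicas and fresh writes land on the ssd group first, reads get promoted there as a cache, and the rebalance thread flushes data down to the hdd group in the background.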

Layout and usage:

Filesystem: 21b47fe4-4104-45fb-9e39-9be1cdddcda3
Size:                       7.93 TiB
Used:                       2.01 TiB
Online reserved:                 0 B

Data type       Required/total  Durability    Devices
btree:          1/2             2             [sdb sde]           1.00 MiB
btree:          1/2             2             [sdb sdc]           19.1 GiB
btree:          1/2             2             [sdc sde]            512 KiB
user:           1/2             2             [sdh sde]           51.2 GiB
user:           1/2             2             [sdg sdf]           2.70 GiB
user:           1/2             2             [sdf sdh]            937 GiB
user:           1/2             2             [sde sdd]           37.7 GiB
user:           1/2             2             [sdg sde]           25.5 GiB
user:           1/2             2             [sdf sdd]           6.73 MiB
user:           1/2             2             [sdh sdd]           16.8 MiB
user:           1/2             2             [sdg sdh]            937 GiB
user:           1/2             2             [sdg sdd]           3.91 MiB
user:           1/2             2             [sdf sde]           25.5 GiB
cached:         1/1             1             [sdc]                221 GiB
cached:         1/1             1             [sdf]               46.6 MiB
cached:         1/1             1             [sdb]                221 GiB
cached:         1/1             1             [sdg]               48.7 MiB
cached:         1/1             1             [sdh]               42.0 MiB

hdd.hgst1t (device 6):           sdd              rw
                                data         buckets    fragmented
  free:                      909 GiB         1860759
  sb:                       3.00 MiB               7       508 KiB
  journal:                  4.00 GiB            8192
  btree:                         0 B               0
  user:                     18.9 GiB           38781      53.2 MiB
  cached:                        0 B               0
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                  932 GiB         1907739

hdd.seagate4t (device 5):        sde              rw
                                data         buckets    fragmented
  free:                     3.56 TiB         3734647
  sb:                       3.00 MiB               4      1020 KiB
  journal:                  8.00 GiB            8192
  btree:                     768 KiB               3      2.25 MiB
  user:                     69.9 GiB           72601       985 MiB
  cached:                        0 B               0
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                 3.64 TiB         3815447

hdd.seagate_black (device 2):    sdg              rw
                                data         buckets    fragmented
  free:                      446 GiB         1826969
  sb:                       3.00 MiB              13       252 KiB
  journal:                  2.00 GiB            8192
  btree:                         0 B               0
  user:                      483 GiB         1979834       585 MiB
  cached:                   48.7 MiB             470
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                  932 GiB         3815478

hdd.seagate_red (device 3):      sdf              rw
                                data         buckets    fragmented
  free:                      446 GiB         1827018
  sb:                       3.00 MiB              13       252 KiB
  journal:                  2.00 GiB            8192
  btree:                         0 B               0
  user:                      483 GiB         1979785       590 MiB
  cached:                   46.6 MiB             470
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                  932 GiB         3815478

hdd.wd2t (device 4):             sdh              rw
                                data         buckets    fragmented
  free:                      897 GiB         3673111
  sb:                       3.00 MiB              13       252 KiB
  journal:                  2.00 GiB            8192
  btree:                         0 B               0
  user:                      963 GiB         3948814      1.06 GiB
  cached:                   42.0 MiB             658
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                 1.82 TiB         7630788

ssd.ssd_green (device 0):        sdb              rw
                                data         buckets    fragmented
  free:                     8.75 GiB           35826
  sb:                       3.00 MiB              13       252 KiB
  journal:                  1.75 GiB            7154
  btree:                    9.53 GiB           39039
  user:                          0 B               0
  cached:                    203 GiB          833714
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                  224 GiB          915746

ssd.ssd_orange (device 1):       sdc              rw
                                data         buckets    fragmented
  free:                     8.75 GiB           35826
  sb:                       3.00 MiB              13       252 KiB
  journal:                  1.75 GiB            7154
  btree:                    9.53 GiB           39038
  user:                          0 B               0
  cached:                    203 GiB          833715
  parity:                        0 B               0
  stripe:                        0 B               0
  need_gc_gens:                  0 B               0
  need_discard:                  0 B               0
  capacity:                  224 GiB          915746
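
(The per-device breakdown above comes from bcachefs fs usage with human-readable sizes, run against wherever the pool is mounted, e.g.:)

sudo bcachefs fs usage -h /usbraid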

Torvalds Expresses Regret Over Merging Bcachefs into Kernel

"Overstreet defended Bcachefs’s reliability, claiming it to be more trustworthy than its counterpart, Btrfs, especially in scenarios where data recovery is crucial.

He cited numerous instances and comparisons where Bcachefs outperformed other file systems, including XFS, regarding robustness and reliability.

Torvalds responded skeptically, suggesting that broader adoption and testing across major Linux distros would be necessary to validate such claims."

Just some light kernel drama before the 6.11 release :popcorn:
