Infamous Host erlangen

Wear and tear of desktop class 4TB HDD model WD40EZRX-22SPEB0

Vendors publish very optimistic MTBF figures for their drives. Lifetimes observed in practice are much shorter. From avherald.com:

Yes, I always have a good number of HDDs on stock - they usually fail after 20,000-50,000 hours (2.2 years to 5.7 years) although the manufacturer claims a MTBF of 1.6 million hours (182 years).

The above matches other field experience well: see SMART/HDD/WDC/README.md in the linuxhw/SMART repository on GitHub.
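
To put the quoted numbers side by side, here is a minimal Python sketch that converts hours to years and turns an MTBF into an annualized failure rate. The 8760-hour year and the constant failure rate (exponential) model are assumptions of this sketch, not something the vendor or the quote states.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, drive powered on 24/7


def hours_to_years(hours):
    """Convert continuous operating hours to years."""
    return hours / HOURS_PER_YEAR


def annualized_failure_rate(mtbf_hours):
    """Probability that a drive fails within one year of continuous use,
    assuming a constant failure rate (exponential lifetime model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


# Figures taken from the quote above.
for hours in (20_000, 50_000, 1_600_000):
    print(f"{hours:>9} h = {hours_to_years(hours):6.1f} years")

print(f"AFR implied by an MTBF of 1.6 million hours: "
      f"{annualized_failure_rate(1_600_000):.2%}")
```

Under these assumptions the claimed MTBF corresponds to roughly one failure per year in a population of about 180 drives, which is a statement about a fleet, not about how long a single drive lasts.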

The drive was tested thoroughly by shrinking the existing 4TB ext4 partition, adding a btrfs partition and synchronizing the two with rsync. Then the ext4 file system on partition sdb1 was deleted and the freed partition was added as a second device to the btrfs file system on sdb2.

erlangen:~ # btrfs filesystem usage -T /media/61fc4107-d7da-4c0b-a1f4-d92aa6fc1d26/
Overall:
    Device size:                   3.64TiB
    Device allocated:              1.17TiB
    Device unallocated:            2.47TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                          1.17TiB
    Free (estimated):              2.47TiB      (min: 1.24TiB)
    Free (statfs, df):             2.47TiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

             Data    Metadata System                             
Id Path      single  DUP      DUP       Unallocated Total   Slack
-- --------- ------- -------- --------- ----------- ------- -----
 1 /dev/sdb2 1.16TiB  6.00GiB  16.00MiB   762.10GiB 1.91TiB     -
 2 /dev/sdb1 6.00GiB        -         -     1.72TiB 1.73TiB     -
-- --------- ------- -------- --------- ----------- ------- -----
   Total     1.16TiB  3.00GiB   8.00MiB     2.47TiB 3.64TiB 0.00B
   Used      1.16TiB  2.50GiB 176.00KiB                          
erlangen:~ # 

Issuing btrfs device remove /dev/sdb2 resulted in a fatal failure. Because this command has to relocate every allocated block off the device, it is a great sanity check of any drive.

Metadata are perfectly consistent:

erlangen:~ # btrfs check --force /dev/sdb2
Opening filesystem to check...
WARNING: filesystem mounted, continuing because of --force
Checking filesystem on /dev/sdb2
UUID: 61fc4107-d7da-4c0b-a1f4-d92aa6fc1d26
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 1278897782784 bytes used, no error found
total csum bytes: 1246301616
total tree bytes: 2684928000
total fs tree bytes: 1093910528
total extent tree bytes: 148422656
btree space waste bytes: 423718490
file data blocks allocated: 1276212854784
 referenced 1276212854784
erlangen:~ # 
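
As a quick cross-check, the byte counts printed by btrfs check can be compared with the figures from btrfs filesystem usage. The sketch below only does arithmetic on numbers copied from the two outputs above; the variable names are made up for illustration.

```python
# Values copied from the btrfs check output above.
bytes_used = 1_278_897_782_784        # "found ... bytes used"
tree_bytes = 2_684_928_000            # "total tree bytes"
data_bytes = 1_276_212_854_784        # "file data blocks allocated"

GIB = 2 ** 30
TIB = 2 ** 40

# Data plus tree (metadata) blocks should account for all used bytes.
print("data + tree == used:", data_bytes + tree_bytes == bytes_used)

# Compare with the "Used" row of btrfs filesystem usage (1.16TiB / 2.50GiB).
print(f"data: {data_bytes / TIB:.2f} TiB")
print(f"tree: {tree_bytes / GIB:.2f} GiB")
```

Both outputs agree, which supports the statement that the file system structures themselves are consistent.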

However, the data themselves are rotten, as the numerous errors in the journal show. The SMART attributes report 3 pending sectors after only 14636 power-on hours:

erlangen:~ # smartctl -A /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.4.4-1-default] (SUSE RPM)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       3191
  3 Spin_Up_Time            0x0027   253   175   021    Pre-fail  Always       -       2291
  4 Start_Stop_Count        0x0032   091   091   000    Old_age   Always       -       9028
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       14636
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3493
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       130
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always       -       8928
194 Temperature_Celsius     0x0022   114   107   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       108

erlangen:~ # 
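
A table like the one above can be screened automatically. The following is a minimal Python sketch that parses smartctl -A style rows and flags sector-related attributes with a non-zero raw value. The function names and the choice of watched attributes are assumptions made for illustration; the sample rows are copied from the output above.

```python
# Four rows copied from the smartctl -A output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       14636
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

# Assumption: these attributes are treated as warning signs when non-zero.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the rows of an attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip rows whose raw value is not a plain integer
    return attrs


def suspicious(attrs):
    """Return the watched attributes whose raw value is greater than zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


attrs = parse_smart_attributes(SAMPLE)
print("Power-on hours:", attrs["Power_On_Hours"])
print("Suspicious attributes:", suspicious(attrs))
```

In practice the text would come from running smartctl -A on the device instead of the embedded sample.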

Backups of infamous host erlangen now rely on SSDs and NAS-grade HDDs:

  1. Samsung, model: SSD 970 EVO Plus 2TB, size: 1.82 TiB
  2. Seagate, model: ST8000VN004-2M2101, size: 7.28 TiB