Extra Fun With Backup To External Disk

Backup failed with:

**erlangen:~ #** journalctl --since 07:00 --until 07:10 -u backup-home-WD25.service   
-- Logs begin at Fri 2021-06-04 21:27:14 CEST, end at Wed 2021-06-16 15:14:22 CEST. -- 
Jun 16 07:03:18 erlangen systemd[1]: Starting Backup /home to USB disk... 
Jun 16 07:03:22 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/Bilder/2015/2015-02-12/20150216075315-77010-map.html": Input/output error (5) 
Jun 16 07:03:22 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/Bilder/2015/2015-03-10/20150316022150-04180-map.html": Input/output error (5) 
Jun 16 07:03:29 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/Bilder/2017/2017-06-16/20170617124057-07168-map.html": Input/output error (5) 
Jun 16 07:03:34 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/Bilder/2018/2018-05-30/.jalbum/p1230368.mp4.info": Input/output error (5) 
Jun 16 07:03:39 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/Bilder/2019/2019-09-04/p1310878.jpg": Input/output error (5) 
Jun 16 07:04:27 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/jAlbums/Bilder/2009/2009-07-14/thumbs/imgp7089.jpg": Input/output error (5) 
Jun 16 07:04:32 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/jAlbums/Bilder/2010/2010-05-19/slides/imgp9397.jpg": Input/output error (5) 
Jun 16 07:04:34 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/jAlbums/Bilder/2010/2010-08-09/thumbs/imgp0064.jpg": Input/output error (5) 
Jun 16 07:04:37 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/jAlbums/Bilder/2014/2014-02-01/thumbs/p1010670.jpg": Input/output error (5) 
Jun 16 07:04:38 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/jAlbums/Bilder/2014/2014-03-26/thumbs/p1020445.jpg": Input/output error (5) 
Jun 16 07:04:39 erlangen rsync[13087]: rsync: [generator] recv_generator: failed to stat "/WD25/backup/home/Albums/jAlbums/Bilder/2014/2014-05-17/slides/p1030429.jpg": Input/output error (5) 
Jun 16 07:04:58 erlangen rsync[13086]: rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(701) [sender=3.2.3] 
Jun 16 07:04:58 erlangen systemd[1]: **backup-home-WD25.service: Main process exited, code=exited, status=20/n/a**
Jun 16 07:04:58 erlangen rsync[13087]: rsync error: received SIGUSR1 (code 19) at main.c(1608) [generator=3.2.3] 
Jun 16 07:04:58 erlangen systemd[1]: **backup-home-WD25.service: Failed with result 'exit-code'.**
Jun 16 07:04:58 erlangen systemd[1]: Stopped Backup /home to USB disk. 
Jun 16 07:04:58 erlangen systemd[1]: **backup-home-WD25.service: Consumed 17.297s CPU time.**
**erlangen:~ #**

Unmounted and ran fsck on 2TB ext4. Got prompted for continuing after read error in directory inode. When prompted for overwriting I did so. Fsck would hang and even ‘kill -9’ wouldn’t work. Thus I unplugged the USB drive and plugged it in again. Another fsck reported a clean file system. I started a long selftest (400 minutes):

**erlangen:~ #** smartctl -l selftest /dev/sdd            
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.12.9-1-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org 

=== START OF READ SMART DATA SECTION === 
SMART Self-test log structure revision number 1 
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error 
**# 1  Extended offline    Completed without error       00%       899         - **
# 2  Extended offline    Completed without error       00%         8         - 

**erlangen:~ #**

Restarted backup service and got a nice message:

**erlangen:~ #** journalctl --since 13:00 -u backup-home-WD25.service                 
-- Logs begin at Fri 2021-06-04 21:27:14 CEST, end at Wed 2021-06-16 15:24:47 CEST. -- 
Jun 16 15:07:31 erlangen systemd[1]: Starting Backup /home to USB disk... 
Jun 16 15:12:55 erlangen systemd[1]: backup-home-WD25.service: Succeeded. 
Jun 16 15:12:55 erlangen systemd[1]: Finished Backup /home to USB disk. 
Jun 16 15:12:55 erlangen systemd[1]: **backup-home-WD25.service: Consumed 42.747s CPU time.**
**erlangen:~ #**

Smart attributes are what they should be:

**erlangen:~ #** smartctl -A /dev/sdd                                 
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.12.9-1-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org 

=== START OF READ SMART DATA SECTION === 
SMART Attributes Data Structure revision number: 16 
Vendor Specific SMART Attributes with Thresholds: 
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE 
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0 
  3 Spin_Up_Time            0x0027   208   200   021    Pre-fail  Always       -       4600 
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       598 
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0 
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0 
**  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       899 **
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0 
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0 
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       302 
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       258 
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2471 
194 Temperature_Celsius     0x0022   113   101   000    Old_age   Always       -       39 
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0 
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0 
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0 
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0 
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0 

**erlangen:~ #** 

Note the 899 Power_On_Hours. An error can occur at any age of the drive.

This drive started to fail smart testing:

**erlangen:~ #** smartctl -i /dev/sdb          
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.2-1-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org 

=== START OF INFORMATION SECTION === 
Model Family:     Western Digital Green 
Device Model:     WDC WD40EZRX-22SPEB0 
Serial Number:    WD-WCC4E2FYXSNV 
LU WWN Device Id: 5 0014ee 262d2e71e 
Firmware Version: 80.00A80 
User Capacity:    4,000,787,030,016 bytes [4.00 TB] 
Sector Sizes:     512 bytes logical, 4096 bytes physical 
Rotation Rate:    5400 rpm 
Device is:        In smartctl database [for details use: -P show] 
ATA Version is:   ACS-2 (minor revision not indicated) 
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) 
Local Time is:    Thu Jul 22 06:48:17 2021 CEST 
SMART support is: Available - device has SMART capability. 
SMART support is: Enabled 

**erlangen:~ #**

Bad sectors caused short offline tests to fail:

**erlangen:~ #** smartctl -l selftest /dev/sdb 
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.2-1-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org 

=== START OF READ SMART DATA SECTION === 
SMART Self-test log structure revision number 1 
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error 
** # 1  Extended offline    Completed without error       00%     12832         - **
** # 2  Short offline       Completed without error       00%     12815         - **
# 3  Short offline       Completed: read failure       20%     12799         11653928 
# 4  Short offline       Completed: read failure       20%     12799         11651807 
# 5  Short offline       Completed: read failure       20%     12799         11651806 
# 6  Short offline       Completed: read failure       20%     12799         11651805 
# 7  Short offline       Completed: read failure       20%     12798         11651803 
# 8  Short offline       Completed: read failure       20%     12798         11651802 
# 9  Short offline       Completed: read failure       20%     12798         11651801 
#10  Short offline       Completed: read failure       20%     12798         11650680 
#11  Short offline       Completed: read failure       20%     12798         11651800 
#12  Short offline       Completed: read failure       90%     12734         11651800 
#13  Short offline       Completed: read failure       90%     12718         11648558 
#14  Short offline       Interrupted (host reset)      90%     12678         - 
#15  Short offline       Completed: read failure       20%     12662         11651800 
**#16  Extended offline    Completed: read failure       90%     12641         11651800 **
#17  Conveyance offline  Completed without error       00%     12564         - 
#18  Extended offline    Interrupted (host reset)      90%     11508         - 
#19  Short offline       Completed without error       00%     10641         - 
#20  Extended offline    Interrupted (host reset)      90%     10386         - 
#21  Extended offline    Interrupted (host reset)      70%     10331         - 
13 of 13 failed self-tests are outdated by newer successful extended offline self-test # 1 

**erlangen:~ #**

Recovery used procedures from https://www.smartmontools.org/wiki/BadBlockHowto debugfs revealed all bad sectors were owned by a single file. Read errors were confirmed using hdparm --read-sector 11651800 /dev/sdb Writing to the bad sectors fixed them: hdparm --yes-i-know-what-i-am-doing --write-sector 11651800 /dev/sdb

Subsequent checking revealed that the affected file was intact. Reading the drive with ‘dd if=/dev/sdb of=/dev/null bs=4M status=progress’ succeeded as well as short and extended SMART self tests.

Since recovering error rates #1 and #200 stay constant during operation. Pending sectors #197 and uncorrectable sectors #198 are fixed but reported erroneously.

[FONT=monospace]**erlangen:~ #** smartctl -A /dev/sdb              
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.2-1-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org 

=== START OF READ SMART DATA SECTION === 
SMART Attributes Data Structure revision number: 16 
Vendor Specific SMART Attributes with Thresholds: 
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE 
**  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       492 **
  3 Spin_Up_Time            0x0027   180   175   021    Pre-fail  Always       -       7966 
  4 Start_Stop_Count        0x0032   094   094   000    Old_age   Always       -       6868 
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0 
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0 
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       12840 
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0 
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0 
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2375 
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       68 
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always       -       6831 
194 Temperature_Celsius     0x0022   121   110   000    Old_age   Always       -       31 
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0 
**197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       4 **
**198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2 **
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0 
**200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       110 **

**erlangen:~ #**
[/FONT]

The following test is expected to fix the inconsistency:

**erlangen:~ #** smartctl -t offline /dev/sdb              
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.2-1-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org 

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION === 
Sending command: "Execute SMART off-line routine immediately in off-line mode". 
Drive command "Execute SMART off-line routine immediately in off-line mode" successful. 
Testing has begun. 

Please wait **53520** seconds for test to complete. 
Test will complete after Thu Jul 22 22:05:05 2021 CEST 
Use smartctl -X to abort test. 
**erlangen:~ #**