BTRFS error writing

Today I got 2 errors, along with a logout to the console screen:

BTRFS error wr 59, rd 2, flush 0
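The counters in that message are the per-device write, read and flush error counts. As a minimal sketch for pulling them out of a log line, something like the following could be used. The function name `parse_btrfs_errs` is made up for illustration, and the pattern also accepts `corrupt` and `gen` counters on the assumption that the full kernel message carries them.

```python
import re

# Counter names as they appear in the message; "corrupt" and "gen" are
# assumed to follow in the full kernel line and are simply absent here.
_COUNTER = re.compile(r"\b(wr|rd|flush|corrupt|gen) (\d+)")

def parse_btrfs_errs(line):
    """Return a dict of the error counters found in one log line."""
    return {name: int(value) for name, value in _COUNTER.findall(line)}

counters = parse_btrfs_errs("BTRFS error wr 59, rd 2, flush 0")
print(counters)
```

Logging these values once a day would show whether the write counter keeps growing or stays at 59.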

Kernel 6.0.1-1-default

**Disk /dev/nvme0n1p2: 476.44 GiB, 511570870272 bytes, 999161856 sectors**
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

sudo smartctl -x /dev/nvme0n1p2

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.0.1-1-default] (SUSE RPM)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       ADATA SX8200PNP
Serial Number:                      2J1720082838
Firmware Version:                   R0906I
PCI Vendor/Subsystem ID:            0x1cc1
IEEE OUI Identifier:                0x000000
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Wed Nov  2 21:51:03 2022 +04
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     75 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0
 1 +     4.60W       -        -    1  1  1  1        0       0
 2 +     3.80W       -        -    2  2  2  2        0       0
 3 -   0.0450W       -        -    3  3  3  3     2000    2000
 4 -   0.0040W       -        -    4  4  4  4     6000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    53%
Data Units Read:                    97,588,296 [49.9 TB]
Data Units Written:                 293,629,194 [150 TB]
Host Read Commands:                 821,170,565
Host Write Commands:                11,886,970,122
Controller Busy Time:               36,097
Power Cycles:                       2,976
Power On Hours:                     12,287
Unsafe Shutdowns:                   142
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Thermal Temp. 1 Transition Count:   74
Thermal Temp. 2 Transition Count:   2
Thermal Temp. 1 Total Time:         408
Thermal Temp. 2 Total Time:         15

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

sudo btrfs insp dump-s /dev/nvme0n1p2 | grep dev_item.total_bytes


**dev_item.total_bytes**    511570870272

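ADATA specifies 320 TBW for this drive: https://www.adata.com/upload/downloadfile/Datasheet_XPG%20SX8200%20Pro_EN_20181017.pdf With 150 TB written by the host and 53% of the rated life already used, the drive appears to experience some write amplification. The high number of write commands relative to the data written points to many small, non-sequential writes, which wear flash faster. You may need a new drive.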
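The arithmetic behind those figures can be sketched as follows. NVMe reports data units in thousands of 512-byte blocks, so one unit is 512,000 bytes. The function name `host_write_summary` is made up for illustration. The average size per write command is only a rough indicator of the write pattern, not a measurement of the drive's internal write amplification. The second set of numbers comes from the Samsung 950 PRO output quoted later in this thread; no TBW rating for it is given here, so none is used.

```python
# One NVMe "data unit" is 1000 blocks of 512 bytes.
BYTES_PER_DATA_UNIT = 512 * 1000

def host_write_summary(data_units_written, host_write_commands, rated_tbw=None):
    """Summarise host writes from the SMART/Health log values."""
    bytes_written = data_units_written * BYTES_PER_DATA_UNIT
    tb_written = bytes_written / 1e12
    return {
        "tb_written": tb_written,
        # Share of the vendor's TBW rating, if a rating is known.
        "rated_fraction": None if rated_tbw is None else tb_written / rated_tbw,
        # Average payload per write command, in bytes.
        "avg_write_bytes": bytes_written / host_write_commands,
    }

adata = host_write_summary(293_629_194, 11_886_970_122, rated_tbw=320)
samsung = host_write_summary(50_549_161, 998_356_243)

for name, s in (("ADATA", adata), ("Samsung", samsung)):
    print(name, round(s["tb_written"], 1), "TB,",
          round(s["avg_write_bytes"]), "bytes per write command")
```

For the ADATA drive this gives about 150 TB, a little under half of the 320 TBW rating, at an average of roughly 12.6 kB per write command. The Samsung drive averages roughly 26 kB per command.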

Is it possible to mark bad sectors on the SSD? Which utilities can do it?

systemctl status fstrim.timer
**●** fstrim.timer - Discard unused blocks once a week
     Loaded: loaded (/usr/lib/systemd/system/fstrim.timer; **enabled**; preset: **enabled**)
     Active: **active (waiting)** since Thu 2022-11-03 10:30:35 +04; 34min ago
      Until: Thu 2022-11-03 10:30:35 +04; 34min ago
    Trigger: Mon 2022-11-07 00:04:17 +04; 3 days left
   Triggers: ● fstrim.service
       Docs: man:fstrim


Looks like nothing more can be done.

This is about HDDs, but the procedure should apply to SSDs too: https://forums.opensuse.org/showthread.php/555649-Extra-Fun-With-Backup-To-External-Disk

Your SSD experiences undue stress:


Available Spare:                    100%
Available Spare Threshold:          10% 
Percentage Used:                    53% 
Data Units Read:                    97,588,296 [49.9 TB]
Data Units Written:                 293,629,194 [150 TB] 
Host Read Commands:                 821,170,565
Host Write Commands:                11,886,970,122 

More balanced values are:

**erlangen:~ #** smartctl -x /dev/nvme0n1 
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.0.5-1-default] (SUSE RPM) 
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org 

=== START OF INFORMATION SECTION === 
Model Number:                       Samsung SSD 950 PRO 512GB 
Serial Number:                      S2GMNX0H609998K 
Firmware Version:                   2B0QBXX7 
PCI Vendor/Subsystem ID:            0x144d 
IEEE OUI Identifier:                0x002538 
Controller ID:                      1 
NVMe Version:                       <1.2 
Number of Namespaces:               1 
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB] 
Namespace 1 Utilization:            167,286,226,944 [167 GB] 
Namespace 1 Formatted LBA Size:     512 
Namespace 1 IEEE EUI-64:            002538 5661b057b5 
Local Time is:                      Thu Nov  3 07:52:56 2022 CET 
...
=== START OF SMART DATA SECTION === 
SMART overall-health self-assessment test result: PASSED 

SMART/Health Information (NVMe Log 0x02) 
Critical Warning:                   0x00 
Temperature:                        33 Celsius 
Available Spare:                    100% 
Available Spare Threshold:          10% 
Percentage Used:                    2% 
Data Units Read:                    80,766,327 [41.3 TB] 
Data Units Written:                 50,549,161 [25.8 TB] 
Host Read Commands:                 1,010,845,176 
Host Write Commands:                998,356,243 
Controller Busy Time:               6,831 
Power Cycles:                       2,822 
Power On Hours:                     21,606 
Unsafe Shutdowns:                   103 
Media and Data Integrity Errors:    0 
Error Information Log Entries:      6,114 
...
**erlangen:~ #**

Should I use

smartctl -t offline

or something else? That command gives me nothing useful:

sudo smartctl -t offline /dev/nvme0n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.0.1-1-default] (SUSE RPM)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

NVMe device successfully opened

Use 'smartctl -a' (or '-x') to print SMART (and more) information


From the man page:

offline - [ATA] runs SMART Immediate Offline Test. This immediately starts the test described above. This command can be given during normal system operation. The effects of this test are visible only in that it updates the SMART Attribute values, and if errors are found they will appear in the SMART error log, visible with the ‘-l error’ option.

So, is it possible to mark bad sectors to prolong the life of the NVMe drive?

SSDs have spare memory cells, and the internal SSD controller reallocates data to those spare cells when needed. You can’t trigger this process yourself; it is the job of the internal SSD controller.
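What you can do is watch how much of that spare area is left. A minimal sketch, assuming the plain-text layout of the `smartctl -x` output posted above; the helper names `parse_health` and `spare_ok` are made up for illustration:

```python
def parse_health(text):
    """Split 'Name:   value' lines of smartctl output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def spare_ok(fields):
    """True while the spare capacity has not fallen below the threshold."""
    spare = int(fields["Available Spare"].rstrip("%"))
    threshold = int(fields["Available Spare Threshold"].rstrip("%"))
    return spare >= threshold

# Values copied from the smartctl output earlier in this thread.
sample = """\
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    53%
Media and Data Integrity Errors:    0
"""

health = parse_health(sample)
print(spare_ok(health), health["Media and Data Integrity Errors"])
```

With 100% spare and no media errors reported, the controller has not yet had to use its spare cells, according to these counters.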

Here is an explanation (from a different SSD manufacturer than yours): My SSD Has Bad Sectors | Crucial.com

Yeah, it should do that automatically, but I get the errors every day. So at least I freed up some space; possibly that will help for a while.

Thanks for the advice!