PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)

Since June 1 there have been a lot of error messages in the log:

kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:00:00.0
kernel: pcieport 0000:00:01.1: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
kernel: pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000080/00006000
kernel: pcieport 0000:00:01.1:    [ 7] BadDLLP               
kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0

PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)
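
To see which device is reporting what, the "PCIe Bus Error" lines can be tallied per device, severity and layer. This is only a minimal sketch: the sample text is copied from the log above, the function name `tally_bus_errors` is made up for illustration, and the regular expression assumes the line layout shown here, which may differ between kernel versions.

```python
import re
from collections import Counter

# Sample lines copied from the log above.
LOG = """\
kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:00:00.0
kernel: pcieport 0000:00:01.1: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
kernel: pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000080/00006000
kernel: pcieport 0000:00:01.1:    [ 7] BadDLLP
kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
"""

# Assumed layout: "<driver> <domain:bus:dev.fn>: PCIe Bus Error: severity=..., type=..., (...)"
BUS_ERROR = re.compile(
    r"(?P<driver>\S+) (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<layer>[^,]+)"
)


def tally_bus_errors(text):
    """Count 'PCIe Bus Error' lines per (driver, device address, severity, layer)."""
    counts = Counter()
    for line in text.splitlines():
        match = BUS_ERROR.search(line)
        if match:
            key = (match["driver"], match["bdf"], match["severity"], match["layer"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for key, count in tally_bus_errors(LOG).most_common():
        print(count, *key)
```

Run against a full journal export instead of the sample, this would show whether the errors come mostly from the root port (0000:00:01.1) or from the NVMe device (0000:01:00.0), and whether any of them are uncorrectable.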

Threads on various forums say that it is not very dangerous and suggest changing settings in the BIOS or the kernel parameters.

But in the last few days I got some crashes. Does the following error mean the NVMe SSD needs to be replaced?
BTRFS error (device nvme0n1p2 state EA): bdev /dev/nvme0n1p2 errs: wr 741, rd 10, flush 0, corrupt 0, gen 0
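
For reference, the counters in that line are the per-device error statistics that btrfs keeps: write I/O errors, read I/O errors, flush errors, corruption (checksum) errors and generation errors. A minimal sketch of pulling them out of the message, assuming the exact line format shown above (the function name `parse_btrfs_errs` is hypothetical):

```python
import re

# The message quoted above.
LINE = (
    "BTRFS error (device nvme0n1p2 state EA): bdev /dev/nvme0n1p2 errs: "
    "wr 741, rd 10, flush 0, corrupt 0, gen 0"
)

# Assumed format: "bdev <device> errs: wr N, rd N, flush N, corrupt N, gen N"
COUNTERS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return the device name and its error counters as a dict, or None if no match."""
    match = COUNTERS.search(line)
    if match is None:
        return None
    stats = {name: int(value) for name, value in match.groupdict().items() if name != "dev"}
    stats["dev"] = match["dev"]
    return stats


if __name__ == "__main__":
    print(parse_btrfs_errs(LINE))
```

Here the write and read counters are non-zero while the corruption counter is 0, which suggests that I/O requests to the device failed rather than that data came back with bad checksums. The same counters can be read with `btrfs device stats` on the mounted filesystem.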

Two days ago I upgraded the BIOS; I will try to downgrade it. This is the same system where I had problems with kernel 6.8.9, on an AMD Ryzen 1700 and RX 6600XT.

The Samsung is only a year old:

Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      S6P1NX0TA13447Z
Firmware Version:                   4B2QEXM7

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        45 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    78,074,959 [39.9 TB]
Data Units Written:                 62,231,474 [31.8 TB]
Host Read Commands:                 796,731,063
Host Write Commands:                1,733,613,218
Controller Busy Time:               6,730
Power Cycles:                       404
Power On Hours:                     1,945
Unsafe Shutdowns:                   13
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               45 Celsius
Temperature Sensor 2:               60 Celsius

@desmond Hi, that’s one busy device!!! So does it have a heatsink? Perhaps it needs re-seating in the M.2 slot or some new thermal tape fitted…

Based on your info, the spec allows 0.328 DWPD; your system is running at 0.394 DWPD.

DWPD = TBW / (365 * Warranty (Years) * Capacity (TB) )

For that device, TBW is 1200, Warranty is 5 years, and Capacity is 2 TB.

If you think you're not writing to the device much, then you need to investigate what is…
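
The formula can be checked with a few lines of Python. This is a rough sketch, not an authoritative calculation: the rated figures are the ones quoted in this post, the written volume is taken from "Data Units Written" in the smartctl output above (NVMe counts one data unit as 1000 blocks of 512 bytes), and the time base is assumed to be the drive's power-on hours rather than calendar days.

```python
# Figures taken from the thread.
TBW_RATED_TB = 1200              # rated endurance in TB written
WARRANTY_YEARS = 5
CAPACITY_TB = 2
DATA_UNITS_WRITTEN = 62_231_474  # from the SMART output
POWER_ON_HOURS = 1_945           # from the SMART output

# NVMe reports data units in thousands of 512-byte blocks.
BYTES_PER_DATA_UNIT = 512 * 1000


def rated_dwpd(tbw_tb, warranty_years, capacity_tb):
    """DWPD = TBW / (365 * warranty in years * capacity in TB)."""
    return tbw_tb / (365 * warranty_years * capacity_tb)


def written_tb_per_day(data_units, hours):
    """TB written per day of power-on time (assumed time base)."""
    written_tb = data_units * BYTES_PER_DATA_UNIT / 1e12
    return written_tb / (hours / 24)


def observed_dwpd(data_units, hours, capacity_tb):
    """Observed drive writes per day: TB per day divided by capacity."""
    return written_tb_per_day(data_units, hours) / capacity_tb


if __name__ == "__main__":
    rated = rated_dwpd(TBW_RATED_TB, WARRANTY_YEARS, CAPACITY_TB)
    per_day = written_tb_per_day(DATA_UNITS_WRITTEN, POWER_ON_HOURS)
    observed = observed_dwpd(DATA_UNITS_WRITTEN, POWER_ON_HOURS, CAPACITY_TB)
    print(f"rated:    {rated:.3f} DWPD")
    print(f"written:  {per_day:.3f} TB per power-on day")
    print(f"observed: {observed:.3f} DWPD")
```

With these inputs the rated value comes out at about 0.33 DWPD. The drive wrote roughly 0.39 TB per power-on day, which is close to the 0.394 figure above; dividing by the 2 TB capacity, as the formula does, gives roughly 0.2 DWPD. Measured over calendar days instead of power-on hours, the rate would be lower still, so the result depends heavily on the time base chosen.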

Problem solved: I replaced it with a new NVMe disk. Maybe it’s not a good idea to put so many VirtualBox VMs on that disk.

@desmond I use multiple VMs too, on xfs rather than btrfs, but still, it’s on the same NVMe… Even then, that’s a lot of writing in a short time.