The system complains about the SSD

Arbichev · July 2, 2021, 6:16pm

Background: I built a desktop PC for home use. For the system drive I set aside a new SSD with 35 power-on hours and a power-on count of 62. I agonized for a long time over what to install, Leap or Tumbleweed. In the end, I decided that Tumbleweed was more interesting to me. I did a fresh install: on June 16 I downloaded the June 15 image, and on June 17 I installed the system. After the first reboot, this message appeared:

https://susepaste.org/17493695

I was very surprised: I have been using openSUSE for over 15 years and have never seen anything like it in all that time. Interestingly, the message appears only once after the system boots. I close it and carry on, and the system works perfectly. That is the oddity.

dcurtisfra · July 2, 2021, 7:08pm

@Arbichev:

Please translate the system message for us – we’re not universal – at most we’re bi-lingual but, mostly not …

Svyatko · July 2, 2021, 10:31pm

~ “KDE diagnoses device unstable work for /dev/sda”

doscott · July 3, 2021, 12:38am

This is a new KDE feature (annoyance). It means that something has been found in the SMART log of the drive. You can examine the log with GSmartControl app. It may be something serious, or something caused by a power failure reset. The annoying thing is, this feature triggers a warning every time you boot, instead of just triggering on new occurrences.

Miuku · July 3, 2021, 7:46am

I’d say informing the user of possible hardware failure and catastrophic data loss is not an annoyance.

hcvv · July 3, 2021, 9:40am

But I doubt that this is a task for an end-user stack of programs like KDE.

And how can KDE show that on boot? KDE is not started on boot at all.

Confusing to me, but it could be that I did not digest correctly the small pieces of information spread throughout this thread :\

doscott · July 3, 2021, 12:48pm

I fully agree with you, for the first dozen or so warnings of the same event. But compare this with:

your neighbor notices you have a flat tire on your car
they write this down, and the next time they see you, they let you know: great neighbor
you go away, come back with the issue resolved, they check their notes, see that you had a flat tire yesterday, and then they call you up again
they keep doing this every day (over that original instance): now they are an annoyance; each time you need to check if you have a new flat tire
after a while you ignore your neighbor's warnings and then fail to check when a new legitimate warning is made

A good neighbor would have remembered they had notified you and stopped until something new came up.
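
The "good neighbor" behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how KDE actually implements its SMART check: the function name `should_notify` and the notion of storing the error count from the last warning are assumptions made for this example.

```python
from typing import Optional


def should_notify(last_reported: Optional[int], current_count: int) -> bool:
    """Return True only when the drive's error count has grown since the last warning.

    last_reported is the error count at the time of the previous warning,
    or None if the user has never been warned (hypothetical stored state).
    """
    if last_reported is None:
        # First look at this drive: warn only if there is something to report.
        return current_count > 0
    # Stay quiet until something new shows up in the log.
    return current_count > last_reported


# First boot with some logged errors: warn once.
print(should_notify(None, 18))
# Later boots with the same error count: stay silent.
print(should_notify(18, 18))
# One more error appears: warn again.
print(should_notify(18, 19))
```

The design choice is simply to compare against remembered state instead of re-reporting the whole log on every boot.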

karlmistelberger · July 3, 2021, 1:14pm

Users should always be concerned about new messages. I have never seen this since assembling host erlangen in 2016:

erlangen:~ # journalctl -b -2 -p3 -o short-monotonic -q
[    0.078041] erlangen kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 3: f200000000800400
[    0.078044] erlangen kernel: mce: [Hardware Error]: TSC 0
[    0.078046] erlangen kernel: mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1625243229 SOCKET 0 APIC 0 microcode ea
[    0.079849] erlangen kernel: mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 3: fe00000000800400
[    0.079849] erlangen kernel: mce: [Hardware Error]: TSC 0 ADDR 7fa4c68969d3 MISC 7fa4c68969d3
[    0.079849] erlangen kernel: mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1625243229 SOCKET 0 APIC 2 microcode ea
[    0.079849] erlangen kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 3: fe00000000800400
[    0.079849] erlangen kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffffb00652df MISC ffffffffb00652df
[    0.079849] erlangen kernel: mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1625243229 SOCKET 0 APIC 4 microcode ea
[    0.081423] erlangen kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 3: fe00000000800400
[    0.081427] erlangen kernel: mce: [Hardware Error]: TSC 0 ADDR 7fcaa8c6f2b0 MISC 7fcaa8c6f2b0
[    0.081430] erlangen kernel: mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1625243229 SOCKET 0 APIC 6 microcode ea
[    4.983272] erlangen systemd[1]: Failed to start Rotate log files.
[ 1320.870795] erlangen systemd[1]: mandb.timer: Unit to trigger vanished.
erlangen:~ #

You may want to check for scsi and usb messages:


erlangen:~ # journalctl -b -q -o short-monotonic _KERNEL_SUBSYSTEM=scsi _KERNEL_SUBSYSTEM=usb
[    0.336123] erlangen kernel: scsi host0: ahci
[    0.336381] erlangen kernel: scsi host1: ahci
[    0.336620] erlangen kernel: scsi host2: ahci
[    0.336733] erlangen kernel: scsi host3: ahci
[    0.336840] erlangen kernel: scsi host4: ahci
[    0.336931] erlangen kernel: scsi host5: ahci
[    0.669754] erlangen kernel: scsi 2:0:0:0: Direct-Access     ATA      CT2000BX500SSD1  030  PQ: 0 ANSI: 5
[    0.670155] erlangen kernel: sd 2:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[    0.670174] erlangen kernel: sd 2:0:0:0: [sda] Write Protect is off
[    0.670179] erlangen kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    0.670203] erlangen kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.670343] erlangen kernel: scsi 3:0:0:0: Direct-Access     ATA      WDC WD40EZRX-22S 0A80 PQ: 0 ANSI: 5
[    0.670710] erlangen kernel: sd 3:0:0:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    0.670718] erlangen kernel: sd 3:0:0:0: [sdb] 4096-byte physical blocks
[    0.670732] erlangen kernel: sd 3:0:0:0: [sdb] Write Protect is off
[    0.670736] erlangen kernel: sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    0.670756] erlangen kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.670901] erlangen kernel: scsi 4:0:0:0: Direct-Access     ATA      Samsung SSD 850  3B6Q PQ: 0 ANSI: 5
[    0.671338] erlangen kernel: sd 4:0:0:0: [sdc] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[    0.671357] erlangen kernel: sd 4:0:0:0: [sdc] Write Protect is off
[    0.671363] erlangen kernel: sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[    0.671389] erlangen kernel: sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.684670] erlangen kernel: scsi 5:0:0:0: CD-ROM            PIONEER  DVD-RW  DVR-221  1.00 PQ: 0 ANSI: 5
[    0.696532] erlangen kernel: sd 4:0:0:0: [sdc] supports TCG Opal
[    0.696540] erlangen kernel: sd 4:0:0:0: [sdc] Attached SCSI disk
[    0.715642] erlangen kernel: sd 2:0:0:0: [sda] Attached SCSI disk
[    0.763628] erlangen kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
[    0.886205] erlangen kernel: sd 2:0:0:0: Attached scsi generic sg0 type 0
[    0.886224] erlangen kernel: sd 3:0:0:0: Attached scsi generic sg1 type 0
[    0.886241] erlangen kernel: sd 4:0:0:0: Attached scsi generic sg2 type 0
[    0.886255] erlangen kernel: scsi 5:0:0:0: Attached scsi generic sg3 type 5
[    1.968685] erlangen kernel: usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.12
[    1.968688] erlangen kernel: usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.968689] erlangen kernel: usb usb1: Product: xHCI Host Controller
[    1.968691] erlangen kernel: usb usb1: Manufacturer: Linux 5.12.13-1-default xhci-hcd
[    1.968692] erlangen kernel: usb usb1: SerialNumber: 0000:00:14.0
[    1.969444] erlangen kernel: hub 1-0:1.0: USB hub found
[    1.969465] erlangen kernel: hub 1-0:1.0: 16 ports detected
[    1.970741] erlangen kernel: sr 5:0:0:0: [sr0] scsi3-mmc drive: 40x/40x writer dvd-ram cd/rw xa/form2 cdda tray
[    1.971628] erlangen kernel: usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.12
[    1.971631] erlangen kernel: usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.971633] erlangen kernel: usb usb2: Product: xHCI Host Controller
[    1.971634] erlangen kernel: usb usb2: Manufacturer: Linux 5.12.13-1-default xhci-hcd
[    1.971635] erlangen kernel: usb usb2: SerialNumber: 0000:00:14.0
[    1.971721] erlangen kernel: hub 2-0:1.0: USB hub found
[    1.971734] erlangen kernel: hub 2-0:1.0: 10 ports detected
[    2.050771] erlangen kernel: sr 5:0:0:0: Attached scsi CD-ROM sr0
[    2.226503] erlangen kernel: usb 1-7: new low-speed USB device number 2 using xhci_hcd
[    2.392148] erlangen kernel: usb 1-7: New USB device found, idVendor=046a, idProduct=0011, bcdDevice= 1.00
[    2.392153] erlangen kernel: usb 1-7: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.518503] erlangen kernel: usb 1-8: new full-speed USB device number 3 using xhci_hcd
[    2.670717] erlangen kernel: usb 1-8: New USB device found, idVendor=046d, idProduct=c542, bcdDevice= 3.02
[    2.670727] erlangen kernel: usb 1-8: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    2.670733] erlangen kernel: usb 1-8: Product: Wireless Receiver
[    2.670737] erlangen kernel: usb 1-8: Manufacturer: Logitech
erlangen:~ #

See also: https://forums.opensuse.org/showthread.php/555649-Extra-Fun-With-Backup-To-External-Disk

Arbichev · July 3, 2021, 6:03pm

My translation:
"The SSD storage device (128 GB “/dev/sda”) is showing signs of unstable performance".

Arbichev · July 3, 2021, 6:27pm

Complete GSmartControl error log:

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 18 (device log contains only the most recent 4 errors)
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.


Error 18 [1] occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
  When the command that caused the error occurred, the device was in an unknown state.


  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 40 00 00 00 00 38 00 6d 38 00 00   at LBA = 0x38006d38 = 939552056


  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 00 00 00 38 00 6d 38 40 08     00:03:58.890  READ FPDMA QUEUED
  60 00 18 00 a0 00 0d 31 00 1f a0 40 08     00:03:58.890  READ FPDMA QUEUED
  60 01 b8 00 f8 00 00 c6 00 6d b8 40 08     00:03:58.890  READ FPDMA QUEUED
  60 04 00 00 38 00 00 27 00 71 a0 40 08     00:03:58.880  READ FPDMA QUEUED
  60 04 00 00 30 00 00 45 00 6d 28 40 08     00:03:58.880  READ FPDMA QUEUED


Error 17 [0] occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
  When the command that caused the error occurred, the device was in an unknown state.


  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 40 00 a0 00 00 9e 00 9a 68 00 00   at LBA = 0x9e009a68 = 2650839656


  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 03 70 00 a0 00 00 9e 00 9a 68 40 08     00:00:29.400  READ FPDMA QUEUED
  60 01 00 00 98 00 0d d9 00 33 00 40 08     00:00:29.400  READ FPDMA QUEUED
  60 00 10 00 d8 00 02 9e 00 96 b0 40 08     00:00:29.400  READ FPDMA QUEUED
  60 04 00 00 d0 00 00 a1 00 9a d8 40 08     00:00:29.400  READ FPDMA QUEUED
  60 01 08 00 90 00 0d ee 00 2c 00 40 08     00:00:29.400  READ FPDMA QUEUED


Error 16 [3] occurred at disk power-on lifetime: 20 hours (0 days + 20 hours)
  When the command that caused the error occurred, the device was in an unknown state.


  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 40 00 40 00 0d 39 00 10 00 00 00   at LBA = 0xd39001000 = 56790880256


  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 00 00 40 00 0d 39 00 10 00 40 08     00:00:29.240  READ FPDMA QUEUED
  60 00 08 00 a8 00 0e 09 00 91 a8 40 08     00:00:29.240  READ FPDMA QUEUED
  60 00 f0 00 38 00 0d c3 00 59 68 40 08     00:00:29.240  READ FPDMA QUEUED
  60 00 70 00 30 00 0d 03 00 20 50 40 08     00:00:29.240  READ FPDMA QUEUED
  60 00 20 00 28 00 0d c3 00 59 48 40 08     00:00:29.240  READ FPDMA QUEUED


Error 15 [2] occurred at disk power-on lifetime: 15 hours (0 days + 15 hours)
  When the command that caused the error occurred, the device was in an unknown state.


  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 40 00 18 00 0e d2 00 98 f0 00 00   at LBA = 0xed20098f0 = 63652796656


  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 18 00 0e d2 00 98 f0 40 08     00:00:25.130  READ FPDMA QUEUED
  60 04 00 00 10 00 00 94 00 31 98 40 08     00:00:25.130  READ FPDMA QUEUED
  60 00 08 00 68 00 0d 68 00 63 18 40 08     00:00:25.130  READ FPDMA QUEUED
  60 04 00 00 08 00 00 90 00 31 98 40 08     00:00:25.130  READ FPDMA QUEUED
  60 00 08 00 00 00 03 1d 00 51 68 40 08     00:00:25.130  READ FPDMA QUEUED
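
For anyone trying to follow the register dump above, here is a small Python sketch of how the "at LBA = ..." value relates to the bytes in each register row. It assumes the column layout given in the legend (three LBA_48 bytes holding the upper 24 bits, followed by LH, LM and LL for the lower 24 bits). The function name `decode_lba48` is made up for this example, and the capacity check at the end assumes 512-byte logical sectors on a 128 GB drive, which is an assumption at this point in the thread.

```python
def decode_lba48(upper_bytes: str, lh: str, lm: str, ll: str) -> int:
    """Combine the LBA_48 bytes (bits 47..24) with LH, LM, LL (bits 23..0)."""
    hex_digits = upper_bytes.replace(" ", "") + lh + lm + ll
    return int(hex_digits, 16)


# Register rows copied from errors 18, 17 and 16 in the log above.
error_18 = decode_lba48("00 00 38", "00", "6d", "38")
error_17 = decode_lba48("00 00 9e", "00", "9a", "68")
error_16 = decode_lba48("00 0d 39", "00", "10", "00")

print(hex(error_18), error_18)
print(hex(error_17), error_17)
print(hex(error_16), error_16)

# Assumption: 128 GB (decimal) capacity and 512-byte logical sectors.
approx_sectors = 128_000_000_000 // 512
print(approx_sectors)
print(error_18 > approx_sectors)
```

Under those assumptions the drive has roughly 250 million sectors, and the decoded addresses land well beyond that. That may mean the firmware fills these registers in a non-standard way, so the LBAs in this log are perhaps best treated as something to verify and not as literal sector numbers.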

dcurtisfra · July 5, 2021, 3:45pm

@Arbichev:

Please provide the output of “smartctl --attributes --log=error /dev/sda” – user “root” …
For example – 120 GB SATA III SSD – not in the SmartCtl database:


 # smartctl --attributes --log=error /dev/sda
smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-lp152.78-default] (SUSE RPM)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       2604
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       419
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       100
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       11
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1724
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       6
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       3
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       7000
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       52
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       773
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       100
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       3807
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       35497
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1881

SMART Error Log Version: 1
No Errors Logged

 #

I’ve marked the types of errors which need attention …

Arbichev · July 5, 2021, 6:09pm

smartctl log file:


smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.12.13-1-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org 

=== START OF READ SMART DATA SECTION === 
SMART Attributes Data Structure revision number: 1 
Vendor Specific SMART Attributes with Thresholds: 
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE 
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0 
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0 
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       52 
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       78 
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
161 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       2856 
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0 
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       11 
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       7087 
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       0 
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0 
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0 
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       4 
194 Temperature_Celsius     0x0032   100   100   050    Old_age   Always       -       40 
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0 
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0 
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       0 
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0 
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0 
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       0 
241 Total_LBAs_Written      0x0032   100   100   050    Old_age   Always       -       2190 
242 Total_LBAs_Read         0x0032   100   100   050    Old_age   Always       -       1666 
249 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       2333 

SMART Error Log Version: 1 
ATA Error Count: 18 (device log contains only the most recent five errors) 
        CR = Command Register [HEX] 
        FR = Features Register [HEX] 
        SC = Sector Count Register [HEX] 
        SN = Sector Number Register [HEX] 
        CL = Cylinder Low Register [HEX] 
        CH = Cylinder High Register [HEX] 
        DH = Device/Head Register [HEX] 
        DC = Device Command Register [HEX] 
        ER = Error register [HEX] 
        ST = Status register [HEX] 
Powered_Up_Time is measured from power on, and printed as 
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, 
SS=sec, and sss=millisec. It "wraps" after 49.710 days. 

Error 18 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours) 
  When the command that caused the error occurred, the device was in an unknown state. 

  After command completion occurred, registers were: 
  ER ST SC SN CL CH DH 
  -- -- -- -- -- -- -- 
  04 40 00 38 38 6d 00   at LBA = 0x006d3838 = 7157816 

  Commands leading to the command that caused the error were: 
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name 
  -- -- -- -- -- -- -- --  ----------------  -------------------- 
  60 00 00 38 38 6d 40 08      00:03:58.890  READ FPDMA QUEUED 
  60 18 a0 a0 31 1f 40 08      00:03:58.890  READ FPDMA QUEUED 
  60 b8 f8 b8 c6 6d 40 08      00:03:58.890  READ FPDMA QUEUED 
  60 00 38 a0 27 71 40 08      00:03:58.880  READ FPDMA QUEUED 
  60 00 30 28 45 6d 40 08      00:03:58.880  READ FPDMA QUEUED 

Error 17 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours) 
  When the command that caused the error occurred, the device was in an unknown state. 

  After command completion occurred, registers were: 
  ER ST SC SN CL CH DH 
  -- -- -- -- -- -- -- 
  04 40 a0 68 9e 9a 00   at LBA = 0x009a9e68 = 10133096 

  Commands leading to the command that caused the error were: 
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name 
  -- -- -- -- -- -- -- --  ----------------  -------------------- 
  60 70 a0 68 9e 9a 40 08      00:00:29.400  READ FPDMA QUEUED 
  60 00 98 00 d9 33 40 08      00:00:29.400  READ FPDMA QUEUED 
  60 10 d8 b0 9e 96 40 08      00:00:29.400  READ FPDMA QUEUED 
  60 00 d0 d8 a1 9a 40 08      00:00:29.400  READ FPDMA QUEUED 
  60 08 90 00 ee 2c 40 08      00:00:29.400  READ FPDMA QUEUED 

Error 16 occurred at disk power-on lifetime: 20 hours (0 days + 20 hours) 
  When the command that caused the error occurred, the device was in an unknown state. 

  After command completion occurred, registers were: 
  ER ST SC SN CL CH DH 
  -- -- -- -- -- -- -- 
  04 40 40 00 39 10 00   at LBA = 0x00103900 = 1063168 

  Commands leading to the command that caused the error were: 
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name 
  -- -- -- -- -- -- -- --  ----------------  -------------------- 
  60 00 40 00 39 10 40 08      00:00:29.240  READ FPDMA QUEUED 
  60 08 a8 a8 09 91 40 08      00:00:29.240  READ FPDMA QUEUED 
  60 f0 38 68 c3 59 40 08      00:00:29.240  READ FPDMA QUEUED 
  60 70 30 50 03 20 40 08      00:00:29.240  READ FPDMA QUEUED 
  60 20 28 48 c3 59 40 08      00:00:29.240  READ FPDMA QUEUED 

Error 15 occurred at disk power-on lifetime: 15 hours (0 days + 15 hours) 
  When the command that caused the error occurred, the device was in an unknown state. 

  After command completion occurred, registers were: 
  ER ST SC SN CL CH DH 
  -- -- -- -- -- -- -- 
  04 40 18 f0 d2 98 00   at LBA = 0x0098d2f0 = 10015472 

  Commands leading to the command that caused the error were: 
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name 
  -- -- -- -- -- -- -- --  ----------------  -------------------- 
  60 08 18 f0 d2 98 40 08      00:00:25.130  READ FPDMA QUEUED 
  60 00 10 98 94 31 40 08      00:00:25.130  READ FPDMA QUEUED 
  60 08 68 18 68 63 40 08      00:00:25.130  READ FPDMA QUEUED 
  60 00 08 98 90 31 40 08      00:00:25.130  READ FPDMA QUEUED 
  60 08 00 68 1d 51 40 08      00:00:25.130  READ FPDMA QUEUED 

Error 14 occurred at disk power-on lifetime: 13 hours (0 days + 13 hours) 
  When the command that caused the error occurred, the device was in an unknown state. 

  After command completion occurred, registers were: 
  ER ST SC SN CL CH DH 
  -- -- -- -- -- -- -- 
  04 40 b8 00 14 90 00   at LBA = 0x00901400 = 9442304 

  Commands leading to the command that caused the error were: 
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name 
  -- -- -- -- -- -- -- --  ----------------  -------------------- 
  60 08 b8 00 14 90 40 08      00:09:36.960  READ FPDMA QUEUED 
  60 08 b0 00 0f 90 40 08      00:09:36.960  READ FPDMA QUEUED 
  60 08 a8 a0 10 91 40 08      00:09:36.960  READ FPDMA QUEUED 
  60 08 a0 00 0b 90 40 08      00:09:36.960  READ FPDMA QUEUED 
  60 08 98 b8 14 51 40 08      00:09:36.960  READ FPDMA QUEUED
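
One thing this output makes it possible to check is whether the errors are still accumulating. The Python sketch below assumes the "Error N occurred at disk power-on lifetime: H hours" line format shown above (the helper name `error_hours` is made up for the example). It extracts the hour stamp of each logged error so it can be compared with the current Power_On_Hours raw value of 52.

```python
import re
from typing import Dict

ERROR_LINE = re.compile(r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours")


def error_hours(log_text: str) -> Dict[int, int]:
    """Map each logged error number to the power-on hour at which it occurred."""
    return {int(num): int(hours) for num, hours in ERROR_LINE.findall(log_text)}


# Header lines of the five logged errors, copied from the output above.
sample = """\
Error 18 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
Error 17 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
Error 16 occurred at disk power-on lifetime: 20 hours (0 days + 20 hours)
Error 15 occurred at disk power-on lifetime: 15 hours (0 days + 15 hours)
Error 14 occurred at disk power-on lifetime: 13 hours (0 days + 13 hours)
"""

hours = error_hours(sample)
power_on_hours = 52  # raw value of attribute 9 in the table above
latest = max(hours.values())
print(hours)
print(power_on_hours - latest)
```

With these numbers, the most recent logged error is about 30 power-on hours old. The first post says the drive already had 35 hours on it before the installation, so if the hour counters can be trusted, all five logged errors would predate the Tumbleweed install. Only repeated checks of the error count can confirm that nothing new is being added.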

malcolmlewis · July 5, 2021, 6:21pm

Hi
What model etc of SSD?


smartctl -a /dev/sda

I see attribute 163 as Initial Bad Block Count; you need to find out what the raw value means for that device.
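
To pull a single raw value such as attribute 163 out of the attribute table, a few lines of Python are enough. This is a minimal sketch: `raw_value_of` is a hypothetical helper, the sample rows are copied from the output posted earlier in the thread, and the code only handles rows whose raw value is a plain integer. It extracts the number but says nothing about what the vendor means by it.

```python
from typing import Optional


def raw_value_of(table_text: str, attribute_id: int) -> Optional[int]:
    """Return RAW_VALUE for the given attribute ID, or None if it is absent."""
    for line in table_text.splitlines():
        fields = line.split()
        # A data row has 10 columns: ID#, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0] == str(attribute_id):
            return int(fields[9])
    return None


sample = """\
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       52
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       2856
"""

print(raw_value_of(sample, 163))
print(raw_value_of(sample, 5))
```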

Arbichev · July 6, 2021, 12:33pm

Output:



smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.12.13-1-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org 

=== START OF INFORMATION SECTION === 
Device Model:     SSD 128GB 
Serial Number:    YS2020030162 
LU WWN Device Id: 0 000000 000000000 
Firmware Version: FW200326 
User Capacity:    128 035 676 160 bytes [128 GB] 
Sector Size:      512 bytes logical/physical 
Rotation Rate:    Solid State Device 
Form Factor:      2.5 inches 
TRIM Command:     Available 
Device is:        Not in smartctl database [for details use: -P showall] 
ATA Version is:   ACS-2 T13/2015-D revision 3 
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s) 
Local Time is:    Tue Jul  6 13:29:43 2021 MSK 
SMART support is: Available - device has SMART capability. 
SMART support is: Enabled 

=== START OF READ SMART DATA SECTION === 
SMART overall-health self-assessment test result: PASSED 

General SMART Values: 
Offline data collection status:  (0x00) Offline data collection activity 
                                        was never started. 
                                        Auto Offline Data Collection: Disabled. 
Self-test execution status:      (   0) The previous self-test routine completed 
                                        without error or no self-test has ever  
                                        been run. 
Total time to complete Offline  
data collection:                (  120) seconds. 
Offline data collection 
capabilities:                    (0x5d) SMART execute Offline immediate. 
                                        No Auto Offline data collection support. 
                                        Abort Offline collection upon new 
                                        command. 
                                        Offline surface scan supported. 
                                        Self-test supported. 
                                        No Conveyance Self-test supported. 
                                        Selective Self-test supported. 
SMART capabilities:            (0x0002) Does not save SMART data before 
                                        entering power-saving mode. 
                                        Supports SMART auto save timer. 
Error logging capability:        (0x01) Error logging supported. 
                                        General Purpose Logging supported. 
Short self-test routine  
recommended polling time:        (   2) minutes. 
Extended self-test routine 
recommended polling time:        (   4) minutes. 

SMART Attributes Data Structure revision number: 1 
Vendor Specific SMART Attributes with Thresholds: 
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE 
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0 
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0 
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       53 
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       80 
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
161 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       2856 
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0 
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0 
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       13 
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       7699 
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       0 
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0 
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0 
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       4 
194 Temperature_Celsius     0x0032   100   100   050    Old_age   Always       -       40 
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0 
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0 
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       0 
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0 
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0 
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       0 
241 Total_LBAs_Written      0x0032   100   100   050    Old_age   Always       -       2233 
242 Total_LBAs_Read         0x0032   100   100   050    Old_age   Always       -       1765 
249 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       2379 

SMART Error Log Version: 1 
ATA Error Count: 18 (device log contains only the most recent five errors) 
        CR = Command Register [HEX] 
        FR = Features Register [HEX] 
        SC = Sector Count Register [HEX] 
        SN = Sector Number Register [HEX] 
        CL = Cylinder Low Register [HEX] 
        CH = Cylinder High Register [HEX] 
        DH = Device/Head Register [HEX] 
        DC = Device Command Register [HEX] 
        ER = Error register [HEX] 
        ST = Status register [HEX] 
Powered_Up_Time is measured from power on, and printed as 
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, 
SS=sec, and sss=millisec. It "wraps" after 49.710 days. 

Error 18 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours) 
  When the command that caused the error occurred, the device was in an unknown state. 

  After command completion occurred, registers were: 
  ER ST SC SN CL CH DH 
  -- -- -- -- -- -- -- 
  04 40 00 38 38 6d 00   at LBA = 0x006d3838 = 7157816 

  Commands leading to the command that caused the error were: 
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name 
  -- -- -- -- -- -- -- --  ----------------  -------------------- 
  60 00 00 38 38 6d 40 08      00:03:58.890  READ FPDMA QUEUED 
  60 18 a0 a0 31 1f 40 08      00:03:58.890  READ FPDMA QUEUED 
  60 b8 f8 b8 c6 6d 40 08      00:03:58.890  READ FPDMA QUEUED 
  60 00 38 a0 27 71 40 08      00:03:58.880  READ FPDMA QUEUED 
  60 00 30 28 45 6d 40 08      00:03:58.880  READ FPDMA QUEUED 

Error 17 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours) 
  When the command that caused the error occurred, the device was in an unknown state. 

  After command completion occurred, registers were: 
  ER ST SC SN CL CH DH 
  -- -- -- -- -- -- -- 
  04 40 a0 68 9e 9a 00   at LBA = 0x009a9e68 = 10133096 

  Commands leading to the command that caused the error were: 
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name 
  -- -- -- -- -- -- -- --  ----------------  -------------------- 
  60 70 a0 68 9e 9a 40 08      00:00:29.400  READ FPDMA QUEUED 
  60 00 98 00 d9 33 40 08      00:00:29.400  READ FPDMA QUEUED 
  60 10 d8 b0 9e 96 40 08      00:00:29.400  READ FPDMA QUEUED 
  60 00 d0 d8 a1 9a 40 08      00:00:29.400  READ FPDMA QUEUED 
  60 08 90 00 ee 2c 40 08      00:00:29.400  READ FPDMA QUEUED 

Error 16 occurred at disk power-on lifetime: 20 hours (0 days + 20 hours) 
  When the command that caused the error occurred, the device was in an unknown state. 

  After command completion occurred, registers were: 
  ER ST SC SN CL CH DH 
  -- -- -- -- -- -- -- 
  04 40 40 00 39 10 00   at LBA = 0x00103900 = 1063168 

  Commands leading to the command that caused the error were: 
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name 
  -- -- -- -- -- -- -- --  ----------------  -------------------- 
  60 00 40 00 39 10 40 08      00:00:29.240  READ FPDMA QUEUED 
  60 08 a8 a8 09 91 40 08      00:00:29.240  READ FPDMA QUEUED 
  60 f0 38 68 c3 59 40 08      00:00:29.240  READ FPDMA QUEUED 
  60 70 30 50 03 20 40 08      00:00:29.240  READ FPDMA QUEUED 
  60 20 28 48 c3 59 40 08      00:00:29.240  READ FPDMA QUEUED 

Error 15 occurred at disk power-on lifetime: 15 hours (0 days + 15 hours) 
  When the command that caused the error occurred, the device was in an unknown state. 

  After command completion occurred, registers were: 
  ER ST SC SN CL CH DH 
  -- -- -- -- -- -- -- 
  04 40 18 f0 d2 98 00   at LBA = 0x0098d2f0 = 10015472 

  Commands leading to the command that caused the error were: 
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name 
  -- -- -- -- -- -- -- --  ----------------  -------------------- 
  60 08 18 f0 d2 98 40 08      00:00:25.130  READ FPDMA QUEUED 
  60 00 10 98 94 31 40 08      00:00:25.130  READ FPDMA QUEUED 
  60 08 68 18 68 63 40 08      00:00:25.130  READ FPDMA QUEUED 
  60 00 08 98 90 31 40 08      00:00:25.130  READ FPDMA QUEUED 
  60 08 00 68 1d 51 40 08      00:00:25.130  READ FPDMA QUEUED 

Error 14 occurred at disk power-on lifetime: 13 hours (0 days + 13 hours) 
  When the command that caused the error occurred, the device was in an unknown state. 

  After command completion occurred, registers were: 
  ER ST SC SN CL CH DH 
  -- -- -- -- -- -- -- 
  04 40 b8 00 14 90 00   at LBA = 0x00901400 = 9442304 

  Commands leading to the command that caused the error were: 
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name 
  -- -- -- -- -- -- -- --  ----------------  -------------------- 
  60 08 b8 00 14 90 40 08      00:09:36.960  READ FPDMA QUEUED 
  60 08 b0 00 0f 90 40 08      00:09:36.960  READ FPDMA QUEUED 
  60 08 a8 a0 10 91 40 08      00:09:36.960  READ FPDMA QUEUED 
  60 08 a0 00 0b 90 40 08      00:09:36.960  READ FPDMA QUEUED 
  60 08 98 b8 14 51 40 08      00:09:36.960  READ FPDMA QUEUED 

SMART Self-test log structure revision number 1 
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error 
# 1  Offline             Completed without error       00%        53         - 
# 2  Offline             Self-test routine in progress 10%        53         - 
# 3  Offline             Self-test routine in progress 10%        53         - 
# 4  Offline             Self-test routine in progress 10%        53         - 
# 5  Offline             Self-test routine in progress 10%        53         - 
# 6  Offline             Self-test routine in progress 10%        53         - 
# 7  Offline             Self-test routine in progress 10%        53         - 
# 8  Offline             Self-test routine in progress 10%        53         - 
# 9  Offline             Self-test routine in progress 10%        53         - 
#10  Offline             Self-test routine in progress 10%        53         - 
#11  Offline             Self-test routine in progress 10%        53         - 
#12  Offline             Self-test routine in progress 10%        53         - 
#13  Offline             Self-test routine in progress 10%        53         - 
#14  Offline             Self-test routine in progress 10%        53         - 
#15  Offline             Self-test routine in progress 10%        53         - 
#16  Offline             Self-test routine in progress 10%        53         - 
#17  Offline             Self-test routine in progress 10%        53         - 
#18  Offline             Self-test routine in progress 10%        53         - 
#19  Offline             Self-test routine in progress 10%        53         - 
#20  Offline             Self-test routine in progress 10%        53         - 
#21  Offline             Self-test routine in progress 10%        53         - 

SMART Selective self-test log data structure revision number 0 
Note: revision number not 1 implies that no selective self-test has ever been run 
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS 
    1        0        0  Not_testing 
    2        0        0  Not_testing 
    3        0        0  Not_testing 
    4        0        0  Not_testing 
    5        0        0  Not_testing 
Selective self-test flags (0x0): 
  After scanning selected spans, do NOT read-scan remainder of disk. 
If Selective self-test is pending on power-up, resume after 0 minute delay.
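
The error-log entries above can be cross-checked with a short script. This is a minimal Python sketch, not part of the original post. It assumes the text layout smartctl printed here, and `lba_from_registers` and `parse_error_log` are hypothetical helper names. It rebuilds each LBA from the SN, CL, CH and DH registers, assuming the 28-bit layout with the top four LBA bits in the low nibble of DH, and extracts the power-on hour of each error.

```python
import re

# Assumed 28-bit ATA layout: SN = LBA bits 0-7, CL = bits 8-15,
# CH = bits 16-23, low nibble of DH = bits 24-27.
def lba_from_registers(sn, cl, ch, dh):
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

ERROR_RE = re.compile(r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours")
# Register line: ER ST SC SN CL CH DH, then "at LBA = 0x... = <decimal>".
REGS_RE = re.compile(r"^\s*((?:[0-9a-f]{2}\s+){7})at LBA = 0x[0-9a-f]+ = (\d+)")

def parse_error_log(text):
    """Return a list of dicts, one per logged error."""
    errors = []
    for line in text.splitlines():
        m = ERROR_RE.search(line)
        if m:
            errors.append({"num": int(m.group(1)), "hours": int(m.group(2))})
            continue
        m = REGS_RE.match(line)
        if m and errors:
            er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in m.group(1).split())
            errors[-1].update(
                er=er,
                st=st,
                lba=lba_from_registers(sn, cl, ch, dh),
                reported_lba=int(m.group(2)),  # the decimal LBA smartctl printed
            )
    return errors

# Two entries copied from the log above (headers and command lists omitted).
SAMPLE = """\
Error 18 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
  04 40 00 38 38 6d 00   at LBA = 0x006d3838 = 7157816
Error 14 occurred at disk power-on lifetime: 13 hours (0 days + 13 hours)
  04 40 b8 00 14 90 00   at LBA = 0x00901400 = 9442304
"""

for e in parse_error_log(SAMPLE):
    matches = e["lba"] == e["reported_lba"]
    print(e["num"], e["hours"], hex(e["er"]), e["lba"], matches)
```

All five logged entries carry ER = 0x04, which is bit 2 of the ATA error register (ABRT, command aborted), and the logged errors fall between power-on hours 13 and 22, while the drive reports 53 hours now.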

Arbichev · July 6, 2021, 12:37pm

Attribute 163 - Initial Bad Block Count
This attribute is the initial bad block count of the device when leaving the factory.

https://www.micromat.com/product_manuals/drive_scope_manual_01.pdf

malcolmlewis · July 6, 2021, 1:18pm

Hi
Yes, but it’s a raw value: does it equate to 2856 bad blocks, or to some other value? I would consider submitting a return request to wherever you purchased it, for a replacement…
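
To look at the raw counters side by side, the attribute table can be split into fields. The following is a minimal Python sketch under stated assumptions: it expects the ten-column layout smartctl printed in this thread, `parse_attribute_row` is a hypothetical helper name, and it does not interpret the raw values, whose units are vendor-specific.

```python
def parse_attribute_row(line):
    """Split one row of the smartctl attribute table into a dict.

    Returns None for the header or any line that is not an attribute row.
    """
    fields = line.split(None, 9)  # RAW_VALUE may contain spaces, keep it whole
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": fields[9].strip(),
    }

# Header plus three rows copied from the output posted in this thread.
ROWS = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       2856
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       13
"""

attrs = {}
for row in ROWS.splitlines():
    parsed = parse_attribute_row(row)
    if parsed:
        attrs[parsed["id"]] = parsed

for a in attrs.values():
    print(a["id"], a["name"], a["value"], a["thresh"], a["raw"])
```

In the posted output every row shows the same normalized VALUE/WORST/THRESH of 100/100/050, so only the raw column differs between attributes, and smartctl labels 163 as Unknown_Attribute, which is why the meaning of 2856 stays open.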

karlmistelberger · July 6, 2021, 1:27pm

KDE message on SMART error:
http://mistelberger.net/smart.png

dcurtisfra · July 6, 2021, 2:20pm

The handbook seems to be related to Apple hardware –

Can you please tell us if you can open the system’s case?
If you can, please check the hardware connection to the drive – bent connector pins – dirty connections – moist insects such as cockroaches crawling around on the printed circuit.
Please supply the output of “inxi --admin --filter --disk”.

Yes, cockroaches – a DEC customer had a misbehaving system which was controlling a brewery – the plant’s automation intermittently went crazy – a front loader didn’t stop and turn while loading a railway wagon → it headed “straight on” and landed on the tracks on the other side of the wagon, with the loaded pallet – the plastic foil wrapper didn’t stop → 2 feet of foil wrapped around a pallet – the beer can filling machine began to throw the lids around as if they were (very sharp edged) Frisbees …

Cause: cockroaches crawling around on the backplane pins – Repair: installing “bug-catchers” inside the VAX cabinets – which were regularly emptied by the system administrators …

Arbichev · July 6, 2021, 5:45pm

Super Speed Suntrsi SSD Internal Solid State Drive 128GB 2.5 inch SATA3 S660ST SSD for PC Laptop Desktop Black SSD
https://www.aliexpress.com/item/32856981645.html?spm=a2g0s.9042311.0.0.274233eduZrT18