Background: I built a desktop PC for home use. For the system disk I set aside a new SSD with 35 power-on hours and a power-on count of 62. I agonized for a long time over what to install, Leap or Tumbleweed, and in the end decided that Tumbleweed was more interesting to me. I did a fresh install: on June 16 I downloaded the June 15 image, and on June 17 I installed the system. After the first reboot, this message appeared:
https://susepaste.org/17493695
I was very surprised: I have been using openSUSE for over 15 years and have never seen anything like it. Interestingly, the message appears only once, after the system boots. I close it and carry on, and the system works perfectly. Quite an oddity.
@Arbichev :
Please translate the system message for us – we’re not universal – at most we’re bi-lingual but, mostly not …
Roughly: “KDE has detected unstable operation of the device /dev/sda”
This is a new KDE feature (annoyance). It means that something has been found in the SMART log of the drive. You can examine the log with the GSmartControl app. It may be something serious, or something caused by a power-failure reset. The annoying thing is that this feature triggers a warning on every boot, instead of only on new occurrences.
Miuku (July 3, 2021, 7:46am, post #5):
I’d say informing the user of possible hardware failure and catastrophic data loss is not an annoyance.
hcvv (July 3, 2021, 9:40am, post #6):
But I doubt that this is a task for an end-user stack of programs like KDE.
And how can KDE show that on boot? KDE is not started on boot at all.
Confusing to me, but it could be that I did not digest correctly the small pieces of information spread throughout this thread :\
I fully agree with you, for the first dozen or so warnings of the same event. But compare this with the following:
your neighbor notices you have a flat tire on your car;
they write this down, and the next time they see you, they let you know: great neighbor;
you go away, come back with the issue resolved, they check their notes, see that you had a flat tire yesterday, and then they call you up again;
they keep doing this every day (over that original instance): now they are an annoyance, because each time you need to check whether you have a new flat tire;
after a while you ignore your neighbor's warnings and then fail to check when a new, legitimate warning is made.
A good neighbor would have remembered they had notified you and stopped until something new came up.
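The "good neighbor" behaviour could be sketched roughly like this in Python. It is only an illustration of the idea, not how KDE actually implements its SMART notifier; the helper name `should_notify` and the notion of storing the last reported error count are assumptions made for the example.

```python
def should_notify(current_error_count, last_reported_count):
    """Return True only when the drive has logged errors not yet reported.

    Hypothetical helper: `last_reported_count` is whatever count the
    notifier stored the last time it warned the user (None if never).
    """
    if last_reported_count is None:
        # First look at this drive: warn if any error is logged at all.
        return current_error_count > 0
    return current_error_count > last_reported_count


# Simulate several boots with the same logged error count, then a new error.
last_reported = None
decisions = []
for count_at_boot in [18, 18, 18, 19]:
    notify = should_notify(count_at_boot, last_reported)
    decisions.append(notify)
    if notify:
        last_reported = count_at_boot  # remember what the user was told

print(decisions)
```

With this logic the user is warned on the first boot and again only when the count rises, rather than on every boot.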
Users should always be concerned about new messages. I had never seen this since assembling host erlangen in 2016:
erlangen:~ # journalctl -b -2 -p3 -o short-monotonic -q
[    0.078041] erlangen kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 3: f200000000800400
[    0.078044] erlangen kernel: mce: [Hardware Error]: TSC 0
[    0.078046] erlangen kernel: mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1625243229 SOCKET 0 APIC 0 microcode ea
[    0.079849] erlangen kernel: mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 3: fe00000000800400
[    0.079849] erlangen kernel: mce: [Hardware Error]: TSC 0 ADDR 7fa4c68969d3 MISC 7fa4c68969d3
[    0.079849] erlangen kernel: mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1625243229 SOCKET 0 APIC 2 microcode ea
[    0.079849] erlangen kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 3: fe00000000800400
[    0.079849] erlangen kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffffb00652df MISC ffffffffb00652df
[    0.079849] erlangen kernel: mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1625243229 SOCKET 0 APIC 4 microcode ea
[    0.081423] erlangen kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 3: fe00000000800400
[    0.081427] erlangen kernel: mce: [Hardware Error]: TSC 0 ADDR 7fcaa8c6f2b0 MISC 7fcaa8c6f2b0
[    0.081430] erlangen kernel: mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1625243229 SOCKET 0 APIC 6 microcode ea
[    4.983272] erlangen systemd[1]: Failed to start Rotate log files.
[ 1320.870795] erlangen systemd[1]: mandb.timer: Unit to trigger vanished.
erlangen:~ #
You may want to check for scsi and usb messages:
erlangen:~ # journalctl -b -q -o short-monotonic _KERNEL_SUBSYSTEM=scsi _KERNEL_SUBSYSTEM=usb
[    0.336123] erlangen kernel: scsi host0: ahci
[    0.336381] erlangen kernel: scsi host1: ahci
[    0.336620] erlangen kernel: scsi host2: ahci
[    0.336733] erlangen kernel: scsi host3: ahci
[    0.336840] erlangen kernel: scsi host4: ahci
[    0.336931] erlangen kernel: scsi host5: ahci
[    0.669754] erlangen kernel: scsi 2:0:0:0: Direct-Access ATA CT2000BX500SSD1 030 PQ: 0 ANSI: 5
[    0.670155] erlangen kernel: sd 2:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[    0.670174] erlangen kernel: sd 2:0:0:0: [sda] Write Protect is off
[    0.670179] erlangen kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    0.670203] erlangen kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.670343] erlangen kernel: scsi 3:0:0:0: Direct-Access ATA WDC WD40EZRX-22S 0A80 PQ: 0 ANSI: 5
[    0.670710] erlangen kernel: sd 3:0:0:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    0.670718] erlangen kernel: sd 3:0:0:0: [sdb] 4096-byte physical blocks
[    0.670732] erlangen kernel: sd 3:0:0:0: [sdb] Write Protect is off
[    0.670736] erlangen kernel: sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    0.670756] erlangen kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.670901] erlangen kernel: scsi 4:0:0:0: Direct-Access ATA Samsung SSD 850 3B6Q PQ: 0 ANSI: 5
[    0.671338] erlangen kernel: sd 4:0:0:0: [sdc] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[    0.671357] erlangen kernel: sd 4:0:0:0: [sdc] Write Protect is off
[    0.671363] erlangen kernel: sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[    0.671389] erlangen kernel: sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.684670] erlangen kernel: scsi 5:0:0:0: CD-ROM PIONEER DVD-RW DVR-221 1.00 PQ: 0 ANSI: 5
[    0.696532] erlangen kernel: sd 4:0:0:0: [sdc] supports TCG Opal
[    0.696540] erlangen kernel: sd 4:0:0:0: [sdc] Attached SCSI disk
[    0.715642] erlangen kernel: sd 2:0:0:0: [sda] Attached SCSI disk
[    0.763628] erlangen kernel: sd 3:0:0:0: [sdb] Attached SCSI disk
[    0.886205] erlangen kernel: sd 2:0:0:0: Attached scsi generic sg0 type 0
[    0.886224] erlangen kernel: sd 3:0:0:0: Attached scsi generic sg1 type 0
[    0.886241] erlangen kernel: sd 4:0:0:0: Attached scsi generic sg2 type 0
[    0.886255] erlangen kernel: scsi 5:0:0:0: Attached scsi generic sg3 type 5
[    1.968685] erlangen kernel: usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.12
[    1.968688] erlangen kernel: usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.968689] erlangen kernel: usb usb1: Product: xHCI Host Controller
[    1.968691] erlangen kernel: usb usb1: Manufacturer: Linux 5.12.13-1-default xhci-hcd
[    1.968692] erlangen kernel: usb usb1: SerialNumber: 0000:00:14.0
[    1.969444] erlangen kernel: hub 1-0:1.0: USB hub found
[    1.969465] erlangen kernel: hub 1-0:1.0: 16 ports detected
[    1.970741] erlangen kernel: sr 5:0:0:0: [sr0] scsi3-mmc drive: 40x/40x writer dvd-ram cd/rw xa/form2 cdda tray
[    1.971628] erlangen kernel: usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.12
[    1.971631] erlangen kernel: usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.971633] erlangen kernel: usb usb2: Product: xHCI Host Controller
[    1.971634] erlangen kernel: usb usb2: Manufacturer: Linux 5.12.13-1-default xhci-hcd
[    1.971635] erlangen kernel: usb usb2: SerialNumber: 0000:00:14.0
[    1.971721] erlangen kernel: hub 2-0:1.0: USB hub found
[    1.971734] erlangen kernel: hub 2-0:1.0: 10 ports detected
[    2.050771] erlangen kernel: sr 5:0:0:0: Attached scsi CD-ROM sr0
[    2.226503] erlangen kernel: usb 1-7: new low-speed USB device number 2 using xhci_hcd
[    2.392148] erlangen kernel: usb 1-7: New USB device found, idVendor=046a, idProduct=0011, bcdDevice= 1.00
[    2.392153] erlangen kernel: usb 1-7: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.518503] erlangen kernel: usb 1-8: new full-speed USB device number 3 using xhci_hcd
[    2.670717] erlangen kernel: usb 1-8: New USB device found, idVendor=046d, idProduct=c542, bcdDevice= 3.02
[    2.670727] erlangen kernel: usb 1-8: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    2.670733] erlangen kernel: usb 1-8: Product: Wireless Receiver
[    2.670737] erlangen kernel: usb 1-8: Manufacturer: Logitech
erlangen:~ #
See also: https://forums.opensuse.org/showthread.php/555649-Extra-Fun-With-Backup-To-External-Disk
My translation:
"The SSD storage device (128 GB “/dev/sda”) is showing signs of unstable performance."
Complete GSmartControl error log:
SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 18 (device log contains only the most recent 4 errors)
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 18 [1] occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
04 -- 40 00 00 00 00 38 00 6d 38 00 00 at LBA = 0x38006d38 = 939552056
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 04 00 00 00 00 00 38 00 6d 38 40 08 00:03:58.890 READ FPDMA QUEUED
60 00 18 00 a0 00 0d 31 00 1f a0 40 08 00:03:58.890 READ FPDMA QUEUED
60 01 b8 00 f8 00 00 c6 00 6d b8 40 08 00:03:58.890 READ FPDMA QUEUED
60 04 00 00 38 00 00 27 00 71 a0 40 08 00:03:58.880 READ FPDMA QUEUED
60 04 00 00 30 00 00 45 00 6d 28 40 08 00:03:58.880 READ FPDMA QUEUED
Error 17 [0] occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
04 -- 40 00 a0 00 00 9e 00 9a 68 00 00 at LBA = 0x9e009a68 = 2650839656
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 03 70 00 a0 00 00 9e 00 9a 68 40 08 00:00:29.400 READ FPDMA QUEUED
60 01 00 00 98 00 0d d9 00 33 00 40 08 00:00:29.400 READ FPDMA QUEUED
60 00 10 00 d8 00 02 9e 00 96 b0 40 08 00:00:29.400 READ FPDMA QUEUED
60 04 00 00 d0 00 00 a1 00 9a d8 40 08 00:00:29.400 READ FPDMA QUEUED
60 01 08 00 90 00 0d ee 00 2c 00 40 08 00:00:29.400 READ FPDMA QUEUED
Error 16 [3] occurred at disk power-on lifetime: 20 hours (0 days + 20 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
04 -- 40 00 40 00 0d 39 00 10 00 00 00 at LBA = 0xd39001000 = 56790880256
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 00 00 40 00 0d 39 00 10 00 40 08 00:00:29.240 READ FPDMA QUEUED
60 00 08 00 a8 00 0e 09 00 91 a8 40 08 00:00:29.240 READ FPDMA QUEUED
60 00 f0 00 38 00 0d c3 00 59 68 40 08 00:00:29.240 READ FPDMA QUEUED
60 00 70 00 30 00 0d 03 00 20 50 40 08 00:00:29.240 READ FPDMA QUEUED
60 00 20 00 28 00 0d c3 00 59 48 40 08 00:00:29.240 READ FPDMA QUEUED
Error 15 [2] occurred at disk power-on lifetime: 15 hours (0 days + 15 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
04 -- 40 00 18 00 0e d2 00 98 f0 00 00 at LBA = 0xed20098f0 = 63652796656
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 18 00 0e d2 00 98 f0 40 08 00:00:25.130 READ FPDMA QUEUED
60 04 00 00 10 00 00 94 00 31 98 40 08 00:00:25.130 READ FPDMA QUEUED
60 00 08 00 68 00 0d 68 00 63 18 40 08 00:00:25.130 READ FPDMA QUEUED
60 04 00 00 08 00 00 90 00 31 98 40 08 00:00:25.130 READ FPDMA QUEUED
60 00 08 00 00 00 03 1d 00 51 68 40 08 00:00:25.130 READ FPDMA QUEUED
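For anyone puzzled by how the "at LBA = ..." figure relates to the register columns in this extended log, here is a small Python sketch. It assumes the "registers were" row is read as ER, ST, a two-byte COUNT, a three-byte LBA_48, then LH, LM, LL, DV, DC; that reading is my interpretation of the column header, and it does reproduce the LBA values printed in the log. The function name `lba48_from_registers` is made up for the example.

```python
def lba48_from_registers(lba_48, lh, lm, ll):
    """Combine the hex bytes of one 'registers were' row into an LBA.

    lba_48 is the three-byte LBA_48 column as printed (e.g. "00 00 38");
    lh, lm and ll are the LH, LM and LL columns. The bytes are joined in
    printed order, most significant first (assumed layout, see above).
    """
    printed = lba_48.split() + [lh, lm, ll]
    return int("".join(printed), 16)


# Error 18: LBA_48 = 00 00 38, LH = 00, LM = 6d, LL = 38
error_18 = lba48_from_registers("00 00 38", "00", "6d", "38")
# Error 16: LBA_48 = 00 0d 39, LH = 00, LM = 10, LL = 00
error_16 = lba48_from_registers("00 0d 39", "00", "10", "00")

print(hex(error_18), error_18)
print(hex(error_16), error_16)
```

Both results match the hexadecimal and decimal LBA shown for those errors in the log.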
@Arbichev :
Please provide the output of “smartctl --attributes --log=error /dev/sda”, run as user “root” …
For example, here is a 120 GB SATA III SSD that is not in the smartctl database:
# smartctl --attributes --log=error /dev/sda
smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-lp152.78-default] (SUSE RPM)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 2604
12 Power_Cycle_Count 0x0032 100 100 050 Old_age Always - 419
160 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
161 Unknown_Attribute 0x0033 100 100 050 Pre-fail Always - 100
163 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 11
164 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 1724
165 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 6
166 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 1
167 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 3
168 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 7000
169 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 100
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
177 Wear_Leveling_Count 0x0032 100 100 050 Old_age Always - 0
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 050 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 050 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 050 Old_age Always - 52
194 Temperature_Celsius 0x0022 100 100 050 Old_age Always - 0
195 Hardware_ECC_Recovered 0x0032 100 100 050 Old_age Always - 773
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 0
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 050 Old_age Always - 0
232 Available_Reservd_Space 0x0032 100 100 050 Old_age Always - 100
241 Total_LBAs_Written 0x0030 100 100 050 Old_age Offline - 3807
242 Total_LBAs_Read 0x0030 100 100 050 Old_age Offline - 35497
245 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 1881
SMART Error Log Version: 1
No Errors Logged
#
I’ve marked the types of errors which need attention …
smartctl log file:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.12.13-1-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 52
12 Power_Cycle_Count 0x0032 100 100 050 Old_age Always - 78
160 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
161 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
163 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 2856
164 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
165 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
166 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
167 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
168 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
169 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 11
177 Wear_Leveling_Count 0x0032 100 100 050 Old_age Always - 7087
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 050 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 050 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 050 Old_age Always - 4
194 Temperature_Celsius 0x0032 100 100 050 Old_age Always - 40
195 Hardware_ECC_Recovered 0x0032 100 100 050 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 0
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 050 Old_age Always - 0
232 Available_Reservd_Space 0x0032 100 100 050 Old_age Always - 0
241 Total_LBAs_Written 0x0032 100 100 050 Old_age Always - 2190
242 Total_LBAs_Read 0x0032 100 100 050 Old_age Always - 1666
249 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 2333
SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 18 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 40 00 38 38 6d 00 at LBA = 0x006d3838 = 7157816
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 38 38 6d 40 08 00:03:58.890 READ FPDMA QUEUED
60 18 a0 a0 31 1f 40 08 00:03:58.890 READ FPDMA QUEUED
60 b8 f8 b8 c6 6d 40 08 00:03:58.890 READ FPDMA QUEUED
60 00 38 a0 27 71 40 08 00:03:58.880 READ FPDMA QUEUED
60 00 30 28 45 6d 40 08 00:03:58.880 READ FPDMA QUEUED
Error 17 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 40 a0 68 9e 9a 00 at LBA = 0x009a9e68 = 10133096
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 70 a0 68 9e 9a 40 08 00:00:29.400 READ FPDMA QUEUED
60 00 98 00 d9 33 40 08 00:00:29.400 READ FPDMA QUEUED
60 10 d8 b0 9e 96 40 08 00:00:29.400 READ FPDMA QUEUED
60 00 d0 d8 a1 9a 40 08 00:00:29.400 READ FPDMA QUEUED
60 08 90 00 ee 2c 40 08 00:00:29.400 READ FPDMA QUEUED
Error 16 occurred at disk power-on lifetime: 20 hours (0 days + 20 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 40 40 00 39 10 00 at LBA = 0x00103900 = 1063168
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 40 00 39 10 40 08 00:00:29.240 READ FPDMA QUEUED
60 08 a8 a8 09 91 40 08 00:00:29.240 READ FPDMA QUEUED
60 f0 38 68 c3 59 40 08 00:00:29.240 READ FPDMA QUEUED
60 70 30 50 03 20 40 08 00:00:29.240 READ FPDMA QUEUED
60 20 28 48 c3 59 40 08 00:00:29.240 READ FPDMA QUEUED
Error 15 occurred at disk power-on lifetime: 15 hours (0 days + 15 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 40 18 f0 d2 98 00 at LBA = 0x0098d2f0 = 10015472
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 18 f0 d2 98 40 08 00:00:25.130 READ FPDMA QUEUED
60 00 10 98 94 31 40 08 00:00:25.130 READ FPDMA QUEUED
60 08 68 18 68 63 40 08 00:00:25.130 READ FPDMA QUEUED
60 00 08 98 90 31 40 08 00:00:25.130 READ FPDMA QUEUED
60 08 00 68 1d 51 40 08 00:00:25.130 READ FPDMA QUEUED
Error 14 occurred at disk power-on lifetime: 13 hours (0 days + 13 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 40 b8 00 14 90 00 at LBA = 0x00901400 = 9442304
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 b8 00 14 90 40 08 00:09:36.960 READ FPDMA QUEUED
60 08 b0 00 0f 90 40 08 00:09:36.960 READ FPDMA QUEUED
60 08 a8 a0 10 91 40 08 00:09:36.960 READ FPDMA QUEUED
60 08 a0 00 0b 90 40 08 00:09:36.960 READ FPDMA QUEUED
60 08 98 b8 14 51 40 08 00:09:36.960 READ FPDMA QUEUED
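As a side note on reading this summary error log: the "at LBA = ..." value can be rebuilt from the SN, CL, CH and DH columns using standard 28-bit ATA addressing, where the low four bits of Device/Head carry LBA bits 27 to 24. The following Python sketch shows that arithmetic; the function name `lba28_from_registers` is hypothetical and this is not smartctl's own code.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Rebuild a 28-bit LBA from the hex register columns of one row.

    sn = Sector Number (LBA bits 7..0), cl = Cylinder Low (bits 15..8),
    ch = Cylinder High (bits 23..16), dh = Device/Head (low nibble holds
    bits 27..24). All arguments are two-digit hex strings as printed.
    """
    top_nibble = int(dh, 16) & 0x0F
    return (top_nibble << 24) | (int(ch, 16) << 16) | (int(cl, 16) << 8) | int(sn, 16)


# Error 18 row: SN = 38, CL = 38, CH = 6d, DH = 00
error_18 = lba28_from_registers("38", "38", "6d", "00")
# Error 14 row: SN = 00, CL = 14, CH = 90, DH = 00
error_14 = lba28_from_registers("00", "14", "90", "00")

print(hex(error_18), error_18)
print(hex(error_14), error_14)
```

The two results agree with the LBA values printed for Error 18 and Error 14 in the log.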
Hi
What model etc. is the SSD? Please post the output of:
smartctl -a /dev/sda
I see attribute 163 listed as Initial Bad Block Count; you need to find out what the raw value means for that device.
Output:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.12.13-1-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: SSD 128GB
Serial Number: YS2020030162
LU WWN Device Id: 0 000000 000000000
Firmware Version: FW200326
User Capacity: 128 035 676 160 bytes [128 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue Jul 6 13:29:43 2021 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5d) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 4) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 53
12 Power_Cycle_Count 0x0032 100 100 050 Old_age Always - 80
160 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
161 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
163 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 2856
164 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
165 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
166 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
167 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
168 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
169 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 13
177 Wear_Leveling_Count 0x0032 100 100 050 Old_age Always - 7699
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 050 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 050 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 050 Old_age Always - 4
194 Temperature_Celsius 0x0032 100 100 050 Old_age Always - 40
195 Hardware_ECC_Recovered 0x0032 100 100 050 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 0
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 050 Old_age Always - 0
232 Available_Reservd_Space 0x0032 100 100 050 Old_age Always - 0
241 Total_LBAs_Written 0x0032 100 100 050 Old_age Always - 2233
242 Total_LBAs_Read 0x0032 100 100 050 Old_age Always - 1765
249 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 2379
SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 18 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 40 00 38 38 6d 00 at LBA = 0x006d3838 = 7157816
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 38 38 6d 40 08 00:03:58.890 READ FPDMA QUEUED
60 18 a0 a0 31 1f 40 08 00:03:58.890 READ FPDMA QUEUED
60 b8 f8 b8 c6 6d 40 08 00:03:58.890 READ FPDMA QUEUED
60 00 38 a0 27 71 40 08 00:03:58.880 READ FPDMA QUEUED
60 00 30 28 45 6d 40 08 00:03:58.880 READ FPDMA QUEUED
Error 17 occurred at disk power-on lifetime: 22 hours (0 days + 22 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 40 a0 68 9e 9a 00 at LBA = 0x009a9e68 = 10133096
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 70 a0 68 9e 9a 40 08 00:00:29.400 READ FPDMA QUEUED
60 00 98 00 d9 33 40 08 00:00:29.400 READ FPDMA QUEUED
60 10 d8 b0 9e 96 40 08 00:00:29.400 READ FPDMA QUEUED
60 00 d0 d8 a1 9a 40 08 00:00:29.400 READ FPDMA QUEUED
60 08 90 00 ee 2c 40 08 00:00:29.400 READ FPDMA QUEUED
Error 16 occurred at disk power-on lifetime: 20 hours (0 days + 20 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 40 40 00 39 10 00 at LBA = 0x00103900 = 1063168
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 40 00 39 10 40 08 00:00:29.240 READ FPDMA QUEUED
60 08 a8 a8 09 91 40 08 00:00:29.240 READ FPDMA QUEUED
60 f0 38 68 c3 59 40 08 00:00:29.240 READ FPDMA QUEUED
60 70 30 50 03 20 40 08 00:00:29.240 READ FPDMA QUEUED
60 20 28 48 c3 59 40 08 00:00:29.240 READ FPDMA QUEUED
Error 15 occurred at disk power-on lifetime: 15 hours (0 days + 15 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 40 18 f0 d2 98 00 at LBA = 0x0098d2f0 = 10015472
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 18 f0 d2 98 40 08 00:00:25.130 READ FPDMA QUEUED
60 00 10 98 94 31 40 08 00:00:25.130 READ FPDMA QUEUED
60 08 68 18 68 63 40 08 00:00:25.130 READ FPDMA QUEUED
60 00 08 98 90 31 40 08 00:00:25.130 READ FPDMA QUEUED
60 08 00 68 1d 51 40 08 00:00:25.130 READ FPDMA QUEUED
Error 14 occurred at disk power-on lifetime: 13 hours (0 days + 13 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 40 b8 00 14 90 00 at LBA = 0x00901400 = 9442304
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 b8 00 14 90 40 08 00:09:36.960 READ FPDMA QUEUED
60 08 b0 00 0f 90 40 08 00:09:36.960 READ FPDMA QUEUED
60 08 a8 a0 10 91 40 08 00:09:36.960 READ FPDMA QUEUED
60 08 a0 00 0b 90 40 08 00:09:36.960 READ FPDMA QUEUED
60 08 98 b8 14 51 40 08 00:09:36.960 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Offline Completed without error 00% 53 -
# 2 Offline Self-test routine in progress 10% 53 -
# 3 Offline Self-test routine in progress 10% 53 -
# 4 Offline Self-test routine in progress 10% 53 -
# 5 Offline Self-test routine in progress 10% 53 -
# 6 Offline Self-test routine in progress 10% 53 -
# 7 Offline Self-test routine in progress 10% 53 -
# 8 Offline Self-test routine in progress 10% 53 -
# 9 Offline Self-test routine in progress 10% 53 -
#10 Offline Self-test routine in progress 10% 53 -
#11 Offline Self-test routine in progress 10% 53 -
#12 Offline Self-test routine in progress 10% 53 -
#13 Offline Self-test routine in progress 10% 53 -
#14 Offline Self-test routine in progress 10% 53 -
#15 Offline Self-test routine in progress 10% 53 -
#16 Offline Self-test routine in progress 10% 53 -
#17 Offline Self-test routine in progress 10% 53 -
#18 Offline Self-test routine in progress 10% 53 -
#19 Offline Self-test routine in progress 10% 53 -
#20 Offline Self-test routine in progress 10% 53 -
#21 Offline Self-test routine in progress 10% 53 -
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
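A note on reading the error log above: the "at LBA = ..." value on each register row appears to be assembled from the SN, CL and CH bytes, plus the low four bits of DH, in the usual 28-bit ATA layout. The Python sketch below is only an illustration under that assumption; the helper name `decode_lba28` is hypothetical and is not part of smartctl. It reproduces the four LBAs shown in this log from the raw register bytes.

```python
def decode_lba28(register_row: str) -> int:
    """Rebuild a 28-bit LBA from an 'ER ST SC SN CL CH DH' register row.

    Assumed layout: SN = LBA bits 0-7, CL = bits 8-15, CH = bits 16-23,
    low nibble of DH = bits 24-27.
    """
    fields = register_row.split()[:7]
    er, st, sc, sn, cl, ch, dh = (int(field, 16) for field in fields)
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register rows copied from errors 17, 16, 15 and 14 in the log above.
rows = {
    17: "04 40 a0 68 9e 9a 00",
    16: "04 40 40 00 39 10 00",
    15: "04 40 18 f0 d2 98 00",
    14: "04 40 b8 00 14 90 00",
}

for number, row in rows.items():
    lba = decode_lba28(row)
    print(f"Error {number}: LBA = 0x{lba:08x} = {lba}")
```

The decoded values match the ones smartctl printed, so the four failing reads hit four different sectors rather than one repeated address.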
Attribute 163 - Initial Bad Block Count
This attribute is the initial bad block count of the device when leaving the factory.
https://www.micromat.com/product_manuals/drive_scope_manual_01.pdf
Hi
Yes, but it’s a raw value: does it equate to 2856 bad blocks or some other value? I would consider submitting a return request to wherever you purchased it, asking for a replacement…
The handbook seems to be related to Apple hardware –
Can you please tell us if you can open the system’s case?
If you can, please check the hardware connection to the drive – bent connector pins – dirty connections – moist insects such as cockroaches crawling around on the printed circuit.
Please supply the output of “inxi --admin --filter --disk”.
Yes, cockroaches – a DEC customer had a misbehaving system which was controlling a brewery – the plant’s automation intermittently went crazy – a front loader didn’t stop and turn while loading a railway wagon → it headed “straight on” and landed on the tracks on the other side of the wagon, with the loaded pallet – the plastic foil wrapper didn’t stop → 2 feet of foil wrapped around a pallet – the beer can filling machine began to throw the lids around as if they were (very sharp edged) Frisbees …
Cause, cockroaches crawling around on the backplane pins – Repair: installing “bug-catchers” inside the VAX cabinets – which were regularly emptied by the system administrators …
Super Speed Suntrsi SSD Internal Solid State Drive 128GB 2.5 inch SATA3 S660ST SSD for PC Laptop Desktop Black SSD
https://www.aliexpress.com/item/32856981645.html?spm=a2g0s.9042311.0.0.274233eduZrT18