I had configured smartd to run self-tests every day.
It now sends me a mail every day containing this:
This message was generated by the smartd daemon running on:
host name: xxx
DNS domain: xxx
The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], Self-Test Log error count increased from 11 to 12
Device info:
M4-CT128M4SSD2, S/N:000000001238091641E5, WWN:5-00a075-1091641e5, FW:000F, 128 GB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Oct 20 09:03:55 2017 CEST
Another message will be sent in 24 hours if the problem persists.
That report doesn't look too bad… I've seen a lot worse.
Pre-fail and Old_age are categories of attributes; they're not indicating that either is imminent.
Ignore any RAW values; since they're vendor-specific there's no general way to interpret them (which is why smartmontools maintains a drive database with interpretations for those values). You can find a description of the meanings of many of the flags here: http://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
What’s actually being recorded in the Self Test log?
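For instance, you could post the output of these (assuming /dev/sda as in the mail above, run as root):

# the self-test log that smartd is counting errors from
smartctl -l selftest /dev/sda
# the drive's error log and attribute table, for context
smartctl -l error /dev/sda
smartctl -A /dev/sda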
On Sat 04 Nov 2017 11:36:01 AM CDT, Christophe deR wrote:
I had configured smartd to run self-tests every day.
It now sends me a mail every day containing this:
<snip>
What should I do?
Isn't there a trim utility somewhere that would cure this problem?
Should I use some vendor utility to repair this?
Hi
Why a test every day, and why test at all unless there is an issue?
Balance and trim are taken care of automagically if running btrfs (and
others?).
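If you do want scheduled tests, you can tone the schedule down in /etc/smartd.conf instead of testing daily; for example (the device, times and mail target below are only illustrative):

# short test Sundays at 02:00, long test on the 1st of each month at 03:00
/dev/sda -a -o on -S on -s (S/../../7/02|L/../01/./03) -m root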
–
Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
openSUSE Leap 42.2|GNOME 3.20.2|4.4.90-18.32-default
What's the matter? Unless I'm going blind, I see that most parameters are still at 100%. The only two that aren't are “173 Wear_Leveling_Count” and “202 Perc_Rated_Life_Used”, both at 88%.
If I'm still good at math, that translates to something like more than 13 years of life remaining, assuming 24/7 operation. In other words, your system is likely to go to the junkyard before the disk is worn out at your current usage.
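If you want to keep an eye on those two counters yourself (attribute names taken from your report, device assumed to be /dev/sda):

smartctl -A /dev/sda | grep -E 'Wear_Leveling_Count|Perc_Rated_Life_Used'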
“Pre-fail” and “Old Age” are types of counters, not a status.
So you're completely mis-reading the meaning: there is no meaning there; they're categories of counters and would carry the same importance if they were called “Bob” and “Joe”.
Similarly, the numbers displayed like 100 and 88 don't really have any meaning; I don't think they necessarily represent a percentage of anything (some might, but again, usually not important).
A lot of what is displayed in a smartmontools report has little or no importance. Sure, it's interesting to know how many hours the disk has been powered on, and how many lifetime writes it has seen. Those are things I look at only when purchasing a used disk, to guesstimate the amount of wear and tear it has already undergone (you can't trust what a seller says, only what is reported by smartmontools, because that can't be altered except by the factory).
More importantly, the only real indication of imminent disk failure is bad sectors, and, if you are able to compare results over a few weeks, whether the number of bad sectors is increasing.
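A simple way to track that over time (the device name /dev/sda and the dated file names are just examples):

# save a dated snapshot of the attribute table...
smartctl -A /dev/sda > /root/smart-$(date +%F).txt
# ...and later compare two snapshots to see whether the counts moved
diff /root/smart-2017-11-05.txt /root/smart-2017-11-12.txt
# the sector-related attributes are the ones to watch
smartctl -A /dev/sda | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'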
And then, if you have a disk that's going bad:
Get your data off that disk ASAP.
Research your disk and disk manufacturer. For example, Seagate (with its less than sterling reliability ratings) has a SeaTools utility that can try to repair bad sectors by first trying to write to them, then either reporting them as healthy and usable or mapping them out permanently (all disk drives have a reserve of spare sectors).
What I've described applies to both HDDs and SSDs, the latter because of their HDD layout emulation.
As for the stuff about trim, that's something else that's only tangentially relevant in that it has something to do with SSD cells (“traps”), but it has nothing to do with what is in your smartmontools report.
I had btrfs until 2 weeks ago, when the system failed.
Btrfs with 37 GB for a system is dangerous, and I had already reported problems of low space and very frequent system freezes and failures (see my previous topics).
I had to re-install the whole system, and I lost some data, not vital but important.
I swore at many people during those bad days.
I used ext4 for my new install (btrfs = never again).
But I have these strange SMART reports that I did not have before…
Due to btrfs over-strain?
Or did disk failures cause the btrfs mess?
Don't know.
Hi
How is the SSD connected? If it's in a laptop, does it have a cable/header or just a header connection? Sometimes cable connections can cause things like freezes; check the system logs as well.
SSDs fail differently from spinning rust. With an HDD you will most likely see bad sectors before complete failure. With an SSD you will run out of memory pool and not be able to write, or it just stops working. So the failure modes are not the same as an HDD. With SMART self-tests enabled there are more writes, and it is writing that will kill an SSD.
--root@xxx 13:56:36 /home/christophe] mount
....
/dev/sda4 on / type ext4 (rw,relatime,data=ordered)
.....
--root@xxx 13:56:46 /home/christophe] systemctl status fstrim.timer
● fstrim.timer - Discard unused blocks once a week
Loaded: loaded (/usr/lib/systemd/system/fstrim.timer; enabled; vendor preset: enabled)
Active: active (waiting) since dim. 2017-11-05 08:05:59 CET; 5h 51min ago
Docs: man:fstrim
nov. 05 08:05:59 diesel systemd[1]: Started Discard unused blocks once a week.
And there are numerous forum threads by others who have asked the same question.
First, it should be understood that a few bad HDD blocks are not unusual, and ordinarily the disk will map the bad blocks out to good blocks in the disk's reserve.
On an SSD, however, I'm not as certain how expected bad “blocks” are, since the concept of a block is a virtual layout on top of a different physical structure. But, assuming faulty cells (traps) can be identified, they should be mapped out automatically in the same way.
There are a few things I'd note in this thread…
The original and full self-test output has never been posted, so who knows whether the tests are even valid?
Whether the @OP intended to or not, the disk model was posted, which definitely identifies the disk as an SSD.
The Wiki reference describes how to handle suspected bad blocks… Basically, you just force a write to the sector, which makes the disk firmware do its thing (mapping out the bad block). In practice, people often use the dd command to zero out all free space, which pretty much ensures writing to every available block. Then run the self-test again to verify the block(s) aren't still reported bad. I don't know, though, that forcing all bad blocks to be mapped out immediately is necessary unless you just want peace of mind and that extra assurance that the disk is good… It's probably no better than just allowing your normal disk activity to eventually encounter the bad blocks, at which point the disk will address the matter itself. As I described in my earlier post, probably the most important thing to note is the bad block count reported.
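For what it's worth, the recipe usually looks roughly like this; a sketch only (the file name /zerofill is arbitrary, the filesystem will temporarily fill up, and dd is expected to stop with “No space left on device”):

# write zeros into all free space, then remove the fill file
dd if=/dev/zero of=/zerofill bs=1M
sync
rm /zerofill
# start a long self-test, then check the log once it has finished
smartctl -t long /dev/sda
smartctl -l selftest /dev/sda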
AFAIK balance is not necessary for ext4.
Trim is needed for an SSD even with ext4, but apparently you already have fstrim run once a week, which is more than enough unless your system is a heavy-duty database server or the like. As an alternative, for a laptop say, you could run fstrim on each boot.
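You can also trigger a pass by hand and see how much gets discarded, or look at what the weekly timer did last time (assuming ext4 on /, as in your mount output):

# manual trim pass; -v reports how many bytes were discarded
fstrim -v /
# when the timer last ran and when it fires next
systemctl list-timers fstrim.timer
# log of previous runs
journalctl -u fstrim.service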
Like many other threads on the forum suggest, I bet that your btrfs problems were just “disk full” problems and not “bad disk” or “bad sector” problems, so not really a “disk failure” but possibly a “faulty btrfs maintenance” problem.
Btrfs does strain the disk a bit more than ext4, but not to the point of being worried about with modern SSDs. Anyway, you are now on ext4, so don't worry (but Malcolm could say the same about correctly maintained btrfs…).
> Btrfs with 37 GB for a system is dangerous, and I had already reported
> problems of low space and very frequent system freezes and failures
> (see my previous topics)
Btrfs is still a work in progress; for example, on recent versions the YaST installer reduces the number of snapshots when there is little space on the disk.
The kernel (or the system, I don't know which) is able to see that the disk is an SSD and does its job accordingly. I never do anything special, sure that I'm not smart enough for this.
Thanks again to everyone.
I appreciate your concern and your input.
I think I will wait and see.
I am prepared to have a system failure at any time, and I do a daily copy/save of my work and personal data in case my system suddenly fails.
I agree that, from the problem description, the problem was most likely the disk filling up with snapshots (one of these days, someone will write a snapper configuration that adds a parameter taking available free disk space into account).
As for the @OP's questions about things like balance and trim (both of which need to be manually configured for both btrfs and ext4), I've posted in other openSUSE forums links to the two following Arch Wiki articles, which apply completely to openSUSE as well (all versions). Re-read these articles every time you install a new system; the information is always being updated to whatever is current.
The main **SSD Wiki**
On a new install, I always recommend skipping to the last section first to see if you need to install a firmware upgrade. After that, configure fstab for trim (IMO better than a cron job) and modify your disk queueing algorithm; everything else has less of an impact than these. If you do run a cron job, my recommendation is every 48 hours, or, as OrsoBruno suggests, on boot if you boot at least every other day. The fuller your disk is, the more often you need to run trim; if your disk space is hardly used you can run trim far less often. But in general it's better to run it too often than not often enough (which is why I recommend the fstab configuration).
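For reference, the two changes I'm talking about look roughly like this (the UUID is a placeholder, and the scheduler write is not persistent across reboots; a udev rule or boot script is needed for that):

# /etc/fstab: continuous TRIM via the 'discard' mount option
UUID=xxxx-xxxx  /  ext4  defaults,discard  0  1

# switch the SSD's I/O scheduler to deadline (takes effect immediately)
echo deadline > /sys/block/sda/queue/scheduler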
Hi
The current openSUSE install sets up udev rules for the I/O scheduler (cfq/deadline) depending on spinning rust or SSD. Check the mount options; many optimizations are done by default, then add more as required to fstab. Trim is taken care of by the systemd service.
If running btrfs, check that the maintenance systemd service has been run.
If using btrfs and you select a partition size <= 20 GB, then snapper is disabled.
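To check what the install actually set up on your machine (device name is just an example):

# the active scheduler is shown in brackets, e.g. "noop [deadline] cfq"
cat /sys/block/sda/queue/scheduler
# mount options currently in effect for /
findmnt -o TARGET,FSTYPE,OPTIONS /
# state of the trim timer
systemctl status fstrim.timer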