Tools and methods to check and assess SSD health?

I wonder what tools are available and most commonly used, in Linux in general and in openSUSE in particular, to check the health of a computer’s solid-state drives.

I have no particular problems at all; I just want to know what means, if any, are at hand to do a health check should the need arise.

Thanks in advance

@VariableStar Hi, look at using smartctl -a /dev/sdX (SATA) or smartctl -a /dev/nvmeN (NVMe), run as root.
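
If you want to script this, note that smartctl's exit status is a bitmask whose bits are documented in the EXIT STATUS section of smartctl(8). A minimal sketch of a decoder; decode_smartctl_status is a hypothetical helper, not part of smartmontools:

```shell
#!/bin/sh
# Decode the bitmask that smartctl returns as its exit status.
# Bit meanings follow the EXIT STATUS section of smartctl(8).
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok: no problems reported"
        return 0
    fi
    bit=0
    while [ "$bit" -le 7 ]; do
        # Test whether this bit is set in the status value.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            case $bit in
                0) echo "bit 0: command line did not parse" ;;
                1) echo "bit 1: device open failed" ;;
                2) echo "bit 2: a SMART or other command to the disk failed" ;;
                3) echo "bit 3: SMART status check returned DISK FAILING" ;;
                4) echo "bit 4: prefail attributes at or below threshold" ;;
                5) echo "bit 5: some attributes were at or below threshold in the past" ;;
                6) echo "bit 6: device error log contains errors" ;;
                7) echo "bit 7: device self-test log contains errors" ;;
            esac
        fi
        bit=$((bit + 1))
    done
}
```

Usage, for example: sudo smartctl -H /dev/sda; decode_smartctl_status $?. A non-zero status does not always mean the drive is failing; bits 0 to 2 describe problems running the command itself.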

I find smartctl output hard to read and prefer skdump, which gives more guidance on what is going on.

You have to run it as root:

sudo skdump /dev/[sdX,nvmeN]

Users may want to check their drives periodically:

erlangen:~ # systemctl status btrfs-scrub.timer
● btrfs-scrub.timer - Scrub btrfs filesystem, verify block checksums
     Loaded: loaded (/usr/lib/systemd/system/btrfs-scrub.timer; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/btrfs-scrub.timer.d
             └─schedule.conf
     Active: active (waiting) since Thu 2023-08-31 21:14:54 CEST; 1 day 13h ago
    Trigger: Sun 2023-10-01 00:00:00 CEST; 4 weeks 0 days left
   Triggers: ● btrfs-scrub.service
       Docs: man:btrfs-scrub

Aug 31 21:14:54 erlangen systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
erlangen:~ # 

On the infamous host erlangen, btrfs-scrub.service runs monthly.

erlangen:~ # journalctl -b -u btrfs-scrub.service 
Sep 01 03:57:55 erlangen systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
Sep 01 03:57:55 erlangen btrfs-scrub.sh[15554]: Running scrub on /
Sep 01 04:02:04 erlangen btrfs-scrub.sh[15554]: Scrub device /dev/nvme1n1p2 (id 1) done
Sep 01 04:02:04 erlangen btrfs-scrub.sh[15554]: Scrub started:    Fri Sep  1 03:57:55 2023
Sep 01 04:02:04 erlangen btrfs-scrub.sh[15554]: Status:           finished
Sep 01 04:02:04 erlangen btrfs-scrub.sh[15554]: Duration:         0:04:09
Sep 01 04:02:04 erlangen btrfs-scrub.sh[15554]: Total to scrub:   594.07GiB
Sep 01 04:02:04 erlangen btrfs-scrub.sh[15554]: Rate:             2.31GiB/s
Sep 01 04:02:04 erlangen btrfs-scrub.sh[15554]: Error summary:    no errors found
Sep 01 04:02:04 erlangen btrfs-scrub.sh[15554]: flock: getting lock took 0.000003 seconds
Sep 01 04:02:04 erlangen btrfs-scrub.sh[15554]: flock: executing btrfs
Sep 01 04:02:04 erlangen systemd[1]: btrfs-scrub.service: Deactivated successfully.
Sep 01 04:02:04 erlangen systemd[1]: btrfs-scrub.service: Consumed 32.886s CPU time.
erlangen:~ # 

See also: Scrub — BTRFS documentation
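
A sketch for checking the outcome of the last scrub without reading the whole log, assuming the "Error summary:" line format shown in the journal output above (other btrfs-progs versions may word it differently). The helper names scrub_error_summary and scrub_ok are hypothetical:

```shell
#!/bin/sh
# Print the text after "Error summary:" on the last matching line of the
# journal output read from standard input.
scrub_error_summary() {
    sed -n 's/.*Error summary:[[:space:]]*//p' | tail -n 1
}

# Succeed only if the last scrub summary says "no errors found".
# Assumes the summary wording seen in the journal excerpt above.
scrub_ok() {
    summary=$(scrub_error_summary)
    [ "$summary" = "no errors found" ]
}
```

Usage, for example: journalctl -u btrfs-scrub.service --no-pager | scrub_ok && echo clean || echo "look at the scrub log"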

btrfs scrub is useless for checking SSD (hardware) health; it is only a filesystem-level tool…

Nope. I don’t agree.

You don’t agree but can’t state any facts. If you had read your own linked article, you would agree…

It really only checks checksums of data and tree blocks, it doesn’t ensure the content of tree blocks is valid and consistent.

btrfs scrub is not able to monitor hardware health.

If you disagree, then explain how you use btrfs scrub to monitor SSD/HDD health indicators such as:

  • temperatures (min/max/act)
  • lifetime power-on resets
  • power on time
  • power cycle count
  • number of write commands
  • number of read commands
  • total bad blocks
  • erase count
  • online/offline short/extended self-tests
  • error logs
  • life curve status
  • media wear out indicator
  • reserved block count
  • and many more…

All of the above is information provided by real hardware monitoring (health) tools such as smartctl or skdump (and the various GUI front ends for them, such as GSmartControl).
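
For NVMe drives, several items in that list (temperature, power-on hours, power cycles, read/write command counts, wear as "Percentage Used", error log entries) appear in the SMART/Health Information section printed by smartctl -A. A minimal sketch that pulls them out, assuming the "Field Name:   value" layout of that section; the function names are hypothetical and the field labels may differ between smartmontools versions:

```shell
#!/bin/sh
# Print the value of one field from "smartctl -A /dev/nvmeN" text on stdin.
# Assumes lines of the form "Field Name:<spaces>value".
nvme_health_field() {
    field=$1
    awk -v f="$field" '
        index($0, f ":") == 1 {
            v = substr($0, length(f) + 2)   # text after the colon
            sub(/^[ \t]+/, "", v)           # drop alignment padding
            print v
        }'
}

# Print a short summary of selected fields; missing fields show as n/a.
nvme_health_summary() {
    log=$(cat)
    for name in "Temperature" "Percentage Used" "Power On Hours" "Power Cycles" \
                "Host Read Commands" "Host Write Commands" \
                "Media and Data Integrity Errors" "Error Information Log Entries"; do
        value=$(printf '%s\n' "$log" | nvme_health_field "$name")
        printf '%s: %s\n' "$name" "${value:-n/a}"
    done
}
```

Usage, for example: sudo smartctl -A /dev/nvme0 | nvme_health_summary. For anything beyond a quick look, smartctl's JSON output (smartctl -j) is more robust than parsing text.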

btrfs scrub verifies the checksums of all blocks. Hardware problems below the file system will show up as checksum mismatches.

Several hundred thousand power-on hours of HDDs and SSDs in the infamous host erlangen and its siblings have shown that some hardware problems are not detected by smartctl and similar tools. The Swiss cheese model applies: Swiss cheese model - Wikipedia

btrfs users are advised to run btrfs scrub regularly. They may want to identify and remove the root cause of any issues reported by btrfs-scrub.service.

Switched scrub from monthly to weekly.

erlangen:~ # systemctl cat btrfs-scrub.timer 
# /usr/lib/systemd/system/btrfs-scrub.timer
[Unit]
Description=Scrub btrfs filesystem, verify block checksums
Documentation=man:btrfs-scrub

[Timer]
OnCalendar=monthly
AccuracySec=1h
Persistent=true

[Install]
WantedBy=timers.target

# /etc/systemd/system/btrfs-scrub.timer.d/schedule.conf
[Timer]
OnCalendar=weekly
erlangen:~ # 
erlangen:~ # systemctl status btrfs-scrub.timer 
● btrfs-scrub.timer - Scrub btrfs filesystem, verify block checksums
     Loaded: loaded (/usr/lib/systemd/system/btrfs-scrub.timer; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/btrfs-scrub.timer.d
             └─schedule.conf
     Active: active (waiting) since Sun 2023-09-03 06:19:32 CEST; 3h 25min ago
    Trigger: Mon 2023-09-04 00:00:00 CEST; 14h left
   Triggers: ● btrfs-scrub.service
       Docs: man:btrfs-scrub

Sep 03 06:19:32 erlangen systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
erlangen:~ # 
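
Per systemd.timer(5), OnCalendar= can be given more than once and each entry adds a trigger, while an empty assignment clears the list. A drop-in that only adds OnCalendar=weekly therefore keeps the vendor unit's monthly trigger as well. A sketch that writes the drop-in with the reset line; write_scrub_schedule is a hypothetical helper, and the directory defaults to /etc/systemd/system as in the output above:

```shell
#!/bin/sh
# Write btrfs-scrub.timer.d/schedule.conf under the given unit directory.
# The empty OnCalendar= line clears the vendor "monthly" entry first.
write_scrub_schedule() {
    unit_dir=${1:-/etc/systemd/system}
    schedule=${2:-weekly}
    dropin_dir="$unit_dir/btrfs-scrub.timer.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/schedule.conf" <<EOF
[Timer]
OnCalendar=
OnCalendar=$schedule
EOF
}
```

Run it as root, then systemctl daemon-reload and check the next trigger with systemctl list-timers btrfs-scrub.timer. As an alternative, systemctl edit btrfs-scrub.timer creates a drop-in interactively.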

Again, no facts: you ignore the questions and show only random output from ridiculous hosts. So, again:

How do you use btrfs scrub to monitor SSD/HDD health indicators such as:

  • temperatures (min/max/act)
  • lifetime power-on resets
  • power on time
  • power cycle count
  • number of write commands
  • number of read commands
  • total bad blocks
  • erase count
  • online/offline short/extended self-tests
  • error logs
  • life curve status
  • media wear out indicator
  • reserved block count
  • and many more…

Additionally, you ignore the fact that experienced users may not use btrfs at all and may set up their machines in a more professional way, using another filesystem that suits their needs. So how does your btrfs scrub work on XFS, ext4, or any other filesystem in current use?

The thread opener didn’t say which filesystem he is using, but asked for ways to check SSD (HDD) health in general. So recommending a filesystem-level tool (which only works with btrfs) that doesn’t even check basic hardware indicators (S.M.A.R.T.) is useless and off topic.