NVMe Trouble - fstrim.service Skipping /home, Delayed Appearance of Files After Write

Problems occur on a Samsung SSD 950 PRO 512GB:

erlangen:~ # lsblk -f /dev/nvme0n1
NAME        FSTYPE LABEL      UUID                                 FSAVAIL FSUSE% MOUNTPOINT
nvme0n1                                                                           
├─nvme0n1p1 ext4   Fedora     047d4d83-a9a7-482e-8d15-a1c855a637ea                
├─nvme0n1p2 ext4   Tumbleweed 8b190950-c141-4351-9198-7a9592b4fb34   11.2G    59% /
├─nvme0n1p3 ext4   Home       704621ef-9b45-4e96-ba7f-1becd3924f08  169.8G    58% /home
└─nvme0n1p4 vfat              6DEC-64F9                                84M    16% /boot/efi
erlangen:~ # 
  • fstrim.service successfully runs as scheduled by fstrim.timer, but skips the /home partition.
  • Mail moved by postfix/local shortly after a reboot does not show up in the user’s folder, but suddenly appears hours later, when new mail gets moved.

Any idea?

Hi
If you run fstrim -v /home manually, does it provide more information? There are a number of fstrim threads, and one thing they seem to have in common is Samsung devices… Is the firmware on the NVMe drive up to date?


/sbin/lspci | grep Non
01:00.0 Non-Volatile memory controller: Sandisk Corp WD Black 2018/PC SN520 NVMe SSD (rev 01)

lsblk -f /dev/nvme0n1
NAME        FSTYPE LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
nvme0n1                                      
├─nvme0n1p1                       26G    35% /
└─nvme0n1p2                    191.2G     1% /data

systemctl status fstrim.service
● fstrim.service - Discard unused blocks on filesystems from /etc/fstab
   Loaded: loaded (/usr/lib/systemd/system/fstrim.service; static; vendor preset: disabled)
   Active: inactive (dead) since Mon 2019-10-14 00:00:16 CDT; 7h ago
     Docs: man:fstrim(8)
  Process: 7763 ExecStart=/usr/sbin/fstrim --fstab --verbose --quiet (code=exited, status=0/SUCCESS)
 Main PID: 7763 (code=exited, status=0/SUCCESS)

Oct 14 00:00:01 grover systemd[1]: Starting Discard unused blocks on filesystems from /etc/fstab...
Oct 14 00:00:16 grover fstrim[7763]: /boot/efi: 254.8 MiB (267132928 bytes) trimmed on /dev/sda1
Oct 14 00:00:16 grover fstrim[7763]: /data: 191.5 GiB (205558149120 bytes) trimmed on /dev/nvme0n1p2
Oct 14 00:00:16 grover fstrim[7763]: /stuff: 38.1 GiB (40915017728 bytes) trimmed on /dev/sda3
Oct 14 00:00:16 grover fstrim[7763]: /boot: 676.8 MiB (709619712 bytes) trimmed on /dev/sda2
Oct 14 00:00:16 grover fstrim[7763]: /: 26.3 GiB (28203761664 bytes) trimmed on /dev/nvme0n1p1
Oct 14 00:00:16 grover systemd[1]: fstrim.service: Succeeded.
Oct 14 00:00:16 grover systemd[1]: Started Discard unused blocks on filesystems from /etc/fstab.

FWIW, I’m seeing the same thing (TW 20191011).

I have two SSDs, one for root/swap and one for /home.

paul@Orion-15:~$ lsblk -f /dev/sda
NAME   FSTYPE LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
sda                                     
├─sda1                     37.4G    15% /
└─sda2                                  [SWAP]
paul@Orion-15:~$ lsblk -f /dev/sdb
NAME   FSTYPE LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
sdb                                     
└─sdb1                     69.7G    32% /home
paul@Orion-15:~$

And this is the result:

● fstrim.service - Discard unused blocks on filesystems from /etc/fstab
   Loaded: loaded (/usr/lib/systemd/system/fstrim.service; static; vendor preset: disabled)
   Active: inactive (dead) since Mon 2019-10-14 15:00:04 BST; 1min 50s ago
     Docs: man:fstrim(8)
  Process: 2246 ExecStart=/usr/sbin/fstrim --fstab --verbose --quiet (code=exited, status=0/SUCCESS)
 Main PID: 2246 (code=exited, status=0/SUCCESS)

Oct 14 15:00:01 Orion-15.openSUSE systemd[1]: Starting Discard unused blocks on filesystems from /etc/fstab...
Oct 14 15:00:04 Orion-15.openSUSE fstrim[2246]: /: 39.8 GiB (42761093120 bytes) trimmed on /dev/sda1
Oct 14 15:00:04 Orion-15.openSUSE systemd[1]: fstrim.service: Succeeded.
Oct 14 15:00:04 Orion-15.openSUSE systemd[1]: Started Discard unused blocks on filesystems from /etc/fstab.

Running fstrim manually works fine.

paul@Orion-15:~$ sudo fstrim -v /home
[sudo] password for root: 
/home: 56.5 MiB (59219968 bytes) trimmed
paul@Orion-15:~$ 

For the moment I’m reverting back to using a cron job…

Edit: neither is a Samsung drive; sda is a Corsair and sdb a Crucial.

No such problems on a Leap 15.1 system…

paul@Orion-17:~> journalctl -u fstrim.service      
-- Logs begin at Mon 2019-05-27 17:39:13 BST, end at Mon 2019-10-14 15:42:54 BST. --
Jun 08 08:31:45 Orion-17 systemd[1]: Starting Discard unused blocks...
Jun 08 08:31:58 Orion-17.openSUSE fstrim[920]: /home: 158 GiB (169667760128 bytes) trimmed
Jun 08 08:31:58 Orion-17.openSUSE fstrim[920]: /: 38.8 GiB (41687035904 bytes) trimmed
Jun 08 08:31:58 Orion-17.openSUSE systemd[1]: Started Discard unused blocks.
-- Reboot --

snip

-- Reboot --
Oct 14 12:01:36 Orion-17 systemd[1]: Starting Discard unused blocks on filesystems from /etc/fstab...
Oct 14 12:01:53 Orion-17.openSUSE fstrim[1032]: /home: 156 GiB (167553302528 bytes) trimmed on /dev/sda2
Oct 14 12:01:53 Orion-17.openSUSE fstrim[1032]: /: 37.7 GiB (40440094720 bytes) trimmed on /dev/sda1
Oct 14 12:01:53 Orion-17.openSUSE systemd[1]: Started Discard unused blocks on filesystems from /etc/fstab.

paul@Orion-17:~> 

I should also add that none of these drives is NVMe.

@karlmistelberger

Any idea at which (TW) snapshot this problem started? I’m fairly sure all was OK a few weeks ago… but I seldom check whether fstrim has run :wink:

Works for me in 20191012, at least in a VM environment. What is in your /etc/fstab?

UUID=2511d4d8-0425-4e80-bd14-86798396bdfb /      ext4  noatime,acl,user_xattr   1 1
UUID=08a4ff75-8318-40cf-b78b-b7049064dfb5 swap   swap  defaults                 0 0
UUID=801838af-06ff-405e-a6c3-194ba5878291 /home  ext4  relatime,acl,user_xattr  1 2

Looks OK to me… the first two are sda1 and sda2, the third is sdb1.

The command line for fstrim (in “/usr/lib/systemd/system/fstrim.service”) has “--fstab”; I suppose I could try changing that to “-a” :\
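Before editing the unit, one way to confirm which /etc/fstab mount points the timer run skipped is to compare fstab against the fstrim lines in the journal. This is a minimal sketch: skipped_mounts is a hypothetical helper, not part of util-linux or systemd, and it assumes the journal excerpt has been saved to a file and that no fstab mount point contains escaped spaces (\040).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of util-linux or systemd): print the fstab
# mount points that have no matching "fstrim[PID]: <mountpoint>: ... trimmed"
# line in a saved journal excerpt.
#   $1 = fstab file, $2 = file containing journalctl output for fstrim.service
skipped_mounts() {
  local fstab=$1 journal=$2
  awk -v journal="$journal" '
    # Lines from the journal file: record every mount point fstrim reported.
    FILENAME == journal {
      for (i = 1; i < NF; i++) {
        if ($i ~ /^fstrim\[[0-9]+\]:$/) {
          mp = $(i + 1)
          sub(/:$/, "", mp)   # "/home:" becomes "/home"
          trimmed[mp] = 1
        }
      }
      next
    }
    # Lines from fstab: ignore comments, blank lines and swap entries.
    /^[ \t]*#/ || NF == 0 { next }
    $3 == "swap" || $2 == "none" { next }
    !($2 in trimmed) { print $2 }
  ' "$journal" "$fstab"
}
```

For example, save the runs of interest with journalctl -u fstrim.service --since 2019-10-14 > fstrim.log and then call skipped_mounts /etc/fstab fstrim.log. With the fstab and the Orion-15 journal posted above, it prints /home. Because any trimmed line in the excerpt counts, limit the excerpt to the run being checked.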

Running fstrim manually works fine, and the firmware is up to date. I noticed the problem some weeks ago, when mail started showing up only after a delay.

erlangen:~ # journalctl -b --grep nvme
-- Logs begin at Sun 2019-10-06 15:16:15 CEST, end at Mon 2019-10-14 21:25:28 CEST. --
Oct 14 18:18:21 erlangen kernel: nvme nvme0: pci function 0000:04:00.0
Oct 14 18:18:21 erlangen kernel: nvme nvme0: 8/0/0 default/read/poll queues
Oct 14 18:18:21 erlangen kernel:  nvme0n1: p1 p2 p3 p4
Oct 14 18:18:21 erlangen kernel: EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null)
Oct 14 18:18:22 erlangen kernel: EXT4-fs (nvme0n1p2): re-mounted. Opts: (null)
Oct 14 18:18:22 erlangen kernel: EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
Oct 14 18:18:22 erlangen kernel: FAT-fs (nvme0n1p4): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
**Oct 14 18:18:22 erlangen systemd[1]: Condition check resulted in Auto-connect to subsystems on FC-NVME devices during boot being skipped.**
erlangen:~ # 

Same here: no problems with Leap 15.1. Presumably the problems with Tumbleweed started some weeks ago, when mail began arriving with a delay after boot.

Found this:

https://bbs.archlinux.org/viewtopic.php?id=247751

where it was suggested that removing (or commenting out) “ProtectHome=yes” in “/usr/lib/systemd/system/fstrim.service” resolves the problem.

At the moment I’m unsure exactly what the purpose of “ProtectHome=yes” is…

Edit: OK… found some information here: https://www.freedesktop.org/software/systemd/man/systemd.exec.html

Still thinking about it…
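Per the systemd.exec documentation linked above, ProtectHome=yes makes /home, /root and /run/user inaccessible to the service, while ProtectHome=read-only only makes them read-only. The sketch below spells out what each value means for trimming /home. protecthome_effect is a hypothetical illustration, not a systemd tool, and it assumes the value comes from systemctl show -p ProtectHome --value fstrim.service, which reports the effective setting including any drop-ins.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe what a ProtectHome= value (see systemd.exec)
# means for the fstrim.service run on /home.
#   $1 = value, e.g. from: systemctl show -p ProtectHome --value fstrim.service
protecthome_effect() {
  case "$1" in
    yes|true|on|1)
      echo "hidden: /home, /root and /run/user are inaccessible, so fstrim cannot reach the real /home" ;;
    tmpfs)
      echo "hidden: read-only tmpfs mounts cover /home, /root and /run/user, so fstrim cannot reach the real /home" ;;
    read-only)
      echo "visible: /home is read-only for the service but still reachable by fstrim" ;;
    no|false|off|0)
      echo "visible: /home is not protected" ;;
    *)
      # Anything else is not a ProtectHome value this sketch knows about.
      echo "unknown ProtectHome value: $1" >&2
      return 1 ;;
  esac
}
```

That fstrim still trims a read-only /home is based on the reports in this thread, not on a statement in the systemd documentation.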

It’s possible the mail problem is unrelated, just a coincidence?

This is fixed upstream in util-linux by the commit “fstrim: fix systemd service protection” (util-linux/util-linux.git, the util-linux code repository).

Please submit an openSUSE bug report.

Seems to be a feature, not a bug, but see the upstream commit “fstrim: fix systemd service protection” (util-linux/util-linux@c64d452 on GitHub) :slight_smile:

https://bugzilla.opensuse.org/show_bug.cgi?id=1154023

Patched fstrim.service:

erlangen:~ # systemctl cat fstrim.service 
# /usr/lib/systemd/system/fstrim.service
[Unit]
Description=Discard unused blocks on filesystems from /etc/fstab
Documentation=man:fstrim(8)

[Service]
Type=oneshot
ExecStart=/usr/sbin/fstrim --fstab --verbose --quiet
ProtectSystem=strict
ProtectHome=yes
PrivateDevices=no
PrivateNetwork=yes
PrivateUsers=no
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
MemoryDenyWriteExecute=yes
SystemCallFilter=@default @file-system @basic-io @system-service

**# /etc/systemd/system/fstrim.service.d/override.conf
[Service]
ProtectHome=read-only**
erlangen:~ # 

After rebooting, fstrim.service now trims /home, and the mail problems are gone for now.
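For reference, the drop-in from the systemctl cat output above can also be created from a script. This is a minimal sketch: write_fstrim_override is a hypothetical name, and its optional directory argument exists only so the function can be tried outside /etc/systemd/system. Interactively, systemctl edit fstrim.service creates the same override.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create <unit dir>/fstrim.service.d/override.conf
# containing ProtectHome=read-only, matching the drop-in shown above.
#   $1 = unit directory (defaults to /etc/systemd/system, which needs root)
write_fstrim_override() {
  local unit_dir=${1:-/etc/systemd/system}
  local dropin_dir="$unit_dir/fstrim.service.d"
  mkdir -p "$dropin_dir" || return 1
  # Only ProtectHome is overridden; the other hardening settings of the
  # vendor unit in /usr/lib/systemd/system stay in effect.
  cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
ProtectHome=read-only
EOF
}
```

As root, run write_fstrim_override, then systemctl daemon-reload and systemctl start fstrim.service; journalctl -u fstrim.service should then list a /home line. Reloading the units is assumed to make a reboot unnecessary, but that was not tested in this thread.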

Yes, changing ProtectHome to “read-only” has restored fstrimming of /home on both of the TW installs I have here.

I owe you thanks, as until your initial post I wasn’t aware that /home wasn’t being trimmed.

I’m not quite sure how that would affect postfix (which I don’t use), though… Still, all is working, so a good result.

I switched to a new system partition using btrfs and kept the home partition. Same symptoms as before: fstrim skipped /home, and after several days email would no longer show up in ~/.local/share/local-mail/. After fixing fstrim.service, everything is back to normal.