NFS is very slow to start

tumbleweed v20231026
NFS v2.6.3

Since about two weeks ago, NFS has been very slow to start after logging into my account. Previously, all of the remote volumes were ready as soon as the account opened.

Currently it takes roughly 2 minutes before remote volumes are ready after logging in. I am guessing there is a timeout involved. There is nothing in the system journal regarding NFS.

Is this at boot up? If yes, perhaps take a look at:

systemd-analyze critical-chain

In the last month or some TW users noticed that NetworkManager was taking much longer than it used to to bring up the Ethernet interface - minutes instead of seconds. In my case, I didn’t need to wait for it, so I masked NetworkManager-wait-online.service.

I do not have the NFS mounts in /etc/fstab as there has been issues with NFS in the past that have severely extended boot time. Instead a script is run by cron at boot (@reboot).

I do not see anything out of the ordinary with the timing.

# systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @39.885s
└─multi-user.target @39.885s
  └─getty.target @39.885s
    └─getty@tty1.service @39.885s
      └─plymouth-quit-wait.service @37.333s +2.471s
        └─systemd-user-sessions.service @37.293s +38ms
          └─remote-fs.target @37.292s
            └─t\x2dsmb.mount @36.722s +569ms
              └─network-online.target @36.685s
                └─network.target @36.685s
                  └─wicked.service @6.634s +30.051s
                    └─wickedd-nanny.service @6.626s +6ms
                      └─wickedd.service @6.615s +10ms
                        └─wickedd-dhcp6.service @6.593s +19ms
                          └─network-pre.target @6.555s
                            └─wpa_supplicant.service @39.236s +46ms
                              └─dbus.service @5.925s +46ms
                                └─basic.target @5.921s
                                  └─sockets.target @5.921s
                                    └─pcscd.socket @5.920s
                                      └─sysinit.target @5.915s
                                        └─systemd-update-utmp.service @5.884s +31ms
                                          └─auditd.service @5.862s +20ms
                                            └─systemd-tmpfiles-setup.service @5.815s +46ms
                                              └─systemd-journal-flush.service @5.487s +301ms
                                                └─var.mount @1.375s +4.087s
                                                  └─dev-sdb1.device @584542y 2w 2d 20h 1min 46.613s +4.260s

There’s nothing in there that would explain minutes of delay.

I was a bit surprised to see wicked adding 30 seconds delay, everything else adds milliseconds (ms).

Is 39 seconds to the login prompt typical for your system, or is this recent?

Wicked isn’t adding minutes, so it doesn’t appear to be the cause of your issue.

I wonder if attempts to do NFS mounts backoff exponentially on timeouts? That might be one way a slow wickd might cause an even longer delay to mounting. Any clues in the journal?

(It’s probably not related to your issue, but the delay with wicked looks a bit like the kind of delay I was experiencing with NetworkManager. If you could eliminate the delay due to wicked, it looks like you could get to the login prompt in about 9 seconds.)

Wickedd has been taking longer to complete. Although 25 - 40 seconds is not unusual; it has been this way for at least a year.

NFS has almost no presence in the journal at all during boot and desktop load. Nothing that offers a clue why there is this delay. I see that cron job starting to mount the remote volumes; nary a clue when it ends.

Can you show the /etc/fstab entries pertaining to the nfs mounts (assuming that is what you are using)?

Could you stop the usual job from running, or have the job exit without doing anything, then run the mount normally on the command line (as root?) and see how it behaves?

You could first enable some debugging. I haven’t used it, but came across this reference: SUSE: How to enable NFS related debug logging.

Apparently rpcdebug can enable nfs client side debugging, for example:

rpcdebug -m nfs -s all

I did try this: I used the cron script that un-/mounts the volumes. They un-/mounted the volumes in less than a second.

The issue seems to be at boot time.

If it were me, I would disable the job for the next boot. Boot up, and then manually run it. That would help tell whether its a first time thing or a boot time thing.

I might also see what gets logged on the server for the normal boot with the job active. For example, are the times logged on the server delayed?

Then I would move on to trying to get some debugging via rpcdebug - maybe on both sides of the equation.

I suppose wireshark could also be run on the server.

But this is all guesswork on investigating, sorry, I could just be wasting your time. I think I did have some issues once when the client and server were using different versions of NFS.

1 Like

Here:

      └─wicked.service @1.821s +1.147s
        └─wickedd-nanny.service @1.813s +6ms
          └─wickedd.service @1.801s +10ms
            └─wickedd-dhcp4.service @1.789s +11ms

I had delays on wicked start before disabling IPV6 on YAST, may be the case here. Or not.

I also change WAIT_FOR_INTERFACES=“1” in /etc/sysconfig/network/config to speed things up.

I disabled the boot time cron-run script to mount the NFS volumes. Then ran it from a command prompt immediately after booting. There was no delay.

So:

  1. the delay was fixed with the most recent update (20231102), or
  2. The delay is related to running the script with cron.

After booting today systemd-analyze critical-chain showed this for wickedd:

└─wicked.service @2.983s +30.045s
 └─wickedd-nanny.service @2.976s +6ms
   └─wickedd.service @2.965s +10ms
     └─wickedd-dhcp4.service @2.947s +16ms
       └─network-pre.target @2.876s
         └─wpa_supplicant.service @35.688s +39ms

wicked.service took just over 30 seconds to start. 30 seconds is the default value for WAIT_FOR_INTERFACES.
I shall try a smaller value for WAIT_FOR_INTERFACES next boot.

An update on this issue…

Reducing the value for WAIT_FOR_INTERFACES has certainly made boot time faster.

The issue with NFS remains. With the shorter boot time, I am seeing that NFS is active for up to 30 seconds, ceases to function for 90 seconds, and is fine thereafter.
I note that in /etc/sysconfig/nfsthe setting NFSV4GRACETIME is 90 seconds. Its description implies there is a NFS restart happening.

On demand mounting is fast and reliable:

karl@erlangen:~> time cd /home_karl_6700k/

real    0m0,355s
user    0m0,000s
sys     0m0,000s
karl@erlangen:/home_karl_6700k> 
karl@erlangen:~> journalctl -b -u home_karl_6700k.automount -u home_karl_6700k.mount
Nov 20 19:51:21 erlangen systemd[1]: Set up automount home_karl_6700k.automount.
Nov 21 20:44:54 erlangen systemd[1]: home_karl_6700k.automount: Got automount request for /home_karl_6700k, triggered by 7290 (bash)
Nov 21 20:44:54 erlangen systemd[1]: Mounting /home_karl_6700k...
Nov 21 20:44:54 erlangen systemd[1]: Mounted /home_karl_6700k.
Nov 21 20:48:42 erlangen systemd[1]: Unmounting /home_karl_6700k...
Nov 21 20:48:42 erlangen systemd[1]: home_karl_6700k.mount: Deactivated successfully.
Nov 21 20:48:42 erlangen systemd[1]: Unmounted /home_karl_6700k.
karl@erlangen:~>