Extremely slow NFS file system

Gurus,

I could do with some help from you. I have a server (openSUSE Leap 15.0) which is also my file server. My client (openSUSE Leap 15.0) mounts the server's disk (/home) over NFS. If I write a 1 GB file on the server onto /home

arjan@schuurpc:/home/arjan> dd if=/dev/zero of=file1 bs=1GB count=1
1+0 records in
1+0 records out
1000000000 bytes (1.0 GB, 954 MiB) copied, 7.97734 s, 125 MB/s
arjan@schuurpc:/home/arjan>

It's written at 125 MB/s. If I do the same on the client

arjan@arjanpc:/home/arjan> dd if=/dev/zero of=file1 bs=1GB count=1
1+0 records in
1+0 records out
1000000000 bytes (1.0 GB, 954 MiB) copied, 139.974 s, 7.1 MB/s
arjan@arjanpc:/home/arjan>

The speed drops to 7.1 MB/s. I do expect a slower speed, but this is just too much :)

I did consider my internal network speed. On the server I started iperf3 -s and on the client iperf3 -c 10.0.0.150 -d

Here is the output on the server:

Accepted connection from 10.0.0.164, port 36960
[  5] local 10.0.0.150 port 5201 connected to 10.0.0.164 port 36962
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   107 MBytes   896 Mbits/sec
[  5]   1.00-2.00   sec   112 MBytes   938 Mbits/sec
[  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   937 Mbits/sec
[  5]   4.00-5.00   sec   112 MBytes   938 Mbits/sec
[  5]   5.00-6.00   sec   112 MBytes   938 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   937 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   938 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   937 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   938 Mbits/sec
[  5]  10.00-10.05  sec  5.17 MBytes   939 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.05  sec  1.09 GBytes   934 Mbits/sec                  receiver

So my internal network is not the problem.

On the server:

/etc/sysconfig/nfs

USE_KERNEL_NFSD_NUMBER="4"
MOUNTD_PORT=""
NFS3_SERVER_SUPPORT="yes"
NFS4_SUPPORT="yes"
SM_NOTIFY_OPTIONS=""
STATD_PORT=""
STATD_HOSTNAME=""
LOCKD_TCPPORT=""
LOCKD_UDPPORT=""
STATD_OPTIONS=""
NFSV4LEASETIME=""
RPC_PIPEFS_DIR=""
SVCGSSD_OPTIONS=""
NFSD_OPTIONS=""
GSSD_OPTIONS=""
MOUNTD_OPTIONS=""
NFS_GSSD_AVOID_DNS="no"
NFS_SECURITY_GSS="no"

The file /etc/nfs.conf.local does not exist.

/etc/nfs.conf reads

[environment]
include = /etc/sysconfig/nfs
include = /etc/nfs.conf.local
[general]
 pipefs-directory=$RPC_PIPEFS_DIR
 avoid-dns=$NFS_GSSD_AVOID_DNS
[lockd]
 port=$LOCKD_TCPPORT
 udp-port=$LOCKD_UDPPORT
[mountd]
 port= $MOUNTD_PORT
[nfsd]
 threads= $USE_KERNEL_NFSD_NUMBER
 lease-time=$NFSV4LEASETIME
 vers3=$NFS3_SERVER_SUPPORT
 vers4=$NFS4_SUPPORT
[statd]
 port=$STATD_PORT
 name=$STATD_HOSTNAME

/etc/exports reads for /home

/home   *(rw,root_squash,sync,no_subtree_check)

On the client /etc/fstab reads for /home

10.0.0.150:/home /home nfs nfsvers=4 0  0

There is a firewall on the server; however, all ports for the internal network are open.

firewall-cmd --list-all-zones

 --- many lines deleted ---
trusted (active)
  target: ACCEPT
  icmp-block-inversion: no
  interfaces: eth0 eth2
  sources: 
  services: 
  ports: 
  protocols: 
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules:
 --- many lines deleted ---

The client is connected via eth0 to the server.

Can somebody explain why my NFS is so terribly slow …

TIA!!

On the server:

That is 125 MB/s (megabytes per second), or around 1 Gb/s if we compute the bit rate, assuming that I am correctly reading your output.

If I do the same on the client
The speed drops to 7.1 MB/s.

That is 7.1 MB/s, or around 60 Mb/s as a bit rate. If your network speed is 100 Mb/s, that might be as good as you can get. If you have gigabit network speeds, you should be able to do better.

I'm using 100 Mb/s ethernet here, and a 4 GB ISO file such as the Tumbleweed DVD installer does take several minutes to copy. I don't do that often enough for it to bother me, but I should upgrade my home network to higher speeds.

In any case, start with checking your network speeds (cables, switches, ethernet cards).
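
For example, the negotiated link speed of the ethernet cards can be checked on both machines with something like this (assuming the interface is called eth0, adjust as needed):

ethtool eth0 | grep -i speed
cat /sys/class/net/eth0/speed

If either end reports 100 Mb/s instead of 1000 Mb/s, a cable or a switch port is the likely culprit.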

nrickert, thanks for your time and effort, however I think you are missing the point I am trying to make. On the server AND on the client I used the command "dd if=/dev/zero of=file1 bs=1GB count=1". The server reports 125 MB/s and the client reports 7.1 MB/s, so the units are the same (no calculation is needed to compare them).
At the bottom of my initial post I also reported my network speed, around 938 Mbit/s, which comes down to (almost) a Gigabit network (which is correct if I look at the switches and cables).
NFS over a Gigabit network should go well beyond 7 MB/s, or (around) 56 Mbit/s (I think :)). Do you agree??

For reference, I pinged the server from the client; here are the results:

arjan@arjanpc:~> ping 10.0.0.150
PING 10.0.0.150 (10.0.0.150) 56(84) bytes of data.
64 bytes from 10.0.0.150: icmp_seq=1 ttl=64 time=0.182 ms
64 bytes from 10.0.0.150: icmp_seq=2 ttl=64 time=0.190 ms
64 bytes from 10.0.0.150: icmp_seq=3 ttl=64 time=0.173 ms
64 bytes from 10.0.0.150: icmp_seq=4 ttl=64 time=0.169 ms
64 bytes from 10.0.0.150: icmp_seq=5 ttl=64 time=0.104 ms
64 bytes from 10.0.0.150: icmp_seq=6 ttl=64 time=0.176 ms
64 bytes from 10.0.0.150: icmp_seq=7 ttl=64 time=0.175 ms
64 bytes from 10.0.0.150: icmp_seq=8 ttl=64 time=0.203 ms
^C
--- 10.0.0.150 ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 7167ms
rtt min/avg/max/mdev = 0.104/0.171/0.203/0.030 ms
arjan@arjanpc:~> 

I assume that something goes wrong somewhere in the NFS config … but where …

You should investigate read/write testing with different block sizes, as this will have an effect on throughput.

https://www.slashroot.in/linux-file-system-read-write-performance-test
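
As a rough sketch (paths and sizes are just an example), a write and a read test over the NFS mount could look like this:

dd if=/dev/zero of=/home/arjan/ddtest bs=64k count=16384 conv=fdatasync   # 1 GiB written in 64 KiB blocks, flushed before dd reports the rate
echo 3 > /proc/sys/vm/drop_caches                                         # as root on the client, drop the page cache so the read really goes over the wire
dd if=/home/arjan/ddtest of=/dev/null bs=64k                              # read the file back
rm /home/arjan/ddtest

Repeat with different bs values (e.g. 4k, 1M) to see the effect of the block size.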

NFS tuning can be done to help optimize the performance of your server for your given network. The NFS block size (rsize and wsize values) can be explicitly set as mount options and may need to be higher than the default values. After setting them and re-mounting the NFS shares, you can conduct comparative read/write performance testing again.
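
For example (the values here are purely illustrative, not a recommendation), the client's fstab line could be extended to something like

10.0.0.150:/home /home nfs nfsvers=4,rsize=131072,wsize=131072 0 0

followed by "umount /home && mount /home" to pick up the new options. The values that were actually negotiated can then be verified as shown below.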

The current data chunk size for the NFS mounts can be obtained via

cat /proc/mounts
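
On the client this should show a line for the NFS mount, including the rsize/wsize values that were actually negotiated, roughly of this form (illustrative only):

10.0.0.150:/home /home nfs4 rw,relatime,vers=4.0,rsize=262144,wsize=262144,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.0.0.150 0 0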

The network MTU is also important with respect to throughput, as it represents the largest amount of data that can be passed in one Ethernet frame without fragmentation being required, so it can have a direct effect on file transfers between server and client. This also requires a reasonable understanding of the network hardware between the server and client machines.
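
The current MTU, and whether full-size frames pass unfragmented, can be checked with something like this (interface name and address assumed from the posts above):

ip link show eth0 | grep mtu
ping -M do -s 1472 10.0.0.150   # 1472 = 1500 minus 28 bytes of IP/ICMP headers; -M do forbids fragmentation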

This might be useful to you
http://www.admin-magazine.com/HPC/Articles/Useful-NFS-Options-for-Tuning-and-Management

deano_ferrari, thanks for your time and effort! The last link was very helpful. I did some quick and dirty tests, varying the number of nfsd threads. In /etc/sysconfig/nfs, USE_KERNEL_NFSD_NUMBER was set to 4. I tried increasing this number and it seems that 8 gives an optimum.
I will do some more thorough testing, but it is going in the right direction.
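
For anyone reading along, the change amounts to something like this on the server (the exact service name to restart may differ between setups, e.g. nfsserver vs. nfs-server):

# in /etc/sysconfig/nfs
USE_KERNEL_NFSD_NUMBER="8"

systemctl restart nfs-server   # the running thread count can also be changed on the fly with: rpc.nfsd 8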

Do you have any idea/educated guess what speed I might expect (raw disk is 125 MB/s; network approx. 938 Mbit/s)?

Again, thanks for your time and effort!!

Experiment with increasing the rsize and wsize (block size) values. Your initial dd test used a 1 GB block size. That is huge, so I'm not surprised it adversely affected the observed transfer rate (with the default NFS rsize and wsize values). Streaming media files, for example, will benefit from larger values. Network fragmentation due to MTU sizes can have an impact as well, but start with the NFS parameters first.
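
For the re-test after remounting, something more representative than a single 1 GB block would be, for example,

dd if=/dev/zero of=/home/arjan/file1 bs=1M count=1024 conv=fdatasync

which still writes 1 GiB in total, but in 1 MiB chunks, and conv=fdatasync makes dd flush the data to the server before it reports the rate.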

Do you have any idea/educated guess what speed I might expect (raw disk is 125 MB/s; network approx. 938 Mbit/s)?

Again, thanks for your time and effort!!

I don’t want to guess at that, but you should be able to substantially improve on your initial results with better tuning. NFSv4 uses TCP (flow control, re-transmissions etc) so that will incur some overhead as well.

After experimenting I figured out that increasing the number of nfsd threads from 4 to 12 (on a quad-core i5 with 4 GB RAM) brings back the performance the users of /home were expecting.

Thanks for reporting back, and telling us what worked.

That reads like progress. Having an appropriate number of NFS threads available on the server is an important consideration for multi-client environments. The NFS server statistics can be examined to get load information, etc.

nfsstat -4 -s

For more info

man nfsstat

@athoopen:

You may consider perusing the following information:

-- section 5.6, “Number of Instances of the NFSD Server Daemon”
The reference is in the openSUSE documentation: <https://doc.opensuse.org/documentation/leap/reference/html/book.opensuse.reference/cha.nfs.html#sec.nfs.info>.

For those who don’t want to plough through the text, the default number of threads originally proposed by Sun was 8 – and everyone else used to simply follow that rule of thumb, except for …

The file “/proc/net/rpc/nfsd” gives some useful hints for the current thread usage – the “nfsd” (7) man page explains how to interpret the ‘cat’ output …
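
For example, on the server:

grep ^th /proc/net/rpc/nfsd

On the “th” line, the first number is the configured thread count and the second is how many times all threads have been busy at once; if that second number keeps growing, more threads are probably warranted (the man page has the details).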