NFS: Server not responding, still trying

Kev_in · July 29, 2014, 5:40pm

Hi Guys,

I have a problem with my NFS server. I have an openSUSE 13.1 machine on which I save my backups via NFSv4. At first I backed up two or three servers to the backup system and had no problems, until I added some more NFS clients (15 clients). Now the connection between the clients and the NFS server breaks. The clients and the server are still reachable, for example by ping, and they are all in the same subnet. If I run the NFS configuration with YaST, I get the following error message after finishing: idmapd not running - check your Domainname.
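Since the YaST message points at the domain name, one thing that can be compared between the server and the clients is the Domain entry that idmapd reads. Here is a minimal Python sketch of that check, assuming the usual /etc/idmapd.conf location and a [General] section with a Domain key. The function name `idmapd_domain` and the sample file content are only for illustration, they are not taken from my machines.

```python
import configparser

def idmapd_domain(conf_text):
    """Return the Domain value from the [General] section of idmapd.conf text, or None."""
    # interpolation=None so a stray '%' in a value is not treated specially.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.get("General", "Domain", fallback=None)

# Example content only; the real text would be read from /etc/idmapd.conf (assumed path).
sample = """
[General]
Verbosity = 0
Pipefs-Directory = /var/lib/nfs/rpc_pipefs
Domain = localdomain

[Mapping]
Nobody-User = nobody
Nobody-Group = nobody
"""
print(idmapd_domain(sample))
```

Running the same function on the server and on each client and comparing the results would show whether the Domain values differ.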

I've enabled the debugging options and get the following errors:

On the client side, dmesg shows:

Jul 29 15:46:06 kernel: [  267.089482] nfs: server "server_ip" not responding, still trying
Jul 29 15:56:21 kernel: [  882.245553] nfs: server "server_ip" not responding, timed out
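To put a number on the gap between these two messages, here is a minimal Python sketch that pulls the kernel uptime stamps out of the lines and subtracts them. The helper name `uptime_seconds` and the regex are my own, not part of any NFS tool. The regex accepts the stamp with or without the opening bracket.

```python
import re

# Matches the kernel uptime stamp, e.g. "[  267.089482]" or "267.089482]".
STAMP = re.compile(r"\[?\s*(\d+\.\d+)\]")

def uptime_seconds(line):
    """Return the kernel uptime stamp of a dmesg/syslog line as a float."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no kernel timestamp in line: %r" % line)
    return float(match.group(1))

first = 'Jul 29 15:46:06 kernel: [  267.089482] nfs: server "server_ip" not responding, still trying'
last = 'Jul 29 15:56:21 kernel: [  882.245553] nfs: server "server_ip" not responding, timed out'

gap = uptime_seconds(last) - uptime_seconds(first)
print("seconds between messages: %.1f" % gap)
```

The gap comes out at about 615 seconds, which agrees with the wall-clock stamps (15:46:06 to 15:56:21).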

nfsstat on the client says:

Client rpc stats:
calls      retrans    authrefrsh
701        37         648     


Client nfs v4:
null         read         write        commit       open         open_conf    
0         0% 211      30% 13        1% 0         0% 32        4% 5         0% 
open_noat    open_dgrd    close        setattr      fsinfo       renew        
0         0% 0         0% 23        3% 0         0% 8         1% 26        3% 
setclntid    confirm      lock         lockt        locku        access       
6         0% 6         0% 0         0% 0         0% 0         0% 127      18% 
getattr      lookup       lookup_root  remove       rename       link         
47        6% 74       10% 2         0% 0         0% 0         0% 0         0% 
symlink      create       pathconf     statfs       readlink     readdir      
0         0% 8         1% 6         0% 83       11% 0         0% 5         0% 
server_caps  delegreturn  getacl       setacl       fs_locations rel_lkowner  
14        2% 4         0% 0         0% 0         0% 0         0% 0         0% 
secinfo      exchange_id  create_ses   destroy_ses  sequence     get_lease_t  
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
reclaim_comp layoutget    getdevinfo   layoutcommit layoutreturn getdevlist   
0         0% 0         0% 0         0% 0         0% 0         0% 0         0% 
(null)       
0         0%
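The retrans counter in the rpc stats above is easier to judge as a ratio of all calls. Here is a small Python sketch, assuming the three-column "calls retrans authrefrsh" layout shown above. `retrans_ratio` is a name made up for illustration, not an nfsstat feature.

```python
def retrans_ratio(header, values):
    """Pair the header and value rows of 'Client rpc stats' and return retrans / calls."""
    stats = dict(zip(header.split(), (int(v) for v in values.split())))
    if stats["calls"] == 0:
        return 0.0
    return stats["retrans"] / stats["calls"]

ratio = retrans_ratio("calls      retrans    authrefrsh",
                      "701        37         648")
print("retransmitted calls: %.1f%%" % (100 * ratio))
```

With the numbers from this client that is 37 of 701 calls, a little over 5%.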

On the server side, dmesg shows:

 svc: socket ffff88040815f000 sendto([ffff8804005da000 48... ], 48) = 48 (addr 10.8.xx.xxx, port=835)
[ 7508.930435] svc: server ffff88040050a000 waiting for data (to = 900000)
[ 7510.225141] NFSD: laundromat service - starting
[ 7510.225148] NFSD: purging unused client (clientid 00000050)
[ 7510.225322] RPC:       shutting down nfs4_cb client for 10.8.xx.xxx
[ 7510.225327] RPC:       rpc_release_client(ffff880407471600)
[ 7510.225331] RPC:       destroying nfs4_cb client for 10.8.20.223
[ 7510.225349] RPC:       destroying transport ffff880409281000
[ 7510.225354] RPC:       xs_destroy xprt ffff880409281000
[ 7510.225357] RPC:       xs_close xprt ffff880409281000
[ 7510.225360] RPC:       disconnected transport ffff880409281000
[ 7510.225389] NFSD: laundromat_main - sleeping for 1 seconds

The nfsstat on the server says:

Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
1270       0          0          0          0

Server nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 7        22% 5        16% 1         3% 5        16% 0         0%
read         write        create       mkdir        symlink      mknod
0         0% 3         9% 2         6% 0         0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
2         6% 0         0% 0         0% 0         0% 0         0% 5        16%
fsstat       fsinfo       pathconf     commit
0         0% 0         0% 0         0% 1         3%

Server nfs v4:
null         compound
1         0% 1238     99%

Server nfs v4 operations:
op0-unused   op1-unused   op2-future   access       close        commit
0         0% 0         0% 0         0% 101       4% 26        1% 1         0%
create       delegpurge   delegreturn  getattr      getfh        link
8         0% 0         0% 5         0% 397      17% 80        3% 0         0%
lock         lockt        locku        lookup       lookup_root  nverify
0         0% 0         0% 0         0% 74        3% 0         0% 0         0%
open         openattr     open_conf    open_dgrd    putfh        putpubfh
35        1% 0         0% 6         0% 0         0% 634      28% 0         0%
putrootfh    read         readdir      readlink     remove       rename
30        1% 214       9% 6         0% 0         0% 1         0% 0         0%
renew        restorefh    savefh       secinfo      setattr      setcltid
445      19% 0         0% 0         0% 0         0% 5         0% 79        3%
setcltidconf verify       write        rellockowner bc_ctl       bind_conn
79        3% 0         0% 16        0% 0         0% 0         0% 0         0%
exchange_id  create_ses   destroy_ses  free_stateid getdirdeleg  getdevinfo
0         0% 0         0% 0         0% 0         0% 0         0% 0         0%
getdevlist   layoutcommit layoutget    layoutreturn secinfononam sequence
0         0% 0         0% 0         0% 0         0% 0         0% 0         0%
set_ssv      test_stateid want_deleg   destroy_clid reclaim_comp
0         0% 0         0% 0         0% 0         0% 0         0%



If I restart the clients, it works for a few minutes and then the connection breaks again. Restarting rcnfs on the client takes a long time.

Does anyone have any ideas? Thank you for your help.

Best Regards,
Kev_in