Simple NFS mount fails. Software? Hardware?

Since 9.3 I have used NFS to cross-mount my home directories on two machines that are connected through an Ethernet switch. I will not say that it has always been satisfactory, but given that one machine is frequently rebooted into Windows and back, the problems have always been fixable by rebooting the other machine or by restarting services on one or the other with YaST. I use static IP addresses assigned by DHCP on my router, and I have rarely had the NFS mounts fail at boot time. I do not use NFSv4.

This reasonably satisfactory arrangement worked for two computers that started out, 7 years ago, as a Pentium 4 and an Athlon 64. About 16 months ago I replaced the Athlon 64 with a Phenom II X4 (945) machine, and 2 months ago the Pentium 4 failed and was replaced with an AMD 8120 machine. After the first hardware transition, the NFS mounts continued to function under 10.3, 11.2 and 11.4. Since the most recent hardware change, still under 11.4, they have become completely unreliable. Perusing the forum, I seem to have sampled many of the problems others have reported: exceptionally long boot times, total failures to mount, and freezing of KDE applications (the starter panel) on both machines.

Although I seem to have basic connectivity between the machines most of the time (tested with ping), the 8120 machine occasionally shows as unreachable, even though at the same moment each machine can ping my network printer.

I have temporarily given up on the cross-mount and simply tried to get the 8120 machine to export my home directory to the 945 machine, using the NFS server and NFS client modules of YaST on the two machines. This export/import fails, and I get the following sequence of messages on the 8120 (islajura) machine when the NFS server is first started:

Feb 12 14:02:04 islajura kernel: [16677.840519] nfsd: last server has exited, flushing export cache
Feb 12 14:02:04 islajura rpc.statd[9013]: Caught signal 15, un-registering and exiting
Feb 12 14:02:04 islajura rpc.mountd[9010]: Caught signal 15, un-registering and exiting.
Feb 12 14:02:04 islajura rpc.mountd[13341]: Version 1.2.3 starting
Feb 12 14:02:04 islajura rpc.statd[13344]: Version 1.2.3 starting
Feb 12 14:02:04 islajura rpc.statd[13344]: Flags: TI-RPC
Feb 12 14:02:04 islajura kernel: [16678.198425] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Feb 12 14:02:04 islajura kernel: [16678.198436] NFSD: starting 90-second grace period
Feb 12 14:02:04 islajura sm-notify[13353]: Version 1.2.3 starting

Then, one minute later, when I attempt to start the client services on the 945 machine (kyoto), which has the local LAN address 192.168.0.102, I get:

Feb 12 14:03:02 islajura kernel: [16736.004317] svc: 192.168.0.102, port=904: unknown version (4 for prog 100003, nfsd)
Feb 12 14:03:15 islajura kernel: [16749.023798] svc: 192.168.0.102, port=786: unknown version (4 for prog 100003, nfsd)
Feb 12 14:03:28 islajura kernel: [16762.041163] svc: 192.168.0.102, port=863: unknown version (4 for prog 100003, nfsd)
Feb 12 14:03:41 islajura kernel: [16775.056560] svc: 192.168.0.102, port=845: unknown version (4 for prog 100003, nfsd)

While this is occurring on the 8120 machine, the NFS client module on the 945 machine reports, after a time-out, that the services could not be started. At that time, the tail of /var/log/messages on the 945 machine shows this sequence (both machines run ntp, so the clocks agree to within a fraction of a second):

Feb 12 14:03:02 kyoto sm-notify[32682]: Version 1.2.3 starting
Feb 12 14:03:02 kyoto sm-notify[32682]: Already notifying clients; Exiting!

I am seeking advice on what the “unknown version” messages on the 8120 machine mean. Unknown version of what? nfsd? And why is this a problem?

Separately, I am hoping for suggestions on how to test the onboard network hardware of the 8120 machine, to see whether it is unreliable and should be disabled in the BIOS and replaced by a separate Gigabit card.

Any suggestions appreciated.

On 2012-02-13 00:26, lrkeefe wrote:

> Although I seem to have basic connectivity most of the time between the
> machines using ping, it does occasionally occur that the 8120 machine
> shows as unreachable, even though simultaneously each machine can ping
> my network printer.

Have a script on each machine ping the other one every minute.
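A minimal sketch of such a script in Python, assuming the Linux iputils `ping` (for its `-c` and `-W` options) is on the PATH. Run a copy on each machine with the other machine's address and compare the timestamps of any UNREACHABLE lines in the two outputs. The function names and the 60-second interval are illustrative choices, not anything from this thread.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2):
    """Send one echo request with iputils ping; True if it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def log_line(host, ok, when=None):
    """Format one timestamped result line, suitable for appending to a log."""
    when = when or datetime.now()
    status = "up" if ok else "UNREACHABLE"
    return f"{when:%Y-%m-%d %H:%M:%S} {host} {status}"


def monitor(hosts, interval_s=60, probe=ping_once, rounds=None):
    """Probe every host once per interval; loop forever if rounds is None.

    Returns the number of failed probes (only reached when rounds is set).
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        for host in hosts:
            ok = probe(host)
            if not ok:
                failures += 1
            print(log_line(host, ok), flush=True)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
    return failures
```

For example, on islajura `monitor(["192.168.0.102"])` pings kyoto once a minute until interrupted; adding the printer's address as a second host gives the side-by-side comparison described in the original post.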

I have a laptop that mounts (via wifi) a share from the desktop, and
sometimes it fails. What I do is ping from one machine to the other, and
vice versa. I think I have also seen the share disconnect once, after a
period of inactivity.

In my case the culprit is the router: it forgets that the wired machines
have to route through wifi to reach the laptop. I get connectivity back
after I make the laptop ping the desktop.

> Feb 12 14:02:04 islajura kernel: [16677.840519] nfsd: last server has
> exited, flushing export cache

Use code tags to post code, like this:

Feb 12 14:02:04 islajura rpc.statd[9013]: Caught signal 15, un-registering and exiting

> I am seeking advice on what the “unknown version” messages on the 8120
> machine mean.

No idea.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

lrkeefe wrote:
> I do not use NFSv4

> In the most recent hardware change, still under
> 11.4, they have become completely unreliable. Perusing the forum I seem
> to have sampled many of the problems experienced by others in terms of
> exceptionally long boot times, total failures to mount, and also
> freezing of KDE applications (the starter panel) on both machines.
>
> Although I seem to have basic connectivity most of the time between the
> machines using ping, it does occasionally occur that the 8120 machine
> shows as unreachable, even though simultaneously each machine can ping
> my network printer.

This indicates that you have more fundamental network problems, not
connected with NFS. Carlos gave good advice. You might want to check
your DNS and routing settings. But it may also be a hardware problem.

If possible, I would turn off NFS altogether for the moment until you
have a 100% reliable network. Perhaps try copying large files using sftp
or something similar to generate some network traffic.
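sftp also exercises the disks and the encryption. If you want to load only the network, here is a minimal Python sketch of a bulk TCP transfer between the two machines. It assumes Python 3.8+ (for `socket.create_server`) on both ends and that the example port, 50007 (a hypothetical choice), is not blocked by a firewall. The function names are illustrative.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send()/recv() call


def listen(port, bind_addr="0.0.0.0"):
    """Open a listening TCP socket on the receiving machine."""
    return socket.create_server((bind_addr, port))


def receive(srv):
    """Accept one sender and count bytes until it closes the connection."""
    total = 0
    with srv:
        conn, _peer = srv.accept()
        with conn:
            while True:
                data = conn.recv(CHUNK)
                if not data:
                    break
                total += len(data)
    return total


def send(host, port, mib=500):
    """Push `mib` MiB of zero bytes to the receiver; return the rate in MiB/s.

    The rate is measured on the sending side, so it slightly overstates
    throughput for small transfers; the receiver's byte count is the check.
    """
    payload = bytes(CHUNK)
    n_chunks = mib * 1024 * 1024 // CHUNK
    start = time.monotonic()
    # timeout=10 makes a stalled link raise socket.timeout instead of hanging
    with socket.create_connection((host, port), timeout=10) as sock:
        for _ in range(n_chunks):
            sock.sendall(payload)
    elapsed = time.monotonic() - start
    return mib / elapsed if elapsed > 0 else float("inf")


def loopback_selftest(mib=50):
    """Run both ends on 127.0.0.1 to check the script itself; returns (bytes, MiB/s)."""
    srv = listen(0, "127.0.0.1")  # port 0: let the OS pick a free port
    port = srv.getsockname()[1]
    result = {}
    t = threading.Thread(target=lambda: result.update(n=receive(srv)))
    t.start()
    rate = send("127.0.0.1", port, mib)
    t.join()
    return result["n"], rate
```

For example, on kyoto run `print(receive(listen(50007)))`, then on islajura run `print(send("192.168.0.102", 50007, mib=2000))`. A transfer that times out, or a byte count on kyoto that is not 2000 MiB, points at the link rather than at NFS.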

> Feb 12 14:02:04 islajura rpc.mountd[13341]: Version 1.2.3 starting
> Feb 12 14:02:04 islajura rpc.statd[13344]: Version 1.2.3 starting
> Feb 12 14:02:04 islajura rpc.statd[13344]: Flags: TI-RPC
> Feb 12 14:02:04 islajura kernel: [16678.198425] NFSD: Using
> /var/lib/nfs/v4recovery as the NFSv4 state recovery directory

You said you were not using NFSv4, but this seems to indicate that it is
being started.
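One way to check which NFS versions the server actually offers is `rpcinfo -p islajura`, which lists every registered RPC program together with its version numbers (this assumes the rpcinfo tool is installed). Below is a minimal Python sketch that extracts the versions registered for program 100003, the ONC RPC program number for NFS. It assumes the usual `program vers proto port service` column layout, and the function names are illustrative.

```python
import subprocess

NFS_PROGRAM = 100003  # ONC RPC program number registered for NFS


def nfs_versions(rpcinfo_text, program=NFS_PROGRAM):
    """Return the set of versions of `program` listed in `rpcinfo -p` output."""
    versions = set()
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        # Data rows start with: program vers proto port [service];
        # the header row fails the isdigit() checks and is skipped.
        if len(fields) >= 4 and fields[0].isdigit() and fields[1].isdigit():
            if int(fields[0]) == program:
                versions.add(int(fields[1]))
    return versions


def registered_nfs_versions(host="localhost"):
    """Run `rpcinfo -p host` and return the NFS versions it advertises."""
    out = subprocess.run(
        ["rpcinfo", "-p", host], capture_output=True, text=True, check=True
    ).stdout
    return nfs_versions(out)
```

If `registered_nfs_versions("islajura")` returns {2, 3} without 4, the server is not offering NFSv4 despite the v4recovery line in its log, and a client asking for version 4 would be expected to be turned away.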

> Feb 12 14:02:04 islajura kernel: [16678.198436] NFSD: starting
> 90-second grace period
> Feb 12 14:02:04 islajura sm-notify[13353]: Version 1.2.3 starting
>
> and then one minute later when I attempt to start the client services
> on the 945 machine (kyoto) which has local LAN address 192.168.0.102
>
> Feb 12 14:03:02 islajura kernel: [16736.004317] svc: 192.168.0.102,
> port=904: unknown version (4 for prog 100003, nfsd)

I don’t know what this means, but the number four here makes me
suspicious too.
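For what it is worth, these kernel lines can be read field by field. 100003 is the ONC RPC program number assigned to NFS, and the number before "for prog" is the protocol version the client asked for. So each line records kyoto (192.168.0.102) requesting NFS version 4 from a server that did not recognise that version. Below is a minimal Python sketch that pulls those fields out of a log line, assuming the message format quoted above. The regex and the program-number table were written for this thread and are not taken from the kernel or nfs-utils.

```python
import re

# Well-known ONC RPC program numbers for the NFS family of services.
RPC_PROGRAMS = {
    100000: "portmapper",
    100003: "nfs",
    100005: "mountd",
    100021: "nlockmgr",
    100024: "status",
}

# Matches e.g. "svc: 192.168.0.102, port=904: unknown version (4 for prog 100003, nfsd)"
# (IPv4 client addresses only).
SVC_UNKNOWN_VERSION = re.compile(
    r"svc: (?P<client>[\d.]+), port=(?P<port>\d+): "
    r"unknown version \((?P<version>\d+) for prog (?P<prog>\d+), (?P<name>[^)]+)\)"
)


def parse_svc_line(line):
    """Return who asked for which RPC program and version, or None if no match."""
    m = SVC_UNKNOWN_VERSION.search(line)
    if m is None:
        return None
    prog = int(m["prog"])
    return {
        "client": m["client"],
        "port": int(m["port"]),
        "version": int(m["version"]),
        "prog": prog,
        "service": RPC_PROGRAMS.get(prog, m["name"]),
    }
```

Run over /var/log/messages, this would show whether every rejected request from kyoto is a version-4 NFS request, which is the pattern in the four lines posted.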

> I am seeking advice on what the “unknown version” messages on the 8120
> machine mean. Unknown version of what? nfsd? And why is this a
> problem?

Did you google for the messages?

> Separately, I am hoping for suggestions on ways to test the motherboard
> network hardware on the 8120 to see if it possibly is unreliable, and
> needs to be BIOS disabled and replaced by a separate Gigabit card.
>
> Any suggestions appreciated.

Google for the motherboard and/or network chip together with "linux" and "problem".
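Besides searching, the kernel keeps per-interface error counters that can hint at a flaky NIC, cable or switch port. Below is a minimal Python sketch that assumes the standard Linux sysfs layout under /sys/class/net/<iface>/statistics/ and that the onboard port is named eth0 (check with `ip link`). The function names are illustrative.

```python
import os
import time

# Counter files the kernel exposes under /sys/class/net/<iface>/statistics/
ERROR_COUNTERS = (
    "rx_errors", "tx_errors", "rx_dropped", "tx_dropped",
    "rx_crc_errors", "rx_frame_errors", "collisions",
)


def read_counters(iface, base="/sys/class/net"):
    """Read the error counters for `iface`; missing files are skipped."""
    stats_dir = os.path.join(base, iface, "statistics")
    counters = {}
    for name in ERROR_COUNTERS:
        try:
            with open(os.path.join(stats_dir, name)) as f:
                counters[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # interface absent, or driver lacks this counter
    return counters


def increased(before, after):
    """Return the counters that grew between two snapshots, with the delta."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


def watch(iface="eth0", interval_s=60):
    """Snapshot the counters, wait, and report what went up in between."""
    before = read_counters(iface)
    time.sleep(interval_s)
    return increased(before, read_counters(iface))
```

For example, run `print(watch("eth0", 300))` on the 8120 while the ping script or a large copy is running. An empty result is reassuring; steadily climbing rx_crc_errors or rx_frame_errors would point at the hardware or cabling rather than at NFS.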