Possible cause of NFS server hang with high CPU usage

Hello all.
A few (years?) back I started having problems with NFS on my LAN. A few diskless clients PXE boot over NFS with the usual setup. After a while the server's nfsd threads start consuming a lot of CPU (as seen in top) and the clients hang, typically late in the client boot process after a lot of I/O has been going on. My workaround has been to restart the NFS server (rcnfsserver restart).

I have searched the net periodically since then, and two days ago I came across this:

https://patchwork.kernel.org/patch/1351331/ (svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping)

I patched my openSUSE 12.1 (server) kernel and the problems have apparently disappeared. I can see that at least one SUSE Enterprise kernel has had this patch applied, but I could not find it for my release in the descriptions available in YaST.

It would be helpful if someone more skilled than me could identify exactly which openSUSE releases, if any, are still affected by this bug.

Disclaimer in case I have made an error: I normally run a locally patched kernel on my server, so I might have missed the bugfix if it is already available from SUSE.
Disclaimer 2: This is my first post here, so sorry if I have not followed all the etiquette.

Sincerely
Bjørn Ove Isaksen

On 04/11/2013 01:26 PM, brennelv wrote:
> [...]

That patch was merged into the mainline kernel on Aug 17, 2012, with the
annotation that it be backported to all of the stable kernels. Kernel 3.1 is not
one of the kernels being maintained as stable; however, 3.4 and 3.7 should both
have the fix. As a result, all supported versions of openSUSE should have had
this problem fixed, but inspecting the source is the only way to be certain.
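Short of reading the source, a quick first check is to search the installed kernel package's changelog for the patch subject. Below is a minimal Python sketch of that idea, not a definitive test. The package name kernel-default is an assumption (your flavor may be kernel-desktop or something else), the helper names are made up for illustration, and it assumes the packagers quoted the upstream subject line in the changelog, which is not guaranteed.

```python
import subprocess

# Subject line of the fix, copied from the patchwork link in this thread.
PATCH_SUBJECT = "svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping"


def normalize(text: str) -> str:
    """Lower-case the text and collapse all whitespace runs to single spaces."""
    return " ".join(text.lower().split())


def changelog_mentions(changelog: str, subject: str = PATCH_SUBJECT) -> bool:
    """Return True if the subject line appears in the changelog text.

    Whitespace is normalized first because changelog entries are often
    re-wrapped across lines.
    """
    return normalize(subject) in normalize(changelog)


def installed_kernel_changelog(package: str = "kernel-default") -> str:
    """Return the RPM changelog of an installed package.

    The default package name is an assumption; adjust it to the kernel
    flavor actually installed on the server.
    """
    result = subprocess.run(
        ["rpm", "-q", "--changelog", package],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the server you would call changelog_mentions(installed_kernel_changelog()). A True result suggests the fix was picked up. A False result proves nothing, since the entry may be worded differently or folded into a larger stable update, so the source is still the final word.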

Unsupported releases do not get security updates, nor do they get kernel fixes.
You can run newer kernels to get around the latter problem, but not the former.
Of course, the security holes are even worse once a fix has been published, as
the bad guys then know which parts of the unfixed systems are vulnerable.

Thank you for the high-level information. However, according to https://en.opensuse.org/Lifetime openSUSE 12.1 is still marked as supported, and as far as I know it is based on the 3.1 kernel. I cannot say for sure whether SUSE has applied the patch to the 3.1 kernel, because I just rebuilt an older 12.1 kernel that I have not moved on from. However, I could not see from the detailed descriptions that the patch was included in the kernel updates offered by YaST, and a few Google searches did not reassure me.

The patch solved my problem, one that I have been trying to chase down for a very long time. If others have a similar problem, it could be worth inspecting the source or changelog to see whether this patch has been applied. Better still would be to bring this thread to a conclusion showing which releases, if any, are still affected.

Bjørn Ove