2011-07-20 19:43:24

by Ray Van Dolson

Subject: NFS server responds to SYN with ACK only

We have a couple of legacy CentOS (RHEL)-based appliances with slightly
dated NFS implementations.

Server (CentOS 4 based):

nfs-utils-1.0.6-70.EL4
Kernel 2.6.9-42.0.10.plus.c4smp

Client (CentOS 5 based):

nfs-utils-1.0.9-42.el5.x86_64
Kernel 2.6.18-164.15.1.el5

The client has a long-lived NFSv3 mount to the server that sometimes
stops responding (blocks). We can lazily unmount it (umount -l), but
subsequent mount requests hang, and tcpdump shows the following:

1. Client GETPORT for the NFS service succeeds.
2. Client GETPORT for MOUNT succeeds.
3. Client MNT call succeeds (the server returns a valid response,
including the file handle).
4. Client sends a SYN packet to the NFS port on the server.
5. Server responds with an ACK *only*.
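The step-5 symptom comes down to which TCP flags the server sets in its reply to the client's SYN. A minimal sketch of reading those flags, assuming you already have the flags byte from the reply's TCP header (for example, copied out of a tcpdump capture); the function name and labels are hypothetical:

```python
# TCP control bits as laid out in the TCP header flags byte (RFC 793).
FIN = 0x01
SYN = 0x02
RST = 0x04
PSH = 0x08
ACK = 0x10


def classify_syn_reply(flags: int) -> str:
    """Label a server's reply to our SYN (hypothetical helper)."""
    if flags & RST:
        return "refused"     # no listener, or the server reset the attempt
    if flags & SYN and flags & ACK:
        return "handshake"   # normal case: server accepts a new connection
    if flags & ACK:
        return "ack-only"    # often means the server thinks a connection already exists
    return "unexpected"
```

With this helper, the healthy case after restarting the NFS daemon is `classify_syn_reply(SYN | ACK)`, while the hung case in step 5 is `classify_syn_reply(ACK)`.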

When we bounce (restart) the NFS daemon on the server, everything starts
working: in step 5 we get a SYN,ACK as expected in response to step 4,
and the mount proceeds normally.

Does this ring a bell for anyone as a long-ago fixed bug? I suspect
updating the kernel and nfs-utils on the server will help, but I would
love to find a bug report or changelog entry that describes this behavior.

Thanks,
Ray


2011-07-21 16:51:51

by Ray Van Dolson

Subject: Re: NFS server responds to SYN with ACK only

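After thinking on this a bit more, I'm wondering whether the server side
still had the old connection "open" (I didn't check with netstat) and
therefore sent back only an ACK, which is how a TCP stack typically
answers a SYN that does not fit the window of a connection it already has.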
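Instead of netstat, the same check can be done by reading /proc/net/tcp on the server. A minimal sketch, assuming a little-endian (x86) Linux host, IPv4 only, and that NFS listens on the conventional port 2049 (the thread only says "NFS port"); the function names are hypothetical:

```python
from __future__ import annotations

import socket
import struct

TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp


def hex_to_addr(field: str) -> tuple[str, int]:
    """Decode an 'AABBCCDD:PPPP' field from /proc/net/tcp."""
    ip_hex, port_hex = field.split(":")
    # The IPv4 address is printed as a host-order 32-bit integer, so on a
    # little-endian machine its bytes appear reversed; pack as "<I" to undo that.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def established_on_port(proc_text: str, local_port: int = 2049) -> list[tuple[str, int]]:
    """Return (remote_ip, remote_port) for ESTABLISHED sockets on local_port."""
    peers = []
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        _, port = hex_to_addr(fields[1])  # fields[1] is local_address
        if port == local_port and fields[3] == TCP_ESTABLISHED:
            peers.append(hex_to_addr(fields[2]))  # fields[2] is rem_address
    return peers
```

On the server you would call `established_on_port(open("/proc/net/tcp").read())` and look for the hung client's address and source port in the result.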

Shouldn't the client respond in this case with an RST, or something
else indicating that we need to start from scratch?
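For reference, RFC 793 does specify that behavior: an endpoint in SYN-SENT that receives an ACK which does not acknowledge its own SYN replies with <SEQ=SEG.ACK><CTL=RST>. Because SEG.ACK is the server's next expected sequence number, that RST should in principle fall inside the server's window and tear down the stale connection. A minimal model of the rule (not the Linux client code; the function name is hypothetical):

```python
MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2^32


def syn_sent_reply(iss: int, snd_nxt: int, seg_ack: int, seg_rst: bool = False):
    """Return the RST a SYN-SENT endpoint sends for an incoming ACK, or None.

    RFC 793: the ACK is acceptable only if ISS < SEG.ACK =< SND.NXT
    (in modular sequence space); otherwise send <SEQ=SEG.ACK><CTL=RST>,
    unless the incoming segment is itself a RST.
    """
    offset = (seg_ack - iss) % MOD
    acceptable = 0 < offset <= (snd_nxt - iss) % MOD
    if acceptable or seg_rst:
        # No RST is sent: either the ACK matches our SYN, or the
        # incoming segment is a RST (a RST is never answered with a RST).
        return None
    return {"seq": seg_ack % MOD, "flags": "RST"}
```

In the hung case, the client has sent only its SYN (so SND.NXT = ISS + 1), and the server's bare ACK acknowledges some unrelated sequence number, so the model returns an RST carrying that number.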

Is there a way on the server side to kill an ESTABLISHED TCP connection
(specifically an NFS connection)? Perhaps by setting a connection
timeout value via /proc...

I'm also wondering whether, on the client side, I could inject an RST
packet toward the server to clean things up.
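If you try injecting an RST, the segment needs the right ports, a sequence number inside the server's receive window (the ack value from the server's bare ACK is the natural choice), and a valid checksum over the TCP pseudo-header. A minimal sketch that only builds the 20-byte TCP header; actually sending it needs a raw socket and root privileges, which are not shown, and all addresses, ports, and names are hypothetical:

```python
import socket
import struct

TCP_RST = 0x04


def checksum16(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_rst(src_ip: str, dst_ip: str, src_port: int, dst_port: int, seq: int) -> bytes:
    """Build a TCP RST header (no IP header, no payload) with a valid checksum."""
    offset_flags = (5 << 12) | TCP_RST  # data offset = 5 words (20 bytes), RST bit only
    header = struct.pack("!HHIIHHHH", src_port, dst_port, seq % 2**32, 0,
                         offset_flags, 0, 0, 0)  # ack, window, checksum, urgent = 0
    pseudo = struct.pack("!4s4sBBH", socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                         0, socket.IPPROTO_TCP, len(header))
    csum = checksum16(pseudo + header)
    # The checksum occupies bytes 16-17 of the TCP header.
    return header[:16] + struct.pack("!H", csum) + header[18:]
```

For example, `build_rst("10.0.0.20", "10.0.0.10", 1023, 2049, 555555)` would model the client (hypothetically 10.0.0.20, source port 1023) resetting its connection to the server's NFS port, using 555555 as the ack value seen in the server's bare ACK.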

Ray