2007-11-07 00:37:59

by Myklebust, Trond

Subject: [PATCH 0/7] Improve the NFS/TCP reconnection code

The following series of patches attempts to fix a problem with the
current Linux RPC client's disconnect/reconnect logic for TCP.

Our current strategy, whenever we need to disconnect from the server and
then reconnect, is to force a reset of the connection (by issuing a
'special' connect(AF_UNSPEC) call that essentially causes the TCP layer
to send an RST to terminate the current connection).
This then allows us to reuse the port immediately without worrying about
TIME-WAIT states.

The problem is that an RST is not supposed to be acked by the server. We
can therefore never be entirely sure that the connection was correctly
terminated on the server side. This in turn may cause a SYN/RST loop
when we try to reconnect, since the server may think that we are trying
to reconnect from a port that is already connected.

The solution is to use a combination of the shutdown() command, and the
connect(AF_UNSPEC).

By using shutdown() to initiate the disconnection, we are able to hang
onto the socket and monitor the shutdown process via the ->state_change()
callback. Better yet, we can continue to receive replies on the socket
until the FIN from the server arrives to tell us that it is done sending.

After the connection is shut down, we can then use the connect(AF_UNSPEC)
trick in order to reset the socket without releasing the port number. Note
that because the socket is already closed, no RST is sent to the server.
A curious side-effect of this is that the TIME-WAIT state gets moved to
a different port number. I'm not sure how to avoid this...
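
In userspace terms, the sequence looks roughly like this (just a sketch,
assuming 'fd' is the connected TCP socket; the patches do the equivalent
in-kernel via the socket callbacks):

    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void disconnect_keeping_port(int fd)
    {
            char buf[4096];
            struct sockaddr sa;

            shutdown(fd, SHUT_WR);                  /* send our FIN */
            while (read(fd, buf, sizeof(buf)) > 0)
                    ;       /* drain replies until the server's FIN */

            memset(&sa, 0, sizeof(sa));
            sa.sa_family = AF_UNSPEC;
            connect(fd, &sa, sizeof(sa));   /* reset, no RST; port kept */
    }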

Cheers
Trond



2007-11-08 16:12:40

by Myklebust, Trond

Subject: Re: [NFS] [PATCH 1/7] SUNRPC: Fix a race in xs_tcp_state_change()


On Thu, 2007-11-08 at 10:40 -0500, Chuck Lever wrote:
> That's fine. But you still clear XPRT_CLOSE_WAIT twice in a row in
> xprt_autoclose, at least for the socket transport method. Either clear
> it in xprt_autoclose, or clear it in the transport close methods. Not
> both. It doesn't break anything, but it is confusing to human readers.

Yup. I'll fix it up.

> I'm still a little confused about which state flags are supposed to be
> managed entirely by the transports, and which by the generic logic. Can
> you perhaps add a block comment in xprt.h where the flags are defined
> that describes how these flags are supposed to be used? Actually, even
> better would be to have a separate state word for the transports,
> defined in their private transport structure. The
> non-connection-oriented transports wouldn't need to manipulate the
> specific flags.

Well... Some of them want to be publicly readable, but only writable by
certain subsystems. The XPRT_BOUND and XPRT_CONNECTED flags are two
examples of this. Then there is fully private stuff like XPRT_BINDING,
XPRT_CONNECTING, and XPRT_CLOSING.

I think the right thing to do is to keep public functions like
xprt_connected(), and xprt_bound() for use by the RPC client state
machine in xprt.h, and then perhaps move the currently public
'manipulator' functions xprt_set_connected(), xprt_clear_connected(),...
into a private header file for use by xprtsock.c, and
xprtrdma/transport.c.
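
Something along these lines in xprt.h might capture that split (a sketch
of the sort of comment Chuck is asking for, not actual patch text):

    /*
     * xprt->state bit ownership (sketch):
     *
     * XPRT_CONNECTED, XPRT_BOUND - readable by everyone via
     *      xprt_connected()/xprt_bound(), but set and cleared only
     *      by the transport implementations (xprtsock.c,
     *      xprtrdma/transport.c).
     *
     * XPRT_CONNECTING, XPRT_BINDING, XPRT_CLOSING - fully private
     *      transport state; generic code must not touch these.
     */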

Trond

2007-11-09 13:37:03

by Talpey, Thomas

Subject: Re: [PATCH 2/7] SUNRPC: Fix TCP rebinding logic

At 07:39 PM 11/6/2007, Trond Myklebust wrote:
>From: Trond Myklebust <[email protected]>
>
>Currently the TCP rebinding logic assumes that if we're not using a
>reserved port, then we don't need to reconnect on the same port if a
>disconnection event occurs. This breaks most RPC duplicate reply cache
>implementations.

The good news is, in many cases the port search ends up landing on the
same port it used before, because it always starts at the same point and
moves in the same direction to find a free one. If only one connection at
a time is breaking, this does tend to work.

Of course, I like your intentional approach better! :-) Just a note that
existing kernels aren't completely broken.

Other comments largely match Chuck's, so I'll stand by for round 2.

Tom.

>
>Also take into account the fact that xprt_min_resvport and
>xprt_max_resvport may change while we're reconnecting, since the user may
>change them at any time via the sysctls. Ensure that we check the port
>boundaries every time we loop in xs_bind4/xs_bind6. Also ensure that if the
>boundaries change, we only scan the ports a maximum of 2 times.
>
>Signed-off-by: Trond Myklebust <[email protected]>
>---
>
> net/sunrpc/xprtsock.c | 59 ++++++++++++++++++++++++++++++++-----------------
> 1 files changed, 38 insertions(+), 21 deletions(-)
>
>diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>index 322e4e2..5a83a40 100644
>--- a/net/sunrpc/xprtsock.c
>+++ b/net/sunrpc/xprtsock.c
>@@ -1272,34 +1272,53 @@ static void xs_set_port(struct rpc_xprt *xprt, unsigned short port)
> }
> }
>
>+static unsigned short xs_get_srcport(struct sock_xprt *transport, struct socket *sock)
>+{
>+ unsigned short port = transport->port;
>+
>+ if (port == 0 && transport->xprt.resvport)
>+ port = xs_get_random_port();
>+ return port;
>+}
>+
>+static unsigned short xs_next_srcport(struct sock_xprt *transport, struct socket *sock, unsigned short port)
>+{
>+ if (transport->port != 0)
>+ transport->port = 0;
>+ if (!transport->xprt.resvport)
>+ return 0;
>+ if (port <= xprt_min_resvport || port > xprt_max_resvport)
>+ return xprt_max_resvport;
>+ return --port;
>+}
>+
> static int xs_bind4(struct sock_xprt *transport, struct socket *sock)
> {
> struct sockaddr_in myaddr = {
> .sin_family = AF_INET,
> };
> struct sockaddr_in *sa;
>- int err;
>- unsigned short port = transport->port;
>+ int err, nloop = 0;
>+ unsigned short port = xs_get_srcport(transport, sock);
>+ unsigned short last;
>
>- if (!transport->xprt.resvport)
>- port = 0;
> sa = (struct sockaddr_in *)&transport->addr;
> myaddr.sin_addr = sa->sin_addr;
> do {
> myaddr.sin_port = htons(port);
> err = kernel_bind(sock, (struct sockaddr *) &myaddr,
> sizeof(myaddr));
>- if (!transport->xprt.resvport)
>+ if (port == 0)
> break;
> if (err == 0) {
> transport->port = port;
> break;
> }
>- if (port <= xprt_min_resvport)
>- port = xprt_max_resvport;
>- else
>- port--;
>- } while (err == -EADDRINUSE && port != transport->port);
>+ last = port;
>+ port = xs_next_srcport(transport, sock, port);
>+ if (port > last)
>+ nloop++;
>+ } while (err == -EADDRINUSE && nloop != 2);
> dprintk("RPC: %s "NIPQUAD_FMT":%u: %s (%d)\n",
> __FUNCTION__, NIPQUAD(myaddr.sin_addr),
> port, err ? "failed" : "ok", err);
>@@ -1312,28 +1331,27 @@ static int xs_bind6(struct sock_xprt *transport, struct socket *sock)
> .sin6_family = AF_INET6,
> };
> struct sockaddr_in6 *sa;
>- int err;
>- unsigned short port = transport->port;
>+ int err, nloop = 0;
>+ unsigned short port = xs_get_srcport(transport, sock);
>+ unsigned short last;
>
>- if (!transport->xprt.resvport)
>- port = 0;
> sa = (struct sockaddr_in6 *)&transport->addr;
> myaddr.sin6_addr = sa->sin6_addr;
> do {
> myaddr.sin6_port = htons(port);
> err = kernel_bind(sock, (struct sockaddr *) &myaddr,
> sizeof(myaddr));
>- if (!transport->xprt.resvport)
>+ if (port == 0)
> break;
> if (err == 0) {
> transport->port = port;
> break;
> }
>- if (port <= xprt_min_resvport)
>- port = xprt_max_resvport;
>- else
>- port--;
>- } while (err == -EADDRINUSE && port != transport->port);
>+ last = port;
>+ port = xs_next_srcport(transport, sock, port);
>+ if (port > last)
>+ nloop++;
>+ } while (err == -EADDRINUSE && nloop != 2);
> dprintk("RPC: xs_bind6 "NIP6_FMT":%u: %s (%d)\n",
> NIP6(myaddr.sin6_addr), port, err ? "failed" : "ok", err);
> return err;
>@@ -1815,7 +1833,6 @@ static struct rpc_xprt *xs_setup_xprt(struct xprt_create *args,
> xprt->addrlen = args->addrlen;
> if (args->srcaddr)
> memcpy(&new->addr, args->srcaddr, args->addrlen);
>- new->port = xs_get_random_port();
>
> return xprt;
> }
>
>

2007-11-09 13:38:24

by Talpey, Thomas

Subject: Re: [PATCH 4/7] SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket

At 07:39 PM 11/6/2007, Trond Myklebust wrote:
>From: Trond Myklebust <[email protected]>
>
>By using shutdown() rather than close() we allow the RPC client to wait
>for the TCP close handshake to complete before we start trying to reconnect
>using the same port.
>We use shutdown(SHUT_WR) only, rather than shutting down both directions;
>however, we wait until the server has closed the connection on its side.
>
>Signed-off-by: Trond Myklebust <[email protected]>
>---
>
> net/sunrpc/xprtsock.c | 53 +++++++++++++++++++++++++++++++++++++++++++------
> 1 files changed, 46 insertions(+), 7 deletions(-)
>
>diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>index 99c0166..d610d28 100644
>--- a/net/sunrpc/xprtsock.c
>+++ b/net/sunrpc/xprtsock.c
>@@ -614,6 +614,34 @@ static int xs_udp_send_request(struct rpc_task *task)
> return status;
> }
>
>+static int xs_shutdown(struct socket *sock, int how)
>+{
>+ /*
>+ * Note: 'how - 1' trick converts
>+ * RCV_SHUTDOWN -> SHUT_RD = 0
>+ * SEND_SHUTDOWN -> SHUT_WR = 1
>+ * RCV_SHUTDOWN|SEND_SHUTDOWN -> SHUT_RDWR = 2
>+ */
>+ return sock->ops->shutdown(sock, how - 1);

These #defines might be better if they had XPRT_ or RPCXPRT_ in front
of them to make it clear they're flags for an RPC API. All by themselves
they kind of look like a standard networking component.

Tom.

>+}
>+
>+/**
>+ * xs_tcp_shutdown - gracefully shut down a TCP socket
>+ * @xprt: transport
>+ *
>+ * Initiates a graceful shutdown of the TCP socket by calling the
>+ * equivalent of shutdown(SHUT_WR);
>+ */
>+static void xs_tcp_shutdown(struct rpc_xprt *xprt)
>+{
>+ struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
>+ struct socket *sock = transport->sock;
>+
>+ if (sock != NULL)
>+ xs_shutdown(sock, SEND_SHUTDOWN);
>+ clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
>+}
>+
> static inline void xs_encode_tcp_record_marker(struct xdr_buf *buf)
> {
> u32 reclen = buf->len - sizeof(rpc_fraghdr);
>@@ -691,7 +719,7 @@ static int xs_tcp_send_request(struct rpc_task *task)
> default:
> dprintk("RPC: sendmsg returned unrecognized error %d\n",
> -status);
>- xprt_disconnect(xprt);
>+ xs_tcp_shutdown(xprt);
> break;
> }
>
>@@ -1627,8 +1655,7 @@ static void xs_tcp_connect_worker4(struct work_struct *work)
> break;
> default:
> /* get rid of existing socket, and retry */
>- xs_close(xprt);
>- break;
>+ xs_tcp_shutdown(xprt);
> }
> }
> out:
>@@ -1687,8 +1714,7 @@ static void xs_tcp_connect_worker6(struct work_struct *work)
> break;
> default:
> /* get rid of existing socket, and retry */
>- xs_close(xprt);
>- break;
>+ xs_tcp_shutdown(xprt);
> }
> }
> out:
>@@ -1735,6 +1761,19 @@ static void xs_connect(struct rpc_task *task)
> }
> }
>
>+static void xs_tcp_connect(struct rpc_task *task)
>+{
>+ struct rpc_xprt *xprt = task->tk_xprt;
>+
>+ /* Initiate graceful shutdown of the socket if not already done */
>+ if (!test_bit(XPRT_CONNECTING, &xprt->state))
>+ xs_tcp_shutdown(xprt);
>+ /* Exit if we need to wait for socket shutdown to complete */
>+ if (test_bit(XPRT_CLOSING, &xprt->state))
>+ return;
>+ xs_connect(task);
>+}
>+
> /**
> * xs_udp_print_stats - display UDP socket-specifc stats
> * @xprt: rpc_xprt struct containing statistics
>@@ -1805,12 +1844,12 @@ static struct rpc_xprt_ops xs_tcp_ops = {
> .release_xprt = xs_tcp_release_xprt,
> .rpcbind = rpcb_getport_async,
> .set_port = xs_set_port,
>- .connect = xs_connect,
>+ .connect = xs_tcp_connect,
> .buf_alloc = rpc_malloc,
> .buf_free = rpc_free,
> .send_request = xs_tcp_send_request,
> .set_retrans_timeout = xprt_set_retrans_timeout_def,
>- .close = xs_close,
>+ .close = xs_tcp_shutdown,
> .destroy = xs_destroy,
> .print_stats = xs_tcp_print_stats,
> };
>
>

2007-11-09 13:51:34

by Myklebust, Trond

Subject: Re: [NFS] [PATCH 4/7] SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket


On Fri, 2007-11-09 at 08:38 -0500, Talpey, Thomas wrote:
> At 07:39 PM 11/6/2007, Trond Myklebust wrote:
> >From: Trond Myklebust <[email protected]>
> >
> >By using shutdown() rather than close() we allow the RPC client to wait
> >for the TCP close handshake to complete before we start trying to reconnect
> >using the same port.
> >We use shutdown(SHUT_WR) only, rather than shutting down both directions;
> >however, we wait until the server has closed the connection on its side.
> >
> >Signed-off-by: Trond Myklebust <[email protected]>
> >---
> >
> > net/sunrpc/xprtsock.c | 53 +++++++++++++++++++++++++++++++++++++++++++------
> > 1 files changed, 46 insertions(+), 7 deletions(-)
> >
> >diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> >index 99c0166..d610d28 100644
> >--- a/net/sunrpc/xprtsock.c
> >+++ b/net/sunrpc/xprtsock.c
> >@@ -614,6 +614,34 @@ static int xs_udp_send_request(struct rpc_task *task)
> > return status;
> > }
> >
> >+static int xs_shutdown(struct socket *sock, int how)
> >+{
> >+ /*
> >+ * Note: 'how - 1' trick converts
> >+ * RCV_SHUTDOWN -> SHUT_RD = 0
> >+ * SEND_SHUTDOWN -> SHUT_WR = 1
> >+ * RCV_SHUTDOWN|SEND_SHUTDOWN -> SHUT_RDWR = 2
> >+ */
> >+ return sock->ops->shutdown(sock, how - 1);
>
> These #defines might be better if they had XPRT_ or RPCXPRT_ in front
> of them to make it clear they're flags for an RPC API. All by themselves
> they kind of look like a standard networking component.

They're not #defines, but just a comment...
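
(For reference, the values that comment is playing with: RCV_SHUTDOWN and
SEND_SHUTDOWN are the kernel-internal sk->sk_shutdown flags from
include/net/sock.h, while the SHUT_* constants belong to the socket API:

    #define RCV_SHUTDOWN    1       /* include/net/sock.h */
    #define SEND_SHUTDOWN   2

    SHUT_RD = 0, SHUT_WR = 1, SHUT_RDWR = 2     /* shutdown(2) 'how' */

hence the 'how - 1' mapping.)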

However yesterday I did submit a patch to clean this up by providing the
proper functionality in the networking layer. See

http://article.gmane.org/gmane.linux.network/77163
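
(The helper proposed there looks roughly like the following sketch; the
exact name and placement are per that posting, not this series:

    enum sock_shutdown_cmd {
            SHUT_RD         = 0,
            SHUT_WR         = 1,
            SHUT_RDWR       = 2,
    };

    int kernel_sock_shutdown(struct socket *sock,
                             enum sock_shutdown_cmd how)
    {
            return sock->ops->shutdown(sock, how);
    }

so that callers can pass the standard SHUT_* values directly.)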

Cheers
Trond

2007-11-09 13:56:35

by Talpey, Thomas

Subject: Re: [NFS] [PATCH 5/7] SUNRPC: xprt_autoclose() should not call xprt_disconnect()

At 07:40 PM 11/6/2007, Trond Myklebust wrote:
>From: Trond Myklebust <[email protected]>
>
>The transport layer should do that itself whenever appropriate.
>
>Note that the RDMA transport already assumes that it needs to call
>xprt_disconnect in xprt_rdma_close().

This was an interesting issue, IIRC. With RDMA, there is no way to
cancel an individual RDMA that might arrive later from the peer;
you have to break the connection to do so. (Well, there are other
ways, but they're expensive and don't make sense when you're about
to close anyway.) But if the transport were to simply slam the connection
closed, any other RPC tasks would potentially initiate recovery when
the disconnect event bubbled up.

So, the best solution turned out to be to close the RPC door and the
transport door separately. Except for the order in which it's happening,
this is basically what you're coding here, so I'm glad to see it
consistent now.


Tom.

>For TCP sockets, we want to call xprt_disconnect() only after the
>connection has been closed by both ends.
>
>Signed-off-by: Trond Myklebust <[email protected]>
>---
>
> net/sunrpc/xprt.c | 1 -
> net/sunrpc/xprtsock.c | 3 +--
> 2 files changed, 1 insertions(+), 3 deletions(-)
>
>diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>index 48c5a8b..fcdc4b8 100644
>--- a/net/sunrpc/xprt.c
>+++ b/net/sunrpc/xprt.c
>@@ -568,7 +568,6 @@ static void xprt_autoclose(struct work_struct *work)
> struct rpc_xprt *xprt =
> container_of(work, struct rpc_xprt, task_cleanup);
>
>- xprt_disconnect(xprt);
> xprt->ops->close(xprt);
> clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> xprt_release_write(xprt, NULL);
>diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>index d610d28..c0aa2b4 100644
>--- a/net/sunrpc/xprtsock.c
>+++ b/net/sunrpc/xprtsock.c
>@@ -786,10 +786,10 @@ static void xs_close(struct rpc_xprt *xprt)
> sock_release(sock);
> clear_close_wait:
> smp_mb__before_clear_bit();
>- clear_bit(XPRT_CONNECTED, &xprt->state);
> clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> clear_bit(XPRT_CLOSING, &xprt->state);
> smp_mb__after_clear_bit();
>+ xprt_disconnect(xprt);
> }
>
> /**
>@@ -805,7 +805,6 @@ static void xs_destroy(struct rpc_xprt *xprt)
>
> cancel_rearming_delayed_work(&transport->connect_worker);
>
>- xprt_disconnect(xprt);
> xs_close(xprt);
> xs_free_peer_addresses(xprt);
> kfree(xprt->slot);
>
>

2007-11-09 14:04:06

by Talpey, Thomas

Subject: Re: [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed

At 07:39 PM 11/6/2007, Trond Myklebust wrote:
>From: Trond Myklebust <[email protected]>
>
>Add an xprt->state bit to enable the TCP ->state_change() method to signal
>whether or not the TCP connection is in the process of closing down.
>This will be used by the reconnection logic in a separate patch.
>
>Signed-off-by: Trond Myklebust <[email protected]>
>---
>
> include/linux/sunrpc/xprt.h | 1 +
> net/sunrpc/xprtsock.c | 20 +++++++++++++++++---
> 2 files changed, 18 insertions(+), 3 deletions(-)
>
>diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
>index 6f524a9..afb9e6a 100644
>--- a/include/linux/sunrpc/xprt.h
>+++ b/include/linux/sunrpc/xprt.h
>@@ -257,6 +257,7 @@ void xprt_force_disconnect(struct rpc_xprt *xprt);
> #define XPRT_CLOSE_WAIT (3)
> #define XPRT_BOUND (4)
> #define XPRT_BINDING (5)
>+#define XPRT_CLOSING (6)
>
> static inline void xprt_set_connected(struct rpc_xprt *xprt)
> {
>diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>index 5a83a40..99c0166 100644
>--- a/net/sunrpc/xprtsock.c
>+++ b/net/sunrpc/xprtsock.c
>@@ -758,7 +758,9 @@ static void xs_close(struct rpc_xprt *xprt)
> sock_release(sock);
> clear_close_wait:
> smp_mb__before_clear_bit();
>+ clear_bit(XPRT_CONNECTED, &xprt->state);
> clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
>+ clear_bit(XPRT_CLOSING, &xprt->state);
> smp_mb__after_clear_bit();
> }
>
>@@ -1114,13 +1116,25 @@ static void xs_tcp_state_change(struct sock *sk)
> }
> spin_unlock_bh(&xprt->transport_lock);
> break;
>- case TCP_SYN_SENT:
>- case TCP_SYN_RECV:

Why are you removing these states from the switch?

>+ case TCP_FIN_WAIT1:
>+ /* The client initiated a shutdown of the socket */

If the client initiated this, why does the bit management need to wait
for TCP to change state? I think what this means is that the client
initiated a FIN, *and* the server has acked it. Both these bits could
have been set when the FIN was sent, basically. Or is there some
subtlety here, in which case the comment should say more.

Tom.


>+ set_bit(XPRT_CLOSING, &xprt->state);
>+ smp_mb__before_clear_bit();
>+ clear_bit(XPRT_CONNECTED, &xprt->state);
>+ smp_mb__after_clear_bit();
> break;
> case TCP_CLOSE_WAIT:
>+ /* The server initiated a shutdown of the socket */
>+ set_bit(XPRT_CLOSING, &xprt->state);
>+ smp_mb__before_clear_bit();
>+ clear_bit(XPRT_CONNECTED, &xprt->state);
>+ smp_mb__after_clear_bit();
> xprt_force_disconnect(xprt);
>- default:
>+ break;
>+ case TCP_CLOSE:
>+ /* Mark transport as closed and wake up all pending tasks */
> xprt_disconnect(xprt);
>+ clear_bit(XPRT_CLOSING, &xprt->state);
> }
> out:
> read_unlock(&sk->sk_callback_lock);
>
>

2007-11-09 14:33:26

by Myklebust, Trond

Subject: Re: [NFS] [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed


On Fri, 2007-11-09 at 09:04 -0500, Talpey, Thomas wrote:
> At 07:39 PM 11/6/2007, Trond Myklebust wrote:
> >From: Trond Myklebust <[email protected]>
> >
> >Add an xprt->state bit to enable the TCP ->state_change() method to signal
> >whether or not the TCP connection is in the process of closing down.
> >This will be used by the reconnection logic in a separate patch.
> >
> >Signed-off-by: Trond Myklebust <[email protected]>
> >---
> >
> > include/linux/sunrpc/xprt.h | 1 +
> > net/sunrpc/xprtsock.c | 20 +++++++++++++++++---
> > 2 files changed, 18 insertions(+), 3 deletions(-)
> >
> >diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> >index 6f524a9..afb9e6a 100644
> >--- a/include/linux/sunrpc/xprt.h
> >+++ b/include/linux/sunrpc/xprt.h
> >@@ -257,6 +257,7 @@ void xprt_force_disconnect(struct rpc_xprt *xprt);
> > #define XPRT_CLOSE_WAIT (3)
> > #define XPRT_BOUND (4)
> > #define XPRT_BINDING (5)
> >+#define XPRT_CLOSING (6)
> >
> > static inline void xprt_set_connected(struct rpc_xprt *xprt)
> > {
> >diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> >index 5a83a40..99c0166 100644
> >--- a/net/sunrpc/xprtsock.c
> >+++ b/net/sunrpc/xprtsock.c
> >@@ -758,7 +758,9 @@ static void xs_close(struct rpc_xprt *xprt)
> > sock_release(sock);
> > clear_close_wait:
> > smp_mb__before_clear_bit();
> >+ clear_bit(XPRT_CONNECTED, &xprt->state);
> > clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> >+ clear_bit(XPRT_CLOSING, &xprt->state);
> > smp_mb__after_clear_bit();
> > }
> >
> >@@ -1114,13 +1116,25 @@ static void xs_tcp_state_change(struct sock *sk)
> > }
> > spin_unlock_bh(&xprt->transport_lock);
> > break;
> >- case TCP_SYN_SENT:
> >- case TCP_SYN_RECV:
>
> Why are you removing these states from the switch?

There is no need for any special handling for these states, so just fall
through.

> >+ case TCP_FIN_WAIT1:
> >+ /* The client initiated a shutdown of the socket */
>
> If the client initiated this, why does the bit management need to wait
> for TCP to change state? I think what this means is that the client
> initiated a FIN, *and* the server has acked it. Both these bits could
> have been set when the FIN was sent, basically. Or is there some
> subtlety here, in which case the comment should say more.

No. TCP_FIN_WAIT1 is the standard name in TCP parlance for the state
where the client has sent a FIN to the server, and is waiting for an ACK
and/or a FIN in reply. See one of Stevens' famous TCP state diagrams.
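
(Roughly, the active-close corner of that diagram:

    ESTABLISHED --send FIN--> FIN_WAIT_1 --recv ACK of FIN--> FIN_WAIT_2
    FIN_WAIT_1  --recv FIN+ACK--> TIME_WAIT
    FIN_WAIT_2  --recv FIN--> TIME_WAIT

with TIME_WAIT eventually expiring to CLOSED.)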

The TCP layer is therefore notifying us of this state change via the
socket's sk->sk_state_change() callback. All we are doing in response is
setting the flags in the struct xprt to indicate that we have initiated
a shutdown of the socket.

> >+ set_bit(XPRT_CLOSING, &xprt->state);
> >+ smp_mb__before_clear_bit();
> >+ clear_bit(XPRT_CONNECTED, &xprt->state);
> >+ smp_mb__after_clear_bit();
> > break;
> > case TCP_CLOSE_WAIT:
> >+ /* The server initiated a shutdown of the socket */
> >+ set_bit(XPRT_CLOSING, &xprt->state);
> >+ smp_mb__before_clear_bit();
> >+ clear_bit(XPRT_CONNECTED, &xprt->state);
> >+ smp_mb__after_clear_bit();
> > xprt_force_disconnect(xprt);
> >- default:
> >+ break;
> >+ case TCP_CLOSE:
> >+ /* Mark transport as closed and wake up all pending tasks */
> > xprt_disconnect(xprt);
> >+ clear_bit(XPRT_CLOSING, &xprt->state);
> > }
> > out:
> > read_unlock(&sk->sk_callback_lock);
> >
> >

2007-11-09 14:35:01

by Talpey, Thomas

Subject: Re: [NFS] [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed

At 09:33 AM 11/9/2007, Trond Myklebust wrote:
>> > break;
>> >- case TCP_SYN_SENT:
>> >- case TCP_SYN_RECV:
>>
>> Why are you removing these states from the switch?
>
>There is no need for any special handling for these states, so just fall
>through.

Ah - the next line was "break;". Ack.

>
>> >+ case TCP_FIN_WAIT1:
>> >+ /* The client initiated a shutdown of the socket */
>>
>> If the client initiated this, why does the bit management need to wait
>> for TCP to change state? I think what this means is that the client
>> initiated a FIN, *and* the server has acked it. Both these bits could
>> have been set when the FIN was sent, basically. Or is there some
>> subtlety here, in which case the comment should say more.
>
>No. TCP_FIN_WAIT1 is the standard name in TCP parlance for the state
>where the client has sent a FIN to the server, and is waiting for an ACK
>and/or a FIN in reply. See one of Stevens' famous TCP state diagrams.
>
>The TCP layer is therefore notifying us of this state change via the
>socket's sk->sk_state_change() callback. All we are doing in response is
>setting the flags in the struct xprt to indicate that we have initiated
>a shutdown of the socket.

Oh - so in that case what you're actually doing is waiting for any data
to be sent, *and* then the FIN to be sent. Of course, the server hasn't
necessarily seen either yet.

I guess I return to my question - why not just set the XPRT bits when
we decided to close? Why wait for TCP to click through to this state?
It doesn't guarantee anything on the server until we see an ACK.

Tom.

>
>> >+ set_bit(XPRT_CLOSING, &xprt->state);
>> >+ smp_mb__before_clear_bit();
>> >+ clear_bit(XPRT_CONNECTED, &xprt->state);
>> >+ smp_mb__after_clear_bit();
>> > break;

2007-11-09 14:48:50

by Myklebust, Trond

Subject: Re: [NFS] [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed


On Fri, 2007-11-09 at 09:35 -0500, Talpey, Thomas wrote:
> I guess I return to my question - why not just set the XPRT bits when
> we decided to close? Why wait for TCP to click through to this state?
> It doesn't guarantee anything on the server until we see an ACK.

We could do that, but IMO it is cleaner to keep all of this
state-dependent code in one place. The state change occurs while we're
inside the call to ->shutdown(), so there is no delay.

Cheers
Trond

2007-11-09 15:25:23

by Talpey, Thomas

Subject: Re: [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed

At 09:48 AM 11/9/2007, Trond Myklebust wrote:
>
>On Fri, 2007-11-09 at 09:35 -0500, Talpey, Thomas wrote:
>> I guess I return to my question - why not just set the XPRT bits when
>> we decided to close? Why wait for TCP to click through to this state?
>> It doesn't guarantee anything on the server until we see an ACK.
>
>We could do that, but IMO it is cleaner to keep all of this
>state-dependent code in one place.

I'm fine with that.

> The state change occurs while we're
>inside the call to ->shutdown(), so there is no delay.

I don't think so, in the case that the network is disconnected and
there is some data pending in the TCP output queue. The FIN won't
be sent until the window advances to allow it, and this could happen
much later. In the meantime, the xprt isn't even marked CLOSING.

Tom.


2007-11-09 15:32:49

by Myklebust, Trond

Subject: Re: [NFS] [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed


On Fri, 2007-11-09 at 10:25 -0500, Talpey, Thomas wrote:
> At 09:48 AM 11/9/2007, Trond Myklebust wrote:
> > The state change occurs while we're
> >inside the call to ->shutdown(), so there is no delay.
>
> I don't think so, in the case that the network is disconnected and
> there is some data pending in the TCP output queue. The FIN won't
> be sent until the window advances to allow it, and this could happen
> much later. In the meantime, the xprt isn't even marked CLOSING.

Oh, I see what you mean. So, correction: the TCP_FIN_WAIT1 state means
that the FIN has been _queued_ by the TCP layer. It may or may not have
hit the wire yet.

Cheers
Trond

2007-11-09 16:54:02

by Talpey, Thomas

Subject: Re: [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed

At 10:32 AM 11/9/2007, Trond Myklebust wrote:
>
>On Fri, 2007-11-09 at 10:25 -0500, Talpey, Thomas wrote:
>> At 09:48 AM 11/9/2007, Trond Myklebust wrote:
>> > The state change occurs while we're
>> >inside the call to ->shutdown(), so there is no delay.
>>
>> I don't think so, in the case that the network is disconnected and
>> there is some data pending in the TCP output queue. The FIN won't
>> be sent until the window advances to allow it, and this could happen
>> much later. In the meantime, the xprt isn't even marked CLOSING.
>
>Oh, I see what you mean. So, correction: the TCP_FIN_WAIT1 state means
>that the FIN has been _queued_ by the TCP layer. It may or may not have
>hit the wire yet.

Correct. And even if it were sent, the FINWAIT1 state does not mean the
peer has transitioned. You need to see the ACK (which often comes in
the same packet as the peer's FIN). All this is a gawd-awful layering
violation, you know. :-)

Tom.


2007-11-09 17:35:09

by Myklebust, Trond

Subject: Re: [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed


On Fri, 2007-11-09 at 11:53 -0500, Talpey, Thomas wrote:
> At 10:32 AM 11/9/2007, Trond Myklebust wrote:
> >
> >On Fri, 2007-11-09 at 10:25 -0500, Talpey, Thomas wrote:
> >> At 09:48 AM 11/9/2007, Trond Myklebust wrote:
> >> > The state change occurs while we're
> >> >inside the call to ->shutdown(), so there is no delay.
> >>
> >> I don't think so, in the case that the network is disconnected and
> >> there is some data pending in the TCP output queue. The FIN won't
> >> be sent until the window advances to allow it, and this could happen
> >> much later. In the meantime, the xprt isn't even marked CLOSING.
> >
> >Oh, I see what you mean. So, correction: the TCP_FIN_WAIT1 state means
> >that the FIN has been _queued_ by the TCP layer. It may or may not have
> >hit the wire yet.
>
> Correct. And even if it were sent, the FINWAIT1 state does not mean the
> peer has transitioned. You need to see the ACK (which often comes in
> the same packet as the peer's FIN).

That is really of no direct concern to the RPC client. The important
thing for us is to clear the XPRT_CONNECTED flag in order to indicate
that the socket will no longer accept further requests, and we want to
set XPRT_CLOSING in order to tell anybody who tries to reconnect that
they need to wait.

> All this is a gawd-awful layering violation, you know. :-)

I disagree. This is exactly the same thing as using poll() to monitor
the state of the socket in userland.

Cheers
Trond


2007-11-09 17:52:30

by Talpey, Thomas

Subject: Re: [NFS] [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed

At 12:37 PM 11/9/2007, Trond Myklebust wrote:
>> Correct. And even if it were sent, the FINWAIT1 state does not mean the
>> peer has transitioned. You need to see the ACK (which often comes in
>> the same packet as the peer's FIN).
>
>That is really of no direct concern to the RPC client. The important
>thing for us is to clear the XPRT_CONNECTED flag in order to indicate
>that the socket will no longer accept further requests, and we want to
>set XPRT_CLOSING in order to tell anybody who tries to reconnect that
>they need to wait.

Well, my concern is that setting the XPRT_CLOSING bit may be delayed due to
network issues. What happens if a second task discovers the client while
it's pending? That's all.

>> All this is a gawd-awful layering violation, you know. :-)
>
>I disagree. This is exactly the same thing as using poll() to monitor
>the state of the socket in userland.

But but but... poll() can't detect FIN_WAIT_1. It can only detect EOF,
i.e. CLOSE. I don't think this code could be implemented in userspace.

Here's my bottom line - if the RPC client is provably more correct with
this detail of the TCP state, then it's fine. I do think, however, that
it's unnecessary; i.e., the code can work without this and simply set the
bits when the shutdown() is initiated.

Tom.

2007-11-09 18:17:57

by Myklebust, Trond

Subject: Re: [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed


On Fri, 2007-11-09 at 12:52 -0500, Talpey, Thomas wrote:
> At 12:37 PM 11/9/2007, Trond Myklebust wrote:
> >> Correct. And even if it were sent, the FINWAIT1 state does not mean the
> >> peer has transitioned. You need to see the ACK (which often comes in
> >> the same packet as the peer's FIN).
> >
> >That is really of no direct concern to the RPC client. The important
> >thing for us is to clear the XPRT_CONNECTED flag in order to indicate
> >that the socket will no longer accept further requests, and we want to
> >set XPRT_CLOSING in order to tell anybody who tries to reconnect that
> >they need to wait.
>
> Well, my concern is that the XPRT_CLOSING bit set may be delayed due to
> network issues. What happens if a second task discovers the client while
> it's pending? That's all.
>
> >> All this is a gawd-awful layering violation, you know. :-)
> >
> >I disagree. This is exactly the same thing as using poll() to monitor
> >the state of the socket in userland.
>
> But but but... poll() can't detect FIN_WAIT_1. It can only detect EOF,
> i.e. CLOSE. I don't think this code could be implemented in userspace.

There is nothing to tell you explicitly that you are entering
TCP_FIN_WAIT1, but once you do, POLLOUT will tell you that the socket is
unavailable for writing.

POLLRDHUP will tell you if the server has sent a FIN (i.e. if you are
entering TCP_CLOSING).

Finally, POLLHUP will tell you that the socket has entered TCP_CLOSED.

So yes, I could implement fully equivalent monitoring of the socket in
userspace.
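
Something like this, say (a minimal sketch; POLLRDHUP requires
_GNU_SOURCE and Linux >= 2.6.17):

    #define _GNU_SOURCE
    #include <poll.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Initiate a half-close, then wait for the server's FIN. */
    static int graceful_close(int fd)
    {
            struct pollfd pfd = { .fd = fd, .events = POLLRDHUP };

            shutdown(fd, SHUT_WR);  /* our FIN: socket enters FIN_WAIT_1 */
            if (poll(&pfd, 1, 5000) > 0 &&
                (pfd.revents & (POLLRDHUP | POLLHUP)))
                    printf("server has closed its side\n");
            return close(fd);       /* connection lingers in TIME_WAIT */
    }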

> Here's my bottom line - if the RPC client is provably more correct with
> this detail of the TCP state, then it's fine. I do think however that it's
> unnecessary, i.e. the code can work without this, and simply set the
> bits when the shutdown() is initiated.

Sure, but why split this code into a million little pieces? Do it in the
one spot where it is dead obvious what is going on (the socket's state
change notifier), and where it is easy to maintain the code.

Trond


2007-11-09 19:55:30

by J. Bruce Fields

Subject: Re: setclientid: string in use on NFS v4 share on Debian Etch & hosts file "solution"

On Wed, Nov 07, 2007 at 03:47:08PM -0800, Matt Weatherford wrote:
>
> Hi,
>
> I'm not sure this is the right place to ask, but I'm wondering if someone can
> illuminate what the following console messages mean on my NFS v4 server....
>
>
> NFSD: setclientid: string in use by client(clientid 471c1b08/000003e9)
> NFSD: setclientid: string in use by client(clientid 471c1b08/00000410)
> nfs4_cb: server 10.11.12.221 not responding, timed out
> nfs4_cb: server 10.11.12.222 not responding, timed out
> NFSD: setclientid: string in use by client(clientid 471c1b08/0000043a)
> NFSD: setclientid: string in use by client(clientid 471c1b08/0000043c)
> NFSD: setclientid: string in use by client(clientid 471c1b08/00000447)
> NFSD: setclientid: string in use by client(clientid 471c1b08/0000044d)
> NFSD: setclientid: string in use by client(clientid 471c1b08/00000453)
> nfs4_cb: server 10.11.12.221 not responding, timed out

The "nfs4_cb: server not responding" mean the server is failing to open
a connection back to the client to use for callbacks, which means it
won't give out delegations. That's not a fatal error at all, it's just
a possible lost opportunity for a performance improvement.

The "string in use" errors could be a sign that clients might fail to
recover locks when the server reboots. But if the clients aren't
complaining, I wouldn't worry about it. There have been changes to
nfs-utils and the kernel to make those clientid collisions less likely,
but I don't recall whether they went in before or after 2.6.18.

The "string in use" message is no longer logged by the server by default
in recent kernels, and the "not responding" comment needs to be removed
sometime too; I wouldn't worry about them.

>
> (and yes, those IPs (10.11.12.221, 222) are good)
>
>
> I am on Debian Etch, on x86 hardware, using multiple NFS v4 clients with
> all the latest package updates.
>
> I have multiple other NFS clients which the server is not complaining about.
> I am sharing NFSv4 shares on both (2) NICs.
>
>
> NFS v4 client:
>
> client42:~# uname -a
> Linux client42 2.6.18-5-686 #1 SMP Wed Oct 3 00:12:50 UTC 2007 i686 GNU/Linux
> client42:~# apt-show-versions | grep nfs
> libnfsidmap2/etch uptodate 0.18-0
> nfs-common/etch uptodate 1:1.0.10-6+etch.1
> client42:~#
>
>
> mount
> client42:~# mount
> ...omittted non v4 shares...
> yadda:/etch-llocal on /net/LLocal4 type nfs4
> (rw,hard,intr,rsize=32768,wsize=32768,addr=192.168.105.123)
> client42:~>
>
>
>
>
> NFS v4 server:
>
> yadda:~# apt-show-versions | grep -i nfs
> nfs-common/etch uptodate 1:1.0.10-6+etch.1
> nfs-kernel-server/etch uptodate 1:1.0.10-6+etch.1
> libnfsidmap2/etch uptodate 0.18-0
> yadda:~#
>
>
> I'm not sure that this is the answer, but somewhere I saw a thread suggesting
> that I may need to name the host like this in my hosts table on my NFSv4 client:
>
> 127.0.0.1 localhost
> 127.0.1.1 client42
>
> If this is the solution (is it?)

I suspect that'd be a step backwards.

--b.

2007-11-07 00:37:54

by Myklebust, Trond

Subject: [PATCH 2/7] SUNRPC: Fix TCP rebinding logic

From: Trond Myklebust <[email protected]>

Currently the TCP rebinding logic assumes that if we're not using a
reserved port, then we don't need to reconnect on the same port if a
disconnection event occurs. This breaks most RPC duplicate reply cache
implementations.

Also take into account the fact that xprt_min_resvport and
xprt_max_resvport may change while we're reconnecting, since the user may
change them at any time via the sysctls. Ensure that we check the port
boundaries every time we loop in xs_bind4/xs_bind6. Also ensure that if the
boundaries change, we only scan the ports a maximum of 2 times.

Signed-off-by: Trond Myklebust <[email protected]>
---

net/sunrpc/xprtsock.c | 59 ++++++++++++++++++++++++++++++++-----------------
1 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 322e4e2..5a83a40 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1272,34 +1272,53 @@ static void xs_set_port(struct rpc_xprt *xprt, unsigned short port)
}
}

+static unsigned short xs_get_srcport(struct sock_xprt *transport, struct socket *sock)
+{
+ unsigned short port = transport->port;
+
+ if (port == 0 && transport->xprt.resvport)
+ port = xs_get_random_port();
+ return port;
+}
+
+static unsigned short xs_next_srcport(struct sock_xprt *transport, struct socket *sock, unsigned short port)
+{
+ if (transport->port != 0)
+ transport->port = 0;
+ if (!transport->xprt.resvport)
+ return 0;
+ if (port <= xprt_min_resvport || port > xprt_max_resvport)
+ return xprt_max_resvport;
+ return --port;
+}
+
static int xs_bind4(struct sock_xprt *transport, struct socket *sock)
{
struct sockaddr_in myaddr = {
.sin_family = AF_INET,
};
struct sockaddr_in *sa;
- int err;
- unsigned short port = transport->port;
+ int err, nloop = 0;
+ unsigned short port = xs_get_srcport(transport, sock);
+ unsigned short last;

- if (!transport->xprt.resvport)
- port = 0;
sa = (struct sockaddr_in *)&transport->addr;
myaddr.sin_addr = sa->sin_addr;
do {
myaddr.sin_port = htons(port);
err = kernel_bind(sock, (struct sockaddr *) &myaddr,
sizeof(myaddr));
- if (!transport->xprt.resvport)
+ if (port == 0)
break;
if (err == 0) {
transport->port = port;
break;
}
- if (port <= xprt_min_resvport)
- port = xprt_max_resvport;
- else
- port--;
- } while (err == -EADDRINUSE && port != transport->port);
+ last = port;
+ port = xs_next_srcport(transport, sock, port);
+ if (port > last)
+ nloop++;
+ } while (err == -EADDRINUSE && nloop != 2);
dprintk("RPC: %s "NIPQUAD_FMT":%u: %s (%d)\n",
__FUNCTION__, NIPQUAD(myaddr.sin_addr),
port, err ? "failed" : "ok", err);
@@ -1312,28 +1331,27 @@ static int xs_bind6(struct sock_xprt *transport, struct socket *sock)
.sin6_family = AF_INET6,
};
struct sockaddr_in6 *sa;
- int err;
- unsigned short port = transport->port;
+ int err, nloop = 0;
+ unsigned short port = xs_get_srcport(transport, sock);
+ unsigned short last;

- if (!transport->xprt.resvport)
- port = 0;
sa = (struct sockaddr_in6 *)&transport->addr;
myaddr.sin6_addr = sa->sin6_addr;
do {
myaddr.sin6_port = htons(port);
err = kernel_bind(sock, (struct sockaddr *) &myaddr,
sizeof(myaddr));
- if (!transport->xprt.resvport)
+ if (port == 0)
break;
if (err == 0) {
transport->port = port;
break;
}
- if (port <= xprt_min_resvport)
- port = xprt_max_resvport;
- else
- port--;
- } while (err == -EADDRINUSE && port != transport->port);
+ last = port;
+ port = xs_next_srcport(transport, sock, port);
+ if (port > last)
+ nloop++;
+ } while (err == -EADDRINUSE && nloop != 2);
dprintk("RPC: xs_bind6 "NIP6_FMT":%u: %s (%d)\n",
NIP6(myaddr.sin6_addr), port, err ? "failed" : "ok", err);
return err;
@@ -1815,7 +1833,6 @@ static struct rpc_xprt *xs_setup_xprt(struct xprt_create *args,
xprt->addrlen = args->addrlen;
if (args->srcaddr)
memcpy(&new->addr, args->srcaddr, args->addrlen);
- new->port = xs_get_random_port();

return xprt;
}



2007-11-07 00:37:59

by Myklebust, Trond

Subject: [PATCH 3/7] SUNRPC: Allow the client to detect if the TCP connection is closed

From: Trond Myklebust <[email protected]>

Add an xprt->state bit to enable the TCP ->state_change() method to signal
whether or not the TCP connection is in the process of closing down.
This will be used by the reconnection logic in a separate patch.

Signed-off-by: Trond Myklebust <[email protected]>
---

include/linux/sunrpc/xprt.h | 1 +
net/sunrpc/xprtsock.c | 20 +++++++++++++++++---
2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 6f524a9..afb9e6a 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -257,6 +257,7 @@ void xprt_force_disconnect(struct rpc_xprt *xprt);
#define XPRT_CLOSE_WAIT (3)
#define XPRT_BOUND (4)
#define XPRT_BINDING (5)
+#define XPRT_CLOSING (6)

static inline void xprt_set_connected(struct rpc_xprt *xprt)
{
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 5a83a40..99c0166 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -758,7 +758,9 @@ static void xs_close(struct rpc_xprt *xprt)
sock_release(sock);
clear_close_wait:
smp_mb__before_clear_bit();
+ clear_bit(XPRT_CONNECTED, &xprt->state);
clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+ clear_bit(XPRT_CLOSING, &xprt->state);
smp_mb__after_clear_bit();
}

@@ -1114,13 +1116,25 @@ static void xs_tcp_state_change(struct sock *sk)
}
spin_unlock_bh(&xprt->transport_lock);
break;
- case TCP_SYN_SENT:
- case TCP_SYN_RECV:
+ case TCP_FIN_WAIT1:
+ /* The client initiated a shutdown of the socket */
+ set_bit(XPRT_CLOSING, &xprt->state);
+ smp_mb__before_clear_bit();
+ clear_bit(XPRT_CONNECTED, &xprt->state);
+ smp_mb__after_clear_bit();
break;
case TCP_CLOSE_WAIT:
+ /* The server initiated a shutdown of the socket */
+ set_bit(XPRT_CLOSING, &xprt->state);
+ smp_mb__before_clear_bit();
+ clear_bit(XPRT_CONNECTED, &xprt->state);
+ smp_mb__after_clear_bit();
xprt_force_disconnect(xprt);
- default:
+ break;
+ case TCP_CLOSE:
+ /* Mark transport as closed and wake up all pending tasks */
xprt_disconnect(xprt);
+ clear_bit(XPRT_CLOSING, &xprt->state);
}
out:
read_unlock(&sk->sk_callback_lock);



2007-11-07 00:38:11

by Myklebust, Trond

Subject: [PATCH 6/7] SUNRPC: Make call_status()/call_decode() call xprt_force_disconnect()

From: Trond Myklebust <[email protected]>

Move the calls to xprt_disconnect() over to xprt_force_disconnect() in
order to enable the transport layer to manage the state of the
XPRT_CONNECTED flag.

Signed-off-by: Trond Myklebust <[email protected]>
---

net/sunrpc/clnt.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 76be83e..046d8f6 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1137,7 +1137,7 @@ call_status(struct rpc_task *task)
case -ETIMEDOUT:
task->tk_action = call_timeout;
if (task->tk_client->cl_discrtry)
- xprt_disconnect(task->tk_xprt);
+ xprt_force_disconnect(task->tk_xprt);
break;
case -ECONNREFUSED:
case -ENOTCONN:
@@ -1260,7 +1260,7 @@ out_retry:
req->rq_received = req->rq_private_buf.len = 0;
task->tk_status = 0;
if (task->tk_client->cl_discrtry)
- xprt_disconnect(task->tk_xprt);
+ xprt_force_disconnect(task->tk_xprt);
}

/*



2007-11-07 00:38:05

by Myklebust, Trond

Subject: [PATCH 5/7] SUNRPC: xprt_autoclose() should not call xprt_disconnect()

From: Trond Myklebust <[email protected]>

The transport layer should do that itself whenever appropriate.

Note that the RDMA transport already assumes that it needs to call
xprt_disconnect in xprt_rdma_close().
For TCP sockets, we want to call xprt_disconnect() only after the
connection has been closed by both ends.

Signed-off-by: Trond Myklebust <[email protected]>
---

net/sunrpc/xprt.c | 1 -
net/sunrpc/xprtsock.c | 3 +--
2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 48c5a8b..fcdc4b8 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -568,7 +568,6 @@ static void xprt_autoclose(struct work_struct *work)
struct rpc_xprt *xprt =
container_of(work, struct rpc_xprt, task_cleanup);

- xprt_disconnect(xprt);
xprt->ops->close(xprt);
clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
xprt_release_write(xprt, NULL);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index d610d28..c0aa2b4 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -786,10 +786,10 @@ static void xs_close(struct rpc_xprt *xprt)
sock_release(sock);
clear_close_wait:
smp_mb__before_clear_bit();
- clear_bit(XPRT_CONNECTED, &xprt->state);
clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
clear_bit(XPRT_CLOSING, &xprt->state);
smp_mb__after_clear_bit();
+ xprt_disconnect(xprt);
}

/**
@@ -805,7 +805,6 @@ static void xs_destroy(struct rpc_xprt *xprt)

cancel_rearming_delayed_work(&transport->connect_worker);

- xprt_disconnect(xprt);
xs_close(xprt);
xs_free_peer_addresses(xprt);
kfree(xprt->slot);



2007-11-07 00:38:15

by Myklebust, Trond

Subject: [PATCH 7/7] SUNRPC: Rename xprt_disconnect()

From: Trond Myklebust <[email protected]>

xprt_disconnect() should really only be called when the transport shutdown
is completed, and it is time to wake up any pending tasks. Rename it to
xprt_disconnect_done() in order to reflect the semantic change.

Signed-off-by: Trond Myklebust <[email protected]>
---

include/linux/sunrpc/xprt.h | 2 +-
net/sunrpc/xprt.c | 6 +++---
net/sunrpc/xprtrdma/transport.c | 4 ++--
net/sunrpc/xprtsock.c | 6 +++---
4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index afb9e6a..2554cd2 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -245,7 +245,7 @@ void xprt_adjust_cwnd(struct rpc_task *task, int result);
struct rpc_rqst * xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid);
void xprt_complete_rqst(struct rpc_task *task, int copied);
void xprt_release_rqst_cong(struct rpc_task *task);
-void xprt_disconnect(struct rpc_xprt *xprt);
+void xprt_disconnect_done(struct rpc_xprt *xprt);
void xprt_force_disconnect(struct rpc_xprt *xprt);

/*
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index fcdc4b8..4d71152 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -574,11 +574,11 @@ static void xprt_autoclose(struct work_struct *work)
}

/**
- * xprt_disconnect - mark a transport as disconnected
+ * xprt_disconnect_done - mark a transport as disconnected
* @xprt: transport to flag for disconnect
*
*/
-void xprt_disconnect(struct rpc_xprt *xprt)
+void xprt_disconnect_done(struct rpc_xprt *xprt)
{
dprintk("RPC: disconnected transport %p\n", xprt);
spin_lock_bh(&xprt->transport_lock);
@@ -586,7 +586,7 @@ void xprt_disconnect(struct rpc_xprt *xprt)
xprt_wake_pending_tasks(xprt, -ENOTCONN);
spin_unlock_bh(&xprt->transport_lock);
}
-EXPORT_SYMBOL_GPL(xprt_disconnect);
+EXPORT_SYMBOL_GPL(xprt_disconnect_done);

/**
* xprt_force_disconnect - force a transport to disconnect
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index dc55cc9..06e8b50 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -449,7 +449,7 @@ xprt_rdma_close(struct rpc_xprt *xprt)
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);

dprintk("RPC: %s: closing\n", __func__);
- xprt_disconnect(xprt);
+ xprt_disconnect_done(xprt);
(void) rpcrdma_ep_disconnect(&r_xprt->rx_ep, &r_xprt->rx_ia);
}

@@ -682,7 +682,7 @@ xprt_rdma_send_request(struct rpc_task *task)
}

if (rpcrdma_ep_post(&r_xprt->rx_ia, &r_xprt->rx_ep, req)) {
- xprt_disconnect(xprt);
+ xprt_disconnect_done(xprt);
return -ENOTCONN; /* implies disconnect */
}

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index c0aa2b4..a7d53d1 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -789,7 +789,7 @@ clear_close_wait:
clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
clear_bit(XPRT_CLOSING, &xprt->state);
smp_mb__after_clear_bit();
- xprt_disconnect(xprt);
+ xprt_disconnect_done(xprt);
}

/**
@@ -911,7 +911,7 @@ static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_rea
/* Sanity check of the record length */
if (unlikely(transport->tcp_reclen < 4)) {
dprintk("RPC: invalid TCP record fragment length\n");
- xprt_disconnect(xprt);
+ xprt_force_disconnect(xprt);
return;
}
dprintk("RPC: reading TCP record fragment of length %d\n",
@@ -1160,7 +1160,7 @@ static void xs_tcp_state_change(struct sock *sk)
break;
case TCP_CLOSE:
/* Mark transport as closed and wake up all pending tasks */
- xprt_disconnect(xprt);
+ xprt_disconnect_done(xprt);
clear_bit(XPRT_CLOSING, &xprt->state);
}
out:



2007-11-07 00:38:19

by Myklebust, Trond

[permalink] [raw]
Subject: [PATCH 4/7] SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket

From: Trond Myklebust <[email protected]>

By using shutdown() rather than close() we allow the RPC client to wait
for the TCP close handshake to complete before we start trying to reconnect
using the same port.
We use shutdown(SHUT_WR) only, instead of shutting down both directions;
we then wait until the server has closed the connection on its side.
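
As an aside, the kernel-side sequence corresponds to the classic userspace
half-close idiom. A minimal sketch, assuming an already-connected socket fd
(the helper name graceful_close() is ours, not part of the patch):

#include <sys/socket.h>
#include <unistd.h>

static int graceful_close(int fd)
{
        char buf[4096];
        ssize_t n;

        if (shutdown(fd, SHUT_WR) < 0)  /* send our FIN, keep receiving */
                return -1;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                ;                       /* drain replies still in flight */
        /* n == 0 means the server's FIN arrived: the handshake is done */
        return close(fd);
}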

Signed-off-by: Trond Myklebust <[email protected]>
---

net/sunrpc/xprtsock.c | 53 +++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 99c0166..d610d28 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -614,6 +614,34 @@ static int xs_udp_send_request(struct rpc_task *task)
return status;
}

+static int xs_shutdown(struct socket *sock, int how)
+{
+ /*
+ * Note: 'how - 1' trick converts
+ * RCV_SHUTDOWN -> SHUT_RD = 0
+ * SEND_SHUTDOWN -> SHUT_WR = 1
+ * RCV_SHUTDOWN|SEND_SHUTDOWN -> SHUT_RDWR = 2
+ */
+ return sock->ops->shutdown(sock, how - 1);
+}
+
+/**
+ * xs_tcp_shutdown - gracefully shut down a TCP socket
+ * @xprt: transport
+ *
+ * Initiates a graceful shutdown of the TCP socket by calling the
+ * equivalent of shutdown(SHUT_WR);
+ */
+static void xs_tcp_shutdown(struct rpc_xprt *xprt)
+{
+ struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
+ struct socket *sock = transport->sock;
+
+ if (sock != NULL)
+ xs_shutdown(sock, SEND_SHUTDOWN);
+ clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+}
+
static inline void xs_encode_tcp_record_marker(struct xdr_buf *buf)
{
u32 reclen = buf->len - sizeof(rpc_fraghdr);
@@ -691,7 +719,7 @@ static int xs_tcp_send_request(struct rpc_task *task)
default:
dprintk("RPC: sendmsg returned unrecognized error %d\n",
-status);
- xprt_disconnect(xprt);
+ xs_tcp_shutdown(xprt);
break;
}

@@ -1627,8 +1655,7 @@ static void xs_tcp_connect_worker4(struct work_struct *work)
break;
default:
/* get rid of existing socket, and retry */
- xs_close(xprt);
- break;
+ xs_tcp_shutdown(xprt);
}
}
out:
@@ -1687,8 +1714,7 @@ static void xs_tcp_connect_worker6(struct work_struct *work)
break;
default:
/* get rid of existing socket, and retry */
- xs_close(xprt);
- break;
+ xs_tcp_shutdown(xprt);
}
}
out:
@@ -1735,6 +1761,19 @@ static void xs_connect(struct rpc_task *task)
}
}

+static void xs_tcp_connect(struct rpc_task *task)
+{
+ struct rpc_xprt *xprt = task->tk_xprt;
+
+ /* Initiate graceful shutdown of the socket if not already done */
+ if (!test_bit(XPRT_CONNECTING, &xprt->state))
+ xs_tcp_shutdown(xprt);
+ /* Exit if we need to wait for socket shutdown to complete */
+ if (test_bit(XPRT_CLOSING, &xprt->state))
+ return;
+ xs_connect(task);
+}
+
/**
* xs_udp_print_stats - display UDP socket-specifc stats
* @xprt: rpc_xprt struct containing statistics
@@ -1805,12 +1844,12 @@ static struct rpc_xprt_ops xs_tcp_ops = {
.release_xprt = xs_tcp_release_xprt,
.rpcbind = rpcb_getport_async,
.set_port = xs_set_port,
- .connect = xs_connect,
+ .connect = xs_tcp_connect,
.buf_alloc = rpc_malloc,
.buf_free = rpc_free,
.send_request = xs_tcp_send_request,
.set_retrans_timeout = xprt_set_retrans_timeout_def,
- .close = xs_close,
+ .close = xs_tcp_shutdown,
.destroy = xs_destroy,
.print_stats = xs_tcp_print_stats,
};



2007-11-07 00:38:29

by Myklebust, Trond

[permalink] [raw]
Subject: [PATCH 1/7] SUNRPC: Fix a race in xs_tcp_state_change()

From: Trond Myklebust <[email protected]>

When scheduling the autoclose RPC call, we want to ensure that we don't
race against the test_bit() call in xprt_clear_locked().
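
To make the window concrete, here is a hedged userspace model of the race
(purely illustrative; the flag names mirror the kernel bits, but this is not
kernel code). Without a common lock, the releasing path can test close_wait
before it is set while the disconnect path still sees the lock held, so the
autoclose work is never queued; serializing both paths on the same lock, as
the patch does with xprt->transport_lock, closes that window:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t transport_lock = PTHREAD_MUTEX_INITIALIZER;
static bool close_wait;            /* models XPRT_CLOSE_WAIT */
static bool locked = true;         /* models XPRT_LOCKED, held by a sender */
static bool autoclose_queued;

/* A sender releasing the lock (models xprt_clear_locked()). */
static void *release_path(void *unused)
{
        pthread_mutex_lock(&transport_lock);
        if (close_wait)
                autoclose_queued = true;  /* hand the lock to autoclose */
        else
                locked = false;
        pthread_mutex_unlock(&transport_lock);
        return NULL;
}

/* The state-change callback (models xprt_force_disconnect()). */
static void *disconnect_path(void *unused)
{
        pthread_mutex_lock(&transport_lock);  /* the fix: same lock as above */
        close_wait = true;
        if (!locked) {
                locked = true;
                autoclose_queued = true;
        }
        pthread_mutex_unlock(&transport_lock);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, release_path, NULL);
        pthread_create(&t2, NULL, disconnect_path, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* With both paths serialized, this always prints "queued". */
        printf("autoclose %s\n", autoclose_queued ? "queued" : "LOST");
        return 0;
}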

Signed-off-by: Trond Myklebust <[email protected]>
---

include/linux/sunrpc/xprt.h | 1 +
net/sunrpc/xprt.c | 20 ++++++++++++++++++++
net/sunrpc/xprtsock.c | 5 +----
3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 30b17b3..6f524a9 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -246,6 +246,7 @@ struct rpc_rqst * xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid);
void xprt_complete_rqst(struct rpc_task *task, int copied);
void xprt_release_rqst_cong(struct rpc_task *task);
void xprt_disconnect(struct rpc_xprt *xprt);
+void xprt_force_disconnect(struct rpc_xprt *xprt);

/*
* Reserved bit positions in xprt->state
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 282a9a2..48c5a8b 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -570,6 +570,7 @@ static void xprt_autoclose(struct work_struct *work)

xprt_disconnect(xprt);
xprt->ops->close(xprt);
+ clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
xprt_release_write(xprt, NULL);
}

@@ -588,6 +589,25 @@ void xprt_disconnect(struct rpc_xprt *xprt)
}
EXPORT_SYMBOL_GPL(xprt_disconnect);

+/**
+ * xprt_force_disconnect - force a transport to disconnect
+ * @xprt: transport to disconnect
+ *
+ */
+void xprt_force_disconnect(struct rpc_xprt *xprt)
+{
+ /* Don't race with the test_bit() in xprt_clear_locked() */
+ spin_lock_bh(&xprt->transport_lock);
+ set_bit(XPRT_CLOSE_WAIT, &xprt->state);
+ /* Try to schedule an autoclose RPC call */
+ if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
+ queue_work(rpciod_workqueue, &xprt->task_cleanup);
+ else if (xprt->snd_task != NULL)
+ rpc_wake_up_task(xprt->snd_task);
+ spin_unlock_bh(&xprt->transport_lock);
+}
+EXPORT_SYMBOL_GPL(xprt_force_disconnect);
+
static void
xprt_init_autodisconnect(unsigned long data)
{
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 02298f5..322e4e2 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1118,10 +1118,7 @@ static void xs_tcp_state_change(struct sock *sk)
case TCP_SYN_RECV:
break;
case TCP_CLOSE_WAIT:
- /* Try to schedule an autoclose RPC calls */
- set_bit(XPRT_CLOSE_WAIT, &xprt->state);
- if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
- queue_work(rpciod_workqueue, &xprt->task_cleanup);
+ xprt_force_disconnect(xprt);
default:
xprt_disconnect(xprt);
}



2007-11-07 22:32:43

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [NFS] [PATCH 1/7] SUNRPC: Fix a race in xs_tcp_state_change()


On Wed, 2007-11-07 at 17:21 -0500, Chuck Lever wrote:
> Trond Myklebust wrote:
> > From: Trond Myklebust <[email protected]>
> >
> > When scheduling the autoclose RPC call, we want to ensure that we don't
> > race against the test_bit() call in xprt_clear_locked().
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
> > ---
> >
> > include/linux/sunrpc/xprt.h | 1 +
> > net/sunrpc/xprt.c | 20 ++++++++++++++++++++
> > net/sunrpc/xprtsock.c | 5 +----
> > 3 files changed, 22 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> > index 30b17b3..6f524a9 100644
> > --- a/include/linux/sunrpc/xprt.h
> > +++ b/include/linux/sunrpc/xprt.h
> > @@ -246,6 +246,7 @@ struct rpc_rqst * xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid);
> > void xprt_complete_rqst(struct rpc_task *task, int copied);
> > void xprt_release_rqst_cong(struct rpc_task *task);
> > void xprt_disconnect(struct rpc_xprt *xprt);
> > +void xprt_force_disconnect(struct rpc_xprt *xprt);
> >
> > /*
> > * Reserved bit positions in xprt->state
> > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > index 282a9a2..48c5a8b 100644
> > --- a/net/sunrpc/xprt.c
> > +++ b/net/sunrpc/xprt.c
> > @@ -570,6 +570,7 @@ static void xprt_autoclose(struct work_struct *work)
> >
> > xprt_disconnect(xprt);
> > xprt->ops->close(xprt);
> > + clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
>
> xs_close already clears the CLOSE_WAIT bit, and so does xs_tcp_shutdown
> (added in a later patch). So this hunk appears to be unnecessary.
>
> But xprt_rdma_close doesn't clear CLOSE_WAIT, it appears. Do we want to
> copy (some of) the logic from the end of xs_close into xprt_rdma_close?

Nah. I'd like to move the XPRT_CLOSE_WAIT stuff into xprt.c in order to
convert it into a generic way of telling the transport layer that we
want it to shut down. It is already in the xprt_clear_locked() logic, so
this is really just a cleanup of the current hack.

> > xprt_release_write(xprt, NULL);
> > }
> >
> > @@ -588,6 +589,25 @@ void xprt_disconnect(struct rpc_xprt *xprt)
> > }
> > EXPORT_SYMBOL_GPL(xprt_disconnect);
> >
> > +/**
> > + * xprt_force_disconnect - force a transport to disconnect
> > + * @xprt: transport to disconnect
> > + *
> > + */
> > +void xprt_force_disconnect(struct rpc_xprt *xprt)
> > +{
> > + /* Don't race with the test_bit() in xprt_clear_locked() */
> > + spin_lock_bh(&xprt->transport_lock);
> > + set_bit(XPRT_CLOSE_WAIT, &xprt->state);
> > + /* Try to schedule an autoclose RPC call */
> > + if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
> > + queue_work(rpciod_workqueue, &xprt->task_cleanup);
>
> > + else if (xprt->snd_task != NULL)
> > + rpc_wake_up_task(xprt->snd_task);
>
> What's this new bit of logic for?

It ensures that if the current holder of the XPRT_LOCKED bit is
sleeping, it gets woken up so that we can disconnect. The only
reason why the current holder might be sleeping would be if it is
waiting for buffer space to transmit (or possibly it might be trying to
reconnect - we might want to add a test for XPRT_CONNECTED too).

> > + spin_unlock_bh(&xprt->transport_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(xprt_force_disconnect);
> > +
> > static void
> > xprt_init_autodisconnect(unsigned long data)
> > {
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 02298f5..322e4e2 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -1118,10 +1118,7 @@ static void xs_tcp_state_change(struct sock *sk)
> > case TCP_SYN_RECV:
> > break;
> > case TCP_CLOSE_WAIT:
> > - /* Try to schedule an autoclose RPC calls */
> > - set_bit(XPRT_CLOSE_WAIT, &xprt->state);
> > - if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
> > - queue_work(rpciod_workqueue, &xprt->task_cleanup);
> > + xprt_force_disconnect(xprt);
> > default:
> > xprt_disconnect(xprt);
> > }

2007-11-07 23:06:02

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 2/7] SUNRPC: Fix TCP rebinding logic


On Wed, 2007-11-07 at 17:48 -0500, Chuck Lever wrote:
> Trond Myklebust wrote:
> > From: Trond Myklebust <[email protected]>
> >
> > Currently the TCP rebinding logic assumes that if we're not using a
> > reserved port, then we don't need to reconnect on the same port if a
> > disconnection event occurs.
>
> As Johnny Carson used to say: "I did not know that."
>
> I had assumed we always reused the port number whether a privileged port
> had been requested or not.

Looking at the code, this appears not to be the case.

> > This breaks most RPC duplicate reply cache
> > implementations.
> >
> > Also take into account the fact that xprt_min_resvport and
> > xprt_max_resvport may change while we're reconnecting, since the user may
> > change them at any time via the sysctls. Ensure that we check the port
> > boundaries every time we loop in xs_bind4/xs_bind6. Also ensure that if the
> > boundaries change, we only scan the ports a maximum of 2 times.
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
> > ---
> >
> > net/sunrpc/xprtsock.c | 59 ++++++++++++++++++++++++++++++++-----------------
> > 1 files changed, 38 insertions(+), 21 deletions(-)
> >
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 322e4e2..5a83a40 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -1272,34 +1272,53 @@ static void xs_set_port(struct rpc_xprt *xprt, unsigned short port)
> > }
> > }
> >
> > +static unsigned short xs_get_srcport(struct sock_xprt *transport, struct socket *sock)
>
> Long line.
>
> > +{
> > + unsigned short port = transport->port;
> > +
> > + if (port == 0 && transport->xprt.resvport)
> > + port = xs_get_random_port();
>
> I don't see a reason not to get rid of xs_get_random_port and move that
> logic in here.

Keeping it makes the code easier to read, so I'd prefer to do so.

> > + return port;
> > +}
> > +
> > +static unsigned short xs_next_srcport(struct sock_xprt *transport, struct socket *sock, unsigned short port)
>
> Long line.
>
> > +{
> > + if (transport->port != 0)
> > + transport->port = 0;
> > + if (!transport->xprt.resvport)
> > + return 0;
> > + if (port <= xprt_min_resvport || port > xprt_max_resvport)
> > + return xprt_max_resvport;
> > + return --port;
> > +}
> > +
> > static int xs_bind4(struct sock_xprt *transport, struct socket *sock)
> > {
> > struct sockaddr_in myaddr = {
> > .sin_family = AF_INET,
> > };
> > struct sockaddr_in *sa;
> > - int err;
> > - unsigned short port = transport->port;
> > + int err, nloop = 0;
> > + unsigned short port = xs_get_srcport(transport, sock);
> > + unsigned short last;
> >
> > - if (!transport->xprt.resvport)
> > - port = 0;
> > sa = (struct sockaddr_in *)&transport->addr;
> > myaddr.sin_addr = sa->sin_addr;
> > do {
> > myaddr.sin_port = htons(port);
> > err = kernel_bind(sock, (struct sockaddr *) &myaddr,
> > sizeof(myaddr));
> > - if (!transport->xprt.resvport)
> > + if (port == 0)
> > break;
> > if (err == 0) {
> > transport->port = port;
> > break;
> > }
> > - if (port <= xprt_min_resvport)
> > - port = xprt_max_resvport;
> > - else
> > - port--;
> > - } while (err == -EADDRINUSE && port != transport->port);
> > + last = port;
> > + port = xs_next_srcport(transport, sock, port);
> > + if (port > last)
> > + nloop++;
>
> It seems like there are cases where a user can adjust the port range and
> it would defeat this check. For example, if the port range is 30 to 40,
> and the user changes it to 10 to 20, we keep looping.

Yes, but once we hit 10, nloop gets bumped. It is guaranteed to
get bumped if we hit 0, or if the administrator increases the minimum
port number past the current one.
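
A small standalone model of the loop bound may help; it replays the scenario
above (range shrunk from 30..40 to 10..20, with every bind failing) and shows
the scan stopping after at most two passes. The helper next_srcport()
paraphrases xs_next_srcport() for the reserved-port case; the port values
are just the example's:

#include <stdio.h>

static unsigned short xprt_min_resvport = 10;   /* new, shrunken range */
static unsigned short xprt_max_resvport = 20;

static unsigned short next_srcport(unsigned short port)
{
        if (port <= xprt_min_resvport || port > xprt_max_resvport)
                return xprt_max_resvport;       /* wrap to the top */
        return --port;
}

int main(void)
{
        unsigned short port = 35;  /* stale port from the old 30..40 range */
        int nloop = 0;

        do {    /* pretend every bind attempt fails with EADDRINUSE */
                unsigned short last = port;

                port = next_srcport(port);
                if (port > last)
                        nloop++;   /* bumped on every wrap: at most 2 passes */
        } while (nloop != 2);
        printf("stopped at port %u after %d wraps\n", port, nloop);
        return 0;
}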

> Doesn't breaking out of the loop break "Hard" NFS requests?
>
> And I understand why you would want to copy the checks into a separate
> function (like, xs_bind6 uses the same checks), but it adds this extra
> little loop check at the end. I usually punt when that happens.

We've always had to deal with breaking out of the loops. If we don't, we
will deadlock the computer.
If this is a hard mount, then the only result should normally be that we
abort the connection, time out, and then try again. This has been the
behaviour for quite a while.
If, however, this is a soft mount, then it may result in an EIO getting
sent to the application.

> > + } while (err == -EADDRINUSE && nloop != 2);
> > dprintk("RPC: %s "NIPQUAD_FMT":%u: %s (%d)\n",
> > __FUNCTION__, NIPQUAD(myaddr.sin_addr),
> > port, err ? "failed" : "ok", err);
> > @@ -1312,28 +1331,27 @@ static int xs_bind6(struct sock_xprt *transport, struct socket *sock)
> > .sin6_family = AF_INET6,
> > };
> > struct sockaddr_in6 *sa;
> > - int err;
> > - unsigned short port = transport->port;
> > + int err, nloop = 0;
> > + unsigned short port = xs_get_srcport(transport, sock);
> > + unsigned short last;
> >
> > - if (!transport->xprt.resvport)
> > - port = 0;
> > sa = (struct sockaddr_in6 *)&transport->addr;
> > myaddr.sin6_addr = sa->sin6_addr;
> > do {
> > myaddr.sin6_port = htons(port);
> > err = kernel_bind(sock, (struct sockaddr *) &myaddr,
> > sizeof(myaddr));
> > - if (!transport->xprt.resvport)
> > + if (port == 0)
> > break;
> > if (err == 0) {
> > transport->port = port;
> > break;
> > }
> > - if (port <= xprt_min_resvport)
> > - port = xprt_max_resvport;
> > - else
> > - port--;
> > - } while (err == -EADDRINUSE && port != transport->port);
> > + last = port;
> > + port = xs_next_srcport(transport, sock, port);
> > + if (port > last)
> > + nloop++;
> > + } while (err == -EADDRINUSE && nloop != 2);
> > dprintk("RPC: xs_bind6 "NIP6_FMT":%u: %s (%d)\n",
> > NIP6(myaddr.sin6_addr), port, err ? "failed" : "ok", err);
> > return err;
> > @@ -1815,7 +1833,6 @@ static struct rpc_xprt *xs_setup_xprt(struct xprt_create *args,
> > xprt->addrlen = args->addrlen;
> > if (args->srcaddr)
> > memcpy(&new->addr, args->srcaddr, args->addrlen);
> > - new->port = xs_get_random_port();
> >
> > return xprt;
> > }
>
> Moving this little wart into xs_bind?() is a nice clean-up.


2007-11-07 23:45:38

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 2/7] SUNRPC: Fix TCP rebinding logic


On Wed, 2007-11-07 at 18:28 -0500, Chuck Lever wrote:
> > We've always had to deal with breaking out of the loops. If we don't, we
> > will deadlock the computer.
>
> In that case it might make sense to allow some scheduling between each
> socket bind attempt, and do only a single pass over the port range
> before breaking out. Then you could forget about whether the range
> changes -- you will use the new range the next time a connection is
> attempted.

No. We shouldn't ever schedule from an rpciod process context.

Trond


2007-11-07 23:47:28

by Matt Weatherford

[permalink] [raw]
Subject: setclientid: string in use on NFS v4 share on Debian Etch & hosts file "solution"


Hi,

I'm not sure this is the right place to ask, but I'm wondering if someone can
illuminate what the following console messages mean on my NFS v4 server...


NFSD: setclientid: string in use by client(clientid 471c1b08/000003e9)
NFSD: setclientid: string in use by client(clientid 471c1b08/00000410)
nfs4_cb: server 10.11.12.221 not responding, timed out
nfs4_cb: server 10.11.12.222 not responding, timed out
NFSD: setclientid: string in use by client(clientid 471c1b08/0000043a)
NFSD: setclientid: string in use by client(clientid 471c1b08/0000043c)
NFSD: setclientid: string in use by client(clientid 471c1b08/00000447)
NFSD: setclientid: string in use by client(clientid 471c1b08/0000044d)
NFSD: setclientid: string in use by client(clientid 471c1b08/00000453)
nfs4_cb: server 10.11.12.221 not responding, timed out

(and yes, those IPs (10.11.12.221, 222) are good)


I am on Debian Etch, on x86 hardware, using multiple NFS v4 clients with
all the latest package updates.

I have multiple other NFS clients which the server is not complaining about.
I am sharing NFSv4 shares on both (2) NICs.


NFS v4 client:

client42:~# uname -a
Linux client42 2.6.18-5-686 #1 SMP Wed Oct 3 00:12:50 UTC 2007 i686 GNU/Linux
client42:~# apt-show-versions | grep nfs
libnfsidmap2/etch uptodate 0.18-0
nfs-common/etch uptodate 1:1.0.10-6+etch.1
client42:~#


mount
client42:~# mount
...omitted non-v4 shares...
yadda:/etch-llocal on /net/LLocal4 type nfs4
(rw,hard,intr,rsize=32768,wsize=32768,addr=192.168.105.123)
client42:~>




NFS v4 server:

yadda:~# apt-show-versions | grep -i nfs
nfs-common/etch uptodate 1:1.0.10-6+etch.1
nfs-kernel-server/etch uptodate 1:1.0.10-6+etch.1
libnfsidmap2/etch uptodate 0.18-0
yadda:~#


I'm not sure that this is the answer, but somewhere I saw a thread suggesting
I may need to name the host like this in the hosts table on my NFSv4 client:

127.0.0.1 localhost
127.0.1.1 client42

If this is the solution (is it?), it's a hassle, since I already have a
hostname and an IP address for that hostname defined in a hosts table that I
push out to every cluster node, and I would like to keep them the same. Is
there another way to solve this problem that lets me keep all my hosts files
the same across my environment?

Thanks,

Matt





2007-11-07 23:59:13

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [NFS] [PATCH 4/7] SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket


On Wed, 2007-11-07 at 18:11 -0500, Chuck Lever wrote:
> Trond Myklebust wrote:
> > From: Trond Myklebust <[email protected]>
> >
> > By using shutdown() rather than close() we allow the RPC client to wait
> > for the TCP close handshake to complete before we start trying to reconnect
> > using the same port.
> > We use shutdown(SHUT_WR) only, instead of shutting down both directions;
> > we then wait until the server has closed the connection on its side.
>
> Yeah, that seems like a more friendly and nuanced behavior, and is
> probably one of the more important features of this set of changes.
>
> > Signed-off-by: Trond Myklebust <[email protected]>
> > ---
> >
> > net/sunrpc/xprtsock.c | 53 +++++++++++++++++++++++++++++++++++++++++++------
> > 1 files changed, 46 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 99c0166..d610d28 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -614,6 +614,34 @@ static int xs_udp_send_request(struct rpc_task *task)
> > return status;
> > }
> >
> > +static int xs_shutdown(struct socket *sock, int how)
> > +{
> > + /*
> > + * Note: 'how - 1' trick converts
> > + * RCV_SHUTDOWN -> SHUT_RD = 0
> > + * SEND_SHUTDOWN -> SHUT_WR = 1
> > + * RCV_SHUTDOWN|SEND_SHUTDOWN -> SHUT_RDWR = 2
> > + */
> > + return sock->ops->shutdown(sock, how - 1);
> > +}
> > +
> > +/**
> > + * xs_tcp_shutdown - gracefully shut down a TCP socket
> > + * @xprt: transport
> > + *
> > + * Initiates a graceful shutdown of the TCP socket by calling the
> > + * equivalent of shutdown(SHUT_WR);
> > + */
> > +static void xs_tcp_shutdown(struct rpc_xprt *xprt)
> > +{
> > + struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
> > + struct socket *sock = transport->sock;
> > +
> > + if (sock != NULL)
> > + xs_shutdown(sock, SEND_SHUTDOWN);
>
> I'm not sure why simply
>
> sock->ops->shutdown(sock, SHUT_WR);
>
> isn't adequate here.

trondmy@heimdal:~/devel/git/linux/kernel-2.6.x/linux-2.6$ git grep SHUT_WR
arch/um/os-Linux/file.c:#ifndef SHUT_WR
arch/um/os-Linux/file.c:#define SHUT_WR 1
arch/um/os-Linux/file.c: else if(w) what = SHUT_WR;
net/sctp/socket.c: * SHUT_WR

I suppose I could define SHUT_WR, but that would have to be part of a
separate patch, perhaps as part of a definition of a
kernel_sock_shutdown() wrapper. I wanted to get the RPC code right
first, then send in the cleanup via davem.
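
A possible shape for such a wrapper, sketched here under the assumption that
the kernel enum would mirror the userspace SHUT_* values (this is not in the
tree at this point in the thread):

/* sketch: include/linux/net.h */
enum sock_shutdown_cmd {
        SHUT_RD,
        SHUT_WR,
        SHUT_RDWR,
};

/* sketch: net/socket.c */
int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how)
{
        return sock->ops->shutdown(sock, how);
}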

BTW: Both CIFS and NBD appear to be calling sock->ops->shutdown() with a
SEND_SHUTDOWN argument. From what I can read in inet_shutdown() that is
incorrect. (Cc'ing Steve and Paul).

> > + clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> > +}
> > +
> > static inline void xs_encode_tcp_record_marker(struct xdr_buf *buf)
> > {
> > u32 reclen = buf->len - sizeof(rpc_fraghdr);
> > @@ -691,7 +719,7 @@ static int xs_tcp_send_request(struct rpc_task *task)
> > default:
> > dprintk("RPC: sendmsg returned unrecognized error %d\n",
> > -status);
> > - xprt_disconnect(xprt);
> > + xs_tcp_shutdown(xprt);
>
> Hrm. I would have thought this case was handled adequately by
> xprt_release_write?

Nope. We really do want to get rid of xprt_disconnect(). The
XPRT_CONNECTED flag needs to be managed by the transport, and the
transport alone. See Patch 7.

> > break;
> > }
> >
> > @@ -1627,8 +1655,7 @@ static void xs_tcp_connect_worker4(struct work_struct *work)
> > break;
> > default:
> > /* get rid of existing socket, and retry */
> > - xs_close(xprt);
> > - break;
> > + xs_tcp_shutdown(xprt);
> > }
> > }
> > out:
> > @@ -1687,8 +1714,7 @@ static void xs_tcp_connect_worker6(struct work_struct *work)
> > break;
> > default:
> > /* get rid of existing socket, and retry */
> > - xs_close(xprt);
> > - break;
> > + xs_tcp_shutdown(xprt);
> > }
> > }
> > out:
> > @@ -1735,6 +1761,19 @@ static void xs_connect(struct rpc_task *task)
> > }
> > }
> >
> > +static void xs_tcp_connect(struct rpc_task *task)
> > +{
> > + struct rpc_xprt *xprt = task->tk_xprt;
> > +
> > + /* Initiate graceful shutdown of the socket if not already done */
> > + if (!test_bit(XPRT_CONNECTING, &xprt->state))
> > + xs_tcp_shutdown(xprt);
> > + /* Exit if we need to wait for socket shutdown to complete */
> > + if (test_bit(XPRT_CLOSING, &xprt->state))
> > + return;
> > + xs_connect(task);
> > +}
> > +
> > /**
> > * xs_udp_print_stats - display UDP socket-specifc stats
> > * @xprt: rpc_xprt struct containing statistics
>
> Adding xs_tcp_connect will allow a lot more clean up in xs_connect,
> especially simplifying UDP transport connects.

Yup.

> > @@ -1805,12 +1844,12 @@ static struct rpc_xprt_ops xs_tcp_ops = {
> > .release_xprt = xs_tcp_release_xprt,
> > .rpcbind = rpcb_getport_async,
> > .set_port = xs_set_port,
> > - .connect = xs_connect,
> > + .connect = xs_tcp_connect,
> > .buf_alloc = rpc_malloc,
> > .buf_free = rpc_free,
> > .send_request = xs_tcp_send_request,
> > .set_retrans_timeout = xprt_set_retrans_timeout_def,
> > - .close = xs_close,
> > + .close = xs_tcp_shutdown,
> > .destroy = xs_destroy,
> > .print_stats = xs_tcp_print_stats,
> > };

2007-11-07 23:58:49

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 7/7] SUNRPC: Rename xprt_disconnect()


On Wed, 2007-11-07 at 18:16 -0500, Chuck Lever wrote:
> Oops. Looks like you did the rename in xs_tcp_read_fraghdr in this patch.

I agree. That really should be part of patch 6. I'll respin these 2
patches...


> Trond Myklebust wrote:
> > From: Trond Myklebust <[email protected]>
> >
> > xprt_disconnect() should really only be called when the transport shutdown
> > is completed, and it is time to wake up any pending tasks. Rename it to
> > xprt_disconnect_done() in order to reflect the semantic change.
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
> > ---
> >
> > include/linux/sunrpc/xprt.h | 2 +-
> > net/sunrpc/xprt.c | 6 +++---
> > net/sunrpc/xprtrdma/transport.c | 4 ++--
> > net/sunrpc/xprtsock.c | 6 +++---
> > 4 files changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> > index afb9e6a..2554cd2 100644
> > --- a/include/linux/sunrpc/xprt.h
> > +++ b/include/linux/sunrpc/xprt.h
> > @@ -245,7 +245,7 @@ void xprt_adjust_cwnd(struct rpc_task *task, int result);
> > struct rpc_rqst * xprt_lookup_rqst(struct rpc_xprt *xprt, __be32 xid);
> > void xprt_complete_rqst(struct rpc_task *task, int copied);
> > void xprt_release_rqst_cong(struct rpc_task *task);
> > -void xprt_disconnect(struct rpc_xprt *xprt);
> > +void xprt_disconnect_done(struct rpc_xprt *xprt);
> > void xprt_force_disconnect(struct rpc_xprt *xprt);
> >
> > /*
> > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > index fcdc4b8..4d71152 100644
> > --- a/net/sunrpc/xprt.c
> > +++ b/net/sunrpc/xprt.c
> > @@ -574,11 +574,11 @@ static void xprt_autoclose(struct work_struct *work)
> > }
> >
> > /**
> > - * xprt_disconnect - mark a transport as disconnected
> > + * xprt_disconnect_done - mark a transport as disconnected
> > * @xprt: transport to flag for disconnect
> > *
> > */
> > -void xprt_disconnect(struct rpc_xprt *xprt)
> > +void xprt_disconnect_done(struct rpc_xprt *xprt)
> > {
> > dprintk("RPC: disconnected transport %p\n", xprt);
> > spin_lock_bh(&xprt->transport_lock);
> > @@ -586,7 +586,7 @@ void xprt_disconnect(struct rpc_xprt *xprt)
> > xprt_wake_pending_tasks(xprt, -ENOTCONN);
> > spin_unlock_bh(&xprt->transport_lock);
> > }
> > -EXPORT_SYMBOL_GPL(xprt_disconnect);
> > +EXPORT_SYMBOL_GPL(xprt_disconnect_done);
> >
> > /**
> > * xprt_force_disconnect - force a transport to disconnect
> > diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> > index dc55cc9..06e8b50 100644
> > --- a/net/sunrpc/xprtrdma/transport.c
> > +++ b/net/sunrpc/xprtrdma/transport.c
> > @@ -449,7 +449,7 @@ xprt_rdma_close(struct rpc_xprt *xprt)
> > struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
> >
> > dprintk("RPC: %s: closing\n", __func__);
> > - xprt_disconnect(xprt);
> > + xprt_disconnect_done(xprt);
> > (void) rpcrdma_ep_disconnect(&r_xprt->rx_ep, &r_xprt->rx_ia);
> > }
> >
> > @@ -682,7 +682,7 @@ xprt_rdma_send_request(struct rpc_task *task)
> > }
> >
> > if (rpcrdma_ep_post(&r_xprt->rx_ia, &r_xprt->rx_ep, req)) {
> > - xprt_disconnect(xprt);
> > + xprt_disconnect_done(xprt);
> > return -ENOTCONN; /* implies disconnect */
> > }
> >
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index c0aa2b4..a7d53d1 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -789,7 +789,7 @@ clear_close_wait:
> > clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> > clear_bit(XPRT_CLOSING, &xprt->state);
> > smp_mb__after_clear_bit();
> > - xprt_disconnect(xprt);
> > + xprt_disconnect_done(xprt);
> > }
> >
> > /**
> > @@ -911,7 +911,7 @@ static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_rea
> > /* Sanity check of the record length */
> > if (unlikely(transport->tcp_reclen < 4)) {
> > dprintk("RPC: invalid TCP record fragment length\n");
> > - xprt_disconnect(xprt);
> > + xprt_force_disconnect(xprt);
> > return;
> > }
> > dprintk("RPC: reading TCP record fragment of length %d\n",
> > @@ -1160,7 +1160,7 @@ static void xs_tcp_state_change(struct sock *sk)
> > break;
> > case TCP_CLOSE:
> > /* Mark transport as closed and wake up all pending tasks */
> > - xprt_disconnect(xprt);
> > + xprt_disconnect_done(xprt);
> > clear_bit(XPRT_CLOSING, &xprt->state);
> > }
> > out:
> >
> >
