2009-11-27 21:24:00

by Trond Myklebust

Subject: Re: Fw: Deadlock regression in v2.6.31.6

On Fri, 2009-11-27 at 01:14 +0100, Stephen R. van den Berg wrote:
> On Fri, Nov 27, 2009 at 01:07, Stephen R. van den Berg <[email protected]> wrote:
> > RPC: worker connecting xprt cfa94400 to address: addr=1.2.3.151
> > port=2049 proto=tcp
> > RPC: cfa94400 connect status 99 connected 0 sock state 7
>
> errno 99 means EADDRNOTAVAIL. In userspace this normally is solved by
> using the REUSEADDR sockopt. In xprtsock.c we try something like:
>
> /* We're probably in TIME_WAIT. Get rid of existing socket,
> * and retry
> */
> set_bit(XPRT_CONNECTION_CLOSE, &xprt->state);
> xprt_force_disconnect(xprt);
>
> I'd guess that this needs to be fixed, or the REUSEADDR sockopt needs to be set.
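
For reference, here is the userspace remedy mentioned above. This is a minimal sketch that assumes a POSIX system. It sets SO_REUSEADDR on a TCP socket before bind(), so that a local address still tied to a connection in TIME_WAIT can be bound again. The helper names make_reusable_socket() and reuseaddr_enabled() are hypothetical. The sketch shows only the userspace socket option, not what xprtsock.c does.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket with SO_REUSEADDR set and bind
 * it to the given port on the IPv4 loopback address (port 0 lets the
 * kernel pick a free port). Returns the socket fd, or -1 on failure. */
int make_reusable_socket(uint16_t port)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	int one = 1;
	/* The option must be set before bind() to affect address reuse. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0) {
		close(fd);
		return -1;
	}

	struct sockaddr_in addr;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: returns 1 if SO_REUSEADDR is enabled on fd,
 * 0 if it is not, and -1 if getsockopt() fails. */
int reuseaddr_enabled(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);
	if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) < 0)
		return -1;
	return val != 0;
}
```

Usage: int fd = make_reusable_socket(0); binds a reusable socket to a kernel-chosen loopback port. Close the socket with close(fd) when you are done with it.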

Does the following patch fix matters?

Trond

---------------------------------------------------------------------------------------------------------
SUNRPC: Ensure that we honour autoclose before attempting to reconnect

From: Trond Myklebust <[email protected]>

If the XPRT_CLOSE_WAIT flag is set, we need to ensure that we call
xprt->ops->close() while holding xprt_lock_write() before we can
start reconnecting.

Signed-off-by: Trond Myklebust <[email protected]>
---

net/sunrpc/xprt.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)


diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index fd46d42..469de29 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -700,6 +700,10 @@ void xprt_connect(struct rpc_task *task)
 	}
 	if (!xprt_lock_write(xprt, task))
 		return;
+
+	if (test_and_clear_bit(XPRT_CLOSE_WAIT, &xprt->state))
+		xprt->ops->close(xprt);
+
 	if (xprt_connected(xprt))
 		xprt_release_write(xprt, task);
 	else {
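
The patch establishes a specific ordering. First it takes the transport write lock. Next it honours a pending XPRT_CLOSE_WAIT by calling ->close(). Only then does it decide whether to reconnect. That ordering can be modelled in userspace. This is a minimal sketch under stated assumptions, not kernel code. The struct, the bit numbers, and the model_* functions are hypothetical. C11 atomics stand in for the kernel's test_and_set_bit()/test_and_clear_bit(). The write lock is reduced to a single bit, and the asynchronous connect is assumed to succeed immediately.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; not the kernel's XPRT_* values. */
enum { MODEL_CONNECTED = 0, MODEL_CLOSE_WAIT = 1, MODEL_LOCKED = 2 };

struct model_xprt {
	atomic_ulong state;
	int close_calls;   /* how many times the close op ran */
	int connect_calls; /* how many reconnect attempts were started */
};

void model_init(struct model_xprt *x, unsigned long state)
{
	atomic_init(&x->state, state);
	x->close_calls = 0;
	x->connect_calls = 0;
}

/* Userspace stand-ins for the kernel's atomic bit operations. */
static bool model_test_and_set_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;
	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

static bool model_test_and_clear_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;
	return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

static void model_close(struct model_xprt *x)
{
	/* The close must run with the write lock held. */
	assert(atomic_load(&x->state) & (1UL << MODEL_LOCKED));
	x->close_calls++;
	atomic_fetch_and(&x->state, ~(1UL << MODEL_CONNECTED));
}

/* Models the patched flow: take the write lock, honour a pending
 * autoclose, then reconnect only if the transport is not connected. */
void model_connect(struct model_xprt *x)
{
	if (model_test_and_set_bit(MODEL_LOCKED, &x->state))
		return; /* another task holds the write lock */

	if (model_test_and_clear_bit(MODEL_CLOSE_WAIT, &x->state))
		model_close(x);

	if (!(atomic_load(&x->state) & (1UL << MODEL_CONNECTED))) {
		x->connect_calls++;
		/* Assumption: the reconnect succeeds immediately. */
		atomic_fetch_or(&x->state, 1UL << MODEL_CONNECTED);
	}

	/* Release the write lock. */
	atomic_fetch_and(&x->state, ~(1UL << MODEL_LOCKED));
}
```

Suppose the transport has both the connected and close-wait bits set. The first model_connect() runs the close exactly once, clears the close-wait bit, and starts a single reconnect. A second call finds the transport connected and does nothing. If the lock bit is already held, the call returns without closing or reconnecting.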




2009-11-28 00:20:02

by Stephen R. van den Berg

Subject: Re: Fw: Deadlock regression in v2.6.31.6

On Fri, Nov 27, 2009 at 22:23, Trond Myklebust
<[email protected]> wrote:
> On Fri, 2009-11-27 at 01:14 +0100, Stephen R. van den Berg wrote:
>> On Fri, Nov 27, 2009 at 01:07, Stephen R. van den Berg <[email protected]> wrote:
> Does the following patch fix matters?

>  	if (!xprt_lock_write(xprt, task))
>  		return;
> +
> +	if (test_and_clear_bit(XPRT_CLOSE_WAIT, &xprt->state))
> +		xprt->ops->close(xprt);
> +
>  	if (xprt_connected(xprt))

Sorry, no go. I got the following trace, but I'm not sure whether it is
relevant, because it is difficult to determine whether the logging
corresponds to the problem I am seeing.

RPC: 14194 xprt_connect_status: retrying
RPC: 14194 xprt_prepare_transmit
RPC: 14194 xprt_transmit(112)
RPC: disconnected transport cfa82400
RPC: 14194 xprt_connect xprt cfa82400 is not connected
RPC: 14194 xprt_connect_status: retrying
RPC: 14194 xprt_prepare_transmit
RPC: 14194 xprt_transmit(112)
RPC: disconnected transport cfa82400
RPC: 14194 xprt_connect xprt cfa82400 is not connected
RPC: 14194 xprt_connect_status: retrying
RPC: 14194 xprt_prepare_transmit
RPC: 14194 xprt_transmit(112)
RPC: disconnected transport cfa82400
RPC: 14194 xprt_connect xprt cfa82400 is not connected
RPC: 14194 xprt_connect_status: retrying
RPC: 14194 xprt_prepare_transmit
RPC: 14194 xprt_transmit(112)
RPC: disconnected transport cfa82400
RPC: 14194 xprt_connect xprt cfa82400 is not connected
RPC: 14194 xprt_connect_status: retrying
RPC: 14194 xprt_prepare_transmit
RPC: 14194 xprt_transmit(112)
-- 
Sincerely,
Stephen R. van den Berg.