We recently had a customer whose filer began closing the client's
connection upon receipt of a PATHCONF operation. The environment had a
mix of RHEL 6.2 and RHEL 6.4 clients. The RHEL 6.2 clients would wait 3
seconds before reconnecting, while the RHEL 6.4 clients would reconnect
immediately, triggering what amounted to a denial of service (DoS)
against the filer.
The difference in behavior was due to the inclusion of commit a519fc7
(SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT) in the
RHEL 6.4 kernel. With this commit in place, when the server initiates a
close, we wind up destroying the transport socket, and on a subsequent
call to xs_connect() we attempt to connect right away. Prior to this
commit, we would arrive in xs_connect() with a non-NULL socket, and as a
result we'd delay for the reestablish_timeout before trying to connect.
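For reference, the gating in question in xs_connect() looks roughly like
this (paraphrased from the upstream code of that era and trimmed for
brevity; not an exact copy):

static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
{
        struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);

        if (transport->sock != NULL && !RPC_IS_SOFTCONN(task)) {
                /* old socket still around: back off, then grow the timeout */
                queue_delayed_work(rpciod_workqueue,
                                   &transport->connect_worker,
                                   xprt->reestablish_timeout);
                xprt->reestablish_timeout <<= 1;
                if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
                        xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
                if (xprt->reestablish_timeout > XS_TCP_MAX_REEST_TO)
                        xprt->reestablish_timeout = XS_TCP_MAX_REEST_TO;
        } else {
                /* no socket left (e.g. destroyed on CLOSE_WAIT):
                 * connect with zero delay */
                queue_delayed_work(rpciod_workqueue,
                                   &transport->connect_worker, 0);
        }
}

With a519fc7 in place, the server-initiated close destroys the socket, so
we always take the second branch.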
It's still unknown what originally caused the filer to behave this way,
but I'm able to reproduce the behavior with a Linux NFS server patched
to close the connection upon receipt of a PATHCONF operation (patch
attached).
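If you just want to watch the client-side reconnect behavior without
patching a server, a trivial userspace listener that accepts and then
closes on the first read exercises the same accept/close pattern at the
transport level -- purely an illustrative stand-in, not the attached
patch:

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative stand-in only: accept, read once, close immediately. */
int main(void)
{
        struct sockaddr_in sin;
        char buf[256];
        int s = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;

        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(2049);             /* nfsd's port */
        bind(s, (struct sockaddr *)&sin, sizeof(sin));
        listen(s, 16);
        for (;;) {
                int c = accept(s, NULL, NULL);
                if (c < 0)
                        continue;
                read(c, buf, sizeof(buf));      /* first request arrives... */
                close(c);                       /* ...and we slam the door */
        }
}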
I've also attached two possible fixes using the old xprt->shutdown field
to indicate that the other end has initiated a shutdown of the
connection. The goal of both patches is to bring back some of the old
backoff logic without undoing the fix brought about by a519fc7. The
first option will delay any time the xprt->shutdown field is set -- the
drawback being that in the case of a simple restart of the NFS server,
the client may delay for 3 seconds before trying to reconnect. The
second option will only delay after 3 successive connect/close sequences
where we've not received a proper reply to an RPC request.
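To make the second option concrete, here's a toy userspace model of the
counter logic (a plain counter for illustration; the actual patch follows
below):

#include <stdio.h>

/* Toy model of the option-2 logic; not the patch itself. */
static unsigned char shutdowns;         /* bumped on each server-initiated close */

static int should_delay(void)
{
        return shutdowns > 2;           /* same test the patch applies */
}

int main(void)
{
        int i;

        for (i = 1; i <= 5; i++) {
                shutdowns++;            /* server closed on us again */
                printf("close %d -> %s reconnect\n", i,
                       should_delay() ? "delayed" : "immediate");
        }
        shutdowns = 0;                  /* a genuine RPC reply resets the count */
        shutdowns++;
        printf("close after reply -> %s reconnect\n",
               should_delay() ? "delayed" : "immediate");
        return 0;
}

The first two closes reconnect immediately; from the third close onward
we back off until a genuine reply resets the count.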
-Scott
On Fri, 28 Feb 2014 17:29:56 -0500
Scott Mayhew <[email protected]> wrote:
> From 2e3902fc0c66bda360a8e40e3e64d82e312a20d4 Mon Sep 17 00:00:00 2001
> From: Scott Mayhew <[email protected]>
> Date: Fri, 28 Feb 2014 15:23:50 -0500
> Subject: [PATCH] sunrpc: reintroduce xprt->shutdown with a new purpose (option
> 2)
>
> If a server is behaving pathologically and accepting our connections
> only to close the socket on the first RPC operation it receives, then
> we should probably delay when trying to reconnect.
>
> This patch reintroduces the xprt->shutdown field (this time as two
> bits). Previously this field was used to indicate that the transport
> was in the process of being shut down, but now it will just be used to
> indicate that a shutdown was initiated by the server.
>
> If the server closes the connection 3 times without us having received
> an RPC reply in the interim, then we'll delay before attempting to
> connect again.
> ---
> include/linux/sunrpc/xprt.h | 3 ++-
> net/sunrpc/clnt.c | 2 ++
> net/sunrpc/xprtsock.c | 13 +++++++++++++
> 3 files changed, 17 insertions(+), 1 deletion(-)
>
This patch seems a little more reasonable than the other one if only
because it shouldn't cause artificial delays when there is some
temporary hiccup that causes the server to shut down the connection.
That said, this seems to be squarely a server-side bug so I'm not sure
we ought to go to any great lengths to work around it.
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index 8097b9d..621d583 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -173,7 +173,8 @@ struct rpc_xprt {
> unsigned int min_reqs; /* min number of slots */
> atomic_t num_reqs; /* total slots */
> unsigned long state; /* transport state */
> - unsigned char resvport : 1; /* use a reserved port */
> + unsigned char shutdown : 2, /* server initiated a shutdown */
> + resvport : 1; /* use a reserved port */
Hrm 2 bits...so you can go from 0..3. What happens when that wraps? I'd
be a little concerned that it might spill over into other bitfields
when that occurs. The K&R book seems to indicate that bitfield
semantics aren't well-defined so I'm not sure if that would happen or
not...
Even if it is safe, since this is only two bits, you'll only hit the
delay on non-zero multiples of '3', right? I don't think that's what
you're aiming for here.
So, it might be best not to use a bit-field there and use something
like a full char instead.
Also (nit) since "shutdown" used to have a completely different
meaning, I'd suggest not reusing that name. It'll be confusing for
people looking at old and new code. Maybe something like
"server_shutdowns" for the counter?
> unsigned int swapper; /* we're swapping over this
> transport */
> unsigned int bind_index; /* bind function index */
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 0edada9..c32de98 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -515,6 +515,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
> if (args->flags & RPC_CLNT_CREATE_NONPRIVPORT)
> xprt->resvport = 0;
>
> + xprt->shutdown = 0;
> +
> clnt = rpc_new_client(args, xprt, NULL);
> if (IS_ERR(clnt))
> return clnt;
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 0addefc..35cdb63 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1295,6 +1295,7 @@ static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
> xprt_complete_rqst(req->rq_task, transport->tcp_copied);
>
> spin_unlock(&xprt->transport_lock);
> + xprt->shutdown = 0;
> return 0;
> }
>
> @@ -1572,6 +1573,7 @@ static void xs_tcp_state_change(struct sock *sk)
> case TCP_CLOSE_WAIT:
> /* The server initiated a shutdown of the socket */
> xprt->connect_cookie++;
> + xprt->shutdown++;
> clear_bit(XPRT_CONNECTED, &xprt->state);
> xs_tcp_force_close(xprt);
> case TCP_CLOSING:
> @@ -2332,6 +2334,17 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
> xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
> if (xprt->reestablish_timeout > XS_TCP_MAX_REEST_TO)
> xprt->reestablish_timeout = XS_TCP_MAX_REEST_TO;
> + } else if (xprt->shutdown > 2) {
> + dprintk("RPC: server previously initiated shutdown of the "
> + "transport socket %d times\n", xprt->shutdown);
> + if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
> + xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
> + dprintk("RPC: xs_connect delayed xprt %p for %lu "
> + "seconds\n",
> + xprt, xprt->reestablish_timeout / HZ);
> + queue_delayed_work(rpciod_workqueue,
> + &transport->connect_worker,
> + xprt->reestablish_timeout);
> } else {
> dprintk("RPC: xs_connect scheduled xprt %p\n", xprt);
> queue_delayed_work(rpciod_workqueue,
--
Jeff Layton <[email protected]>
On Mar 3, 2014, at 17:10, Scott Mayhew <[email protected]> wrote:
> On Mon, 03 Mar 2014, Trond Myklebust wrote:
>
>>
>> On Mar 3, 2014, at 11:13, Jeff Layton <[email protected]> wrote:
>>
>>> On Fri, 28 Feb 2014 17:29:56 -0500
>>> Scott Mayhew <[email protected]> wrote:
>>>
>>>> From 2e3902fc0c66bda360a8e40e3e64d82e312a20d4 Mon Sep 17 00:00:00 2001
>>>> From: Scott Mayhew <[email protected]>
>>>> Date: Fri, 28 Feb 2014 15:23:50 -0500
>>>> Subject: [PATCH] sunrpc: reintroduce xprt->shutdown with a new purpose (option
>>>> 2)
>>>>
>>>> If a server is behaving pathologically and accepting our connections
>>>> only to close the socket on the first RPC operation it receives, then
>>>> we should probably delay when trying to reconnect.
>>>>
>>>> This patch reintroduces the xprt->shutdown field (this time as two
>>>> bits). Previously this field was used to indicate that the transport
>>>> was in the process of being shut down, but now it will just be used to
>>>> indicate that a shutdown was initiated by the server.
>>>>
>>>> If the server closes the connection 3 times without us having received
>>>> an RPC reply in the interim, then we'll delay before attempting to
>>>> connect again.
>>>> ---
>>>> include/linux/sunrpc/xprt.h | 3 ++-
>>>> net/sunrpc/clnt.c | 2 ++
>>>> net/sunrpc/xprtsock.c | 13 +++++++++++++
>>>> 3 files changed, 17 insertions(+), 1 deletion(-)
>>>>
>>>
>>> This patch seems a little more reasonable than the other one if only
>>> because it shouldn't cause artificial delays when there is some
>>> temporary hiccup that causes the server to shut down the connection.
>>>
>>> That said, this seems to be squarely a server-side bug so I'm not sure
>>> we ought to go to any great lengths to work around it.
>>
>> So this is about a broken server that accepts connection requests and then immediately closes them?
>
> That's correct.
>
>> If so, then I agree with Jeff, it really isn't something we need to fix on the client.
>
> Not even for the sake of 'politeness' (for lack of a better word)? This
> was in a grid environment and there were apparently a few thousand clients
> doing this, not just a single client.
>
No. If the problem is a broken server, then we fix the server. No politeness needed :-)
_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
On Mon, 03 Mar 2014, Trond Myklebust wrote:
>
> On Mar 3, 2014, at 11:13, Jeff Layton <[email protected]> wrote:
>
> > On Fri, 28 Feb 2014 17:29:56 -0500
> > Scott Mayhew <[email protected]> wrote:
> >
> >> From 2e3902fc0c66bda360a8e40e3e64d82e312a20d4 Mon Sep 17 00:00:00 2001
> >> From: Scott Mayhew <[email protected]>
> >> Date: Fri, 28 Feb 2014 15:23:50 -0500
> >> Subject: [PATCH] sunrpc: reintroduce xprt->shutdown with a new purpose (option
> >> 2)
> >>
> >> If a server is behaving pathologically and accepting our connections
> >> only to close the socket on the first RPC operation it receives, then
> >> we should probably delay when trying to reconnect.
> >>
> >> This patch reintroduces the xprt->shutdown field (this time as two
> >> bits). Previously this field was used to indicate that the transport
> >> was in the process of being shut down, but now it will just be used to
> >> indicate that a shutdown was initiated by the server.
> >>
> >> If the server closes the connection 3 times without us having received
> >> an RPC reply in the interim, then we'll delay before attempting to
> >> connect again.
> >> ---
> >> include/linux/sunrpc/xprt.h | 3 ++-
> >> net/sunrpc/clnt.c | 2 ++
> >> net/sunrpc/xprtsock.c | 13 +++++++++++++
> >> 3 files changed, 17 insertions(+), 1 deletion(-)
> >>
> >
> > This patch seems a little more reasonable than the other one if only
> > because it shouldn't cause artificial delays when there is some
> > temporary hiccup that causes the server to shut down the connection.
> >
> > That said, this seems to be squarely a server-side bug so I'm not sure
> > we ought to go to any great lengths to work around it.
>
> So this is about a broken server that accepts connection requests and then immediately closes them?
That's correct.
> If so, then I agree with Jeff, it really isn't something we need to fix on the client.
Not even for the sake of 'politeness' (for lack of a better word)? This
was in a grid environment and there were apparently a few thousand clients
doing this, not just a single client.
-Scott
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>
On Mar 3, 2014, at 11:13, Jeff Layton <[email protected]> wrote:
> On Fri, 28 Feb 2014 17:29:56 -0500
> Scott Mayhew <[email protected]> wrote:
>
>> From 2e3902fc0c66bda360a8e40e3e64d82e312a20d4 Mon Sep 17 00:00:00 2001
>> From: Scott Mayhew <[email protected]>
>> Date: Fri, 28 Feb 2014 15:23:50 -0500
>> Subject: [PATCH] sunrpc: reintroduce xprt->shutdown with a new purpose (option
>> 2)
>>
>> If a server is behaving pathologically and accepting our connections
>> only to close the socket on the first RPC operation it receives, then
>> we should probably delay when trying to reconnect.
>>
>> This patch reintroduces the xprt->shutdown field (this time as two
>> bits). Previously this field was used to indicate that the transport
>> was in the process of being shut down, but now it will just be used to
>> indicate that a shutdown was initiated by the server.
>>
>> If the server closes the connection 3 times without us having received
>> an RPC reply in the interim, then we'll delay before attempting to
>> connect again.
>> ---
>> include/linux/sunrpc/xprt.h | 3 ++-
>> net/sunrpc/clnt.c | 2 ++
>> net/sunrpc/xprtsock.c | 13 +++++++++++++
>> 3 files changed, 17 insertions(+), 1 deletion(-)
>>
>
> This patch seems a little more reasonable than the other one if only
> because it shouldn't cause artificial delays when there is some
> temporary hiccup that causes the server to shut down the connection.
>
> That said, this seems to be squarely a server-side bug so I'm not sure
> we ought to go to any great lengths to work around it.
So this is about a broken server that accepts connection requests and then immediately closes them? If so, then I agree with Jeff, it really isn't something we need to fix on the client.
_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
On Mon, 03 Mar 2014, Jeff Layton wrote:
> On Fri, 28 Feb 2014 17:29:56 -0500
> Scott Mayhew <[email protected]> wrote:
>
> > From 2e3902fc0c66bda360a8e40e3e64d82e312a20d4 Mon Sep 17 00:00:00 2001
> > From: Scott Mayhew <[email protected]>
> > Date: Fri, 28 Feb 2014 15:23:50 -0500
> > Subject: [PATCH] sunrpc: reintroduce xprt->shutdown with a new purpose (option
> > 2)
> >
> > If a server is behaving pathologically and accepting our connections
> > only to close the socket on the first RPC operation it receives, then
> > we should probably delay when trying to reconnect.
> >
> > This patch reintroduces the xprt->shutdown field (this time as two
> > bits). Previously this field was used to indicate that the transport
> > was in the process of being shut down, but now it will just be used to
> > indicate that a shutdown was initiated by the server.
> >
> > If the server closes the connection 3 times without us having received
> > an RPC reply in the interim, then we'll delay before attempting to
> > connect again.
> > ---
> > include/linux/sunrpc/xprt.h | 3 ++-
> > net/sunrpc/clnt.c | 2 ++
> > net/sunrpc/xprtsock.c | 13 +++++++++++++
> > 3 files changed, 17 insertions(+), 1 deletion(-)
> >
>
> This patch seems a little more reasonable than the other one if only
> because it shouldn't cause artificial delays when there is some
> temporary hiccup that causes the server to shut down the connection.
>
> That said, this seems to be squarely a server-side bug so I'm not sure
> we ought to go to any great lengths to work around it.
>
> > diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> > index 8097b9d..621d583 100644
> > --- a/include/linux/sunrpc/xprt.h
> > +++ b/include/linux/sunrpc/xprt.h
> > @@ -173,7 +173,8 @@ struct rpc_xprt {
> > unsigned int min_reqs; /* min number of slots */
> > atomic_t num_reqs; /* total slots */
> > unsigned long state; /* transport state */
> > - unsigned char resvport : 1; /* use a reserved port */
> > + unsigned char shutdown : 2, /* server initiated a shutdown */
> > + resvport : 1; /* use a reserved port */
>
> Hrm 2 bits...so you can go from 0..3. What happens when that wraps? I'd
> be a little concerned that it might spill over into other bitfields
> when that occurs.
I had tested it to make sure it wrapped around without spilling into the
other fields and it looked okay...
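For example, a toy like this (mimicking only the bitfield layout, not the
real struct) keeps resvport intact across the wrap when built with gcc
here -- though that's exactly the kind of thing that's
implementation-dependent:

#include <stdio.h>

/* Toy layout mimicking the rpc_xprt bitfields; illustrative only. */
struct toy_xprt {
        unsigned char shutdown : 2,     /* the 2-bit counter in question */
                      resvport : 1;     /* the neighbouring flag */
};

int main(void)
{
        struct toy_xprt x = { .shutdown = 0, .resvport = 1 };
        int i;

        for (i = 0; i < 6; i++) {
                x.shutdown++;           /* wraps 3 -> 0 on the fourth bump */
                printf("shutdown=%d resvport=%d\n", x.shutdown, x.resvport);
        }
        return 0;
}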
> The K&R book seems to indicate that bitfield
> semantics aren't well-defined so I'm not sure if that would happen or
> not...
...yeah, the statement "nearly everything about bit fields is
implementation dependent" does make it seem like this isn't the best
approach.
>
> Even if it is safe, since this is only two bits, you'll only hit the
> delay on non-zero multiples of '3', right? I don't think that's what
> you're aiming for here.
I was mainly concerned with introducing a delay for the 'broken' server
case without introducing delay for the 'temporary hiccup' case. I
thought 3 quick retries followed by 1 delayed retry would be a nice
predictable behavior. Also, I'm resetting the counter whenever we've
received an RPC reply, while in the other patch I did it every time the
state of the socket changed to TCP_ESTABLISHED (which to me seems less
predictable).
>
> So, it might be best not to use a bit-field there and use something
> like a full char instead.
I'd be fine with that.
>
> Also (nit) since "shutdown" used to have a completely different
> meaning, I'd suggest not reusing that name. It'll be confusing for
> people looking at old and new code. Maybe something like
> "server_shutdowns" for the counter?
I actually did originally call it srv_shutdown but changed it at the
last minute.
-Scott
>
> > unsigned int swapper; /* we're swapping over this
> > transport */
> > unsigned int bind_index; /* bind function index */
> > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > index 0edada9..c32de98 100644
> > --- a/net/sunrpc/clnt.c
> > +++ b/net/sunrpc/clnt.c
> > @@ -515,6 +515,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
> > if (args->flags & RPC_CLNT_CREATE_NONPRIVPORT)
> > xprt->resvport = 0;
> >
> > + xprt->shutdown = 0;
> > +
> > clnt = rpc_new_client(args, xprt, NULL);
> > if (IS_ERR(clnt))
> > return clnt;
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 0addefc..35cdb63 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -1295,6 +1295,7 @@ static inline int xs_tcp_read_reply(struct rpc_xprt *xprt,
> > xprt_complete_rqst(req->rq_task, transport->tcp_copied);
> >
> > spin_unlock(&xprt->transport_lock);
> > + xprt->shutdown = 0;
> > return 0;
> > }
> >
> > @@ -1572,6 +1573,7 @@ static void xs_tcp_state_change(struct sock *sk)
> > case TCP_CLOSE_WAIT:
> > /* The server initiated a shutdown of the socket */
> > xprt->connect_cookie++;
> > + xprt->shutdown++;
> > clear_bit(XPRT_CONNECTED, &xprt->state);
> > xs_tcp_force_close(xprt);
> > case TCP_CLOSING:
> > @@ -2332,6 +2334,17 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
> > xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
> > if (xprt->reestablish_timeout > XS_TCP_MAX_REEST_TO)
> > xprt->reestablish_timeout = XS_TCP_MAX_REEST_TO;
> > + } else if (xprt->shutdown > 2) {
> > + dprintk("RPC: server previously initiated shutdown of the "
> > + "transport socket %d times\n", xprt->shutdown);
> > + if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
> > + xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
> > + dprintk("RPC: xs_connect delayed xprt %p for %lu "
> > + "seconds\n",
> > + xprt, xprt->reestablish_timeout / HZ);
> > + queue_delayed_work(rpciod_workqueue,
> > + &transport->connect_worker,
> > + xprt->reestablish_timeout);
> > } else {
> > dprintk("RPC: xs_connect scheduled xprt %p\n", xprt);
> > queue_delayed_work(rpciod_workqueue,
>
>
> --
> Jeff Layton <[email protected]>