2010-02-03 06:39:38

by NeilBrown

Subject: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

If we drop a request in the sunrpc layer, either due to kmalloc failure,
or due to a cache miss when we could not queue the request for later
replay, then close the connection to encourage the client to retry sooner.

Note that if the drop happens in the NFS layer, NFSERR_JUKEBOX
(aka NFS4ERR_DELAY) is returned to guide the client concerning
replay.

Signed-off-by: NeilBrown <[email protected]>
---
include/linux/sunrpc/svcauth.h | 10 +++++++---
net/sunrpc/auth_gss/svcauth_gss.c | 12 ++++++------
net/sunrpc/svc.c | 3 +++
net/sunrpc/svcauth_unix.c | 11 ++++++++---
4 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
index d39dbdc..1126693 100644
--- a/include/linux/sunrpc/svcauth.h
+++ b/include/linux/sunrpc/svcauth.h
@@ -108,9 +108,13 @@ struct auth_ops {
#define SVC_NEGATIVE 4
#define SVC_OK 5
#define SVC_DROP 6
-#define SVC_DENIED 7
-#define SVC_PENDING 8
-#define SVC_COMPLETE 9
+#define SVC_CLOSE 7 /* Like SVC_DROP, but request is definitely
+ * lost so if there is a tcp connection, it
+ * should be closed
+ */
+#define SVC_DENIED 8
+#define SVC_PENDING 9
+#define SVC_COMPLETE 10


extern int svc_authenticate(struct svc_rqst *rqstp, __be32 *authp);
diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index e34bc53..4db9562 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -963,7 +963,7 @@ svcauth_gss_set_client(struct svc_rqst *rqstp)
if (rqstp->rq_gssclient == NULL)
return SVC_DENIED;
stat = svcauth_unix_set_client(rqstp);
- if (stat == SVC_DROP)
+ if (stat == SVC_DROP || stat == SVC_CLOSE)
return stat;
return SVC_OK;
}
@@ -1017,7 +1017,7 @@ static int svcauth_gss_handle_init(struct svc_rqst *rqstp,
return SVC_DENIED;
memset(&rsikey, 0, sizeof(rsikey));
if (dup_netobj(&rsikey.in_handle, &gc->gc_ctx))
- return SVC_DROP;
+ return SVC_CLOSE;
*authp = rpc_autherr_badverf;
if (svc_safe_getnetobj(argv, &tmpobj)) {
kfree(rsikey.in_handle.data);
@@ -1025,22 +1025,22 @@ static int svcauth_gss_handle_init(struct svc_rqst *rqstp,
}
if (dup_netobj(&rsikey.in_token, &tmpobj)) {
kfree(rsikey.in_handle.data);
- return SVC_DROP;
+ return SVC_CLOSE;
}

/* Perform upcall, or find upcall result: */
rsip = rsi_lookup(&rsikey);
rsi_free(&rsikey);
if (!rsip)
- return SVC_DROP;
+ return SVC_CLOSE;
switch (cache_check(&rsi_cache, &rsip->h, &rqstp->rq_chandle)) {
case -EAGAIN:
case -ETIMEDOUT:
case -ENOENT:
/* No upcall result: */
- return SVC_DROP;
+ return SVC_CLOSE;
case 0:
- ret = SVC_DROP;
+ ret = SVC_CLOSE;
/* Got an answer to the upcall; use it: */
if (gss_write_init_verf(rqstp, rsip))
goto out;
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 538ca43..e750988 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1050,6 +1050,9 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
goto err_bad;
case SVC_DENIED:
goto err_bad_auth;
+ case SVC_CLOSE:
+ if (test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
+ svc_close_xprt(rqstp->rq_xprt);
case SVC_DROP:
goto dropit;
case SVC_COMPLETE:
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index d8c0411..a25f8ba 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -668,6 +668,8 @@ static struct group_info *unix_gid_find(uid_t uid, struct svc_rqst *rqstp)
switch (ret) {
case -ENOENT:
return ERR_PTR(-ENOENT);
+ case -ETIMEDOUT:
+ return ERR_PTR(-ESHUTDOWN);
case 0:
gi = get_group_info(ug->gi);
cache_put(&ug->h, &unix_gid_cache);
@@ -714,8 +716,9 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
switch (cache_check(&ip_map_cache, &ipm->h, &rqstp->rq_chandle)) {
default:
BUG();
- case -EAGAIN:
case -ETIMEDOUT:
+ return SVC_CLOSE;
+ case -EAGAIN:
return SVC_DROP;
case -ENOENT:
return SVC_DENIED;
@@ -730,6 +733,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
switch (PTR_ERR(gi)) {
case -EAGAIN:
return SVC_DROP;
+ case -ESHUTDOWN:
+ return SVC_CLOSE;
case -ENOENT:
break;
default:
@@ -770,7 +775,7 @@ svcauth_null_accept(struct svc_rqst *rqstp, __be32 *authp)
cred->cr_gid = (gid_t) -1;
cred->cr_group_info = groups_alloc(0);
if (cred->cr_group_info == NULL)
- return SVC_DROP; /* kmalloc failure - client must retry */
+ return SVC_CLOSE; /* kmalloc failure - client must retry */

/* Put NULL verifier */
svc_putnl(resv, RPC_AUTH_NULL);
@@ -834,7 +839,7 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp)
goto badcred;
cred->cr_group_info = groups_alloc(slen);
if (cred->cr_group_info == NULL)
- return SVC_DROP;
+ return SVC_CLOSE;
for (i = 0; i < slen; i++)
GROUP_AT(cred->cr_group_info, i) = svc_getnl(argv);
if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {




2010-02-03 15:43:46

by Chuck Lever III

Subject: Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

On 02/03/2010 01:31 AM, NeilBrown wrote:
> If we drop a request in the sunrpc layer, either due to kmalloc failure,
> or due to a cache miss when we could not queue the request for later
> replay, then close the connection to encourage the client to retry sooner.

I studied connection dropping behavior a few years back, and decided
that dropping the connection on a retransmit is nearly always
counterproductive. Any other pending requests on a connection that is
dropped must also be retransmitted, which means one retransmit suddenly
turns into many. And then you get into issues of idempotency and all
the extra traffic and the long delays and the risk of reconnecting on a
different port so that XID replay is undetectable...

I don't think dropping the connection will cause the client to
retransmit sooner. Clients I have encountered will reconnect and
retransmit only after their retransmit timeout fires, never sooner.

Unfortunately NFSv4 requires a connection drop before a retransmit, but
NFSv3 does not. NFSv4 servers are, rather, supposed to try very hard not
to drop requests.

How often do you expect this kind of recovery to be necessary? Would it
be possible to drop only for NFSv4 connections?

> Note that if the drop happens in the NFS layer, NFSERR_JUKEBOX
> (aka NFS4ERR_DELAY) is returned to guide the client concerning
> replay.
>
> [... remainder of patch quoted in full; identical to the original above, snipped ...]


--
chuck[dot]lever[at]oracle[dot]com

2010-02-03 21:24:05

by NeilBrown

Subject: Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

On Wed, 03 Feb 2010 10:43:04 -0500
Chuck Lever <[email protected]> wrote:

> On 02/03/2010 01:31 AM, NeilBrown wrote:
> > If we drop a request in the sunrpc layer, either due to kmalloc failure,
> > or due to a cache miss when we could not queue the request for later
> > replay, then close the connection to encourage the client to retry sooner.
>
> I studied connection dropping behavior a few years back, and decided
> that dropping the connection on a retransmit is nearly always
> counterproductive. Any other pending requests on a connection that is
> dropped must also be retransmitted, which means one retransmit suddenly
> turns into many. And then you get into issues of idempotency and all
> the extra traffic and the long delays and the risk of reconnecting on a
> different port so that XID replay is undetectable...

You make some good points there, thanks.

>
> I don't think dropping the connection will cause the client to
> retransmit sooner. Clients I have encountered will reconnect and
> retransmit only after their retransmit timeout fires, never sooner.
>

I thought I had noticed the Linux client resending immediately, but it would
have been a while ago, and I could easily be remembering wrongly.

My reasoning was that if the connection is closed then the client can *know*
that they won't get a response to any outstanding requests, rather than
having to use the timeout heuristic. How the client uses that information I
don't know, but at least they would have it.

> Unfortunately NFSv4 requires a connection drop before a retransmit, but
> NFSv3 does not. NFSv4 servers are rather supposed to try very hard not
> to drop requests.
>
> How often do you expect this kind of recovery to be necessary? Would it
> be possible to drop only for NFSv4 connections?
>

With the improved handling of large requests I would expect this kind of
recovery to be needed only very rarely.
Yes, it would be quite easy to only drop connections on which we have seen an
NFSv4 request... and maybe also connections on which we have not successfully
handled any request yet(?).
What if, instead of closing the connection, we set a flag so that it would be
closed as soon as it had been idle for 1 second, thus flushing any other
pending requests? That probably doesn't help: there could easily be real
cases where other threads of activity keep the connection busy, while the
thread waiting for the lost request still needs a full time-out.

I would be happy with the v4-only version.

Thanks,
NeilBrown



2010-02-03 23:14:06

by Trond Myklebust

Subject: Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

On Wed, 2010-02-03 at 17:40 -0500, Chuck Lever wrote:
> On 02/03/2010 05:20 PM, Trond Myklebust wrote:
> > On Thu, 2010-02-04 at 08:23 +1100, Neil Brown wrote:
> >> On Wed, 03 Feb 2010 10:43:04 -0500
> >> Chuck Lever<[email protected]> wrote:
> >>>
> >>> I don't think dropping the connection will cause the client to
> >>> retransmit sooner. Clients I have encountered will reconnect and
> >>> retransmit only after their retransmit timeout fires, never sooner.
> >>>
> >>
> >> I thought I had noticed the Linux client resending immediately, but it would
> >> have been a while ago, and I could easily be remembering wrongly.
> >
> > It depends on who closes the connection.
> >
> > The client assumes that if the _server_ closes the connection, then it
> > may be having resource congestion issues. In order to give the server
> > time to recover, the client will delay reconnecting for 3 seconds (with
> > an exponential back off).
> >
> > If, on the other hand, the client was the one that initiated the
> > connection closure, then it will try to reconnect immediately.
>
> That's only if there are RPC requests immediately ready to send, though,
> right? A request that is waiting for a reply when the connection is
> dropped wouldn't be resent until its retransmit timer expired, I thought.

No, that is incorrect.

We call xprt_wake_pending_tasks() both when the connection is closed,
and when it is re-established: see the details in
xs_tcp_state_change().

It doesn't make sense for the client to defer resending requests when it
knows that the original connection was lost. Deferring would simply mean
that the chances of the server evicting the reply from its DRC will
increase.

> And, this behavior is true only for late-model clients... some of the
> earlier 2.6 clients have some trouble with this scenario, I seem to recall.

Yes. Some of the earlier clients were too aggressive when reconnecting,
which sometimes led to problems when the server was truly congested. I
hope we've fixed that now, and that the distributions that are still
supporting older kernels are working on backporting those fixes.

Cheers
Trond



2010-02-03 22:34:35

by J. Bruce Fields

Subject: Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

On Wed, Feb 03, 2010 at 05:20:10PM -0500, Trond Myklebust wrote:
> On Thu, 2010-02-04 at 08:23 +1100, Neil Brown wrote:
> > On Wed, 03 Feb 2010 10:43:04 -0500
> > Chuck Lever <[email protected]> wrote:
> > >
> > > I don't think dropping the connection will cause the client to
> > > retransmit sooner. Clients I have encountered will reconnect and
> > > retransmit only after their retransmit timeout fires, never sooner.
> > >
> >
> > I thought I had noticed the Linux client resending immediately, but it would
> > have been a while ago, and I could easily be remembering wrongly.
>
> It depends on who closes the connection.
>
> The client assumes that if the _server_ closes the connection, then it
> may be having resource congestion issues. In order to give the server
> time to recover, the client will delay reconnecting for 3 seconds (with
> an exponential back off).
>
> If, on the other hand, the client was the one that initiated the
> connection closure, then it will try to reconnect immediately.

So, if I understand Neil's patches right:

- First we try waiting at least one second for the upcall.
- Then we try to return JUKEBOX/DELAY. (But if we're still
processing the RPC header we may not have the option.)
- Then we give up and drop the request.

Upcalls shouldn't normally take a second; so something's broken or
congested, whether it's us or our kerberos server. So telling the
client we're congested sounds right, as does the client response Trond
describes.

--b.

2010-02-03 22:35:47

by Chuck Lever III

Subject: Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

On 02/03/2010 04:23 PM, Neil Brown wrote:
> On Wed, 03 Feb 2010 10:43:04 -0500
> Chuck Lever<[email protected]> wrote:
>
>> On 02/03/2010 01:31 AM, NeilBrown wrote:
>>> If we drop a request in the sunrpc layer, either due to kmalloc failure,
>>> or due to a cache miss when we could not queue the request for later
>>> replay, then close the connection to encourage the client to retry sooner.
>>
>> I studied connection dropping behavior a few years back, and decided
>> that dropping the connection on a retransmit is nearly always
>> counterproductive. Any other pending requests on a connection that is
>> dropped must also be retransmitted, which means one retransmit suddenly
>> turns into many. And then you get into issues of idempotency and all
>> the extra traffic and the long delays and the risk of reconnecting on a
>> different port so that XID replay is undetectable...
>
> You make some good points there, thanks.
>
>>
>> I don't think dropping the connection will cause the client to
>> retransmit sooner. Clients I have encountered will reconnect and
>> retransmit only after their retransmit timeout fires, never sooner.
>>
>
> I thought I had noticed the Linux client resending immediately, but it would
> have been a while ago, and I could easily be remembering wrongly.
>
> My reasoning was that if the connection is closed then the client can *know*
> that they won't get a response to any outstanding requests, rather than
> having to use the timeout heuristic. How the client uses that information I
> don't know, but at least they would have it.

So, I seem to remember expecting that behavior, and then being
disappointed when it didn't work that way :-)

It's also the case that there is some pathological behavior around
reconnect in some of the older 2.6 NFS clients. Trond would probably
remember the details there.

In any event, I think you would do well to make some direct observations
with several different vintages of Linux NFS clients, just to be sure
this works as you expect it to with reasonable clients.

I noticed that connection drops are especially onerous when a server is
under load, and that's exactly when drops seem to occur most often. It's
one of those corner cases that's nearly impossible to test well.

>> Unfortunately NFSv4 requires a connection drop before a retransmit, but
>> NFSv3 does not. NFSv4 servers are rather supposed to try very hard not
>> to drop requests.
>>
>> How often do you expect this kind of recovery to be necessary? Would it
>> be possible to drop only for NFSv4 connections?
>>
>
> With the improved handling of large requests I would expect this kind of
> recovery would be very rarely needed.
>
> Yes, it would be quite easy to only drop connections on which we have seen an
> NFSv4 request... and maybe also connections on which we have not successfully
> handled any request yet(?).
> What if, instead of closing the connection, we set a flag so that it would be
> closed as soon as it had been idle for 1 second, thus flushing any other
> pending requests??? That probably doesn't help - there would easily be real
> cases where other threads of activity keep the connection busy, while the
> thread waiting for the lost request still needs a full time-out.
>
> I would be happy with the v4-only version.

v4 would almost be required to work this way, I think. I wouldn't
object to a v4-only implementation for now.

I'm sure there are cases where v3 will have to drop the connection...
I'd just like to ensure that it's absolutely a last resort.

--
chuck[dot]lever[at]oracle[dot]com

2010-02-03 22:42:02

by Chuck Lever III

Subject: Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

On 02/03/2010 05:20 PM, Trond Myklebust wrote:
> On Thu, 2010-02-04 at 08:23 +1100, Neil Brown wrote:
>> On Wed, 03 Feb 2010 10:43:04 -0500
>> Chuck Lever<[email protected]> wrote:
>>>
>>> I don't think dropping the connection will cause the client to
>>> retransmit sooner. Clients I have encountered will reconnect and
>>> retransmit only after their retransmit timeout fires, never sooner.
>>>
>>
>> I thought I had noticed the Linux client resending immediately, but it would
>> have been a while ago, and I could easily be remembering wrongly.
>
> It depends on who closes the connection.
>
> The client assumes that if the _server_ closes the connection, then it
> may be having resource congestion issues. In order to give the server
> time to recover, the client will delay reconnecting for 3 seconds (with
> an exponential back off).
>
> If, on the other hand, the client was the one that initiated the
> connection closure, then it will try to reconnect immediately.

That's only if there are RPC requests immediately ready to send, though,
right? A request that is waiting for a reply when the connection is
dropped wouldn't be resent until its retransmit timer expired, I thought.

And, this behavior is true only for late-model clients... some of the
earlier 2.6 clients have some trouble with this scenario, I seem to recall.

--
chuck[dot]lever[at]oracle[dot]com

2010-02-03 22:20:17

by Trond Myklebust

Subject: Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

On Thu, 2010-02-04 at 08:23 +1100, Neil Brown wrote:
> On Wed, 03 Feb 2010 10:43:04 -0500
> Chuck Lever <[email protected]> wrote:
> >
> > I don't think dropping the connection will cause the client to
> > retransmit sooner. Clients I have encountered will reconnect and
> > retransmit only after their retransmit timeout fires, never sooner.
> >
>
> I thought I had noticed the Linux client resending immediately, but it would
> have been a while ago, and I could easily be remembering wrongly.

It depends on who closes the connection.

The client assumes that if the _server_ closes the connection, then it
may be having resource congestion issues. In order to give the server
time to recover, the client will delay reconnecting for 3 seconds (with
an exponential back off).

If, on the other hand, the client was the one that initiated the
connection closure, then it will try to reconnect immediately.

Cheers
Trond


2010-02-04 00:24:30

by Trond Myklebust

Subject: Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

On Wed, 2010-02-03 at 19:06 -0500, Chuck Lever wrote:
> My only concern would be that we perhaps should not assume that every
> 2.6 NFS client does this. There was an awful lot of churn at one point
> as you were getting all of these ducks in a row. Before you changed the
> underlying RPC transports to return only ENOTCONN, for instance, was
> this behavior the same? I wouldn't swear to it in the 2.6.7 - .9
> vintage kernels.

Every change in this area is performance sensitive, and should certainly
be tested carefully; particularly so with older kernels.

However, we shouldn't forget that the changes to the client were
motivated by poor performance against a variety of servers, in a variety
of workloads, and so it isn't unreasonable to expect that SUSE, Red Hat,
Debian, Oracle, etc. do have an interest in fixing their NFS clients.

My question would rather therefore be: do they already have a fix in
their latest kernel revisions, and if not, are they able to give us a
timeframe?

Cheers
Trond


2010-02-04 00:07:52

by Chuck Lever III

Subject: Re: [PATCH 6/9] sunrpc: close connection when a request is irretrievably lost.

On 02/03/2010 06:13 PM, Trond Myklebust wrote:
> On Wed, 2010-02-03 at 17:40 -0500, Chuck Lever wrote:
>> On 02/03/2010 05:20 PM, Trond Myklebust wrote:
>>> On Thu, 2010-02-04 at 08:23 +1100, Neil Brown wrote:
>>>> On Wed, 03 Feb 2010 10:43:04 -0500
>>>> Chuck Lever<[email protected]> wrote:
>>>>>
>>>>> I don't think dropping the connection will cause the client to
>>>>> retransmit sooner. Clients I have encountered will reconnect and
>>>>> retransmit only after their retransmit timeout fires, never sooner.
>>>>>
>>>>
>>>> I thought I had noticed the Linux client resending immediately, but it would
>>>> have been a while ago, and I could easily be remembering wrongly.
>>>
>>> It depends on who closes the connection.
>>>
>>> The client assumes that if the _server_ closes the connection, then it
>>> may be having resource congestion issues. In order to give the server
>>> time to recover, the client will delay reconnecting for 3 seconds (with
>>> an exponential back off).
>> >
>>> If, on the other hand, the client was the one that initiated the
>>> connection closure, then it will try to reconnect immediately.
>>
>> That's only if there are RPC requests immediately ready to send, though,
>> right? A request that is waiting for a reply when the connection is
>> dropped wouldn't be resent until its retransmit timer expired, I thought.
>
> No, that is incorrect.
>
> We call xprt_wake_pending_tasks() both when the connection is closed,
> and when it is re-established: see the details in
> xs_tcp_state_change().
>
> It doesn't make sense for the client to defer resending requests when it
> knows that the original connection was lost. Deferring would simply mean
> that the chances of the server evicting the reply from its DRC will
> increase.

True, thanks for clarifying.

My only concern would be that we perhaps should not assume that every
2.6 NFS client does this. There was an awful lot of churn at one point
as you were getting all of these ducks in a row. Before you changed the
underlying RPC transports to return only ENOTCONN, for instance, was
this behavior the same? I wouldn't swear to it in the 2.6.7 - .9
vintage kernels.

--
chuck[dot]lever[at]oracle[dot]com