2008-10-24 12:29:32

by Trond Myklebust

Subject: Re: Fwd: NFS 5-minute hangs upon S3 resume using 2.6.27 client

On Thu, 2008-10-23 at 23:57 -0700, Michel Lespinasse wrote:
> On Thu, Oct 23, 2008 at 07:17:59PM -0400, Trond Myklebust wrote:
> > > (I'm still concerned about the 3-second delay here...)
> > Does this patch fix that delay?
> >
> > SUNRPC: Fix the setting of xprt->reestablish_timeout when reconnecting
>
> I applied this on top of the previous patch and it worked, but now I'm
> not sure whether you wanted me to test it as an independent patch???

It was meant to be applied incrementally on top of the other.

> I'm wondering how the code in xs_tcp_state_change() that sets
> reestablish_timeout back to XS_TCP_INIT_REEST_TO managed not to cause trouble.
>
>
> Can I propose a patch too? Mine looks quite similar to your second patch,
> but with the reestablish_timeout logic hopefully simplified...
>

This would cause a different regression. The current code is there in
order to ensure that we apply that exponential backoff if and only if
the _server_ closes the TCP connection since that would usually indicate
that it is trying to deal with a resource congestion issue. We don't
need to back off if we were the ones closing the socket.
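A simplified C model of the policy described above may help. Everything in it is hypothetical: struct conn, on_state_change(), next_connect_delay() and the INIT_REEST_TO / MAX_REEST_TO constants are illustrative names, with values picked to match the 3-second delay and 5-minute hang discussed in this thread, not copied from xprtsock.c. It tells the two cases apart with standard TCP (RFC 793) transitions: the server's FIN moves an established socket to CLOSE_WAIT, while a close on our side moves it to FIN_WAIT1.

```c
#include <assert.h>

/* Hypothetical values in seconds, chosen to match the 3-second delay and
 * 5-minute hang mentioned in this thread; not taken from the kernel. */
#define INIT_REEST_TO 3u
#define MAX_REEST_TO  (5u * 60u)

/* Subset of the RFC 793 TCP states involved in connection teardown. */
enum tcp_state {
	ST_ESTABLISHED,
	ST_FIN_WAIT1,  /* we sent FIN first: the client closed the socket */
	ST_CLOSE_WAIT, /* the peer's FIN arrived first: the server closed it */
	ST_CLOSED,
};

/* Hypothetical connection state: only the reconnect delay is modeled. */
struct conn {
	unsigned reestablish_timeout; /* delay before next connect, seconds */
};

/* Called on every TCP state change of the connection. */
static void on_state_change(struct conn *c, enum tcp_state prev,
			    enum tcp_state next)
{
	if (prev != ST_ESTABLISHED)
		return;
	if (next == ST_CLOSE_WAIT) {
		/* Server-initiated close: it may be congested, so back off.
		 * Keep any larger value so repeated closes keep growing. */
		if (c->reestablish_timeout < INIT_REEST_TO)
			c->reestablish_timeout = INIT_REEST_TO;
	} else if (next == ST_FIN_WAIT1) {
		/* We closed the socket ourselves: reconnect immediately. */
		c->reestablish_timeout = 0;
	}
	/* Other transitions (e.g. a reset straight to CLOSED) leave the
	 * timeout unchanged in this sketch. */
}

/* Returns the delay to wait before this reconnect attempt, then doubles
 * the timeout (capped at MAX_REEST_TO) for the attempt after it. */
static unsigned next_connect_delay(struct conn *c)
{
	unsigned delay = c->reestablish_timeout;

	c->reestablish_timeout <<= 1; /* 0 stays 0: no backoff pending */
	if (c->reestablish_timeout > MAX_REEST_TO)
		c->reestablish_timeout = MAX_REEST_TO;
	assert(c->reestablish_timeout <= MAX_REEST_TO);
	return delay;
}
```

With this model, a server close followed by repeated failed reconnects waits 3, 6, 12, ... seconds up to the 300-second cap, while a close on our side reconnects with no delay. How a reset should be treated is left open here, as it is in the thread.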

The issue of UDP exponential backoff is moot: the UDP code doesn't use
xs_connect() at all.

Cheers
Trond



2008-10-24 21:02:17

by Michel Lespinasse

Subject: Re: Fwd: NFS 5-minute hangs upon S3 resume using 2.6.27 client

On Fri, Oct 24, 2008 at 08:29:28AM -0400, Trond Myklebust wrote:
> On Thu, 2008-10-23 at 23:57 -0700, Michel Lespinasse wrote:
> > I applied this on top of the previous patch and it worked, but now I'm
> > not sure whether you wanted me to test it as an independent patch???
> It was meant to be applied incrementally on top of the other.

OK, it worked fine, then. Thanks again!

> > Can I propose a patch too? Mine looks quite similar to your second patch,
> > but with the reestablish_timeout logic hopefully simplified...
>
> This would cause a different regression. The current code is there in
> order to ensure that we apply that exponential backoff if and only if
> the _server_ closes the TCP connection since that would usually indicate
> that it is trying to deal with a resource congestion issue. We don't
> need to back off if we were the ones closing the socket.

So the idea is to back off if the connection is closed by a server FIN, but
retry right away if it's closed by an RST?
I did not realize that, but it makes sense too...

> The issue of UDP exponential backoff is moot: the UDP code doesn't use
> xs_connect() at all.

Hrmmm? I see the following further down in the file:

static struct rpc_xprt_ops xs_udp_ops = {
	...
	.connect	= xs_connect,
	...
};

I'm not sure if/when the callback is actually called in the UDP case,
but it's definitely being set up that way.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.