LinuxLists.cc - [PATCH] NFSv4: Use exponential backoff delay for NFS4

2013-04-24 20:56:12

Subject: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.

Additionally, this alleviates an interoperability problem with the AIX NFSv4
server. The AIX server frequently (2 out of 3 times) returns NFS4ERR_DELAY on a
CLOSE when it happens in close proximity to a RELEASE_LOCKOWNER. This would
cause a Linux client to hang for 15 seconds.

Signed-off-by: Dave Chiluk <[email protected]>
---
fs/nfs/nfs4proc.c | 12 ++++++++++++
include/linux/sunrpc/sched.h | 1 +
2 files changed, 13 insertions(+)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 0ad025e..37dad27 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4006,6 +4006,18 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 #endif /* CONFIG_NFS_V4_1 */
 	case -NFS4ERR_DELAY:
 		nfs_inc_server_stats(server, NFSIOS_DELAY);
+		/* Do an exponential backoff of retries from
+		 * NFS4_POLL_RETRY_MIN to NFS4_POLL_RETRY_MAX. */
+		task->tk_timeout = NFS4_POLL_RETRY_MIN <<
+				(task->tk_delays*2);
+		if (task->tk_timeout > NFS4_POLL_RETRY_MAX)
+			rpc_delay(task, NFS4_POLL_RETRY_MAX);
+		else {
+			task->tk_delays++;
+			rpc_delay(task, task->tk_timeout);
+		}
+		task->tk_status = 0;
+		return -EAGAIN;
 	case -NFS4ERR_GRACE:
 		rpc_delay(task, NFS4_POLL_RETRY_MAX);
 		task->tk_status = 0;
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 84ca436..60f82bf 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -62,6 +62,7 @@ struct rpc_task {
void * tk_calldata;

unsigned long tk_timeout; /* timeout for rpc_sleep() */
+ unsigned short tk_delays; /* number of times task delayed */
unsigned long tk_runstate; /* Task run status */
struct workqueue_struct *tk_workqueue; /* Normally rpciod, but could
* be any workqueue
--
1.7.9.5

2013-04-24 21:11:57

by J. Bruce Fields

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Wed, Apr 24, 2013 at 03:55:49PM -0500, Dave Chiluk wrote:
> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
>
> Additionally, this alleviates an interoperability problem with the AIX NFSv4
> server. The AIX server frequently (2 out of 3 times) returns NFS4ERR_DELAY on a
> CLOSE when it happens in close proximity to a RELEASE_LOCKOWNER. This would
> cause a Linux client to hang for 15 seconds.
>
> Signed-off-by: Dave Chiluk <[email protected]>
> ---
> fs/nfs/nfs4proc.c | 12 ++++++++++++
> include/linux/sunrpc/sched.h | 1 +
> 2 files changed, 13 insertions(+)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 0ad025e..37dad27 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -4006,6 +4006,18 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
> #endif /* CONFIG_NFS_V4_1 */
> case -NFS4ERR_DELAY:
> nfs_inc_server_stats(server, NFSIOS_DELAY);
> + /* Do an exponential backoff of retries from
> + * NFS4_POLL_RETRY_MIN to NFS4_POLL_RETRY_MAX. */
> + task->tk_timeout = NFS4_POLL_RETRY_MIN <<
> + (task->tk_delays*2);
> + if (task->tk_timeout > NFS4_POLL_RETRY_MAX)
> + rpc_delay(task, NFS4_POLL_RETRY_MAX);
> + else {
> + task->tk_delays++;
> + rpc_delay(task, task->tk_timeout);
> + }
> + task->tk_status = 0;
> + return -EAGAIN;

Just as a matter of style, could you move this into a helper, something
like the existing nfs4_delay()?

case -NFS4ERR_DELAY:
nfs_inc_server_stats(server, NFSIOS_DELAY);
nfs4_async_delay(task);
task->tk_status = 0;
return -EAGAIN;
...

--b.

> case -NFS4ERR_GRACE:
> rpc_delay(task, NFS4_POLL_RETRY_MAX);
> task->tk_status = 0;
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index 84ca436..60f82bf 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -62,6 +62,7 @@ struct rpc_task {
> void * tk_calldata;
>
> unsigned long tk_timeout; /* timeout for rpc_sleep() */
> + unsigned short tk_delays; /* number of times task delayed */
> unsigned long tk_runstate; /* Task run status */
> struct workqueue_struct *tk_workqueue; /* Normally rpciod, but could
> * be any workqueue
> --
> 1.7.9.5
>

2013-04-24 21:28:51

by Myklebust, Trond

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
>
> Additionally, this alleviates an interoperability problem with the AIX NFSv4
> server. The AIX server frequently (2 out of 3 times) returns NFS4ERR_DELAY on a
> CLOSE when it happens in close proximity to a RELEASE_LOCKOWNER. This would
> cause a Linux client to hang for 15 seconds.

Hi Dave,

The AIX server is not being motivated by any requirements in the NFSv4
spec here, so I fail to see the reason why the behaviour that you
describe can justify changing the client. It is not at all obvious to me
that we should be retrying aggressively when NFSv4 servers return
NFS4ERR_DELAY. What makes 1/10 sec more correct in these situations than
the existing 15 seconds?

The motivation for doing it in the case of OPEN, SETATTR, etc is
clearer: those operations may require the server to recall a delegation,
in which case aggressive retries are in order since delegation recalls
are usually fast.
The motivation in the case of LOCK is less clear, but it is basically
down to the fact that NFSv4 has a polling model for doing blocking
locks.
In all other cases, why should we be treating NFS4ERR_DELAY any
differently from how we treat NFS3ERR_JUKEBOX in NFSv3?

Note that if we do decide that changing the client is the right thing,
then I don't want the patch to add new fields to struct rpc_task. That's
the wrong layer for storing NFSv4 client specific data.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-04-24 21:54:53

by Dave Chiluk

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
>> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
>> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
>>
>> Additionally, this alleviates an interoperability problem with the AIX NFSv4
>> server. The AIX server frequently (2 out of 3 times) returns NFS4ERR_DELAY on a
>> CLOSE when it happens in close proximity to a RELEASE_LOCKOWNER. This would
>> cause a Linux client to hang for 15 seconds.
>
> Hi Dave,
>
> The AIX server is not being motivated by any requirements in the NFSv4
> spec here, so I fail to see the reason why the behaviour that you
> describe can justify changing the client. It is not at all obvious to me
> that we should be retrying aggressively when NFSv4 servers return
> NFS4ERR_DELAY. What makes 1/10 sec more correct in these situations than
> the existing 15 seconds?

I agree with you that AIX is at fault, and that the preferable situation
for the Linux client would be for AIX to not return NFS4ERR_DELAY in
this use case. I have attached a simple program that exacerbates
the problem on the AIX server. I have already had a conference call
with AIX NFS development about this issue, where I vehemently tried to
convince them to fix their server. Unfortunately, as I don't have much
reputation in the NFS community, I was unable to convince them to do the
right thing. I would be more than happy to set up another call if
someone higher up in the Linux NFS hierarchy would be willing to
participate.

That being said, I think implementing an exponential backoff is an
improvement in the client regardless of what AIX is doing. If a server
needs only 2 seconds to process a request for which NFS4ERR_DELAY was
returned, this algorithm would get the client back and running after
only 2.1 seconds of elapsed time, whereas the current dumb algorithm
would simply wait 15 seconds. This is the reason that I implemented
this change.

> The motivation for doing it in the case of OPEN, SETATTR, etc is
> clearer: those operations may require the server to recall a delegation,
> in which case aggressive retries are in order since delegation recalls
> are usually fast.
> The motivation in the case of LOCK is less clear, but it is basically
> down to the fact that NFSv4 has a polling model for doing blocking
> locks.

> In all other cases, why should we be treating NFS4ERR_DELAY any
> differently from how we treat NFS3ERR_JUKEBOX in NFSv3?
>
> Note that if we do decide that changing the client is the right thing,
> then I don't want the patch to add new fields to struct rpc_task. That's
> the wrong layer for storing NFSv4 client specific data.

This is something that I was concerned about as well, but I could not
find another persistent way to do this. I am open to suggestions of
which structures would be more acceptable.

Thanks,
Dave.

Attachments:

open-close.c (410.00 B)

2013-04-24 22:35:12

by Myklebust, Trond

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Wed, 2013-04-24 at 16:54 -0500, Dave Chiluk wrote:
> On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> > On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> >> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> >> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
> >>
> >> Additionally, this alleviates an interoperability problem with the AIX NFSv4
> >> server. The AIX server frequently (2 out of 3 times) returns NFS4ERR_DELAY on a
> >> CLOSE when it happens in close proximity to a RELEASE_LOCKOWNER. This would
> >> cause a Linux client to hang for 15 seconds.
> >
> > Hi Dave,
> >
> > The AIX server is not being motivated by any requirements in the NFSv4
> > spec here, so I fail to see the reason why the behaviour that you
> > describe can justify changing the client. It is not at all obvious to me
> > that we should be retrying aggressively when NFSv4 servers return
> > NFS4ERR_DELAY. What makes 1/10 sec more correct in these situations than
> > the existing 15 seconds?
>
> I agree with you that AIX is at fault, and that the preferable situation
> for the Linux client would be for AIX to not return NFS4ERR_DELAY in
> this use case. I have attached a simple program that exacerbates
> the problem on the AIX server. I have already had a conference call
> with AIX NFS development about this issue, where I vehemently tried to
> convince them to fix their server. Unfortunately, as I don't have much
> reputation in the NFS community, I was unable to convince them to do the
> right thing. I would be more than happy to set up another call if
> someone higher up in the Linux NFS hierarchy would be willing to
> participate.

I'd think that if they have customers that want to use Linux clients,
then those customers are likely to have more influence. This is entirely
a consequence of _their_ design decisions, quite frankly, since
returning NFS4ERR_DELAY in the above situation is downright silly. The
server designers _know_ that the RELEASE_LOCKOWNER will finish whatever
it is doing fairly quickly; it's not as if the CLOSE wouldn't have to do
the exact same state manipulations anyway...

> That being said, I think implementing an exponential backoff is an
> improvement in the client regardless of what AIX is doing. If a server
> needs only 2 seconds to process a request for which NFS4ERR_DELAY was
> returned, this algorithm would get the client back and running after
> only 2.1 seconds of elapsed time, whereas the current dumb algorithm
> would simply wait 15 seconds. This is the reason that I implemented
> this change.

Right, but my point above is that _in_general_ if we don't know why the
server is returning NFS4ERR_DELAY, then how can we attach any retry
numbers at all? HSM systems, for instance, have very different latencies
than the above and were the reason for inventing NFS3ERR_JUKEBOX in the
first place.

> > The motivation for doing it in the case of OPEN, SETATTR, etc is
> > clearer: those operations may require the server to recall a delegation,
> > in which case aggressive retries are in order since delegation recalls
> > are usually fast.
> > The motivation in the case of LOCK is less clear, but it is basically
> > down to the fact that NFSv4 has a polling model for doing blocking
> > locks.
>
> > In all other cases, why should we be treating NFS4ERR_DELAY any
> > differently from how we treat NFS3ERR_JUKEBOX in NFSv3?
> >
> > Note that if we do decide that changing the client is the right thing,
> > then I don't want the patch to add new fields to struct rpc_task. That's
> > the wrong layer for storing NFSv4 client specific data.
>
> This is something that I was concerned about as well, but I could not
> find another persistent way to do this. I am open to suggestions of
> which structures would be more acceptable.

We could change nfs4_async_handle_error() to take a struct
nfs4_exception, just like nfs4_handle_exception() does; at some point we
can use that to unify the two.
Just store the timeout somewhere in the nfs4_closedata.

2013-04-25 12:19:40

by David Wysochanski

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Wed, 2013-04-24 at 22:35 +0000, Myklebust, Trond wrote:
> On Wed, 2013-04-24 at 16:54 -0500, Dave Chiluk wrote:
> > On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> > > On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> > >> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> > >> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
> > >>
> > >> Additionally, this alleviates an interoperability problem with the AIX NFSv4
> > >> server. The AIX server frequently (2 out of 3 times) returns NFS4ERR_DELAY on a
> > >> CLOSE when it happens in close proximity to a RELEASE_LOCKOWNER. This would
> > >> cause a Linux client to hang for 15 seconds.
> > >
> > > Hi Dave,
> > >
> > > The AIX server is not being motivated by any requirements in the NFSv4
> > > spec here, so I fail to see the reason why the behaviour that you
> > > describe can justify changing the client. It is not at all obvious to me
> > > that we should be retrying aggressively when NFSv4 servers return
> > > NFS4ERR_DELAY. What makes 1/10 sec more correct in these situations than
> > > the existing 15 seconds?
> >
> > I agree with you that AIX is at fault, and that the preferable situation
> > for the Linux client would be for AIX to not return NFS4ERR_DELAY in
> > this use case. I have attached a simple program that exacerbates
> > the problem on the AIX server. I have already had a conference call
> > with AIX NFS development about this issue, where I vehemently tried to
> > convince them to fix their server. Unfortunately, as I don't have much
> > reputation in the NFS community, I was unable to convince them to do the
> > right thing. I would be more than happy to set up another call if
> > someone higher up in the Linux NFS hierarchy would be willing to
> > participate.
>
> I'd think that if they have customers that want to use Linux clients,
> then those customers are likely to have more influence. This is entirely
> a consequence of _their_ design decisions, quite frankly, since
> returning NFS4ERR_DELAY in the above situation is downright silly. The
> server designers _know_ that the RELEASE_LOCKOWNER will finish whatever
> it is doing fairly quickly; it's not as if the CLOSE wouldn't have to do
> the exact same state manipulations anyway...
>
> > That being said, I think implementing an exponential backoff is an
> > improvement in the client regardless of what AIX is doing. If a server
> > needs only 2 seconds to process a request for which NFS4ERR_DELAY was
> > returned, this algorithm would get the client back and running after
> > only 2.1 seconds of elapsed time, whereas the current dumb algorithm
> > would simply wait 15 seconds. This is the reason that I implemented
> > this change.
>
> Right, but my point above is that _in_general_ if we don't know why the
> server is returning NFS4ERR_DELAY, then how can we attach any retry
> numbers at all? HSM systems, for instance, have very different latencies
> than the above and were the reason for inventing NFS3ERR_JUKEBOX in the
> first place.
>

Agreed, we can't know why the server is returning NFS4ERR_DELAY, so it's
hard to pick a retry number. Can you explain the rationale for the
current 15-second delay? Was it just for simplicity or something else?

2013-04-25 13:19:14

by Myklebust, Trond

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Thu, 2013-04-25 at 08:19 -0400, David Wysochanski wrote:
> On Wed, 2013-04-24 at 22:35 +0000, Myklebust, Trond wrote:
> > On Wed, 2013-04-24 at 16:54 -0500, Dave Chiluk wrote:
> > > On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> > > > On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> > > >> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> > > >> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
> > > >>
> > > >> Additionally, this alleviates an interoperability problem with the AIX NFSv4
> > > >> server. The AIX server frequently (2 out of 3 times) returns NFS4ERR_DELAY on a
> > > >> CLOSE when it happens in close proximity to a RELEASE_LOCKOWNER. This would
> > > >> cause a Linux client to hang for 15 seconds.
> > > >
> > > > Hi Dave,
> > > >
> > > > The AIX server is not being motivated by any requirements in the NFSv4
> > > > spec here, so I fail to see the reason why the behaviour that you
> > > > describe can justify changing the client. It is not at all obvious to me
> > > > that we should be retrying aggressively when NFSv4 servers return
> > > > NFS4ERR_DELAY. What makes 1/10 sec more correct in these situations than
> > > > the existing 15 seconds?
> > >
> > > I agree with you that AIX is at fault, and that the preferable situation
> > > for the Linux client would be for AIX to not return NFS4ERR_DELAY in
> > > this use case. I have attached a simple program that exacerbates
> > > the problem on the AIX server. I have already had a conference call
> > > with AIX NFS development about this issue, where I vehemently tried to
> > > convince them to fix their server. Unfortunately, as I don't have much
> > > reputation in the NFS community, I was unable to convince them to do the
> > > right thing. I would be more than happy to set up another call if
> > > someone higher up in the Linux NFS hierarchy would be willing to
> > > participate.
> >
> > I'd think that if they have customers that want to use Linux clients,
> > then those customers are likely to have more influence. This is entirely
> > a consequence of _their_ design decisions, quite frankly, since
> > returning NFS4ERR_DELAY in the above situation is downright silly. The
> > server designers _know_ that the RELEASE_LOCKOWNER will finish whatever
> > it is doing fairly quickly; it's not as if the CLOSE wouldn't have to do
> > the exact same state manipulations anyway...
> >
> > > That being said, I think implementing an exponential backoff is an
> > > improvement in the client regardless of what AIX is doing. If a server
> > > needs only 2 seconds to process a request for which NFS4ERR_DELAY was
> > > returned, this algorithm would get the client back and running after
> > > only 2.1 seconds of elapsed time, whereas the current dumb algorithm
> > > would simply wait 15 seconds. This is the reason that I implemented
> > > this change.
> >
> > Right, but my point above is that _in_general_ if we don't know why the
> > server is returning NFS4ERR_DELAY, then how can we attach any retry
> > numbers at all? HSM systems, for instance, have very different latencies
> > than the above and were the reason for inventing NFS3ERR_JUKEBOX in the
> > first place.
> >
>
> Agreed, we can't know why the server is returning NFS4ERR_DELAY, so it's
> hard to pick a retry number. Can you explain the rationale for the
> current 15-second delay? Was it just for simplicity or something else?
>

Our expectation for NFS4ERR_DELAY events that are not listed in
RFC3530/RFC5661 is that they should be rare, but are expected on average to
last significantly longer than an RPC round-trip between the server and
client.
The other constraint was that we needed a number which is shorter than
the lease period so that we don't have to keep sending RENEWs.

The 2 main cases we thought we'd have to deal with were:

- HSM systems fetching data from a tape backup or something similar
- Idmappers needing to refill their cache from LDAP/NIS/...

We did not expect servers to be using NFS4ERR_DELAY as a generic tool
for avoiding mutexes. That sounds like a great business opportunity for
the network switch vendors, but a poor one for everyone else...

2013-04-25 13:29:12

by J. Bruce Fields

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Thu, Apr 25, 2013 at 08:19:34AM -0400, David Wysochanski wrote:
> On Wed, 2013-04-24 at 22:35 +0000, Myklebust, Trond wrote:
> > On Wed, 2013-04-24 at 16:54 -0500, Dave Chiluk wrote:
> > > On 04/24/2013 04:28 PM, Myklebust, Trond wrote:
> > > > On Wed, 2013-04-24 at 15:55 -0500, Dave Chiluk wrote:
> > > >> Changing the retry to start at NFS4_POLL_RETRY_MIN and exponentially grow
> > > >> to NFS4_POLL_RETRY_MAX allows for faster handling of these error conditions.
> > > >>
> > > >> Additionally, this alleviates an interoperability problem with the AIX NFSv4
> > > >> server. The AIX server frequently (2 out of 3 times) returns NFS4ERR_DELAY on a
> > > >> CLOSE when it happens in close proximity to a RELEASE_LOCKOWNER. This would
> > > >> cause a Linux client to hang for 15 seconds.
> > > >
> > > > Hi Dave,
> > > >
> > > > The AIX server is not being motivated by any requirements in the NFSv4
> > > > spec here, so I fail to see the reason why the behaviour that you
> > > > describe can justify changing the client. It is not at all obvious to me
> > > > that we should be retrying aggressively when NFSv4 servers return
> > > > NFS4ERR_DELAY. What makes 1/10 sec more correct in these situations than
> > > > the existing 15 seconds?
> > >
> > > I agree with you that AIX is at fault, and that the preferable situation
> > > for the Linux client would be for AIX to not return NFS4ERR_DELAY in
> > > this use case. I have attached a simple program that exacerbates
> > > the problem on the AIX server. I have already had a conference call
> > > with AIX NFS development about this issue, where I vehemently tried to
> > > convince them to fix their server. Unfortunately, as I don't have much
> > > reputation in the NFS community, I was unable to convince them to do the
> > > right thing. I would be more than happy to set up another call if
> > > someone higher up in the Linux NFS hierarchy would be willing to
> > > participate.
> >
> > I'd think that if they have customers that want to use Linux clients,
> > then those customers are likely to have more influence. This is entirely
> > a consequence of _their_ design decisions, quite frankly, since
> > returning NFS4ERR_DELAY in the above situation is downright silly. The
> > server designers _know_ that the RELEASE_LOCKOWNER will finish whatever
> > it is doing fairly quickly; it's not as if the CLOSE wouldn't have to do
> > the exact same state manipulations anyway...
> >
> > > That being said, I think implementing an exponential backoff is an
> > > improvement in the client regardless of what AIX is doing. If a server
> > > needs only 2 seconds to process a request for which NFS4ERR_DELAY was
> > > returned, this algorithm would get the client back and running after
> > > only 2.1 seconds of elapsed time, whereas the current dumb algorithm
> > > would simply wait 15 seconds. This is the reason that I implemented
> > > this change.
> >
> > Right, but my point above is that _in_general_ if we don't know why the
> > server is returning NFS4ERR_DELAY, then how can we attach any retry
> > numbers at all? HSM systems, for instance, have very different latencies
> > than the above and were the reason for inventing NFS3ERR_JUKEBOX in the
> > first place.
> >
>
> Agreed, we can't know why the server is returning NFS4ERR_DELAY, so it's
> hard to pick a retry number. Can you explain the rationale for the
> current 15-second delay? Was it just for simplicity or something else?

As I understand it, the original idea was that cold data really could
take multiple seconds or minutes to retrieve (e.g. because a tape
library might need to load the right tape and rewind to the right
spot...). Is that sort of system really used much these days?

My position is that we simply have no idea what order of magnitude the
delay should even be, and that in such a situation exponential backoff, as
implemented in the synchronous case, seems the reasonable default: it
guarantees at worst doubling the delay while still bounding the
long-term average frequency of retries.

--b.

2013-04-25 13:31:03

by Myklebust, Trond

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Thu, 2013-04-25 at 09:29 -0400, [email protected] wrote:

> My position is that we simply have no idea what order of magnitude the
> delay should even be, and that in such a situation exponential backoff, as
> implemented in the synchronous case, seems the reasonable default: it
> guarantees at worst doubling the delay while still bounding the
> long-term average frequency of retries.

So we start with a 15 second delay, and then go to 60 seconds?

2013-04-25 13:49:25

by J. Bruce Fields

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
> On Thu, 2013-04-25 at 09:29 -0400, [email protected] wrote:
>
> > My position is that we simply have no idea what order of magnitude the
> > delay should even be, and that in such a situation exponential backoff, as
> > implemented in the synchronous case, seems the reasonable default: it
> > guarantees at worst doubling the delay while still bounding the
> > long-term average frequency of retries.
>
> So we start with a 15 second delay, and then go to 60 seconds?

I agree that a server should normally be doing the wait on its own if
the wait would be on the order of an rpc round trip.

So I'd be inclined to start with a delay that was an order of magnitude
or two more than a round trip.

And I'd expect NFS isn't common on networks with 1-second latencies.

So the 1/10 second we're using in the synchronous case sounds closer to
the right ballpark to me.

--b.

2013-04-25 14:10:39

by Myklebust, Trond

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Thu, 2013-04-25 at 09:49 -0400, [email protected] wrote:
> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
> > On Thu, 2013-04-25 at 09:29 -0400, [email protected] wrote:
> >
> > > My position is that we simply have no idea what order of magnitude the
> > > delay should even be, and that in such a situation exponential backoff, as
> > > implemented in the synchronous case, seems the reasonable default: it
> > > guarantees at worst doubling the delay while still bounding the
> > > long-term average frequency of retries.
> >
> > So we start with a 15 second delay, and then go to 60 seconds?
>
> I agree that a server should normally be doing the wait on its own if
> the wait would be on the order of an rpc round trip.
>
> So I'd be inclined to start with a delay that was an order of magnitude
> or two more than a round trip.
>
> And I'd expect NFS isn't common on networks with 1-second latencies.
>
> So the 1/10 second we're using in the synchronous case sounds closer to
> the right ballpark to me.

OK, then. Now all I need is actual motivation for changing the existing
code other than handwaving arguments about "polling is better than flat
waits".
What actual use cases are impacting us now, other than the AIX design
decision to force CLOSE to retry at least once before succeeding?

2013-04-25 14:51:57

by Chuck Lever III

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Apr 25, 2013, at 9:49 AM, [email protected] wrote:

> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
>> On Thu, 2013-04-25 at 09:29 -0400, [email protected] wrote:
>>
>>> My position is that we simply have no idea what order of magnitude the
>>> delay should even be, and that in such a situation exponential backoff, as
>>> implemented in the synchronous case, seems the reasonable default: it
>>> guarantees at worst doubling the delay while still bounding the
>>> long-term average frequency of retries.
>>
>> So we start with a 15 second delay, and then go to 60 seconds?
>
> I agree that a server should normally be doing the wait on its own if
> the wait would be on the order of an rpc round trip.
>
> So I'd be inclined to start with a delay that was an order of magnitude
> or two more than a round trip.
>
> And I'd expect NFS isn't common on networks with 1-second latencies.
>
> So the 1/10 second we're using in the synchronous case sounds closer to
> the right ballpark to me.

The RPC layer already keeps RPC round trip statistics, so the client doesn't have to guess with a "one size fits all" number.

I'm all for keeping client recovery time short. But after following this argument, I think 10x RTT is crazy short. Aggressive retransmits can lead to data corruption, and RTT on a fast server is going to be on the order of a millisecond. And what about RDMA, where RTT is about 20 usecs?

A better answer might be to start at one second, then back off exponentially up to the minimum of 0.25x the lease time and 0.25x the RPC retransmit timeout.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2013-04-25 15:34:06

by Matt W. Benjamin

Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for Ni

Hi,

Just to clarify, the IBM delay behavior is not legal?

Matt

----- "Trond Myklebust" <[email protected]> wrote:

>
> OK, then. Now all I need is actual motivation for changing the
> existing code other than handwaving arguments about "polling is better
> than flat waits".
> What actual use cases are impacting us now, other than the AIX design
> decision to force CLOSE to retry at least once before succeeding?
>

--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309

2013-04-25 15:42:54

by Myklebust, Trond

Subject: RE: [PATCH] NFSv4: Use exponential backoff delay for Ni

It's legal, but dumb...

> -----Original Message-----
> From: Matt W. Benjamin [mailto:[email protected]]
> Sent: Thursday, April 25, 2013 11:28 AM
> To: Myklebust, Trond
> Cc: David Wysochanski; Dave Chiluk; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for Ni
>
> Hi,
>
> Just to clarify, the IBM delay behavior is not legal?
>
> Matt
>

2013-04-25 18:19:38

by J. Bruce Fields


Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote:
> On Thu, 2013-04-25 at 09:49 -0400, [email protected] wrote:
> > On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
> > > On Thu, 2013-04-25 at 09:29 -0400, [email protected] wrote:
> > >
> > > > My position is that we simply have no idea what order of magnitude even
> > > > delay should be. And that in such a situation exponential backoff such
> > > > as implemented in the synchronous case seems the reasonable default as
> > > > it guarantees at worst doubling the delay while still bounding the
> > > > long-term average frequency of retries.
> > >
> > > So we start with a 15 second delay, and then go to 60 seconds?
> >
> > I agree that a server should normally be doing the wait on its own if
> > the wait would be on the order of an rpc round trip.
> >
> > So I'd be inclined to start with a delay that was an order of magnitude
> > or two more than a round trip.
> >
> > And I'd expect NFS isn't common on networks with 1-second latencies.
> >
> > So the 1/10 second we're using in the synchronous case sounds closer to
> > the right ballpark to me.
>
> OK, then. Now all I need is actual motivation for changing the existing
> code other than handwaving arguments about "polling is better than flat
> waits".
> What actual use cases are impacting us now, other than the AIX design
> decision to force CLOSE to retry at least once before succeeding?

Nah, I've got nothing, and I agree that the AIX problem is their bug.

Just for fun I re-checked the Linux server cases. As far as I
can tell they are:

- delegations: returned immediately on detection of any
  conflict. The current behavior in the sync case looks
  reasonable to me.
- allocation failures: not really sure it's the best error, but
  it seems to be all the protocol offers. We probably don't
  care much what the client does in this case.
- some rare cases that would probably indicate bugs (e.g.,
  attempting to destroy a client while other rpc's from that
  client are running.) Again we don't care what the client does
  here.
- the 4.1 slot-inuse case.

We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK,
ENOMEM) to delay. I thought I remembered one of those being used by
some HFS system, but can't actually find an example now. A quick grep
doesn't show anything interesting.
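The default four-errno mapping described above could look roughly like this. It is an illustrative sketch, not the nfsd source: the function name is made up, errno values are taken as positive, and every other errno is collapsed to NFS4ERR_IO purely as a placeholder.

```c
#include <errno.h>

/* NFSv4 status codes, values as listed in RFC 7530. */
#define NFS4ERR_IO    5
#define NFS4ERR_DELAY 10008

/*
 * Hypothetical sketch: local errors that suggest a transient condition
 * become NFS4ERR_DELAY so the client retries later. An if-chain is used
 * instead of a switch because EAGAIN and EWOULDBLOCK share a value on
 * Linux, and duplicate case labels would not compile.
 */
static int map_local_errno(int err)
{
    if (err == ETIMEDOUT || err == EAGAIN || err == EWOULDBLOCK ||
        err == ENOMEM)
        return NFS4ERR_DELAY;

    /* Placeholder for the server's many other errno mappings. */
    return NFS4ERR_IO;
}
```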

--b.

2013-04-25 18:40:27

by Chuck Lever III


Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Apr 25, 2013, at 2:19 PM, "[email protected]" <[email protected]> wrote:

> Nah, I've got nothing, and I agree that the AIX problem is their bug.
>
> Just for fun I re-checked the Linux server cases. As far as I
> can tell they are:
>
> - delegations: returned immediately on detection of any
> conflict. The current behavior in the sync case looks
> reasonable to me.
> - allocation failures: not really sure it's the best error, but
> it seems to be all the protocol offers. We probably don't
> care much what the client does in this case.
> - some rare cases that would probably indicate bugs (e.g.,
> attempting to destroy a client while other rpc's from that
> client are running.) Again we don't care what the client does
> here.
> - the 4.1 slot-inuse case.
>
> We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK,
> ENOMEM) to delay. I thought I remembered one of those being used by
> some HFS system, but can't actually find an example now. A quick grep
> doesn't show anything interesting.

It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2013-04-25 18:46:31

by J. Bruce Fields


Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote:
>
> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.

I thought they'd decided they'll be forced to find a different way to do
that?

(The issue being that it only works if you're using 4.1, and if the
session state itself isn't part of the state to be transferred.
Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY
is seqid-modifying.)
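To make the "seqid-modifying" point concrete: in NFSv4.0 the client must advance the open-owner or lock-owner seqid after every reply to OPEN, CLOSE, LOCK, LOCKU, OPEN_CONFIRM or OPEN_DOWNGRADE, except when the reply carries one of a short list of errors, and NFS4ERR_DELAY is not on that list. The sketch below encodes that exception list as given in RFC 7530; the function name is hypothetical (the Linux client carries a similar helper, seqid_mutating_err()).

```c
#include <stdbool.h>

/* NFSv4.0 status codes (values from RFC 7530). */
enum {
    NFS4_OK                = 0,
    NFS4ERR_DELAY          = 10008,
    NFS4ERR_RESOURCE       = 10018,
    NFS4ERR_MOVED          = 10019,
    NFS4ERR_NOFILEHANDLE   = 10020,
    NFS4ERR_STALE_CLIENTID = 10022,
    NFS4ERR_STALE_STATEID  = 10023,
    NFS4ERR_BAD_STATEID    = 10025,
    NFS4ERR_BAD_SEQID      = 10026,
    NFS4ERR_BADXDR         = 10036,
};

/*
 * Hypothetical helper: does a reply with status `err` to a seqid-bearing
 * operation still consume the seqid? Only the errors listed here leave
 * it unchanged; success and everything else (NFS4ERR_DELAY included)
 * advance it.
 */
static bool seqid_advances(int err)
{
    switch (err) {
    case NFS4ERR_STALE_CLIENTID:
    case NFS4ERR_STALE_STATEID:
    case NFS4ERR_BAD_STATEID:
    case NFS4ERR_BAD_SEQID:
    case NFS4ERR_BADXDR:
    case NFS4ERR_RESOURCE:
    case NFS4ERR_NOFILEHANDLE:
    case NFS4ERR_MOVED:
        return false;
    default:
        return true;
    }
}
```

So a server that answers a CLOSE with NFS4ERR_DELAY has still moved the owner's seqid forward, which is exactly the state change a frozen server wants to avoid.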

--b.

2013-04-25 18:51:38

by Chuck Lever III


Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Apr 25, 2013, at 2:46 PM, "[email protected]" <[email protected]> wrote:

> On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote:
>>
>> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.
>
> I thought they'd decided they'll be forced to find a different way to do
> that?
>
> (The issue being that it only works if you're using 4.1, and if the
> session state itself isn't part of the state to be transferred.
> Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY
> is seqid-modifying.)

The answer is not to return NFS4ERR_DELAY on seqid-modifying operations.

The source server can return NFS4ERR_DELAY to the client's migration recovery operations (the GETATTR(fs_locations) request) for example.

Or, the server could return it on the initial PUTFH operation in a compound containing seqid-modifying operations.
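The PUTFH approach relies on COMPOUND processing stopping at the first failing operation: if the leading PUTFH returns NFS4ERR_DELAY, the seqid-bearing operation later in the same compound is never evaluated, so no seqid is consumed. A hypothetical server-side check might look like the sketch below; the function names and the frozen-state flag are assumptions, and the operation numbers are the RFC 7530 values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Subset of NFSv4.0 operation numbers (RFC 7530). */
enum nfs_opnum4 {
    OP_CLOSE          = 4,
    OP_GETATTR        = 9,
    OP_LOCK           = 12,
    OP_LOCKU          = 14,
    OP_OPEN           = 18,
    OP_OPEN_CONFIRM   = 20,
    OP_OPEN_DOWNGRADE = 21,
    OP_PUTFH          = 22,
};

/* Operations that carry an open-owner or lock-owner seqid in NFSv4.0. */
static bool is_seqid_op(enum nfs_opnum4 op)
{
    switch (op) {
    case OP_CLOSE:
    case OP_LOCK:
    case OP_LOCKU:
    case OP_OPEN:
    case OP_OPEN_CONFIRM:
    case OP_OPEN_DOWNGRADE:
        return true;
    default:
        return false;
    }
}

/*
 * Hypothetical policy: while state is frozen for migration, return the
 * index of the op that should fail with NFS4ERR_DELAY (always the first
 * op, normally PUTFH), or -1 if the compound can be processed as usual.
 */
static int frozen_state_delay_index(const enum nfs_opnum4 *ops, size_t n,
                                    bool state_frozen)
{
    size_t i;

    if (!state_frozen)
        return -1;
    for (i = 0; i < n; i++)
        if (is_seqid_op(ops[i]))
            return 0;   /* fail before any seqid-bearing op runs */
    return -1;
}
```

A compound such as PUTFH + GETATTR passes this check untouched; returning NFS4ERR_DELAY to GETATTR(fs_locations) during migration recovery would be a separate decision.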

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2013-04-25 18:52:56

by Myklebust, Trond


Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Apr 25, 2013, at 2:46 PM, "[email protected]" <[email protected]> wrote:

> On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote:
>>
>> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.
>
> I thought they'd decided they'll be forced to find a different way to do
> that?
>
> (The issue being that it only works if you're using 4.1, and if the
> session state itself isn't part of the state to be transferred.
> Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY
> is seqid-modifying.)

Either way, migration is not a performance-critical path that needs response times of one second or less on those NFS4ERR_DELAY replies.

Trond

2013-04-25 18:57:12

by J. Bruce Fields


Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Thu, Apr 25, 2013 at 02:51:20PM -0400, Chuck Lever wrote:
>
> On Apr 25, 2013, at 2:46 PM, "[email protected]" <[email protected]> wrote:
>
> > On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote:
> >>
> >> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.
> >
> > I thought they'd decided they'll be forced to find a different way to do
> > that?
> >
> > (The issue being that it only works if you're using 4.1, and if the
> > session state itself isn't part of the state to be transferred.
> > Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY
> > is seqid-modifying.)
>
> The answer is not to return NFS4ERR_DELAY on seqid-modifying operations.
>
> The source server can return NFS4ERR_DELAY to the client's migration recovery operations (the GETATTR(fs_locations) request) for example.
>
> Or, the server could return it on the initial PUTFH operation in a compound containing seqid-modifying operations.

Oh, right, I'd forgotten that approach....

--b.