2014-09-22 18:29:23

by Christoph Hellwig

Subject: [PATCH, RFC] nfsd: fix nfsd4_cb_recall_done error handling

The error handling for CB_RECALL seems fairly broken to me.

What looks good:

- for EBADHANDLE and NFS4ERR_BAD_STATEID retry until dl_retries
hits zero, then mark the connection down and set cb_done

What looks wrong:

- for everything else we first mark the connection down, then
retry until dl_retries hits zero, then mark the connection down
again and set cb_done.

From all I can see what we want is:

- keep the behavior for EBADHANDLE and NFS4ERR_BAD_STATEID,
otherwise jump straight to marking the connection down
and setting cb_done

But maybe I'm missing something?


diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 17d5441..ed25c58 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -971,24 +971,21 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
 		return;
 	switch (task->tk_status) {
 	case 0:
-		cb->cb_done = true;
-		return;
+		break;
 	case -EBADHANDLE:
 	case -NFS4ERR_BAD_STATEID:
 		/* Race: client probably got cb_recall
 		 * before open reply granting delegation */
-		break;
+		if (dp->dl_retries--) {
+			rpc_delay(task, 2*HZ);
+			task->tk_status = 0;
+			rpc_restart_call_prepare(task);
+			return;
+		}
 	default:
 		/* Network partition? */
 		nfsd4_mark_cb_down(clp, task->tk_status);
 	}
-	if (dp->dl_retries--) {
-		rpc_delay(task, 2*HZ);
-		task->tk_status = 0;
-		rpc_restart_call_prepare(task);
-		return;
-	}
-	nfsd4_mark_cb_down(clp, task->tk_status);
 	cb->cb_done = true;
 }

--
1.9.1
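
The before/after control flow described in this patch can be modeled in userspace. The following is only a rough sketch, not kernel code: `struct model`, `old_done`, `new_done`, `run`, and the `MODEL_*` constants are hypothetical stand-ins for the delegation state, the two versions of nfsd4_cb_recall_done, and the error codes, and the retry budget passed to `run` is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel error codes (illustrative only). */
#define MODEL_EIO                  5
#define MODEL_EBADHANDLE           521
#define MODEL_NFS4ERR_BAD_STATEID  10025

struct model {
	int retries_left;     /* plays the role of dp->dl_retries */
	int mark_down_calls;  /* counts calls that stand in for nfsd4_mark_cb_down() */
	int restarts;         /* counts calls that stand in for rpc_restart_call_prepare() */
	bool cb_done;         /* plays the role of cb->cb_done */
};

/* Pre-patch flow. Returns true if the call was restarted. */
static bool old_done(struct model *m, int status)
{
	switch (status) {
	case 0:
		m->cb_done = true;
		return false;
	case -MODEL_EBADHANDLE:
	case -MODEL_NFS4ERR_BAD_STATEID:
		break;
	default:
		m->mark_down_calls++;   /* marked down, yet still retried below */
	}
	if (m->retries_left--) {
		m->restarts++;
		return true;
	}
	m->mark_down_calls++;
	m->cb_done = true;
	return false;
}

/* Post-patch flow. Returns true if the call was restarted. */
static bool new_done(struct model *m, int status)
{
	switch (status) {
	case 0:
		break;
	case -MODEL_EBADHANDLE:
	case -MODEL_NFS4ERR_BAD_STATEID:
		if (m->retries_left--) {
			m->restarts++;
			return true;
		}
		/* retries exhausted: fall through and mark the path down */
	default:
		m->mark_down_calls++;
	}
	m->cb_done = true;
	return false;
}

/* Feed the same status to a handler until it stops restarting the call. */
static struct model run(bool (*done)(struct model *, int), int status, int retries)
{
	struct model m = { .retries_left = retries };

	assert(retries >= 0);
	while (done(&m, status))
		;
	return m;
}
```

Under these assumptions, `run(old_done, -MODEL_EIO, 1)` marks the path down three times and restarts the call once, whereas `run(new_done, -MODEL_EIO, 1)` marks it down once and finishes without a restart. For `-MODEL_NFS4ERR_BAD_STATEID` both versions restart once and then mark the path down once.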



2014-09-22 20:25:54

by J. Bruce Fields

Subject: Re: [PATCH, RFC] nfsd: fix nfsd4_cb_recall_done error handling

On Mon, Sep 22, 2014 at 11:29:23AM -0700, Christoph Hellwig wrote:
> The error handling for CB_RECALL seems fairly broken to me.
>
> What looks good:
>
> - for EBADHANDLE and NFS4ERR_BAD_STATEID retry until dl_retries
> hits zero, then mark the connection down and set cb_done
>
> What looks wrong:
>
> - for everything else we first mark the connection down, then
> retry until dl_retries hits zero, then mark the connection down
> again and set cb_done.
>
> From all I can see what we want is:
>
> - keep the behavior for EBADHANDLE and NFS4ERR_BAD_STATEID,
> otherwise jump straight to marking the connection down
> and setting cb_done
>
> But maybe I'm missing something?

I can't think of anything; let me know when you want something applied.

--b.


2014-09-22 20:06:42

by Christoph Hellwig

Subject: Re: [PATCH, RFC] nfsd: fix nfsd4_cb_recall_done error handling

On Mon, Sep 22, 2014 at 04:03:37PM -0400, Trond Myklebust wrote:
> We're also missing a handler for NFS4ERR_DELAY, which is listed as a
> legal response to CB_RECALL in both RFC5661 and RFC3530bis. As far as
> I can tell from the above, knfsd will currently take that to be a sign
> it should mark the callback path as being down...

Yes. I've got a fix for that further down in my queue with the pnfs
patches, just wanted to send this bit out first.

I plan to handle NFS4ERR_DELAY in the generic callback layer instead of
burdening the individual callback implementations with it.
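
To illustrate what handling NFS4ERR_DELAY in a generic layer could look like, here is a rough userspace sketch. It is not the code from the queue mentioned above: `struct model_cb`, `struct model_cb_ops`, `generic_cb_done`, `recall_done`, and `MODEL_NFS4ERR_DELAY` are hypothetical names, and the sketch assumes the generic layer simply asks for a delayed resend without consulting the per-operation handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NFSv4 error code (illustrative only). */
#define MODEL_NFS4ERR_DELAY 10008

struct model_cb;

/* Per-operation hooks, e.g. one table for CB_RECALL. */
struct model_cb_ops {
	void (*done)(struct model_cb *cb, int status);
};

struct model_cb {
	const struct model_cb_ops *ops;
	int delays;         /* how often the generic layer asked for a resend */
	int op_done_calls;  /* how often the per-operation handler ran */
	int last_status;    /* last status seen by the per-operation handler */
};

/*
 * Generic completion handler. Returns true if the call should be
 * re-sent after a delay; the per-operation handler never sees
 * MODEL_NFS4ERR_DELAY in this sketch.
 */
static bool generic_cb_done(struct model_cb *cb, int status)
{
	assert(cb != NULL);

	if (status == -MODEL_NFS4ERR_DELAY) {
		cb->delays++;
		return true;
	}
	if (cb->ops && cb->ops->done)
		cb->ops->done(cb, status);
	return false;
}

/* Example per-operation handler: only records what it was given. */
static void recall_done(struct model_cb *cb, int status)
{
	cb->op_done_calls++;
	cb->last_status = status;
}

static const struct model_cb_ops recall_ops = { .done = recall_done };
```

With this split, each callback implementation only deals with its own errors, and the delay case is handled in one place. Whether the resend should be bounded is left open here.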

2014-09-22 20:03:38

by Trond Myklebust

Subject: Re: [PATCH, RFC] nfsd: fix nfsd4_cb_recall_done error handling

On Mon, Sep 22, 2014 at 2:29 PM, Christoph Hellwig wrote:
> The error handling for CB_RECALL seems fairly broken to me.
>
> What looks good:
>
> - for EBADHANDLE and NFS4ERR_BAD_STATEID retry until dl_retries
> hits zero, then mark the connection down and set cb_done
>
> What looks wrong:
>
> - for everything else we first mark the connection down, then
> retry until dl_retries hits zero, then mark the connection down
> again and set cb_done.
>
> From all I can see what we want is:
>
> - keep the behavior for EBADHANDLE and NFS4ERR_BAD_STATEID,
> otherwise jump straight to marking the connection down
> and setting cb_done
>
> But maybe I'm missing something?
>
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 17d5441..ed25c58 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -971,24 +971,21 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
>  		return;
>  	switch (task->tk_status) {
>  	case 0:
> -		cb->cb_done = true;
> -		return;
> +		break;
>  	case -EBADHANDLE:
>  	case -NFS4ERR_BAD_STATEID:
>  		/* Race: client probably got cb_recall
>  		 * before open reply granting delegation */
> -		break;
> +		if (dp->dl_retries--) {
> +			rpc_delay(task, 2*HZ);
> +			task->tk_status = 0;
> +			rpc_restart_call_prepare(task);
> +			return;
> +		}
>  	default:
>  		/* Network partition? */
>  		nfsd4_mark_cb_down(clp, task->tk_status);
>  	}
> -	if (dp->dl_retries--) {
> -		rpc_delay(task, 2*HZ);
> -		task->tk_status = 0;
> -		rpc_restart_call_prepare(task);
> -		return;
> -	}
> -	nfsd4_mark_cb_down(clp, task->tk_status);
>  	cb->cb_done = true;
>  }
>
>

We're also missing a handler for NFS4ERR_DELAY, which is listed as a
legal response to CB_RECALL in both RFC5661 and RFC3530bis. As far as
I can tell from the above, knfsd will currently take that to be a sign
it should mark the callback path as being down...

--
Trond Myklebust

Linux NFS client maintainer, PrimaryData
