MIME-Version: 1.0
In-Reply-To: <20140922182923.GA18904@infradead.org>
References: <20140922182923.GA18904@infradead.org>
Date: Mon, 22 Sep 2014 16:03:37 -0400
Message-ID: <CAHQdGtSGh0vTgXw_eib3EWwGY71bSmDZDEU3Z9FW_bV_FSG7zQ@mail.gmail.com>
Subject: Re: [PATCH, RFC] nfsd: fix nfsd4_cb_recall_done error handling
From: Trond Myklebust <trond.myklebust@primarydata.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-nfs-owner@vger.kernel.org

On Mon, Sep 22, 2014 at 2:29 PM, Christoph Hellwig <hch@infradead.org> wrote:
> The error handling for CB_RECALL seems fairly broken to me.
>
> What looks good:
>
>  - for EBADHANDLE and NFS4ERR_BAD_STATEID retry until dl_retries
>    hits zero, then mark the connection down and set cb_done
>
> What looks wrong:
>
>  - for everything else we first mark the connection down, then
>    retry until dl_retries hits zero, then mark the connection down
>    again and set cb_done.
>
> From all I can see what we want is:
>
>  - keep the behavior for EBADHANDLE and NFS4ERR_BAD_STATEID,
>    otherwise jump straight to marking the connection down
>    and setting cb_done
>
> But maybe I'm missing something?
>
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 17d5441..ed25c58 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -971,24 +971,21 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
>                 return;
>         switch (task->tk_status) {
>         case 0:
> -               cb->cb_done = true;
> -               return;
> +               break;
>         case -EBADHANDLE:
>         case -NFS4ERR_BAD_STATEID:
>                 /* Race: client probably got cb_recall
>                  * before open reply granting delegation */
> -               break;
> +               if (dp->dl_retries--) {
> +                       rpc_delay(task, 2*HZ);
> +                       task->tk_status = 0;
> +                       rpc_restart_call_prepare(task);
> +                       return;
> +               }
>         default:
>                 /* Network partition? */
>                 nfsd4_mark_cb_down(clp, task->tk_status);
>         }
> -       if (dp->dl_retries--) {
> -               rpc_delay(task, 2*HZ);
> -               task->tk_status = 0;
> -               rpc_restart_call_prepare(task);
> -               return;
> -       }
> -       nfsd4_mark_cb_down(clp, task->tk_status);
>         cb->cb_done = true;
>  }
>
>

We're also missing a handler for NFS4ERR_DELAY, which is listed as a
legal response to CB_RECALL in both RFC5661 and RFC3530bis. As far as
I can tell from the above, it lands in the default case, so knfsd will
currently treat it as a sign that it should mark the callback path as
being down...
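
To illustrate, here is a rough userspace model of the switch with an
NFS4ERR_DELAY case added. It is only a sketch, not a tested kernel
patch: struct recall_state, enum cb_action and recall_done() are
made-up stand-ins for the nfsd structures and for the
rpc_delay()/rpc_restart_call_prepare() and nfsd4_mark_cb_down() calls,
and retrying on DELAY without consuming dl_retries is an assumption
about the desired policy.

```c
#include <stdbool.h>

/* Defined locally so the model builds outside the kernel tree. */
#define EBADHANDLE          521    /* kernel-internal errno */
#define NFS4ERR_DELAY       10008  /* RFC 5661 */
#define NFS4ERR_BAD_STATEID 10025  /* RFC 5661 */

/* Hypothetical result type: what the real callback would do next. */
enum cb_action { CB_DONE, CB_RETRY, CB_DOWN };

/* Hypothetical stand-in for the delegation/callback state. */
struct recall_state {
	int  dl_retries;  /* remaining retries for the open/recall race */
	bool cb_done;     /* models cb->cb_done */
	bool cb_down;     /* models nfsd4_mark_cb_down() having run */
};

static enum cb_action recall_done(struct recall_state *st, int tk_status)
{
	switch (tk_status) {
	case 0:
		break;
	case -NFS4ERR_DELAY:
		/* Assumption: client is alive but busy, so retry later
		 * without touching dl_retries or the callback path. */
		return CB_RETRY;
	case -EBADHANDLE:
	case -NFS4ERR_BAD_STATEID:
		/* Race: client probably got cb_recall
		 * before open reply granting delegation */
		if (st->dl_retries--)
			return CB_RETRY;
		/* fall through */
	default:
		/* Network partition? */
		st->cb_down = true;
	}
	st->cb_done = true;
	return st->cb_down ? CB_DOWN : CB_DONE;
}
```

In the real function the CB_RETRY paths would presumably map onto the
existing rpc_delay() + rpc_restart_call_prepare() sequence. Whether
DELAY retries should be bounded at all is a separate question that the
model does not try to answer.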

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com