Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mail-vc0-f178.google.com ([209.85.220.178]:48599 "EHLO mail-vc0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754240AbaIVUDi (ORCPT ); Mon, 22 Sep 2014 16:03:38 -0400
Received: by mail-vc0-f178.google.com with SMTP id lf12so2268407vcb.9 for ; Mon, 22 Sep 2014 13:03:37 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20140922182923.GA18904@infradead.org>
References: <20140922182923.GA18904@infradead.org>
Date: Mon, 22 Sep 2014 16:03:37 -0400
Message-ID:
Subject: Re: [PATCH, RFC] nfsd: fix nfsd4_cb_recall_done error handling
From: Trond Myklebust
To: Christoph Hellwig
Cc: Linux NFS Mailing List
Content-Type: text/plain; charset=UTF-8
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Mon, Sep 22, 2014 at 2:29 PM, Christoph Hellwig wrote:
> The error handling for CB_RECALL seems fairly broken to me.
>
> What looks good:
>
>  - for EBADHANDLE and NFS4ERR_BAD_STATEID retry until dl_retries
>    hits zero, then mark the connection down and set cb_done
>
> What looks wrong:
>
>  - for everything else we first mark the connection down, then
>    retry until dl_retries hits zero, then mark the connection down
>    again and set cb_done.
>
> From all I can see what we want is:
>
>  - keep the behavior for EBADHANDLE and NFS4ERR_BAD_STATEID,
>    otherwise jump straight to making the connection down
>    and setting cb_done
>
> But maybe I'm missing something?
>
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 17d5441..ed25c58 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -971,24 +971,21 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
>  		return;
>  	switch (task->tk_status) {
>  	case 0:
> -		cb->cb_done = true;
> -		return;
> +		break;
>  	case -EBADHANDLE:
>  	case -NFS4ERR_BAD_STATEID:
>  		/* Race: client probably got cb_recall
>  		 * before open reply granting delegation */
> -		break;
> +		if (dp->dl_retries--) {
> +			rpc_delay(task, 2*HZ);
> +			task->tk_status = 0;
> +			rpc_restart_call_prepare(task);
> +			return;
> +		}
>  	default:
>  		/* Network partition? */
>  		nfsd4_mark_cb_down(clp, task->tk_status);
>  	}
> -	if (dp->dl_retries--) {
> -		rpc_delay(task, 2*HZ);
> -		task->tk_status = 0;
> -		rpc_restart_call_prepare(task);
> -		return;
> -	}
> -	nfsd4_mark_cb_down(clp, task->tk_status);
>  	cb->cb_done = true;
>  }
>
>

We're also missing a handler for NFS4ERR_DELAY, which is listed as a
legal response to CB_RECALL in both RFC5661 and RFC3530bis. As far as
I can tell from the above, knfsd will currently take that to be a sign
it should mark the callback path as being down...

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
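
For illustration only, here is a rough userspace sketch of the decision logic under discussion, with NFS4ERR_DELAY added as a retryable status. It is not a patch against fs/nfsd/nfs4callback.c. The helper cb_recall_action() and the enum cb_action are hypothetical names, the error constants are local stand-ins for the kernel definitions, and treating NFS4ERR_DELAY the same way as the two "race" errors is an assumption rather than something settled in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for illustration; the real values come from kernel headers. */
#define EBADHANDLE		521
#define NFS4ERR_DELAY		10008
#define NFS4ERR_BAD_STATEID	10025

/* Hypothetical outcome of inspecting a CB_RECALL reply. */
enum cb_action {
	CB_DONE,	/* success: just set cb_done */
	CB_RETRY,	/* delay and restart the call */
	CB_MARK_DOWN,	/* mark the callback path down, then set cb_done */
};

/*
 * Hypothetical helper: map a task status to an action. "retries" plays
 * the role of dp->dl_retries and is consumed on each retry.
 */
static enum cb_action cb_recall_action(int status, int *retries)
{
	assert(retries != NULL);

	switch (status) {
	case 0:
		return CB_DONE;
	case -NFS4ERR_DELAY:		/* assumption: retry rather than mark down */
	case -EBADHANDLE:
	case -NFS4ERR_BAD_STATEID:
		if (*retries > 0) {
			(*retries)--;
			return CB_RETRY;
		}
		/* retries exhausted: fall through */
	default:
		/* Network partition? */
		return CB_MARK_DOWN;
	}
}
```

With something along these lines, a caller would invoke rpc_delay() and rpc_restart_call_prepare() only for CB_RETRY, and nfsd4_mark_cb_down() only for CB_MARK_DOWN, so that no status marks the connection down before its retries have been used up.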