From: Jeff Layton Subject: Re: [PATCH 4/4] NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight Date: Fri, 14 Mar 2008 17:39:38 -0400 Message-ID: <20080314173938.48c4f7cb@tleilax.poochiereds.net> References: <1202315653-11840-1-git-send-email-jlayton@redhat.com> <1202315653-11840-2-git-send-email-jlayton@redhat.com> <1202315653-11840-3-git-send-email-jlayton@redhat.com> <1202315653-11840-4-git-send-email-jlayton@redhat.com> <1202315653-11840-5-git-send-email-jlayton@redhat.com> <170fa0d20803141255k41e8e2a3t6a53f4f812a15d7b@mail.gmail.com> <1205529061.27906.86.camel@heimdal.trondhjem.org> <170fa0d20803141421s5c784d16ie1db8e1d57e4b428@mail.gmail.com> <1205530164.27906.100.camel@heimdal.trondhjem.org> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Cc: Mike Snitzer , bfields@fieldses.org, linux-nfs@vger.kernel.org To: Trond Myklebust Return-path: Received: from mx1.redhat.com ([66.187.233.31]:44992 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755531AbYCNVjx (ORCPT ); Fri, 14 Mar 2008 17:39:53 -0400 In-Reply-To: <1205530164.27906.100.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> Sender: linux-nfs-owner@vger.kernel.org List-ID: On Fri, 14 Mar 2008 17:29:24 -0400 Trond Myklebust wrote: > > On Fri, 2008-03-14 at 17:21 -0400, Mike Snitzer wrote: > > On 3/14/08, Trond Myklebust wrote: > > > > > > On Fri, 2008-03-14 at 15:55 -0400, Mike Snitzer wrote: > > > > On 2/6/08, Jeff Layton wrote: > > > > > It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback > > > > > is in flight. If this happens we don't want lockd to insert the block > > > > > back into the nlm_blocked list. > > > > > > > > > > This helps that situation, but there's still a possible race. Fixing > > > > > that will mean adding real locking for nlm_blocked. > > > > > > > > > > Signed-off-by: Jeff Layton > > > > > --- > > > > > fs/lockd/svclock.c | 11 +++++++++++ > > > > > 1 files changed, 11 insertions(+), 0 deletions(-) > > > > > > > > > > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c > > > > > index 82db7b3..fe9bdb4 100644 > > > > > --- a/fs/lockd/svclock.c > > > > > +++ b/fs/lockd/svclock.c > > > > > @@ -795,6 +795,17 @@ static void nlmsvc_grant_callback(struct rpc_task *task, void *data) > > > > > > > > > > dprintk("lockd: GRANT_MSG RPC callback\n"); > > > > > > > > > > + /* if the block is not on a list at this point then it has > > > > > + * been invalidated. Don't try to requeue it. > > > > > + * > > > > > + * FIXME: it's possible that the block is removed from the list > > > > > + * after this check but before the nlmsvc_insert_block. In that > > > > > + * case it will be added back. Perhaps we need better locking > > > > > + * for nlm_blocked? > > > > > + */ > > > > > + if (list_empty(&block->b_list)) > > > > > + return; > > > > > + > > > > > /* Technically, we should down the file semaphore here. Since we > > > > > * move the block towards the head of the queue only, no harm > > > > > * can be done, though. */ > > > > > > > > > > > > > Hi Jeff, > > > > > > > > Would the following patch take care of this race? Two questions I have: > > > > 1) Does my patch address your FIXME? I believe it should. > > > > 2) Can I get rid of the BKL in nlmsvc_grant_deferred() given this > > > > nlm_blocked serialization? It is unclear to me if > > > > nlmsvc_grant_deferred() is taking the BKL purely to protect > > > > nlm_blocked. > > > > > > > > > Ugh... NACK! > > > > > > We already have the file->f_mutex, so there should be no need to add a > > > global semaphore on top of that. > > > > Um, nlm_blocked is global. file->f_mutex is local. What am I missing? > > The _race_ occurs when the block is temporarily removed and replaced on > the list. > The race I had in mind was: rpciod: nlmsvc_grant_callback checks to see if the block is on the list lockd: lockd() removes the block from the list after getting a SIGKILL rpciod: nlmsvc_grant_callback requeues the block with a new timeout ...though now that I've looked, it seems like the BKL should prevent that particular race. > This should, AFAIK, only occur while holding the file->f_mutex. > Similarly, the GRANTED code should be holding the same lock when looking > up the block. > It's not though, is it? I don't think nlmsvc_grant_callback is run under the file->f_mutex (unless I'm missing something). -- Jeff Layton