Return-Path: Received: from mail-vw0-f42.google.com ([209.85.212.42]:42552 "EHLO mail-vw0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752645Ab1IPP24 convert rfc822-to-8bit (ORCPT ); Fri, 16 Sep 2011 11:28:56 -0400 Received: by vwl1 with SMTP id 1so7220596vwl.1 for ; Fri, 16 Sep 2011 08:28:54 -0700 (PDT) In-Reply-To: <1316185444.5570.3.camel@lade.trondhjem.org> References: <1315802900-1548-1-git-send-email-bergwolf@gmail.com> <1316108327.5271.4.camel@lade.trondhjem.org.localdomain> <1316108897.5271.6.camel@lade.trondhjem.org.localdomain> <1316185444.5570.3.camel@lade.trondhjem.org> From: Peng Tao Date: Fri, 16 Sep 2011 23:28:34 +0800 Message-ID: Subject: Re: [PATCH] nfs4: serialize layoutcommit To: Trond Myklebust Cc: tao.peng@emc.com, bhalevy@tonian.com, gusev.vitaliy@nexenta.com, linux-nfs@vger.kernel.org Content-Type: text/plain; charset=UTF-8 Sender: linux-nfs-owner@vger.kernel.org List-ID: MIME-Version: 1.0 On Fri, Sep 16, 2011 at 11:04 PM, Trond Myklebust wrote: > On Fri, 2011-09-16 at 07:58 -0400, tao.peng@emc.com wrote: >> Hi, Trond, >> >> > -----Original Message----- >> > From: Trond Myklebust [mailto:Trond.Myklebust@netapp.com] >> > Sent: Friday, September 16, 2011 1:48 AM >> > To: Peng Tao >> > Cc: bhalevy@tonian.com; Vitaliy Gusev; linux-nfs@vger.kernel.org; Peng, Tao >> > Subject: Re: [PATCH] nfs4: serialize layoutcommit >> > >> > On Thu, 2011-09-15 at 13:38 -0400, Trond Myklebust wrote: >> > > On Sun, 2011-09-11 at 21:48 -0700, Peng Tao wrote: >> > > > Current pnfs_layoutcommit_inode can not handle parallel layoutcommit. >> > > > As Trond suggested, there is no need for client to optimize for >> > > > parallel layoutcommit. The patch add NFS_INO_LAYOUTCOMMITTING flag to >> > > > mark inflight layoutcommit and serialize lalyoutcommit with it. >> > > > It also fixes the pls_lc_list corruption that Vitaliy found. >> > > > >> > > > Reported-by: Vitaliy Gusev >> > > > Signed-off-by: Peng Tao >> > > > --- >> > > >  fs/nfs/nfs4proc.c      |    6 ++++++ >> > > >  fs/nfs/pnfs.c          |    9 +++++++++ >> > > >  include/linux/nfs_fs.h |    1 + >> > > >  3 files changed, 16 insertions(+), 0 deletions(-) >> > > > >> > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >> > > > index 4700fae..a7ce210 100644 >> > > > --- a/fs/nfs/nfs4proc.c >> > > > +++ b/fs/nfs/nfs4proc.c >> > > > @@ -5970,6 +5970,7 @@ static void nfs4_layoutcommit_release(void *calldata) >> > > >  { >> > > >         struct nfs4_layoutcommit_data *data = calldata; >> > > >         struct pnfs_layout_segment *lseg, *tmp; >> > > > +       unsigned long *bitlock = &NFS_I(data->args.inode)->flags; >> > > > >> > > >         pnfs_cleanup_layoutcommit(data); >> > > >         /* Matched by references in pnfs_set_layoutcommit */ >> > > > @@ -5979,6 +5980,11 @@ static void nfs4_layoutcommit_release(void >> > *calldata) >> > > >                                        &lseg->pls_flags)) >> > > >                         put_lseg(lseg); >> > > >         } >> > > > + >> > > > +       clear_bit_unlock(NFS_INO_LAYOUTCOMMITTING, bitlock); >> > > > +       smp_mb__after_clear_bit(); >> > > > +       wake_up_bit(bitlock, NFS_INO_LAYOUTCOMMITTING); >> > > > + >> > > >         put_rpccred(data->cred); >> > > >         kfree(data); >> > > >  } >> > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c >> > > > index b483bbc..fb71def 100644 >> > > > --- a/fs/nfs/pnfs.c >> > > > +++ b/fs/nfs/pnfs.c >> > > > @@ -1451,10 +1451,19 @@ pnfs_layoutcommit_inode(struct inode *inode, >> > bool sync) >> > > >                 goto out; >> > > >         } >> > > > >> > > > +       if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags) || >> > > > +           (status = wait_on_bit_lock(&nfsi->flags, >> > NFS_INO_LAYOUTCOMMITTING, >> > > > +                               nfs_wait_bit_killable, TASK_KILLABLE))) { >> > > >> > > You want the sleeping behaviour above to be subject to the 'sync' flag >> > > (in the same way we do for regular commit). If !sync, then try to grab >> > > the bit lock anyway, and exit on failure. >> I'm a little confused about the "!sync" here. If we return failure, what do we expect callers to do? >> Looking at the calling stack, I don't see any users retrying on -EAGAIN. Does the "!sync" necessarily mean non-blocking? >> >> > >> > By the way. Shouldn't pnfs_layoutcommit_inode() also be marking the >> > inode as dirty on failure? >> Yeah, right, we should mark inode as dirty on failure. > > The !sync means that we're being called from a context such as kswapd > where we really don't want to sleep or block the thread. > > In the case where this happens, and the NFS_INO_LAYOUTCOMMITTING bit > can't be set, we should exit and mark the inode as dirty so that we can > retry at a later time. Please see the 'commit' code, which has the same > properties... I see. Thanks for the explanation. I will update the patch. Best, Tao