Return-Path: Received: from mail-vw0-f46.google.com ([209.85.212.46]:64842 "EHLO mail-vw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757547Ab1ILOtL (ORCPT ); Mon, 12 Sep 2011 10:49:11 -0400 Received: by vws1 with SMTP id 1so53614vws.19 for ; Mon, 12 Sep 2011 07:49:10 -0700 (PDT) In-Reply-To: <1315592430.17611.15.camel@lade.trondhjem.org> References: <1314512558-16912-1-git-send-email-gusev.vitaliy@nexenta.com> <1315337382.16274.7.camel@lade.trondhjem.org> <4E669B21.30006@nexenta.com> <1315348373.19556.22.camel@lade.trondhjem.org> <2E1EB2CF9ED1CB4AA966F0EB76EAB4430B0ED3DF@SACMVEXC2-PRD.hq.netapp.com> <2E1EB2CF9ED1CB4AA966F0EB76EAB4430B0ED4C8@SACMVEXC2-PRD.hq.netapp.com> <1315592430.17611.15.camel@lade.trondhjem.org> From: Peng Tao Date: Mon, 12 Sep 2011 22:48:50 +0800 Message-ID: Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release To: Trond Myklebust Cc: tao.peng@emc.com, gusev.vitaliy@nexenta.com, gusev.vitaliy@gmail.com, linux-nfs@vger.kernel.org Content-Type: text/plain; charset=UTF-8 Sender: linux-nfs-owner@vger.kernel.org List-ID: MIME-Version: 1.0 Hi, Trond, On Sat, Sep 10, 2011 at 2:20 AM, Trond Myklebust wrote: > On Thu, 2011-09-08 at 23:11 -0400, tao.peng@emc.com wrote: >> HI, Trond, >> >> > -----Original Message----- >> > From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com] >> > Sent: Friday, September 09, 2011 1:05 AM >> > To: Peng Tao >> > Cc: Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; >> > linux-nfs@vger.kernel.org >> > Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release >> > >> > > -----Original Message----- >> > > From: Peng Tao [mailto:bergwolf@gmail.com] >> > > Sent: Thursday, September 08, 2011 11:00 AM >> > > To: Myklebust, Trond >> > > Cc: tao.peng@emc.com; gusev.vitaliy@nexenta.com; >> > > gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org >> > > Subject: Re: [PATCH] nfs: fix inifinite loop at >> > > nfs4_layoutcommit_release >> > > >> > > On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond >> > > wrote: >> > > >> -----Original Message----- >> > > >> > > > >> > > > Yes, but as far as I can see, even in the blocks case there can be >> > > multiple extents per layout segment. What if I write to one >> > > uninitialised extent, layoutcommit, then write to another uninitialized >> > > extent in the same layout segment and layoutcommit? In my reading of >> > > the code, there is a chance that the second layoutcommit will fail to >> > > pick up the layout segment, and so will fail to notify the MDS that the >> > > second extent now contains data. >> > > >> > > blocklayout does not decide what to layoutcommit according to the lseg >> > > list. Instead, it keeps track of each extent's state at the >> > > granularity of blocksize, and encode whatever needs layoutcommitted in >> > > the layoutcommit call. So in your above case, as long as the second >> > > layoutcommit is issued, blocklayout will encode the newly written >> > > extent in the second layoutcommit call, even if the lseg is not >> > > attached to the second layoutcommit. >> > > >> > > But that leads to another two question: >> > > 1. How do we protect against layoutrecall if lseg is not linked to >> > > layoutcommit? For this one, can we just reject layoutrecall if there >> > > is inflight layoutcommit? It will be less parallel but can guarantee >> > > current implementation correctness. >> > > 2. blocklayout ONLY: bl_committing may be overloaded by several >> > > layoutcommit calls and we don't have information in >> > > cleanup_layoutcommit() on how many entry should be removed from >> > > bl_committing. Maybe we can add a (void*) to struct >> > > nfs4_layoutcommit_data, so that LD can pass some private information >> > > between encode_layoutcommit() and cleanup_layoutcommit()? >> > >> > 3. What is the purpose of pinning the layout segment at all if neither blocks, nor >> > objects nor files cares? >> I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected. >> >> > IOW: if both objects and blocks track the information that they need for >> > layoutcommit outside the layout segments, then why do we need the >> > NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list at all? >> If we remove NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list, we must find other method to protect agains freeing lseg while layoutcommit is needed or is going on. e.g., check for NFS_INO_LAYOUTCOMMIT bit and inflight layoutcommit in initiate_file_draining(). > > The easiest solution is to ensure we have only _one_ LAYOUTCOMMIT on the > wire at a time. Otherwise, you are looking at a many-to-many mapping > between layoutcommits and lsegs. > > We should not expect to need multiple layoutcommits on the wire if pNFS > is working smoothly (i.e. no layout recalls), so optimising for that > case is wrong. > I'd also think that we want to order layoutcommit and ordinary commits > (for those NFS files servers that require both) so that we don't end up > writing a file size change to disk before the actual data is committed. > > So why not just protect the layoutcommit call using the existing > nfs_commit_set_lock/nfs_commit_clear_lock? Completely agree. We should serialize layoutcommit instead of trying to handle multiple of them concurrently. -- Thanks, -Bergwolf