From: Fred Isaman
To: Trond Myklebust
Cc: Benny Halevy, "William A. (Andy) Adamson", NFS list
Subject: Re: [PATCH 11/12] NFSv4.1: layoutcommit
Date: Thu, 24 Mar 2011 12:54:52 -0400
In-Reply-To: <1300985305.31106.6.camel@lade.trondhjem.org>
References: <4D8B732F.8020404@panasas.com> <1300985305.31106.6.camel@lade.trondhjem.org>

On Thu, Mar 24, 2011 at 12:48 PM, Trond Myklebust wrote:
> On Thu, 2011-03-24 at 18:37 +0200, Benny Halevy wrote:
>> On 2011-03-24 15:57, William A. (Andy) Adamson wrote:
>> >>> Only whole file layout support means that there is only one IOMODE_RW layout
>> >>> segment.
>> >>>
>> >>> Signed-off-by: Andy Adamson
>> >>> Signed-off-by: Alexandros Batsakis
>> >>> Signed-off-by: Boaz Harrosh
>> >>> Signed-off-by: Dean Hildebrand
>> >>> Signed-off-by: Fred Isaman
>> >>> Signed-off-by: Mingyang Guo
>> >>> Signed-off-by: Tao Guo
>> >>> Signed-off-by: Zhang Jingwang
>> >>> Tested-by: Boaz Harrosh
>> >>> Signed-off-by: Benny Halevy
>> >>
>> >> The code in this patch is new and different enough from the one I/we
>> >> signed-off originally that they don't make sense here.
>> >
>> > Hi Benny
>> >
>> > OK with me
>> >
>> >>>
>> >>> +               /* references matched in nfs4_layoutcommit_release */
>> >>> +               wdata->lseg->pls_lc_cred =
>> >>> +                       get_rpccred(wdata->args.context->state->owner->so_cred);
>> >>> +               mark_inode_dirty_sync(wdata->inode);
>> >>> +               dprintk("%s: Set layoutcommit for inode %lu ",
>> >>> +                       __func__, wdata->inode->i_ino);
>> >>> +       }
>> >>> +       if (end_pos > wdata->lseg->pls_end_pos)
>> >>> +               wdata->lseg->pls_end_pos = end_pos;
>> >>
>> >> The end_pos is essentially per inode, why maintain it per lseg?
>> >> How do you see this working with multiple lsegs in mind?
>> >
>> > The end-pos is per lseg, not per inode - each layoutcommit applies to
>> > a range of WRITES for a layoutsegment over the LAYOUTCOMMIT range.
>> >
>> > From Section 18.42.3
>> > .  The byte-range being committed is
>> >    specified through the byte-range (loca_offset and loca_length).  This
>> >    byte-range MUST overlap with one or more existing layouts previously
>> >    granted via LAYOUTGET
>> >
>> >
>> >    Also, loca_last_write_offset MUST overlap the range
>> >    described by loca_offset and loca_length.
>> >
>> > For the multiple lseg case: if the lsegs are merged, bookkeeping
>> > end_pos per lseg just works. If a layoutdriver does not use merged
>> > lsegs, then there is a bit of work to do to walk the list of lsegs and
>> > determine the final end_pos for a given LAYOUTCOMMIT.  If there are
>> > multiple non-contiguous lsegs, each used for WRITEs then multiple
>> > LAYOUTCOMMITs will need to be sent, otherwise the LAYOUTCOMMIT
>> > byte-range will not overlap as required.
>> >
>>
>> For the current layout types I believe that the LAYOUTCOMMIT can "merge"
>> multiple layout segments into a single LAYOUTCOMMIT, with a byte range
>> covering all segments and a last_byte_written offset which is just the maximum.
>> Future layout types may need this method though...
>
> Is that safe?
>
> What if I'm doing blocks and have written layout segment 1 & 3, but not
> layout segment 2? I don't want to have the MDS commit layout segment 2,
> and make the (lack of) data there visible to future readers.
>

No, it is not safe.  Avoiding this problem is one of the major reasons
for putting the bookkeeping in the lseg.

Fred
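
To make the per-lseg argument concrete, here is a minimal standalone sketch of the bookkeeping discussed above. It is not the code from the patch: struct lseg_sketch, struct commit_sketch, record_write() and build_commits() are hypothetical names invented for illustration, and the convention that end_pos is one past the last byte written (with 0 meaning "never written") is an assumption of the sketch. It shows that with segments 1 and 3 written and segment 2 untouched, tracking end_pos per segment yields two separate commit ranges, neither of which covers segment 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a layout segment (not the kernel's struct). */
struct lseg_sketch {
    uint64_t offset;   /* first byte covered by the segment */
    uint64_t length;   /* number of bytes covered */
    uint64_t end_pos;  /* assumed: one past the last byte written; 0 = none */
};

/* Hypothetical commit arguments, modelled on loca_offset, loca_length
 * and loca_last_write_offset from the quoted spec text. */
struct commit_sketch {
    uint64_t offset;
    uint64_t length;
    uint64_t last_write_offset;
};

/* Record a completed write against the segment it was issued through,
 * keeping only the highest end position seen so far. */
static void record_write(struct lseg_sketch *seg, uint64_t pos, uint64_t count)
{
    uint64_t end_pos = pos + count;

    /* The write must fall inside the segment it is charged to. */
    assert(pos >= seg->offset && end_pos <= seg->offset + seg->length);
    if (end_pos > seg->end_pos)
        seg->end_pos = end_pos;
}

/* Emit one commit per written segment. Segments that were never written
 * are skipped, so no commit range ever covers them. Returns the number
 * of entries filled in out[], which must have room for nsegs entries. */
static size_t build_commits(const struct lseg_sketch *segs, size_t nsegs,
                            struct commit_sketch *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nsegs; i++) {
        if (segs[i].end_pos == 0)
            continue;  /* untouched segment: leave it uncommitted */
        out[n].offset = segs[i].offset;
        out[n].length = segs[i].length;
        out[n].last_write_offset = segs[i].end_pos - 1;
        n++;
    }
    return n;
}
```

For three adjacent 4096-byte segments where only the first and third receive writes, build_commits() returns two entries. A single merged range spanning all three would instead include the unwritten middle segment, which is the case Trond asks about.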