Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:39088 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754160AbcG0QMs (ORCPT ); Wed, 27 Jul 2016 12:12:48 -0400 From: "Benjamin Coddington" To: "Trond Myklebust" Cc: "hch@infradead.org" , "List Linux NFS Mailing" Subject: Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics Date: Wed, 27 Jul 2016 12:14:32 -0400 Message-ID: In-Reply-To: <06BBFA63-02D9-4BDE-A61D-14FE9F4E9E5F@primarydata.com> References: <1467844205-76852-23-git-send-email-trond.myklebust@primarydata.com> <1467844205-76852-24-git-send-email-trond.myklebust@primarydata.com> <1467844205-76852-25-git-send-email-trond.myklebust@primarydata.com> <20160718034847.GA1195@infradead.org> <1468817945.5273.2.camel@primarydata.com> <20160719035843.GA24437@infradead.org> <20160721082216.GA15247@infradead.org> <9E2543EF-0442-491E-AB22-3E1667DABBA4@redhat.com> <70375D08-FCA9-4C73-9991-38490C11B171@redhat.com> <8B8BB73D-39A6-4291-A10E-736B0F907459@primarydata.com> <737F0071-AAD2-49B5-9BD0-02DF98049B76@primarydata.com> <10749CC8-F76E-4DBD-9C27-9802475CF8A5@redhat.com> <3448E968-3564-43C1-ADBB-A16C9687B0F3@primarydata.com> <4CA4361B-2501-48AA-8B2C-715E5A3FD35C@redhat.com> <7EEB460E-F197-4B68-BBD1-7EEF16B77A71@redhat.com> <455341C5-F202-4FA7-8A9B-7AA73CFA21CF@primarydata.com> <06BBFA63-02D9-4BDE-A61D-14FE9F4E9E5F@primarydata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Sender: linux-nfs-owner@vger.kernel.org List-ID: On 27 Jul 2016, at 8:31, Trond Myklebust wrote: >> On Jul 27, 2016, at 08:15, Trond Myklebust >> wrote: >> >> >>> On Jul 27, 2016, at 07:55, Benjamin Coddington >>> wrote: >>> >>> After adding more debugging, I see that all of that is working >>> correctly, >>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from >>> the >>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings >>> it back >>> up again. >>> >> >> Excellent! Thanks for debugging that. >> >>> Now I see that we should be marking the block extents as written >>> atomically with >>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a >>> LAYOUTCOMMIT can >>> collect extents just added from the next bl_write_cleanup(). Then, >>> the next >>> LAYOUTCOMMIT fails, and all we're left with is the size from the >>> first >>> LAYOUTCOMMIT. Not sure if that particular problem is the whole fix, >>> but >>> that's something to work on. >>> >>> I see ways to fix that: >>> >>> - make a new pnfs_set_layoutcommit_locked() that can be used to >>> call >>> ext_tree_mark_written() inside the i_lock >>> >>> - make another pnfs_layoutdriver_type operation to be used within >>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), >>> and call >>> ext_tree_mark_written() within that.. >>> >>> - have .prepare_layoutcommit return a new positive plh_lwb that >>> would >>> extend the current LAYOUTCOMMIT >>> >>> - make ext_tree_prepare_commit only encode up to plh_lwb >> >> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed >> to extend the args->lastbytewritten. This is a metadata operation >> that is owned by the pNFS layout driver. >> The only thing I’d note is you should then rewrite the failure case >> in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved >> “end_pos”, but uses args->lastbytewritten instead (with a comment >> to the effect why)… > > In fact, given the potential for races here, I think the right thing > to do is to have ext_tree_prepare_commit() always set the correct > value for args->lastbytewritten. OK, that has cleared up that common failure case that was getting in the way, but now it can still fail like this: nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT Ben