Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:41742 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758895AbcG1Pbh convert rfc822-to-8bit (ORCPT ); Thu, 28 Jul 2016 11:31:37 -0400 From: "Benjamin Coddington" To: "Trond Myklebust" Cc: "hch@infradead.org" , "List Linux NFS Mailing" Subject: Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics Date: Thu, 28 Jul 2016 11:33:23 -0400 Message-ID: <6E3A8D21-6935-44DA-850B-A2B0E96EE169@redhat.com> In-Reply-To: <4FC77AD6-40D9-48E4-BF08-0EA03334BDC7@primarydata.com> References: <1467844205-76852-23-git-send-email-trond.myklebust@primarydata.com> <1467844205-76852-24-git-send-email-trond.myklebust@primarydata.com> <1467844205-76852-25-git-send-email-trond.myklebust@primarydata.com> <20160718034847.GA1195@infradead.org> <1468817945.5273.2.camel@primarydata.com> <20160719035843.GA24437@infradead.org> <20160721082216.GA15247@infradead.org> <9E2543EF-0442-491E-AB22-3E1667DABBA4@redhat.com> <70375D08-FCA9-4C73-9991-38490C11B171@redhat.com> <8B8BB73D-39A6-4291-A10E-736B0F907459@primarydata.com> <737F0071-AAD2-49B5-9BD0-02DF98049B76@primarydata.com> <10749CC8-F76E-4DBD-9C27-9802475CF8A5@redhat.com> <3448E968-3564-43C1-ADBB-A16C9687B0F3@primarydata.com> <4CA4361B-2501-48AA-8B2C-715E5A3FD35C@redhat.com> <7EEB460E-F197-4B68-BBD1-7EEF16B77A71@redhat.com> <455341C5-F202-4FA7-8A9B-7AA73CFA21CF@primarydata.com> <06BBFA63-02D9-4BDE-A61D-14FE9F4E9E5F@primarydata.com> <4F9280AE-46C2-4D1A-8CC9-F28DCFCE6271@primarydata.com> <4FC77AD6-40D9-48E4-BF08-0EA03334BDC7@primarydata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Sender: linux-nfs-owner@vger.kernel.org List-ID: On 28 Jul 2016, at 8:31, Trond Myklebust wrote: >> On Jul 28, 2016, at 05:47, Benjamin Coddington >> wrote: >> >> >> On 27 Jul 2016, at 14:05, Trond Myklebust wrote: >> >>>> On Jul 27, 2016, at 12:14, Benjamin Coddington >>>> wrote: >>>> >>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote: >>>> >>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust >>>>>> wrote: >>>>>> >>>>>> >>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington >>>>>>> wrote: >>>>>>> >>>>>>> After adding more debugging, I see that all of that is working >>>>>>> correctly, >>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 >>>>>>> from the >>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never >>>>>>> brings it back >>>>>>> up again. >>>>>>> >>>>>> >>>>>> Excellent! Thanks for debugging that. >>>>>> >>>>>>> Now I see that we should be marking the block extents as written >>>>>>> atomically with >>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a >>>>>>> LAYOUTCOMMIT can >>>>>>> collect extents just added from the next bl_write_cleanup(). >>>>>>> Then, the next >>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the >>>>>>> first >>>>>>> LAYOUTCOMMIT. Not sure if that particular problem is the whole >>>>>>> fix, but >>>>>>> that's something to work on. >>>>>>> >>>>>>> I see ways to fix that: >>>>>>> >>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to >>>>>>> call >>>>>>> ext_tree_mark_written() inside the i_lock >>>>>>> >>>>>>> - make another pnfs_layoutdriver_type operation to be used >>>>>>> within >>>>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), >>>>>>> and call >>>>>>> ext_tree_mark_written() within that.. >>>>>>> >>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that >>>>>>> would >>>>>>> extend the current LAYOUTCOMMIT >>>>>>> >>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb >>>>>> >>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be >>>>>> allowed to extend the args->lastbytewritten. This is a metadata >>>>>> operation that is owned by the pNFS layout driver. >>>>>> The only thing I’d note is you should then rewrite the failure >>>>>> case in pnfs_layoutcommit_inode() so that it doesn’t rely on >>>>>> the saved “end_pos”, but uses args->lastbytewritten instead >>>>>> (with a comment to the effect why)… >>>>> >>>>> In fact, given the potential for races here, I think the right >>>>> thing to do is to have ext_tree_prepare_commit() always set the >>>>> correct value for args->lastbytewritten. >>>> >>>> OK, that has cleared up that common failure case that was getting >>>> in the >>>> way, but now it can still fail like this: >>>> >>> >>> Good progress! :-) >>> >>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR >>>> set, and sets NFS_INO_LAYOUTCOMMIT >>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears >>>> layoutcommit flag sets NFS_INO_LAYOUTCOMMITING >>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR >>>> set, and sets NFS_INO_LAYOUTCOMMIT >>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, >>>> NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING >>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, >>>> NFS_INO_INVALID_ATTR not set.. cache is valid >>>> 2nd nfs_getattr immediately returns 4096 even though >>>> NFS_INO_LAYOUTCOMMIT >>> >>> Is this being tested on top of the current linux-next/testing? >>> Normally, I’d expect >>> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed >>> to cause 1st nfs_getattr() to not declare the cache valid. >> >> Yes, this is on your linux-next branch. >> >> When the 1st nfs_getattr() goes through nfs_update_inode() the second >> time >> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by >> anything, >> since all the attributes returned match the cache. So even though >> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is >> "false", >> the NFS_INO_INVALID_ATTR is never set in the "invalid" local >> variable. >> >> Should pnfs_layoutcommit_outstanding() always set >> NFS_INO_INVALID_ATTR? >> >> Ben > > nfs_post_op_update_inode_locked() should be doing that as part of the > callchain in nfs_writeback_update_inode(). And it is, and the bit persists through the next layoutcommit, it is the next GETATTR response that finds that all the attributes are the same and the bit is cleared.