Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:54642 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932580AbcG1Qi6 convert rfc822-to-8bit (ORCPT ); Thu, 28 Jul 2016 12:38:58 -0400 From: "Benjamin Coddington" To: "Trond Myklebust" Cc: "hch@infradead.org" , "List Linux NFS Mailing" Subject: Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics Date: Thu, 28 Jul 2016 12:40:44 -0400 Message-ID: <6BB5F196-ED7E-4785-867F-0EE4961CC9BB@redhat.com> In-Reply-To: <55623CCA-4208-49E5-BD5B-67A0B48A677C@primarydata.com> References: <1467844205-76852-23-git-send-email-trond.myklebust@primarydata.com> <1467844205-76852-24-git-send-email-trond.myklebust@primarydata.com> <1467844205-76852-25-git-send-email-trond.myklebust@primarydata.com> <20160718034847.GA1195@infradead.org> <1468817945.5273.2.camel@primarydata.com> <20160719035843.GA24437@infradead.org> <20160721082216.GA15247@infradead.org> <9E2543EF-0442-491E-AB22-3E1667DABBA4@redhat.com> <70375D08-FCA9-4C73-9991-38490C11B171@redhat.com> <8B8BB73D-39A6-4291-A10E-736B0F907459@primarydata.com> <737F0071-AAD2-49B5-9BD0-02DF98049B76@primarydata.com> <10749CC8-F76E-4DBD-9C27-9802475CF8A5@redhat.com> <3448E968-3564-43C1-ADBB-A16C9687B0F3@primarydata.com> <4CA4361B-2501-48AA-8B2C-715E5A3FD35C@redhat.com> <7EEB460E-F197-4B68-BBD1-7EEF16B77A71@redhat.com> <455341C5-F202-4FA7-8A9B-7AA73CFA21CF@primarydata.com> <06BBFA63-02D9-4BDE-A61D-14FE9F4E9E5F@primarydata.com> <4F9280AE-46C2-4D1A-8CC9-F28DCFCE6271@primarydata.com> <4FC77AD6-40D9-48E4-BF08-0EA03334BDC7@primarydata.com> <6E3A8D21-6935-44DA-850B-A2B0E96EE169@redhat.com> <55623CCA-4208-49E5-BD5B-67A0B48A677C@primarydata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Sender: linux-nfs-owner@vger.kernel.org List-ID: On 28 Jul 2016, at 11:36, Trond Myklebust wrote: >> On Jul 28, 2016, at 11:33, Benjamin Coddington >> wrote: >> >> On 28 Jul 2016, at 8:31, Trond Myklebust wrote: >> >>>> On Jul 28, 2016, at 05:47, Benjamin Coddington >>>> wrote: >>>> >>>> >>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote: >>>> >>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington >>>>>> wrote: >>>>>> >>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote: >>>>>> >>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> After adding more debugging, I see that all of that is working >>>>>>>>> correctly, >>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to >>>>>>>>> 4096 from the >>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never >>>>>>>>> brings it back >>>>>>>>> up again. >>>>>>>>> >>>>>>>> >>>>>>>> Excellent! Thanks for debugging that. >>>>>>>> >>>>>>>>> Now I see that we should be marking the block extents as >>>>>>>>> written atomically with >>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a >>>>>>>>> LAYOUTCOMMIT can >>>>>>>>> collect extents just added from the next bl_write_cleanup(). >>>>>>>>> Then, the next >>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from >>>>>>>>> the first >>>>>>>>> LAYOUTCOMMIT. Not sure if that particular problem is the >>>>>>>>> whole fix, but >>>>>>>>> that's something to work on. >>>>>>>>> >>>>>>>>> I see ways to fix that: >>>>>>>>> >>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used >>>>>>>>> to call >>>>>>>>> ext_tree_mark_written() inside the i_lock >>>>>>>>> >>>>>>>>> - make another pnfs_layoutdriver_type operation to be used >>>>>>>>> within >>>>>>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), >>>>>>>>> and call >>>>>>>>> ext_tree_mark_written() within that.. >>>>>>>>> >>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb >>>>>>>>> that would >>>>>>>>> extend the current LAYOUTCOMMIT >>>>>>>>> >>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb >>>>>>>> >>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be >>>>>>>> allowed to extend the args->lastbytewritten. This is a metadata >>>>>>>> operation that is owned by the pNFS layout driver. >>>>>>>> The only thing I’d note is you should then rewrite the >>>>>>>> failure case in pnfs_layoutcommit_inode() so that it doesn’t >>>>>>>> rely on the saved “end_pos”, but uses args->lastbytewritten >>>>>>>> instead (with a comment to the effect why)… >>>>>>> >>>>>>> In fact, given the potential for races here, I think the right >>>>>>> thing to do is to have ext_tree_prepare_commit() always set the >>>>>>> correct value for args->lastbytewritten. >>>>>> >>>>>> OK, that has cleared up that common failure case that was getting >>>>>> in the >>>>>> way, but now it can still fail like this: >>>>>> >>>>> >>>>> Good progress! :-) >>>>> >>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR >>>>>> set, and sets NFS_INO_LAYOUTCOMMIT >>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears >>>>>> layoutcommit flag sets NFS_INO_LAYOUTCOMMITING >>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR >>>>>> set, and sets NFS_INO_LAYOUTCOMMIT >>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, >>>>>> NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING >>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, >>>>>> NFS_INO_INVALID_ATTR not set.. cache is valid >>>>>> 2nd nfs_getattr immediately returns 4096 even though >>>>>> NFS_INO_LAYOUTCOMMIT >>>>> >>>>> Is this being tested on top of the current linux-next/testing? >>>>> Normally, I’d expect >>>>> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed >>>>> to cause 1st nfs_getattr() to not declare the cache valid. >>>> >>>> Yes, this is on your linux-next branch. >>>> >>>> When the 1st nfs_getattr() goes through nfs_update_inode() the >>>> second time >>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by >>>> anything, >>>> since all the attributes returned match the cache. So even though >>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is >>>> "false", >>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local >>>> variable. >>>> >>>> Should pnfs_layoutcommit_outstanding() always set >>>> NFS_INO_INVALID_ATTR? >>>> >>>> Ben >>> >>> nfs_post_op_update_inode_locked() should be doing that as part of >>> the callchain in nfs_writeback_update_inode(). >> >> And it is, and the bit persists through the next layoutcommit, it is >> the next GETATTR response that finds that all the attributes are the >> same and the bit is cleared. >> > > So what if we require that nfsi->cache_validity be set to > save_cache_validity & NFS_INO_INVALID_ATTR at a minimum if > pnfs_layoutcommit_outstanding()? With this, I am unable to reproduce the problem: @@ -1665,7 +1684,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr) unsigned long now = jiffies; unsigned long save_cache_validity; bool have_writers = nfs_file_has_buffered_writers(nfsi); - bool cache_revalidated; + bool cache_revalidated = true; dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n", __func__, inode->i_sb->s_id, inode->i_ino, @@ -1714,8 +1733,10 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr) /* Do atomic weak cache consistency updates */ invalid |= nfs_wcc_update_inode(inode, fattr); - - cache_revalidated = !pnfs_layoutcommit_outstanding(inode); + if (pnfs_layoutcommit_outstanding(inode)) { + nfsi->cache_validity |= save_cache_validity & NFS_INO_INVALID_ATTR; + cache_revalidated = false; + } I'll send these two patches along shortly unless otherwise called off..