Received: by 2002:a05:6358:45e:b0:b5:b6eb:e1f9 with SMTP id 30csp2218794rwe; Sun, 28 Aug 2022 06:27:12 -0700 (PDT) X-Google-Smtp-Source: AA6agR6rtH6snTM6r2DU3PLaw/8xr4gTyEsNQyFpNr3rWxTMYqzrlOGISfTAj5VtOZib52Up7lvv X-Received: by 2002:aa7:cc05:0:b0:447:8654:7fa9 with SMTP id q5-20020aa7cc05000000b0044786547fa9mr13251106edt.298.1661693232194; Sun, 28 Aug 2022 06:27:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661693232; cv=none; d=google.com; s=arc-20160816; b=f5rbuaDkThaKrWUi4p5axZ/wz0FYCM7e4R9UgoYMszjcmBwavsYNXc1quLiyF5f+KG keBmlQ97snPiW+IrdQHEGLv4mKEs17n4uwPodpKDXMmRdRfMSIdtNoFzfFKaVH1ic2FY 1g8EVfE1But/Xg1yAWuRRotqwDvouWddThAwq/Vf5fxS5X+iW+E1uaICNWKRkRAbSOLk 4xKTQvmJ20wy6y65yEyWrDNPMtKo306Sz3f5lz7lJSCfvo/QVZxReOL4TVt/Z5TBP7aL 2/CpsucoduxoaETHM7LRPLGCLprOHvDV1tIuQauJ1eLTWZvGxneKB1EOrqghe3SnhdTK EP2g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=jJnPIiLXos1pGZCNNXz0g207shJMFvq2w6EE6u3Sb08=; b=cEK3sR+V+UYQwkyQZN0ozCKQ3/amhG08Lycvq09mOXqgVyJ8f13WEGgA7SOFFIWr5c s5DodeQu/ck5eXqdIcD1fuUkhUiYT4GmsnTNq8WqvWeIe7WFhGVn/m01nwUgoR4+PPU3 66Wz4AjFBcqFxpio5T8zygi93BblpNBECXSB/Af8KOLIOvs7Qn9icwN84u+5yvw1obwB SDI+mgpXorE8LjneBKU9y/aq7BcyFdrR6mhY2LcgmIZe9HesbtDksfBs+tOd49l+1bzg mdwiM0Pts/rTkKG4l+PgbzfRAwHrjTq3zSjpQdh2Snmbc0TZjtrWlxR7K3lTrgZDCnUC 83dA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=kPn9FW50; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id go34-20020a1709070da200b0073094a56feesi5268244ejc.546.2022.08.28.06.26.38; Sun, 28 Aug 2022 06:27:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=kPn9FW50; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229533AbiH1NZn (ORCPT + 99 others); Sun, 28 Aug 2022 09:25:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55512 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229436AbiH1NZm (ORCPT ); Sun, 28 Aug 2022 09:25:42 -0400 Received: from mail-vs1-xe33.google.com (mail-vs1-xe33.google.com [IPv6:2607:f8b0:4864:20::e33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F5032181E; Sun, 28 Aug 2022 06:25:41 -0700 (PDT) Received: by mail-vs1-xe33.google.com with SMTP id k2so5964170vsk.8; Sun, 28 Aug 2022 06:25:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc; bh=jJnPIiLXos1pGZCNNXz0g207shJMFvq2w6EE6u3Sb08=; b=kPn9FW50BqKVOiGds0UgH1bs+xviysP6qmZCqv5tiBST88174BRcUyMuJXQvcfSlcc 42qsWsPkK5amx9cD7ByvWdqpBjE/ra4SiCuwU3zcfsvKuWaoT8Gi5DMhGYRUgOciSOz7 tCJXaTjV4sgf+pZaM58Nb8HqtMcWxfoCGe9pfVfOzHmVyvvggiHWHD4yDDKFv6X7B5uy uY6J7DUBRk6AF4M7p14R2ilmayp8TivWNxZE2WAw1axPWFfpBjQf7Zsp45J9CM1Y4lh8 xA4yiFMZo1xRhr73M7Gh5jY5+ivon96qLczdct6Wb+qaqOB70IhIy3zkVJIhG30eUiHj Xikg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc; bh=jJnPIiLXos1pGZCNNXz0g207shJMFvq2w6EE6u3Sb08=; b=h4j6x37mBIsWQ+q4bwUp73VPG5i62tshPBmwbZQ9P8k6QEZiz5b5uwanniNCcLSXoD WjjAvqW78HjzoklbfndE6RLHzAapx/sKoktATAB56NU659+tEiznBaz/MyRkt6WsH0aB G3e3LcrRHm4pWUspuZ1nvIh8ULhl3Pn7GHOb/y554H6JmpNxf8kx30Bd1PxWpif8bZyJ x0fmDZQJ+OlAqseGh4sHunt5oUTTpZMSckvocOczjfukiMKt9TeL5Nm/sjiTo9JeQdMP ajZk0nUHDkjpM5twrC2MGHi2UKLh8y2gZrrxisIW1tTu6fhkI2JlYBCJUWBfuk6+k995 GxeQ== X-Gm-Message-State: ACgBeo0WoHxXS8sh6oXOwcZFwsWPqxhAbpfGQWAiKGRMOpqkQ9akngKR s0yLDSXz2xHbDWpzcGCTRPKZkWyfE73HMJv3m4M= X-Received: by 2002:a67:a649:0:b0:390:88c5:6a91 with SMTP id r9-20020a67a649000000b0039088c56a91mr2274114vsh.3.1661693139696; Sun, 28 Aug 2022 06:25:39 -0700 (PDT) MIME-Version: 1.0 References: <20220826214703.134870-1-jlayton@kernel.org> <20220826214703.134870-5-jlayton@kernel.org> <35d31d0a5c6c9a20c58f55ef62355ff39a3f18c6.camel@kernel.org> <079df2134120f847e8237675a8cc227d6354a153.camel@hammerspace.com> In-Reply-To: From: Amir Goldstein Date: Sun, 28 Aug 2022 16:25:28 +0300 Message-ID: Subject: Re: [PATCH v3 4/7] xfs: don't bump the i_version on an atime update in xfs_vn_update_time To: Jeff Layton Cc: Trond Myklebust , "djwong@kernel.org" , "zohar@linux.ibm.com" , "brauner@kernel.org" , "xiubli@redhat.com" , "neilb@suse.de" , "linux-api@vger.kernel.org" , "linux-xfs@vger.kernel.org" , "dwysocha@redhat.com" , "david@fromorbit.com" , "linux-kernel@vger.kernel.org" , "chuck.lever@oracle.com" , "linux-nfs@vger.kernel.org" , "tytso@mit.edu" , "viro@zeniv.linux.org.uk" , "jack@suse.cz" , "linux-ext4@vger.kernel.org" , "linux-btrfs@vger.kernel.org" , "linux-fsdevel@vger.kernel.org" , "lczerner@redhat.com" , "adilger.kernel@dilger.ca" , "ceph-devel@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Sat, Aug 27, 2022 at 7:10 PM Jeff Layton wrote: > > On Sat, 2022-08-27 at 16:03 +0000, Trond Myklebust wrote: > > On Sat, 2022-08-27 at 08:46 -0700, Darrick J. Wong wrote: > > > On Sat, Aug 27, 2022 at 09:14:30AM -0400, Jeff Layton wrote: > > > > On Sat, 2022-08-27 at 11:01 +0300, Amir Goldstein wrote: > > > > > On Sat, Aug 27, 2022 at 10:26 AM Amir Goldstein > > > > > wrote: > > > > > > > > > > > > On Sat, Aug 27, 2022 at 12:49 AM Jeff Layton > > > > > > wrote: > > > > > > > > > > > > > > xfs will update the i_version when updating only the atime > > > > > > > value, which > > > > > > > is not desirable for any of the current consumers of > > > > > > > i_version. Doing so > > > > > > > leads to unnecessary cache invalidations on NFS and extra > > > > > > > measurement > > > > > > > activity in IMA. > > > > > > > > > > > > > > Add a new XFS_ILOG_NOIVER flag, and use that to indicate that > > > > > > > the > > > > > > > transaction should not update the i_version. Set that value > > > > > > > in > > > > > > > xfs_vn_update_time if we're only updating the atime. > > > > > > > > > > > > > > Cc: Dave Chinner > > > > > > > Cc: NeilBrown > > > > > > > Cc: Trond Myklebust > > > > > > > Cc: David Wysochanski > > > > > > > Signed-off-by: Jeff Layton > > > > > > > --- > > > > > > > fs/xfs/libxfs/xfs_log_format.h | 2 +- > > > > > > > fs/xfs/libxfs/xfs_trans_inode.c | 2 +- > > > > > > > fs/xfs/xfs_iops.c | 11 +++++++++-- > > > > > > > 3 files changed, 11 insertions(+), 4 deletions(-) > > > > > > > > > > > > > > Dave has NACK'ed this patch, but I'm sending it as a way to > > > > > > > illustrate > > > > > > > the problem. I still think this approach should at least fix > > > > > > > the worst > > > > > > > problems with atime updates being counted. We can look to > > > > > > > carve out > > > > > > > other "spurious" i_version updates as we identify them. > > > > > > > > > > > > > > > > > > > AFAIK, "spurious" is only inode blocks map changes due to > > > > > > writeback > > > > > > of dirty pages. Anybody know about other cases? > > > > > > > > > > > > Regarding inode blocks map changes, first of all, I don't think > > > > > > that there is > > > > > > any practical loss from invalidating NFS client cache on dirty > > > > > > data writeback, > > > > > > because NFS server should be serving cold data most of the > > > > > > time. > > > > > > If there are a few unneeded cache invalidations they would only > > > > > > be temporary. > > > > > > > > > > > > > > > > Unless there is an issue with a writer NFS client that > > > > > invalidates its > > > > > own attribute > > > > > caches on server data writeback? > > > > > > > > > > > > > The client just looks at the file attributes (of which i_version is > > > > but > > > > one), and if certain attributes have changed (mtime, ctime, > > > > i_version, > > > > etc...) then it invalidates its cache. > > > > > > > > In the case of blocks map changes, could that mean a difference in > > > > the > > > > observable sparse regions of the file? If so, then a READ_PLUS > > > > before > > > > the change and a READ_PLUS after could give different results. > > > > Since > > > > that difference is observable by the client, I'd think we'd want to > > > > bump > > > > i_version for that anyway. > > > > > > How /is/ READ_PLUS supposed to detect sparse regions, anyway? I know > > > that's been the subject of recent debate. At least as far as XFS is > > > concerned, a file range can go from hole -> delayed allocation > > > reservation -> unwritten extent -> (actual writeback) -> written > > > extent. > > > The dance became rather more complex when we added COW. If any of > > > that > > > will make a difference for READ_PLUS, then yes, I think you'd want > > > file > > > writeback activities to bump iversion to cause client invalidations, > > > like (I think) Dave said. > > > > > > The fs/iomap/ implementation of SEEK_DATA/SEEK_HOLE reports data for > > > written and delalloc extents; and an unwritten extent will report > > > data > > > for any pagecache it finds. > > > > > > > READ_PLUS should never return anything different than a read() system > > call would return for any given area. The way it reports sparse regions > > vs. data regions is purely an RPC formatting convenience. > > > > The only point to note about NFS READ and READ_PLUS is that because the > > client is forced to send multiple RPC calls if the user is trying to > > read a region that is larger than the 'rsize' value, it is possible > > that these READ/READ_PLUS calls may be processed out of order, and so > > the result may end up looking different than if you had executed a > > read() call for the full region directly on the server. > > However each individual READ / READ_PLUS reply should look as if the > > user had called read() on that rsize-sized section of the file. > > > > > > Yeah, thinking about it some more, simply changing the block allocation > is not something that should affect the ctime, so we probably don't want > to bump i_version on it. It's an implicit change, IOW, not an explicit > one. > > The fact that xfs might do that is unfortunate, but it's not the end of > the world and it still would conform to the proposed definition for > i_version. In practice, this sort of allocation change should come soon > after the file was written, so one would hope that any damage due to the > false i_version bump would be minimized. > That was exactly my point. > It would be nice to teach it not to do that however. Maybe we can insert > the NOIVER flag at a strategic place to avoid it? Why would that be nice to avoid? You did not specify any use case where incrementing i_version on block mapping change matters in practice. On the contrary, you said that NFS client writer sends COMMIT on close, which should stabilize i_version for the next readers. Given that we already have an xfs implementation that does increment i_version on block mapping changes and it would be a pain to change that or add a new user options, I don't see the point in discussing it further unless there is a good incentive for avoiding i_version updates in those cases. Thanks, Amir.