Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp2059220rwr; Fri, 21 Apr 2023 03:48:54 -0700 (PDT) X-Google-Smtp-Source: AKy350aWEF3HYzNIcGrQAtUZ+nj8wMuNeLQjgofv661LXXPi/yG4EkfDllN7x/tY27VwkwUaR5eC X-Received: by 2002:a05:6a00:1496:b0:63a:fb57:63c5 with SMTP id v22-20020a056a00149600b0063afb5763c5mr4844686pfu.3.1682074134392; Fri, 21 Apr 2023 03:48:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682074134; cv=none; d=google.com; s=arc-20160816; b=LOsa4Lw8Tb3E2pP17YNrzomUdGE2tNqL+PoDtGnO2Lz25B0lPdCKtfGhHXeanvG9oL xRIxUKujp4LfjqSxBiNe7o46o9HKWhDEBJkWsCCnOU+mfH/T6OrhWtn9cekVEGrtVTTt nCgQGBF666b8iH4Sn0WIy/1gGUy5NOG9x7uFB7D7lY+GFDvix0fupkywwuQUukrEAnbC V9uzZPCNBN/l71UAxWUIAiBOp7gghs3CdEWcMorgd16mD/ujrQRVQlKhof3Rgmrgqn/J +12WyN/6g4nZXJZ/Ko8cEdO/hCNaLMRzPKQLAJwPGdyl/puUL3Lu6Osalo4EHIFfn+lP GpWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent :content-transfer-encoding:references:in-reply-to:date:cc:to:from :subject:message-id:dkim-signature; bh=9ig1G3snkksicJsGjEgSk5CSrNqkU8xAMVaTUJN5b3Q=; b=v73j/8YG+/aWKgH9Fu90BB37hjCQwjwfQw0vpCcqBNL0LiGEJoFcTvvHX6ZdWCNxo1 WPPm/T0p8Xe3DmZC18r4DnA/6K8BfbSQrn8ehFez2agBYinzmnwxo0QFxoI3ULH91lVv fvqGjlFVna5oG9kuhRI1xEwPum4ba5TV2t08ANSgsO/QCYWc194PeWV5Rs97a1oB79LD AOl4Bgi3uAK1SOWum1f7fEmj1gi1FwNo3qAem+2y5QcTNlQEvsipLdSvuiXOuWUaRD0E n1oOfIPNTvOuBHEMIK0KqsQouUVov8F4AKzuP+DMzkPTRil4Y3LfCTRbS0KTpYeFrsPi V41Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Hjutz1fv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a6-20020aa795a6000000b0063d28a06b1asi4196789pfk.112.2023.04.21.03.48.35; Fri, 21 Apr 2023 03:48:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Hjutz1fv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231483AbjDUKsC (ORCPT + 99 others); Fri, 21 Apr 2023 06:48:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39228 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230331AbjDUKsB (ORCPT ); Fri, 21 Apr 2023 06:48:01 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C0D283D1; Fri, 21 Apr 2023 03:47:59 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1726460B6F; Fri, 21 Apr 2023 10:47:59 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 66A37C433EF; Fri, 21 Apr 2023 10:47:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1682074078; bh=YMiCd2tn+Kwhq5vt740oxHkL6CrclEtNSKXXgkMDZzc=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=Hjutz1fvbTbSug9eVUquviL4tAoNlkEh+D8UwRz2EmqxPmD/qIE/1sOTRzmn5mwqx mKM95MWbE5ujOBQHDUryrtj3aqFhc0Q0SATadEGGrZRLjIxBV8nq0lrqpmhmgCTUKh lndHzOw7gLoe3S5beyCwDRSRjL1vX/WylSDNOcfAjn6RKpLfuCoNsC6AMTVf+ZImBs M3cix5G9qaqYVwm89y3tY5ZNdrS3+cNddU1PF9z56nFky4dcJSc1cQmV7aUBD+FE+2 Vp7JlPZBRoPkjedQHc7virsAH4OI+StJ+5EdISxvirGszqGIL3Gz4r8XpZo0rZk+Jg biHjSlsBXy5yw== Message-ID: Subject: Re: [RFC PATCH 1/3] fs: add infrastructure for opportunistic high-res ctime/mtime updates From: Jeff Layton To: Jan Kara Cc: Alexander Viro , Christian Brauner , "Darrick J. Wong" , Hugh Dickins , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, Dave Chinner Date: Fri, 21 Apr 2023 06:47:55 -0400 In-Reply-To: <20230421101331.dlxom6b5e7yds5tn@quack3> References: <20230411142708.62475-1-jlayton@kernel.org> <20230411142708.62475-2-jlayton@kernel.org> <20230421101331.dlxom6b5e7yds5tn@quack3> Content-Type: text/plain; charset="ISO-8859-15" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.48.0 (3.48.0-1.fc38) MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 2023-04-21 at 12:13 +0200, Jan Kara wrote: > On Tue 11-04-23 10:27:06, Jeff Layton wrote: > > The VFS always uses coarse-grained timestamp updates for filling out th= e > > ctime and mtime after a change. This has the benefit of allowing > > filesystems to optimize away metadata updates. > >=20 > > Unfortunately, this has always been an issue when we're exporting via > > NFSv3, which relies on timestamps to validate caches. Even with NFSv4, = a > > lot of exported filesystems don't properly support a change attribute > > and are subject to the same problem of timestamp granularity. Other > > applications have similar issues (e.g backup applications). > >=20 > > Switching to always using high resolution timestamps would improve the > > situation for NFS, but that becomes rather expensive, as we'd have to > > log a lot more metadata updates. > >=20 > > This patch grabs a new i_state bit to use as a flag that filesystems ca= n > > set in their getattr routine to indicate that the mtime or ctime was > > queried since it was last updated. > >=20 > > It then adds a new current_cmtime function that acts like the > > current_time helper, but will conditionally grab high-res timestamps > > when the i_state flag is set in the inode. > >=20 > > This allows NFS and other applications to reap the benefits of high-res > > ctime and mtime timestamps, but at a substantially lower cost than > > fetching them every time. > >=20 > > Cc: Dave Chinner > > Signed-off-by: Jeff Layton > > --- > > fs/inode.c | 40 ++++++++++++++++++++++++++++++++++++++-- > > fs/stat.c | 10 ++++++++++ > > include/linux/fs.h | 5 ++++- > > 3 files changed, 52 insertions(+), 3 deletions(-) > >=20 > > diff --git a/fs/inode.c b/fs/inode.c > > index 4558dc2f1355..3630f67fd042 100644 > > --- a/fs/inode.c > > +++ b/fs/inode.c > > @@ -2062,6 +2062,42 @@ static int __file_update_time(struct file *file,= struct timespec64 *now, > > return ret; > > } > > =20 > > +/** > > + * current_cmtime - Return FS time (possibly high-res) > > + * @inode: inode. > > + * > > + * Return the current time truncated to the time granularity supported= by > > + * the fs, as suitable for a ctime or mtime change. If something recen= tly > > + * fetched the ctime or mtime out of the inode via getattr, then get a > > + * high-resolution timestamp. > > + * > > + * Note that inode and inode->sb cannot be NULL. > > + * Otherwise, the function warns and returns coarse time without trunc= ation. > > + */ > > +struct timespec64 current_cmtime(struct inode *inode) > > +{ > > + struct timespec64 now; > > + > > + if (unlikely(!inode->i_sb)) { >=20 > I don't think we can have inodes without a superblock. Did you ever hit > this? >=20 No, I copied this from current_time. I've already removed this in my working branch. We can probably remove it from current_time too. > > + WARN(1, "%s() called with uninitialized super_block in the inode", _= _func__); > > + ktime_get_coarse_real_ts64(&now); > > + return now; > > + } > > + > > + /* Do a lockless check for the flag before taking the spinlock */ > > + if (READ_ONCE(inode->i_state) & I_CMTIME_QUERIED) { > > + ktime_get_real_ts64(&now); > > + spin_lock(&inode->i_lock); > > + inode->i_state &=3D ~I_CMTIME_QUERIED; >=20 > Isn't this a bit fragile? If someone does: >=20 > inode->i_mtime =3D current_cmtime(inode); > inode->i_ctime =3D current_cmtime(inode); >=20 > the ctime update will be coarse although it should be fine-grained. >=20 It is a bit. We'll need for users to do something like: inode->i_mtime =3D inode->i_ctime =3D current_ctime(inode); Fortunately, most do this already. > > + spin_unlock(&inode->i_lock); > > + } else { > > + ktime_get_coarse_real_ts64(&now); > > + } > > + > > + return timestamp_truncate(now, inode); >=20 > I'm a bit confused here. Isn't the point of this series also to give NFS > finer grained granularity time stamps than what the filesystem is possibl= y > able to store on disk? >=20 No. We actually don't want to hand out timestamps more granular than the underlying filesystem can support, as we'd end up having to invalidate caches for all of those inodes once the server rebooted and the unrecordable bits get zeroed out. The main idea here is to just ensure that we use fine-grained timestamps when someone has queried the mtime or ctime since the last time it was updated. > Hmm, checking XFS it sets 1 ns granularity (as well as tmpfs) so for thes= e > using the coarser timers indeed gives a performance benefit. And probably > you've decided not implement the "better NFS support with coarse grained > timestamps" yet. >=20 Yep. The coarse grained timestamps are a _good_ thing for most filesystems as they allow you to skip a lot of metadata updates. My hope is that this will end up being like the i_version changes such that the extra fine-grained updates should be relatively rare and should (hopefully!) not cause noticeable performance blips. We'll see! > > +} > > +EXPORT_SYMBOL(current_cmtime); > > + > > /** > > * file_update_time - update mtime and ctime time > > * @file: file accessed > > @@ -2080,7 +2116,7 @@ int file_update_time(struct file *file) > > { > > int ret; > > struct inode *inode =3D file_inode(file); > > - struct timespec64 now =3D current_time(inode); > > + struct timespec64 now =3D current_cmtime(inode); > > =20 > > ret =3D inode_needs_update_time(inode, &now); > > if (ret <=3D 0) > > @@ -2109,7 +2145,7 @@ static int file_modified_flags(struct file *file,= int flags) > > { > > int ret; > > struct inode *inode =3D file_inode(file); > > - struct timespec64 now =3D current_time(inode); > > + struct timespec64 now =3D current_cmtime(inode); > > =20 > > /* > > * Clear the security bits if the process is not being run by root. > > diff --git a/fs/stat.c b/fs/stat.c > > index 7c238da22ef0..d8b80a2e36b7 100644 > > --- a/fs/stat.c > > +++ b/fs/stat.c > > @@ -64,6 +64,16 @@ void generic_fillattr(struct mnt_idmap *idmap, struc= t inode *inode, > > } > > EXPORT_SYMBOL(generic_fillattr); > > =20 > > +void fill_cmtime_and_mark(struct inode *inode, struct kstat *stat) > > +{ > > + spin_lock(&inode->i_lock); > > + inode->i_state |=3D I_CMTIME_QUERIED; > > + stat->ctime =3D inode->i_ctime; > > + stat->mtime =3D inode->i_mtime; > > + spin_unlock(&inode->i_lock); > > +} > > +EXPORT_SYMBOL(fill_cmtime_and_mark); >=20 > The name could be better here :). Maybe stat_fill_cmtime_and_mark()? >=20 > =09 I have a quite different set that I've been working on that I'll (hopefully!) post soon. That one uses the least significant bit of the tv_nsec field as the QUERIED flag instead of the spinlock. Still cleaning up the set and need to test it some more though, so it's not quite ready to post. Stay tuned! Thanks for the review!=20 --=20 Jeff Layton