Message-ID: <93b6d9f7cf997245bb68409eeb195f9400e55cd0.camel@kernel.org>
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field
From: Jeff Layton
To: Dave Chinner
Cc: Theodore Ts'o, NeilBrown, Trond Myklebust, "bfields@fieldses.org",
 "zohar@linux.ibm.com", "djwong@kernel.org", "brauner@kernel.org",
 "linux-xfs@vger.kernel.org", "linux-api@vger.kernel.org",
 "fweimer@redhat.com", "linux-kernel@vger.kernel.org",
 "chuck.lever@oracle.com", "linux-man@vger.kernel.org",
 "linux-nfs@vger.kernel.org", "linux-ext4@vger.kernel.org", "jack@suse.cz",
 "viro@zeniv.linux.org.uk", "xiubli@redhat.com",
 "linux-fsdevel@vger.kernel.org", "adilger.kernel@dilger.ca",
 "lczerner@redhat.com", "ceph-devel@vger.kernel.org",
 "linux-btrfs@vger.kernel.org"
Date: Wed, 21 Sep 2022 06:33:28 -0400
In-Reply-To: <20220921000032.GR3600936@dread.disaster.area>
References: <871f9c5153ddfe760854ca31ee36b84655959b83.camel@hammerspace.com>
 <166328063547.15759.12797959071252871549@noble.neil.brown.name>
 <7027d1c2923053fe763e9218d10ce8634b56e81d.camel@kernel.org>
 <24005713ad25370d64ab5bd0db0b2e4fcb902c1c.camel@kernel.org>
 <20220918235344.GH3600936@dread.disaster.area>
 <87fb43b117472c0a4c688c37a925ac51738c8826.camel@kernel.org>
 <20220920001645.GN3600936@dread.disaster.area>
 <5832424c328ea427b5c6ecdaa6dd53f3b99c20a0.camel@kernel.org>
 <20220921000032.GR3600936@dread.disaster.area>
X-Mailing-List: linux-ext4@vger.kernel.org

On Wed, 2022-09-21 at 10:00 +1000, Dave Chinner wrote:
> On Tue, Sep 20, 2022 at 06:26:05AM -0400, Jeff Layton wrote:
> > On Tue, 2022-09-20 at 10:16 +1000, Dave Chinner wrote:
> > > IOWs, the NFS server can define its own on-disk persistent metadata
> > > using xattrs, and you don't need local filesystems to be modified at
> > > all. You can add the crash epoch into the change attr that is sent
> > > to NFS clients without having to change the VFS i_version
> > > implementation at all.
> > >
> > > This whole problem is solvable entirely within the NFS server code,
> > > and we don't need to change local filesystems at all. NFS can
> > > control the persistence and format of the xattrs it uses, and it
> > > does not need new custom on-disk format changes from every
> > > filesystem to support this new application requirement.
> > >
> > > At this point, NFS server developers don't need to care what the
> > > underlying filesystem format provides - the xattrs provide the crash
> > > detection and enumeration the NFS server functionality requires.
> > >
> >
> > Doesn't the filesystem already detect when it's been mounted after an
> > unclean shutdown?
>
> Not every filesystem will be able to guarantee unclean shutdown
> detection at the next mount. That's the whole problem - NFS
> developers are asking for something that cannot be provided as
> generic functionality by individual filesystems, so the NFS server
> application is going to have to work around any filesystem that
> cannot provide the information it needs.
>
> e.g. ext4 has its journal replayed by the userspace tools prior
> to mount, so when it then gets mounted by the kernel it's seen as a
> clean mount.
>
> If we shut an XFS filesystem down due to a filesystem corruption or
> failed IO to the journal code, the kernel might not be able to
> replay the journal on mount (i.e. it is corrupt).
> We then run xfs_repair, and that fixes the corruption issue and
> -cleans the log-. When we next mount the filesystem, it results in a
> _clean mount_, and the kernel filesystem code cannot signal to NFS
> that an unclean mount occurred and so it should bump its crash counter.
>
> IOWs, this whole "filesystems need to tell NFS about crashes"
> requirement propagates all the way through *every filesystem tool
> chain*, not just the kernel mount code. And we most certainly don't
> control every 3rd party application that walks around in the
> filesystem on-disk format, and so there are -zero- guarantees that
> the kernel filesystem mount code can give that an unclean shutdown
> occurred prior to the current mount.
>
> And then for niche NFS server applications (like transparent
> fail-over between HA NFS servers) there are even more rigid
> constraints on NFS change attributes. And you're asking local
> filesystems to know about these application constraints and bake
> them into their on-disk format again.
>
> This whole discussion has come about because we baked certain
> behaviour for NFS into the on-disk format many, many years ago, and
> it's only now that it is considered inadequate for *new* NFS
> application related functionality (e.g. fscache integration and
> cache validity across server side mount cycles).
>
> We've learnt a valuable lesson from this: don't bake application
> specific persistent metadata requirements into the on-disk format
> because when the application needs to change, it requires every
> filesystem that supports that application level functionality
> to change their on-disk formats...
>
> > I'm not sure what good we'll get out of bolting this
> > scheme onto the NFS server, when the filesystem could just as easily
> > give us this info.
>
> The xattr scheme guarantees the correct application behaviour that the
> NFS server requires, all at the NFS application level without requiring
> local filesystems to support the NFS requirements in their on-disk
> format. The NFS server controls the format and versioning of its
> on-disk persistent metadata (i.e. the xattrs it uses) and so any
> changes to the application level requirements of that functionality
> are now completely under the control of the application.
>
> i.e. the application gets to manage version control, backwards and
> forwards compatibility of its persistent metadata, etc. What you
> are asking is that every local filesystem takes responsibility for
> managing the long term persistent metadata that only NFS requires.
> It's more complex to do this at the filesystem level, and we have to
> replicate the same work for every filesystem that is going to
> support this on-disk functionality.
>
> Using xattrs means the functionality is implemented once, it's
> common across all local filesystems, and no exportable filesystem
> needs to know anything about it as it's all self-contained in the
> NFS server code. The code is smaller, easier to maintain, consistent
> across all systems, easy to test, etc.
>
> It also can be implemented and rolled out *immediately* to all
> existing supported NFS server implementations, without having to
> wait months/years (or never!) for local filesystem on-disk format
> changes to roll out to production systems.
>
> Asking individual filesystems to implement application specific
> persistent metadata is a *last resort* and should only be done if
> correctness or performance cannot be obtained in any other way.
>
> So, yeah, the only sane direction to take here is to use xattrs to
> store this NFS application level information.
> It's less work for everyone, and in the long term it means when the
> NFS application requirements change again, we don't need to modify
> the on-disk format of multiple local filesystems.
>
> > In any case, the main problem at this point is not so much in detecting
> > when there has been an unclean shutdown, but rather what to do when
> > there is one. We need to advance the presented change attributes
> > beyond the largest possible one that may have been handed out prior to
> > the crash.
>
> Sure, but you're missing my point: by using xattrs for detection,
> you don't need to involve anything to do with local filesystems at
> all.
>
> > How do we determine what that offset should be? Your last email
> > suggested that there really is no limit to the number of i_version bumps
> > that can happen in memory before one of them makes it to disk. What can
> > we do to address that?
> >
>
> I'm just pointing out problems I see when defining this as behaviour
> for on-disk format purposes. If we define it as part of the on-disk
> format, then we have to be concerned about how it may be used
> outside the scope of just the NFS server application.
>
> However, if NFS keeps this metadata and functionality entirely
> contained at the application level via xattrs, I really don't care
> what algorithm NFS developers decide to use for their crash
> sequencing. It's not my concern at this point, and that's precisely
> why NFS should be using xattrs for this NFS specific functionality.
>

I get it: you'd rather not have to deal with what you see as an NFS
problem, but I don't get how what you're proposing solves anything. We
might be able to use that scheme to detect crashes, but that's only part
of the problem (and it's a relatively simple part of the problem to
solve, really).

Maybe you can clarify it for me:

Suppose we go with what you're saying and store some information in
xattrs that allows us to detect crashes in some fashion.
The server crashes and comes back up and we detect that there was a
crash earlier.

What does nfsd need to do now to ensure that it doesn't hand out a
duplicate change attribute?

Until we can answer that question, detecting crashes doesn't matter.

--
Jeff Layton
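
For reference, one possible reading of the "crash epoch" idea quoted at
the top of this message (adding the crash epoch into the change attr
that is sent to NFS clients) can be sketched as below. This is an
illustrative Python sketch, not nfsd code. The 16/48 bit split, the
function names, and the idea that the epoch is loaded from an xattr and
bumped once per detected crash are all assumptions made for the example.
It only covers the case where i_version fits in the low bits, and it
does not address epoch wraparound.

```python
# Hypothetical layout (not from the thread): the high 16 bits of the
# presented change attribute carry a crash epoch, the low 48 bits carry
# the filesystem's i_version counter.
EPOCH_BITS = 16
COUNTER_BITS = 64 - EPOCH_BITS
EPOCH_MASK = (1 << EPOCH_BITS) - 1
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def epoch_after_restart(stored_epoch: int, clean_shutdown: bool) -> int:
    """Return the epoch to use after a restart.

    stored_epoch stands in for a value the server would persist itself
    (for example in an xattr); it is bumped only after an unclean shutdown.
    """
    return stored_epoch if clean_shutdown else stored_epoch + 1


def presented_change_attr(crash_epoch: int, i_version: int) -> int:
    """Combine the epoch and the counter into one 64-bit value."""
    high = (crash_epoch & EPOCH_MASK) << COUNTER_BITS
    low = i_version & COUNTER_MASK
    return high | low


# Before the crash: epoch 3, in-memory i_version had reached 1000.
before = presented_change_attr(3, 1000)

# After the crash: the on-disk i_version may have gone backwards (here 5),
# but the bumped epoch still orders the new value after the old one.
after = presented_change_attr(epoch_after_restart(3, clean_shutdown=False), 5)
assert after > before
```

Under these assumptions, the number of i_version bumps lost in memory
stops mattering for ordering, because any value from a later epoch
compares greater than any value from an earlier one. Whether that
ordering is what nfsd actually needs is the open question above.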