Received: by 2002:a05:6359:c8b:b0:c7:702f:21d4 with SMTP id go11csp2827644rwb; Mon, 3 Oct 2022 06:12:28 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4iYbhJSdAz5CtyOGjdoy8xDHx6v7G+oMS6gojoWQIEY5RfLRox89csGOxmeWJK693EZMe7 X-Received: by 2002:aa7:9aef:0:b0:561:8612:f25b with SMTP id y15-20020aa79aef000000b005618612f25bmr2941558pfp.22.1664802748538; Mon, 03 Oct 2022 06:12:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664802748; cv=none; d=google.com; s=arc-20160816; b=uVeh8C6T1WTkziakOKg0ERaEqS4F5LKNIZ/7WQOPZR9k1kjWi6yhCPZZNhJsrk59cU 8J79hD/QjvR3f+YmPQmoSXrXmi5CCO38uKvWRx+xEGHSH9PhJvr0JeuLM2JGE44E/t2N 0AfAQKvvIun7epkZgNbljH2AKzzrc6gI/hCYU0sOzQUjyAYk2nMidUo/yn+cIy9ChnD8 2BUb0u221DEF16Yuju5GXAYfdQAQmXYMwStPl9BaWLP/r/Uc+Ejap8TP5HPlFB/KbYjc 7xTFZGEww1bXiQQQpEvHUYxSQwueCneTPIuaqwcKdzQKbeFm67Zxz+Byct7vnMUTTDaw TaTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent :content-transfer-encoding:references:in-reply-to:date:cc:to:from :subject:message-id:dkim-signature; bh=TIY9MeyknIlKMROGw2MOvRWo+WHjxA4vUte2/48A1Mw=; b=UZT30D+quMGi+Gt/yykCy+kJ1GkVXHg69LAJCM/fA2cRxAvrXE0RdqOIk7aIA6gsh0 RdtkXr+kSH2nSEPfQEwFzUcZzpejdxG/nkWo2ju6mCAxU9DOrRt0/pe+b9wHuusrAa8X EjFx1UmoCzPlYa2qlBNcMwoiq/SeE3DieN+yx1PTQH10sYAXGtJQLBX7P9NBgEo1ovY1 82iLrGOFSKTPnFbdEcFyJ1IY2vf/3Jds6mKU3S8Ug4MkBqbf1FAi8n3EIsOTK5UFfWz2 kKKfRWI1utSdJItQ+P9qHGcTeV1UKolnZs/jQoGzVyqeNjHpops0zNUuxcoQ0XNc2OIc w7ZQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Gy2gEyLj; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o10-20020a1709026b0a00b00172f1c0ff49si9836648plk.28.2022.10.03.06.11.59; Mon, 03 Oct 2022 06:12:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Gy2gEyLj; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229932AbiJCNBl (ORCPT + 99 others); Mon, 3 Oct 2022 09:01:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43230 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229921AbiJCNBh (ORCPT ); Mon, 3 Oct 2022 09:01:37 -0400 Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 66D0341999; Mon, 3 Oct 2022 06:01:34 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id A7038CE0B9D; Mon, 3 Oct 2022 13:01:32 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2F6D4C433D6; Mon, 3 Oct 2022 13:01:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1664802090; bh=mmTZgqg6FTmCB5EIX3Cw3VFcZPiIeeisIwOu8wf7Flg=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=Gy2gEyLjXxxCj+YRLcI/UBObyUaFOtu6rsGoTbJkbxYcLSmQbmAsi6DCHZa3QTnal y9TV368Td0n+5rDoB8SO9t2CUsupMJhvZRc/Lbeq6y1AmUlKV5SwhGA/WlJ1Wirilo 12eWfppfGDhxKxOHlXgFlYbdbP0VZaL/288GR6FEMLc4RKpZsu8HTn/iLsZ4HoDLTy ZBRptOFbaCgH4Piq9dH59gg9oybcy1H1h9cwsjOqcfmSDUfQRdprZOhw2/2FYni2Gc 0yJqcSw03Kh8hnclHe6AAxUqRVOXJsBwgqq6A1BrGKZKZmyBHL+/JJXp9rGtqsKlWk +wTXnwnmq7Emw== Message-ID: Subject: Re: [PATCH v6 8/9] vfs: update times after copying data in __generic_file_write_iter From: Jeff Layton To: Amir Goldstein Cc: tytso@mit.edu, adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com, trondmy@hammerspace.com, neilb@suse.de, viro@zeniv.linux.org.uk, zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com, lczerner@redhat.com, jack@suse.cz, bfields@fieldses.org, brauner@kernel.org, fweimer@redhat.com, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org Date: Mon, 03 Oct 2022 09:01:26 -0400 In-Reply-To: References: <20220930111840.10695-1-jlayton@kernel.org> <20220930111840.10695-9-jlayton@kernel.org> Content-Type: text/plain; charset="ISO-8859-15" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.44.4 (3.44.4-2.fc36) MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Sun, 2022-10-02 at 10:08 +0300, Amir Goldstein wrote: > On Fri, Sep 30, 2022 at 2:30 PM Jeff Layton wrote: > >=20 > > The c/mtime and i_version currently get updated before the data is > > copied (or a DIO write is issued), which is problematic for NFS. > >=20 > > READ+GETATTR can race with a write (even a local one) in such a way as > > to make the client associate the state of the file with the wrong chang= e > > attribute. That association can persist indefinitely if the file sees n= o > > further changes. > >=20 > > Move the setting of times to the bottom of the function in > > __generic_file_write_iter and only update it if something was > > successfully written. > >=20 >=20 > This solution is wrong for several reasons: >=20 > 1. There is still file_update_time() in ->page_mkwrite() so you haven't > solved the problem completely Right. I don't think there is a way to solve the problem vs. mmap. Userland can write to a writeable mmap'ed page at any time and we'd never know. We have to specifically carve out mmap as an exception here. I'll plan to add something to the manpage patch for this. > 2. The other side of the coin is that post crash state is more likely to = end > up data changes without mtime/ctime change >=20 Is this really something filesystems rely on? I suppose the danger is that some cached data gets written to disk before the write returns and the inode on disk never gets updated. But...isn't that a danger now? Some of the cached data could get written out and the updated inode just never makes it to disk before a crash (AFAIU). I'm not sure that this increases our exposure to that problem. > If I read the problem description correctly, then a solution that invalid= ates > the NFS cache before AND after the write would be acceptable. Right? > Would an extra i_version bump after the write solve the race? >=20 I based this patch on Neil's assertion that updating the time before an operation was pointless if we were going to do it afterward. The NFS client only really cares about seeing it change after a write. Doing both would be fine from a correctness standpoint, and in most cases, the second would be a no-op anyway since a query would have to race in between the two for that to happen. FWIW, I think we should update the m/ctime and version at the same time. If the version changes, then there is always the potential that a timer tick has occurred. So, that would translate to a second call to file_update_time in here. The downside of bumping the times/version both before and after is that these are hot codepaths, and we'd be adding extra operations there. Even in the case where nothing has changed, we'd have to call inode_needs_update_time a second time for every write. Is that worth the cost? > > If the time update fails, log a warning once, but don't fail the write. > > All of the existing callers use update_time functions that don't fail, > > so we should never trip this. > >=20 > > Signed-off-by: Jeff Layton > > --- > > mm/filemap.c | 17 +++++++++++++---- > > 1 file changed, 13 insertions(+), 4 deletions(-) > >=20 > > diff --git a/mm/filemap.c b/mm/filemap.c > > index 15800334147b..72c0ceb75176 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -3812,10 +3812,6 @@ ssize_t __generic_file_write_iter(struct kiocb *= iocb, struct iov_iter *from) > > if (err) > > goto out; > >=20 > > - err =3D file_update_time(file); > > - if (err) > > - goto out; > > - > > if (iocb->ki_flags & IOCB_DIRECT) { > > loff_t pos, endbyte; > >=20 > > @@ -3868,6 +3864,19 @@ ssize_t __generic_file_write_iter(struct kiocb *= iocb, struct iov_iter *from) > > iocb->ki_pos +=3D written; > > } > > out: > > + if (written > 0) { > > + err =3D file_update_time(file); > > + /* > > + * There isn't much we can do at this point if updating= the > > + * times fails after a successful write. The times and = i_version > > + * should still be updated in the inode, and it should = still be > > + * marked dirty, so hopefully the next inode update wil= l catch it. > > + * Log a warning once so we have a record that somethin= g untoward > > + * has occurred. > > + */ > > + WARN_ONCE(err, "Failed to update m/ctime after write: %= ld\n", err); >=20 > pr_warn_once() please - this is not a programming assertion. >=20 ACK. I'll change that. --=20 Jeff Layton