Date: Mon, 22 Jun 2020 11:02:34 +1000
From: Dave Chinner
To: Matthew Wilcox
Cc: Amir Goldstein, linux-fsdevel, Linux MM, Andreas Gruenbacher, linux-kernel
Subject: Re: [RFC] Bypass filesystems for reading cached pages
Message-ID: <20200622010234.GD2040@dread.disaster.area>
References: <20200619155036.GZ8681@bombadil.infradead.org> <20200620191521.GG8681@bombadil.infradead.org>
In-Reply-To: <20200620191521.GG8681@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jun 20, 2020 at 12:15:21PM -0700, Matthew Wilcox wrote:
> On Sat, Jun 20, 2020 at 09:19:37AM +0300, Amir Goldstein wrote:
> > On Fri, Jun 19, 2020 at 6:52 PM Matthew Wilcox wrote:
> > > This patch lifts the IOCB_CACHED idea expressed by Andreas to the VFS.
> > > The advantage of this patch is that we can avoid taking any filesystem
> > > lock, as long as the pages being accessed are in the cache (and we don't
> > > need to readahead any pages into the cache). We also avoid an indirect
> > > function call in these cases.
> >
> > XFS is taking i_rwsem lock in read_iter() for a surprising reason:
> > https://lore.kernel.org/linux-xfs/CAOQ4uxjpqDQP2AKA8Hrt4jDC65cTo4QdYDOKFE-C3cLxBBa6pQ@mail.gmail.com/
> > In that post I claim that ocfs2 and cifs also do some work in read_iter().
> > I didn't go back to check what, but it sounds like cache coherence among
> > nodes.
>
> That's out of date. Here's POSIX-2017:
>
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html
>
> "I/O is intended to be atomic to ordinary files and pipes and
> FIFOs. Atomic means that all the bytes from a single operation that
> started out together end up together, without interleaving from other
> I/O operations. It is a known attribute of terminals that this is not
> honored, and terminals are explicitly (and implicitly permanently)
> excepted, making the behavior unspecified. The behavior for other
> device types is also left unspecified, but the wording is intended to
> imply that future standards might choose to specify atomicity (or not)."
>
> That _doesn't_ say "a read cannot observe a write in progress". It says
> "Two writes cannot interleave".
> Indeed, further down in that section, it says:

Nope, it says "... without interleaving from other I/O
operations".

That means read() needs to be atomic w.r.t truncate, hole punching,
extent zeroing, etc, not just other write() syscalls.

Really, though, I'm not going to get drawn into a language lawyering
argument here. We've discussed this before, and it's pretty clear
the language supports both arguments in one way or another.

And that means we are not going to change behaviour that XFS has
provided for 27 years now. Last time this came up, I said:

  "XFS was designed with the intent that buffered writes are
  atomic w.r.t. to all other file accesses."

Christoph said:

  "Downgrading these long standing guarantees is simply not an
  option"

Darrick:

  "I don't like the idea of adding a O_BROKENLOCKINGPONIES flag"

Nothing has changed since this was last discussed.

Well, except for the fact that since then I've seen the source code
to some 20+ year old enterprise applications that have been ported
to Linux and that has made me even more certain that we need to
maintain XFS's existing behaviour....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com