Date: Fri, 17 Sep 2021 12:31:36 -0400
From: Johannes Weiner
To: Dave Chinner
Cc: "Darrick J. Wong", Kent Overstreet, Matthew Wilcox, Linus Torvalds,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Andrew Morton, Christoph Hellwig,
    David Howells
Subject: Re: Folio discussion recap
References: <20210916025854.GE34899@magnolia>
    <20210917052440.GJ1756565@dread.disaster.area>
In-Reply-To: <20210917052440.GJ1756565@dread.disaster.area>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 17, 2021 at 03:24:40PM +1000, Dave Chinner wrote:
> On Thu, Sep 16, 2021 at 12:54:22PM -0400, Johannes Weiner wrote:
> > I agree with what I think the filesystems want: instead of an untyped,
> > variable-sized block of memory, I think we should have a typed page
> > cache desciptor.
>
> I don't think that's what fs devs want at all.
> It's what you think fs devs want. If you'd been listening to us the
> same way that Willy has been for the past year, maybe you'd have a
> different opinion.

I was going off of Darrick's remarks about non-pagecache uses, Kent's
remarks about simple and obvious core data structures, and yes your
suggestion of "cache page". But I think you may have overinterpreted
what I meant by cache descriptor:

> Indeed, we don't actually need a new page cache abstraction.

I didn't suggest to change what the folio currently already is for the
page cache. I asked to keep anon pages out of it (and in the future
potentially other random stuff that is using compound pages).

It doesn't have any bearing on how it presents to you on the
filesystem side, other than that it isn't as overloaded as struct page
is with non-pagecache stuff.

A full-on disconnect between the cache entry descriptor and the page
is something that came up during speculation on how the MM will be
able to effectively raise the page size and meet scalability
requirements on modern hardware - and in that context I do appreciate
you providing background information on the chunk cache, which will be
valuable to inform *that* discussion.

But it isn't what I suggested as the immediate action to unblock the
folio merge.

> The fact that so many fs developers are pushing *hard* for folios is
> that it provides what we've been asking for individually over last
> few years.

I'm not sure filesystem people are pushing hard for non-pagecache
stuff to be in the folio.

> Willy has done a great job of working with the fs developers and
> getting feedback at every step of the process, and you see that in
> the amount of work that in progress that is already based on
> folios.

And that's great, but the folio is blocked on MM questions:

1. Is the folio a good descriptor for all uses of anon and file pages
   inside MM code way beyond the page cache layer YOU care about?

2. Are compound pages a scalable, future-proof allocation strategy?

For some people the answers are yes, for others they are a no.

For 1), the value proposition is to clean up the relatively recent
head/tail page confusion. And though everybody agrees that there is
value in that, it's a LOT of churn for what it does. Several people
have pointed this out, and AFAICS this is the most common reason for
people that have expressed doubt or hesitation over the patches.

In an attempt to address this, I pointed out the cleanup opportunities
that would open up by using separate anon and file folio types instead
of one type for both. Nothing more. No intermediate thing, no chunk
cache. Doesn't affect you. Just taking Willy's concept of type safety
and applying it to file and anon instead of page vs compound page.

- It wouldn't change anything for fs people from the current folio
  patchset (except maybe the name)

- It would accomplish the head/tail page cleanup the same way, since
  just like a folio, a "file folio" could also never be a tail page

- It would take the same solution folio prescribes to the compound
  page issue (explicit typing to get rid of useless checks, lookups
  and subtle bugs) and solve way more instances of this all over MM
  code, thereby hopefully boosting the value proposition and making
  *that part* of the patches a clearer win for the MM subsystem

This is a question directed at MM people, not filesystem people. It
doesn't pertain to you at all.

And if MM people agree or want to keep discussing it, the relatively
minor action item for the folio patch is the same: drop the partial
anon-to-folio conversion bits inside MM code for now and move on.

For 2), nobody knows the answer to this. Nobody. Anybody who claims to
do so is full of sh*t. Maybe compound pages work out, maybe they
don't.
We can talk a million years about larger page sizes, how to handle
internal fragmentation, the difficulties of implementing a chunk
cache, but it's completely irrelevant because it's speculative.

We know there are multiple page sizes supported by the hardware and
the smallest supported one is no longer the most dominant one. We do
not know for sure yet how the MM is internally going to lay out its
type system so that the allocator, mmap, page reclaim etc. can be CPU
efficient and the descriptors be memory efficient.

Nobody's "grand plan" here is any more viable, tested or proven than
anybody else's.

My question for fs folks is simply this: as long as you can pass a
folio to kmap and mmap and it knows what to do with it, is there any
filesystem relevant requirement that the folio map to 1 or more
literal "struct page", and that folio_page(), folio_nr_pages() etc be
part of the public API? Or can we keep this translation layer private
to MM code? And will page_folio() be required for anything beyond the
transitional period away from pages?

Can we move things not used outside of MM into mm/internal.h, mark the
transitional bits of the public API as such, and move on?

The unproductive vitriol, personal attacks and dismissiveness over
relatively minor asks and RFCs from the subsystem that is the most
impacted by this patchset is just nuts.
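
For illustration only, here is a rough userspace sketch in C of what
"separate anon and file folio types" could look like. It is not taken
from the folio patchset: struct file_folio, struct anon_folio,
page_to_file_folio(), page_to_anon_folio() and file_folio_nr_pages()
are hypothetical names, and struct page here is a three-field stand-in
for the real thing. The only point is that the head lookup and the
file-vs-anon check happen once, at conversion time, and the compiler
enforces the distinction afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel's definition. */
struct page {
	struct page *head;	/* points to itself for a head or order-0 page */
	bool anon;		/* true: anonymous memory, false: file cache */
	unsigned int order;	/* only meaningful on the head page */
};

/*
 * Hypothetical typed descriptors. They wrap the same head page but are
 * distinct C types, so passing one where the other is expected is a
 * compile error rather than a runtime check.
 */
struct file_folio { struct page *head; };
struct anon_folio { struct page *head; };

/* Resolve any page (head or tail) to a file folio; fails for anon memory. */
static inline bool page_to_file_folio(struct page *p, struct file_folio *out)
{
	struct page *head = p->head;

	if (head->anon)
		return false;
	out->head = head;
	return true;
}

/* Resolve any page (head or tail) to an anon folio; fails for file cache. */
static inline bool page_to_anon_folio(struct page *p, struct anon_folio *out)
{
	struct page *head = p->head;

	if (!head->anon)
		return false;
	out->head = head;
	return true;
}

/* No head/tail or anon/file check needed here: the type already says so. */
static inline size_t file_folio_nr_pages(struct file_folio f)
{
	return (size_t)1 << f.head->order;
}
```

With this shape, a function that takes a struct file_folio cannot be
handed a tail page or anon memory by mistake, which is the "explicit
typing to get rid of useless checks" argument applied to file vs anon.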