Date: Tue, 21 Sep 2021 17:59:33 -0400
From: Johannes Weiner
To: Matthew Wilcox
Cc: Kent Overstreet, Linus Torvalds, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Andrew Morton, "Darrick J. Wong", Christoph Hellwig, David Howells
Subject: Re: Folio discussion recap
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 21, 2021 at 09:38:54PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 21, 2021 at 03:47:29PM -0400, Johannes Weiner wrote:
> > This discussion is now about whether folio are suitable for anon pages
> > as well. I'd like to reiterate that regardless of the outcome of this
> > discussion I think we should probably move ahead with the page cache
> > bits, since people are specifically blocked on those and there is no
> > dependency on the anon stuff, as the conversion is incremental.
>
> So you withdraw your NAK for the 5.15 pull request which is now four
> weeks old and has utterly missed the merge window?

Once you drop the bits that convert shared anon and file
infrastructure, yes. Because we haven't discussed yet, nor agreed,
that folios are the way forward for anon pages.

> > and so the justification for replacing page with folio *below* those
> > entry points to address tailpage confusion becomes nil: there is no
> > confusion. Move the anon bits to anon_page and leave the shared bits
> > in page. That's 912 lines of swap_state.c we could mostly leave alone.
>
> Your argument seems to be based on "minimising churn". Which is certainly
> a goal that one could have, but I think in this case is actually harmful.
> There are hundreds, maybe thousands, of functions throughout the kernel
> (certainly throughout filesystems) which assume that a struct page is
> PAGE_SIZE bytes. Yes, every single one of them is buggy to assume that,
> but tracking them all down is a never-ending task as new ones will be
> added as fast as they can be removed.

What does that have to do with anon pages?

> > The same is true for the LRU code in swap.c. Conceptually, already no
> > tailpages *should* make it onto the LRU. Once the high-level page
> > instantiation functions - add_to_page_cache_lru, do_anonymous_page -
> > have type safety, you really do not need to worry about tail pages
> > deep in the LRU code. 1155 more lines of swap.c.
>
> It's actually impossible in practice as well as conceptually. The list
> LRU is in the union with compound_head, so you cannot put a tail page
> onto the LRU. But yet we call compound_head() on every one of them
> multiple times because our current type system does not allow us to
> express "this is not a tail page".

No, because we haven't identified *who actually needs* these calls
and moved them up and out of the low-level helpers. It was a mistake
to add them there, yes. But they were added recently for rather few
callers.
And we've had people send patches already to move them where they are
actually needed.

Of course converting *absolutely everybody else* to not-tailpage
instead will also fix the problem... I just don't agree that this is
an appropriate response to the issue.

Asking again: who conceptually deals with tail pages in MM? LRU and
reclaim don't. The page cache doesn't. Compaction doesn't. Migration
doesn't. All these data structures and operations are structured
around headpages, because that's the logical unit they operate on.

The notable exception, of course, is the page tables, because they
map the pfns of tail pages. But is that it? Does it come down to page
table walkers encountering pte-mapped tailpages? And needing
compound_head() before calling mark_page_accessed() or
set_page_dirty()?

We couldn't fix vm_normal_page() to handle this? And switch khugepaged
to a new vm_raw_page() or whatever?

It should be possible to answer this question as part of the case for
converting tens of thousands of lines of code to folio.
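
To make the question above concrete, here is a minimal userspace
sketch of the idea of resolving tail pages once, at the page table
boundary, so that lower-level helpers never need compound_head(). It
is a toy model under stated assumptions, not kernel code: struct page
here keeps only the lru/compound_head union mentioned in the quoted
text plus a counter, and model_vm_normal_page(), model_vm_raw_page()
and model_mark_page_accessed() are hypothetical stand-ins named after
the functions discussed, not their real signatures or behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; this is NOT the kernel's struct page. */
struct list_head {
	struct list_head *next, *prev;
};

struct page {
	union {
		struct list_head lru;    /* head or order-0 page: LRU linkage */
		uintptr_t compound_head; /* tail page: head address with bit 0 set */
	};
	unsigned int accessed;           /* stand-in for the page flags */
};

/* Pointers are aligned, so bit 0 is only ever set on a tail page. */
static inline bool page_is_tail(const struct page *page)
{
	return page->compound_head & 1;
}

static inline void set_compound_head(struct page *tail, struct page *head)
{
	tail->compound_head = (uintptr_t)head + 1;
}

static inline struct page *compound_head(struct page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct page *)(head - 1);
	return page;
}

/* Hypothetical raw lookup: may hand back a tail page (the khugepaged case). */
static inline struct page *model_vm_raw_page(struct page **ptes, size_t idx)
{
	return ptes[idx];
}

/* Hypothetical "fixed" lookup: resolves to the head page at the boundary. */
static inline struct page *model_vm_normal_page(struct page **ptes, size_t idx)
{
	return compound_head(model_vm_raw_page(ptes, idx));
}

/* Low-level helper: relies on the caller's guarantee, no compound_head(). */
static inline void model_mark_page_accessed(struct page *page)
{
	assert(!page_is_tail(page));
	page->accessed++;
}

/*
 * Build a four-page compound page, touch it through the pte that maps
 * tail page 2, and return how often the head page was marked accessed.
 */
static inline unsigned int demo_touch_through_tail(void)
{
	struct page pages[4] = { 0 };
	struct page *ptes[4];
	size_t i;

	for (i = 1; i < 4; i++)
		set_compound_head(&pages[i], &pages[0]);
	for (i = 0; i < 4; i++)
		ptes[i] = &pages[i];

	model_mark_page_accessed(model_vm_normal_page(ptes, 2));
	return pages[0].accessed;
}
```

In this model the only compound_head() call sits in the page table
lookup; the helper below it merely asserts the invariant. Whether the
real callers can be partitioned this cleanly is exactly the open
question, and the sketch does not settle it.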