Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley
    Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United
    Kingdom. Registered in England and Wales under Company Registration
    No. 3798903
From: David Howells
References: <2101397.1629968286@warthog.procyon.org.uk>
To: Johannes Weiner
Cc: dhowells@redhat.com, Matthew Wilcox, Linus Torvalds, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Andrew Morton
Subject: Re: [GIT PULL] Memory folios for v5.15
Date: Fri, 27 Aug 2021 11:49:02 +0100
Message-ID: <2476941.1630061342@warthog.procyon.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

Johannes Weiner wrote:

> On Thu, Aug 26, 2021 at 09:58:06AM +0100, David Howells wrote:
> > One thing I like about Willy's folio concept is that, as long as everyone uses
> > the proper accessor functions and macros, we can mostly ignore the fact that
> > they're 2^N sized/aligned and they're composed of exact multiples of pages.
> > What really matters are the correspondences between folio size/alignment and
> > medium/IO size/alignment, so you could look on the folio as being a tool to
> > disconnect the filesystem from the concept of pages.
> >
> > We could, in the future, in theory, allow the internal implementation of a
> > folio to shift from being a page array to being a kmalloc'd page list or
> > allow higher order units to be mixed in.
> > The main thing we have to stop
> > people from doing is directly accessing the members of the struct.
>
> In the current state of the folio patches, I agree with you. But
> conceptually, folios are not disconnecting from the page beyond
> PAGE_SIZE -> PAGE_SIZE * (1 << folio_order()). This is why I asked
> what the intended endgame is. And I wonder if there is a bit of an
> alignment issue between FS and MM people about the exact nature and
> identity of this data structure.

Possibly. I would guess there are a couple of reasons that on the MM side
particularly it's dealt with as a strict array of pages: efficiency and
mmap-related faults. It's most efficient to treat it as an array of
contiguous pages as that removes the need for indirection. From the pov of
mmap, faults happen along the lines of h/w page divisions.

From an FS point of view, at minimum, I just need to know the state of the
folio. If a page fault dirties several folios, that's fine. If I can find
out that a folio was partially dirtied, that's useful, but not critical. I am
a bit concerned about higher-order folios causing huge writes - but I do
realise that we might want to improve TLB/PT efficiency by using larger
entries and that that comes with consequences for mmapped writes.

> At the current stage of conversion, folio is a more clearly delineated
> API of what can be safely used from the FS for the interaction with
> the page cache and memory management. And it looks still flexible to
> make all sorts of changes, including how it's backed by
> memory. Compared with the page, where parts of the API are for the FS,
> but there are tons of members, functions, constants, and restrictions
> due to the page's role inside MM core code. Things you shouldn't be
> using, things you shouldn't be assuming from the fs side, but it's
> hard to tell which is which, because struct page is a lot of things.

I definitely like the API cleanup that folios offer.
However, I do think Willy needs to better document the differences between
some of the functions, or at least when/where they should be used -
folio_mapping() and folio_file_mapping() being examples of this.

> However, the MM narrative for folios is that they're an abstraction
> for regular vs compound pages. This is rather generic. Conceptually,
> it applies very broadly and deeply to MM core code: anonymous memory
> handling, reclaim, swapping, even the slab allocator uses them. If we
> follow through on this concept from the MM side - and that seems to be
> the plan - it's inevitable that the folio API will grow more
> MM-internal members, methods, as well as restrictions again in the
> process. Except for the tail page bits, I don't see too much in struct
> page that would not conceptually fit into this version of the folio.
>
> The cache_entry idea is really just to codify and retain that
> domain-specific minimalism and clarity from the filesystem side. As
> well as the flexibility around how backing memory is implemented,
> which I think could come in handy soon, but isn't the sole reason.

I can see why you might want the clarification. However, at this point, can
you live with this set of folio patches? Can you live with the name? Could
you live with it if "folio" was changed to something else?

I would really like to see this patchset get in. It's hanging over changes I
and others want to make that will conflict with Willy's changes. If we can
get the basic API of folios in now, that means I can make my changes on top
of them.

Thanks,
David