Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
MIME-Version: 1.0
References: <20230524153311.3625329-1-dhowells@redhat.com> <20230524153311.3625329-5-dhowells@redhat.com>
 <a819dd80-54cc-695f-f142-e3d42ce815a7@huawei.com> <f7919c2c9e1cb6218a0b0f55ddaa9a34f7d2b9a7.camel@gmail.com>
 <1841913.1686039913@warthog.procyon.org.uk>
In-Reply-To: <1841913.1686039913@warthog.procyon.org.uk>
From:   Alexander Duyck <alexander.duyck@gmail.com>
Date:   Tue, 6 Jun 2023 07:59:02 -0700
Message-ID: <CAKgT0Ud=ZWVnpwqbLqGovjPM2VR_V1-Ak=meveqnG01tGWTTeA@mail.gmail.com>
Subject: Re: [PATCH net-next 04/12] mm: Make the page_frag_cache allocator use
 multipage folios
To:     David Howells <dhowells@redhat.com>
Cc:     Yunsheng Lin <linyunsheng@huawei.com>, netdev@vger.kernel.org,
        "David S. Miller" <davem@davemloft.net>,
        Eric Dumazet <edumazet@google.com>,
        Jakub Kicinski <kuba@kernel.org>,
        Paolo Abeni <pabeni@redhat.com>,
        Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
        David Ahern <dsahern@kernel.org>,
        Matthew Wilcox <willy@infradead.org>,
        Jens Axboe <axboe@kernel.dk>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org,
        Jeroen de Borst <jeroendb@google.com>,
        Catherine Sullivan <csully@google.com>,
        Shailend Chand <shailend@google.com>,
        Felix Fietkau <nbd@nbd.name>, John Crispin <john@phrozen.org>,
        Sean Wang <sean.wang@mediatek.com>,
        Mark Lee <Mark-MC.Lee@mediatek.com>,
        Lorenzo Bianconi <lorenzo@kernel.org>,
        Matthias Brugger <matthias.bgg@gmail.com>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@collabora.com>,
        Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@fb.com>,
        Christoph Hellwig <hch@lst.de>,
        Sagi Grimberg <sagi@grimberg.me>,
        Chaitanya Kulkarni <kch@nvidia.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        linux-arm-kernel@lists.infradead.org,
        linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Precedence: bulk

On Tue, Jun 6, 2023 at 1:25=E2=80=AFAM David Howells <dhowells@redhat.com> =
wrote:
>
> Alexander H Duyck <alexander.duyck@gmail.com> wrote:
>
> > Also I have some concerns about going from page to folio as it seems
> > like the folio_alloc setups the transparent hugepage destructor instead
> > of using the compound page destructor. I would think that would slow
> > down most users as it looks like there is a spinlock that is taken in
> > the hugepage destructor that isn't there in the compound page
> > destructor.
>
> Note that this code is going to have to move to folios[*] at some point.
> "Old-style" compound pages are going to go away, I believe.  Matthew Wilc=
ox
> and the mm folks are on a drive towards simplifying memory management,
> formalising chunks larger than a single page - with the ultimate aim of
> reducing the page struct to a single, typed pointer.

I'm not against making the move, but as others have pointed out this
is getting into unrelated things. One of those being the fact that to
transition to using folios we don't need to get rid of the use of the
virtual address. The idea behind using the virtual address here is
that we can avoid a bunch of address translation overhead since we
only need to use the folio if we are going to allocate, retire, or
recycle a page/folio. If we are using an order 3 page that shouldn't
be very often.

> So, take, for example, a folio: As I understand it, this will no longer
> overlay struct page, but rather will become a single, dynamically-allocat=
ed
> struct that covers a pow-of-2 number of pages.  A contiguous subset of pa=
ge
> structs will point at it.
>
> However, rather than using a folio, we could define a "page fragment" mem=
ory
> type.  Rather than having all the flags and fields to be found in struct
> folio, it could have just the set to be found in page_frag_cache.

I don't think we need a new memory type. For the most part the page
fragment code is really more a subset of something like a
__get_free_pages where the requester provides the size, is just given
a virtual address, and we shouldn't need to be allocating a new page
as often as ideally the allocations are 2K or less in size.

Also one thing I would want to avoid is adding complexity to the
freeing path. The general idea with page frags is that they are meant
to be lightweight in terms of freeing as well. So just as they are
similar to __get_free_pages in terms of allocation the freeing is
meant to be similar to free_pages.

> David
>
> [*] It will be possible to have some other type than "folio".  See "struc=
t
> slab" in mm/slab.h for example.  struct slab corresponds to a set of page=
s
> and, in the future, a number of struct pages will point at it.

I want to avoid getting anywhere near the complexity of a slab
allocator. The whole point of this was to keep it simple so that
drivers could use it and get decent performance. When I had
implemented it in the Intel drivers back in the day this approach was
essentially just a reference count/page offset hack that allowed us to
split a page in 2 and use the pages as a sort of mobius strip within
the ring buffer.