From: Alexander Duyck
Date: Tue, 6 Jun 2023 07:59:02 -0700
Subject: Re: [PATCH net-next 04/12] mm: Make the page_frag_cache allocator use multipage folios
To: David Howells
Cc: Yunsheng Lin, netdev@vger.kernel.org, "David S. Miller", Eric Dumazet,
    Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
    Matthew Wilcox, Jens Axboe, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Jeroen de Borst, Catherine Sullivan,
    Shailend Chand, Felix Fietkau, John Crispin, Sean Wang, Mark Lee,
    Lorenzo Bianconi, Matthias Brugger, AngeloGioacchino Del Regno,
    Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
    Chaitanya Kulkarni, Andrew Morton,
    linux-arm-kernel@lists.infradead.org,
    linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org
References: <20230524153311.3625329-1-dhowells@redhat.com>
    <20230524153311.3625329-5-dhowells@redhat.com>
    <1841913.1686039913@warthog.procyon.org.uk>
In-Reply-To: <1841913.1686039913@warthog.procyon.org.uk>

On Tue, Jun 6, 2023 at 1:25 AM David Howells wrote:
>
> Alexander H Duyck wrote:
>
> > Also I have some concerns about going from page to folio as it seems
> > like the folio_alloc setups the transparent hugepage destructor instead
> > of using the compound page destructor. I would think that would slow
> > down most users as it looks like there is a spinlock that is taken in
> > the hugepage destructor that isn't there in the compound page
> > destructor.
>
> Note that this code is going to have to move to folios[*] at some point.
> "Old-style" compound pages are going to go away, I believe.  Matthew Wilcox
> and the mm folks are on a drive towards simplifying memory management,
> formalising chunks larger than a single page - with the ultimate aim of
> reducing the page struct to a single, typed pointer.

I'm not against making the move, but as others have pointed out this
is getting into unrelated things.
One of those being the fact that to transition to
using folios we don't need to get rid of the use of the virtual
address. The idea behind using the virtual address here is that we can
avoid a bunch of address translation overhead since we only need to
use the folio if we are going to allocate, retire, or recycle a
page/folio. If we are using an order 3 page that shouldn't be very
often.

> So, take, for example, a folio: As I understand it, this will no longer
> overlay struct page, but rather will become a single, dynamically-allocated
> struct that covers a pow-of-2 number of pages.  A contiguous subset of page
> structs will point at it.
>
> However, rather than using a folio, we could define a "page fragment" memory
> type.  Rather than having all the flags and fields to be found in struct
> folio, it could have just the set to be found in page_frag_cache.

I don't think we need a new memory type. For the most part the page
fragment code is really more a subset of something like a
__get_free_pages where the requester provides the size, is just given
a virtual address, and we shouldn't need to be allocating a new page
as often as ideally the allocations are 2K or less in size.

Also one thing I would want to avoid is adding complexity to the
freeing path. The general idea with page frags is that they are meant
to be lightweight in terms of freeing as well. So just as they are
similar to __get_free_pages in terms of allocation the freeing is
meant to be similar to free_pages.

> David
>
> [*] It will be possible to have some other type than "folio".  See "struct
> slab" in mm/slab.h for example.  struct slab corresponds to a set of pages
> and, in the future, a number of struct pages will point at it.

I want to avoid getting anywhere near the complexity of a slab
allocator. The whole point of this was to keep it simple so that
drivers could use it and get decent performance. When I had
implemented it in the Intel drivers back in the day this approach was
essentially just a reference count/page offset hack that allowed us to
split a page in 2 and use the pages as a sort of mobius strip within
the ring buffer.
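
For anyone not familiar with the reference count/page offset idea,
here is a rough userspace C sketch of it. This is only an
illustration under stated assumptions and not the actual Intel driver
code: every sketch_* name is hypothetical, a plain integer stands in
for the kernel page refcount, and malloc'd memory stands in for real
pages. The ring slot owns one reference on the page. Handing off the
current half gives the consumer a reference and flips the offset to
the other half, but only if nobody else still holds the page.
Otherwise the slot gives up its own reference and needs a fresh page
on the next refill.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_HALF      (SKETCH_PAGE_SIZE / 2)

/* Hypothetical stand-in for a page and its reference count. */
struct sketch_page {
	unsigned int refcount;
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Hypothetical ring slot: a page plus the offset of the half in use. */
struct sketch_rx_buffer {
	struct sketch_page *page;
	unsigned int offset;	/* 0 or SKETCH_HALF */
};

static struct sketch_page *sketch_page_alloc(void)
{
	struct sketch_page *page = calloc(1, sizeof(*page));

	if (page)
		page->refcount = 1;	/* reference owned by the ring slot */
	return page;
}

/* Drop one reference; the page is freed when the last one goes away. */
static void sketch_page_put(struct sketch_page *page)
{
	if (--page->refcount == 0)
		free(page);
}

/* Make sure the slot has a page, allocating one only when needed. */
static bool sketch_rx_refill(struct sketch_rx_buffer *buf)
{
	if (buf->page)
		return true;
	buf->page = sketch_page_alloc();
	buf->offset = 0;
	return buf->page != NULL;
}

/*
 * Hand the current half to a consumer. The consumer must later call
 * sketch_page_put(*owner). The slot must hold a page when this is called.
 */
static unsigned char *sketch_rx_hand_off(struct sketch_rx_buffer *buf,
					 struct sketch_page **owner)
{
	struct sketch_page *page = buf->page;
	unsigned char *half = page->data + buf->offset;

	*owner = page;
	if (page->refcount == 1) {
		/* Other half is idle: keep the page and flip halves. */
		page->refcount++;
		buf->offset ^= SKETCH_HALF;
	} else {
		/* Other half still in flight: pass our reference along. */
		buf->page = NULL;
	}
	return half;
}
```

As long as the consumer releases each half before the slot comes
around again, the same page keeps alternating between its two halves
and no allocation happens, which is the "mobius strip" behaviour
described above.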