Subject: Re: [PATCH net-next v2 2/3] page_pool: support non-frag page for page_pool_alloc_frag()
To: Alexander Duyck, Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Lorenzo Bianconi, Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet
References: <20230529092840.40413-1-linyunsheng@huawei.com> <20230529092840.40413-3-linyunsheng@huawei.com> <977d55210bfcb4f454b9d740fcbe6c451079a086.camel@gmail.com> <2e4f0359-151a-5cff-6d31-0ea0f014ef9a@huawei.com> <8c9d5dd8-b654-2d50-039d-9b7732e7746f@huawei.com>
From: Yunsheng Lin
Date: Sat, 3 Jun 2023 12:20:15 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/6/2 23:57, Alexander Duyck wrote:
> On Fri, Jun 2, 2023 at 5:23 AM Yunsheng Lin wrote:

...

>>
>> According to my defination in this patchset:
>> frag page: page alloced from page_pool_alloc_frag() with page->pp_frag_count
>> being greater than one.
>> non-frag page:page alloced return from both page_pool_alloc_frag() and
>> page_pool_alloc_pages() with page->pp_frag_count being one.
>>
>> I assume the above 'non-page pool pages' refer to what I call as 'non-frag
>> page' alloced return from both page_pool_alloc_frag(), right? And it is
>> still about doing the (size << 1 > max_size)' checking at the begin instead
>> of at the middle right now to avoid extra steps for 'non-frag page' case?
>
> Yeah, the non-page I was referring to were you mono-frag pages.

I was using 'frag page' and 'non-frag page' per the definition above, while
you were mostly using 'mono-frag' and sometimes 'non-page'. I find those
terms confusing: I felt I understood them, and then got lost when you used
them in the next comment.

Could you describe in more detail what you mean by 'mono-frag pages' and
'non-page', so that we can settle on the naming and continue the discussion
without further misunderstanding?

>
>>>
>>>>>
>>>>>> -       if (page && *offset + size > max_size) {
>>>>>> +       if (page) {
>>>>>> +               *offset = pool->frag_offset;
>>>>>> +
>>>>>> +               if (*offset + size <= max_size) {
>>>>>> +                       pool->frag_users++;
>>>>>> +                       pool->frag_offset = *offset + size;
>>>>>> +                       alloc_stat_inc(pool, fast);
>>>>>> +                       return page;
>>>>
>>>> Note that we still allow frag page here when '(size << 1 > max_size)'.
>>
>> This is the optimization I was taking about: suppose we start
>> from a clean state with 64K page size, if page_pool_alloc_frag()
>> is called with size being 2K and then 34K, we only need one page
>> to satisfy caller's need as we do the '*offset + size > max_size'
>> checking before the '(size << 1 > max_size)' checking.
>
> The issue is the unaccounted for waste. We are supposed to know the
> general size of the frags being used so we can compute truesize. If

Note that for veth and virtio_net, the driver may only know the current
frag size when calling page_pool_alloc_frag(); it does not know the size of
the frags that will be requested next. How exactly are we going to compute
the truesize when the frag sizes differ?

As far as I can tell, we can only do something like what virtio_net does
with 'page_frag' for the last frag, as below. For the other frags, the
truesize may need to take the alignment requirement into account:

https://elixir.bootlin.com/linux/v6.3.5/source/drivers/net/virtio_net.c#L1638

> for example you are using an order 3 page and you are splitting it
> between a 2K and a 17K fragment the 2K fragments will have a massive
> truesize underestimate that can lead to memory issues if those smaller
> fragments end up holding onto the pages.
>
> As such we should try to keep the small fragments away from anything
> larger than half of the page.

IMHO, doing the above only alleviates the problem. How is the above
splitting different from splitting the page evenly into 16 2K fragments?
When 15 of the frags are released, the last 2K fragment is still holding
onto 32K of memory. Doesn't that also cause a massive truesize
underestimate? Not to mention systems with a 64K page size.

In the RFC patch below, 'page_pool_frag' is used to report the truesize,
but I thought that 'page_frag' and 'page_frag_cache' have a similar
problem, so I dropped it in V1 and left it as a future improvement. I can
pick it up again if 'truesize' is really the concern, but we have to align
on how to compute the truesize here first.

https://patchwork.kernel.org/project/netdevbpf/patch/20230516124801.2465-4-linyunsheng@huawei.com/
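
To make the two scenarios discussed above concrete, here is a minimal
user-space sketch in C. It is only a model under stated assumptions, not the
kernel code: 'struct frag_model', model_alloc() and truesize_shortfall() are
hypothetical names made up for illustration, and the model only counts
backing pages and bytes, ignoring refcounting, DMA and recycling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; this is NOT the kernel's struct page_pool. */
struct frag_model {
	size_t max_size;         /* bytes in one backing page (PAGE_SIZE << order) */
	size_t offset;           /* next free byte in the current frag page */
	bool have_page;          /* is a frag page currently being carved up? */
	unsigned int pages_used; /* backing pages consumed so far */
};

/*
 * Model one page_pool_alloc_frag()-style request of 'size' bytes.
 *
 * big_first == false: first try to fit the request into the current page,
 *                     and only then apply the '(size << 1 > max_size)' test.
 * big_first == true:  apply the '(size << 1 > max_size)' test up front, so a
 *                     request larger than half a page always gets its own page.
 */
static inline void model_alloc(struct frag_model *m, size_t size, bool big_first)
{
	bool big = (size << 1) > m->max_size;

	assert(size > 0 && size <= m->max_size);

	/* Fast path: carve the request out of the current frag page. */
	if (!(big_first && big) && m->have_page &&
	    m->offset + size <= m->max_size) {
		m->offset += size;
		return;
	}

	/* Slow path: a new backing page is needed. */
	m->pages_used++;
	if (big)
		return; /* dedicated page, the current frag page is untouched */

	m->have_page = true;
	m->offset = size;
}

/*
 * Bytes that stay pinned but unaccounted when a single frag, whose truesize
 * was reported as 'reported' bytes, is the last user of a page of
 * 'page_bytes' bytes.
 */
static inline size_t truesize_shortfall(size_t page_bytes, size_t reported)
{
	return page_bytes - reported;
}
```

In this model, a 64K page with requests of 2K and then 34K ends with
pages_used == 1 when the fit check comes first, and with pages_used == 2 when
the half-page check comes first. truesize_shortfall(32 * 1024, 2 * 1024) is
30K whether the 32K page was split into 2K + 17K or evenly into 16 2K frags,
once a single 2K frag is the only one left holding the page.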