From: Alexander Duyck
Date: Fri, 16 Jun 2023 10:34:02 -0700
Subject: Re: [PATCH net-next v3 3/4] page_pool: introduce page_pool_alloc() API
To: Jesper Dangaard Brouer
Cc: Yunsheng Lin, brouer@redhat.com, davem@davemloft.net, kuba@kernel.org,
    pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    Lorenzo Bianconi, Jesper Dangaard Brouer, Ilias Apalodimas,
    Eric Dumazet, Maryam Tahhan, bpf
On Fri, Jun 16, 2023 at 9:31 AM Jesper Dangaard Brouer wrote:
>
> On 16/06/2023 13.57, Yunsheng Lin wrote:
> > On 2023/6/16 0:19, Jesper Dangaard Brouer wrote:
> >
> > ...
> >
> >> You have mentioned veth as the use-case. I know I acked adding page_pool
> >> use-case to veth, for when we need to convert an SKB into an
> >> xdp_buff/xdp-frame, but maybe it was the wrong hammer(?).
> >> In this case in veth, the size is known at the page allocation time.
> >> Thus, using the page_pool API is wasting memory. We did this for
> >> performance reasons, but we are not using PP for what it was intended
> >> for. We mostly use page_pool because it is an existing recycle return
> >> path, and we were too lazy to add another alloc-type (see enum
> >> xdp_mem_type).
> >>
> >> Maybe you/we can extend veth to use this dynamic size API, to show us
> >> that this API is a better approach. I will sign up for benchmarking
> >> this (and coordinating with CC Maryam, as she came with the use-case we
> >> improved on).
> >
> > Thanks, let's find out if page pool is the right hammer for the
> > veth XDP case.
> >
> > Below is the change for veth using the new API in this patch.
> > Only compile tested, as I am not familiar enough with veth XDP and
> > the testing environment for it.
> > Please try it if it is helpful.
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 614f3e3efab0..8850394f1d29 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -736,7 +736,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >          if (skb_shared(skb) || skb_head_is_locked(skb) ||
> >              skb_shinfo(skb)->nr_frags ||
> >              skb_headroom(skb) < XDP_PACKET_HEADROOM) {
> > -                u32 size, len, max_head_size, off;
> > +                u32 size, len, max_head_size, off, truesize, page_offset;
> >                  struct sk_buff *nskb;
> >                  struct page *page;
> >                  int i, head_off;
> > @@ -752,12 +752,15 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >                  if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
> >                          goto drop;
> >
> > +                size = min_t(u32, skb->len, max_head_size);
> > +                truesize = size;
> > +
> >                  /* Allocate skb head */
> > -                page = page_pool_dev_alloc_pages(rq->page_pool);
> > +                page = page_pool_dev_alloc(rq->page_pool, &page_offset, &truesize);
>
> Maybe rename the API to:
>
>   addr = netmem_alloc(rq->page_pool, &truesize);
>
> >                  if (!page)
> >                          goto drop;
> >
> > -                nskb = napi_build_skb(page_address(page), PAGE_SIZE);
> > +                nskb = napi_build_skb(page_address(page) + page_offset, truesize);
>
> IMHO this illustrates that the API is strange/funky.
> (I think this is what Alex Duyck is also pointing out.)
>
> This is the memory (virtual) address "pointer":
>   addr = page_address(page) + page_offset
>
> This is what napi_build_skb() takes as input. (I looked at other users
> of napi_build_skb(), which all give a mem ptr "va" as the arg.)
> So, why does your new API provide the "page" and not just the address?
>
> As proposed above:
>   addr = netmem_alloc(rq->page_pool, &truesize);
>
> Maybe the API should be renamed, to indicate this isn't returning a "page"?
> We have talked about the name "netmem" before.

Yeah, this is more-or-less what I was getting at. Keep in mind this is
likely the most common case, since most frames passed back and forth
aren't usually much larger than 1500B.
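For illustration only, an address-returning wrapper over the allocator
proposed in this patch might look roughly like the sketch below.
"netmem_alloc" is just the name floated above, not an existing helper,
and the exact parameter types are an assumption:

#include <net/page_pool.h>

/* Hypothetical sketch: hide the page + offset pair behind a helper
 * that hands back the virtual address that callers such as
 * napi_build_skb() actually want.
 */
static inline void *netmem_alloc(struct page_pool *pool,
                                 unsigned int *truesize)
{
        unsigned int page_offset;
        struct page *page;

        /* page_pool_dev_alloc() is the API proposed in this series:
         * it may hand out a whole page or a fragment, reporting the
         * offset and actual truesize back to the caller.
         */
        page = page_pool_dev_alloc(pool, &page_offset, truesize);
        if (!page)
                return NULL;

        return page_address(page) + page_offset;
}

With that, the head allocation above would collapse to
va = netmem_alloc(rq->page_pool, &truesize) followed by
nskb = napi_build_skb(va, truesize), with no page or offset visible
to the caller.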
> >                  if (!nskb) {
> >                          page_pool_put_full_page(rq->page_pool, page, true);
> >                          goto drop;
> > @@ -767,7 +770,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >                  skb_copy_header(nskb, skb);
> >                  skb_mark_for_recycle(nskb);
> >
> > -                size = min_t(u32, skb->len, max_head_size);
> >                  if (skb_copy_bits(skb, 0, nskb->data, size)) {
> >                          consume_skb(nskb);
> >                          goto drop;
> > @@ -782,14 +784,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >                          len = skb->len - off;
> >
> >                          for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
> > -                                page = page_pool_dev_alloc_pages(rq->page_pool);
> > +                                size = min_t(u32, len, PAGE_SIZE);
> > +                                truesize = size;
> > +
> > +                                page = page_pool_dev_alloc(rq->page_pool, &page_offset,
> > +                                                           &truesize);
> >                                  if (!page) {
> >                                          consume_skb(nskb);
> >                                          goto drop;
> >                                  }
> >
> > -                                size = min_t(u32, len, PAGE_SIZE);
> > -                                skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
> > +                                skb_add_rx_frag(nskb, i, page, page_offset, size, truesize);
>
> I guess this shows the opposite: that the "page" _is_ used by the
> existing API.

This is sort-of true. One thing that has come up as of late is that all
of this is being moved over to folios anyway, getting away from pages.
In addition, I am not sure how often we have to take this path, as I am
not sure how many non-Tx frames end up having fragments added to them.
For something like veth it might be more common, though, since Tx
becomes Rx in this case.

One thought I had on this is that we could look at adding a new
function that abstracts this away and makes use of netmem instead.
Then the whole page/folio thing would be that much further removed.

One other question I have now that I look at this code: why is it using
page_pool and not just a frag cache allocator, or pages themselves? It
doesn't seem like it has a DMA mapping to deal with, since this is
essentially copy-break code. It seems problematic that there is no DMA
involved here at all. This could be more easily handled with just a
single page_frag_cache style allocator.
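To make that last point concrete, the head allocation in this
copy-break path could presumably be fed from a plain page-fragment
cache instead of a page_pool, along the lines of the sketch below. This
is hypothetical: the per-rq frag_cache field and the helper name are
made up for illustration, while page_frag_alloc()/page_frag_free() and
napi_build_skb() are existing kernel APIs.

#include <linux/bpf.h>          /* XDP_PACKET_HEADROOM */
#include <linux/gfp.h>          /* page_frag_alloc()/page_frag_free() */
#include <linux/mm_types.h>     /* struct page_frag_cache */
#include <linux/skbuff.h>

/* Rough sketch only: copy-break head allocation without page_pool.
 * Since this path just copies data and never DMA-maps the buffer,
 * a page_frag_cache is enough.
 */
struct veth_rq_sketch {
        struct page_frag_cache frag_cache;      /* hypothetical per-rq cache */
        /* ... */
};

static struct sk_buff *veth_alloc_copy_head(struct veth_rq_sketch *rq,
                                            unsigned int size)
{
        unsigned int truesize = SKB_DATA_ALIGN(XDP_PACKET_HEADROOM + size) +
                                SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
        struct sk_buff *skb;
        void *va;

        va = page_frag_alloc(&rq->frag_cache, truesize, GFP_ATOMIC);
        if (!va)
                return NULL;

        /* Build the skb directly over the fragment, as the existing
         * code does with napi_build_skb() over the page_pool page.
         */
        skb = napi_build_skb(va, truesize);
        if (!skb) {
                page_frag_free(va);
                return NULL;
        }

        skb_reserve(skb, XDP_PACKET_HEADROOM);
        return skb;
}

Freeing would then go through the normal skb path rather than page_pool
recycling, which seems to match the observation that there is no DMA
mapping to keep alive here.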