From: Jesper Dangaard Brouer
Message-ID: <699563f5-c4fa-0246-5e79-61a29e1a8db3@redhat.com>
Date: Fri, 16 Jun 2023 20:41:45 +0200
Subject: Re: [PATCH net-next v3 3/4] page_pool: introduce page_pool_alloc() API
To: Alexander Duyck, Jesper Dangaard Brouer
Cc: brouer@redhat.com, Yunsheng Lin, davem@davemloft.net, kuba@kernel.org,
    pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    Lorenzo Bianconi, Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet,
    Maryam Tahhan, bpf
References: <20230609131740.7496-1-linyunsheng@huawei.com>
 <20230609131740.7496-4-linyunsheng@huawei.com>
 <36366741-8df2-1137-0dd9-d498d0f770e4@huawei.com>
 <0ba1bf9c-2e45-cd44-60d3-66feeb3268f3@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 16/06/2023 19.34, Alexander Duyck wrote:
> On Fri, Jun 16, 2023 at 9:31 AM Jesper Dangaard Brouer wrote:
>>
>> On 16/06/2023 13.57, Yunsheng Lin wrote:
>>> On 2023/6/16 0:19, Jesper Dangaard Brouer wrote:
>>>
>>> ...
>>>
>>>> You have mentioned veth as the use-case. I know I acked adding the
>>>> page_pool use-case to veth, for when we need to convert an SKB into an
>>>> xdp_buff/xdp_frame, but maybe it was the wrong hammer(?).
>>>> In this veth case, the size is known at page allocation time.
>>>> Thus, using the page_pool API is wasting memory. We did this for
>>>> performance reasons, but we are not using PP for what it was intended
>>>> for. We mostly use page_pool because it is an existing recycle return
>>>> path, and we were too lazy to add another alloc-type (see enum
>>>> xdp_mem_type).
>>>>
>>>> Maybe you/we can extend veth to use this dynamic-size API, to show us
>>>> that this API is a better approach. I will sign up for benchmarking
>>>> this (and coordinating with CC'ed Maryam, as she came with the use-case
>>>> we improved on).
>>>
>>> Thanks, let's find out if page pool is the right hammer for the
>>> veth XDP case.
>>>
>>> Below is the change for veth using the new API in this patch.
>>> It is only compile-tested, as I am not familiar enough with veth XDP
>>> and the testing environment for it.
>>> Please try it if it is helpful.
>>>
>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>> index 614f3e3efab0..8850394f1d29 100644
>>> --- a/drivers/net/veth.c
>>> +++ b/drivers/net/veth.c
>>> @@ -736,7 +736,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>>      if (skb_shared(skb) || skb_head_is_locked(skb) ||
>>>          skb_shinfo(skb)->nr_frags ||
>>>          skb_headroom(skb) < XDP_PACKET_HEADROOM) {
>>> -        u32 size, len, max_head_size, off;
>>> +        u32 size, len, max_head_size, off, truesize, page_offset;
>>>          struct sk_buff *nskb;
>>>          struct page *page;
>>>          int i, head_off;
>>> @@ -752,12 +752,15 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>>          if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
>>>              goto drop;
>>>
>>> +        size = min_t(u32, skb->len, max_head_size);
>>> +        truesize = size;
>>> +
>>>          /* Allocate skb head */
>>> -        page = page_pool_dev_alloc_pages(rq->page_pool);
>>> +        page = page_pool_dev_alloc(rq->page_pool, &page_offset, &truesize);
>>
>> Maybe rename the API to:
>>
>>   addr = netmem_alloc(rq->page_pool, &truesize);
>>
>>>          if (!page)
>>>              goto drop;
>>>
>>> -        nskb = napi_build_skb(page_address(page), PAGE_SIZE);
>>> +        nskb = napi_build_skb(page_address(page) + page_offset, truesize);
>>
>> IMHO this illustrates that the API is strange/funky.
>> (I think this is what Alex Duyck is also pointing out.)
>>
>> This is the memory (virtual) address "pointer":
>>   addr = page_address(page) + page_offset
>>
>> This is what napi_build_skb() takes as input. (I looked at other users
>> of napi_build_skb(), which all pass a mem ptr "va" as the argument.)
>> So, why does your new API provide the "page" and not just the address?
>>
>> As proposed above:
>>   addr = netmem_alloc(rq->page_pool, &truesize);
>>
>> Maybe the API should be renamed, to indicate it isn't returning a "page"?
>> We have talked about the name "netmem" before.
>
> Yeah, this is more-or-less what I was getting at. Keep in mind this is
> likely the most common case, since most frames passed back and forth
> aren't usually much larger than 1500B.
>

Good to get confirmed that this is "more-or-less" your suggestion/direction.

>>>          if (!nskb) {
>>>              page_pool_put_full_page(rq->page_pool, page, true);
>>>              goto drop;
>>> @@ -767,7 +770,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>>          skb_copy_header(nskb, skb);
>>>          skb_mark_for_recycle(nskb);
>>>
>>> -        size = min_t(u32, skb->len, max_head_size);
>>>          if (skb_copy_bits(skb, 0, nskb->data, size)) {
>>>              consume_skb(nskb);
>>>              goto drop;
>>> @@ -782,14 +784,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>>          len = skb->len - off;
>>>
>>>          for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
>>> -            page = page_pool_dev_alloc_pages(rq->page_pool);
>>> +            size = min_t(u32, len, PAGE_SIZE);
>>> +            truesize = size;
>>> +
>>> +            page = page_pool_dev_alloc(rq->page_pool, &page_offset,
>>> +                                       &truesize);
>>>              if (!page) {
>>>                  consume_skb(nskb);
>>>                  goto drop;
>>>              }
>>>
>>> -            size = min_t(u32, len, PAGE_SIZE);
>>> -            skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
>>> +            skb_add_rx_frag(nskb, i, page, page_offset, size, truesize);
>>
>> Guess this shows the opposite; that the "page" _is_ used by the
>> existing API.
>
> This is sort-of true. One thing that has come up as of late is that all
> this stuff is being moved over to folios anyway and getting away from
> pages. In addition, I am not sure how often we are having to take this
> path, as I am not sure how many non-Tx frames end up having to have
> fragments added to them.
> For something like veth it might be more common though, since Tx
> becomes Rx in this case.

I'm thinking that it is very unlikely that XDP has modified the
fragments. So, why are we allocating and copying the fragments?
Wouldn't it be possible for this veth code to bump the refcnt on these
fragments? (Maybe I missed some detail.)

> One thought I had on this is that we could look at adding a new
> function that abstracts this away and makes use of netmem instead.
> Then the whole page/folio thing would be that much further removed.

I like this "thought" of yours :-)

> One other question I have now that I look at this code as well: why is
> it using page_pool and not just a frag cache allocator, or pages
> themselves? It doesn't seem like it has a DMA mapping to deal with,
> since this is essentially copy-break code. It seems problematic that
> there is no DMA involved here at all. This could be more easily
> handled with just a single page_frag_cache style allocator.
>

Yes, precisely. I distinctly remember that I tried to poke you and Eric
about this approach earlier, but I cannot find a link to that email.

I would really appreciate it if you, Alex, could give the approach in
veth_convert_skb_to_xdp_buff() some review, as I believe there is huge
potential for improvements there that will lead to large performance
gains. (I'm sure Maryam will be eager to help re-test performance for
her use-cases.)

--Jesper
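
To make the two alternatives discussed in this reply more concrete, here are
two small, untested sketches. They are not part of the patch or of the thread;
the wrapper name netmem_alloc() below only mirrors Jesper's suggestion, and it
is built on the page_pool_dev_alloc() signature shown in the quoted diff:

    /* Hypothetical sketch, not from the patch: a netmem_alloc()-style
     * wrapper on top of the page_pool_dev_alloc() API added by this
     * series, returning a virtual address instead of a page + offset.
     * Needs <net/page_pool.h> and <linux/mm.h>.
     */
    static inline void *netmem_alloc(struct page_pool *pool,
                                     unsigned int *truesize)
    {
            unsigned int page_offset;
            struct page *page;

            page = page_pool_dev_alloc(pool, &page_offset, truesize);
            if (!page)
                    return NULL;

            /* Callers like napi_build_skb() only need the pointer. */
            return page_address(page) + page_offset;
    }

Note that the return path would still need the page (e.g. via
virt_to_head_page()) for page_pool_put_full_page(), which is part of the
page-vs-address question raised above.

And a rough sketch of the page_frag_cache-style direction Alexander mentions
for the copy-break head allocation, where no DMA mapping is involved. The
function name and the exact sizing are illustrative only, not the actual
veth code:

    /* Hypothetical sketch: allocate and fill the new skb head from the
     * per-CPU napi frag cache instead of page_pool. Assumes NAPI context.
     * Needs <linux/skbuff.h> and <linux/bpf.h> (XDP_PACKET_HEADROOM).
     */
    static struct sk_buff *veth_copybreak_head(const struct sk_buff *skb)
    {
            u32 max_head_size = SKB_WITH_OVERHEAD(PAGE_SIZE -
                                                  XDP_PACKET_HEADROOM);
            u32 size = min_t(u32, skb->len, max_head_size);
            u32 truesize = SKB_DATA_ALIGN(XDP_PACKET_HEADROOM + size) +
                           SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
            struct sk_buff *nskb;
            void *va;

            /* Pure copy-break buffer: no DMA mapping to manage, so a
             * frag cache allocation is sufficient here.
             */
            va = napi_alloc_frag(truesize);
            if (!va)
                    return NULL;

            nskb = napi_build_skb(va, truesize);
            if (!nskb) {
                    skb_free_frag(va);
                    return NULL;
            }

            skb_reserve(nskb, XDP_PACKET_HEADROOM);
            skb_put(nskb, size);
            if (skb_copy_bits(skb, 0, nskb->data, size)) {
                    consume_skb(nskb);
                    return NULL;
            }
            return nskb;
    }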