From: Jason Wang
Date: Thu, 8 Jun 2023 08:38:14 +0800
Subject: Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
To: "Michael S. Tsirkin"
Cc: Liang Chen, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, xuanzhuo@linux.alibaba.com, kuba@kernel.org,
    edumazet@google.com, davem@davemloft.net, pabeni@redhat.com, alexander.duyck@gmail.com
In-Reply-To: <20230607161724-mutt-send-email-mst@kernel.org>
References: <20230526054621.18371-1-liangchen.linux@gmail.com>
            <20230526054621.18371-2-liangchen.linux@gmail.com>
            <20230528021708-mutt-send-email-mst@kernel.org>
            <20230529055439-mutt-send-email-mst@kernel.org>
            <20230607161724-mutt-send-email-mst@kernel.org>

On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin wrote:
>
> On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > On Tue, May 30, 2023 at 9:19 AM Liang Chen wrote:
> > >
> > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin wrote:
> > > >
> > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin wrote:
> > > > > >
> > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > The implementation at the moment uses one page per packet in both the
> > > > > > > normal and XDP path. In addition, a module parameter is introduced to
> > > > > > > enable or disable the use of page pool (disabled by default).
> > > > > > >
> > > > > > > In single-core vm testing environments, it gives a modest performance gain
> > > > > > > in the normal path.
> > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > >
> > > > > > > In multi-core vm testing environments, the most significant performance
> > > > > > > gain is observed in XDP cpumap:
> > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > >
> > > > > > > With this foundation, we can further integrate page pool fragmentation and
> > > > > > > DMA map/unmap support.
> > > > > > >
> > > > > > > Signed-off-by: Liang Chen
> > > > > >
> > > > > > Why off by default?
> > > > > > I am guessing it sometimes has performance costs too?
> > > > > >
> > > > > > What happens if we use page pool for big mode too?
> > > > > > The fewer modes we have the better...
> > > > >
> > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > packet size is very small, it reduces the likelihood of skb
> > > > > coalescing. But such cases are rare.
> > > >
> > > > Small packets are rare? These workloads are easy to create, actually.
> > > > Please try and include a benchmark with small packet sizes.
> > >
> > > Sure, thanks!
> >
> > Before going ahead and posting a v2 patch, I would like to hear more
> > advice for the case of small packets. I have done more performance
> > benchmarking with small packets since then.
> > Here is a list of iperf output:
> >
> > With PP and PP fragmenting:
> > 256:  [ 5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec  0   144 KBytes
> > 1K:   [ 5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec  0   223 KBytes
> > 2K:   [ 5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec  0   324 KBytes
> > 4K:   [ 5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec  0  1.08 MBytes
> > 8K:   [ 5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec  0   744 KBytes
> > 16K:  [ 5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec  0   963 KBytes
> > 32K:  [ 5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec  0  1.25 MBytes
> > 64K:  [ 5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec  0  1.70 MBytes
> > 128K: [ 5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec  0  4.26 MBytes
> > 256K: [ 5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec  0  3.20 MBytes

Note that the virtio-net driver lacks things like BQL and others, so it
might suffer from bufferbloat for TCP performance. Would you mind
measuring with e.g. testpmd on the vhost side to see the rx PPS?

> >
> > Without PP:
> > 256:  [ 5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec  0   359 KBytes
> > 1K:   [ 5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec  0   730 KBytes
> > 2K:   [ 5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec  0  1.99 MBytes
> > 4K:   [ 5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec  0  1.20 MBytes
> > 8K:   [ 5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec  0  1.72 MBytes
> > 16K:  [ 5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec  0  2.90 MBytes
> > 32K:  [ 5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec  0  3.03 MBytes
> > 64K:  [ 5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec  0  3.05 MBytes
> > 128K: [ 5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec  1  3.03 MBytes
> > 256K: [ 5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec  0  3.11 MBytes
> >
> > The major factor contributing to the performance drop is the reduction
> > of skb coalescing. Additionally, without the page pool, small packets
> > can still benefit from the allocation of 8 contiguous pages by
> > breaking them down into smaller pieces. This effectively reduces the
> > frequency of page allocation from the buddy system. For instance, the
> > arrival of 32 1K packets triggers only one alloc_page call. Therefore,
> > the benefits of using a page pool are limited in such cases.

I wonder if we can improve page pool in this case anyhow (a sketch of
the existing fragment-carving pattern follows below, for reference).

> > In fact,
> > without page pool fragmenting enabled, it can even hinder performance
> > from this perspective.
> >
> > Upon further consideration, I tend to believe making page pool the
> > default option may not be appropriate. As you pointed out, we cannot
> > simply ignore the performance impact on small packets. Any comments on
> > this will be much appreciated.
> >
> > Thanks,
> > Liang
>
> So, let's only use page pool for XDP then?

+1

We can start from this.

Thanks

>
> > > > > The usage of page pool for big mode is being evaluated now. Thanks!
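
[ Editorial note: to make the allocation-batching point above concrete,
  here is a minimal, illustrative sketch of the fragment-carving pattern
  the non-page-pool path relies on, condensed from the existing
  add_recvbuf_mergeable() logic quoted in the patch below. This is not
  code from the patch, and carve_rx_buf() is a hypothetical helper name. ]

/* Illustrative only: how one higher-order allocation can back many
 * small receive buffers when page pool is not used.
 */
static char *carve_rx_buf(struct page_frag *alloc_frag, unsigned int len,
                          gfp_t gfp)
{
        char *buf;

        /* skb_page_frag_refill() goes back to the buddy allocator only
         * when the current compound page (up to 32KB, i.e. 8 contiguous
         * 4KB pages) is exhausted, so e.g. 32 one-KB buffers can share a
         * single alloc_page call.
         */
        if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
                return NULL;

        buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
        get_page(alloc_frag->page);     /* each buffer pins the page */
        alloc_frag->offset += len;      /* next buffer starts right after */
        return buf;
}
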
> > > > > >
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 188 ++++++++++++++++++++++++++++++---------
> > > > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > > > > >  module_param(gso, bool, 0444);
> > > > > > >  module_param(napi_tx, bool, 0644);
> > > > > > >
> > > > > > > +static bool page_pool_enabled;
> > > > > > > +module_param(page_pool_enabled, bool, 0400);
> > > > > > > +
> > > > > > >  /* FIXME: MTU in config. */
> > > > > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > > > > >  #define GOOD_COPY_LEN  128
> > > > > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > > > >         /* Chain pages by the private ptr. */
> > > > > > >         struct page *pages;
> > > > > > >
> > > > > > > +       /* Page pool */
> > > > > > > +       struct page_pool *page_pool;
> > > > > > > +
> > > > > > >         /* Average packet length for mergeable receive buffers. */
> > > > > > >         struct ewma_pkt_len mrg_avg_pkt_len;
> > > > > > >
> > > > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, unsigned int buflen,
> > > > > > >         return skb;
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page *page)
> > > > > > > +{
> > > > > > > +       if (rq->page_pool)
> > > > > > > +               page_pool_put_full_page(rq->page_pool, page, true);
> > > > > > > +       else
> > > > > > > +               put_page(page);
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* Called from bottom half context */
> > > > > > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > >                                    struct receive_queue *rq,
> > > > > > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > >         hdr = skb_vnet_hdr(skb);
> > > > > > >         memcpy(hdr, hdr_p, hdr_len);
> > > > > > >         if (page_to_free)
> > > > > > > -               put_page(page_to_free);
> > > > > > > +               virtnet_put_page(rq, page_to_free);
> > > > > > >
> > > > > > >         return skb;
> > > > > > >  }
> > > > > > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > > > > > >         return ret;
> > > > > > >  }
> > > > > > >
> > > > > > > -static void put_xdp_frags(struct xdp_buff *xdp)
> > > > > > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq)
> > > > > > >  {
> > > > > > >         struct skb_shared_info *shinfo;
> > > > > > >         struct page *xdp_page;
> > > > > > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> > > > > > >                 shinfo = xdp_get_shared_info_from_buff(xdp);
> > > > > > >                 for (i = 0; i < shinfo->nr_frags; i++) {
> > > > > > >                         xdp_page = skb_frag_page(&shinfo->frags[i]);
> > > > > > > -                       put_page(xdp_page);
> > > > > > > +                       virtnet_put_page(rq, xdp_page);
> > > > > > >                 }
> > > > > > >         }
> > > > > > >  }
> > > > > > > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > >         if (page_off + *len + tailroom > PAGE_SIZE)
> > > > > > >                 return NULL;
> > > > > > >
> > > > > > > -       page = alloc_page(GFP_ATOMIC);
> > > > > > > +       if (rq->page_pool)
> > > > > > > +               page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > > +       else
> > > > > > > +               page = alloc_page(GFP_ATOMIC);
> > > > > > > +
> > > > > > >         if (!page)
> > > > > > >                 return NULL;
> > > > > > >
> > > > > > > @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > >                  * is sending packet larger than the MTU.
> > > > > > >                  */
> > > > > > >                 if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> > > > > > > -                       put_page(p);
> > > > > > > +                       virtnet_put_page(rq, p);
> > > > > > >                         goto err_buf;
> > > > > > >                 }
> > > > > > >
> > > > > > >                 memcpy(page_address(page) + page_off,
> > > > > > >                        page_address(p) + off, buflen);
> > > > > > >                 page_off += buflen;
> > > > > > > -               put_page(p);
> > > > > > > +               virtnet_put_page(rq, p);
> > > > > > >         }
> > > > > > >
> > > > > > >         /* Headroom does not contribute to packet length */
> > > > > > >         *len = page_off - VIRTIO_XDP_HEADROOM;
> > > > > > >         return page;
> > > > > > >  err_buf:
> > > > > > > -       __free_pages(page, 0);
> > > > > > > +       if (rq->page_pool)
> > > > > > > +               page_pool_put_full_page(rq->page_pool, page, true);
> > > > > > > +       else
> > > > > > > +               __free_pages(page, 0);
> > > > > > >         return NULL;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -1144,7 +1165,7 @@ static void mergeable_buf_free(struct receive_queue *rq, int num_buf,
> > > > > > >                 }
> > > > > > >                 stats->bytes += len;
> > > > > > >                 page = virt_to_head_page(buf);
> > > > > > > -               put_page(page);
> > > > > > > +               virtnet_put_page(rq, page);
> > > > > > >         }
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -1264,7 +1285,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > >                 cur_frag_size = truesize;
> > > > > > >                 xdp_frags_truesz += cur_frag_size;
> > > > > > >                 if (unlikely(len > truesize - room || cur_frag_size > PAGE_SIZE)) {
> > > > > > > -                       put_page(page);
> > > > > > > +                       virtnet_put_page(rq, page);
> > > > > > >                         pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
> > > > > > >                                  dev->name, len, (unsigned long)(truesize - room));
> > > > > > >                         dev->stats.rx_length_errors++;
> > > > > > > @@ -1283,7 +1304,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > >         return 0;
> > > > > > >
> > > > > > >  err:
> > > > > > > -       put_xdp_frags(xdp);
> > > > > > > +       put_xdp_frags(xdp, rq);
> > > > > > >         return -EINVAL;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -1344,7 +1365,10 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
> > > > > > >         if (*len + xdp_room > PAGE_SIZE)
> > > > > > >                 return NULL;
> > > > > > >
> > > > > > > -       xdp_page = alloc_page(GFP_ATOMIC);
> > > > > > > +       if (rq->page_pool)
> > > > > > > +               xdp_page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > > +       else
> > > > > > > +               xdp_page = alloc_page(GFP_ATOMIC);
> > > > > > >         if (!xdp_page)
> > > > > > >                 return NULL;
> > > > > > >
> > > > > > > @@ -1354,7 +1378,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
> > > > > > >
> > > > > > >         *frame_sz = PAGE_SIZE;
> > > > > > >
> > > > > > > -       put_page(*page);
> > > > > > > +       virtnet_put_page(rq, *page);
> > > > > > >
> > > > > > >         *page = xdp_page;
> > > > > > >
> > > > > > > @@ -1400,6 +1424,8 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
> > > > > > >                 head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
> > > > > > >                 if (unlikely(!head_skb))
> > > > > > >                         break;
> > > > > > > +               if (rq->page_pool)
> > > > > > > +                       skb_mark_for_recycle(head_skb);
> > > > > > >                 return head_skb;
> > > > > > >
> > > > > > >         case XDP_TX:
> > > > > > > @@ -1410,10 +1436,10 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
> > > > > > >                 break;
> > > > > > >         }
> > > > > > >
> > > > > > > -       put_xdp_frags(&xdp);
> > > > > > > +       put_xdp_frags(&xdp, rq);
> > > > > > >
> > > > > > >  err_xdp:
> > > > > > > -       put_page(page);
> > > > > > > +       virtnet_put_page(rq, page);
> > > > > > >         mergeable_buf_free(rq, num_buf, dev, stats);
> > > > > > >
> > > > > > >         stats->xdp_drops++;
> > > > > > > @@ -1467,6 +1493,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > >         head_skb = page_to_skb(vi, rq, page, offset, len, truesize, headroom);
> > > > > > >         curr_skb = head_skb;
> > > > > > >
> > > > > > > +       if (rq->page_pool)
> > > > > > > +               skb_mark_for_recycle(curr_skb);
> > > > > > > +
> > > > > > >         if (unlikely(!curr_skb))
> > > > > > >                 goto err_skb;
> > > > > > >         while (--num_buf) {
> > > > > > > @@ -1509,6 +1538,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > >                         curr_skb = nskb;
> > > > > > >                         head_skb->truesize += nskb->truesize;
> > > > > > >                         num_skb_frags = 0;
> > > > > > > +                       if (rq->page_pool)
> > > > > > > +                               skb_mark_for_recycle(curr_skb);
> > > > > > >                 }
> > > > > > >                 if (curr_skb != head_skb) {
> > > > > > >                         head_skb->data_len += len;
> > > > > > > @@ -1517,7 +1548,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > >                 }
> > > > > > >                 offset = buf - page_address(page);
> > > > > > >                 if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
> > > > > > > -                       put_page(page);
> > > > > > > +                       virtnet_put_page(rq, page);
> > > > > > >                         skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
> > > > > > >                                              len, truesize);
> > > > > > >                 } else {
> > > > > > > @@ -1530,7 +1561,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > >         return head_skb;
> > > > > > >
> > > > > > >  err_skb:
> > > > > > > -       put_page(page);
> > > > > > > +       virtnet_put_page(rq, page);
> > > > > > >         mergeable_buf_free(rq, num_buf, dev, stats);
> > > > > > >
> > > > > > >  err_buf:
> > > > > > > @@ -1737,31 +1768,40 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > >          * disabled GSO for XDP, it won't be a big issue.
> > > > > > >          */
> > > > > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > > > > -       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > > -               return -ENOMEM;
> > > > > > > +       if (rq->page_pool) {
> > > > > > > +               struct page *page;
> > > > > > >
> > > > > > > -       buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > -       buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > > -       get_page(alloc_frag->page);
> > > > > > > -       alloc_frag->offset += len + room;
> > > > > > > -       hole = alloc_frag->size - alloc_frag->offset;
> > > > > > > -       if (hole < len + room) {
> > > > > > > -               /* To avoid internal fragmentation, if there is very likely not
> > > > > > > -                * enough space for another buffer, add the remaining space to
> > > > > > > -                * the current buffer.
> > > > > > > -                * XDP core assumes that frame_size of xdp_buff and the length
> > > > > > > -                * of the frag are PAGE_SIZE, so we disable the hole mechanism.
> > > > > > > -                */
> > > > > > > -               if (!headroom)
> > > > > > > -                       len += hole;
> > > > > > > -               alloc_frag->offset += hole;
> > > > > > > -       }
> > > > > > > +               page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > > +               if (unlikely(!page))
> > > > > > > +                       return -ENOMEM;
> > > > > > > +               buf = (char *)page_address(page);
> > > > > > > +               buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > > +       } else {
> > > > > > > +               if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > > +                       return -ENOMEM;
> > > > > > >
> > > > > > > +               buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > +               buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > > +               get_page(alloc_frag->page);
> > > > > > > +               alloc_frag->offset += len + room;
> > > > > > > +               hole = alloc_frag->size - alloc_frag->offset;
> > > > > > > +               if (hole < len + room) {
> > > > > > > +                       /* To avoid internal fragmentation, if there is very likely not
> > > > > > > +                        * enough space for another buffer, add the remaining space to
> > > > > > > +                        * the current buffer.
> > > > > > > +                        * XDP core assumes that frame_size of xdp_buff and the length
> > > > > > > +                        * of the frag are PAGE_SIZE, so we disable the hole mechanism.
> > > > > > > +                        */
> > > > > > > +                       if (!headroom)
> > > > > > > +                               len += hole;
> > > > > > > +                       alloc_frag->offset += hole;
> > > > > > > +               }
> > > > > > > +       }
> > > > > > >         sg_init_one(rq->sg, buf, len);
> > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > >         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > >         if (err < 0)
> > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > +               virtnet_put_page(rq, virt_to_head_page(buf));
> > > > > > >
> > > > > > >         return err;
> > > > > > >  }
> > > > > > > @@ -1994,8 +2034,15 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
> > > > > > >         if (err < 0)
> > > > > > >                 return err;
> > > > > > >
> > > > > > > -       err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > > -                                        MEM_TYPE_PAGE_SHARED, NULL);
> > > > > > > +       if (vi->rq[qp_index].page_pool)
> > > > > > > +               err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > > +                                                MEM_TYPE_PAGE_POOL,
> > > > > > > +                                                vi->rq[qp_index].page_pool);
> > > > > > > +       else
> > > > > > > +               err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > > +                                                MEM_TYPE_PAGE_SHARED,
> > > > > > > +                                                NULL);
> > > > > > > +
> > > > > > >         if (err < 0)
> > > > > > >                 goto err_xdp_reg_mem_model;
> > > > > > >
> > > > > > > @@ -2951,6 +2998,7 @@ static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *data)
> > > > > > >                                 ethtool_sprintf(&p, "tx_queue_%u_%s", i,
> > > > > > >                                                 virtnet_sq_stats_desc[j].desc);
> > > > > > >                 }
> > > > > > > +               page_pool_ethtool_stats_get_strings(p);
> > > > > > >                 break;
> > > > > > >         }
> > > > > > >  }
> > > > > > > @@ -2962,12 +3010,30 @@ static int virtnet_get_sset_count(struct net_device *dev, int sset)
> > > > > > >         switch (sset) {
> > > > > > >         case ETH_SS_STATS:
> > > > > > >                 return vi->curr_queue_pairs * (VIRTNET_RQ_STATS_LEN +
> > > > > > > -                                              VIRTNET_SQ_STATS_LEN);
> > > > > > > +                                              VIRTNET_SQ_STATS_LEN +
> > > > > > > +                                              (page_pool_enabled && vi->mergeable_rx_bufs ?
> > > > > > > +                                               page_pool_ethtool_stats_get_count() : 0));
> > > > > > >         default:
> > > > > > >                 return -EOPNOTSUPP;
> > > > > > >         }
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_get_page_pool_stats(struct net_device *dev, u64 *data)
> > > > > > > +{
> > > > > > > +#ifdef CONFIG_PAGE_POOL_STATS
> > > > > > > +       struct virtnet_info *vi = netdev_priv(dev);
> > > > > > > +       struct page_pool_stats pp_stats = {};
> > > > > > > +       int i;
> > > > > > > +
> > > > > > > +       for (i = 0; i < vi->curr_queue_pairs; i++) {
> > > > > > > +               if (!vi->rq[i].page_pool)
> > > > > > > +                       continue;
> > > > > > > +               page_pool_get_stats(vi->rq[i].page_pool, &pp_stats);
> > > > > > > +       }
> > > > > > > +       page_pool_ethtool_stats_get(data, &pp_stats);
> > > > > > > +#endif /* CONFIG_PAGE_POOL_STATS */
> > > > > > > +}
> > > > > > > +
> > > > > > >  static void virtnet_get_ethtool_stats(struct net_device *dev,
> > > > > > >                                       struct ethtool_stats *stats, u64 *data)
> > > > > > >  {
> > > > > > > @@ -3003,6 +3069,8 @@ static void virtnet_get_ethtool_stats(struct net_device *dev,
> > > > > > >                 } while (u64_stats_fetch_retry(&sq->stats.syncp, start));
> > > > > > >                 idx += VIRTNET_SQ_STATS_LEN;
> > > > > > >         }
> > > > > > > +
> > > > > > > +       virtnet_get_page_pool_stats(dev, &data[idx]);
> > > > > > >  }
> > > > > > >
> > > > > > >  static void virtnet_get_channels(struct net_device *dev,
> > > > > > > @@ -3623,6 +3691,8 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > +               if (vi->rq[i].page_pool)
> > > > > > > +                       page_pool_destroy(vi->rq[i].page_pool);
> > > > > > >         }
> > > > > > >
> > > > > > >         /* We called __netif_napi_del(),
> > > > > > > @@ -3679,12 +3749,19 @@ static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf)
> > > > > > >         struct virtnet_info *vi = vq->vdev->priv;
> > > > > > >         int i = vq2rxq(vq);
> > > > > > >
> > > > > > > -       if (vi->mergeable_rx_bufs)
> > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > -       else if (vi->big_packets)
> > > > > > > +       if (vi->mergeable_rx_bufs) {
> > > > > > > +               if (vi->rq[i].page_pool) {
> > > > > > > +                       page_pool_put_full_page(vi->rq[i].page_pool,
> > > > > > > +                                               virt_to_head_page(buf),
> > > > > > > +                                               true);
> > > > > > > +               } else {
> > > > > > > +                       put_page(virt_to_head_page(buf));
> > > > > > > +               }
> > > > > > > +       } else if (vi->big_packets) {
> > > > > > >                 give_pages(&vi->rq[i], buf);
> > > > > > > -       else
> > > > > > > +       } else {
> > > > > > >                 put_page(virt_to_head_page(buf));
> > > > > > > +       }
> > > > > > >  }
> > > > > > >
> > > > > > >  static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > @@ -3718,6 +3795,26 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
> > > > > > >         virtnet_free_queues(vi);
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_alloc_page_pool(struct receive_queue *rq)
> > > > > > > +{
> > > > > > > +       struct virtio_device *vdev = rq->vq->vdev;
> > > > > > > +
> > > > > > > +       struct page_pool_params pp_params = {
> > > > > > > +               .order = 0,
> > > > > > > +               .pool_size = rq->vq->num_max,
> > > > > > > +               .nid = dev_to_node(vdev->dev.parent),
> > > > > > > +               .dev = vdev->dev.parent,
> > > > > > > +               .offset = 0,
> > > > > > > +       };
> > > > > > > +
> > > > > > > +       rq->page_pool = page_pool_create(&pp_params);
> > > > > > > +       if (IS_ERR(rq->page_pool)) {
> > > > > > > +               dev_warn(&vdev->dev, "page pool creation failed: %ld\n",
> > > > > > > +                        PTR_ERR(rq->page_pool));
> > > > > > > +               rq->page_pool = NULL;
> > > > > > > +       }
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* How large should a single buffer be so a queue full of these can fit at
> > > > > > >   * least one full packet?
> > > > > > >   * Logic below assumes the mergeable buffer header is used.
> > > > > > > @@ -3801,6 +3898,13 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > > > > >                 vi->rq[i].vq = vqs[rxq2vq(i)];
> > > > > > >                 vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
> > > > > > >                 vi->sq[i].vq = vqs[txq2vq(i)];
> > > > > > > +
> > > > > > > +               if (page_pool_enabled && vi->mergeable_rx_bufs)
> > > > > > > +                       virtnet_alloc_page_pool(&vi->rq[i]);
> > > > > > > +               else
> > > > > > > +                       dev_warn(&vi->vdev->dev,
> > > > > > > +                                "page pool only support mergeable mode\n");
> > > > > > > +
> > > > > > >         }
> > > > > > >
> > > > > > >         /* run here: ret == 0. */
> > > > > > > --
> > > > > > > 2.31.1
> > > > > >
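
[ Editorial note: a rough sketch of the XDP-only direction agreed above,
  for v2 discussion. This is hypothetical, not part of the patch: the
  idea is to key page pool use off an attached XDP program instead of
  the page_pool_enabled module parameter. virtnet_rq_use_page_pool() is
  an invented name, and a real version would still have to handle pool
  setup/teardown across XDP attach/detach and in-flight buffers. ]

/* Hypothetical predicate: use the page pool only for queues that will
 * feed XDP, keeping the non-XDP path on the existing frag allocator.
 */
static bool virtnet_rq_use_page_pool(struct virtnet_info *vi,
                                     struct receive_queue *rq)
{
        /* Mergeable buffers remain a prerequisite; an attached XDP
         * program (rq->xdp_prog, updated under RTNL) becomes the
         * trigger for the pool instead of a module parameter.
         */
        return vi->mergeable_rx_bufs && rtnl_dereference(rq->xdp_prog);
}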