From: Liang Chen
Date: Wed, 7 Jun 2023 17:11:44 +0800
Subject: Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
To: Xuan Zhuo
Cc: Jason Wang, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    kuba@kernel.org, edumazet@google.com, davem@davemloft.net, pabeni@redhat.com, alexander.duyck@gmail.com, "Michael S. Tsirkin"

On Wed, May 31, 2023 at 11:12 AM Xuan Zhuo wrote:
>
> On Mon, 29 May 2023 15:28:17 +0800, Liang Chen wrote:
> > On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin wrote:
> > >
> > > On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote:
> > > > On Fri, May 26, 2023 at 2:51 PM Jason Wang wrote:
> > > > >
> > > > > On Fri, May 26, 2023 at 1:46 PM Liang Chen wrote:
> > > > > >
> > > > > > The implementation at the moment uses one page per packet in both the
> > > > > > normal and XDP path.
> > > > >
> > > > > It's better to explain why we need a page pool and how it can help the
> > > > > performance.
> > > > >
> > > >
> > > > Sure, I will include that in v2.
> > > > > > In addition, introducing a module parameter to enable
> > > > > > or disable the usage of page pool (disabled by default).
> > > > >
> > > > > If page pool wins for most of the cases, any reason to disable it by default?
> > > > >
> > > >
> > > > Thank you for raising the point. It does make sense to enable it by default.
> > >
> > > I'd like to see more benchmarks pls then, with a variety of packet
> > > sizes, udp and tcp.
> > >
> >
> > Sure, more benchmarks will be provided. Thanks.
>
> I think so.
>
> I did this, but I did not find any improvement, so I gave up on it.
>
> Thanks.

Our UDP benchmark shows a steady 0.8 percent change in the PPS measurement.
However, when conducting iperf TCP stream performance testing, the results
vary depending on the packet size and the testing setup. With small packet
sizes, the performance actually drops slightly, for the reasons I explained
in the previous email. With large packets, on the other hand, we need to
ensure that the sender side doesn't become the bottleneck. To achieve this,
our setup uses a single-core vm to keep the receiver busy, which allows us
to identify performance differences in the receiving path.

Thanks,
Liang

> > > > > >
> > > > > > In single-core vm testing environments, it gives a modest performance gain
> > > > > > in the normal path.
> > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > >
> > > > > > In multi-core vm testing environments, the most significant performance
> > > > > > gain is observed in XDP cpumap:
> > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > >
> > > > > Please show more details on the test. E.g. which kinds of tests have
> > > > > you measured?
> > > > >
> > > > > Btw, it would be better to measure PPS as well.
> > > > >
> > > >
> > > > Sure. It will be added in v2.
> > > > > >
> > > > > > With this foundation, we can further integrate page pool fragmentation and
> > > > > > DMA map/unmap support.
> > > > > >
> > > > > > Signed-off-by: Liang Chen
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 188 ++++++++++++++++++++++++++++++--------
> > > > >
> > > > > I believe we should make virtio-net select CONFIG_PAGE_POOL or do
> > > > > the ifdef tricks at least.
> > > > >
> > > >
> > > > Sure. It will be done in v2.
> > > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > > > >  module_param(gso, bool, 0444);
> > > > > >  module_param(napi_tx, bool, 0644);
> > > > > >
> > > > > > +static bool page_pool_enabled;
> > > > > > +module_param(page_pool_enabled, bool, 0400);
> > > > > > +
> > > > > >  /* FIXME: MTU in config. */
> > > > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > > > >  #define GOOD_COPY_LEN  128
> > > > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > > >         /* Chain pages by the private ptr. */
> > > > > >         struct page *pages;
> > > > > >
> > > > > > +       /* Page pool */
> > > > > > +       struct page_pool *page_pool;
> > > > > > +
> > > > > >         /* Average packet length for mergeable receive buffers. */
> > > > > >         struct ewma_pkt_len mrg_avg_pkt_len;
> > > > > >
> > > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, unsigned int buflen,
> > > > > >         return skb;
> > > > > >  }
> > > > > >
> > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page *page)
> > > > > > +{
> > > > > > +       if (rq->page_pool)
> > > > > > +               page_pool_put_full_page(rq->page_pool, page, true);
> > > > > > +       else
> > > > > > +               put_page(page);
> > > > > > +}
> > > > > > +
> > > > > >  /* Called from bottom half context */
> > > > > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > >                                    struct receive_queue *rq,
> > > > > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > >         hdr = skb_vnet_hdr(skb);
> > > > > >         memcpy(hdr, hdr_p, hdr_len);
> > > > > >         if (page_to_free)
> > > > > > -               put_page(page_to_free);
> > > > > > +               virtnet_put_page(rq, page_to_free);
> > > > > >
> > > > > >         return skb;
> > > > > >  }
> > > > > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > > > > >         return ret;
> > > > > >  }
> > > > > >
> > > > > > -static void put_xdp_frags(struct xdp_buff *xdp)
> > > > > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq)
> > > > > >  {
> > > > >
> > > > > rq could be fetched from xdp_rxq_info?
> > > >
> > > > Yeah, it has the queue_index there.
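For v2, something along these lines could recover rq without threading it
through the call chain. This is an untested sketch; virtnet_rq_from_xdp()
is a made-up name, and it assumes xdp->rxq is always populated on this path:

static struct receive_queue *virtnet_rq_from_xdp(struct xdp_buff *xdp)
{
        /* xdp_rxq_info carries the netdev and the queue index,
         * so the receive queue can be looked up directly.
         */
        struct virtnet_info *vi = netdev_priv(xdp->rxq->dev);

        return &vi->rq[xdp->rxq->queue_index];
}

put_xdp_frags() could then keep its original signature.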
> > > > > >         struct skb_shared_info *shinfo;
> > > > > >         struct page *xdp_page;
> > > > > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> > > > > >                 shinfo = xdp_get_shared_info_from_buff(xdp);
> > > > > >                 for (i = 0; i < shinfo->nr_frags; i++) {
> > > > > >                         xdp_page = skb_frag_page(&shinfo->frags[i]);
> > > > > > -                       put_page(xdp_page);
> > > > > > +                       virtnet_put_page(rq, xdp_page);
> > > > > >                 }
> > > > > >         }
> > > > > >  }
> > > > > > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > >         if (page_off + *len + tailroom > PAGE_SIZE)
> > > > > >                 return NULL;
> > > > > >
> > > > > > -       page = alloc_page(GFP_ATOMIC);
> > > > > > +       if (rq->page_pool)
> > > > > > +               page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > +       else
> > > > > > +               page = alloc_page(GFP_ATOMIC);
> > > > > > +
> > > > > >         if (!page)
> > > > > >                 return NULL;
> > > > > >
> > > > > > @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > >                  * is sending packet larger than the MTU.
> > > > > >                  */
> > > > > >                 if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> > > > > > -                       put_page(p);
> > > > > > +                       virtnet_put_page(rq, p);
> > > > > >                         goto err_buf;
> > > > > >                 }
> > > > > >
> > > > > >                 memcpy(page_address(page) + page_off,
> > > > > >                        page_address(p) + off, buflen);
> > > > > >                 page_off += buflen;
> > > > > > -               put_page(p);
> > > > > > +               virtnet_put_page(rq, p);
> > > > > >         }
> > > > > >
> > > > > >         /* Headroom does not contribute to packet length */
> > > > > >         *len = page_off - VIRTIO_XDP_HEADROOM;
> > > > > >         return page;
> > > > > > err_buf:
> > > > > > -       __free_pages(page, 0);
> > > > > > +       if (rq->page_pool)
> > > > > > +               page_pool_put_full_page(rq->page_pool, page, true);
> > > > > > +       else
> > > > > > +               __free_pages(page, 0);
> > > > > >         return NULL;
> > > > > >  }
> > > > > >
> > > > > > @@ -1144,7 +1165,7 @@ static void mergeable_buf_free(struct receive_queue *rq, int num_buf,
> > > > > >                 }
> > > > > >                 stats->bytes += len;
> > > > > >                 page = virt_to_head_page(buf);
> > > > > > -               put_page(page);
> > > > > > +               virtnet_put_page(rq, page);
> > > > > >         }
> > > > > >  }
> > > > > >
> > > > > > @@ -1264,7 +1285,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > >                 cur_frag_size = truesize;
> > > > > >                 xdp_frags_truesz += cur_frag_size;
> > > > > >                 if (unlikely(len > truesize - room || cur_frag_size > PAGE_SIZE)) {
> > > > > > -                       put_page(page);
> > > > > > +                       virtnet_put_page(rq, page);
> > > > > >                         pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
> > > > > >                                  dev->name, len, (unsigned long)(truesize - room));
> > > > > >                         dev->stats.rx_length_errors++;
> > > > > > @@ -1283,7 +1304,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > >         return 0;
> > > > > >
> > > > > >  err:
> > > > > > -       put_xdp_frags(xdp);
> > > > > > +       put_xdp_frags(xdp, rq);
> > > > > >         return -EINVAL;
> > > > > >  }
> > > > > >
> > > > > > @@ -1344,7 +1365,10 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
> > > > > >         if (*len + xdp_room > PAGE_SIZE)
> > > > > >                 return NULL;
> > > > > >
> > > > > > -       xdp_page = alloc_page(GFP_ATOMIC);
> > > > > > +       if (rq->page_pool)
> > > > > > +               xdp_page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > +       else
> > > > > > +               xdp_page = alloc_page(GFP_ATOMIC);
> > > > > >         if (!xdp_page)
> > > > > >                 return NULL;
> > > > > >
> > > > > > @@ -1354,7 +1378,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
> > > > > >
> > > > > >         *frame_sz = PAGE_SIZE;
> > > > > >
> > > > > > -       put_page(*page);
> > > > > > +       virtnet_put_page(rq, *page);
> > > > > >
> > > > > >         *page = xdp_page;
> > > > > >
> > > > > > @@ -1400,6 +1424,8 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
> > > > > >                 head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
> > > > > >                 if (unlikely(!head_skb))
> > > > > >                         break;
> > > > > > +               if (rq->page_pool)
> > > > > > +                       skb_mark_for_recycle(head_skb);
> > > > > >                 return head_skb;
> > > > > >
> > > > > >         case XDP_TX:
> > > > > > @@ -1410,10 +1436,10 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
> > > > > >                 break;
> > > > > >         }
> > > > > >
> > > > > > -       put_xdp_frags(&xdp);
> > > > > > +       put_xdp_frags(&xdp, rq);
> > > > > >
> > > > > >  err_xdp:
> > > > > > -       put_page(page);
> > > > > > +       virtnet_put_page(rq, page);
> > > > > >         mergeable_buf_free(rq, num_buf, dev, stats);
> > > > > >
> > > > > >         stats->xdp_drops++;
> > > > > > @@ -1467,6 +1493,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > >         head_skb = page_to_skb(vi, rq, page, offset, len, truesize, headroom);
> > > > > >         curr_skb = head_skb;
> > > > > >
> > > > > > +       if (rq->page_pool)
> > > > > > +               skb_mark_for_recycle(curr_skb);
> > > > > > +
> > > > > >         if (unlikely(!curr_skb))
> > > > > >                 goto err_skb;
> > > > > >         while (--num_buf) {
> > > > > > @@ -1509,6 +1538,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > >                         curr_skb = nskb;
> > > > > >                         head_skb->truesize += nskb->truesize;
> > > > > >                         num_skb_frags = 0;
> > > > > > +                       if (rq->page_pool)
> > > > > > +                               skb_mark_for_recycle(curr_skb);
> > > > > >                 }
> > > > > >                 if (curr_skb != head_skb) {
> > > > > >                         head_skb->data_len += len;
> > > > > > @@ -1517,7 +1548,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > >                 }
> > > > > >                 offset = buf - page_address(page);
> > > > > >                 if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
> > > > > > -                       put_page(page);
> > > > > > +                       virtnet_put_page(rq, page);
> > > > >
> > > > > I wonder why we can't do this during buffer allocation like other drivers?
> > > > >
> > > >
> > > > Sorry, I don't quite understand the point here. Would you please
> > > > elaborate a bit more?
> > > > > >                         skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
> > > > > >                                              len, truesize);
> > > > > >                 } else {
> > > > > > @@ -1530,7 +1561,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > >         return head_skb;
> > > > > >
> > > > > >  err_skb:
> > > > > > -       put_page(page);
> > > > > > +       virtnet_put_page(rq, page);
> > > > > >         mergeable_buf_free(rq, num_buf, dev, stats);
> > > > > >
> > > > > >  err_buf:
> > > > > > @@ -1737,31 +1768,40 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > >          * disabled GSO for XDP, it won't be a big issue.
> > > > > >          */
> > > > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > > > -       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > -               return -ENOMEM;
> > > > > > +       if (rq->page_pool) {
> > > > > > +               struct page *page;
> > > > > >
> > > > > > -       buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > -       buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > -       get_page(alloc_frag->page);
> > > > > > -       alloc_frag->offset += len + room;
> > > > > > -       hole = alloc_frag->size - alloc_frag->offset;
> > > > > > -       if (hole < len + room) {
> > > > > > -               /* To avoid internal fragmentation, if there is very likely not
> > > > > > -                * enough space for another buffer, add the remaining space to
> > > > > > -                * the current buffer.
> > > > > > -                * XDP core assumes that frame_size of xdp_buff and the length
> > > > > > -                * of the frag are PAGE_SIZE, so we disable the hole mechanism.
> > > > > > -                */
> > > > > > -               if (!headroom)
> > > > > > -                       len += hole;
> > > > > > -               alloc_frag->offset += hole;
> > > > > > -       }
> > > > > > +               page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > +               if (unlikely(!page))
> > > > > > +                       return -ENOMEM;
> > > > > > +               buf = (char *)page_address(page);
> > > > > > +               buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > +       } else {
> > > > > > +               if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > >
> > > > > Why not simply use a helper like virtnet_page_frag_refill() and add
> > > > > the page_pool allocation logic there? It helps to reduce the
> > > > > changeset.
> > > > >
> > > >
> > > > Sure. Will do that in v2.
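Roughly what I have in mind for the v2 helper is sketched below. It is
untested and the name is provisional; it just folds the two allocation
branches above into a single function, with len adjusted in place so the
caller can keep using it for sg_init_one():

static char *virtnet_page_frag_refill(struct receive_queue *rq,
                                      unsigned int *len, unsigned int room,
                                      unsigned int headroom, gfp_t gfp)
{
        struct page_frag *alloc_frag = &rq->alloc_frag;
        unsigned int hole;
        char *buf;

        if (rq->page_pool) {
                struct page *page = page_pool_dev_alloc_pages(rq->page_pool);

                if (unlikely(!page))
                        return NULL;
                /* advance address leaving hole at front of pkt */
                return (char *)page_address(page) + headroom;
        }

        if (unlikely(!skb_page_frag_refill(*len + room, alloc_frag, gfp)))
                return NULL;

        buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset + headroom;
        get_page(alloc_frag->page);
        alloc_frag->offset += *len + room;
        hole = alloc_frag->size - alloc_frag->offset;
        if (hole < *len + room) {
                /* Same hole-avoidance logic as the current code. */
                if (!headroom)
                        *len += hole;
                alloc_frag->offset += hole;
        }

        return buf;
}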
> > > > > > +                       return -ENOMEM;
> > > > > >
> > > > > > +               buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > +               buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > +               get_page(alloc_frag->page);
> > > > > > +               alloc_frag->offset += len + room;
> > > > > > +               hole = alloc_frag->size - alloc_frag->offset;
> > > > > > +               if (hole < len + room) {
> > > > > > +                       /* To avoid internal fragmentation, if there is very likely not
> > > > > > +                        * enough space for another buffer, add the remaining space to
> > > > > > +                        * the current buffer.
> > > > > > +                        * XDP core assumes that frame_size of xdp_buff and the length
> > > > > > +                        * of the frag are PAGE_SIZE, so we disable the hole mechanism.
> > > > > > +                        */
> > > > > > +                       if (!headroom)
> > > > > > +                               len += hole;
> > > > > > +                       alloc_frag->offset += hole;
> > > > > > +               }
> > > > > > +       }
> > > > > >         sg_init_one(rq->sg, buf, len);
> > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > >         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > >         if (err < 0)
> > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > +               virtnet_put_page(rq, virt_to_head_page(buf));
> > > > > >
> > > > > >         return err;
> > > > > >  }
> > > > > > @@ -1994,8 +2034,15 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
> > > > > >         if (err < 0)
> > > > > >                 return err;
> > > > > >
> > > > > > -       err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > -                                        MEM_TYPE_PAGE_SHARED, NULL);
> > > > > > +       if (vi->rq[qp_index].page_pool)
> > > > > > +               err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > +                                                MEM_TYPE_PAGE_POOL,
> > > > > > +                                                vi->rq[qp_index].page_pool);
> > > > > > +       else
> > > > > > +               err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > +                                                MEM_TYPE_PAGE_SHARED,
> > > > > > +                                                NULL);
> > > > > > +
> > > > > >         if (err < 0)
> > > > > >                 goto err_xdp_reg_mem_model;
> > > > > >
> > > > > > @@ -2951,6 +2998,7 @@ static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *data)
> > > > > >                                 ethtool_sprintf(&p, "tx_queue_%u_%s", i,
> > > > > >                                                 virtnet_sq_stats_desc[j].desc);
> > > > > >                 }
> > > > > > +               page_pool_ethtool_stats_get_strings(p);
> > > > > >                 break;
> > > > > >         }
> > > > > >  }
> > > > > > @@ -2962,12 +3010,30 @@ static int virtnet_get_sset_count(struct net_device *dev, int sset)
> > > > > >         switch (sset) {
> > > > > >         case ETH_SS_STATS:
> > > > > >                 return vi->curr_queue_pairs * (VIRTNET_RQ_STATS_LEN +
> > > > > > -                                              VIRTNET_SQ_STATS_LEN);
> > > > > > +                                              VIRTNET_SQ_STATS_LEN +
> > > > > > +                                              (page_pool_enabled && vi->mergeable_rx_bufs ?
> > > > > > +                                               page_pool_ethtool_stats_get_count() : 0));
> > > > > >         default:
> > > > > >                 return -EOPNOTSUPP;
> > > > > >         }
> > > > > >  }
> > > > > >
> > > > > > +static void virtnet_get_page_pool_stats(struct net_device *dev, u64 *data)
> > > > > > +{
> > > > > > +#ifdef CONFIG_PAGE_POOL_STATS
> > > > > > +       struct virtnet_info *vi = netdev_priv(dev);
> > > > > > +       struct page_pool_stats pp_stats = {};
> > > > > > +       int i;
> > > > > > +
> > > > > > +       for (i = 0; i < vi->curr_queue_pairs; i++) {
> > > > > > +               if (!vi->rq[i].page_pool)
> > > > > > +                       continue;
> > > > > > +               page_pool_get_stats(vi->rq[i].page_pool, &pp_stats);
> > > > > > +       }
> > > > > > +       page_pool_ethtool_stats_get(data, &pp_stats);
> > > > > > +#endif /* CONFIG_PAGE_POOL_STATS */
> > > > > > +}
> > > > > > +
> > > > > >  static void virtnet_get_ethtool_stats(struct net_device *dev,
> > > > > >                                       struct ethtool_stats *stats, u64 *data)
> > > > > >  {
> > > > > > @@ -3003,6 +3069,8 @@ static void virtnet_get_ethtool_stats(struct net_device *dev,
> > > > > >                 } while (u64_stats_fetch_retry(&sq->stats.syncp, start));
> > > > > >                 idx += VIRTNET_SQ_STATS_LEN;
> > > > > >         }
> > > > > > +
> > > > > > +       virtnet_get_page_pool_stats(dev, &data[idx]);
> > > > > >  }
> > > > > >
> > > > > >  static void virtnet_get_channels(struct net_device *dev,
> > > > > > @@ -3623,6 +3691,8 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > +               if (vi->rq[i].page_pool)
> > > > > > +                       page_pool_destroy(vi->rq[i].page_pool);
> > > > > >         }
> > > > > >
> > > > > >         /* We called __netif_napi_del(),
> > > > > > @@ -3679,12 +3749,19 @@ static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf)
> > > > > >         struct virtnet_info *vi = vq->vdev->priv;
> > > > > >         int i = vq2rxq(vq);
> > > > > >
> > > > > > -       if (vi->mergeable_rx_bufs)
> > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > -       else if (vi->big_packets)
> > > > > > +       if (vi->mergeable_rx_bufs) {
> > > > > > +               if (vi->rq[i].page_pool) {
> > > > > > +                       page_pool_put_full_page(vi->rq[i].page_pool,
> > > > > > +                                               virt_to_head_page(buf),
> > > > > > +                                               true);
> > > > > > +               } else {
> > > > > > +                       put_page(virt_to_head_page(buf));
> > > > > > +               }
> > > > > > +       } else if (vi->big_packets) {
> > > > > >                 give_pages(&vi->rq[i], buf);
> > > > >
> > > > > Any reason only mergeable were modified but not for small and big?
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > > > Big mode uses the page chain to recycle pages, hence the use of the
> > > > buffer page's "private" field. I will take a further look to see
> > > > whether it is better to use page pool in these cases as well. Thanks!
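For context, the big-packet path chains free pages through page->private
(this is roughly give_pages() in the current driver), so those pages can't
simply be handed to a page_pool, which keeps its own recycling state in
struct page:

static void give_pages(struct receive_queue *rq, struct page *page)
{
        struct page *end;

        /* Find end of list, sew whole thing into vi->rq.pages. */
        for (end = page; end->private; end = (struct page *)end->private);
        end->private = (unsigned long)rq->pages;
        rq->pages = page;
}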
> > > > > >
> > > > > > -       else
> > > > > > +       } else {
> > > > > >                 put_page(virt_to_head_page(buf));
> > > > > > +       }
> > > > > >  }
> > > > > >
> > > > > >  static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > @@ -3718,6 +3795,26 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
> > > > > >         virtnet_free_queues(vi);
> > > > > >  }
> > > > > >
> > > > > > +static void virtnet_alloc_page_pool(struct receive_queue *rq)
> > > > > > +{
> > > > > > +       struct virtio_device *vdev = rq->vq->vdev;
> > > > > > +
> > > > > > +       struct page_pool_params pp_params = {
> > > > > > +               .order = 0,
> > > > > > +               .pool_size = rq->vq->num_max,
> > > > > > +               .nid = dev_to_node(vdev->dev.parent),
> > > > > > +               .dev = vdev->dev.parent,
> > > > > > +               .offset = 0,
> > > > > > +       };
> > > > > > +
> > > > > > +       rq->page_pool = page_pool_create(&pp_params);
> > > > > > +       if (IS_ERR(rq->page_pool)) {
> > > > > > +               dev_warn(&vdev->dev, "page pool creation failed: %ld\n",
> > > > > > +                        PTR_ERR(rq->page_pool));
> > > > > > +               rq->page_pool = NULL;
> > > > > > +       }
> > > > > > +}
> > > > > > +
> > > > > >  /* How large should a single buffer be so a queue full of these can fit at
> > > > > >   * least one full packet?
> > > > > >   * Logic below assumes the mergeable buffer header is used.
> > > > > > @@ -3801,6 +3898,13 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > > > >                 vi->rq[i].vq = vqs[rxq2vq(i)];
> > > > > >                 vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
> > > > > >                 vi->sq[i].vq = vqs[txq2vq(i)];
> > > > > > +
> > > > > > +               if (page_pool_enabled && vi->mergeable_rx_bufs)
> > > > > > +                       virtnet_alloc_page_pool(&vi->rq[i]);
> > > > > > +               else
> > > > > > +                       dev_warn(&vi->vdev->dev,
> > > > > > +                                "page pool only support mergeable mode\n");
> > > > > > +
> > > > > >         }
> > > > > >
> > > > > >         /* run here: ret == 0. */
> > > > > > --
> > > > > > 2.31.1