From: Eric Dumazet
Date: Wed, 18 Aug 2021 10:57:06 +0200
Subject: Re: [PATCH RFC 0/7] add socket to netdev page frag recycling support
In-Reply-To: <1629257542-36145-1-git-send-email-linyunsheng@huawei.com>
To: Yunsheng Lin
Cc: David Miller, Jakub Kicinski, Alexander Duyck, Russell King,
    Marcin Wojtas, linuxarm@openeuler.org, Yisen Zhuang, Salil Mehta,
    Thomas Petazzoni, Jesper Dangaard Brouer, Ilias Apalodimas,
    Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrew Morton,
    Peter Zijlstra, Will Deacon, Matthew Wilcox, Vlastimil Babka,
    Fenghua Yu, Roman Gushchin, Peter Xu, "Tang, Feng", Jason Gunthorpe,
    mcroce@microsoft.com, Hugh Dickins, Jonathan Lemon, Alexander Lobakin,
    Willem de Bruijn, wenxu, Cong Wang, Kevin Hao, Aleksandr Nogikh,
    Marco Elver, Yonghong Song, kpsingh@kernel.org, Andrii Nakryiko,
    Martin KaFai Lau, Song Liu, netdev, LKML, bpf,
    chenhao288@hisilicon.com, Hideaki YOSHIFUJI, David Ahern,
    memxor@gmail.com, linux@rempel-privat.de, Antoine Tenart, Wei Wang,
    Taehee Yoo, Arnd Bergmann, Mat Martineau, aahringo@redhat.com,
    ceggers@arri.de, yangbo.lu@nxp.com, Florian Westphal,
    xiangxia.m.yue@gmail.com, linmiaohe
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 18, 2021 at 5:33 AM Yunsheng Lin wrote:
>
> This patchset adds the socket to netdev page frag recycling
> support based on the busy polling and page pool infrastructure.

I really do not see how this can scale to thousands of sockets.

tcp_mem[] defaults to ~9% of physical memory.

If you now run tests with thousands of sockets, their skbs will
consume gigabytes of memory on typical servers, now backed by
order-0 pages (instead of the current order-3 pages).
So IOMMU costs will actually be much bigger.

Are we planning to use gigabyte-sized page pools for NICs?

Have you tried instead to make TCP frags twice as big?
This would require fewer IOMMU mappings.
(Note: This could require some mm help, since PAGE_ALLOC_COSTLY_ORDER
is currently 3, not 4)

diff --git a/net/core/sock.c b/net/core/sock.c
index a3eea6e0b30a7d43793f567ffa526092c03e3546..6b66b51b61be9f198f6f1c4a3d81b57fa327986a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2560,7 +2560,7 @@ static void sk_leave_memory_pressure(struct sock *sk)
 	}
 }
 
-#define SKB_FRAG_PAGE_ORDER get_order(32768)
+#define SKB_FRAG_PAGE_ORDER get_order(65536)
 DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
 
 /**

>
> The performance improves from 30Gbit to 41Gbit for a one-thread iperf
> TCP flow, and the CPU usage decreases by about 20% for a four-thread
> iperf flow at 100Gb line speed in IOMMU strict mode.
>
> The performance improves by about 2.5% for a one-thread iperf TCP flow
> in IOMMU passthrough mode.
>
> Yunsheng Lin (7):
>   page_pool: refactor the page pool to support multi alloc context
>   skbuff: add interface to manipulate frag count for tx recycling
>   net: add NAPI api to register and retrieve the page pool ptr
>   net: pfrag_pool: add pfrag pool support based on page pool
>   sock: support refilling pfrag from pfrag_pool
>   net: hns3: support tx recycling in the hns3 driver
>   sysctl_tcp_use_pfrag_pool
>
>  drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 32 +++++----
>  include/linux/netdevice.h                       |  9 +++
>  include/linux/skbuff.h                          | 43 +++++++++++-
>  include/net/netns/ipv4.h                        |  1 +
>  include/net/page_pool.h                         | 15 ++++
>  include/net/pfrag_pool.h                        | 24 +++++++
>  include/net/sock.h                              |  1 +
>  net/core/Makefile                               |  1 +
>  net/core/dev.c                                  | 34 ++++++++-
>  net/core/page_pool.c                            | 86 ++++++++++++-----------
>  net/core/pfrag_pool.c                           | 92 +++++++++++++++++++++++++
>  net/core/sock.c                                 | 12 ++++
>  net/ipv4/sysctl_net_ipv4.c                      |  7 ++
>  net/ipv4/tcp.c                                  | 34 ++++++---
>  14 files changed, 325 insertions(+), 66 deletions(-)
>  create mode 100644 include/net/pfrag_pool.h
>  create mode 100644 net/core/pfrag_pool.c
>
> --
> 2.7.4
>
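
For reference, here is a minimal user-space sketch of the page-order arithmetic behind the reply above. It is not kernel code. It assumes 4 KiB base pages, and it assumes that each frag chunk needs one IOMMU mapping, which is a simplification. The names sketch_get_order() and sketch_chunks_needed() are hypothetical stand-ins invented for illustration; only get_order() and the sizes 32768 and 65536 come from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages (page shift of 12). Other page sizes
 * give different orders for the same byte size. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical user-space stand-in for the kernel's get_order():
 * smallest order such that (page size << order) >= size. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Number of order-N chunks needed to back `bytes` of frag memory,
 * rounded up. Under the one-mapping-per-chunk assumption this is also
 * the number of IOMMU mappings. */
static size_t sketch_chunks_needed(size_t bytes, unsigned int order)
{
	size_t chunk = SKETCH_PAGE_SIZE << order;

	return (bytes + chunk - 1) / chunk;
}
```

Under these assumptions, sketch_get_order(32768) is 3 and sketch_get_order(65536) is 4, which matches the PAGE_ALLOC_COSTLY_ORDER note. Backing 1 GiB of frag memory takes 262144 chunks at order 0, 32768 at order 3, and 16384 at order 4, so moving from order 3 to order 0 multiplies the chunk count by eight, while moving to order 4 halves it.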