From: Eric Dumazet
Date: Wed, 25 Aug 2021 09:32:41 -0700
Subject: Re: [Linuxarm] Re: [PATCH RFC 0/7] add socket to netdev page frag recycling support
References: <1629257542-36145-1-git-send-email-linyunsheng@huawei.com>
    <2cf4b672-d7dc-db3d-ce90-15b4e91c4005@huawei.com>
    <4b2ad6d4-8e3f-fea9-766e-2e7330750f84@huawei.com>
    <5fdc5223-7d67-fed7-f691-185dcb2e3d80@gmail.com>
In-Reply-To: <5fdc5223-7d67-fed7-f691-185dcb2e3d80@gmail.com>
To: David Ahern
Cc: Yunsheng Lin, David Miller, Jakub Kicinski, Alexander Duyck,
    Russell King, Marcin Wojtas, linuxarm@openeuler.org, Yisen Zhuang,
    Salil Mehta, Thomas Petazzoni, Jesper Dangaard Brouer,
    Ilias Apalodimas, Alexei Starovoitov, Daniel Borkmann,
    John Fastabend, Andrew Morton, Peter Zijlstra, Will Deacon,
    Matthew Wilcox, Vlastimil Babka, Fenghua Yu, Roman Gushchin,
    Peter Xu, "Tang, Feng", Jason Gunthorpe, mcroce@microsoft.com,
    Hugh Dickins, Jonathan Lemon, Alexander Lobakin, Willem de Bruijn,
    wenxu, Cong Wang, Kevin Hao, Aleksandr Nogikh, Marco Elver,
    Yonghong Song, kpsingh@kernel.org, Andrii Nakryiko,
    Martin KaFai Lau, Song Liu, netdev, LKML, bpf,
    chenhao288@hisilicon.com, Hideaki YOSHIFUJI, David Ahern,
    memxor@gmail.com, linux@rempel-privat.de, Antoine Tenart,
    Wei Wang, Taehee Yoo, Arnd Bergmann,
    Mat Martineau, aahringo@redhat.com, ceggers@arri.de,
    yangbo.lu@nxp.com, Florian Westphal, xiangxia.m.yue@gmail.com,
    linmiaohe, Christoph Hellwig
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 25, 2021 at 9:29 AM David Ahern wrote:
>
> On 8/23/21 8:04 AM, Eric Dumazet wrote:
> >>
> >>
> >> It seems PAGE_ALLOC_COSTLY_ORDER is mostly related to pcp page, OOM, memory
> >> compact and memory isolation, as the test system has a lot of memory installed
> >> (about 500G, only 3-4G is used), so I used the below patch to test the max
> >> possible performance improvement when making TCP frags twice bigger, and
> >> the performance improvement went from about 30Gbit to 32Gbit for one thread
> >> iperf tcp flow in IOMMU strict mode,
> >
> > This is encouraging, and means we can do much better.
> >
> > Even with SKB_FRAG_PAGE_ORDER set to 4, typical skbs will need 3 mappings
> >
> > 1) One for the headers (in skb->head)
> > 2) Two page frags, because one TSO packet payload is not a nice power-of-two.
>
> interesting observation. I have noticed 17 with the ZC API. That might
> explain the less than expected performance bump with iommu strict mode.

Note that if application is using huge pages, things get better after

commit 394fcd8a813456b3306c423ec4227ed874dfc08b
Author: Eric Dumazet
Date:   Thu Aug 20 08:43:59 2020 -0700

    net: zerocopy: combine pages in zerocopy_sg_from_iter()

    Currently, tcp sendmsg(MSG_ZEROCOPY) is building skbs with order-0
    fragments. Compared to standard sendmsg(), these skbs usually contain
    up to 16 fragments on arches with 4KB page sizes, instead of two.

    This adds considerable costs on various ndo_start_xmit() handlers,
    especially when IOMMU is in the picture.

    As high performance applications are often using huge pages,
    we can try to combine adjacent pages belonging to same
    compound page.

    Tested on AMD Rome platform, with IOMMU, nominal single TCP flow speed
    is roughly doubled (~55Gbit -> ~100Gbit), when user application
    is using hugepages.

    For reference, nominal single TCP flow speed on this platform
    without MSG_ZEROCOPY is ~65Gbit.

    Signed-off-by: Eric Dumazet
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller

Ideally the gup stuff should really directly deal with hugepages, so
that we avoid all these crazy refcounting games on the per-huge-page
central refcount.

>
> >
> > The first issue can be addressed using a piece of coherent memory (128
> > or 256 bytes per entry in TX ring).
> > Copying the headers can avoid one IOMMU mapping, and improve IOTLB
> > hits, because all
> > slots of the TX ring buffer will use one single IOTLB slot.
> >
> > The second issue can be solved by tweaking a bit
> > skb_page_frag_refill() to accept an additional parameter
> > so that the whole skb payload fits in a single order-4 page.
> >
> >
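
The mapping counts discussed above can be checked with a rough
back-of-the-envelope model. This is a sketch in Python for illustration
only, not kernel code. spans() and dma_mappings() are hypothetical
helpers, the 65160-byte payload assumes a TSO packet of 45 segments of
1448 bytes, 4KB base pages are assumed, and the fit_whole flag merely
imitates the proposed skb_page_frag_refill() tweak.

```python
PAGE_SIZE = 4096  # assumption: arch with 4KB base pages


def spans(offset: int, length: int, chunk: int) -> int:
    """Count the chunk-sized buffers touched by bytes [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return last - first + 1


def dma_mappings(payload: int, offset: int, frag_order: int,
                 headers_mapped: bool = True, fit_whole: bool = False) -> int:
    """Hypothetical model of the DMA mappings needed for one TSO skb.

    payload        -- payload bytes carried by the skb
    offset         -- where the payload starts inside the current page frag
    frag_order     -- page order of each frag buffer (SKB_FRAG_PAGE_ORDER)
    headers_mapped -- True if skb->head needs its own mapping; False if the
                      headers are copied into coherent TX-ring memory
    fit_whole      -- True to imitate a refill that starts a fresh buffer
                      whenever the payload would otherwise straddle two
    """
    frag_size = PAGE_SIZE << frag_order
    offset %= frag_size
    if fit_whole and payload <= frag_size and offset + payload > frag_size:
        offset = 0  # skip the tail of the current buffer
    return spans(offset, payload, frag_size) + (1 if headers_mapped else 0)


TSO_PAYLOAD = 45 * 1448  # assumption: 65160 bytes, just under 64KB

# Order-4 frags, payload starting mid-buffer: headers + two frags.
print(dma_mappings(TSO_PAYLOAD, 32768, 4))                     # 3
# Both ideas applied: headers copied, payload kept in one buffer.
print(dma_mappings(TSO_PAYLOAD, 32768, 4,
                   headers_mapped=False, fit_whole=True))      # 1
# Order-0 pages (zerocopy without page combining), payload only.
print(spans(0, TSO_PAYLOAD, PAGE_SIZE))                        # 16
print(spans(2048, TSO_PAYLOAD, PAGE_SIZE))                     # 17
```

In this model the payload straddles two order-4 buffers unless it
happens to start within the first 376 bytes of one, which is why two
page frags are the typical case. The model ignores the memory skipped
at the tail of a buffer when fit_whole is set, so it says nothing about
the memory cost of that choice.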