Date: Wed, 10 Feb 2021 16:28:17 +0000
To: "David S. Miller", Jakub Kicinski
From: Alexander Lobakin
Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
 Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
 Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
 Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
 Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel, Miaohe Lin,
 Guillaume Nault, Yonghong Song, zhudi, Michal Kubecek,
 Marcelo Ricardo Leitner, Dmitry Safonov <0x7f454c46@gmail.com>,
 Yang Yingliang, Florian Westphal, Edward Cree,
 linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH v4 net-next 00/11] skbuff: introduce skbuff_heads bulking and reusing
Message-ID: <20210210162732.80467-1-alobakin@pm.me>

Currently, all kinds of skb allocation obtain skbuff_heads one at a
time via kmem_cache_alloc().
On the other hand, we have the percpu napi_alloc_cache, which stores
skbuff_heads queued up for freeing and flushes them in bulk.

We can use this cache not only for bulk-wiping, but also to obtain
heads for new skbs and avoid unconditional allocations, as well as
for bulk-allocating (like XDP's cpumap code and the veth driver
already do).

As this might affect latencies, cache pressure and lots of hardware-
and driver-dependent stuff, this new feature is mostly optional and
can be used via:
 - a new napi_build_skb() function (as a replacement for build_skb());
 - the existing {,__}napi_alloc_skb() and napi_get_frags() functions;
 - __alloc_skb() with SKB_ALLOC_NAPI passed in flags.
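
To illustrate the idea above, here is a minimal userspace sketch of such a
cache. It is not the code from this series. All names (head_cache,
cache_get(), cache_put(), cache_drain(), CACHE_SIZE) are hypothetical,
malloc()/free() stand in for the slab allocator, and the "flush the upper
half when full" policy and the cache capacity of 64 are assumptions made for
the example. Only the bulk size of 16 comes from the text of this letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SIZE 64               /* capacity of the model cache (assumed) */
#define BULK_SIZE  16               /* heads obtained per refill */
#define HALF_SIZE  (CACHE_SIZE / 2) /* heads kept after a flush (assumed) */

struct head { char pad[64]; };      /* stand-in for an skbuff_head */

struct head_cache {
	unsigned int count;             /* number of cached heads */
	void *heads[CACHE_SIZE];        /* LIFO stack of free heads */
	unsigned long backend_allocs;   /* instrumentation only */
	unsigned long backend_frees;    /* instrumentation only */
};

/* Take a head from the cache; refill in one batch only when it is empty. */
struct head *cache_get(struct head_cache *c)
{
	if (!c->count) {
		for (unsigned int i = 0; i < BULK_SIZE; i++) {
			void *p = malloc(sizeof(struct head));

			if (!p)
				break;
			c->heads[c->count++] = p;
			c->backend_allocs++;
		}
		if (!c->count)
			return NULL;            /* allocator gave us nothing */
	}
	return c->heads[--c->count];
}

/* Queue a head for reuse; when the cache fills up, free the upper half. */
void cache_put(struct head_cache *c, struct head *h)
{
	assert(c->count < CACHE_SIZE);
	c->heads[c->count++] = h;

	if (c->count == CACHE_SIZE) {
		for (unsigned int i = HALF_SIZE; i < CACHE_SIZE; i++) {
			free(c->heads[i]);
			c->backend_frees++;
		}
		c->count = HALF_SIZE;
	}
}

/* Release everything still cached (e.g. on teardown). */
void cache_drain(struct head_cache *c)
{
	while (c->count) {
		free(c->heads[--c->count]);
		c->backend_frees++;
	}
}
```

In this model, a caller that frees and allocates heads at roughly the same
rate keeps hitting the array and rarely touches the backing allocator. That
is the effect the series aims for on the NAPI path.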

iperf3 showed 35-70 Mbps bumps for both TCP and UDP while performing
VLAN NAT on a 1.2 GHz MIPS board. The boost is likely to be bigger on
more powerful hosts and NICs with tens of Mpps.

Note on skbuff_heads from distant slabs or pfmemalloc'ed slabs:
 - by default, kmalloc()/kmem_cache_alloc() may itself allocate
   memory from remote nodes to defragment their slabs. This is
   controlled by a sysctl, but given this default, an skbuff_head
   from a remote node is an OK case;
 - the easiest way to check whether the slab of an skbuff_head is
   remote or pfmemalloc'ed is:

	if (!dev_page_is_reusable(virt_to_head_page(skb)))
		/* drop it */;

   ...*but*, given that most slabs are built of compound pages,
   virt_to_head_page() will hit the unlikely branch on every single
   call. This check cost at least 20 Mbps in test scenarios, so it
   seems better to _not_ do this.

Since v3 [2]:
 - make the feature mostly optional, so driver developers can decide
   whether to use it or not (Paolo Abeni).
   This reuses the old flag for __alloc_skb() and introduces
   a new napi_build_skb();
 - reduce the bulk-allocation size from 32 to 16 elements (also Paolo).
   This equals the value used by XDP's devmap and veth batch
   processing (which were tested a lot) and should be sane enough;
 - don't waste cycles on an explicit in_serving_softirq() check.

Since v2 [1]:
 - also cover the {,__}alloc_skb() and {,__}build_skb() cases (this
   became handy after the changes that pass tiny skb requests to the
   kmalloc layer);
 - cover the cache with KASAN instrumentation (suggested by Eric
   Dumazet, with help from Dmitry Vyukov);
 - completely drop the redundant __kfree_skb_flush() (also Eric);
 - lots of code cleanups;
 - expand the commit message with NUMA and pfmemalloc points (Jakub).

Since v1 [0]:
 - use one unified cache instead of two separate ones to greatly
   simplify the logic and reduce hotpath overhead (Edward Cree);
 - new: also recycle GRO_MERGED_FREE skbs instead of freeing them
   immediately;
 - correct the performance numbers after optimizations and after
   running lots of tests for different use cases.

[0] https://lore.kernel.org/netdev/20210111182655.12159-1-alobakin@pm.me
[1] https://lore.kernel.org/netdev/20210113133523.39205-1-alobakin@pm.me
[2] https://lore.kernel.org/netdev/20210209204533.327360-1-alobakin@pm.me

Alexander Lobakin (11):
  skbuff: move __alloc_skb() next to the other skb allocation functions
  skbuff: simplify kmalloc_reserve()
  skbuff: make __build_skb_around() return void
  skbuff: simplify __alloc_skb() a bit
  skbuff: use __build_skb_around() in __alloc_skb()
  skbuff: remove __kfree_skb_flush()
  skbuff: move NAPI cache declarations upper in the file
  skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads
  skbuff: allow to optionally use NAPI cache from __alloc_skb()
  skbuff: allow to use NAPI cache from __napi_alloc_skb()
  skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing

 include/linux/skbuff.h |   4 +-
 net/core/dev.c         |  15 +-
 net/core/skbuff.c      | 429 +++++++++++++++++++++++------------------
 3 files changed, 243 insertions(+), 205 deletions(-)

-- 
2.30.1