Date: Wed, 10 Feb 2021 16:30:23 +0000
From: Alexander Lobakin
To: "David S. Miller", Jakub Kicinski
Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
    Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
    Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
    Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
    Andrii Nakryiko, Taehee Yoo, Cong Wang, Björn Töpel, Miaohe Lin,
    Guillaume Nault, Yonghong Song, zhudi, Michal Kubecek,
    Marcelo Ricardo Leitner, Dmitry Safonov <0x7f454c46@gmail.com>,
    Yang Yingliang, Florian Westphal, Edward Cree,
    linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH v4 net-next 08/11] skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads
Message-ID: <20210210162732.80467-9-alobakin@pm.me>
In-Reply-To: <20210210162732.80467-1-alobakin@pm.me>
References: <20210210162732.80467-1-alobakin@pm.me>

Instead of just bulk-flushing skbuff_heads queued up through
napi_consume_skb() or __kfree_skb_defer(), try to reuse them on the
allocation path. If the cache is empty on allocation, bulk-allocate
the first 16 elements, which is more efficient than per-skb
allocation. If the cache is full on freeing, bulk-wipe the second
half of the cache (32 elements).
This also includes custom KASAN poisoning/unpoisoning to be doubly
sure there are no use-after-free cases.

To avoid changing current behaviour, introduce a new function,
napi_build_skb(), so drivers can optionally opt in to the new
approach later.
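The refill/flush policy described above (bulk-allocate 16 heads when the cache runs empty, wipe the upper 32 of 64 when it fills up) can be sketched as a simplified, single-threaded userspace model. This is an illustration only, not kernel code: the kernel's kmem_cache bulk API is replaced with plain malloc()/free(), the KASAN poisoning is omitted, and the globals stand in for the per-CPU napi_alloc_cache fields.

```c
#include <assert.h>
#include <stdlib.h>

#define NAPI_SKB_CACHE_SIZE	64
#define NAPI_SKB_CACHE_BULK	16
#define NAPI_SKB_CACHE_HALF	(NAPI_SKB_CACHE_SIZE / 2)

struct sk_buff { char pad[64]; };	/* stand-in for the real skbuff_head */

static struct sk_buff *skb_cache[NAPI_SKB_CACHE_SIZE];
static unsigned int skb_count;

/* Allocation path: on an empty cache, refill 16 heads in one go
 * (modelling kmem_cache_alloc_bulk()), then pop from the top. */
static struct sk_buff *napi_skb_cache_get(void)
{
	if (!skb_count) {
		while (skb_count < NAPI_SKB_CACHE_BULK) {
			struct sk_buff *skb = malloc(sizeof(*skb));

			if (!skb)
				break;
			skb_cache[skb_count++] = skb;
		}
	}
	if (!skb_count)
		return NULL;

	return skb_cache[--skb_count];
}

/* Freeing path: stash the head; once all 64 slots are used, free only
 * the upper half so 32 warm heads remain available for reuse. */
static void napi_skb_cache_put(struct sk_buff *skb)
{
	unsigned int i;

	skb_cache[skb_count++] = skb;

	if (skb_count == NAPI_SKB_CACHE_SIZE) {
		for (i = NAPI_SKB_CACHE_HALF; i < NAPI_SKB_CACHE_SIZE; i++)
			free(skb_cache[i]);
		skb_count = NAPI_SKB_CACHE_HALF;
	}
}
```

Flushing only half of the cache, rather than all of it, is the point of the split: a subsequent burst of allocations is served from the 32 retained heads instead of immediately hitting the slab allocator again.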
Note on the selected bulk size, 16:
 - it equals XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE and especially
   VETH_XDP_BATCH, which is also used to bulk-allocate skbuff_heads
   and was tested on powerful setups;
 - it also showed the best performance in the actual test series
   (from the array of {8, 16, 32}).

Suggested-by: Edward Cree # Divide on two halves
Suggested-by: Eric Dumazet # KASAN poisoning
Cc: Dmitry Vyukov # Help with KASAN
Cc: Paolo Abeni # Reduced batch size
Signed-off-by: Alexander Lobakin
---
 include/linux/skbuff.h |  2 +
 net/core/skbuff.c      | 94 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0e0707296098..906122eac82a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1087,6 +1087,8 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size);
 struct sk_buff *build_skb_around(struct sk_buff *skb,
 				 void *data, unsigned int frag_size);
 
+struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
+
 /**
  * alloc_skb - allocate a network buffer
  * @size: size to allocate
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 860a9d4f752f..9e1a8ded4acc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -120,6 +120,8 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
 }
 
 #define NAPI_SKB_CACHE_SIZE	64
+#define NAPI_SKB_CACHE_BULK	16
+#define NAPI_SKB_CACHE_HALF	(NAPI_SKB_CACHE_SIZE / 2)
 
 struct napi_alloc_cache {
 	struct page_frag_cache page;
@@ -164,6 +166,25 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 }
 EXPORT_SYMBOL(__netdev_alloc_frag_align);
 
+static struct sk_buff *napi_skb_cache_get(void)
+{
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct sk_buff *skb;
+
+	if (unlikely(!nc->skb_count))
+		nc->skb_count = kmem_cache_alloc_bulk(skbuff_head_cache,
+						      GFP_ATOMIC,
+						      NAPI_SKB_CACHE_BULK,
+						      nc->skb_cache);
+	if (unlikely(!nc->skb_count))
+		return NULL;
+
+	skb = nc->skb_cache[--nc->skb_count];
+	kasan_unpoison_object_data(skbuff_head_cache, skb);
+
+	return skb;
+}
+
 /* Caller must provide SKB that is memset cleared */
 static void __build_skb_around(struct sk_buff *skb, void *data,
 			       unsigned int frag_size)
@@ -265,6 +286,53 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(build_skb_around);
 
+/**
+ * __napi_build_skb - build a network buffer
+ * @data: data buffer provided by caller
+ * @frag_size: size of data, or 0 if head was kmalloced
+ *
+ * Version of __build_skb() that uses NAPI percpu caches to obtain
+ * skbuff_head instead of inplace allocation.
+ *
+ * Returns a new &sk_buff on success, %NULL on allocation failure.
+ */
+static struct sk_buff *__napi_build_skb(void *data, unsigned int frag_size)
+{
+	struct sk_buff *skb;
+
+	skb = napi_skb_cache_get();
+	if (unlikely(!skb))
+		return NULL;
+
+	memset(skb, 0, offsetof(struct sk_buff, tail));
+	__build_skb_around(skb, data, frag_size);
+
+	return skb;
+}
+
+/**
+ * napi_build_skb - build a network buffer
+ * @data: data buffer provided by caller
+ * @frag_size: size of data, or 0 if head was kmalloced
+ *
+ * Version of __napi_build_skb() that takes care of skb->head_frag
+ * and skb->pfmemalloc when the data is a page or page fragment.
+ *
+ * Returns a new &sk_buff on success, %NULL on allocation failure.
+ */
+struct sk_buff *napi_build_skb(void *data, unsigned int frag_size)
+{
+	struct sk_buff *skb = __napi_build_skb(data, frag_size);
+
+	if (likely(skb) && frag_size) {
+		skb->head_frag = 1;
+		skb_propagate_pfmemalloc(virt_to_head_page(data), skb);
+	}
+
+	return skb;
+}
+EXPORT_SYMBOL(napi_build_skb);
+
 /*
  * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
  * the caller if emergency pfmemalloc reserves are being used. If it is and
@@ -838,31 +906,31 @@ void __consume_stateless_skb(struct sk_buff *skb)
 	kfree_skbmem(skb);
 }
 
-static inline void _kfree_skb_defer(struct sk_buff *skb)
+static void napi_skb_cache_put(struct sk_buff *skb)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	u32 i;
 
 	/* drop skb->head and call any destructors for packet */
 	skb_release_all(skb);
 
-	/* record skb to CPU local list */
+	kasan_poison_object_data(skbuff_head_cache, skb);
 	nc->skb_cache[nc->skb_count++] = skb;
 
-#ifdef CONFIG_SLUB
-	/* SLUB writes into objects when freeing */
-	prefetchw(skb);
-#endif
-
-	/* flush skb_cache if it is filled */
 	if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) {
-		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_SIZE,
-				     nc->skb_cache);
-		nc->skb_count = 0;
+		for (i = NAPI_SKB_CACHE_HALF; i < NAPI_SKB_CACHE_SIZE; i++)
+			kasan_unpoison_object_data(skbuff_head_cache,
+						   nc->skb_cache[i]);
+
+		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_HALF,
+				     nc->skb_cache + NAPI_SKB_CACHE_HALF);
+		nc->skb_count = NAPI_SKB_CACHE_HALF;
 	}
 }
+
 void __kfree_skb_defer(struct sk_buff *skb)
 {
-	_kfree_skb_defer(skb);
+	napi_skb_cache_put(skb);
 }
 
 void napi_consume_skb(struct sk_buff *skb, int budget)
@@ -887,7 +955,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 		return;
 	}
 
-	_kfree_skb_defer(skb);
+	napi_skb_cache_put(skb);
 }
 EXPORT_SYMBOL(napi_consume_skb);

-- 
2.30.1