Date: Thu, 11 Feb 2021 18:54:40 +0000
From: Alexander Lobakin
To: "David S. Miller", Jakub Kicinski
Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
    Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
    Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
    Jesper Dangaard Brouer, Alexander Duyck, Alexei Starovoitov,
    Daniel Borkmann, Andrii Nakryiko, Taehee Yoo, Cong Wang,
    Björn Töpel, Miaohe Lin, Guillaume Nault, Yonghong Song, zhudi,
    Michal Kubecek, Marcelo Ricardo Leitner,
    Dmitry Safonov <0x7f454c46@gmail.com>, Yang Yingliang,
    Florian Westphal, Edward Cree, linux-kernel@vger.kernel.org,
    netdev@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH v5 net-next 08/11] skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads
Message-ID: <20210211185220.9753-9-alobakin@pm.me>
In-Reply-To: <20210211185220.9753-1-alobakin@pm.me>
References: <20210211185220.9753-1-alobakin@pm.me>

Instead of just bulk-flushing skbuff_heads queued up through
napi_consume_skb() or __kfree_skb_defer(), try to reuse them on the
allocation path.
If the cache is empty on allocation, bulk-allocate the first 16
elements, which is more efficient than per-skb allocation.
If the cache is full on freeing, bulk-wipe the second half of the
cache (32 elements).
This also includes custom KASAN poisoning/unpoisoning to be doubly
sure there are no use-after-free cases.

To avoid changing the current behaviour, introduce a new function,
napi_build_skb(), so drivers can optionally switch to the new
approach later.

Note on the selected bulk size, 16:
 - it equals XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE and especially
   VETH_XDP_BATCH, which is also used to bulk-allocate skbuff_heads
   and was tested on powerful setups;
 - it also showed the best performance in the actual test series
   (from the array of {8, 16, 32}).
Suggested-by: Edward Cree # Divide on two halves
Suggested-by: Eric Dumazet # KASAN poisoning
Cc: Dmitry Vyukov # Help with KASAN
Cc: Paolo Abeni # Reduced batch size
Signed-off-by: Alexander Lobakin
---
 include/linux/skbuff.h |  2 +
 net/core/skbuff.c      | 94 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0e0707296098..906122eac82a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1087,6 +1087,8 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size);
 struct sk_buff *build_skb_around(struct sk_buff *skb,
 				 void *data, unsigned int frag_size);
 
+struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
+
 /**
  * alloc_skb - allocate a network buffer
  * @size: size to allocate
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 860a9d4f752f..9e1a8ded4acc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -120,6 +120,8 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
 }
 
 #define NAPI_SKB_CACHE_SIZE	64
+#define NAPI_SKB_CACHE_BULK	16
+#define NAPI_SKB_CACHE_HALF	(NAPI_SKB_CACHE_SIZE / 2)
 
 struct napi_alloc_cache {
 	struct page_frag_cache page;
@@ -164,6 +166,25 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 }
 EXPORT_SYMBOL(__netdev_alloc_frag_align);
 
+static struct sk_buff *napi_skb_cache_get(void)
+{
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct sk_buff *skb;
+
+	if (unlikely(!nc->skb_count))
+		nc->skb_count = kmem_cache_alloc_bulk(skbuff_head_cache,
+						      GFP_ATOMIC,
+						      NAPI_SKB_CACHE_BULK,
+						      nc->skb_cache);
+	if (unlikely(!nc->skb_count))
+		return NULL;
+
+	skb = nc->skb_cache[--nc->skb_count];
+	kasan_unpoison_object_data(skbuff_head_cache, skb);
+
+	return skb;
+}
+
 /* Caller must provide SKB that is memset cleared */
 static void __build_skb_around(struct sk_buff *skb, void *data,
 			       unsigned int frag_size)
@@ -265,6 +286,53 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(build_skb_around);
 
+/**
+ * __napi_build_skb - build a network buffer
+ * @data: data buffer provided by caller
+ * @frag_size: size of data, or 0 if head was kmalloced
+ *
+ * Version of __build_skb() that uses NAPI percpu caches to obtain
+ * skbuff_head instead of inplace allocation.
+ *
+ * Returns a new &sk_buff on success, %NULL on allocation failure.
+ */
+static struct sk_buff *__napi_build_skb(void *data, unsigned int frag_size)
+{
+	struct sk_buff *skb;
+
+	skb = napi_skb_cache_get();
+	if (unlikely(!skb))
+		return NULL;
+
+	memset(skb, 0, offsetof(struct sk_buff, tail));
+	__build_skb_around(skb, data, frag_size);
+
+	return skb;
+}
+
+/**
+ * napi_build_skb - build a network buffer
+ * @data: data buffer provided by caller
+ * @frag_size: size of data, or 0 if head was kmalloced
+ *
+ * Version of __napi_build_skb() that takes care of skb->head_frag
+ * and skb->pfmemalloc when the data is a page or page fragment.
+ *
+ * Returns a new &sk_buff on success, %NULL on allocation failure.
+ */
+struct sk_buff *napi_build_skb(void *data, unsigned int frag_size)
+{
+	struct sk_buff *skb = __napi_build_skb(data, frag_size);
+
+	if (likely(skb) && frag_size) {
+		skb->head_frag = 1;
+		skb_propagate_pfmemalloc(virt_to_head_page(data), skb);
+	}
+
+	return skb;
+}
+EXPORT_SYMBOL(napi_build_skb);
+
 /*
  * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
  * the caller if emergency pfmemalloc reserves are being used. If it is and
@@ -838,31 +906,31 @@ void __consume_stateless_skb(struct sk_buff *skb)
 	kfree_skbmem(skb);
 }
 
-static inline void _kfree_skb_defer(struct sk_buff *skb)
+static void napi_skb_cache_put(struct sk_buff *skb)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	u32 i;
 
 	/* drop skb->head and call any destructors for packet */
 	skb_release_all(skb);
 
-	/* record skb to CPU local list */
+	kasan_poison_object_data(skbuff_head_cache, skb);
 	nc->skb_cache[nc->skb_count++] = skb;
 
-#ifdef CONFIG_SLUB
-	/* SLUB writes into objects when freeing */
-	prefetchw(skb);
-#endif
-
-	/* flush skb_cache if it is filled */
 	if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) {
-		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_SIZE,
-				     nc->skb_cache);
-		nc->skb_count = 0;
+		for (i = NAPI_SKB_CACHE_HALF; i < NAPI_SKB_CACHE_SIZE; i++)
+			kasan_unpoison_object_data(skbuff_head_cache,
+						   nc->skb_cache[i]);
+
+		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_HALF,
+				     nc->skb_cache + NAPI_SKB_CACHE_HALF);
+		nc->skb_count = NAPI_SKB_CACHE_HALF;
 	}
 }
+
 void __kfree_skb_defer(struct sk_buff *skb)
 {
-	_kfree_skb_defer(skb);
+	napi_skb_cache_put(skb);
 }
 
 void napi_consume_skb(struct sk_buff *skb, int budget)
@@ -887,7 +955,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 		return;
 	}
 
-	_kfree_skb_defer(skb);
+	napi_skb_cache_put(skb);
 }
 EXPORT_SYMBOL(napi_consume_skb);
 
-- 
2.30.1
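A possible driver-side usage sketch (hypothetical, not part of this
patch: data, truesize, headroom and len are placeholder variables of
an imaginary RX path; only napi_build_skb() itself comes from the
change above):

	/* inside a driver's NAPI poll loop, building an skb around an
	 * already DMA-mapped page fragment
	 */
	skb = napi_build_skb(data, truesize);
	if (unlikely(!skb))
		goto drop;		/* recycle or free the fragment */

	skb_reserve(skb, headroom);
	skb_put(skb, len);
	napi_gro_receive(napi, skb);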