Date: Sat, 13 Feb 2021 14:12:25 +0000
From: Alexander Lobakin <alobakin@pm.me>
To: "David S. Miller", Jakub Kicinski
Cc: Jonathan Lemon, Eric Dumazet, Dmitry Vyukov, Willem de Bruijn,
    Alexander Lobakin, Randy Dunlap, Kevin Hao, Pablo Neira Ayuso,
    Jakub Sitnicki, Marco Elver, Dexuan Cui, Paolo Abeni,
    Jesper Dangaard Brouer, Alexander Duyck, Alexei Starovoitov,
    Daniel Borkmann, Andrii Nakryiko, Taehee Yoo, Wei Wang, Cong Wang,
    Björn Töpel, Miaohe Lin, Guillaume Nault, Florian Westphal,
    Edward Cree, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH v6 net-next 08/11] skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads
Message-ID: <20210213141021.87840-9-alobakin@pm.me>
In-Reply-To: <20210213141021.87840-1-alobakin@pm.me>
References: <20210213141021.87840-1-alobakin@pm.me>

Instead of just bulk-flushing skbuff_heads queued up through
napi_consume_skb() or __kfree_skb_defer(), try to reuse them on the
allocation path. If the cache is empty on allocation, bulk-allocate
the first 16 elements, which is more efficient than per-skb
allocation. If the cache is full on freeing, bulk-wipe the second
half of the cache (32 elements).
This also includes custom KASAN poisoning/unpoisoning to make doubly
sure there are no use-after-free cases.

To avoid changing the current behaviour, introduce a new function,
napi_build_skb(), so that drivers can opt in to the new approach
later.
Note on the selected bulk size, 16:
 - this equals XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE and especially
   VETH_XDP_BATCH, which is also used to bulk-allocate skbuff_heads
   and was tested on powerful setups;
 - this also showed the best performance in the actual test series
   (from the array of {8, 16, 32}).

Suggested-by: Edward Cree # Divide on two halves
Suggested-by: Eric Dumazet # KASAN poisoning
Cc: Dmitry Vyukov # Help with KASAN
Cc: Paolo Abeni # Reduced batch size
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 include/linux/skbuff.h |  2 +
 net/core/skbuff.c      | 94 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0e0707296098..906122eac82a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1087,6 +1087,8 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size);
 struct sk_buff *build_skb_around(struct sk_buff *skb,
 				 void *data, unsigned int frag_size);
 
+struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
+
 /**
  * alloc_skb - allocate a network buffer
  * @size: size to allocate
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 860a9d4f752f..9e1a8ded4acc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -120,6 +120,8 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
 }
 
 #define NAPI_SKB_CACHE_SIZE	64
+#define NAPI_SKB_CACHE_BULK	16
+#define NAPI_SKB_CACHE_HALF	(NAPI_SKB_CACHE_SIZE / 2)
 
 struct napi_alloc_cache {
 	struct page_frag_cache page;
@@ -164,6 +166,25 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask)
 }
 EXPORT_SYMBOL(__netdev_alloc_frag_align);
 
+static struct sk_buff *napi_skb_cache_get(void)
+{
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct sk_buff *skb;
+
+	if (unlikely(!nc->skb_count))
+		nc->skb_count = kmem_cache_alloc_bulk(skbuff_head_cache,
+						      GFP_ATOMIC,
+						      NAPI_SKB_CACHE_BULK,
+						      nc->skb_cache);
+	if (unlikely(!nc->skb_count))
+		return NULL;
+
+	skb = nc->skb_cache[--nc->skb_count];
+	kasan_unpoison_object_data(skbuff_head_cache, skb);
+
+	return skb;
+}
+
 /* Caller must provide SKB that is memset cleared */
 static void __build_skb_around(struct sk_buff *skb, void *data,
 			       unsigned int frag_size)
@@ -265,6 +286,53 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(build_skb_around);
 
+/**
+ * __napi_build_skb - build a network buffer
+ * @data: data buffer provided by caller
+ * @frag_size: size of data, or 0 if head was kmalloced
+ *
+ * Version of __build_skb() that uses NAPI percpu caches to obtain
+ * skbuff_head instead of inplace allocation.
+ *
+ * Returns a new &sk_buff on success, %NULL on allocation failure.
+ */
+static struct sk_buff *__napi_build_skb(void *data, unsigned int frag_size)
+{
+	struct sk_buff *skb;
+
+	skb = napi_skb_cache_get();
+	if (unlikely(!skb))
+		return NULL;
+
+	memset(skb, 0, offsetof(struct sk_buff, tail));
+	__build_skb_around(skb, data, frag_size);
+
+	return skb;
+}
+
+/**
+ * napi_build_skb - build a network buffer
+ * @data: data buffer provided by caller
+ * @frag_size: size of data, or 0 if head was kmalloced
+ *
+ * Version of __napi_build_skb() that takes care of skb->head_frag
+ * and skb->pfmemalloc when the data is a page or page fragment.
+ *
+ * Returns a new &sk_buff on success, %NULL on allocation failure.
+ */
+struct sk_buff *napi_build_skb(void *data, unsigned int frag_size)
+{
+	struct sk_buff *skb = __napi_build_skb(data, frag_size);
+
+	if (likely(skb) && frag_size) {
+		skb->head_frag = 1;
+		skb_propagate_pfmemalloc(virt_to_head_page(data), skb);
+	}
+
+	return skb;
+}
+EXPORT_SYMBOL(napi_build_skb);
+
 /*
  * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
  * the caller if emergency pfmemalloc reserves are being used. If it is and
@@ -838,31 +906,31 @@ void __consume_stateless_skb(struct sk_buff *skb)
 	kfree_skbmem(skb);
 }
 
-static inline void _kfree_skb_defer(struct sk_buff *skb)
+static void napi_skb_cache_put(struct sk_buff *skb)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	u32 i;
 
 	/* drop skb->head and call any destructors for packet */
 	skb_release_all(skb);
 
-	/* record skb to CPU local list */
+	kasan_poison_object_data(skbuff_head_cache, skb);
 	nc->skb_cache[nc->skb_count++] = skb;
 
-#ifdef CONFIG_SLUB
-	/* SLUB writes into objects when freeing */
-	prefetchw(skb);
-#endif
-
-	/* flush skb_cache if it is filled */
 	if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) {
-		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_SIZE,
-				     nc->skb_cache);
-		nc->skb_count = 0;
+		for (i = NAPI_SKB_CACHE_HALF; i < NAPI_SKB_CACHE_SIZE; i++)
+			kasan_unpoison_object_data(skbuff_head_cache,
+						   nc->skb_cache[i]);
+
+		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_HALF,
+				     nc->skb_cache + NAPI_SKB_CACHE_HALF);
+		nc->skb_count = NAPI_SKB_CACHE_HALF;
 	}
 }
+
 void __kfree_skb_defer(struct sk_buff *skb)
 {
-	_kfree_skb_defer(skb);
+	napi_skb_cache_put(skb);
 }
 
 void napi_consume_skb(struct sk_buff *skb, int budget)
@@ -887,7 +955,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 		return;
 	}
 
-	_kfree_skb_defer(skb);
+	napi_skb_cache_put(skb);
 }
 EXPORT_SYMBOL(napi_consume_skb);

-- 
2.30.1