Date: Tue, 30 Mar 2021 23:15:59 +0000
To: Alexei Starovoitov, Daniel Borkmann
From: Alexander Lobakin <alobakin@pm.me>
Cc: Xuan Zhuo, Björn Töpel, Magnus Karlsson, Jonathan Lemon, "David S.
    Miller", Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
    Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
    Alexander Lobakin, netdev@vger.kernel.org, bpf@vger.kernel.org,
    linux-kernel@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH bpf-next 2/2] xsk: introduce generic almost-zerocopy xmit
Message-ID: <20210330231528.546284-3-alobakin@pm.me>
In-Reply-To: <20210330231528.546284-1-alobakin@pm.me>
References: <20210330231528.546284-1-alobakin@pm.me>

The reasons behind IFF_TX_SKB_NO_LINEAR are:

 - most drivers expect an skb with linear space;
 - most drivers expect the hard header to be in the linear space;
 - many drivers need some headroom to insert custom headers and/or pull
   headers from frags (pskb_may_pull() etc.).

With a small amount of overhead, we can satisfy all of this without a
full copy of the buffer data. Frames of at least 128 bytes (the threshold
mitigates the allocation overhead) are now also built via the zerocopy
path, provided the device and driver support S/G xmit, which is almost
always the case. We allocate 256* additional bytes for the skb linear
space and pull the hard header there (aligning its end to 16 bytes on
platforms with NET_IP_ALIGN). The rest of the buffer data is simply
pinned as frags. At least 242 bytes of room are left for any driver
needs.

We could instead pass the buffer to eth_get_headlen() to minimize the
allocation overhead and copy all of the headers into the linear space,
but the flow dissection it performs tends to be more expensive than the
current approach.

The IFF_TX_SKB_NO_LINEAR path remains unchanged and is still relevant
and generally faster. A sketch of the resulting path selection is
included below.

* The value of 256 bytes is somewhat "magic": it can be found in lots of
  drivers and in core code, and 256 bytes are believed to be enough to
  hold the headers of any frame.
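Below is a small standalone user-space sketch, not part of the patch and
only a model of it, illustrating the path selection and linear-space
sizing described above. The constants mirror XSK_SKB_HEADLEN,
XSK_COPY_THRESHOLD, NET_IP_ALIGN and ETH_HLEN from the patch; the two
booleans, the example headroom value and the helper names
(use_zerocopy(), linear_alloc_len()) are made up for illustration only.

/* Hypothetical user-space model of the xmit path selection above;
 * this is not kernel code. Constants mirror the patch:
 * XSK_SKB_HEADLEN = 256, XSK_COPY_THRESHOLD = 128, NET_IP_ALIGN = 2.
 * The two bools stand in for IFF_TX_SKB_NO_LINEAR and NETIF_F_SG.
 */
#include <stdbool.h>
#include <stdio.h>

#define XSK_SKB_HEADLEN		256
#define XSK_COPY_THRESHOLD	(XSK_SKB_HEADLEN / 2)	/* 128 */
#define NET_IP_ALIGN		2
#define ETH_HLEN		14

/* True if the frame would take the (almost-)zerocopy path. */
static bool use_zerocopy(bool tx_skb_no_linear, bool has_sg,
			 unsigned int len)
{
	return tx_skb_no_linear ||
	       (len >= XSK_COPY_THRESHOLD && has_sg);
}

/* Linear space reserved when the header pull is needed: headroom plus
 * 256 bytes plus NET_IP_ALIGN, so that after pulling ETH_HLEN bytes the
 * end of the hard header is 16-byte aligned and at least 242 bytes of
 * linear room remain for driver needs.
 */
static unsigned int linear_alloc_len(unsigned int headroom, bool need_pull)
{
	unsigned int len = headroom;

	if (need_pull)
		len += XSK_SKB_HEADLEN + NET_IP_ALIGN;

	return len;
}

int main(void)
{
	unsigned int headroom = 32;	/* example NET_SKB_PAD-like value */
	unsigned int frame_lens[] = { 60, 127, 128, 1500 };

	for (unsigned int i = 0;
	     i < sizeof(frame_lens) / sizeof(frame_lens[0]); i++) {
		unsigned int len = frame_lens[i];
		bool zc = use_zerocopy(false, true, len);

		printf("frame %4u bytes -> %s, linear alloc %u bytes\n",
		       len, zc ? "almost-zerocopy" : "full copy",
		       zc ? linear_alloc_len(headroom, true)
			  : headroom + len);
	}

	return 0;
}

With these numbers, a 127-byte frame is still copied in full, while a
128-byte frame gets only its Ethernet header copied into the 258-byte
linear part and the rest of the buffer attached as frags.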
Cc: Xuan Zhuo
Signed-off-by: Alexander Lobakin
---
 net/xdp/xsk.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 41f8f21b3348..090ff9c096a3 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -445,6 +445,9 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }
 
+#define XSK_SKB_HEADLEN		256
+#define XSK_COPY_THRESHOLD	(XSK_SKB_HEADLEN / 2)
+
 static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 					      struct xdp_desc *desc)
 {
@@ -452,13 +455,22 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	u32 hr, len, ts, offset, copy, copied;
 	struct sk_buff *skb;
 	struct page *page;
+	bool need_pull;
 	void *buffer;
 	int err, i;
 	u64 addr;
 
 	hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom));
+	len = hr;
+
+	need_pull = !(xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR);
+	if (need_pull) {
+		len += XSK_SKB_HEADLEN;
+		len += NET_IP_ALIGN;
+		hr += NET_IP_ALIGN;
+	}
 
-	skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err);
+	skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
 	if (unlikely(!skb))
 		return ERR_PTR(err);
 
@@ -488,6 +500,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	skb->data_len += len;
 	skb->truesize += ts;
 
+	if (need_pull && unlikely(!__pskb_pull_tail(skb, ETH_HLEN))) {
+		kfree_skb(skb);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	refcount_add(ts, &xs->sk.sk_wmem_alloc);
 
 	return skb;
@@ -498,19 +515,20 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 {
 	struct net_device *dev = xs->dev;
 	struct sk_buff *skb;
+	u32 len = desc->len;
 
-	if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
+	if ((dev->priv_flags & IFF_TX_SKB_NO_LINEAR) ||
+	    (len >= XSK_COPY_THRESHOLD && likely(dev->features & NETIF_F_SG))) {
 		skb = xsk_build_skb_zerocopy(xs, desc);
 		if (IS_ERR(skb))
 			return skb;
 	} else {
-		u32 hr, tr, len;
 		void *buffer;
+		u32 hr, tr;
 		int err;
 
 		hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom));
 		tr = dev->needed_tailroom;
-		len = desc->len;
 
 		skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err);
 		if (unlikely(!skb))
-- 
2.31.1