From: Alexei Starovoitov
Date: Tue, 31 Oct 2023 15:38:26 -0700
Subject: Re: [RFC bpf-next 1/6] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
To: Daniel Xu
Cc: Jesper Dangaard Brouer, Steffen Klassert, Alexei Starovoitov, Paolo Abeni, Daniel Borkmann, Jakub Kicinski, Herbert Xu, "David S. Miller", John Fastabend, Eric Dumazet, antony.antony@secunet.com, LKML, Network Development, bpf, devel@linux-ipsec.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Oct 29, 2023 at 3:55 PM Daniel Xu wrote:
>
> Hi Alexei,
>
> On Sat, Oct 28, 2023 at 04:49:45PM -0700, Alexei Starovoitov wrote:
> > On Fri, Oct 27, 2023 at 11:46 AM Daniel Xu wrote:
> > >
> > > This commit adds an unstable kfunc helper to access internal xfrm_state
> > > associated with an SA. This is intended to be used for the upcoming
> > > IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
> > > words: for custom software RSS.
> > >
> > > That being said, the function that this kfunc wraps is fairly generic
> > > and used for a lot of xfrm tasks. I'm sure people will find uses
> > > elsewhere over time.
> > >
> > > Signed-off-by: Daniel Xu
> > > ---
> > >  include/net/xfrm.h        |   9 ++++
> > >  net/xfrm/Makefile         |   1 +
> > >  net/xfrm/xfrm_policy.c    |   2 +
> > >  net/xfrm/xfrm_state_bpf.c | 105 ++++++++++++++++++++++++++++++++++++++
> > >  4 files changed, 117 insertions(+)
> > >  create mode 100644 net/xfrm/xfrm_state_bpf.c
> > >
> > > diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> > > index 98d7aa78adda..ab4cf66480f3 100644
> > > --- a/include/net/xfrm.h
> > > +++ b/include/net/xfrm.h
> > > @@ -2188,4 +2188,13 @@ static inline int register_xfrm_interface_bpf(void)
> > >
> > >  #endif
> > >
> > > +#if IS_ENABLED(CONFIG_DEBUG_INFO_BTF)
> > > +int register_xfrm_state_bpf(void);
> > > +#else
> > > +static inline int register_xfrm_state_bpf(void)
> > > +{
> > > +	return 0;
> > > +}
> > > +#endif
> > > +
> > >  #endif /* _NET_XFRM_H */
> > > diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
> > > index cd47f88921f5..547cec77ba03 100644
> > > --- a/net/xfrm/Makefile
> > > +++ b/net/xfrm/Makefile
> > > @@ -21,3 +21,4 @@ obj-$(CONFIG_XFRM_USER_COMPAT) += xfrm_compat.o
> > >  obj-$(CONFIG_XFRM_IPCOMP) += xfrm_ipcomp.o
> > >  obj-$(CONFIG_XFRM_INTERFACE) += xfrm_interface.o
> > >  obj-$(CONFIG_XFRM_ESPINTCP) += espintcp.o
> > > +obj-$(CONFIG_DEBUG_INFO_BTF) += xfrm_state_bpf.o
> > > diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> > > index 5cdd3bca3637..62e64fa7ae5c 100644
> > > --- a/net/xfrm/xfrm_policy.c
> > > +++ b/net/xfrm/xfrm_policy.c
> > > @@ -4267,6 +4267,8 @@ void __init xfrm_init(void)
> > >  #ifdef CONFIG_XFRM_ESPINTCP
> > >  	espintcp_init();
> > >  #endif
> > > +
> > > +	register_xfrm_state_bpf();
> > >  }
> > >
> > >  #ifdef CONFIG_AUDITSYSCALL
> > > diff --git a/net/xfrm/xfrm_state_bpf.c b/net/xfrm/xfrm_state_bpf.c
> > > new file mode 100644
> > > index 000000000000..a73a17a6497b
> > > --- /dev/null
> > > +++ b/net/xfrm/xfrm_state_bpf.c
> > > @@ -0,0 +1,105 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/* Unstable XFRM state BPF helpers.
> > > + *
> > > + * Note that it is allowed to break compatibility for these functions since the
> > > + * interface they are exposed through to BPF programs is explicitly unstable.
> > > + */
> > > +
> > > +#include
> > > +#include
> > > +#include
> > > +#include
> > > +
> > > +/* bpf_xfrm_state_opts - Options for XFRM state lookup helpers
> > > + *
> > > + * Members:
> > > + * @error      - Out parameter, set for any errors encountered
> > > + *		 Values:
> > > + *		   -EINVAL - netns_id is less than -1
> > > + *		   -EINVAL - Passed NULL for opts
> > > + *		   -EINVAL - opts__sz isn't BPF_XFRM_STATE_OPTS_SZ
> > > + *		   -ENONET - No network namespace found for netns_id
> > > + * @netns_id	- Specify the network namespace for lookup
> > > + *		 Values:
> > > + *		   BPF_F_CURRENT_NETNS (-1)
> > > + *		     Use namespace associated with ctx
> > > + *		   [0, S32_MAX]
> > > + *		     Network Namespace ID
> > > + * @mark	- XFRM mark to match on
> > > + * @daddr	- Destination address to match on
> > > + * @spi	- Security parameter index to match on
> > > + * @proto	- L3 protocol to match on
> > > + * @family	- L3 protocol family to match on
> > > + */
> > > +struct bpf_xfrm_state_opts {
> > > +	s32 error;
> > > +	s32 netns_id;
> > > +	u32 mark;
> > > +	xfrm_address_t daddr;
> > > +	__be32 spi;
> > > +	u8 proto;
> > > +	u16 family;
> > > +};
> > > +
> > > +enum {
> > > +	BPF_XFRM_STATE_OPTS_SZ = sizeof(struct bpf_xfrm_state_opts),
> > > +};
> > > +
> > > +__diag_push();
> > > +__diag_ignore_all("-Wmissing-prototypes",
> > > +		  "Global functions as their definitions will be in xfrm_state BTF");
> > > +
> > > +/* bpf_xdp_get_xfrm_state - Get XFRM state
> > > + *
> > > + * Parameters:
> > > + * @ctx	- Pointer to ctx (xdp_md) in XDP program
> > > + *		    Cannot be NULL
> > > + * @opts	- Options for lookup (documented above)
> > > + *		    Cannot be NULL
> > > + * @opts__sz	- Length of the bpf_xfrm_state_opts structure
> > > + *		    Must be BPF_XFRM_STATE_OPTS_SZ
> > > + */
> > > +__bpf_kfunc struct xfrm_state *
> > > +bpf_xdp_get_xfrm_state(struct xdp_md *ctx, struct bpf_xfrm_state_opts *opts, u32 opts__sz)
> > > +{
> > > +	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
> > > +	struct net *net = dev_net(xdp->rxq->dev);
> > > +
> > > +	if (!opts || opts__sz != BPF_XFRM_STATE_OPTS_SZ) {
> > > +		opts->error = -EINVAL;
> > > +		return NULL;
> > > +	}
> > > +
> > > +	if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) {
> > > +		opts->error = -EINVAL;
> > > +		return NULL;
> > > +	}
> > > +
> > > +	if (opts->netns_id >= 0) {
> > > +		net = get_net_ns_by_id(net, opts->netns_id);
> > > +		if (unlikely(!net)) {
> > > +			opts->error = -ENONET;
> > > +			return NULL;
> > > +		}
> > > +	}
> > > +
> > > +	return xfrm_state_lookup(net, opts->mark, &opts->daddr, opts->spi,
> > > +				 opts->proto, opts->family);
> > > +}
> >
> > Patch 6 example does little to explain how this kfunc can be used.
> > Cover letter sounds promising, but no code to demonstrate the result.
>
> Part of the reason for that is this kfunc is intended to be used with a
> not-yet-upstreamed xfrm patchset. The other is that the usage is quite
> trivial. This is the code the experiments were run with:
>
> https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce2863ecb27/xdp-bench/xdp_redirect_cpumap.bpf.c#L385-L406
>
> We intend to upstream that cpumap mode to xdp-tools as soon as the xfrm
> patches are in. (Note the linked code is a little buggy but the
> main idea is there).

I don't understand how it survives anything but a sanity check.
To measure perf gains it needs to be under traffic for some time,
but
x = bpf_xdp_get_xfrm_state(ctx, &opts, sizeof(opts));
takes a reference on that state (refcnt++) for every packet and
nothing ever drops it.
At minimum, that means a memory leak or a refcnt overflow.
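
To make the refcount concern concrete, here is a minimal userspace sketch of
the pattern. It is not kernel code: toy_state, toy_lookup() and toy_put() are
hypothetical stand-ins for struct xfrm_state, xfrm_state_lookup() (which
returns the state with a reference held) and xfrm_state_put(). It only models
what happens to the counter when a per-packet handler looks a state up and
never releases it, compared with one that pairs each lookup with a put.

```c
#include <assert.h>

/* Hypothetical stand-in for struct xfrm_state: only the refcount matters. */
struct toy_state {
	unsigned int refcnt;
};

/* Models a lookup that returns the object with an extra reference held. */
struct toy_state *toy_lookup(struct toy_state *s)
{
	s->refcnt++;
	return s;
}

/* Models the matching release that drops one reference. */
void toy_put(struct toy_state *s)
{
	assert(s->refcnt > 0);
	s->refcnt--;
}

/* Per-packet handler that never releases: leaks one reference per packet. */
void handle_packet_leaky(struct toy_state *s)
{
	struct toy_state *x = toy_lookup(s);

	(void)x; /* the state is used, but the reference is never dropped */
}

/* Per-packet handler that pairs every lookup with a put. */
void handle_packet_balanced(struct toy_state *s)
{
	struct toy_state *x = toy_lookup(s);

	toy_put(x);
}

/* Feeds n packets through a handler and returns the final refcount. */
unsigned int run(void (*handler)(struct toy_state *), unsigned int n)
{
	/* Assumption: the owner of the state holds one baseline reference. */
	struct toy_state s = { .refcnt = 1 };

	for (unsigned int i = 0; i < n; i++)
		handler(&s);
	return s.refcnt;
}
```

In this model, run(handle_packet_leaky, 1000) ends at 1001 while
run(handle_packet_balanced, 1000) stays at 1. Under sustained traffic the
leaky variant grows without bound, so the object can never be freed and the
counter eventually saturates or wraps.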