Message-ID: <74985816-3a3a-490e-b8f0-49f795ab2f07@kernel.org>
Date: Thu, 13 Jun 2024 11:32:04 +0200
From: Jesper Dangaard Brouer
Subject: Re: [PATCH v6 net-next 14/15] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
To: Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: "David S. Miller", Daniel Bristot de Oliveira, Boqun Feng, Daniel Borkmann, Eric Dumazet, Frederic Weisbecker, Ingo Molnar, Jakub Kicinski, Paolo Abeni, Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon, Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman, Hao Luo, Jiri Olsa, John Fastabend, KP Singh, Martin KaFai Lau, Song Liu, Stanislav Fomichev, Toke Høiland-Jørgensen, Yonghong Song, bpf@vger.kernel.org
In-Reply-To: <20240612170303.3896084-15-bigeasy@linutronix.de>

On 12/06/2024 18.44, Sebastian Andrzej Siewior wrote:
> The XDP redirect process has two stages:
> - bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the
>   packet and makes decisions. While doing that, the per-CPU variable
>   bpf_redirect_info is used.
>
> - Afterwards, xdp_do_redirect() is invoked; it accesses bpf_redirect_info
>   and may also access other per-CPU variables like xskmap_flush_list.
>
> At the very end of the NAPI callback, xdp_do_flush() is invoked, which
> does not access bpf_redirect_info but does touch the individual per-CPU
> lists.
>
> The per-CPU variables are only used in the NAPI callback, hence disabling
> bottom halves is the only protection mechanism. Users from preemptible
> context (like cpu_map_kthread_run()) explicitly disable bottom halves
> for protection reasons.
> Without the locking in local_bh_disable() on PREEMPT_RT, this data
> structure requires explicit locking.
>
> PREEMPT_RT has forced-threaded interrupts enabled, and every
> NAPI callback runs in a thread.
> If each thread has its own data structure, locking can be avoided.
>
> Create a struct bpf_net_context which contains struct bpf_redirect_info.
> Define the variable on the stack, use bpf_net_ctx_set() to save a pointer
> to it, and use bpf_net_ctx_clear() to remove it again.
> bpf_net_ctx_set() may nest. For instance, a function can be used from
> within NET_RX_SOFTIRQ/net_rx_action, which uses bpf_net_ctx_set(), and
> from NET_TX_SOFTIRQ, which does not. Therefore only the first invocation
> updates the pointer.
> Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
> bpf_redirect_info. The returned data structure is zero-initialized to
> ensure nothing is leaked from the stack. This is done on first use of the
> struct: bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
> note that initialisation is required, and the first invocation of
> bpf_net_ctx_get_ri() will memset() the data structure and update
> bpf_redirect_info::kern_flags.
> bpf_redirect_info::nh is excluded from the memset because it is only used
> once BPF_F_NEIGH is set, which also sets the nh member. kern_flags is
> moved past nh to exclude it from the memset as well.
>
> The pointer to bpf_net_context is saved in the task's task_struct. Always
> using the bpf_net_context approach has the advantage that there are
> almost no differences between PREEMPT_RT and non-PREEMPT_RT builds.
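A minimal user-space sketch of the set/clear nesting and lazy zeroing described above. Assumptions: a C11 _Thread_local pointer (current_bpf_net_context, a hypothetical name) stands in for current->bpf_net_context, and struct bpf_redirect_info is a trimmed, hypothetical copy that keeps only the field order that matters here (nh, then kern_flags). It illustrates the logic under those assumptions; it is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-ins; not the kernel definitions. */
#define BIT(n) (1U << (n))
#define BPF_RI_F_RI_INIT BIT(1) /* ri has been zeroed for this context */

/* Placeholder: the real bpf_nh_params holds a family and an address union. */
struct bpf_nh_params {
	uint32_t nh_family;
	uint32_t ipv4_nh;
};

/* Trimmed field list; what matters is the order: nh, then kern_flags last. */
struct bpf_redirect_info {
	uint64_t tgt_index;
	void *tgt_value;
	uint32_t flags;
	uint32_t map_id;
	struct bpf_nh_params nh;
	uint32_t kern_flags;
};

struct bpf_net_context {
	struct bpf_redirect_info ri;
};

/* Assumed stand-in for current->bpf_net_context: one pointer per thread. */
static _Thread_local struct bpf_net_context *current_bpf_net_context;

/* Install ctx unless an outer caller already did; NULL means "not the owner". */
static struct bpf_net_context *bpf_net_ctx_set(struct bpf_net_context *ctx)
{
	if (current_bpf_net_context != NULL)
		return NULL;
	ctx->ri.kern_flags = 0; /* request lazy zeroing on first use */
	current_bpf_net_context = ctx;
	return ctx;
}

/* Only the owner (non-NULL token from bpf_net_ctx_set()) removes the pointer. */
static void bpf_net_ctx_clear(struct bpf_net_context *ctx)
{
	if (ctx)
		current_bpf_net_context = NULL;
}

/* Zero everything in front of nh on first use, then mark ri as initialised. */
static struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
{
	struct bpf_net_context *ctx = current_bpf_net_context;

	assert(ctx != NULL); /* callers must run inside a set/clear pair */
	if (!(ctx->ri.kern_flags & BPF_RI_F_RI_INIT)) {
		memset(&ctx->ri, 0, offsetof(struct bpf_net_context, ri.nh));
		ctx->ri.kern_flags |= BPF_RI_F_RI_INIT;
	}
	return &ctx->ri;
}
```

A caller such as net_rx_action() would keep the token returned by the outer bpf_net_ctx_set(); a nested bpf_net_ctx_set() gets NULL back, so its matching bpf_net_ctx_clear(NULL) leaves the outer context in place. Stack garbage before nh is never observed, while nh itself is left alone.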
>
> Cc: Alexei Starovoitov
> Cc: Andrii Nakryiko
> Cc: Eduard Zingerman
> Cc: Hao Luo
> Cc: Jesper Dangaard Brouer
> Cc: Jiri Olsa
> Cc: John Fastabend
> Cc: KP Singh
> Cc: Martin KaFai Lau
> Cc: Song Liu
> Cc: Stanislav Fomichev
> Cc: Toke Høiland-Jørgensen
> Cc: Yonghong Song
> Cc: bpf@vger.kernel.org
> Acked-by: Alexei Starovoitov
> Reviewed-by: Toke Høiland-Jørgensen
> Signed-off-by: Sebastian Andrzej Siewior
> ---
>  include/linux/filter.h | 56 ++++++++++++++++++++++++++++++++++--------
>  include/linux/sched.h  |  3 +++
>  kernel/bpf/cpumap.c    |  3 +++
>  kernel/bpf/devmap.c    |  9 ++++++-
>  kernel/fork.c          |  1 +
>  net/bpf/test_run.c     | 11 ++++++++-
>  net/core/dev.c         | 26 +++++++++++++++++++-
>  net/core/filter.c      | 44 +++++++++------------------------
>  net/core/lwt_bpf.c     |  3 +++
>  9 files changed, 111 insertions(+), 45 deletions(-)
>

I like it :-)

Acked-by: Jesper Dangaard Brouer

> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index b02aea291b7e8..0a7f6e4a00b60 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -733,21 +733,59 @@ struct bpf_nh_params {
>  	};
>  };
>
> +/* flags for bpf_redirect_info kern_flags */
> +#define BPF_RI_F_RF_NO_DIRECT	BIT(0)	/* no napi_direct on return_frame */
> +#define BPF_RI_F_RI_INIT	BIT(1)
> +
>  struct bpf_redirect_info {
>  	u64 tgt_index;
>  	void *tgt_value;
>  	struct bpf_map *map;
>  	u32 flags;
> -	u32 kern_flags;
>  	u32 map_id;
>  	enum bpf_map_type map_type;
>  	struct bpf_nh_params nh;
> +	u32 kern_flags;
>  };
>
> -DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
> +struct bpf_net_context {
> +	struct bpf_redirect_info ri;
> +};
>
> -/* flags for bpf_redirect_info kern_flags */
> -#define BPF_RI_F_RF_NO_DIRECT	BIT(0)	/* no napi_direct on return_frame */
> +static inline struct bpf_net_context *bpf_net_ctx_set(struct bpf_net_context *bpf_net_ctx)
> +{
> +	struct task_struct *tsk = current;
> +
> +	if (tsk->bpf_net_context != NULL)
> +		return NULL;
> +	bpf_net_ctx->ri.kern_flags = 0;
> +
> +	tsk->bpf_net_context = bpf_net_ctx;
> +	return bpf_net_ctx;
> +}
> +
> +static inline void bpf_net_ctx_clear(struct bpf_net_context *bpf_net_ctx)
> +{
> +	if (bpf_net_ctx)
> +		current->bpf_net_context = NULL;
> +}
> +
> +static inline struct bpf_net_context *bpf_net_ctx_get(void)
> +{
> +	return current->bpf_net_context;
> +}
> +
> +static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
> +{
> +	struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get();
> +
> +	if (!(bpf_net_ctx->ri.kern_flags & BPF_RI_F_RI_INIT)) {
> +		memset(&bpf_net_ctx->ri, 0, offsetof(struct bpf_net_context, ri.nh));
> +		bpf_net_ctx->ri.kern_flags |= BPF_RI_F_RI_INIT;
> +	}
> +
> +	return &bpf_net_ctx->ri;
> +}
>
>  /* Compute the linear packet data range [data, data_end) which
>   * will be accessed by various program types (cls_bpf, act_bpf,
> @@ -1018,25 +1056,23 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
>  				       const struct bpf_insn *patch, u32 len);
>  int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt);
>
> -void bpf_clear_redirect_map(struct bpf_map *map);
> -
>  static inline bool xdp_return_frame_no_direct(void)
>  {
> -	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
>
>  	return ri->kern_flags & BPF_RI_F_RF_NO_DIRECT;
>  }
>
>  static inline void xdp_set_return_frame_no_direct(void)
>  {
> -	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
>
>  	ri->kern_flags |= BPF_RI_F_RF_NO_DIRECT;
>  }
>
>  static inline void xdp_clear_return_frame_no_direct(void)
>  {
> -	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
>
>  	ri->kern_flags &= ~BPF_RI_F_RF_NO_DIRECT;
>  }
> @@ -1592,7 +1628,7 @@ static __always_inline long __bpf_xdp_redirect_map(struct bpf_map *map, u64 inde
>  						  u64 flags, const u64 flag_mask,
>  						  void *lookup_elem(struct bpf_map *map, u32 key))
>  {
> -	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
>  	const u64 action_mask = XDP_ABORTED | XDP_DROP | XDP_PASS | XDP_TX;
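The xdp_*_return_frame_no_direct() helpers in the hunk above can be modelled the same way. This is a second self-contained sketch under stated assumptions: cur_ri, ctx_set() and get_ri() are hypothetical simplifications of the task pointer, bpf_net_ctx_set() and bpf_net_ctx_get_ri(), and the struct is a trimmed, hypothetical copy. It checks at compile time that kern_flags lies past nh, so the partial memset cannot clear it, and shows that a fresh ctx_set() drops a stale BPF_RI_F_RF_NO_DIRECT because it resets kern_flags to 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BIT(n) (1U << (n))
#define BPF_RI_F_RF_NO_DIRECT BIT(0) /* no napi_direct on return_frame */
#define BPF_RI_F_RI_INIT      BIT(1)

/* Placeholder for the real struct bpf_nh_params. */
struct bpf_nh_params {
	uint32_t nh_family;
	uint32_t ipv4_nh;
};

/* Trimmed, hypothetical layout keeping the field order used by the patch. */
struct bpf_redirect_info {
	uint64_t tgt_index;
	uint32_t flags;
	uint32_t map_id;
	struct bpf_nh_params nh;
	uint32_t kern_flags;
};

/* The lazy memset covers [0, offsetof(nh)); kern_flags must lie past nh. */
_Static_assert(offsetof(struct bpf_redirect_info, kern_flags) >=
	       offsetof(struct bpf_redirect_info, nh) + sizeof(struct bpf_nh_params),
	       "kern_flags must not be covered by the lazy memset");

/* Hypothetical stand-in for the task's current context (single thread). */
static struct bpf_redirect_info *cur_ri;

/* Simplified bpf_net_ctx_set(): no nesting, just reset kern_flags. */
static void ctx_set(struct bpf_redirect_info *ri)
{
	ri->kern_flags = 0;
	cur_ri = ri;
}

/* Simplified bpf_net_ctx_get_ri(): zero the prefix once per ctx_set(). */
static struct bpf_redirect_info *get_ri(void)
{
	if (!(cur_ri->kern_flags & BPF_RI_F_RI_INIT)) {
		memset(cur_ri, 0, offsetof(struct bpf_redirect_info, nh));
		cur_ri->kern_flags |= BPF_RI_F_RI_INIT;
	}
	return cur_ri;
}

static bool xdp_return_frame_no_direct(void)
{
	return get_ri()->kern_flags & BPF_RI_F_RF_NO_DIRECT;
}

static void xdp_set_return_frame_no_direct(void)
{
	get_ri()->kern_flags |= BPF_RI_F_RF_NO_DIRECT;
}

static void xdp_clear_return_frame_no_direct(void)
{
	get_ri()->kern_flags &= ~BPF_RI_F_RF_NO_DIRECT;
}
```

Whichever helper runs first after ctx_set() triggers the one-time zeroing; later calls only flip the flag bit, and the BPF_RI_F_RI_INIT bit survives set/clear of BPF_RI_F_RF_NO_DIRECT.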