From: Toke Høiland-Jørgensen
To: Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org,
 netdev@vger.kernel.org
Cc: "David S. Miller", Boqun Feng, Daniel Borkmann, Eric Dumazet,
 Frederic Weisbecker, Ingo Molnar, Jakub Kicinski, Paolo Abeni,
 Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
 Sebastian Andrzej Siewior, Alexei Starovoitov, Andrii Nakryiko,
 Eduard Zingerman, Hao Luo, Jesper Dangaard Brouer, Jiri Olsa,
 John Fastabend, KP Singh, Martin KaFai Lau, Song Liu,
 Stanislav Fomichev, Yonghong Song, bpf@vger.kernel.org
Subject: Re: [PATCH v3 net-next 14/15] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
In-Reply-To: <20240529162927.403425-15-bigeasy@linutronix.de>
References: <20240529162927.403425-1-bigeasy@linutronix.de>
 <20240529162927.403425-15-bigeasy@linutronix.de>
Date: Thu, 30 May 2024 00:09:21 +0200
Message-ID: <87y17sfey6.fsf@toke.dk>

Sebastian Andrzej Siewior writes:

> The XDP redirect process is two staged:
> - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
>   packet and makes decisions. While doing that, the per-CPU variable
>   bpf_redirect_info is used.
>
> - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
>   and it may also access other per-CPU variables like xskmap_flush_list.
>
> At the very end of the NAPI callback, xdp_do_flush() is invoked which
> does not access bpf_redirect_info but will touch the individual per-CPU
> lists.
>
> The per-CPU variables are only used in the NAPI callback hence disabling
> bottom halves is the only protection mechanism. Users from preemptible
> context (like cpu_map_kthread_run()) explicitly disable bottom halves
> for protections reasons.
> Without locking in local_bh_disable() on PREEMPT_RT this data structure
> requires explicit locking.
>
> PREEMPT_RT has forced-threaded interrupts enabled and every
> NAPI-callback runs in a thread. If each thread has its own data
> structure then locking can be avoided.
>
> Create a struct bpf_net_context which contains struct bpf_redirect_info.
> Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
> it. Use the __free() annotation to automatically reset the pointer once
> function returns.
> The bpf_net_ctx_set() may nest. For instance a function can be used from
> within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and
> NET_TX_SOFTIRQ which does not. Therefore only the first invocations
> updates the pointer.
> Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
> bpf_redirect_info.
>
> On PREEMPT_RT the pointer to bpf_net_context is saved task's
> task_struct. On non-PREEMPT_RT builds the pointer saved in a per-CPU
> variable (which is always NODE-local memory). Using always the
> bpf_net_context approach has the advantage that there is almost zero
> differences between PREEMPT_RT and non-PREEMPT_RT builds.
>
> Cc: Alexei Starovoitov
> Cc: Andrii Nakryiko
> Cc: Eduard Zingerman
> Cc: Hao Luo
> Cc: Jesper Dangaard Brouer
> Cc: Jiri Olsa
> Cc: John Fastabend
> Cc: KP Singh
> Cc: Martin KaFai Lau
> Cc: Song Liu
> Cc: Stanislav Fomichev
> Cc: Toke Høiland-Jørgensen
> Cc: Yonghong Song
> Cc: bpf@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior

[...]

> @@ -240,12 +240,14 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
>  				int xdp_n, struct xdp_cpumap_stats *stats,
>  				struct list_head *list)
>  {
> +	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
>  	int nframes;

I think we need to zero-initialise all the context objects we allocate on
the stack.

The reason is that an XDP program can return XDP_REDIRECT without
calling any of the redirect helpers first, which will lead to
xdp_do_redirect() being called without any of the fields in struct
bpf_redirect_info having been set. This can lead to a crash if the
fields happen to hold the wrong values; and if we're not initialising
the stack space used by this struct, we have no guarantees about what
values they will end up with.

We fixed a similar bug relatively recently, see:

5bcf0dcbf906 ("xdp: use flags field to disambiguate broadcast redirect")

>  void bpf_clear_redirect_map(struct bpf_map *map)
>  {
> -	struct bpf_redirect_info *ri;
> -	int cpu;
> -
> -	for_each_possible_cpu(cpu) {
> -		ri = per_cpu_ptr(&bpf_redirect_info, cpu);
> -		/* Avoid polluting remote cacheline due to writes if
> -		 * not needed. Once we pass this test, we need the
> -		 * cmpxchg() to make sure it hasn't been changed in
> -		 * the meantime by remote CPU.
> -		 */
> -		if (unlikely(READ_ONCE(ri->map) == map))
> -			cmpxchg(&ri->map, map, NULL);
> -	}
> +	/* ri->map is assigned in __bpf_xdp_redirect_map() from within a eBPF
> +	 * program/ during NAPI callback. It is used during
> +	 * xdp_do_generic_redirect_map()/ __xdp_do_redirect_frame() from the
> +	 * redirect callback afterwards. ri->map is cleared after usage.
> +	 * The path has no explicit RCU read section but the local_bh_disable()
> +	 * is also a RCU read section which makes the complete softirq callback
> +	 * RCU protected. This in turn makes ri->map RCU protected and it is
> +	 * sufficient to wait a grace period to ensure that no "ri->map == map"
> +	 * exists. dev_map_free() removes the map from the list and then
> +	 * invokes synchronize_rcu() after calling this function.
> +	 */
>  }

With the zeroing of the stack variable mentioned above, I agree that the
per-CPU clearing loop is not needed anymore, but I think we should just
get rid of the function entirely and put a comment in devmap.c in place
of the call to the (now empty) function.

-Toke
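
To make the zero-initialisation point concrete, here is a minimal
userspace sketch. It is not the kernel code: struct redirect_info,
struct net_context, net_ctx_set(), net_ctx_clear() and do_redirect() are
hypothetical, simplified stand-ins for the structures and helpers in the
patch, and a plain global pointer stands in for the per-task or per-CPU
pointer. It only shows that a zeroed on-stack context lets the redirect
step fail cleanly when no redirect helper ran, and that a nested set
leaves the outer context in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_redirect_info (fields simplified). */
struct redirect_info {
	void *map;		/* set by a redirect helper, NULL otherwise */
	unsigned int flags;
	unsigned int tgt_index;
};

/* Hypothetical stand-in for struct bpf_net_context. */
struct net_context {
	struct redirect_info ri;
};

/* Models the per-task (PREEMPT_RT) or per-CPU context pointer. */
static struct net_context *current_ctx;

/* Install ctx unless a context is already active (nested call).
 * Returns ctx if it was installed, NULL if an outer one is in use.
 */
static struct net_context *net_ctx_set(struct net_context *ctx)
{
	if (current_ctx)
		return NULL;
	current_ctx = ctx;
	return ctx;
}

/* Reset the pointer, but only for the call that installed it. */
static void net_ctx_clear(struct net_context *installed)
{
	if (installed)
		current_ctx = NULL;
}

/* Models the redirect step running after the program returned
 * "redirect". With a zeroed context, map is NULL when no helper
 * ran, so the error path is taken instead of using stack garbage.
 */
static int do_redirect(void)
{
	struct redirect_info *ri;

	assert(current_ctx != NULL);
	ri = &current_ctx->ri;
	if (!ri->map)
		return -1;
	return 0;
}

/* A program "returns redirect" without calling any helper first. */
static int run_without_helper(void)
{
	struct net_context ctx = {0};	/* zero-initialised on the stack */
	struct net_context *installed = net_ctx_set(&ctx);
	int ret = do_redirect();

	net_ctx_clear(installed);
	return ret;
}
```

Without the "= {0}" initialiser, the check in do_redirect() would read
whatever happened to be on the stack, which is the failure mode described
above. Whether the zeroing belongs at the declaration or inside the set
helper is a design choice this sketch does not settle.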