From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: "David S. Miller", Boqun Feng, Daniel Borkmann, Eric Dumazet,
	Frederic Weisbecker, Ingo Molnar, Jakub Kicinski, Paolo Abeni,
	Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
	Sebastian Andrzej Siewior, "K. Y. Srinivasan", "Michael S.
	Tsirkin", Alexei Starovoitov, Andrii Nakryiko, Dexuan Cui,
	Haiyang Zhang, Hao Luo, Jesper Dangaard Brouer, Jiri Olsa,
	John Fastabend, Juergen Gross, KP Singh, Martin KaFai Lau,
	Nikolay Aleksandrov, Song Liu, Stanislav Fomichev,
	Stefano Stabellini, Wei Liu, Willem de Bruijn, Xuan Zhuo,
	Yonghong Song, bpf@vger.kernel.org,
	virtualization@lists.linux.dev, xen-devel@lists.xenproject.org
Subject: [PATCH net-next 16/24] net: netkit, veth, tun, virt*: Use nested-BH locking for XDP redirect.
Date: Fri, 15 Dec 2023 18:07:35 +0100
Message-ID: <20231215171020.687342-17-bigeasy@linutronix.de>
In-Reply-To: <20231215171020.687342-1-bigeasy@linutronix.de>
References: <20231215171020.687342-1-bigeasy@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

The per-CPU variables used during bpf_prog_run_xdp() invocation and
later during xdp_do_redirect() rely on disabled BH for their protection.
Without locking in local_bh_disable() on PREEMPT_RT, these data
structures require explicit locking.

This is a follow-up on the previous change which introduced
bpf_run_lock.redirect_lock and uses it now within drivers.

The simple way is to acquire the lock before bpf_prog_run_xdp() is
invoked and hold it until the end of the function.
This does not always work because some drivers (cpsw, atlantic) invoke
xdp_do_flush() in the same context.
Acquiring the lock in bpf_prog_run_xdp() and dropping it in
xdp_do_redirect() (without touching drivers) does not work because not
all drivers which use bpf_prog_run_xdp() support XDP_REDIRECT (and
invoke xdp_do_redirect()).

Ideally the minimal locking scope would be bpf_prog_run_xdp() +
xdp_do_redirect() and everything else (error recovery, DMA unmapping,
free/alloc of memory, …) would happen outside of the locked section.

Cc: "K. Y. Srinivasan"
Cc: "Michael S. Tsirkin"
Cc: Alexei Starovoitov
Cc: Andrii Nakryiko
Cc: Dexuan Cui
Cc: Haiyang Zhang
Cc: Hao Luo
Cc: Jesper Dangaard Brouer
Cc: Jiri Olsa
Cc: John Fastabend
Cc: Juergen Gross
Cc: KP Singh
Cc: Martin KaFai Lau
Cc: Nikolay Aleksandrov
Cc: Song Liu
Cc: Stanislav Fomichev
Cc: Stefano Stabellini
Cc: Wei Liu
Cc: Willem de Bruijn
Cc: Xuan Zhuo
Cc: Yonghong Song
Cc: bpf@vger.kernel.org
Cc: virtualization@lists.linux.dev
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Sebastian Andrzej Siewior
---
 drivers/net/hyperv/netvsc_bpf.c |  1 +
 drivers/net/netkit.c            | 13 +++++++----
 drivers/net/tun.c               | 28 +++++++++++++----------
 drivers/net/veth.c              | 40 ++++++++++++++++++++-------------
 drivers/net/virtio_net.c        |  1 +
 drivers/net/xen-netfront.c      |  1 +
 6 files changed, 52 insertions(+), 32 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c
index 4a9522689fa4f..55f8ca92ca199 100644
--- a/drivers/net/hyperv/netvsc_bpf.c
+++ b/drivers/net/hyperv/netvsc_bpf.c
@@ -58,6 +58,7 @@ u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan,
 
 	memcpy(xdp->data, data, len);
 
+	guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock);
 	act = bpf_prog_run_xdp(prog, xdp);
 
 	switch (act) {
diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index 39171380ccf29..fbcf78477bda8 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -80,8 +80,15 @@ static netdev_tx_t netkit_xmit(struct sk_buff *skb, struct net_device *dev)
 	netkit_prep_forward(skb, !net_eq(dev_net(dev), dev_net(peer)));
 	skb->dev = peer;
 	entry = rcu_dereference(nk->active);
-	if (entry)
-		ret = netkit_run(entry, skb, ret);
+	if (entry) {
+		scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) {
+			ret = netkit_run(entry, skb, ret);
+			if (ret == NETKIT_REDIRECT) {
+				dev_sw_netstats_tx_add(dev, 1, len);
+				skb_do_redirect(skb);
+			}
+		}
+	}
 	switch (ret) {
 	case NETKIT_NEXT:
 	case NETKIT_PASS:
@@ -95,8 +102,6 @@ static netdev_tx_t netkit_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 		break;
 	case NETKIT_REDIRECT:
-		dev_sw_netstats_tx_add(dev, 1, len);
-		skb_do_redirect(skb);
 		break;
 	case NETKIT_DROP:
 	default:
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afa5497f7c35c..fe0d31f11e4b6 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1708,16 +1708,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		xdp_init_buff(&xdp, buflen, &tfile->xdp_rxq);
 		xdp_prepare_buff(&xdp, buf, pad, len, false);
 
-		act = bpf_prog_run_xdp(xdp_prog, &xdp);
-		if (act == XDP_REDIRECT || act == XDP_TX) {
-			get_page(alloc_frag->page);
-			alloc_frag->offset += buflen;
-		}
-		err = tun_xdp_act(tun, xdp_prog, &xdp, act);
-		if (err < 0) {
-			if (act == XDP_REDIRECT || act == XDP_TX)
-				put_page(alloc_frag->page);
-			goto out;
+		scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) {
+			act = bpf_prog_run_xdp(xdp_prog, &xdp);
+			if (act == XDP_REDIRECT || act == XDP_TX) {
+				get_page(alloc_frag->page);
+				alloc_frag->offset += buflen;
+			}
+			err = tun_xdp_act(tun, xdp_prog, &xdp, act);
+			if (err < 0) {
+				if (act == XDP_REDIRECT || act == XDP_TX)
+					put_page(alloc_frag->page);
+				goto out;
+			}
 		}
 
 		if (err == XDP_REDIRECT)
@@ -2460,8 +2462,10 @@ static int tun_xdp_one(struct tun_struct *tun,
 		xdp_init_buff(xdp, buflen, &tfile->xdp_rxq);
 		xdp_set_data_meta_invalid(xdp);
 
-		act = bpf_prog_run_xdp(xdp_prog, xdp);
-		ret = tun_xdp_act(tun, xdp_prog, xdp, act);
+		scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) {
+			act = bpf_prog_run_xdp(xdp_prog, xdp);
+			ret = tun_xdp_act(tun, xdp_prog, xdp, act);
+		}
 		if (ret < 0) {
 			put_page(virt_to_head_page(xdp->data));
 			return ret;
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 977861c46b1fe..c69e5ff9f8795 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -624,7 +624,18 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 		xdp->rxq = &rq->xdp_rxq;
 		vxbuf.skb = NULL;
 
-		act = bpf_prog_run_xdp(xdp_prog, xdp);
+		scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) {
+			act = bpf_prog_run_xdp(xdp_prog, xdp);
+			if (act == XDP_REDIRECT) {
+				orig_frame = *frame;
+				xdp->rxq->mem = frame->mem;
+				if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
+					frame = &orig_frame;
+					stats->rx_drops++;
+					goto err_xdp;
+				}
+			}
+		}
 
 		switch (act) {
 		case XDP_PASS:
@@ -644,13 +655,6 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 			rcu_read_unlock();
 			goto xdp_xmit;
 		case XDP_REDIRECT:
-			orig_frame = *frame;
-			xdp->rxq->mem = frame->mem;
-			if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
-				frame = &orig_frame;
-				stats->rx_drops++;
-				goto err_xdp;
-			}
 			stats->xdp_redirect++;
 			rcu_read_unlock();
 			goto xdp_xmit;
@@ -857,7 +861,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	orig_data = xdp->data;
 	orig_data_end = xdp->data_end;
 
-	act = bpf_prog_run_xdp(xdp_prog, xdp);
+	scoped_guard(local_lock_nested_bh, &bpf_run_lock.redirect_lock) {
+		act = bpf_prog_run_xdp(xdp_prog, xdp);
+		if (act == XDP_REDIRECT) {
+			veth_xdp_get(xdp);
+			consume_skb(skb);
+			xdp->rxq->mem = rq->xdp_mem;
+			if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
+				stats->rx_drops++;
+				goto err_xdp;
+			}
+		}
+	}
 
 	switch (act) {
 	case XDP_PASS:
@@ -875,13 +890,6 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 		rcu_read_unlock();
 		goto xdp_xmit;
 	case XDP_REDIRECT:
-		veth_xdp_get(xdp);
-		consume_skb(skb);
-		xdp->rxq->mem = rq->xdp_mem;
-		if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
-			stats->rx_drops++;
-			goto err_xdp;
-		}
 		stats->xdp_redirect++;
 		rcu_read_unlock();
 		goto xdp_xmit;
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d16f592c2061f..5e362c4604239 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1010,6 +1010,7 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
 	int err;
 	u32 act;
 
+	guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 	u64_stats_inc(&stats->xdp_packets);
 
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index ad29f370034e4..e3daa8cdeb84e 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -978,6 +978,7 @@ static u32 xennet_run_xdp(struct netfront_queue *queue, struct page *pdata,
 	xdp_prepare_buff(xdp, page_address(pdata), XDP_PACKET_HEADROOM,
 			 len, false);
 
+	guard(local_lock_nested_bh)(&bpf_run_lock.redirect_lock);
 	act = bpf_prog_run_xdp(prog, xdp);
 	switch (act) {
 	case XDP_TX:
-- 
2.43.0