From: Yan Zhai
Date: Tue, 27 Feb 2024 15:22:57 -0600
Subject: Re: [PATCH] net: raise RCU qs after each threaded NAPI poll
To: paulmck@kernel.org
Cc: Eric Dumazet, netdev@vger.kernel.org, "David S. Miller",
    Jakub Kicinski, Paolo Abeni, Jiri Pirko, Simon Horman,
    Daniel Borkmann, Lorenzo Bianconi, Coco Li, Wei Wang,
    Alexander Duyck, Hannes Frederic Sowa, linux-kernel@vger.kernel.org,
    rcu@vger.kernel.org, bpf@vger.kernel.org, kernel-team@cloudflare.com

On Tue, Feb 27, 2024 at 12:32 PM Paul E. McKenney wrote:
>
> On Tue, Feb 27, 2024 at 05:44:17PM +0100, Eric Dumazet wrote:
> > On Tue, Feb 27, 2024 at 4:44 PM Yan Zhai wrote:
> > >
> > > We noticed task RCUs being blocked when threaded NAPIs are very busy in
> > > production: detaching any BPF tracing programs, i.e. removing a ftrace
> > > trampoline, will simply block for very long in rcu_tasks_wait_gp. This
> > > ranges from hundreds of seconds to even an hour, severely harming any
> > > observability tools that rely on BPF tracing programs. It can be
> > > easily reproduced locally with following setup:
> > >
> > > ip netns add test1
> > > ip netns add test2
> > >
> > > ip -n test1 link add veth1 type veth peer name veth2 netns test2
> > >
> > > ip -n test1 link set veth1 up
> > > ip -n test1 link set lo up
> > > ip -n test2 link set veth2 up
> > > ip -n test2 link set lo up
> > >
> > > ip -n test1 addr add 192.168.1.2/31 dev veth1
> > > ip -n test1 addr add 1.1.1.1/32 dev lo
> > > ip -n test2 addr add 192.168.1.3/31 dev veth2
> > > ip -n test2 addr add 2.2.2.2/31 dev lo
> > >
> > > ip -n test1 route add default via 192.168.1.3
> > > ip -n test2 route add default via 192.168.1.2
> > >
> > > for i in `seq 10 210`; do
> > >   for j in `seq 10 210`; do
> > >     ip netns exec test2 iptables -I INPUT -s 3.3.$i.$j -p udp --dport 5201
> > >   done
> > > done
> > >
> > > ip netns exec test2 ethtool -K veth2 gro on
> > > ip netns exec test2 bash -c 'echo 1 > /sys/class/net/veth2/threaded'
> > > ip netns exec test1 ethtool -K veth1 tso off
> > >
> > > Then run an iperf3 client/server and a bpftrace script
> > > can trigger it:
> > >
> > > ip netns exec test2 iperf3 -s -B 2.2.2.2 >/dev/null&
> > > ip netns exec test1 iperf3 -c 2.2.2.2 -B 1.1.1.1 -u -l 1500 -b 3g -t 100 >/dev/null&
> > > bpftrace -e 'kfunc:__napi_poll{@=count();} interval:s:1{exit();}'
> > >
> > > Above reproduce for net-next kernel with following RCU and preempt
> > > configurations:
> > >
> > > # RCU Subsystem
> > > CONFIG_TREE_RCU=y
> > > CONFIG_PREEMPT_RCU=y
> > > # CONFIG_RCU_EXPERT is not set
> > > CONFIG_SRCU=y
> > > CONFIG_TREE_SRCU=y
> > > CONFIG_TASKS_RCU_GENERIC=y
> > > CONFIG_TASKS_RCU=y
> > > CONFIG_TASKS_RUDE_RCU=y
> > > CONFIG_TASKS_TRACE_RCU=y
> > > CONFIG_RCU_STALL_COMMON=y
> > > CONFIG_RCU_NEED_SEGCBLIST=y
> > > # end of RCU Subsystem
> > > # RCU Debugging
> > > # CONFIG_RCU_SCALE_TEST is not set
> > > # CONFIG_RCU_TORTURE_TEST is not set
> > > # CONFIG_RCU_REF_SCALE_TEST is not set
> > > CONFIG_RCU_CPU_STALL_TIMEOUT=21
> > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
> > > # CONFIG_RCU_TRACE is not set
> > > # CONFIG_RCU_EQS_DEBUG is not set
> > > # end of RCU Debugging
> > >
> > > CONFIG_PREEMPT_BUILD=y
> > > # CONFIG_PREEMPT_NONE is not set
> > > CONFIG_PREEMPT_VOLUNTARY=y
> > > # CONFIG_PREEMPT is not set
> > > CONFIG_PREEMPT_COUNT=y
> > > CONFIG_PREEMPTION=y
> > > CONFIG_PREEMPT_DYNAMIC=y
> > > CONFIG_PREEMPT_RCU=y
> > > CONFIG_HAVE_PREEMPT_DYNAMIC=y
> > > CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
> > > CONFIG_PREEMPT_NOTIFIERS=y
> > > # CONFIG_DEBUG_PREEMPT is not set
> > > # CONFIG_PREEMPT_TRACER is not set
> > > # CONFIG_PREEMPTIRQ_DELAY_TEST is not set
> > >
> > > An interesting observation is that, while tasks RCUs are blocked,
> > > related NAPI thread is still being scheduled (even across cores)
> > > regularly. Looking at the gp conditions, I am inclining to cond_resched
> > > after each __napi_poll being the problem: cond_resched enters the
> > > scheduler with PREEMPT bit, which does not account as a gp for tasks
> > > RCUs.
> > > Meanwhile, since the thread has been frequently resched, the
> > > normal scheduling point (no PREEMPT bit, accounted as a task RCU gp)
> > > seems to have very little chance to kick in. Given the nature of "busy
> > > polling" program, such NAPI thread won't have task->nvcsw or task->on_rq
> > > updated (other gp conditions), the result is that such NAPI thread is
> > > put on RCU holdouts list for indefinitely long time.
> > >
> > > This is simply fixed by mirroring the ksoftirqd behavior: after
> > > NAPI/softirq work, raise a RCU QS to help expedite the RCU period. No
> > > more blocking afterwards for the same setup.
> > >
> > > Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> > > Signed-off-by: Yan Zhai
> > > ---
> > >  net/core/dev.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 275fd5259a4a..6e41263ff5d3 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -6773,6 +6773,10 @@ static int napi_threaded_poll(void *data)
> > >  				net_rps_action_and_irq_enable(sd);
> > >  			}
> > >  			skb_defer_free_flush(sd);
>
> Please put a comment here stating that RCU readers cannot cross
> this point.
>
> I need to add lockdep to rcu_softirq_qs() to catch placing this in an
> RCU read-side critical section. And a header comment noting that from
> an RCU perspective, it acts as a momentary enabling of preemption.
>

Just to clarify, do you mean I should state that this polling function
cannot be called from within an RCU read-side critical section? Or do you
mean that any read-side critical sections need to end before raising this QS?

Yan

> > > +			if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> > > +				rcu_softirq_qs();
> > > +
> > >  			local_bh_enable();
> > >
> > >  			if (!repoll)
> > > --
> > > 2.30.2
> > >
> >
> > Hmm....
> > Why napi_busy_loop() does not have a similar problem ?
> >
> > It is unclear why rcu_all_qs() in __cond_resched() is guarded by
> >
> > #ifndef CONFIG_PREEMPT_RCU
> >      rcu_all_qs();
> > #endif
>
> The theory is that PREEMPT_RCU kernels have preemption, and get their
> quiescent states that way.
>
> The more recent practice involves things like PREEMPT_DYNAMIC and maybe
> soon PREEMPT_AUTO, which might require adjustments, so thank you for
> pointing this out!
>
> Back on the patch, my main other concern is that someone somewhere might
> be using something like synchronize_rcu() to wait for all in-progress
> softirq handlers to complete. But I don't know of such a thing, and if
> there is, there are workarounds, including synchronize_rcu_tasks().
>
> So something to be aware of, not (as far as I know) something to block
> this commit.
>
> With the added comment:
>
> Acked-by: Paul E. McKenney
>
>                                                         Thanx, Paul