Date: Tue, 27 Feb 2024 07:44:29 -0800
From: Yan Zhai
To: netdev@vger.kernel.org
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Jiri Pirko, Simon Horman, Daniel Borkmann, Lorenzo Bianconi,
    Coco Li, Wei Wang, Alexander Duyck, Hannes Frederic Sowa,
    linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
    bpf@vger.kernel.org, kernel-team@cloudflare.com
Subject: [PATCH] net: raise RCU qs after each threaded NAPI poll

We noticed tasks RCU being blocked when threaded NAPIs are very busy in
production: detaching any BPF tracing program, i.e. removing an ftrace
trampoline, simply blocks for a very long time in rcu_tasks_wait_gp. The
delay ranges from hundreds of seconds to as much as an hour, severely
harming any observability tool that relies on BPF tracing programs. It
can be easily reproduced locally with the following setup:

    ip netns add test1
    ip netns add test2

    ip -n test1 link add veth1 type veth peer name veth2 netns test2

    ip -n test1 link set veth1 up
    ip -n test1 link set lo up
    ip -n test2 link set veth2 up
    ip -n test2 link set lo up

    ip -n test1 addr add 192.168.1.2/31 dev veth1
    ip -n test1 addr add 1.1.1.1/32 dev lo
    ip -n test2 addr add 192.168.1.3/31 dev veth2
    ip -n test2 addr add 2.2.2.2/31 dev lo

    ip -n test1 route add default via 192.168.1.3
    ip -n test2 route add default via 192.168.1.2

    for i in `seq 10 210`; do
        for j in `seq 10 210`; do
            ip netns exec test2 iptables -I INPUT -s 3.3.$i.$j -p udp --dport 5201
        done
    done

    ip netns exec test2 ethtool -K veth2 gro on
    ip netns exec test2 bash -c 'echo 1 > /sys/class/net/veth2/threaded'
    ip netns exec test1 ethtool -K veth1 tso off

Then running an iperf3 client/server together with a bpftrace script
triggers it:

    ip netns exec test2 iperf3 -s -B 2.2.2.2 >/dev/null&
    ip netns exec test1 iperf3 -c 2.2.2.2 -B 1.1.1.1 -u -l 1500 -b 3g -t 100 >/dev/null&
    bpftrace -e 'kfunc:__napi_poll{@=count();} interval:s:1{exit();}'

The above reproduces on a net-next kernel with the following RCU and
preempt configurations:

    # RCU Subsystem
    CONFIG_TREE_RCU=y
    CONFIG_PREEMPT_RCU=y
    # CONFIG_RCU_EXPERT is not set
    CONFIG_SRCU=y
    CONFIG_TREE_SRCU=y
    CONFIG_TASKS_RCU_GENERIC=y
    CONFIG_TASKS_RCU=y
    CONFIG_TASKS_RUDE_RCU=y
    CONFIG_TASKS_TRACE_RCU=y
    CONFIG_RCU_STALL_COMMON=y
    CONFIG_RCU_NEED_SEGCBLIST=y
    # end of RCU Subsystem

    # RCU Debugging
    # CONFIG_RCU_SCALE_TEST is not set
    # CONFIG_RCU_TORTURE_TEST is not set
    # CONFIG_RCU_REF_SCALE_TEST is not set
    CONFIG_RCU_CPU_STALL_TIMEOUT=21
    CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
    # CONFIG_RCU_TRACE is not set
    # CONFIG_RCU_EQS_DEBUG is not set
    # end of RCU Debugging

    CONFIG_PREEMPT_BUILD=y
    # CONFIG_PREEMPT_NONE is not set
    CONFIG_PREEMPT_VOLUNTARY=y
    # CONFIG_PREEMPT is not set
    CONFIG_PREEMPT_COUNT=y
    CONFIG_PREEMPTION=y
    CONFIG_PREEMPT_DYNAMIC=y
    CONFIG_PREEMPT_RCU=y
    CONFIG_HAVE_PREEMPT_DYNAMIC=y
    CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
    CONFIG_PREEMPT_NOTIFIERS=y
    # CONFIG_DEBUG_PREEMPT is not set
    # CONFIG_PREEMPT_TRACER is not set
    # CONFIG_PREEMPTIRQ_DELAY_TEST is not set

An interesting observation is that, while tasks RCU is blocked, the
related NAPI thread is still being scheduled regularly (even across
cores). Looking at the gp conditions, I am inclined to think that the
cond_resched after each __napi_poll is the problem: cond_resched enters
the scheduler with the PREEMPT bit set, which does not count as a
quiescent state for tasks RCU. Meanwhile, since the thread is
rescheduled so frequently, the normal scheduling point (no PREEMPT bit,
counted as a tasks RCU quiescent state) seems to have very little chance
to kick in. Given the "busy polling" nature of the program, such a NAPI
thread won't have task->nvcsw or task->on_rq updated (the other gp
conditions), so the thread stays on the RCU holdouts list for an
indefinitely long time.

This is simply fixed by mirroring the ksoftirqd behavior: after
NAPI/softirq work, raise an RCU QS to help expedite the grace period.
With the same setup, no more blocking is observed afterwards.
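
To make the reasoning above concrete, here is a small userspace sketch of
the holdout conditions this message names (task->nvcsw, task->on_rq, and
a context switch without the PREEMPT bit). It is a toy model under stated
assumptions, not kernel code: struct model_task and every model_*
function are hypothetical names invented for this illustration, and the
real grace-period scan checks more than is shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-task state that the
 * tasks RCU grace-period scan looks at. Not the kernel's task_struct. */
struct model_task {
	unsigned long nvcsw;      /* voluntary context switch count */
	unsigned long snap_nvcsw; /* nvcsw snapshot taken at gp start */
	bool on_rq;               /* task is still runnable */
	bool holdout;             /* gp is still waiting for this task */
};

/* Grace period start: a runnable task becomes a holdout. */
static void model_gp_start(struct model_task *t)
{
	t->snap_nvcsw = t->nvcsw;
	t->holdout = t->on_rq;
}

/* Context-switch hook: only a switch without the PREEMPT bit counts
 * as a quiescent state in this model. */
static void model_note_context_switch(struct model_task *t, bool preempt)
{
	if (!preempt)
		t->holdout = false;
}

/* Explicit QS report, the role rcu_softirq_qs() plays in the patch. */
static void model_softirq_qs(struct model_task *t)
{
	model_note_context_switch(t, false);
}

/* Periodic scan: release the task if any modelled gp condition holds. */
static bool model_still_holdout(struct model_task *t)
{
	if (t->nvcsw != t->snap_nvcsw || !t->on_rq)
		t->holdout = false;
	return t->holdout;
}

/* Run `polls` busy poll iterations, each followed by a preempting
 * reschedule. Returns true if the grace period is still blocked. */
static bool model_busy_napi(int polls, bool report_qs)
{
	struct model_task t = { .nvcsw = 0, .on_rq = true };

	assert(polls >= 0);
	model_gp_start(&t);
	for (int i = 0; i < polls; i++) {
		if (report_qs)
			model_softirq_qs(&t); /* patched behaviour */
		/* cond_resched(): enters the scheduler with PREEMPT set */
		model_note_context_switch(&t, true);
	}
	return model_still_holdout(&t);
}
```

In this model, model_busy_napi(1000, false) leaves the task on the
holdout list no matter how many polls run, while
model_busy_napi(1000, true) releases it on the first poll.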

Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
Signed-off-by: Yan Zhai
---
 net/core/dev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 275fd5259a4a..6e41263ff5d3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6773,6 +6773,10 @@ static int napi_threaded_poll(void *data)
 				net_rps_action_and_irq_enable(sd);
 			}
 			skb_defer_free_flush(sd);
+
+			if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+				rcu_softirq_qs();
+
 			local_bh_enable();
 
 			if (!repoll)
-- 
2.30.2