Date: Wed, 4 Dec 2019 15:25:44 -0800
From: "Paul E. McKenney" <paulmck@kernel.org>
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, mingo@kernel.org,
	jiangshanlai@gmail.com, dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, josh@joshtriplett.org, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
	edumazet@google.com, fweisbec@gmail.com, oleg@redhat.com,
	joel@joelfernandes.org
Subject: [PATCH tip/core/rcu] Enable tick for nohz_full CPUs not responding to expedited GP
Message-ID: <20191204232544.GA17061@paulmck-ThinkPad-P72>
Reply-To: paulmck@kernel.org

rcu: Enable tick for nohz_full CPUs slow to provide expedited QS

An expedited grace period can be stalled by a nohz_full CPU looping in
kernel context.  This possibility is currently handled by some carefully
crafted checks in rcu_read_unlock_special() that enlist help from
ksoftirqd when permitted by the scheduler.  However, it is exactly these
checks that require that the scheduler avoid holding any of its rq or pi
locks across rcu_read_unlock() without also having held them across the
entire RCU read-side critical section.

It would therefore be very nice if expedited grace periods could handle
nohz_full CPUs looping in kernel context without such checks.  This
commit therefore adds code to the expedited grace period's wait and
cleanup code that forces the scheduler-clock interrupt on for CPUs that
fail to quickly supply a quiescent state.  "Quickly" is currently a
hard-coded single-jiffy delay.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

---

This is needed so that rcu_read_unlock_special() need not enter the
scheduler for the benefit of expedited grace periods on nohz_full CPUs,
thus enabling more of Lai Jiangshan's patchset.
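As background, the patch builds on the kernel's existing per-CPU
tick-dependency API from include/linux/tick.h.  A minimal sketch of
that pattern follows (illustration only, not part of the patch: the
sketch_*() helper names are hypothetical, and TICK_DEP_BIT_RCU_EXP is
only introduced by the diff below):

#include <linux/tick.h>

/*
 * Force the scheduler-clock tick on for one nohz_full CPU.  The CPU
 * keeps receiving scheduler-clock interrupts until every dependency
 * bit set on it has been cleared again.
 */
static void sketch_force_tick(int cpu)
{
	tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
}

/*
 * Release the dependency once the CPU has supplied its quiescent
 * state, allowing the tick to be stopped again if nothing else
 * needs it.
 */
static void sketch_release_tick(int cpu)
{
	tick_dep_clear_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
}

The diff uses exactly this set/clear pairing, keyed off the new
per-CPU rcu_forced_tick_exp flag.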
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 4ed788c..72a2a21 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -109,9 +109,10 @@ enum tick_dep_bits {
 	TICK_DEP_BIT_PERF_EVENTS	= 1,
 	TICK_DEP_BIT_SCHED		= 2,
 	TICK_DEP_BIT_CLOCK_UNSTABLE	= 3,
-	TICK_DEP_BIT_RCU		= 4
+	TICK_DEP_BIT_RCU		= 4,
+	TICK_DEP_BIT_RCU_EXP		= 5
 };
-#define TICK_DEP_BIT_MAX TICK_DEP_BIT_RCU
+#define TICK_DEP_BIT_MAX TICK_DEP_BIT_RCU_EXP
 
 #define TICK_DEP_MASK_NONE		0
 #define TICK_DEP_MASK_POSIX_TIMER	(1 << TICK_DEP_BIT_POSIX_TIMER)
@@ -119,6 +120,7 @@ enum tick_dep_bits {
 #define TICK_DEP_MASK_SCHED		(1 << TICK_DEP_BIT_SCHED)
 #define TICK_DEP_MASK_CLOCK_UNSTABLE	(1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
 #define TICK_DEP_MASK_RCU		(1 << TICK_DEP_BIT_RCU)
+#define TICK_DEP_MASK_RCU_EXP		(1 << TICK_DEP_BIT_RCU_EXP)
 
 #ifdef CONFIG_NO_HZ_COMMON
 extern bool tick_nohz_enabled;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 634c1db..0c87e4c 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -181,6 +181,7 @@ struct rcu_data {
 	bool rcu_need_heavy_qs;		/* GP old, so heavy quiescent state! */
 	bool rcu_urgent_qs;		/* GP old need light quiescent state. */
 	bool rcu_forced_tick;		/* Forced tick to provide QS. */
+	bool rcu_forced_tick_exp;	/*   ... provide QS to expedited GP. */
 #ifdef CONFIG_RCU_FAST_NO_HZ
 	unsigned long last_accelerate;	/* Last jiffy CBs were accelerated. */
 	unsigned long last_advance_all;	/* Last jiffy CBs were all advanced. */
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 726ba20..6935a9e 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -230,7 +230,9 @@ static void __maybe_unused rcu_report_exp_rnp(struct rcu_node *rnp, bool wake)
 static void rcu_report_exp_cpu_mult(struct rcu_node *rnp,
 				    unsigned long mask, bool wake)
 {
+	int cpu;
 	unsigned long flags;
+	struct rcu_data *rdp;
 
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	if (!(rnp->expmask & mask)) {
@@ -238,6 +240,13 @@ static void rcu_report_exp_cpu_mult(struct rcu_node *rnp,
 		return;
 	}
 	WRITE_ONCE(rnp->expmask, rnp->expmask & ~mask);
+	for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
+		rdp = per_cpu_ptr(&rcu_data, cpu);
+		if (!IS_ENABLED(CONFIG_NO_HZ_FULL) || !rdp->rcu_forced_tick_exp)
+			continue;
+		rdp->rcu_forced_tick_exp = false;
+		tick_dep_clear_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
+	}
 	__rcu_report_exp_rnp(rnp, wake, flags); /* Releases rnp->lock. */
 }
 
@@ -450,6 +459,26 @@ static void sync_rcu_exp_select_cpus(void)
 }
 
 /*
+ * Wait for the expedited grace period to elapse, within time limit.
+ * If the time limit is exceeded without the grace period elapsing,
+ * return false, otherwise return true.
+ */
+static bool synchronize_rcu_expedited_wait_once(long tlimit)
+{
+	int t;
+	struct rcu_node *rnp_root = rcu_get_root();
+
+	t = swait_event_timeout_exclusive(rcu_state.expedited_wq,
+					  sync_rcu_exp_done_unlocked(rnp_root),
+					  tlimit);
+	// Workqueues should not be signaled.
+	if (t > 0 || sync_rcu_exp_done_unlocked(rnp_root))
+		return true;
+	WARN_ON(t < 0); /* workqueues should not be signaled. */
+	return false;
+}
+
+/*
  * Wait for the expedited grace period to elapse, issuing any needed
  * RCU CPU stall warnings along the way.
  */
@@ -460,22 +489,31 @@ static void synchronize_rcu_expedited_wait(void)
 	unsigned long jiffies_start;
 	unsigned long mask;
 	int ndetected;
+	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root();
-	int ret;
 
 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(),
 				   TPS("startwait"));
 	jiffies_stall = rcu_jiffies_till_stall_check();
 	jiffies_start = jiffies;
+	if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
+		if (synchronize_rcu_expedited_wait_once(1))
+			return;
+		rcu_for_each_leaf_node(rnp) {
+			for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
+				rdp = per_cpu_ptr(&rcu_data, cpu);
+				if (rdp->rcu_forced_tick_exp)
+					continue;
+				rdp->rcu_forced_tick_exp = true;
+				tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
+			}
+		}
+		WARN_ON_ONCE(1);
+	}
 	for (;;) {
-		ret = swait_event_timeout_exclusive(
-				rcu_state.expedited_wq,
-				sync_rcu_exp_done_unlocked(rnp_root),
-				jiffies_stall);
-		if (ret > 0 || sync_rcu_exp_done_unlocked(rnp_root))
+		if (synchronize_rcu_expedited_wait_once(jiffies_stall))
 			return;
-		WARN_ON(ret < 0); /* workqueues should not be signaled. */
 		if (rcu_cpu_stall_suppress)
 			continue;
 		panic_on_rcu_stall();
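For readers skimming the diff, here is a condensed paraphrase of the
control flow that the hunk above creates (a sketch only, not the patch
itself: the rcu_forced_tick_exp bookkeeping and the stall-warning body
are elided):

static void synchronize_rcu_expedited_wait_sketch(void)
{
	unsigned long jiffies_stall = rcu_jiffies_till_stall_check();
	struct rcu_node *rnp;
	int cpu;

	if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
		/* Fast path: give all CPUs one jiffy to supply their QSes. */
		if (synchronize_rcu_expedited_wait_once(1))
			return;
		/* Force the scheduler-clock tick on for the laggards. */
		rcu_for_each_leaf_node(rnp)
			for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask)
				tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
	}
	/* Slow path: wait in stall-warning-sized intervals, as before. */
	for (;;) {
		if (synchronize_rcu_expedited_wait_once(jiffies_stall))
			return;
		/* ... RCU CPU stall warnings ... */
	}
}

Note the pairing: the tick dependency set here is cleared in
rcu_report_exp_cpu_mult() as each CPU's quiescent state is reported, so
the tick is forced on only for as long as a given CPU is actually
holding up the expedited grace period.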