From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
	fweisbec@gmail.com, oleg@redhat.com, joel@joelfernandes.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: [PATCH RFC tip/core/rcu 1/2] rcu: Defer reporting RCU-preempt quiescent states when disabled
Date: Wed, 27 Jun 2018 13:49:14 -0700
Message-Id: <20180627204915.27253-1-paulmck@linux.vnet.ibm.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627204835.GA25456@linux.vnet.ibm.com>
References: <20180627204835.GA25456@linux.vnet.ibm.com>

This commit defers reporting of RCU-preempt quiescent states at
rcu_read_unlock_special() time when any of interrupts, softirq, or
preemption are disabled.  These deferred quiescent states are reported
at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug
offline operation.  Of course, if another RCU read-side critical
section has started in the meantime, the reporting of the quiescent
state will be further deferred.

This also means that disabling preemption, interrupts, and/or softirqs
will act as an RCU-preempt read-side critical section.  This is
enforced by checking preempt_count() as needed.
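
As a rough illustration (this helper is hypothetical and not part of
the patch; the actual tests in the patch vary by call site), the
"is it safe to report a quiescent state right now?" decision amounts
to something like the following sketch:

	#include <linux/preempt.h>	/* preempt_count(), PREEMPT_MASK, ... */
	#include <linux/irqflags.h>	/* irqs_disabled() */

	/*
	 * Hypothetical sketch, not taken from this patch: a quiescent
	 * state may be reported immediately only if preempt_count()
	 * rules out preemption disabling, softirq context/disabling,
	 * and hardirq context, and interrupts are not merely disabled
	 * (which preempt_count() alone cannot detect).
	 */
	static bool example_can_report_qs_now(void)
	{
		return !(preempt_count() &
			 (PREEMPT_MASK | SOFTIRQ_MASK | HARDIRQ_MASK)) &&
		       !irqs_disabled();
	}

Otherwise the quiescent state is recorded and reported later, from one
of the contexts listed above.
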
Some special cases must be handled on an ad-hoc basis, for example,
context switch is a quiescent state even though both the scheduler and
do_exit() disable preemption.  In these cases, additional calls to
rcu_preempt_deferred_qs() override the preemption disabling.  Similar
logic overrides disabled interrupts in rcu_preempt_check_callbacks()
because in this case the quiescent state happened just before the
corresponding scheduling-clock interrupt.

This change lifts a long-standing restriction that required that if
interrupts were disabled across a call to rcu_read_unlock() that the
matching rcu_read_lock() also be contained within that
interrupts-disabled region of code.  Because the reporting of the
corresponding RCU-preempt quiescent state is now deferred until after
interrupts have been enabled, it is no longer possible for this
situation to result in deadlocks involving the scheduler's runqueue
and priority-inheritance locks.  This may allow some code
simplification that might reduce interrupt latency a bit.
Unfortunately, this would also defer deboosting a low-priority task
that had been subjected to RCU priority boosting, so
real-time-response considerations might well force this restriction to
remain in place.

Because RCU-preempt grace periods are now blocked not only by RCU
read-side critical sections, but also by disabling of interrupts,
preemption, and softirqs, it will be possible to eliminate RCU-bh and
RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels.  This
may require some additional plumbing to provide the network
denial-of-service guarantees that have been traditionally provided by
RCU-bh.  Once these are in place, CONFIG_PREEMPT=n kernels will be
able to fold RCU-bh into RCU-sched.  This would mean that all kernels
would have but one flavor of RCU, which would open the door to
significant code cleanup.

Moving to a single flavor of RCU would also have the beneficial effect
of reducing the NOCB kthreads by at least a factor of two.
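
To make the lifted restriction concrete, the following hypothetical
sketch (not part of the patch; the function and its body are invented
purely for illustration) shows a reader whose rcu_read_unlock() runs
with interrupts disabled even though the matching rcu_read_lock() did
not, which is the pattern that deferred quiescent-state reporting now
tolerates:

	#include <linux/rcupdate.h>	/* rcu_read_lock(), rcu_read_unlock() */
	#include <linux/irqflags.h>	/* local_irq_save(), local_irq_restore() */

	/*
	 * Hypothetical illustration only: with deferred quiescent-state
	 * reporting, this rcu_read_unlock() no longer risks deadlock on
	 * the scheduler's runqueue or priority-inheritance locks, because
	 * the quiescent state is reported later, once interrupts have
	 * been re-enabled.
	 */
	static void example_unlock_with_irqs_disabled(void)
	{
		unsigned long flags;

		rcu_read_lock();		/* interrupts still enabled here */
		local_irq_save(flags);		/* enter an irqs-disabled region */
		rcu_read_unlock();		/* quiescent state deferred, not reported here */
		local_irq_restore(flags);	/* deferred QS reported at a later safe point */
	}

As the preceding paragraphs note, whether this relaxation can actually
be exploited depends on the real-time deboosting considerations
discussed above.
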
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 .../RCU/Design/Requirements/Requirements.html |  50 +++----
 include/linux/rcutiny.h                       |   5 +
 kernel/rcu/tree.c                             |   9 ++
 kernel/rcu/tree.h                             |   3 +
 kernel/rcu/tree_exp.h                         |  71 +++++++--
 kernel/rcu/tree_plugin.h                      | 138 +++++++++++++-----
 6 files changed, 199 insertions(+), 77 deletions(-)

diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 51f39f65002d..f1ead98e871a 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -2405,30 +2405,9 @@ when invoked from a CPU-hotplug notifier.
 <p>
 RCU depends on the scheduler, and the scheduler uses RCU
 to protect some of its data structures.
-This means the scheduler is forbidden from acquiring
-the runqueue locks and the priority-inheritance locks
-in the middle of an outermost RCU read-side critical section unless either
-(1) it releases them before exiting that same
-RCU read-side critical section, or
-(2) interrupts are disabled across
-that entire RCU read-side critical section.
-This same prohibition also applies (recursively!) to any lock that is acquired
-while holding any lock to which this prohibition applies.
-Adhering to this rule prevents preemptible RCU from invoking
-rcu_read_unlock_special() while either runqueue or
-priority-inheritance locks are held, thus avoiding deadlock.
-
-<p>
-Prior to v4.4, it was only necessary to disable preemption across
-RCU read-side critical sections that acquired scheduler locks.
-In v4.4, expedited grace periods started using IPIs, and these
-IPIs could force a rcu_read_unlock() to take the slowpath.
-Therefore, this expedited-grace-period change required disabling of
-interrupts, not just preemption.
-
-<p>
-For RCU's part, the preemptible-RCU rcu_read_unlock()
-implementation must be written carefully to avoid similar deadlocks.
+The preemptible-RCU rcu_read_unlock()
+implementation must therefore be written carefully to avoid deadlocks
+involving the scheduler's runqueue and priority-inheritance locks.
 In particular, rcu_read_unlock() must tolerate an
 interrupt where the interrupt handler invokes both
 rcu_read_lock() and rcu_read_unlock().
@@ -2437,7 +2416,7 @@ negative nesting levels to avoid destructive recursion via
 interrupt handler's use of RCU.
 
 <p>
-This pair of mutual scheduler-RCU requirements came as a
+This scheduler-RCU requirement came as a
 complete surprise.
 
 <p>
@@ -2448,9 +2427,28 @@ when running context-switch-heavy workloads when built with
 CONFIG_NO_HZ_FULL=y
 did come as a surprise [PDF].
 RCU has made good progress towards meeting this requirement, even
-for context-switch-have CONFIG_NO_HZ_FULL=y workloads,
+for context-switch-heavy CONFIG_NO_HZ_FULL=y workloads,
 but there is room for further improvement.
 
+<p>
+In the past, it was forbidden to disable interrupts across an
+rcu_read_unlock() unless that interrupt-disabled region
+of code also included the matching rcu_read_lock().
+Violating this restriction could result in deadlocks involving the
+scheduler's runqueue and priority-inheritance spinlocks.
+This restriction was lifted when interrupt-disabled calls to
+rcu_read_unlock() started deferring the reporting of
+the resulting RCU-preempt quiescent state until the end of that
+interrupts-disabled region.
+This deferred reporting means that the scheduler's runqueue and
+priority-inheritance locks cannot be held while reporting an RCU-preempt
+quiescent state, which lifts the earlier restriction, at least from
+a deadlock perspective.
+Unfortunately, real-time systems using RCU priority boosting may
+need this restriction to remain in effect because deferred
+quiescent-state reporting also defers deboosting, which in turn
+degrades real-time latencies.
+
 Tracing and RCU
 
 <p>
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 8d9a0ea8f0b5..f617ab19bb51 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -115,6 +115,11 @@ static inline void rcu_irq_exit_irqson(void) { }
 static inline void rcu_irq_enter_irqson(void) { }
 static inline void rcu_irq_exit(void) { }
 static inline void exit_rcu(void) { }
+static inline bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+{
+	return false;
+}
+static inline void rcu_preempt_deferred_qs(struct task_struct *t) { }
 #ifdef CONFIG_SRCU
 void rcu_scheduler_starting(void);
 #else /* #ifndef CONFIG_SRCU */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5fbefb341b6f..6c5a7f0daadc 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -422,6 +422,7 @@ static void rcu_momentary_dyntick_idle(void)
 	special = atomic_add_return(2 * RCU_DYNTICK_CTRL_CTR, &rdtp->dynticks);
 	/* It is illegal to call this from idle state. */
 	WARN_ON_ONCE(!(special & RCU_DYNTICK_CTRL_CTR));
+	rcu_preempt_deferred_qs(current);
 }
 
 /*
@@ -732,6 +733,7 @@ static void rcu_eqs_enter(bool user)
 	WRITE_ONCE(rdtp->dynticks_nesting, 0); /* Avoid irq-access tearing. */
 	rcu_dynticks_eqs_enter();
 	rcu_dynticks_task_enter();
+	rcu_preempt_deferred_qs(current);
 }
 
 /**
@@ -2847,6 +2849,12 @@ __rcu_process_callbacks(struct rcu_state *rsp)
 
 	WARN_ON_ONCE(!rdp->beenonline);
 
+	/* Report any deferred quiescent states if preemption enabled. */
+	if (!(preempt_count() & PREEMPT_MASK))
+		rcu_preempt_deferred_qs(current);
+	else if (rcu_preempt_need_deferred_qs(current))
+		resched_cpu(rdp->cpu); /* Provoke future context switch. */
+
 	/* Update RCU state based on any recent quiescent states. */
 	rcu_check_quiescent_state(rsp, rdp);
 
@@ -3820,6 +3828,7 @@ void rcu_report_dead(unsigned int cpu)
 	rcu_report_exp_rdp(&rcu_sched_state,
 			   this_cpu_ptr(rcu_sched_state.rda), true);
 	preempt_enable();
+	rcu_preempt_deferred_qs(current);
 	for_each_rcu_flavor(rsp)
 		rcu_cleanup_dying_idle_cpu(cpu, rsp);
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4e74df768c57..025bd2e5592b 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -195,6 +195,7 @@ struct rcu_data {
 	bool		core_needs_qs;	/* Core waits for quiesc state. */
 	bool		beenonline;	/* CPU online at least once. */
 	bool		gpwrap;		/* Possible ->gp_seq wrap. */
+	bool		deferred_qs;	/* This CPU awaiting a deferred QS? */
 	struct rcu_node	*mynode;	/* This CPU's leaf of hierarchy */
 	unsigned long	grpmask;	/* Mask to apply to leaf qsmask. */
 	unsigned long	ticks_this_gp;	/* The number of scheduling-clock */
@@ -461,6 +462,8 @@ static void rcu_cleanup_after_idle(void);
 static void rcu_prepare_for_idle(void);
 static void rcu_idle_count_callbacks_posted(void);
 static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
+static bool rcu_preempt_need_deferred_qs(struct task_struct *t);
+static void rcu_preempt_deferred_qs(struct task_struct *t);
 static void print_cpu_stall_info_begin(void);
 static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
 static void print_cpu_stall_info_end(void);
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index b3df3b770afb..7602a94d3ebf 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -262,6 +262,7 @@ static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
 static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp,
 			       bool wake)
 {
+	WRITE_ONCE(rdp->deferred_qs, false);
 	rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, wake);
 }
 
@@ -735,32 +736,70 @@ EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
  */
 static void sync_rcu_exp_handler(void *info)
 {
-	struct rcu_data *rdp;
+	unsigned long flags;
 	struct rcu_state *rsp = info;
+	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
+	struct rcu_node *rnp = rdp->mynode;
 	struct task_struct *t = current;
 
 	/*
-	 * Within an RCU read-side critical section, request that the next
-	 * rcu_read_unlock() report.  Unless this RCU read-side critical
-	 * section has already blocked, in which case it is already set
-	 * up for the expedited grace period to wait on it.
+	 * First, the common case of not being in an RCU read-side
+	 * critical section.  If also enabled or idle, immediately
+	 * report the quiescent state, otherwise defer.
 	 */
-	if (t->rcu_read_lock_nesting > 0 &&
-	    !t->rcu_read_unlock_special.b.blocked) {
-		t->rcu_read_unlock_special.b.exp_need_qs = true;
+	if (!t->rcu_read_lock_nesting) {
+		if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
+		    rcu_dynticks_curr_cpu_in_eqs()) {
+			rcu_report_exp_rdp(rsp, rdp, true);
+		} else {
+			rdp->deferred_qs = true;
+			resched_cpu(rdp->cpu);
+		}
 		return;
 	}
 
 	/*
-	 * We are either exiting an RCU read-side critical section (negative
-	 * values of t->rcu_read_lock_nesting) or are not in one at all
-	 * (zero value of t->rcu_read_lock_nesting).  Or we are in an RCU
-	 * read-side critical section that blocked before this expedited
-	 * grace period started.  Either way, we can immediately report
-	 * the quiescent state.
+	 * Second, the less-common case of being in an RCU read-side
+	 * critical section.  In this case we can count on a future
+	 * rcu_read_unlock().  However, this rcu_read_unlock() might
+	 * execute on some other CPU, but in that case there will be
+	 * a future context switch.  Either way, if the expedited
+	 * grace period is still waiting on this CPU, set ->deferred_qs
+	 * so that the eventual quiescent state will be reported.
+	 * Note that there is a large group of race conditions that
+	 * can have caused this quiescent state to already have been
+	 * reported, so we really do need to check ->expmask.
 	 */
-	rdp = this_cpu_ptr(rsp->rda);
-	rcu_report_exp_rdp(rsp, rdp, true);
+	if (t->rcu_read_lock_nesting > 0) {
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
+		if (rnp->expmask & rdp->grpmask)
+			rdp->deferred_qs = true;
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+	}
+
+	/*
+	 * The final and least likely case is where the interrupted
+	 * code was just about to or just finished exiting the RCU-preempt
+	 * read-side critical section, and no, we can't tell which.
+	 * So either way, set ->deferred_qs to flag later code that
+	 * a quiescent state is required.
+	 *
+	 * If the CPU is fully enabled (or if some buggy RCU-preempt
+	 * read-side critical section is being used from idle), just
+	 * invoke rcu_preempt_defer_qs() to immediately report the
+	 * quiescent state.  We cannot use rcu_read_unlock_special()
+	 * because we are in an interrupt handler, which will cause that
+	 * function to take an early exit without doing anything.
+	 *
+	 * Otherwise, use resched_cpu() to force a context switch after
+	 * the CPU enables everything.
+	 */
+	rdp->deferred_qs = true;
+	if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
+	    WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs()))
+		rcu_preempt_deferred_qs(t);
+	else
+		resched_cpu(rdp->cpu);
 }
 
 /**
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c1b17f5b9361..ff5c70eae47d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -371,6 +371,9 @@ static void rcu_preempt_note_context_switch(bool preempt)
 		 * behalf of preempted instance of __rcu_read_unlock().
 		 */
 		rcu_read_unlock_special(t);
+		rcu_preempt_deferred_qs(t);
+	} else {
+		rcu_preempt_deferred_qs(t);
 	}
 
 	/*
@@ -464,54 +467,51 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
 }
 
 /*
- * Handle special cases during rcu_read_unlock(), such as needing to
- * notify RCU core processing or task having blocked during the RCU
- * read-side critical section.
+ * Report deferred quiescent states.  The deferral time can
+ * be quite short, for example, in the case of the call from
+ * rcu_read_unlock_special().
  */
-static void rcu_read_unlock_special(struct task_struct *t)
+static void
+rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 {
 	bool empty_exp;
 	bool empty_norm;
 	bool empty_exp_now;
-	unsigned long flags;
 	struct list_head *np;
 	bool drop_boost_mutex = false;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	union rcu_special special;
 
-	/* NMI handlers cannot block and cannot safely manipulate state. */
-	if (in_nmi())
-		return;
-
-	local_irq_save(flags);
-
 	/*
 	 * If RCU core is waiting for this CPU to exit its critical section,
 	 * report the fact that it has exited.  Because irqs are disabled,
 	 * t->rcu_read_unlock_special cannot change.
 	 */
 	special = t->rcu_read_unlock_special;
+	rdp = this_cpu_ptr(rcu_state_p->rda);
+	if (!special.s && !rdp->deferred_qs) {
+		local_irq_restore(flags);
+		return;
+	}
 	if (special.b.need_qs) {
 		rcu_preempt_qs();
 		t->rcu_read_unlock_special.b.need_qs = false;
-		if (!t->rcu_read_unlock_special.s) {
+		if (!t->rcu_read_unlock_special.s && !rdp->deferred_qs) {
 			local_irq_restore(flags);
 			return;
 		}
 	}
 
 	/*
-	 * Respond to a request for an expedited grace period, but only if
-	 * we were not preempted, meaning that we were running on the same
-	 * CPU throughout.  If we were preempted, the exp_need_qs flag
-	 * would have been cleared at the time of the first preemption,
-	 * and the quiescent state would be reported when we were dequeued.
+	 * Respond to a request by an expedited grace period for a
+	 * quiescent state from this CPU.  Note that requests from
+	 * tasks are handled when removing the task from the
+	 * blocked-tasks list below.
 	 */
-	if (special.b.exp_need_qs) {
-		WARN_ON_ONCE(special.b.blocked);
+	if (special.b.exp_need_qs || rdp->deferred_qs) {
 		t->rcu_read_unlock_special.b.exp_need_qs = false;
-		rdp = this_cpu_ptr(rcu_state_p->rda);
+		rdp->deferred_qs = false;
 		rcu_report_exp_rdp(rcu_state_p, rdp, true);
 		if (!t->rcu_read_unlock_special.s) {
 			local_irq_restore(flags);
@@ -519,19 +519,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		}
 	}
 
-	/* Hardware IRQ handlers cannot block, complain if they get here. */
-	if (in_irq() || in_serving_softirq()) {
-		lockdep_rcu_suspicious(__FILE__, __LINE__,
-				       "rcu_read_unlock() from irq or softirq with blocking in critical section!!!\n");
-		pr_alert("->rcu_read_unlock_special: %#x (b: %d, enq: %d nq: %d)\n",
-			 t->rcu_read_unlock_special.s,
-			 t->rcu_read_unlock_special.b.blocked,
-			 t->rcu_read_unlock_special.b.exp_need_qs,
-			 t->rcu_read_unlock_special.b.need_qs);
-		local_irq_restore(flags);
-		return;
-	}
-
 	/* Clean up if blocked during RCU read-side critical section. */
 	if (special.b.blocked) {
 		t->rcu_read_unlock_special.b.blocked = false;
@@ -602,6 +589,66 @@ static void rcu_read_unlock_special(struct task_struct *t)
 	}
 }
 
+/*
+ * Is a deferred quiescent-state pending, and are we also not in
+ * an RCU read-side critical section?  It is the caller's responsibility
+ * to ensure it is otherwise safe to report any deferred quiescent
+ * states.  The reason for this is that it is safe to report a
+ * quiescent state during context switch even though preemption
+ * is disabled.  This function cannot be expected to understand these
+ * nuances, so the caller must handle them.
+ */
+static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+{
+	return (this_cpu_ptr(&rcu_preempt_data)->deferred_qs ||
+		READ_ONCE(t->rcu_read_unlock_special.s)) &&
+	       !t->rcu_read_lock_nesting;
+}
+
+/*
+ * Report a deferred quiescent state if needed and safe to do so.
+ * As with rcu_preempt_need_deferred_qs(), "safe" involves only
+ * not being in an RCU read-side critical section.  The caller must
+ * evaluate safety in terms of interrupt, softirq, and preemption
+ * disabling.
+ */
+static void rcu_preempt_deferred_qs(struct task_struct *t)
+{
+	unsigned long flags;
+
+	if (!rcu_preempt_need_deferred_qs(t))
+		return;
+	local_irq_save(flags);
+	rcu_preempt_deferred_qs_irqrestore(t, flags);
+}
+
+/*
+ * Handle special cases during rcu_read_unlock(), such as needing to
+ * notify RCU core processing or task having blocked during the RCU
+ * read-side critical section.
+ */
+static void rcu_read_unlock_special(struct task_struct *t)
+{
+	unsigned long flags;
+	bool preempt_bh_were_disabled = !!(preempt_count() & ~HARDIRQ_MASK);
+	bool irqs_were_disabled;
+
+	/* NMI handlers cannot block and cannot safely manipulate state. */
+	if (in_nmi())
+		return;
+
+	local_irq_save(flags);
+	irqs_were_disabled = irqs_disabled_flags(flags);
+	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
+	    t->rcu_read_unlock_special.b.blocked) {
+		/* Need to defer quiescent state until everything is enabled. */
+		raise_softirq_irqoff(RCU_SOFTIRQ);
+		local_irq_restore(flags);
+		return;
+	}
+	rcu_preempt_deferred_qs_irqrestore(t, flags);
+}
+
 /*
  * Dump detailed information for all tasks blocking the current RCU
  * grace period on the specified rcu_node structure.
@@ -737,10 +784,20 @@ static void rcu_preempt_check_callbacks(void)
 	struct rcu_state *rsp = &rcu_preempt_state;
 	struct task_struct *t = current;
 
-	if (t->rcu_read_lock_nesting == 0) {
-		rcu_preempt_qs();
+	if (t->rcu_read_lock_nesting > 0 ||
+	    (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
+		/* No QS, force context switch if deferred. */
+		if (rcu_preempt_need_deferred_qs(t))
+			resched_cpu(smp_processor_id());
+	} else if (rcu_preempt_need_deferred_qs(t)) {
+		rcu_preempt_deferred_qs(t); /* Report deferred QS. */
+		return;
+	} else if (!t->rcu_read_lock_nesting) {
+		rcu_preempt_qs(); /* Report immediate QS. */
 		return;
 	}
+
+	/* If GP is oldish, ask for help from rcu_read_unlock_special(). */
 	if (t->rcu_read_lock_nesting > 0 &&
 	    __this_cpu_read(rcu_data_p->core_needs_qs) &&
 	    __this_cpu_read(rcu_data_p->cpu_no_qs.b.norm) &&
@@ -859,6 +916,7 @@ void exit_rcu(void)
 	barrier();
 	t->rcu_read_unlock_special.b.blocked = true;
 	__rcu_read_unlock();
+	rcu_preempt_deferred_qs(current);
 }
 
 /*
@@ -940,6 +998,16 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
 	return false;
 }
 
+/*
+ * Because there is no preemptible RCU, there can be no deferred quiescent
+ * states.
+ */
+static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+{
+	return false;
+}
+static void rcu_preempt_deferred_qs(struct task_struct *t) { }
+
 /*
  * Because preemptible RCU does not exist, we never have to check for
  * tasks blocked within RCU read-side critical sections.
-- 
2.17.1