From: Valentin Schneider <valentin.schneider@arm.com>
To: paulmck@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    rcu@vger.kernel.org, linux-rt-users@vger.kernel.org,
    Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
    Thomas Gleixner, Steven Rostedt, Daniel Bristot de Oliveira,
    Sebastian Andrzej Siewior, Frederic Weisbecker, Josh Triplett,
    Mathieu Desnoyers, Davidlohr Bueso, Lai Jiangshan, Joel Fernandes,
    Anshuman Khandual, Vincenzo Frascino, Steven Price, Ard Biesheuvel,
    Boqun Feng, Mike Galbraith
Subject: Re: [PATCH v3 3/4] rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT
In-Reply-To: <20210813002045.GZ4126399@paulmck-ThinkPad-P17-Gen-1>
References: <20210811201354.1976839-1-valentin.schneider@arm.com>
        <20210811201354.1976839-4-valentin.schneider@arm.com>
        <20210813002045.GZ4126399@paulmck-ThinkPad-P17-Gen-1>
Date: Fri, 13 Aug 2021 19:48:41 +0100
Message-ID: <87v9496v06.mognet@arm.com>

On 12/08/21 17:20, Paul E. McKenney wrote:
> On Wed, Aug 11, 2021 at 09:13:53PM +0100, Valentin Schneider wrote:
>> rcu_core() preemptibility considerations
>> ========================================
>>
>> As pointed out by Paul [2], keeping rcu_check_quiescent_state() preemptible
>> (which is the case under CONFIG_PREEMPT_RT) requires some consideration.
>>
>> note_gp_changes() itself runs with irqs off, and enters
>> __note_gp_changes() with rnp->lock held (raw_spinlock), thus is safe vs
>> preemption.
>>
>> rdp->core_needs_qs *could* change after being read by the RCU core
>> kthread if it then gets preempted. Consider, with
>> CONFIG_RCU_STRICT_GRACE_PERIOD:
>>
>>   rcuc/x                                   task_foo
>>
>>   rcu_check_quiescent_state()
>>   `\
>>     rdp->core_needs_qs == true
>>
>>                                            rcu_read_unlock()
>>                                            `\
>>                                              rcu_preempt_deferred_qs_irqrestore()
>>                                              `\
>>                                                rcu_report_qs_rdp()
>>                                                `\
>>                                                  rdp->core_needs_qs := false;
>>
>> This would let rcuc/x's rcu_check_quiescent_state() proceed further down to
>> rcu_report_qs_rdp(), but task_foo's earlier rcu_report_qs_rdp() invocation
>> would have cleared the rdp grpmask from the rnp mask, so rcuc/x's
>> invocation would simply bail.
>>
>> Since rcu_report_qs_rdp() can be safely invoked, even if rdp->core_needs_qs
>> changed, it appears safe to keep rcu_check_quiescent_state() preemptible.
>
> Another concern...
>
> During the preemption of rcu_check_quiescent_state() someone might report
> a quiescent state on behalf of CPU x (perhaps due to its having recently
> been idle) and then the RCU grace-period kthread might start running on
> CPU x, where it might initialize a new grace period in rcu_gp_init().
> It can then invoke __note_gp_changes(), also on CPU x.

(this is me "writing out loud" to make sure I can follow)

I take it the preemption of rcuc/x itself would count as a quiescent
state, if the current grace period started before rcuc/x's
rcu_read_lock() (rcuc/x would end up in ->blkd_tasks but wouldn't affect
->gp_tasks).

Then if we get something to run rcu_report_qs_rnp() with a mask spanning
CPU x - I think the GP kthread doing quiescent state forcing might fit
the bill - we'll let the GP kthread initialize a new GP.

> If preempted as shown above just after checking ->core_needs_qs, the
> ->cpu_no_qs.b.norm field will be set by the grace-period kthread, which
> will cause the rcu_check_quiescent_state() function's subsequent check
> of ->cpu_no_qs.b.norm to take an early exit.  So OK here.

Right
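To make sure I'm staring at the right windows, here's a reduced
paraphrase of rcu_check_quiescent_state() (kernel/rcu/tree.c) with both
preemption points marked - illustrative only, not the verbatim source:

static void rcu_check_quiescent_state(struct rcu_data *rdp)
{
	/* Irqs off inside; __note_gp_changes() runs under rnp->lock. */
	note_gp_changes(rdp);

	if (!rdp->core_needs_qs)
		return;

	/*
	 * Preemption point A: if a new GP starts here, its
	 * __note_gp_changes() sets ->cpu_no_qs.b.norm, so the check
	 * just below takes the early exit.
	 */
	if (rdp->cpu_no_qs.b.norm)
		return;

	/*
	 * Preemption point B: rcu_report_qs_rdp() bails on a ->gp_seq
	 * mismatch - modulo the question further down.
	 */
	rcu_report_qs_rdp(rdp);
}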
> On the other hand, if preempted just after the rcu_check_quiescent_state()
> function's check of ->cpu_no_qs.b.norm, the later invocation of
> rcu_report_qs_rdp() should take an early exit due to ->gp_seq mismatch.
> So OK here.

If, as described in your scenario above, the GP kthread has preempted
rcuc/x, initialized a new GP and has run __note_gp_changes() (on CPU x),
then wouldn't the rdp->gp_seq and rnp->gp_seq match when rcuc/x gets to
run again?

And because we would've then had a context switch between the GP kthread
and rcuc/x, we would've noted a quiescent state for CPU x, which would
let rcuc/x's rcu_report_qs_rdp() continue - or have I erred somewhere
along the way?

> However, this should be added to the commit log.  Might be a big commit
> log, but mass storage is cheap these days.  ;-)

No objections here!

> This needs a review of each and every manipulation of ->core_needs_qs
> and ->cpu_no_qs.b.norm.  For example, the preemptions will cause the
> scheduler to invoke RCU's context-switch hooks, which also mess with
> ->cpu_no_qs.b.norm.  I can get to that some time next week (or tomorrow,
> if things go better than expected), but it would be good for you (and
> others) to check as well.

I have some notes scribbled down regarding those that I need to go
through again, but that won't be before a week's time - I'll be away
next week.

> Frederic should look this over, but I am taking a quick pass in the
> meantime.  Please see below.

Thanks!

>> @@ -210,6 +210,8 @@ struct rcu_data {
>>  	struct timer_list nocb_timer;	/* Enforce finite deferral. */
>>  	unsigned long nocb_gp_adv_time;	/* Last call_rcu() CB adv (jiffies). */
>>
>> +	local_lock_t nocb_local_lock;
>
> Should this go near the beginning of the structure, given that code
> paths taking this lock tend to access ->cpu_no_qs, ->core_needs_qs,
> and so on?
>
> Given that it is used to protect core processing (not just offloaded
> callbacks), might ->core_local_lock be a better name?

It would still have to be in an #ifdef CONFIG_RCU_NOCB_CPU region IMO,
as it only exists to protect access of the offloading state - hence the
name, but I'm not particularly attached to it.

> Please keep in mind that you can build kernels that offload callbacks
> but that still use softirq for RCU core processing.  And vice versa,
> that is, kernels that use rcuc kthreads but do not offload callbacks.

AFAICT the problem only stands if core processing becomes preemptible,
which is only the case under PREEMPT_RT (at least for now), and that
implies core processing done purely via kthreads.

>> +
>>  	/* The following fields are used by call_rcu, hence own cacheline. */
>>  	raw_spinlock_t nocb_bypass_lock ____cacheline_internodealigned_in_smp;
>>  	struct rcu_cblist nocb_bypass;	/* Lock-contention-bypass CB list. */
>> @@ -445,6 +447,8 @@ static void rcu_nocb_unlock(struct rcu_data *rdp);
>>  static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
>>  				       unsigned long flags);
>>  static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
>> +static void rcu_nocb_local_lock(struct rcu_data *rdp);
>> +static void rcu_nocb_local_unlock(struct rcu_data *rdp);
>>  #ifdef CONFIG_RCU_NOCB_CPU
>>  static void __init rcu_organize_nocb_kthreads(void);
>>  #define rcu_nocb_lock_irqsave(rdp, flags) \
>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>> index 0ff5e4fb933e..11c4ff00afde 100644
>> --- a/kernel/rcu/tree_plugin.h
>> +++ b/kernel/rcu/tree_plugin.h
>> @@ -21,6 +21,17 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
>>  	return lockdep_is_held(&rdp->nocb_lock);
>>  }
>>
>> +static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
>> +{
>> +	return lockdep_is_held(
>> +#ifdef CONFIG_PREEMPT_RT
>> +		&rdp->nocb_local_lock.lock
>> +#else
>> +		&rdp->nocb_local_lock
>> +#endif
>
> It would be good if this was abstracted.  Or is this the only place
> in the kernel that needs this #ifdef?  Maybe a lockdep_is_held_rt()
> or lockdep_is_held_local(), as the case may be?
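Something like the below could work - entirely untested, and
local_lock_is_held() is a made-up name; the point is just to keep the
PREEMPT_RT layout difference out of RCU (under PREEMPT_RT the
local_lock_t wraps a spinlock, so the dep_map lives one member deeper):

/* Hypothetical helper, e.g. in include/linux/local_lock.h: */
#ifdef CONFIG_PREEMPT_RT
# define local_lock_is_held(llock)	lockdep_is_held(&(llock)->lock)
#else
# define local_lock_is_held(llock)	lockdep_is_held(llock)
#endif

static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
{
	return local_lock_is_held(&rdp->nocb_local_lock);
}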
It does look like the first/only place that tries to access a
local_lock's dep_map regardless of CONFIG_PREEMPT_RT. Looking at it some
more, I'm not sure if it's really useful: !PREEMPT_RT local_locks
disable preemption, so the preemption check in rcu_rdp_is_offloaded()
will short-circuit the lockdep check when !PREEMPT_RT...

One thing I should perhaps point out - the local lock is only really
needed for PREEMPT_RT; for !PREEMPT_RT it just disables preemption, and
it's only used in places that *already* disabled preemption (when
!PREEMPT_RT). I initially gated the lock under CONFIG_PREEMPT_RT, but
changed that as I found it reduced the ifdeffery.

>> +	);
>> +}
>> +
>>  static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>>  {
>>  	/* Race on early boot between thread creation and assignment */
>> @@ -38,7 +49,10 @@ static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp)
>>  {
>>  	return 0;
>>  }
>> -
>> +static inline int rcu_lockdep_is_held_nocb_local(struct rcu_data *rdp)
>> +{
>> +	return 0;
>
> This is backwards of normal lockdep practice, which defaults to locks
> always held.  And that will be what happens once lockdep has detected
> its first deadlock, correct?  At which point, this function and its
> earlier instance will be in conflict.
>
> Or is there some subtle reason why this conflict would be OK?

This follows the !CONFIG_RCU_NOCB definition of rcu_lockdep_is_held_nocb(),
which looked fine to me. Actually, with the way I wrote
rcu_local_offload_access_safe(), we don't even access that function when
!CONFIG_RCU_NOCB...

>> +}
>>  static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>>  {
>>  	return false;
>> @@ -46,23 +60,44 @@ static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp)
>>
>>  #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
>>
>> +/*
>> + * Is a local read of the rdp's offloaded state safe and stable?
>> + * See rcu_nocb_local_lock() & family.
>> + */
>> +static inline bool rcu_local_offload_access_safe(struct rcu_data *rdp)
>> +{
>> +	if (!preemptible())
>> +		return true;
>> +
>> +	if (!migratable()) {
>> +		if (!IS_ENABLED(CONFIG_RCU_NOCB))
>
> Do we also need to consult the use_softirq module parameter that controls
> whether or not there are rcuc kthreads?
>
> Actually, if !IS_ENABLED(CONFIG_RCU_NOCB) then rcu_rdp_is_offloaded()
> can simply return false without bothering with the RCU_LOCKDEP_WARN().
> Might be worth getting that out of the way of the RCU_LOCKDEP_WARN()
> condition.  ;-)

I _assumed_ the check was there even for !CONFIG_RCU_NOCB to provide
wider test coverage of the calling conditions. If that RCU_LOCKDEP_WARN()
becomes conditional on CONFIG_RCU_NOCB then yes, we could clean that up.

>> +			return true;
>> +
>> +		return rcu_lockdep_is_held_nocb_local(rdp);
>> +	}
>> +
>> +	return false;
>> +}
>> +
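In case it helps review, this is roughly how the helper is meant to
slot into rcu_rdp_is_offloaded()'s RCU_LOCKDEP_WARN() - a reconstructed
sketch of the read-side check, not a verbatim hunk from the series:

static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
{
	/*
	 * A stable read of the offloaded state requires the barrier
	 * mutex, the hotplug lock, the nocb lock, or a local access
	 * that rcu_local_offload_access_safe() deems safe.
	 */
	RCU_LOCKDEP_WARN(
		!(lockdep_is_held(&rcu_state.barrier_mutex) ||
		  (IS_ENABLED(CONFIG_HOTPLUG_CPU) && lockdep_is_cpus_held()) ||
		  rcu_lockdep_is_held_nocb(rdp) ||
		  (rdp == this_cpu_ptr(&rcu_data) &&
		   rcu_local_offload_access_safe(rdp)) ||
		  rcu_current_is_nocb_kthread(rdp)),
		"Unsafe read of RCU_NOCB offloaded state");

	return rcu_segcblist_is_offloaded(&rdp->cblist);
}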