The only code change in this v2 is the use of tif_need_resched() instead
of need_resched() on 3/4 (build issue reported on linux-next).
The rest is added Tested-by and Reviewed-by tags.
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
sched/idle-v2
HEAD: c246718af0112c8624ec9c46a85bf0ef1562e050
Thanks,
Frederic
---
Frederic Weisbecker (4):
sched/idle: Fix missing need_resched() check after rcu_idle_enter()
cpuidle: Fix missing need_resched() check after rcu_idle_enter()
ARM: imx6q: Fix missing need_resched() check after rcu_idle_enter()
ACPI: processor: Fix missing need_resched() check after rcu_idle_enter()
arch/arm/mach-imx/cpuidle-imx6q.c | 8 +++++++-
drivers/acpi/processor_idle.c | 10 ++++++++--
drivers/cpuidle/cpuidle.c | 33 +++++++++++++++++++++++++--------
kernel/sched/idle.c | 18 ++++++++++++------
4 files changed, 52 insertions(+), 17 deletions(-)
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.
Usually a wake up happening while running the idle task is spotted in
one of the need_resched() checks carefully placed within the idle loop
that can break to the scheduler.
Unfortunately in default_idle_call(), the call to rcu_idle_enter() is
already beyond the last need_resched() check and we may halt the CPU
with a resched request unhandled, leaving the task hanging.
Fix this with performing a last minute need_resched() check after
calling rcu_idle_enter().
Reported-and-tested-by: Paul E. McKenney <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]>
Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf)
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar<[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/sched/idle.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 305727ea0677..1af60dc50beb 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -109,15 +109,21 @@ void __cpuidle default_idle_call(void)
rcu_idle_enter();
lockdep_hardirqs_on(_THIS_IP_);
- arch_cpu_idle();
+ /*
+ * Last need_resched() check must come after rcu_idle_enter()
+ * which may wake up RCU internal tasks.
+ */
+ if (!need_resched()) {
+ arch_cpu_idle();
+ raw_local_irq_disable();
+ }
/*
- * OK, so IRQs are enabled here, but RCU needs them disabled to
- * turn itself back on.. funny thing is that disabling IRQs
- * will cause tracing, which needs RCU. Jump through hoops to
- * make it 'work'.
+ * OK, so IRQs are enabled after arch_cpu_idle(), but RCU needs
+ * them disabled to turn itself back on.. funny thing is that
+ * disabling IRQs will cause tracing, which needs RCU. Jump through
+ * hoops to make it 'work'.
*/
- raw_local_irq_disable();
lockdep_hardirqs_off(_THIS_IP_);
rcu_idle_exit();
lockdep_hardirqs_on(_THIS_IP_);
--
2.25.1
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.
Usually a wake up happening while running the idle task is spotted in
one of the need_resched() checks carefully placed within the idle loop
that can break to the scheduler.
Unfortunately within cpuidle the call to rcu_idle_enter() is already
beyond the last generic need_resched() check. Some drivers may perform
their own checks like with mwait_idle_with_hints() but many others don't
and we may halt the CPU with a resched request unhandled, leaving the
task hanging.
Fix this with performing a last minute need_resched() check after
calling rcu_idle_enter().
Reported-by: Paul E. McKenney <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]>
Fixes: 1098582a0f6c (sched,idle,rcu: Push rcu_idle deeper into the idle path)
Cc: [email protected]
Cc: Daniel Lezcano <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
drivers/cpuidle/cpuidle.c | 33 +++++++++++++++++++++++++--------
1 file changed, 25 insertions(+), 8 deletions(-)
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ef2ea1b12cd8..4cc1ba49ce05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -134,8 +134,8 @@ int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
}
#ifdef CONFIG_SUSPEND
-static void enter_s2idle_proper(struct cpuidle_driver *drv,
- struct cpuidle_device *dev, int index)
+static int enter_s2idle_proper(struct cpuidle_driver *drv,
+ struct cpuidle_device *dev, int index)
{
ktime_t time_start, time_end;
struct cpuidle_state *target_state = &drv->states[index];
@@ -151,7 +151,14 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
stop_critical_timings();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_enter();
- target_state->enter_s2idle(dev, drv, index);
+ /*
+ * Last need_resched() check must come after rcu_idle_enter()
+ * which may wake up RCU internal tasks.
+ */
+ if (!need_resched())
+ target_state->enter_s2idle(dev, drv, index);
+ else
+ index = -EBUSY;
if (WARN_ON_ONCE(!irqs_disabled()))
local_irq_disable();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
@@ -159,10 +166,13 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
tick_unfreeze();
start_critical_timings();
- time_end = ns_to_ktime(local_clock());
+ if (index > 0) {
+ time_end = ns_to_ktime(local_clock());
+ dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
+ dev->states_usage[index].s2idle_usage++;
+ }
- dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
- dev->states_usage[index].s2idle_usage++;
+ return index;
}
/**
@@ -184,7 +194,7 @@ int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
*/
index = find_deepest_state(drv, dev, U64_MAX, 0, true);
if (index > 0) {
- enter_s2idle_proper(drv, dev, index);
+ index = enter_s2idle_proper(drv, dev, index);
local_irq_enable();
}
return index;
@@ -234,7 +244,14 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
stop_critical_timings();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_enter();
- entered_state = target_state->enter(dev, drv, index);
+ /*
+ * Last need_resched() check must come after rcu_idle_enter()
+ * which may wake up RCU internal tasks.
+ */
+ if (!need_resched())
+ entered_state = target_state->enter(dev, drv, index);
+ else
+ entered_state = -EBUSY;
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_exit();
start_critical_timings();
--
2.25.1
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.
Usually a wake up happening while running the idle task is spotted in
one of the need_resched() checks carefully placed within the idle loop
that can break to the scheduler.
Unfortunately within acpi_idle_enter_bm() the call to rcu_idle_enter()
is already beyond the last generic need_resched() check. The cpu idle
implementation happens to be ok because it ends up calling
mwait_idle_with_hints() or acpi_safe_halt() which both perform their own
need_resched() checks. But the suspend to idle implementation doesn't so
it may suspend the CPU with a resched request unhandled, leaving the
task hanging.
Fix this with performing a last minute need_resched() check after
calling rcu_idle_enter().
Reported-by: Paul E. McKenney <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]>
Fixes: 1fecfdbb7acc (ACPI: processor: Take over RCU-idle for C3-BM idle)
Cc: [email protected]
Cc: Len Brown <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar<[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
drivers/acpi/processor_idle.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index d93e400940a3..c4939c49d972 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -604,8 +604,14 @@ static int acpi_idle_enter_bm(struct cpuidle_driver *drv,
}
rcu_idle_enter();
-
- acpi_idle_do_entry(cx);
+ /*
+ * Last need_resched() check must come after rcu_idle_enter()
+ * which may wake up RCU internal tasks. mwait_idle_with_hints()
+ * and acpi_safe_halt() have their own checks but s2idle
+ * implementation doesn't.
+ */
+ if (!need_resched())
+ acpi_idle_do_entry(cx);
rcu_idle_exit();
--
2.25.1
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.
Usually a wake up happening while running the idle task is spotted in
one of the need_resched() checks carefully placed within the idle loop
that can break to the scheduler.
Unfortunately imx6q_enter_wait() is beyond the last generic
need_resched() check and it performs a wfi right away after the call to
rcu_idle_enter(). We may halt the CPU with a resched request unhandled,
leaving the task hanging.
Fix this with performing a last minute need_resched() check after
calling rcu_idle_enter().
Reported-by: Paul E. McKenney <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]>
Fixes: 1a67b9263e06 (ARM: imx6q: Fixup RCU usage for cpuidle)
Cc: [email protected]
Cc: Shawn Guo <[email protected]>
Cc: Sascha Hauer <[email protected]>
Cc: Pengutronix Kernel Team <[email protected]>
Cc: Fabio Estevam <[email protected]>
Cc: NXP Linux Team <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/arm/mach-imx/cpuidle-imx6q.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-imx/cpuidle-imx6q.c b/arch/arm/mach-imx/cpuidle-imx6q.c
index 094337dc1bc7..1115f4dc6d1d 100644
--- a/arch/arm/mach-imx/cpuidle-imx6q.c
+++ b/arch/arm/mach-imx/cpuidle-imx6q.c
@@ -5,6 +5,7 @@
#include <linux/cpuidle.h>
#include <linux/module.h>
+#include <linux/thread_info.h>
#include <asm/cpuidle.h>
#include <soc/imx/cpuidle.h>
@@ -25,7 +26,12 @@ static int imx6q_enter_wait(struct cpuidle_device *dev,
raw_spin_unlock(&cpuidle_lock);
rcu_idle_enter();
- cpu_do_idle();
+ /*
+ * Last need_resched() check must come after rcu_idle_enter()
+ * which may wake up RCU internal tasks.
+ */
+ if (!tif_need_resched())
+ cpu_do_idle();
rcu_idle_exit();
raw_spin_lock(&cpuidle_lock);
--
2.25.1
Hi Frederic,
On Mon, Jan 4, 2021 at 12:21 PM Frederic Weisbecker <[email protected]> wrote:
>
> Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
> kthread (rcuog) to be serviced.
>
> Usually a wake up happening while running the idle task is spotted in
> one of the need_resched() checks carefully placed within the idle loop
> that can break to the scheduler.
>
> Unfortunately imx6q_enter_wait() is beyond the last generic
> need_resched() check and it performs a wfi right away after the call to
> rcu_idle_enter(). We may halt the CPU with a resched request unhandled,
> leaving the task hanging.
>
> Fix this with performing a last minute need_resched() check after
> calling rcu_idle_enter().
Shouldn't tif_need_resched() be used instead of need_resched() in the
commit log?
On Mon, Jan 04, 2021 at 04:20:55PM +0100, Frederic Weisbecker wrote:
> Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
> kthread (rcuog) to be serviced.
>
> Usually a wake up happening while running the idle task is spotted in
> one of the need_resched() checks carefully placed within the idle loop
> that can break to the scheduler.
Urgh, this is horrific and fragile :/ You having had to audit and fix a
number of rcu_idle_enter() callers should've made you realize that
making rcu_idle_enter() return something would've been saner.
Also, I might hope that when RCU does do that wakeup, it will not have
put RCU in idle mode? So it is a natural 'fail' state for
rcu_idle_enter(), *sigh* it continues to put RCU to sleep, so that needs
fixing too.
I'm thinking that rcu_user_enter() will have the exact same problem? Did
you audit that?
Something like the below, combined with a fixup for all callers (which
the compiler will help us find thanks to __must_check).
---
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de0826411311..612f66c16078 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -95,10 +95,10 @@ static inline void rcu_sysrq_end(void) { }
#endif /* #else #ifdef CONFIG_RCU_STALL_COMMON */
#ifdef CONFIG_NO_HZ_FULL
-void rcu_user_enter(void);
+bool __must_check rcu_user_enter(void);
void rcu_user_exit(void);
#else
-static inline void rcu_user_enter(void) { }
+static inline bool __must_check rcu_user_enter(void) { return true; }
static inline void rcu_user_exit(void) { }
#endif /* CONFIG_NO_HZ_FULL */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index df578b73960f..9ba0c5d9e99e 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -43,7 +43,7 @@ bool rcu_gp_might_be_stalled(void);
unsigned long get_state_synchronize_rcu(void);
void cond_synchronize_rcu(unsigned long oldstate);
-void rcu_idle_enter(void);
+bool __must_check rcu_idle_enter(void);
void rcu_idle_exit(void);
void rcu_irq_enter(void);
void rcu_irq_exit(void);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 40e5e3dd253e..13e19e5db0b8 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -625,7 +625,7 @@ EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);
* the possibility of usermode upcalls having messed up our count
* of interrupt nesting level during the prior busy period.
*/
-static noinstr void rcu_eqs_enter(bool user)
+static noinstr bool rcu_eqs_enter(bool user)
{
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
@@ -636,7 +636,7 @@ static noinstr void rcu_eqs_enter(bool user)
if (rdp->dynticks_nesting != 1) {
// RCU will still be watching, so just do accounting and leave.
rdp->dynticks_nesting--;
- return;
+ return true;
}
lockdep_assert_irqs_disabled();
@@ -644,7 +644,14 @@ static noinstr void rcu_eqs_enter(bool user)
trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, atomic_read(&rdp->dynticks));
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
rdp = this_cpu_ptr(&rcu_data);
- do_nocb_deferred_wakeup(rdp);
+ if (do_nocb_deferred_wakeup(rdp)) {
+ /*
+ * We did the wakeup, don't enter EQS, we'll need to abort idle
+ * and schedule.
+ */
+ return false;
+ }
+
rcu_prepare_for_idle();
rcu_preempt_deferred_qs(current);
@@ -657,6 +664,8 @@ static noinstr void rcu_eqs_enter(bool user)
rcu_dynticks_eqs_enter();
// ... but is no longer watching here.
rcu_dynticks_task_enter();
+
+ return true;
}
/**
@@ -670,10 +679,10 @@ static noinstr void rcu_eqs_enter(bool user)
* If you add or remove a call to rcu_idle_enter(), be sure to test with
* CONFIG_RCU_EQS_DEBUG=y.
*/
-void rcu_idle_enter(void)
+bool rcu_idle_enter(void)
{
lockdep_assert_irqs_disabled();
- rcu_eqs_enter(false);
+ return rcu_eqs_enter(false);
}
EXPORT_SYMBOL_GPL(rcu_idle_enter);
@@ -689,10 +698,10 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
* If you add or remove a call to rcu_user_enter(), be sure to test with
* CONFIG_RCU_EQS_DEBUG=y.
*/
-noinstr void rcu_user_enter(void)
+noinstr bool rcu_user_enter(void)
{
lockdep_assert_irqs_disabled();
- rcu_eqs_enter(true);
+ return rcu_eqs_enter(true);
}
#endif /* CONFIG_NO_HZ_FULL */
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 7708ed161f4a..9226f4021a36 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -433,7 +433,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
unsigned long flags);
static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp);
-static void do_nocb_deferred_wakeup(struct rcu_data *rdp);
+static bool do_nocb_deferred_wakeup(struct rcu_data *rdp);
static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
static void rcu_spawn_cpu_nocb_kthread(int cpu);
static void __init rcu_spawn_nocb_kthreads(void);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7e291ce0a1d6..8ca41b3fe4f9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1631,7 +1631,7 @@ bool rcu_is_nocb_cpu(int cpu)
* Kick the GP kthread for this NOCB group. Caller holds ->nocb_lock
* and this function releases it.
*/
-static void wake_nocb_gp(struct rcu_data *rdp, bool force,
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
unsigned long flags)
__releases(rdp->nocb_lock)
{
@@ -1654,8 +1654,11 @@ static void wake_nocb_gp(struct rcu_data *rdp, bool force,
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DoWake"));
}
raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
- if (needwake)
+ if (needwake) {
wake_up_process(rdp_gp->nocb_gp_kthread);
+ return true;
+ }
+ return false;
}
/*
@@ -2155,17 +2158,19 @@ static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
static void do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
{
unsigned long flags;
+ bool ret;
int ndw;
rcu_nocb_lock_irqsave(rdp, flags);
if (!rcu_nocb_need_deferred_wakeup(rdp)) {
rcu_nocb_unlock_irqrestore(rdp, flags);
- return;
+ return false;
}
ndw = READ_ONCE(rdp->nocb_defer_wakeup);
WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
- wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
+ ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
+ return ret;
}
/* Do a deferred wakeup of rcu_nocb_kthread() from a timer handler. */
@@ -2181,10 +2186,12 @@ static void do_nocb_deferred_wakeup_timer(struct timer_list *t)
* This means we do an inexact common-case check. Note that if
* we miss, ->nocb_timer will eventually clean things up.
*/
-static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
+static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
{
if (rcu_nocb_need_deferred_wakeup(rdp))
- do_nocb_deferred_wakeup_common(rdp);
+ return do_nocb_deferred_wakeup_common(rdp);
+
+ return false;
}
void __init rcu_init_nohz(void)
@@ -2518,8 +2525,9 @@ static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
return false;
}
-static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
+static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
{
+ return false
}
static void rcu_spawn_cpu_nocb_kthread(int cpu)
On Tue, Jan 05, 2021 at 10:55:03AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 04, 2021 at 04:20:55PM +0100, Frederic Weisbecker wrote:
> > Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
> > kthread (rcuog) to be serviced.
> >
> > Usually a wake up happening while running the idle task is spotted in
> > one of the need_resched() checks carefully placed within the idle loop
> > that can break to the scheduler.
>
> Urgh, this is horrific and fragile :/ You having had to audit and fix a
> number of rcu_idle_enter() callers should've made you realize that
> making rcu_idle_enter() return something would've been saner.
>
> Also, I might hope that when RCU does do that wakeup, it will not have
> put RCU in idle mode? So it is a natural 'fail' state for
> rcu_idle_enter(), *sigh* it continues to put RCU to sleep, so that needs
> fixing too.
Heh, yes you're right, that looks saner.
>
> I'm thinking that rcu_user_enter() will have the exact same problem? Did
> you audit that?
Yes and I wanted to fix it seperately since it's a bit harder to fix because
we are past the last need_resched() check, all syscall exit works, lockdep
hardirqs on entry prep, tracing hardirqs on, etc... I need to manage to
rollback safely and cleanly.
Unless I can decouple the wakeup from rcu_user_enter() and put it around the
exit_to_user_mode_loop(). But then I must make sure that call_rcu() isn't called
afterward.
>
> Something like the below, combined with a fixup for all callers (which
> the compiler will help us find thanks to __must_check).
Right, I just need to make sure that the wake up is local as the kthread
awaken can be queued anywhere. But a simple need_resched() check after the
wake up should be fine to get that.
Thanks.
On Tue, Jan 05, 2021 at 10:55:03AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 04, 2021 at 04:20:55PM +0100, Frederic Weisbecker wrote:
> > Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
> > kthread (rcuog) to be serviced.
> >
> > Usually a wake up happening while running the idle task is spotted in
> > one of the need_resched() checks carefully placed within the idle loop
> > that can break to the scheduler.
>
> Urgh, this is horrific and fragile :/ You having had to audit and fix a
> number of rcu_idle_enter() callers should've made you realize that
> making rcu_idle_enter() return something would've been saner.
>
> Also, I might hope that when RCU does do that wakeup, it will not have
> put RCU in idle mode? So it is a natural 'fail' state for
> rcu_idle_enter(), *sigh* it continues to put RCU to sleep, so that needs
> fixing too.
It depends on what is being awakened. For example, the nocb rcuog
and rcuoc kthreads might be well on some other CPU, so RCU might need
the wakeup to happen, but might also need to go completely to sleep on
this CPU.
But yes, if the wakeup needs to be on the current CPU, then idle must
be exited and RCU needs to again be watching. However, RCU has no idea
what CPU the to-be-awakened kthread will be running on. And even if
it were to know at the time it does the wakeup, that kthread's location
might well have changed by the time the current CPU enters idle.
> I'm thinking that rcu_user_enter() will have the exact same problem? Did
> you audit that?
>
> Something like the below, combined with a fixup for all callers (which
> the compiler will help us find thanks to __must_check).
Looks at least somewhat plausible at first glance.
Though given the above, it is possible (likely, even) that
rcu_user_enter() returns true, but that this CPU still needs to enter
idle. So isn't a subsequent check of need_resched() or friends still
required? Or is your point that this will happen automatically upon
exit from the idle loop?
Thanx, Paul
> ---
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index de0826411311..612f66c16078 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -95,10 +95,10 @@ static inline void rcu_sysrq_end(void) { }
> #endif /* #else #ifdef CONFIG_RCU_STALL_COMMON */
>
> #ifdef CONFIG_NO_HZ_FULL
> -void rcu_user_enter(void);
> +bool __must_check rcu_user_enter(void);
> void rcu_user_exit(void);
> #else
> -static inline void rcu_user_enter(void) { }
> +static inline bool __must_check rcu_user_enter(void) { return true; }
> static inline void rcu_user_exit(void) { }
> #endif /* CONFIG_NO_HZ_FULL */
>
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index df578b73960f..9ba0c5d9e99e 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -43,7 +43,7 @@ bool rcu_gp_might_be_stalled(void);
> unsigned long get_state_synchronize_rcu(void);
> void cond_synchronize_rcu(unsigned long oldstate);
>
> -void rcu_idle_enter(void);
> +bool __must_check rcu_idle_enter(void);
> void rcu_idle_exit(void);
> void rcu_irq_enter(void);
> void rcu_irq_exit(void);
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 40e5e3dd253e..13e19e5db0b8 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -625,7 +625,7 @@ EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);
> * the possibility of usermode upcalls having messed up our count
> * of interrupt nesting level during the prior busy period.
> */
> -static noinstr void rcu_eqs_enter(bool user)
> +static noinstr bool rcu_eqs_enter(bool user)
> {
> struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>
> @@ -636,7 +636,7 @@ static noinstr void rcu_eqs_enter(bool user)
> if (rdp->dynticks_nesting != 1) {
> // RCU will still be watching, so just do accounting and leave.
> rdp->dynticks_nesting--;
> - return;
> + return true;
> }
>
> lockdep_assert_irqs_disabled();
> @@ -644,7 +644,14 @@ static noinstr void rcu_eqs_enter(bool user)
> trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, atomic_read(&rdp->dynticks));
> WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
> rdp = this_cpu_ptr(&rcu_data);
> - do_nocb_deferred_wakeup(rdp);
> + if (do_nocb_deferred_wakeup(rdp)) {
> + /*
> + * We did the wakeup, don't enter EQS, we'll need to abort idle
> + * and schedule.
> + */
> + return false;
> + }
> +
> rcu_prepare_for_idle();
> rcu_preempt_deferred_qs(current);
>
> @@ -657,6 +664,8 @@ static noinstr void rcu_eqs_enter(bool user)
> rcu_dynticks_eqs_enter();
> // ... but is no longer watching here.
> rcu_dynticks_task_enter();
> +
> + return true;
> }
>
> /**
> @@ -670,10 +679,10 @@ static noinstr void rcu_eqs_enter(bool user)
> * If you add or remove a call to rcu_idle_enter(), be sure to test with
> * CONFIG_RCU_EQS_DEBUG=y.
> */
> -void rcu_idle_enter(void)
> +bool rcu_idle_enter(void)
> {
> lockdep_assert_irqs_disabled();
> - rcu_eqs_enter(false);
> + return rcu_eqs_enter(false);
> }
> EXPORT_SYMBOL_GPL(rcu_idle_enter);
>
> @@ -689,10 +698,10 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
> * If you add or remove a call to rcu_user_enter(), be sure to test with
> * CONFIG_RCU_EQS_DEBUG=y.
> */
> -noinstr void rcu_user_enter(void)
> +noinstr bool rcu_user_enter(void)
> {
> lockdep_assert_irqs_disabled();
> - rcu_eqs_enter(true);
> + return rcu_eqs_enter(true);
> }
> #endif /* CONFIG_NO_HZ_FULL */
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 7708ed161f4a..9226f4021a36 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -433,7 +433,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
> unsigned long flags);
> static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp);
> -static void do_nocb_deferred_wakeup(struct rcu_data *rdp);
> +static bool do_nocb_deferred_wakeup(struct rcu_data *rdp);
> static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
> static void rcu_spawn_cpu_nocb_kthread(int cpu);
> static void __init rcu_spawn_nocb_kthreads(void);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 7e291ce0a1d6..8ca41b3fe4f9 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1631,7 +1631,7 @@ bool rcu_is_nocb_cpu(int cpu)
> * Kick the GP kthread for this NOCB group. Caller holds ->nocb_lock
> * and this function releases it.
> */
> -static void wake_nocb_gp(struct rcu_data *rdp, bool force,
> +static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> unsigned long flags)
> __releases(rdp->nocb_lock)
> {
> @@ -1654,8 +1654,11 @@ static void wake_nocb_gp(struct rcu_data *rdp, bool force,
> trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DoWake"));
> }
> raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
> - if (needwake)
> + if (needwake) {
> wake_up_process(rdp_gp->nocb_gp_kthread);
> + return true;
> + }
> + return false;
> }
>
> /*
> @@ -2155,17 +2158,19 @@ static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
> static void do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
> {
> unsigned long flags;
> + bool ret;
> int ndw;
>
> rcu_nocb_lock_irqsave(rdp, flags);
> if (!rcu_nocb_need_deferred_wakeup(rdp)) {
> rcu_nocb_unlock_irqrestore(rdp, flags);
> - return;
> + return false;
> }
> ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> - wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
> + ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
> trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
> + return ret;
> }
>
> /* Do a deferred wakeup of rcu_nocb_kthread() from a timer handler. */
> @@ -2181,10 +2186,12 @@ static void do_nocb_deferred_wakeup_timer(struct timer_list *t)
> * This means we do an inexact common-case check. Note that if
> * we miss, ->nocb_timer will eventually clean things up.
> */
> -static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
> +static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
> {
> if (rcu_nocb_need_deferred_wakeup(rdp))
> - do_nocb_deferred_wakeup_common(rdp);
> + return do_nocb_deferred_wakeup_common(rdp);
> +
> + return false;
> }
>
> void __init rcu_init_nohz(void)
> @@ -2518,8 +2525,9 @@ static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
> return false;
> }
>
> -static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
> +static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
> {
> + return false
> }
>
> static void rcu_spawn_cpu_nocb_kthread(int cpu)
On Tue, Jan 05, 2021 at 03:25:10PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 05, 2021 at 10:55:03AM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 04, 2021 at 04:20:55PM +0100, Frederic Weisbecker wrote:
> > > Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
> > > kthread (rcuog) to be serviced.
> > >
> > > Usually a wake up happening while running the idle task is spotted in
> > > one of the need_resched() checks carefully placed within the idle loop
> > > that can break to the scheduler.
> >
> > Urgh, this is horrific and fragile :/ You having had to audit and fix a
> > number of rcu_idle_enter() callers should've made you realize that
> > making rcu_idle_enter() return something would've been saner.
> >
> > Also, I might hope that when RCU does do that wakeup, it will not have
> > put RCU in idle mode? So it is a natural 'fail' state for
> > rcu_idle_enter(), *sigh* it continues to put RCU to sleep, so that needs
> > fixing too.
>
> It depends on what is being awakened. For example, the nocb rcuog
> and rcuoc kthreads might be well on some other CPU, so RCU might need
> the wakeup to happen, but might also need to go completely to sleep on
> this CPU.
>
> But yes, if the wakeup needs to be on the current CPU, then idle must
> be exited and RCU needs to again be watching. However, RCU has no idea
> what CPU the to-be-awakened kthread will be running on. And even if
> it were to know at the time it does the wakeup, that kthread's location
> might well have changed by the time the current CPU enters idle.
A simple check for need_resched() would do the trick. Sure that could also
catch other sources of wake up that would have been otherwise handled once IRQs get
re-enabled but that's not a problem.
>
> > I'm thinking that rcu_user_enter() will have the exact same problem? Did
> > you audit that?
> >
> > Something like the below, combined with a fixup for all callers (which
> > the compiler will help us find thanks to __must_check).
>
> Looks at least somewhat plausible at first glance.
>
> Though given the above, it is possible (likely, even) that
> rcu_user_enter() returns true, but that this CPU still needs to enter
> idle. So isn't a subsequent check of need_resched() or friends still
> required? Or is your point that this will happen automatically upon
> exit from the idle loop?
Exactly, upon "wake_up_process(rdp_gp->nocb_gp_kthread)", we just need to
make sure that need_resched() is set before returning false, namely:
> > @@ -644,7 +644,14 @@ static noinstr void rcu_eqs_enter(bool user)
> > trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, atomic_read(&rdp->dynticks));
> > WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
> > rdp = this_cpu_ptr(&rcu_data);
> > - do_nocb_deferred_wakeup(rdp);
> > + if (do_nocb_deferred_wakeup(rdp)) {
> > + /*
> > + * We did the wakeup, don't enter EQS, we'll need to abort idle
> > + * and schedule.
> > + */
> > + return false;
Right here.
But still I think we should decouple the wake up from rcu_eqs_enter().
And have:
rcu_eqs_enter_prepare(): does the deferred wakeup and forbid from calling
call_rcu() from here.
rcu_eqs_enter(): enter RCU extended quiescent state
This way we can more easily fix the rcu_user_enter() case as it happens past
the last scheduler entrypoint before returning to userspace.
Thanks.
On Tue, Jan 05, 2021 at 01:57:22PM +0100, Frederic Weisbecker wrote:
> > Something like the below, combined with a fixup for all callers (which
> > the compiler will help us find thanks to __must_check).
>
> Right, I just need to make sure that the wake up is local as the kthread
> awaken can be queued anywhere. But a simple need_resched() check after the
> wake up should be fine to get that.
Duh, yes. Clearly I'm having startup problems after the holidays ;-)