This series improves/adds to RCU's warnings about CPU hotplug and adds
documentation and testing.
v3->v4: Minor cleanups.
Joel Fernandes (Google) (5):
rcu/tree: Add a warning if CPU being onlined did not report QS already
rcu/tree: Clarify comments about FQS loop reporting quiescent states
rcu/tree: Make FQS complaining about offline CPU more aggressive
rcutorture: Force synchronizing of RCU flavor from hotplug notifier
docs: Update RCU's hotplug requirements with a bit about design
.../RCU/Design/Requirements/Requirements.rst | 22 +++++
kernel/rcu/rcutorture.c | 81 ++++++++++---------
kernel/rcu/tree.c | 39 +++++++--
3 files changed, 95 insertions(+), 47 deletions(-)
--
2.28.0.236.gb10cc79966-goog
Currently, rcu_cpu_starting() checks to see if the RCU core expects a
quiescent state from the incoming CPU. However, the current interaction
between RCU quiescent-state reporting and CPU-hotplug operations should
mean that the incoming CPU never needs to report a quiescent state.
First, the outgoing CPU reports a quiescent state if needed. Second,
the race where the CPU is leaving just as RCU is initializing a new
grace period is handled by an explicit check for this condition. Third,
the CPU's leaf rcu_node structure's ->lock serializes these checks.
This means that if rcu_cpu_starting() ever feels the need to report
a quiescent state, then there is a bug somewhere in the CPU hotplug
code or the RCU grace-period handling code. This commit therefore
adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
Cc: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Suggested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/tree.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 65e1b5e92319..a49fa3b60faa 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3996,7 +3996,14 @@ void rcu_cpu_starting(unsigned int cpu)
rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
- if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
+
+ /*
+ * XXX: The following rcu_report_qs_rnp() is redundant. If the below
+ * warning does not fire, consider replacing it with the "else" block,
+ * by June 2021 or so (while keeping the warning). Refer to RCU's
+ * Requirements documentation for the rationale.
+ */
+ if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
rcu_disable_urgency_upon_qs(rdp);
/* Report QS -after- changing ->qsmaskinitnext! */
rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
--
2.28.0.236.gb10cc79966-goog
RCU's hotplug design will help understand the requirements an RCU
implementation needs to fullfill, such as dead-lock avoidance.
The rcu_barrier() section of the "Hotplug CPU" section already talks
about deadlocks, however the description of what else can deadlock other
than rcu_barrier is rather incomplete.
This commit therefore continues the section by describing how RCU's
design handles CPU hotplug in a deadlock-free way.
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
.../RCU/Design/Requirements/Requirements.rst | 22 +++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index 16c64a2eff93..0a4148b9f743 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1940,6 +1940,28 @@ deadlock. Furthermore, ``rcu_barrier()`` blocks CPU-hotplug operations
during its execution, which results in another type of deadlock when
invoked from a CPU-hotplug notifier.
+Also, RCU's implementation avoids serious deadlocks which could occur due to
+interaction between hotplug, timers and grace period processing. It does so by
+maintaining its own bookkeeping of every CPU's hotplug state, independent of
+the various CPU masks and by reporting quiescent states at explicit points. It
+may come across as a surprise, but the force quiescent state loop (FQS) does
+not report quiescent states for offline CPUs and is not required to.
+
+For an offline CPU, the quiescent state will be reported in either of:
+1. During CPU offlining, using RCU's hotplug notifier (``rcu_report_dead()``).
+2. During grace period initialization (``rcu_gp_init``) if it detected a race
+ with CPU offlining, or a race with a task unblocking on a node which
+ previously had all of its CPUs offlined.
+
+The CPU onlining path (``rcu_cpu_starting``) does not need to a report
+quiescent state for an offline CPU in fact it would trigger a warning if a
+quiescent state was not already reported for that CPU.
+
+During the checking/modification of RCU's hotplug bookkeeping, the
+corresponding CPU's leaf node lock is held. This avoids race conditions between
+RCU's hotplug notifier hooks, grace period initialization code and the FQS loop
+which can concurrently refer to or modify the bookkeeping.
+
Scheduler and RCU
~~~~~~~~~~~~~~~~~
--
2.28.0.236.gb10cc79966-goog
The FQS loop detecting that an offline CPU has not yet reported a
quiescent state, is a serious problem. The current interaction between
RCU quiescent-state reporting and CPU-hotplug operations means that the
FQS loop should never find that an offline CPU (by RCU's understanding
of what is offline or not) has not yet reported a quiescent state.
First, the outgoing CPU explicitly reports a quiescent state if needed
in rcu_report_dead(). Second, the race where the CPU is leaving just as
RCU is initializing a new grace period is handled by an explicit check
for this condition in rcu_gp_init(). Third, the CPU's leaf rcu_node
structure's ->lock serializes these modifications and checks.
At the moment, the FQS loop detections this anomaly but only reports it
after a second has elapsed.
This commit therefore makes the warning more aggressive such that we fail
immediately when the FQS loop scan happens and bring the problem to
everyone's attention.
Light testing with TREE03 and hotplug shows no warnings. Converted the
warning as well to WARN_ON_ONCE() to reduce log spam.
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/tree.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2fb66cdbfa25..850628aba813 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1208,13 +1208,27 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
return 1;
}
- /* If waiting too long on an offline CPU, complain. */
- if (!(rdp->grpmask & rcu_rnp_online_cpus(rnp)) &&
- time_after(jiffies, rcu_state.gp_start + HZ)) {
+ /*
+ * Complain if an offline CPU by RCU's hotplug bookkeeping has not
+ * reported a quiescent state yet.
+ *
+ * Either a quiescent state for the offline CPU should have been
+ * reported during CPU offlining, or during rcu_gp_init() if it
+ * detected a race with either CPU offlining or task unblocking on a
+ * node with all of its CPUs previously offlined. If neither of these
+ * happened, we have a problem.
+ *
+ * The node lock is held here, ensuring exclusion from the
+ * modification/checking of RCU's hotplug books by RCU's hotplug
+ * notifier hooks and grace period initialization code.
+ *
+ * Refer to RCU's Requirements documentation about hotplug requirements
+ * as well.
+ */
+ if (WARN_ON_ONCE(!(rdp->grpmask & rcu_rnp_online_cpus(rnp)))) {
bool onl;
struct rcu_node *rnp1;
- WARN_ON(1); /* Offline CPUs are supposed to report QS! */
pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n",
__func__, rnp->grplo, rnp->grphi, rnp->level,
(long)rnp->gp_seq, (long)rnp->completedqs);
--
2.28.0.236.gb10cc79966-goog
RCU has had deadlocks in the past related to synchronizing in a hotplug
notifier. Typically, this has occurred because timer callbacks did not get
migrated before the CPU hotplug notifier requesting RCU's services is
called. If RCU's grace period processing has a timer callback queued in
the meanwhile, it may never get called causing RCU stalls.
These issues have been fixed by removing such dependencies from grace
period processing, however there are no testing scenarios for such
cases.
This commit therefore reuses rcutorture's existing hotplug notifier to
invoke the flavor-specific synchronize callback. If anything locks up,
we expect stall warnings and/or other splats.
Obviously, we need not test for rcu_barrier from a notifier, since those
are not allowed from notifiers. This fact is already detailed in the
documentation as well.
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/rcutorture.c | 81 +++++++++++++++++++++--------------------
1 file changed, 42 insertions(+), 39 deletions(-)
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 92cb79620939..083b65e4877d 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1645,12 +1645,37 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
read_exit_delay, read_exit_burst);
}
-static int rcutorture_booster_cleanup(unsigned int cpu)
+static bool rcu_torture_can_boost(void)
+{
+ static int boost_warn_once;
+ int prio;
+
+ if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
+ return false;
+
+ prio = rcu_get_gp_kthreads_prio();
+ if (!prio)
+ return false;
+
+ if (prio < 2) {
+ if (boost_warn_once == 1)
+ return false;
+
+ pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
+ boost_warn_once = 1;
+ return false;
+ }
+
+ return true;
+}
+
+static int rcutorture_hp_cleanup(unsigned int cpu)
{
struct task_struct *t;
- if (boost_tasks[cpu] == NULL)
+ if (!rcu_torture_can_boost() || boost_tasks[cpu] == NULL)
return 0;
+
mutex_lock(&boost_mutex);
t = boost_tasks[cpu];
boost_tasks[cpu] = NULL;
@@ -1662,11 +1687,14 @@ static int rcutorture_booster_cleanup(unsigned int cpu)
return 0;
}
-static int rcutorture_booster_init(unsigned int cpu)
+static int rcutorture_hp_init(unsigned int cpu)
{
int retval;
- if (boost_tasks[cpu] != NULL)
+ /* Force synchronizing from hotplug notifier to ensure it is safe. */
+ cur_ops->sync();
+
+ if (!rcu_torture_can_boost() || boost_tasks[cpu] != NULL)
return 0; /* Already created, nothing more to do. */
/* Don't allow time recalculation while creating a new task. */
@@ -2336,30 +2364,6 @@ static void rcu_torture_barrier_cleanup(void)
}
}
-static bool rcu_torture_can_boost(void)
-{
- static int boost_warn_once;
- int prio;
-
- if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
- return false;
-
- prio = rcu_get_gp_kthreads_prio();
- if (!prio)
- return false;
-
- if (prio < 2) {
- if (boost_warn_once == 1)
- return false;
-
- pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
- boost_warn_once = 1;
- return false;
- }
-
- return true;
-}
-
static bool read_exit_child_stop;
static bool read_exit_child_stopped;
static wait_queue_head_t read_exit_wq;
@@ -2503,8 +2507,7 @@ rcu_torture_cleanup(void)
rcutorture_seq_diff(gp_seq, start_gp_seq));
torture_stop_kthread(rcu_torture_stats, stats_task);
torture_stop_kthread(rcu_torture_fqs, fqs_task);
- if (rcu_torture_can_boost())
- cpuhp_remove_state(rcutor_hp);
+ cpuhp_remove_state(rcutor_hp);
/*
* Wait for all RCU callbacks to fire, then do torture-type-specific
@@ -2773,21 +2776,21 @@ rcu_torture_init(void)
if (firsterr)
goto unwind;
}
+
+ firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
+ rcutorture_hp_init,
+ rcutorture_hp_cleanup);
+ if (firsterr < 0)
+ goto unwind;
+ rcutor_hp = firsterr;
+
if (test_boost_interval < 1)
test_boost_interval = 1;
if (test_boost_duration < 2)
test_boost_duration = 2;
- if (rcu_torture_can_boost()) {
-
+ if (rcu_torture_can_boost())
boost_starttime = jiffies + test_boost_interval * HZ;
- firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
- rcutorture_booster_init,
- rcutorture_booster_cleanup);
- if (firsterr < 0)
- goto unwind;
- rcutor_hp = firsterr;
- }
shutdown_jiffies = jiffies + shutdown_secs * HZ;
firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
if (firsterr)
--
2.28.0.236.gb10cc79966-goog
At least since v4.19, the FQS loop no longer reports quiescent states
for offline CPUs unless it is an emergency.
This commit therefore fixes the comment in rcu_gp_init() to match the
current code.
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/tree.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a49fa3b60faa..2fb66cdbfa25 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1701,9 +1701,11 @@ static bool rcu_gp_init(void)
/*
* Apply per-leaf buffered online and offline operations to the
- * rcu_node tree. Note that this new grace period need not wait
- * for subsequent online CPUs, and that quiescent-state forcing
- * will handle subsequent offline CPUs.
+ * rcu_node tree. Note that this new grace period need not wait for
+ * subsequent online CPUs, and that RCU hooks in the CPU offlining
+ * path, when combined with checks in this function, will handle CPUs
+ * that are currently going offline or that go offline later. Refer to
+ * RCU's Requirements documentation about hotplug requirements as well.
*/
rcu_state.gp_state = RCU_GP_ONOFF;
rcu_for_each_leaf_node(rnp) {
--
2.28.0.236.gb10cc79966-goog
On Fri, Aug 7, 2020 at 1:07 PM Joel Fernandes (Google)
<[email protected]> wrote:
>
>
> This series improves/adds to RCU's warnings about CPU hotplug and adds
> documentation and testing.
>
> v3->v4: Minor cleanups.
>
> Joel Fernandes (Google) (5):
> rcu/tree: Add a warning if CPU being onlined did not report QS already
> rcu/tree: Clarify comments about FQS loop reporting quiescent states
> rcu/tree: Make FQS complaining about offline CPU more aggressive
> rcutorture: Force synchronizing of RCU flavor from hotplug notifier
> docs: Update RCU's hotplug requirements with a bit about design
Apologies that the cover-letter subject has the weird 'option-subject'
in it. Likely my git-send script misunderstood one of my markers.
thanks,
- Joel
Hi--
On 8/7/20 10:07 AM, Joel Fernandes (Google) wrote:
> RCU's hotplug design will help understand the requirements an RCU
> implementation needs to fullfill, such as dead-lock avoidance.
>
> The rcu_barrier() section of the "Hotplug CPU" section already talks
> about deadlocks, however the description of what else can deadlock other
> than rcu_barrier is rather incomplete.
>
> This commit therefore continues the section by describing how RCU's
> design handles CPU hotplug in a deadlock-free way.
>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> .../RCU/Design/Requirements/Requirements.rst | 22 +++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
> index 16c64a2eff93..0a4148b9f743 100644
> --- a/Documentation/RCU/Design/Requirements/Requirements.rst
> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst
> @@ -1940,6 +1940,28 @@ deadlock. Furthermore, ``rcu_barrier()`` blocks CPU-hotplug operations
> during its execution, which results in another type of deadlock when
> invoked from a CPU-hotplug notifier.
>
> +Also, RCU's implementation avoids serious deadlocks which could occur due to
> +interaction between hotplug, timers and grace period processing. It does so by
> +maintaining its own bookkeeping of every CPU's hotplug state, independent of
> +the various CPU masks and by reporting quiescent states at explicit points. It
> +may come across as a surprise, but the force quiescent state loop (FQS) does
> +not report quiescent states for offline CPUs and is not required to.
> +
> +For an offline CPU, the quiescent state will be reported in either of:
> +1. During CPU offlining, using RCU's hotplug notifier (``rcu_report_dead()``).
note, uses (), which is good: ()
> +2. During grace period initialization (``rcu_gp_init``) if it detected a race
add for consistency & readability: rcu_gp_init()
> + with CPU offlining, or a race with a task unblocking on a node which
> + previously had all of its CPUs offlined.
> +
> +The CPU onlining path (``rcu_cpu_starting``) does not need to a report
ditto: rcu_cpu_starting()
> +quiescent state for an offline CPU in fact it would trigger a warning if a
Missing something; maybe like so:
for an offline CPU; in fact
> +quiescent state was not already reported for that CPU.
> +
> +During the checking/modification of RCU's hotplug bookkeeping, the
> +corresponding CPU's leaf node lock is held. This avoids race conditions between
> +RCU's hotplug notifier hooks, grace period initialization code and the FQS loop
> +which can concurrently refer to or modify the bookkeeping.
> +
> Scheduler and RCU
> ~~~~~~~~~~~~~~~~~
>
>
cheers.
--
~Randy
On Fri, Aug 07, 2020 at 01:07:18PM -0400, Joel Fernandes (Google) wrote:
> Currently, rcu_cpu_starting() checks to see if the RCU core expects a
> quiescent state from the incoming CPU. However, the current interaction
> between RCU quiescent-state reporting and CPU-hotplug operations should
> mean that the incoming CPU never needs to report a quiescent state.
> First, the outgoing CPU reports a quiescent state if needed. Second,
> the race where the CPU is leaving just as RCU is initializing a new
> grace period is handled by an explicit check for this condition. Third,
> the CPU's leaf rcu_node structure's ->lock serializes these checks.
>
> This means that if rcu_cpu_starting() ever feels the need to report
> a quiescent state, then there is a bug somewhere in the CPU hotplug
> code or the RCU grace-period handling code. This commit therefore
> adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
>
> Cc: Paul E. McKenney <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Suggested-by: Paul E. McKenney <[email protected]>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> kernel/rcu/tree.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 65e1b5e92319..a49fa3b60faa 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3996,7 +3996,14 @@ void rcu_cpu_starting(unsigned int cpu)
> rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
> rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> +
> + /*
> + * XXX: The following rcu_report_qs_rnp() is redundant. If the below
> + * warning does not fire, consider replacing it with the "else" block,
> + * by June 2021 or so (while keeping the warning). Refer to RCU's
> + * Requirements documentation for the rationale.
Let's suppose that this change is made, and further that in a year or
two the "if" statement below is replaced with its "else" block.
Now let's suppose that (some years after that) a hard-to-trigger bug
makes its way into RCU's CPU-hotplug code that would have resulted in
the WARN_ON_ONCE() triggering, but that this bug turns out to be not so
hard to trigger in certain large production environments.
Let's suppose further that you have moved on to where you are responsible
for one of these large production environments. How would this
hypothetical RCU/CPU-hotplug bug manifest?
Thanx, Paul
> + */
> + if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
> rcu_disable_urgency_upon_qs(rdp);
> /* Report QS -after- changing ->qsmaskinitnext! */
> rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> --
> 2.28.0.236.gb10cc79966-goog
>
On Fri, Aug 07, 2020 at 01:07:21PM -0400, Joel Fernandes (Google) wrote:
> RCU has had deadlocks in the past related to synchronizing in a hotplug
> notifier. Typically, this has occurred because timer callbacks did not get
> migrated before the CPU hotplug notifier requesting RCU's services is
> called. If RCU's grace period processing has a timer callback queued in
> the meanwhile, it may never get called causing RCU stalls.
>
> These issues have been fixed by removing such dependencies from grace
> period processing, however there are no testing scenarios for such
> cases.
>
> This commit therefore reuses rcutorture's existing hotplug notifier to
> invoke the flavor-specific synchronize callback. If anything locks up,
> we expect stall warnings and/or other splats.
>
> Obviously, we need not test for rcu_barrier from a notifier, since those
> are not allowed from notifiers. This fact is already detailed in the
> documentation as well.
>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
Given that rcutorture_booster_init() is invoked on the CPU in question
only after it is up and running, and that (if I remember correctly)
rcutorture_booster_cleanup() is invoked on the outgoing CPU before it
has really started going away, would this code really have caught that
timer/CPU-hotplug/RCU bug?
Thanx, Paul
> ---
> kernel/rcu/rcutorture.c | 81 +++++++++++++++++++++--------------------
> 1 file changed, 42 insertions(+), 39 deletions(-)
>
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 92cb79620939..083b65e4877d 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -1645,12 +1645,37 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
> read_exit_delay, read_exit_burst);
> }
>
> -static int rcutorture_booster_cleanup(unsigned int cpu)
> +static bool rcu_torture_can_boost(void)
> +{
> + static int boost_warn_once;
> + int prio;
> +
> + if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
> + return false;
> +
> + prio = rcu_get_gp_kthreads_prio();
> + if (!prio)
> + return false;
> +
> + if (prio < 2) {
> + if (boost_warn_once == 1)
> + return false;
> +
> + pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
> + boost_warn_once = 1;
> + return false;
> + }
> +
> + return true;
> +}
> +
> +static int rcutorture_hp_cleanup(unsigned int cpu)
> {
> struct task_struct *t;
>
> - if (boost_tasks[cpu] == NULL)
> + if (!rcu_torture_can_boost() || boost_tasks[cpu] == NULL)
> return 0;
> +
> mutex_lock(&boost_mutex);
> t = boost_tasks[cpu];
> boost_tasks[cpu] = NULL;
> @@ -1662,11 +1687,14 @@ static int rcutorture_booster_cleanup(unsigned int cpu)
> return 0;
> }
>
> -static int rcutorture_booster_init(unsigned int cpu)
> +static int rcutorture_hp_init(unsigned int cpu)
> {
> int retval;
>
> - if (boost_tasks[cpu] != NULL)
> + /* Force synchronizing from hotplug notifier to ensure it is safe. */
> + cur_ops->sync();
> +
> + if (!rcu_torture_can_boost() || boost_tasks[cpu] != NULL)
> return 0; /* Already created, nothing more to do. */
>
> /* Don't allow time recalculation while creating a new task. */
> @@ -2336,30 +2364,6 @@ static void rcu_torture_barrier_cleanup(void)
> }
> }
>
> -static bool rcu_torture_can_boost(void)
> -{
> - static int boost_warn_once;
> - int prio;
> -
> - if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
> - return false;
> -
> - prio = rcu_get_gp_kthreads_prio();
> - if (!prio)
> - return false;
> -
> - if (prio < 2) {
> - if (boost_warn_once == 1)
> - return false;
> -
> - pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
> - boost_warn_once = 1;
> - return false;
> - }
> -
> - return true;
> -}
> -
> static bool read_exit_child_stop;
> static bool read_exit_child_stopped;
> static wait_queue_head_t read_exit_wq;
> @@ -2503,8 +2507,7 @@ rcu_torture_cleanup(void)
> rcutorture_seq_diff(gp_seq, start_gp_seq));
> torture_stop_kthread(rcu_torture_stats, stats_task);
> torture_stop_kthread(rcu_torture_fqs, fqs_task);
> - if (rcu_torture_can_boost())
> - cpuhp_remove_state(rcutor_hp);
> + cpuhp_remove_state(rcutor_hp);
>
> /*
> * Wait for all RCU callbacks to fire, then do torture-type-specific
> @@ -2773,21 +2776,21 @@ rcu_torture_init(void)
> if (firsterr)
> goto unwind;
> }
> +
> + firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
> + rcutorture_hp_init,
> + rcutorture_hp_cleanup);
> + if (firsterr < 0)
> + goto unwind;
> + rcutor_hp = firsterr;
> +
> if (test_boost_interval < 1)
> test_boost_interval = 1;
> if (test_boost_duration < 2)
> test_boost_duration = 2;
> - if (rcu_torture_can_boost()) {
> -
> + if (rcu_torture_can_boost())
> boost_starttime = jiffies + test_boost_interval * HZ;
>
> - firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
> - rcutorture_booster_init,
> - rcutorture_booster_cleanup);
> - if (firsterr < 0)
> - goto unwind;
> - rcutor_hp = firsterr;
> - }
> shutdown_jiffies = jiffies + shutdown_secs * HZ;
> firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
> if (firsterr)
> --
> 2.28.0.236.gb10cc79966-goog
>
Hi Paul,
On Mon, Aug 10, 2020 at 09:19:45AM -0700, Paul E. McKenney wrote:
> On Fri, Aug 07, 2020 at 01:07:21PM -0400, Joel Fernandes (Google) wrote:
> > RCU has had deadlocks in the past related to synchronizing in a hotplug
> > notifier. Typically, this has occurred because timer callbacks did not get
> > migrated before the CPU hotplug notifier requesting RCU's services is
> > called. If RCU's grace period processing has a timer callback queued in
> > the meanwhile, it may never get called causing RCU stalls.
> >
> > These issues have been fixed by removing such dependencies from grace
> > period processing, however there are no testing scenarios for such
> > cases.
> >
> > This commit therefore reuses rcutorture's existing hotplug notifier to
> > invoke the flavor-specific synchronize callback. If anything locks up,
> > we expect stall warnings and/or other splats.
> >
> > Obviously, we need not test for rcu_barrier from a notifier, since those
> > are not allowed from notifiers. This fact is already detailed in the
> > documentation as well.
> >
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
>
> Given that rcutorture_booster_init() is invoked on the CPU in question
> only after it is up and running, and that (if I remember correctly)
> rcutorture_booster_cleanup() is invoked on the outgoing CPU before it
> has really started going away, would this code really have caught that
> timer/CPU-hotplug/RCU bug?
You are right, it would not have caught that particular one because the timer
callbacks would have been migrated by the time the rcutorture_booster_init()
is called.
I still thought it is a good idea anyway to test if the dynamic hotplug
notifiers don't have these issues.
Did you have a better idea on how to test the timer/hotplug/rcu bug?
thanks,
- Joel
> > kernel/rcu/rcutorture.c | 81 +++++++++++++++++++++--------------------
> > 1 file changed, 42 insertions(+), 39 deletions(-)
> >
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index 92cb79620939..083b65e4877d 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -1645,12 +1645,37 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
> > read_exit_delay, read_exit_burst);
> > }
> >
> > -static int rcutorture_booster_cleanup(unsigned int cpu)
> > +static bool rcu_torture_can_boost(void)
> > +{
> > + static int boost_warn_once;
> > + int prio;
> > +
> > + if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
> > + return false;
> > +
> > + prio = rcu_get_gp_kthreads_prio();
> > + if (!prio)
> > + return false;
> > +
> > + if (prio < 2) {
> > + if (boost_warn_once == 1)
> > + return false;
> > +
> > + pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
> > + boost_warn_once = 1;
> > + return false;
> > + }
> > +
> > + return true;
> > +}
> > +
> > +static int rcutorture_hp_cleanup(unsigned int cpu)
> > {
> > struct task_struct *t;
> >
> > - if (boost_tasks[cpu] == NULL)
> > + if (!rcu_torture_can_boost() || boost_tasks[cpu] == NULL)
> > return 0;
> > +
> > mutex_lock(&boost_mutex);
> > t = boost_tasks[cpu];
> > boost_tasks[cpu] = NULL;
> > @@ -1662,11 +1687,14 @@ static int rcutorture_booster_cleanup(unsigned int cpu)
> > return 0;
> > }
> >
> > -static int rcutorture_booster_init(unsigned int cpu)
> > +static int rcutorture_hp_init(unsigned int cpu)
> > {
> > int retval;
> >
> > - if (boost_tasks[cpu] != NULL)
> > + /* Force synchronizing from hotplug notifier to ensure it is safe. */
> > + cur_ops->sync();
> > +
> > + if (!rcu_torture_can_boost() || boost_tasks[cpu] != NULL)
> > return 0; /* Already created, nothing more to do. */
> >
> > /* Don't allow time recalculation while creating a new task. */
> > @@ -2336,30 +2364,6 @@ static void rcu_torture_barrier_cleanup(void)
> > }
> > }
> >
> > -static bool rcu_torture_can_boost(void)
> > -{
> > - static int boost_warn_once;
> > - int prio;
> > -
> > - if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
> > - return false;
> > -
> > - prio = rcu_get_gp_kthreads_prio();
> > - if (!prio)
> > - return false;
> > -
> > - if (prio < 2) {
> > - if (boost_warn_once == 1)
> > - return false;
> > -
> > - pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
> > - boost_warn_once = 1;
> > - return false;
> > - }
> > -
> > - return true;
> > -}
> > -
> > static bool read_exit_child_stop;
> > static bool read_exit_child_stopped;
> > static wait_queue_head_t read_exit_wq;
> > @@ -2503,8 +2507,7 @@ rcu_torture_cleanup(void)
> > rcutorture_seq_diff(gp_seq, start_gp_seq));
> > torture_stop_kthread(rcu_torture_stats, stats_task);
> > torture_stop_kthread(rcu_torture_fqs, fqs_task);
> > - if (rcu_torture_can_boost())
> > - cpuhp_remove_state(rcutor_hp);
> > + cpuhp_remove_state(rcutor_hp);
> >
> > /*
> > * Wait for all RCU callbacks to fire, then do torture-type-specific
> > @@ -2773,21 +2776,21 @@ rcu_torture_init(void)
> > if (firsterr)
> > goto unwind;
> > }
> > +
> > + firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
> > + rcutorture_hp_init,
> > + rcutorture_hp_cleanup);
> > + if (firsterr < 0)
> > + goto unwind;
> > + rcutor_hp = firsterr;
> > +
> > if (test_boost_interval < 1)
> > test_boost_interval = 1;
> > if (test_boost_duration < 2)
> > test_boost_duration = 2;
> > - if (rcu_torture_can_boost()) {
> > -
> > + if (rcu_torture_can_boost())
> > boost_starttime = jiffies + test_boost_interval * HZ;
> >
> > - firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
> > - rcutorture_booster_init,
> > - rcutorture_booster_cleanup);
> > - if (firsterr < 0)
> > - goto unwind;
> > - rcutor_hp = firsterr;
> > - }
> > shutdown_jiffies = jiffies + shutdown_secs * HZ;
> > firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
> > if (firsterr)
> > --
> > 2.28.0.236.gb10cc79966-goog
> >
On Mon, Aug 10, 2020 at 08:46:54AM -0700, Paul E. McKenney wrote:
> On Fri, Aug 07, 2020 at 01:07:18PM -0400, Joel Fernandes (Google) wrote:
> > Currently, rcu_cpu_starting() checks to see if the RCU core expects a
> > quiescent state from the incoming CPU. However, the current interaction
> > between RCU quiescent-state reporting and CPU-hotplug operations should
> > mean that the incoming CPU never needs to report a quiescent state.
> > First, the outgoing CPU reports a quiescent state if needed. Second,
> > the race where the CPU is leaving just as RCU is initializing a new
> > grace period is handled by an explicit check for this condition. Third,
> > the CPU's leaf rcu_node structure's ->lock serializes these checks.
> >
> > This means that if rcu_cpu_starting() ever feels the need to report
> > a quiescent state, then there is a bug somewhere in the CPU hotplug
> > code or the RCU grace-period handling code. This commit therefore
> > adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
> >
> > Cc: Paul E. McKenney <[email protected]>
> > Cc: Neeraj Upadhyay <[email protected]>
> > Suggested-by: Paul E. McKenney <[email protected]>
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > ---
> > kernel/rcu/tree.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 65e1b5e92319..a49fa3b60faa 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3996,7 +3996,14 @@ void rcu_cpu_starting(unsigned int cpu)
> > rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
> > rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> > rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> > - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > +
> > + /*
> > + * XXX: The following rcu_report_qs_rnp() is redundant. If the below
> > + * warning does not fire, consider replacing it with the "else" block,
> > + * by June 2021 or so (while keeping the warning). Refer to RCU's
> > + * Requirements documentation for the rationale.
>
> Let's suppose that this change is made, and further that in a year or
> two the "if" statement below is replaced with its "else" block.
>
> Now let's suppose that (some years after that) a hard-to-trigger bug
> makes its way into RCU's CPU-hotplug code that would have resulted in
> the WARN_ON_ONCE() triggering, but that this bug turns out to be not so
> hard to trigger in certain large production environments.
>
> Let's suppose further that you have moved on to where you are responsible
> for one of these large production environments. How would this
> hypothetical RCU/CPU-hotplug bug manifest?
It could manifest as an RCU stall (after the warning triggers) since RCU
would wait forever.
Were you thinking it is not worth doing this? I thought we wanted to remove
the reundant rcu_report_qs_rnp here to solidify everyone's understanding of
the code and fail early if there's something misunderstood (since such
misunderstanding could mean there are other hidden bugs somewhere). The
counter-argument to that being, making the code robust is more important for
the large production failure scenario where failures are costly.
thanks,
- Joel
> Thanx, Paul
>
> > + */
> > + if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
> > rcu_disable_urgency_upon_qs(rdp);
> > /* Report QS -after- changing ->qsmaskinitnext! */
> > rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > --
> > 2.28.0.236.gb10cc79966-goog
> >
On Fri, Aug 07, 2020 at 07:10:38PM -0700, Randy Dunlap wrote:
> Hi--
>
> On 8/7/20 10:07 AM, Joel Fernandes (Google) wrote:
> > RCU's hotplug design will help understand the requirements an RCU
> > implementation needs to fullfill, such as dead-lock avoidance.
> >
> > The rcu_barrier() section of the "Hotplug CPU" section already talks
> > about deadlocks, however the description of what else can deadlock other
> > than rcu_barrier is rather incomplete.
> >
> > This commit therefore continues the section by describing how RCU's
> > design handles CPU hotplug in a deadlock-free way.
> >
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
Thank you Randy for pointing these out. I will fix the nits in the next revision.
Regards,
- Joel
> > .../RCU/Design/Requirements/Requirements.rst | 22 +++++++++++++++++++
> > 1 file changed, 22 insertions(+)
> >
> > diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
> > index 16c64a2eff93..0a4148b9f743 100644
> > --- a/Documentation/RCU/Design/Requirements/Requirements.rst
> > +++ b/Documentation/RCU/Design/Requirements/Requirements.rst
> > @@ -1940,6 +1940,28 @@ deadlock. Furthermore, ``rcu_barrier()`` blocks CPU-hotplug operations
> > during its execution, which results in another type of deadlock when
> > invoked from a CPU-hotplug notifier.
> >
> > +Also, RCU's implementation avoids serious deadlocks which could occur due to
> > +interaction between hotplug, timers and grace period processing. It does so by
> > +maintaining its own bookkeeping of every CPU's hotplug state, independent of
> > +the various CPU masks and by reporting quiescent states at explicit points. It
> > +may come across as a surprise, but the force quiescent state loop (FQS) does
> > +not report quiescent states for offline CPUs and is not required to.
> > +
> > +For an offline CPU, the quiescent state will be reported in either of:
> > +1. During CPU offlining, using RCU's hotplug notifier (``rcu_report_dead()``).
>
> note, uses (), which is good: ()
>
> > +2. During grace period initialization (``rcu_gp_init``) if it detected a race
>
> add for consistency & readability: rcu_gp_init()
>
> > + with CPU offlining, or a race with a task unblocking on a node which
> > + previously had all of its CPUs offlined.
> > +
> > +The CPU onlining path (``rcu_cpu_starting``) does not need to a report
>
> ditto: rcu_cpu_starting()
>
> > +quiescent state for an offline CPU in fact it would trigger a warning if a
>
> Missing something; maybe like so:
>
> for an offline CPU; in fact
>
> > +quiescent state was not already reported for that CPU.
> > +
> > +During the checking/modification of RCU's hotplug bookkeeping, the
> > +corresponding CPU's leaf node lock is held. This avoids race conditions between
> > +RCU's hotplug notifier hooks, grace period initialization code and the FQS loop
> > +which can concurrently refer to or modify the bookkeeping.
> > +
> > Scheduler and RCU
> > ~~~~~~~~~~~~~~~~~
> >
> >
>
> cheers.
> --
> ~Randy
>
On Mon, Aug 10, 2020 at 01:31:09PM -0400, Joel Fernandes wrote:
> Hi Paul,
>
> On Mon, Aug 10, 2020 at 09:19:45AM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 07, 2020 at 01:07:21PM -0400, Joel Fernandes (Google) wrote:
> > > RCU has had deadlocks in the past related to synchronizing in a hotplug
> > > notifier. Typically, this has occurred because timer callbacks did not get
> > > migrated before the CPU hotplug notifier requesting RCU's services is
> > > called. If RCU's grace period processing has a timer callback queued in
> > > the meanwhile, it may never get called causing RCU stalls.
> > >
> > > These issues have been fixed by removing such dependencies from grace
> > > period processing, however there are no testing scenarios for such
> > > cases.
> > >
> > > This commit therefore reuses rcutorture's existing hotplug notifier to
> > > invoke the flavor-specific synchronize callback. If anything locks up,
> > > we expect stall warnings and/or other splats.
> > >
> > > Obviously, we need not test for rcu_barrier from a notifier, since those
> > > are not allowed from notifiers. This fact is already detailed in the
> > > documentation as well.
> > >
> > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> >
> > Given that rcutorture_booster_init() is invoked on the CPU in question
> > only after it is up and running, and that (if I remember correctly)
> > rcutorture_booster_cleanup() is invoked on the outgoing CPU before it
> > has really started going away, would this code really have caught that
> > timer/CPU-hotplug/RCU bug?
>
> You are right, it would not have caught that particular one because the timer
> callbacks would have been migrated by the time the rcutorture_booster_init()
> is called.
>
> I still thought it is a good idea anyway to test if the dynamic hotplug
> notifiers don't have these issues.
>
> Did you have a better idea on how to test the timer/hotplug/rcu bug?
My suggestion would be to place an rcutorture hook in all of the RCU
notifiers that support blocking and that have some possibility of making
this deadlock happen. There are some similar hooks in other parts of RCU.
Thanx, Paul
> thanks,
>
> - Joel
>
>
>
> > > kernel/rcu/rcutorture.c | 81 +++++++++++++++++++++--------------------
> > > 1 file changed, 42 insertions(+), 39 deletions(-)
> > >
> > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > > index 92cb79620939..083b65e4877d 100644
> > > --- a/kernel/rcu/rcutorture.c
> > > +++ b/kernel/rcu/rcutorture.c
> > > @@ -1645,12 +1645,37 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
> > > read_exit_delay, read_exit_burst);
> > > }
> > >
> > > -static int rcutorture_booster_cleanup(unsigned int cpu)
> > > +static bool rcu_torture_can_boost(void)
> > > +{
> > > + static int boost_warn_once;
> > > + int prio;
> > > +
> > > + if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
> > > + return false;
> > > +
> > > + prio = rcu_get_gp_kthreads_prio();
> > > + if (!prio)
> > > + return false;
> > > +
> > > + if (prio < 2) {
> > > + if (boost_warn_once == 1)
> > > + return false;
> > > +
> > > + pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
> > > + boost_warn_once = 1;
> > > + return false;
> > > + }
> > > +
> > > + return true;
> > > +}
> > > +
> > > +static int rcutorture_hp_cleanup(unsigned int cpu)
> > > {
> > > struct task_struct *t;
> > >
> > > - if (boost_tasks[cpu] == NULL)
> > > + if (!rcu_torture_can_boost() || boost_tasks[cpu] == NULL)
> > > return 0;
> > > +
> > > mutex_lock(&boost_mutex);
> > > t = boost_tasks[cpu];
> > > boost_tasks[cpu] = NULL;
> > > @@ -1662,11 +1687,14 @@ static int rcutorture_booster_cleanup(unsigned int cpu)
> > > return 0;
> > > }
> > >
> > > -static int rcutorture_booster_init(unsigned int cpu)
> > > +static int rcutorture_hp_init(unsigned int cpu)
> > > {
> > > int retval;
> > >
> > > - if (boost_tasks[cpu] != NULL)
> > > + /* Force synchronizing from hotplug notifier to ensure it is safe. */
> > > + cur_ops->sync();
> > > +
> > > + if (!rcu_torture_can_boost() || boost_tasks[cpu] != NULL)
> > > return 0; /* Already created, nothing more to do. */
> > >
> > > /* Don't allow time recalculation while creating a new task. */
> > > @@ -2336,30 +2364,6 @@ static void rcu_torture_barrier_cleanup(void)
> > > }
> > > }
> > >
> > > -static bool rcu_torture_can_boost(void)
> > > -{
> > > - static int boost_warn_once;
> > > - int prio;
> > > -
> > > - if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
> > > - return false;
> > > -
> > > - prio = rcu_get_gp_kthreads_prio();
> > > - if (!prio)
> > > - return false;
> > > -
> > > - if (prio < 2) {
> > > - if (boost_warn_once == 1)
> > > - return false;
> > > -
> > > - pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
> > > - boost_warn_once = 1;
> > > - return false;
> > > - }
> > > -
> > > - return true;
> > > -}
> > > -
> > > static bool read_exit_child_stop;
> > > static bool read_exit_child_stopped;
> > > static wait_queue_head_t read_exit_wq;
> > > @@ -2503,8 +2507,7 @@ rcu_torture_cleanup(void)
> > > rcutorture_seq_diff(gp_seq, start_gp_seq));
> > > torture_stop_kthread(rcu_torture_stats, stats_task);
> > > torture_stop_kthread(rcu_torture_fqs, fqs_task);
> > > - if (rcu_torture_can_boost())
> > > - cpuhp_remove_state(rcutor_hp);
> > > + cpuhp_remove_state(rcutor_hp);
> > >
> > > /*
> > > * Wait for all RCU callbacks to fire, then do torture-type-specific
> > > @@ -2773,21 +2776,21 @@ rcu_torture_init(void)
> > > if (firsterr)
> > > goto unwind;
> > > }
> > > +
> > > + firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
> > > + rcutorture_hp_init,
> > > + rcutorture_hp_cleanup);
> > > + if (firsterr < 0)
> > > + goto unwind;
> > > + rcutor_hp = firsterr;
> > > +
> > > if (test_boost_interval < 1)
> > > test_boost_interval = 1;
> > > if (test_boost_duration < 2)
> > > test_boost_duration = 2;
> > > - if (rcu_torture_can_boost()) {
> > > -
> > > + if (rcu_torture_can_boost())
> > > boost_starttime = jiffies + test_boost_interval * HZ;
> > >
> > > - firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
> > > - rcutorture_booster_init,
> > > - rcutorture_booster_cleanup);
> > > - if (firsterr < 0)
> > > - goto unwind;
> > > - rcutor_hp = firsterr;
> > > - }
> > > shutdown_jiffies = jiffies + shutdown_secs * HZ;
> > > firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
> > > if (firsterr)
> > > --
> > > 2.28.0.236.gb10cc79966-goog
> > >
On Mon, Aug 10, 2020 at 01:39:31PM -0400, Joel Fernandes wrote:
> On Mon, Aug 10, 2020 at 08:46:54AM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 07, 2020 at 01:07:18PM -0400, Joel Fernandes (Google) wrote:
> > > Currently, rcu_cpu_starting() checks to see if the RCU core expects a
> > > quiescent state from the incoming CPU. However, the current interaction
> > > between RCU quiescent-state reporting and CPU-hotplug operations should
> > > mean that the incoming CPU never needs to report a quiescent state.
> > > First, the outgoing CPU reports a quiescent state if needed. Second,
> > > the race where the CPU is leaving just as RCU is initializing a new
> > > grace period is handled by an explicit check for this condition. Third,
> > > the CPU's leaf rcu_node structure's ->lock serializes these checks.
> > >
> > > This means that if rcu_cpu_starting() ever feels the need to report
> > > a quiescent state, then there is a bug somewhere in the CPU hotplug
> > > code or the RCU grace-period handling code. This commit therefore
> > > adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
> > >
> > > Cc: Paul E. McKenney <[email protected]>
> > > Cc: Neeraj Upadhyay <[email protected]>
> > > Suggested-by: Paul E. McKenney <[email protected]>
> > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > > ---
> > > kernel/rcu/tree.c | 9 ++++++++-
> > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 65e1b5e92319..a49fa3b60faa 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3996,7 +3996,14 @@ void rcu_cpu_starting(unsigned int cpu)
> > > rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
> > > rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> > > rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> > > - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > > +
> > > + /*
> > > + * XXX: The following rcu_report_qs_rnp() is redundant. If the below
> > > + * warning does not fire, consider replacing it with the "else" block,
> > > + * by June 2021 or so (while keeping the warning). Refer to RCU's
> > > + * Requirements documentation for the rationale.
> >
> > Let's suppose that this change is made, and further that in a year or
> > two the "if" statement below is replaced with its "else" block.
> >
> > Now let's suppose that (some years after that) a hard-to-trigger bug
> > makes its way into RCU's CPU-hotplug code that would have resulted in
> > the WARN_ON_ONCE() triggering, but that this bug turns out to be not so
> > hard to trigger in certain large production environments.
> >
> > Let's suppose further that you have moved on to where you are responsible
> > for one of these large production environments. How would this
> > hypothetical RCU/CPU-hotplug bug manifest?
>
> It could manifest as an RCU stall (after the warning triggers) since RCU
> would wait forever.
>
> Were you thinking it is not worth doing this? I thought we wanted to remove
> the reundant rcu_report_qs_rnp here to solidify everyone's understanding of
> the code and fail early if there's something misunderstood (since such
> misunderstanding could mean there are other hidden bugs somewhere). The
> counter-argument to that being, making the code robust is more important for
> the large production failure scenario where failures are costly.
The benefits of removing code that is in theory redundant was my thought
at one point, but sleeping on this several times since has made me much
less favorable to this change. And perhaps my experiences with my new
employer have affected my views on this as well. You never know! ;-)
Thanx, Paul
> thanks,
>
> - Joel
>
>
> > Thanx, Paul
> >
> > > + */
> > > + if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
> > > rcu_disable_urgency_upon_qs(rdp);
> > > /* Report QS -after- changing ->qsmaskinitnext! */
> > > rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > > --
> > > 2.28.0.236.gb10cc79966-goog
> > >
On Fri, Aug 07, 2020 at 01:07:19PM -0400, Joel Fernandes (Google) wrote:
> At least since v4.19, the FQS loop no longer reports quiescent states
> for offline CPUs unless it is an emergency.
>
> This commit therefore fixes the comment in rcu_gp_init() to match the
> current code.
>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> kernel/rcu/tree.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a49fa3b60faa..2fb66cdbfa25 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1701,9 +1701,11 @@ static bool rcu_gp_init(void)
>
> /*
> * Apply per-leaf buffered online and offline operations to the
> - * rcu_node tree. Note that this new grace period need not wait
> - * for subsequent online CPUs, and that quiescent-state forcing
> - * will handle subsequent offline CPUs.
> + * rcu_node tree. Note that this new grace period need not wait for
> + * subsequent online CPUs, and that RCU hooks in the CPU offlining
> + * path, when combined with checks in this function, will handle CPUs
> + * that are currently going offline or that go offline later. Refer to
> + * RCU's Requirements documentation about hotplug requirements as well.
> */
> rcu_state.gp_state = RCU_GP_ONOFF;
> rcu_for_each_leaf_node(rnp) {
Very good! I pulled this in with light edits as shown below.
Please let me know if I messed something up.
Thanx, Paul
------------------------------------------------------------------------
commit a6117399840b963f90cc5322ef9ea7c52de639b2
Author: Joel Fernandes (Google) <[email protected]>
Date: Fri Aug 7 13:07:19 2020 -0400
rcu/tree: Clarify comments about FQS loop reporting quiescent states
Since at least v4.19, the FQS loop no longer reports quiescent states
for offline CPUs except in emergency situations.
This commit therefore fixes the comment in rcu_gp_init() to match the
current code.
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bcc6160..59e1943 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1730,10 +1730,13 @@ static bool rcu_gp_init(void)
raw_spin_unlock_irq_rcu_node(rnp);
/*
- * Apply per-leaf buffered online and offline operations to the
- * rcu_node tree. Note that this new grace period need not wait
- * for subsequent online CPUs, and that quiescent-state forcing
- * will handle subsequent offline CPUs.
+ * Apply per-leaf buffered online and offline operations to
+ * the rcu_node tree. Note that this new grace period need not
+ * wait for subsequent online CPUs, and that RCU hooks in the CPU
+ * offlining path, when combined with checks in this function,
+ * will handle CPUs that are currently going offline or that will
+ * go offline later. Please also refer to "Hotplug CPU" section
+ * of RCU's Requirements documentation.
*/
rcu_state.gp_state = RCU_GP_ONOFF;
rcu_for_each_leaf_node(rnp) {
On Mon, Aug 10, 2020 at 11:06:47AM -0700, Paul E. McKenney wrote:
> On Fri, Aug 07, 2020 at 01:07:19PM -0400, Joel Fernandes (Google) wrote:
> > At least since v4.19, the FQS loop no longer reports quiescent states
> > for offline CPUs unless it is an emergency.
> >
> > This commit therefore fixes the comment in rcu_gp_init() to match the
> > current code.
> >
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > ---
> > kernel/rcu/tree.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index a49fa3b60faa..2fb66cdbfa25 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1701,9 +1701,11 @@ static bool rcu_gp_init(void)
> >
> > /*
> > * Apply per-leaf buffered online and offline operations to the
> > - * rcu_node tree. Note that this new grace period need not wait
> > - * for subsequent online CPUs, and that quiescent-state forcing
> > - * will handle subsequent offline CPUs.
> > + * rcu_node tree. Note that this new grace period need not wait for
> > + * subsequent online CPUs, and that RCU hooks in the CPU offlining
> > + * path, when combined with checks in this function, will handle CPUs
> > + * that are currently going offline or that go offline later. Refer to
> > + * RCU's Requirements documentation about hotplug requirements as well.
> > */
> > rcu_state.gp_state = RCU_GP_ONOFF;
> > rcu_for_each_leaf_node(rnp) {
>
> Very good! I pulled this in with light edits as shown below.
> Please let me know if I messed something up.
Yes, looks good, thanks!
- Joel
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit a6117399840b963f90cc5322ef9ea7c52de639b2
> Author: Joel Fernandes (Google) <[email protected]>
> Date: Fri Aug 7 13:07:19 2020 -0400
>
> rcu/tree: Clarify comments about FQS loop reporting quiescent states
>
> Since at least v4.19, the FQS loop no longer reports quiescent states
> for offline CPUs except in emergency situations.
>
> This commit therefore fixes the comment in rcu_gp_init() to match the
> current code.
>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index bcc6160..59e1943 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1730,10 +1730,13 @@ static bool rcu_gp_init(void)
> raw_spin_unlock_irq_rcu_node(rnp);
>
> /*
> - * Apply per-leaf buffered online and offline operations to the
> - * rcu_node tree. Note that this new grace period need not wait
> - * for subsequent online CPUs, and that quiescent-state forcing
> - * will handle subsequent offline CPUs.
> + * Apply per-leaf buffered online and offline operations to
> + * the rcu_node tree. Note that this new grace period need not
> + * wait for subsequent online CPUs, and that RCU hooks in the CPU
> + * offlining path, when combined with checks in this function,
> + * will handle CPUs that are currently going offline or that will
> + * go offline later. Please also refer to "Hotplug CPU" section
> + * of RCU's Requirements documentation.
> */
> rcu_state.gp_state = RCU_GP_ONOFF;
> rcu_for_each_leaf_node(rnp) {
On Mon, Aug 10, 2020 at 10:57:17AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 10, 2020 at 01:39:31PM -0400, Joel Fernandes wrote:
> > On Mon, Aug 10, 2020 at 08:46:54AM -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 07, 2020 at 01:07:18PM -0400, Joel Fernandes (Google) wrote:
> > > > Currently, rcu_cpu_starting() checks to see if the RCU core expects a
> > > > quiescent state from the incoming CPU. However, the current interaction
> > > > between RCU quiescent-state reporting and CPU-hotplug operations should
> > > > mean that the incoming CPU never needs to report a quiescent state.
> > > > First, the outgoing CPU reports a quiescent state if needed. Second,
> > > > the race where the CPU is leaving just as RCU is initializing a new
> > > > grace period is handled by an explicit check for this condition. Third,
> > > > the CPU's leaf rcu_node structure's ->lock serializes these checks.
> > > >
> > > > This means that if rcu_cpu_starting() ever feels the need to report
> > > > a quiescent state, then there is a bug somewhere in the CPU hotplug
> > > > code or the RCU grace-period handling code. This commit therefore
> > > > adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
> > > >
> > > > Cc: Paul E. McKenney <[email protected]>
> > > > Cc: Neeraj Upadhyay <[email protected]>
> > > > Suggested-by: Paul E. McKenney <[email protected]>
> > > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > > > ---
> > > > kernel/rcu/tree.c | 9 ++++++++-
> > > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 65e1b5e92319..a49fa3b60faa 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3996,7 +3996,14 @@ void rcu_cpu_starting(unsigned int cpu)
> > > > rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
> > > > rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> > > > rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> > > > - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > > > +
> > > > + /*
> > > > + * XXX: The following rcu_report_qs_rnp() is redundant. If the below
> > > > + * warning does not fire, consider replacing it with the "else" block,
> > > > + * by June 2021 or so (while keeping the warning). Refer to RCU's
> > > > + * Requirements documentation for the rationale.
> > >
> > > Let's suppose that this change is made, and further that in a year or
> > > two the "if" statement below is replaced with its "else" block.
> > >
> > > Now let's suppose that (some years after that) a hard-to-trigger bug
> > > makes its way into RCU's CPU-hotplug code that would have resulted in
> > > the WARN_ON_ONCE() triggering, but that this bug turns out to be not so
> > > hard to trigger in certain large production environments.
> > >
> > > Let's suppose further that you have moved on to where you are responsible
> > > for one of these large production environments. How would this
> > > hypothetical RCU/CPU-hotplug bug manifest?
> >
> > It could manifest as an RCU stall (after the warning triggers) since RCU
> > would wait forever.
> >
> > Were you thinking it is not worth doing this? I thought we wanted to remove
> > the reundant rcu_report_qs_rnp here to solidify everyone's understanding of
> > the code and fail early if there's something misunderstood (since such
> > misunderstanding could mean there are other hidden bugs somewhere). The
> > counter-argument to that being, making the code robust is more important for
> > the large production failure scenario where failures are costly.
>
> The benefits of removing code that is in theory redundant was my thought
> at one point, but sleeping on this several times since has made me much
> less favorable to this change. And perhaps my experiences with my new
> employer have affected my views on this as well. You never know! ;-)
Can we just keep the warning then, and delete the comments to revisit?
IMHO a comment saying this rcu_report_qs_rnp() is not necessary here but is
done anyway, would be quite useful to a code reader, (with appropriate
comments to point to RCU requirements section and the added warning) :-)
thanks,
- Joel
>
> Thanx, Paul
>
> > thanks,
> >
> > - Joel
> >
> >
> > > Thanx, Paul
> > >
> > > > + */
> > > > + if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
> > > > rcu_disable_urgency_upon_qs(rdp);
> > > > /* Report QS -after- changing ->qsmaskinitnext! */
> > > > rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > > > --
> > > > 2.28.0.236.gb10cc79966-goog
> > > >
On Mon, Aug 10, 2020 at 10:54:34AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 10, 2020 at 01:31:09PM -0400, Joel Fernandes wrote:
> > Hi Paul,
> >
> > On Mon, Aug 10, 2020 at 09:19:45AM -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 07, 2020 at 01:07:21PM -0400, Joel Fernandes (Google) wrote:
> > > > RCU has had deadlocks in the past related to synchronizing in a hotplug
> > > > notifier. Typically, this has occurred because timer callbacks did not get
> > > > migrated before the CPU hotplug notifier requesting RCU's services is
> > > > called. If RCU's grace period processing has a timer callback queued in
> > > > the meanwhile, it may never get called causing RCU stalls.
> > > >
> > > > These issues have been fixed by removing such dependencies from grace
> > > > period processing, however there are no testing scenarios for such
> > > > cases.
> > > >
> > > > This commit therefore reuses rcutorture's existing hotplug notifier to
> > > > invoke the flavor-specific synchronize callback. If anything locks up,
> > > > we expect stall warnings and/or other splats.
> > > >
> > > > Obviously, we need not test for rcu_barrier from a notifier, since those
> > > > are not allowed from notifiers. This fact is already detailed in the
> > > > documentation as well.
> > > >
> > > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > >
> > > Given that rcutorture_booster_init() is invoked on the CPU in question
> > > only after it is up and running, and that (if I remember correctly)
> > > rcutorture_booster_cleanup() is invoked on the outgoing CPU before it
> > > has really started going away, would this code really have caught that
> > > timer/CPU-hotplug/RCU bug?
> >
> > You are right, it would not have caught that particular one because the timer
> > callbacks would have been migrated by the time the rcutorture_booster_init()
> > is called.
> >
> > I still thought it is a good idea anyway to test if the dynamic hotplug
> > notifiers don't have these issues.
> >
> > Did you have a better idea on how to test the timer/hotplug/rcu bug?
>
> My suggestion would be to place an rcutorture hook in all of the RCU
> notifiers that support blocking and that have some possibility of making
> this deadlock happen. There are some similar hooks in other parts of RCU.
Sure that's a good idea, I will look into it. Thanks!
- Joel
> Thanx, Paul
>
> > thanks,
> >
> > - Joel
> >
> >
> >
> > > > kernel/rcu/rcutorture.c | 81 +++++++++++++++++++++--------------------
> > > > 1 file changed, 42 insertions(+), 39 deletions(-)
> > > >
> > > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > > > index 92cb79620939..083b65e4877d 100644
> > > > --- a/kernel/rcu/rcutorture.c
> > > > +++ b/kernel/rcu/rcutorture.c
> > > > @@ -1645,12 +1645,37 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
> > > > read_exit_delay, read_exit_burst);
> > > > }
> > > >
> > > > -static int rcutorture_booster_cleanup(unsigned int cpu)
> > > > +static bool rcu_torture_can_boost(void)
> > > > +{
> > > > + static int boost_warn_once;
> > > > + int prio;
> > > > +
> > > > + if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
> > > > + return false;
> > > > +
> > > > + prio = rcu_get_gp_kthreads_prio();
> > > > + if (!prio)
> > > > + return false;
> > > > +
> > > > + if (prio < 2) {
> > > > + if (boost_warn_once == 1)
> > > > + return false;
> > > > +
> > > > + pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
> > > > + boost_warn_once = 1;
> > > > + return false;
> > > > + }
> > > > +
> > > > + return true;
> > > > +}
> > > > +
> > > > +static int rcutorture_hp_cleanup(unsigned int cpu)
> > > > {
> > > > struct task_struct *t;
> > > >
> > > > - if (boost_tasks[cpu] == NULL)
> > > > + if (!rcu_torture_can_boost() || boost_tasks[cpu] == NULL)
> > > > return 0;
> > > > +
> > > > mutex_lock(&boost_mutex);
> > > > t = boost_tasks[cpu];
> > > > boost_tasks[cpu] = NULL;
> > > > @@ -1662,11 +1687,14 @@ static int rcutorture_booster_cleanup(unsigned int cpu)
> > > > return 0;
> > > > }
> > > >
> > > > -static int rcutorture_booster_init(unsigned int cpu)
> > > > +static int rcutorture_hp_init(unsigned int cpu)
> > > > {
> > > > int retval;
> > > >
> > > > - if (boost_tasks[cpu] != NULL)
> > > > + /* Force synchronizing from hotplug notifier to ensure it is safe. */
> > > > + cur_ops->sync();
> > > > +
> > > > + if (!rcu_torture_can_boost() || boost_tasks[cpu] != NULL)
> > > > return 0; /* Already created, nothing more to do. */
> > > >
> > > > /* Don't allow time recalculation while creating a new task. */
> > > > @@ -2336,30 +2364,6 @@ static void rcu_torture_barrier_cleanup(void)
> > > > }
> > > > }
> > > >
> > > > -static bool rcu_torture_can_boost(void)
> > > > -{
> > > > - static int boost_warn_once;
> > > > - int prio;
> > > > -
> > > > - if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
> > > > - return false;
> > > > -
> > > > - prio = rcu_get_gp_kthreads_prio();
> > > > - if (!prio)
> > > > - return false;
> > > > -
> > > > - if (prio < 2) {
> > > > - if (boost_warn_once == 1)
> > > > - return false;
> > > > -
> > > > - pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
> > > > - boost_warn_once = 1;
> > > > - return false;
> > > > - }
> > > > -
> > > > - return true;
> > > > -}
> > > > -
> > > > static bool read_exit_child_stop;
> > > > static bool read_exit_child_stopped;
> > > > static wait_queue_head_t read_exit_wq;
> > > > @@ -2503,8 +2507,7 @@ rcu_torture_cleanup(void)
> > > > rcutorture_seq_diff(gp_seq, start_gp_seq));
> > > > torture_stop_kthread(rcu_torture_stats, stats_task);
> > > > torture_stop_kthread(rcu_torture_fqs, fqs_task);
> > > > - if (rcu_torture_can_boost())
> > > > - cpuhp_remove_state(rcutor_hp);
> > > > + cpuhp_remove_state(rcutor_hp);
> > > >
> > > > /*
> > > > * Wait for all RCU callbacks to fire, then do torture-type-specific
> > > > @@ -2773,21 +2776,21 @@ rcu_torture_init(void)
> > > > if (firsterr)
> > > > goto unwind;
> > > > }
> > > > +
> > > > + firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
> > > > + rcutorture_hp_init,
> > > > + rcutorture_hp_cleanup);
> > > > + if (firsterr < 0)
> > > > + goto unwind;
> > > > + rcutor_hp = firsterr;
> > > > +
> > > > if (test_boost_interval < 1)
> > > > test_boost_interval = 1;
> > > > if (test_boost_duration < 2)
> > > > test_boost_duration = 2;
> > > > - if (rcu_torture_can_boost()) {
> > > > -
> > > > + if (rcu_torture_can_boost())
> > > > boost_starttime = jiffies + test_boost_interval * HZ;
> > > >
> > > > - firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
> > > > - rcutorture_booster_init,
> > > > - rcutorture_booster_cleanup);
> > > > - if (firsterr < 0)
> > > > - goto unwind;
> > > > - rcutor_hp = firsterr;
> > > > - }
> > > > shutdown_jiffies = jiffies + shutdown_secs * HZ;
> > > > firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
> > > > if (firsterr)
> > > > --
> > > > 2.28.0.236.gb10cc79966-goog
> > > >
On Mon, Aug 10, 2020 at 03:25:54PM -0400, Joel Fernandes wrote:
> On Mon, Aug 10, 2020 at 10:57:17AM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 10, 2020 at 01:39:31PM -0400, Joel Fernandes wrote:
> > > On Mon, Aug 10, 2020 at 08:46:54AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Aug 07, 2020 at 01:07:18PM -0400, Joel Fernandes (Google) wrote:
> > > > > Currently, rcu_cpu_starting() checks to see if the RCU core expects a
> > > > > quiescent state from the incoming CPU. However, the current interaction
> > > > > between RCU quiescent-state reporting and CPU-hotplug operations should
> > > > > mean that the incoming CPU never needs to report a quiescent state.
> > > > > First, the outgoing CPU reports a quiescent state if needed. Second,
> > > > > the race where the CPU is leaving just as RCU is initializing a new
> > > > > grace period is handled by an explicit check for this condition. Third,
> > > > > the CPU's leaf rcu_node structure's ->lock serializes these checks.
> > > > >
> > > > > This means that if rcu_cpu_starting() ever feels the need to report
> > > > > a quiescent state, then there is a bug somewhere in the CPU hotplug
> > > > > code or the RCU grace-period handling code. This commit therefore
> > > > > adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
> > > > >
> > > > > Cc: Paul E. McKenney <[email protected]>
> > > > > Cc: Neeraj Upadhyay <[email protected]>
> > > > > Suggested-by: Paul E. McKenney <[email protected]>
> > > > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > > > > ---
> > > > > kernel/rcu/tree.c | 9 ++++++++-
> > > > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 65e1b5e92319..a49fa3b60faa 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -3996,7 +3996,14 @@ void rcu_cpu_starting(unsigned int cpu)
> > > > > rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
> > > > > rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> > > > > rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> > > > > - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > > > > +
> > > > > + /*
> > > > > + * XXX: The following rcu_report_qs_rnp() is redundant. If the below
> > > > > + * warning does not fire, consider replacing it with the "else" block,
> > > > > + * by June 2021 or so (while keeping the warning). Refer to RCU's
> > > > > + * Requirements documentation for the rationale.
> > > >
> > > > Let's suppose that this change is made, and further that in a year or
> > > > two the "if" statement below is replaced with its "else" block.
> > > >
> > > > Now let's suppose that (some years after that) a hard-to-trigger bug
> > > > makes its way into RCU's CPU-hotplug code that would have resulted in
> > > > the WARN_ON_ONCE() triggering, but that this bug turns out to be not so
> > > > hard to trigger in certain large production environments.
> > > >
> > > > Let's suppose further that you have moved on to where you are responsible
> > > > for one of these large production environments. How would this
> > > > hypothetical RCU/CPU-hotplug bug manifest?
> > >
> > > It could manifest as an RCU stall (after the warning triggers) since RCU
> > > would wait forever.
> > >
> > > Were you thinking it is not worth doing this? I thought we wanted to remove
> > > the reundant rcu_report_qs_rnp here to solidify everyone's understanding of
> > > the code and fail early if there's something misunderstood (since such
> > > misunderstanding could mean there are other hidden bugs somewhere). The
> > > counter-argument to that being, making the code robust is more important for
> > > the large production failure scenario where failures are costly.
> >
> > The benefits of removing code that is in theory redundant was my thought
> > at one point, but sleeping on this several times since has made me much
> > less favorable to this change. And perhaps my experiences with my new
> > employer have affected my views on this as well. You never know! ;-)
>
> Can we just keep the warning then, and delete the comments to revisit?
>
> IMHO a comment saying this rcu_report_qs_rnp() is not necessary here but is
> done anyway, would be quite useful to a code reader, (with appropriate
> comments to point to RCU requirements section and the added warning) :-)
Agreed, the warning does make sense.
Thanx, Paul
> thanks,
>
> - Joel
>
> >
> > Thanx, Paul
> >
> > > thanks,
> > >
> > > - Joel
> > >
> > >
> > > > Thanx, Paul
> > > >
> > > > > + */
> > > > > + if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
> > > > > rcu_disable_urgency_upon_qs(rdp);
> > > > > /* Report QS -after- changing ->qsmaskinitnext! */
> > > > > rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > > > > --
> > > > > 2.28.0.236.gb10cc79966-goog
> > > > >
On Fri, Aug 07, 2020 at 01:07:20PM -0400, Joel Fernandes (Google) wrote:
> The FQS loop detecting that an offline CPU has not yet reported a
> quiescent state, is a serious problem. The current interaction between
> RCU quiescent-state reporting and CPU-hotplug operations means that the
> FQS loop should never find that an offline CPU (by RCU's understanding
> of what is offline or not) has not yet reported a quiescent state.
>
> First, the outgoing CPU explicitly reports a quiescent state if needed
> in rcu_report_dead(). Second, the race where the CPU is leaving just as
> RCU is initializing a new grace period is handled by an explicit check
> for this condition in rcu_gp_init(). Third, the CPU's leaf rcu_node
> structure's ->lock serializes these modifications and checks.
>
> At the moment, the FQS loop detections this anomaly but only reports it
> after a second has elapsed.
>
> This commit therefore makes the warning more aggressive such that we fail
> immediately when the FQS loop scan happens and bring the problem to
> everyone's attention.
>
> Light testing with TREE03 and hotplug shows no warnings. Converted the
> warning as well to WARN_ON_ONCE() to reduce log spam.
>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
Looks good, queued, thank you!
I did a bit of editing as shown below. Please let me know if I messed
anything up.
Thanx, Paul
------------------------------------------------------------------------
commit 61b82b349d0089120a9705240ece6ecf2b176fd5
Author: Joel Fernandes (Google) <[email protected]>
Date: Fri Aug 7 13:07:20 2020 -0400
rcu: Make FQS more aggressive in complaining about offline CPUs
The RCU grace-period kthread's force-quiescent state (FQS) loop should
never see an offline CPU that has not yet reported a quiescent state.
After all, the offline CPU should have reported a quiescent state
during the CPU-offline process, or, failing that, by rcu_gp_init()
if it ran concurrently with either the CPU going offline or the last
task on a leaf rcu_node structure exiting its RCU read-side critical
section while all CPUs corresponding to that structure are offline.
The FQS loop should therefore complain if it does see an offline CPU
that has not yet reported a quiescent state.
And it does, but only once the grace period has been in force for a
full second. This commit therefore makes this warning more aggressive,
so that it will trigger as soon as the condition makes its appearance.
Light testing with TREE03 and hotplug shows no warnings. This commit
also converts the warning to WARN_ON_ONCE() in order to stave off possible
log spam.
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 59e1943..f79827b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1215,13 +1215,28 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
return 1;
}
- /* If waiting too long on an offline CPU, complain. */
- if (!(rdp->grpmask & rcu_rnp_online_cpus(rnp)) &&
- time_after(jiffies, rcu_state.gp_start + HZ)) {
+ /*
+ * Complain if a CPU that is considered to be offline from RCU's
+ * perspective has not yet reported a quiescent state. After all,
+ * the offline CPU should have reported a quiescent state during
+ * the CPU-offline process, or, failing that, by rcu_gp_init()
+ * if it ran concurrently with either the CPU going offline or the
+ * last task on a leaf rcu_node structure exiting its RCU read-side
+ * critical section while all CPUs corresponding to that structure
+ * are offline. This added warning detects bugs in any of these
+ * code paths.
+ *
+ * The rcu_node structure's ->lock is held here, which excludes
+ * the relevant portions the CPU-hotplug code, the grace-period
+ * initialization code, and the rcu_read_unlock() code paths.
+ *
+ * For more detail, please refer to the "Hotplug CPU" section
+ * of RCU's Requirements documentation.
+ */
+ if (WARN_ON_ONCE(!(rdp->grpmask & rcu_rnp_online_cpus(rnp)))) {
bool onl;
struct rcu_node *rnp1;
- WARN_ON(1); /* Offline CPUs are supposed to report QS! */
pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n",
__func__, rnp->grplo, rnp->grphi, rnp->level,
(long)rnp->gp_seq, (long)rnp->completedqs);