2022-08-30 08:44:02

by Zqiang

Subject: [PATCH] rcu: Remove duplicate RCU exp QS report from rcu_report_dead()

For PREEMPT_RCU, an invocation of rcu_report_dead() means that the
outgoing CPU is about to be cleared from the leaf rcu_node structure's
mask and has no further need of RCU, so rcu_preempt_depth() always
returns zero in rcu_report_dead(). If the outgoing CPU's rcu_data
structure has cpu_no_qs.b.exp set, rcu_preempt_deferred_qs() will
invoke rcu_report_exp_rdp() to report the expedited QS.

For non-PREEMPT_RCU, rcu_preempt_deferred_qs() is equivalent to
rcu_report_exp_rdp().

Signed-off-by: Zqiang <[email protected]>
---
kernel/rcu/tree.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6bb8e72bc815..0ca21ac0f064 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4276,8 +4276,6 @@ void rcu_report_dead(unsigned int cpu)
 	// Do any dangling deferred wakeups.
 	do_nocb_deferred_wakeup(rdp);

-	/* QS for any half-done expedited grace period. */
-	rcu_report_exp_rdp(rdp);
 	rcu_preempt_deferred_qs(current);

 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
--
2.25.1


2022-08-30 09:38:46

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Remove duplicate RCU exp QS report from rcu_report_dead()

On Tue, Aug 30, 2022 at 04:31:51PM +0800, Zqiang wrote:
> For PREEMPT_RCU, the rcu_report_dead() is invoked means that the
> outgoing CPU mask is clear from leaf rcu_node and has no further
> need of RCU, so invoke rcu_preempt_depth() return value is always
> zero in rcu_report_dead(), if the current outgoing CPU rcu_data
> structure's cpu_no_qs.b.exp is true, the rcu_preempt_deferred_qs()
> will invoke rcu_report_exp_rdp() to report exp QS.
>
> for non-PREEMPT_RCU, the rcu_preempt_deferred_qs() is equivalent to
> rcu_report_exp_rdp().
>
> Signed-off-by: Zqiang <[email protected]>

Nice!

One question... Currently, for PREEMPT_RCU, the outgoing CPU silently
reports a quiescent state even if there was a bug that resulted in that
CPU still being in an RCU read-side critical section. With your change,
the outgoing CPU would silently refuse to report a quiescent state.

Is there something along the CPU-offline code path that already complains
about this situation? If not, I believe that the first WARN_ON_ONCE()
in rcu_implicit_dynticks_qs() would complain.

Could you please try this, just so we know what happens in this case?
One way of forcing this would be to do rcu_read_lock() just before the
call to rcu_report_dead(), though other diagnostics might require that
rcu_read_lock() to be earlier in the code.

Another question in both cases... There is a more subtle change where the
old code ignores rdp->cpu_no_qs.b.exp (thus invoking rcu_report_exp_rdp()
unconditionally) and the new code avoids invoking rcu_report_exp_rdp()
unless this is set. How does this interact with a new expedited
grace period that starts just as this CPU calls rcu_report_dead()?
The expedited grace-period code in __sync_rcu_exp_select_node_cpus()
is of special concern here.
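
The two questions above can be illustrated with a small user-space model. This is only a sketch, not kernel code: every name in it (model_rdp, model_report_exp(), model_deferred_qs(), model_old_path(), model_new_path()) is hypothetical, and the deferred-QS condition is reduced to what this thread states, namely that rcu_preempt_deferred_qs() reports the expedited QS only when cpu_no_qs.b.exp is set and rcu_preempt_depth() is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the per-CPU state discussed above. */
struct model_rdp {
	bool exp_qs_needed;	/* models rdp->cpu_no_qs.b.exp */
	int read_depth;		/* models rcu_preempt_depth() */
	int exp_reports;	/* number of expedited QS reports issued */
};

/* Models rcu_report_exp_rdp(): clear the flag and report unconditionally. */
static void model_report_exp(struct model_rdp *rdp)
{
	rdp->exp_qs_needed = false;
	rdp->exp_reports++;
}

/* Models rcu_preempt_deferred_qs() as described in this thread (assumption). */
static void model_deferred_qs(struct model_rdp *rdp)
{
	assert(rdp->read_depth >= 0);
	if (rdp->exp_qs_needed && rdp->read_depth == 0)
		model_report_exp(rdp);
}

/* Old rcu_report_dead() sequence: unconditional report, then deferred QS. */
static int model_old_path(struct model_rdp rdp)
{
	model_report_exp(&rdp);
	model_deferred_qs(&rdp);
	return rdp.exp_reports;
}

/* New sequence after the patch: deferred QS only. */
static int model_new_path(struct model_rdp rdp)
{
	model_deferred_qs(&rdp);
	return rdp.exp_reports;
}
```

In this model, both paths issue exactly one report when the flag is set and the read-side depth is zero. They differ in the two cases raised above: with the flag clear, or with a nonzero depth caused by a bug, model_old_path() still reports once while model_new_path() reports nothing.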

Thanx, Paul

> ---
> kernel/rcu/tree.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 6bb8e72bc815..0ca21ac0f064 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4276,8 +4276,6 @@ void rcu_report_dead(unsigned int cpu)
> // Do any dangling deferred wakeups.
> do_nocb_deferred_wakeup(rdp);
>
> - /* QS for any half-done expedited grace period. */
> - rcu_report_exp_rdp(rdp);
> rcu_preempt_deferred_qs(current);
>
> /* Remove outgoing CPU from mask in the leaf rcu_node structure. */
> --
> 2.25.1
>

2022-08-30 11:38:39

by Zqiang

Subject: RE: [PATCH] rcu: Remove duplicate RCU exp QS report from rcu_report_dead()

On Tue, Aug 30, 2022 at 04:31:51PM +0800, Zqiang wrote:
> For PREEMPT_RCU, the rcu_report_dead() is invoked means that the
> outgoing CPU mask is clear from leaf rcu_node and has no further
> need of RCU, so invoke rcu_preempt_depth() return value is always
> zero in rcu_report_dead(), if the current outgoing CPU rcu_data
> structure's cpu_no_qs.b.exp is true, the rcu_preempt_deferred_qs()
> will invoke rcu_report_exp_rdp() to report exp QS.
>
> for non-PREEMPT_RCU, the rcu_preempt_deferred_qs() is equivalent to
> rcu_report_exp_rdp().
>
> Signed-off-by: Zqiang <[email protected]>
>
>Nice!
>
>One question... Currently, for PREEMPT_RCU, the outgoing CPU silently
>reports a quiescent state even if there was a bug that resulted in that
>CPU still being in an RCU read-side critical section. With your change,
>the outgoing CPU would silently refuse to report a quiescent state.
>
>Is there something along the CPU-offline code path that already complains
>about this situation? If not, I believe that the first WARN_ON_ONCE()
>in rcu_implicit_dynticks_qs() would complain.

In the following code, the outgoing CPU reports a QS if (rnp->qsmask & mask) is nonzero,
which means that WARN_ON_ONCE(!rcu_rdp_cpu_online(rdp)) is not triggered.

	if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
		/* Report quiescent state -before- changing ->qsmaskinitnext! */
		rcu_disable_urgency_upon_qs(rdp);
		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
		raw_spin_lock_irqsave_rcu_node(rnp, flags);
	}
	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);

>
>Could you please try this, just so we know what happens in this case?
>One way of forcing this would be to do rcu_read_lock() just before the
>call to rcu_report_dead(), though other diagnostics might require that
>rcu_read_lock() to be earlier in the code.
>
>
>Another question in both cases... There is a more subtle change where the
>old code ignores rdp->cpu_no_qs.b.exp (thus invoking rcu_report_exp_rdp()
>unconditionally) and the new code avoids invoking rcu_report_exp_rdp()
>unless this is set. How does this interact with a new expedited
>grace period that starts just as this CPU calls rcu_report_dead()?

1. When a new expedited grace period starts just as this CPU calls rcu_report_dead(),
if this CPU's rcu_data structure does not yet have cpu_no_qs.b.exp set,
rcu_preempt_deferred_qs() will not call rcu_report_exp_rdp(). However, by the time
rcu_report_dead() is called, this CPU is already offline (cpu_is_offline() returns true for it).

2. In __sync_rcu_exp_select_node_cpus(), the call to smp_call_function_single() for this
CPU returns -ENXIO, after which (rnp->qsmaskinitnext & mask) and (rnp->expmask & mask) are checked.

3. If rcu_report_dead() has not yet cleared this CPU's bit from rnp->qsmaskinitnext at that point,
step 2 is repeated: smp_call_function_single() is called again, still returning -ENXIO,
and (rnp->qsmaskinitnext & mask) and (rnp->expmask & mask) are rechecked, until rcu_report_dead()
clears this CPU's bit from rnp->qsmaskinitnext.

Therefore, __sync_rcu_exp_select_node_cpus() will call rcu_report_exp_cpu_mult()
to report this offline CPU's expedited QS.
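
Steps 1 to 3 above can be sketched as a single-threaded user-space model. It is not the kernel implementation: the types and functions (model_node, model_cpu, model_send_ipi(), model_report_dead(), model_select_cpu()) are hypothetical, and the concurrent rcu_report_dead() is simulated by a countdown (retries_until_dead) rather than by a second thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the leaf rcu_node masks. */
struct model_node {
	unsigned long qsmaskinitnext;	/* CPUs RCU still considers online */
	unsigned long expmask;		/* CPUs blocking the expedited GP */
};

/* Hypothetical model of one target CPU. */
struct model_cpu {
	unsigned long mask;		/* this CPU's bit in the leaf masks */
	bool ipi_reachable;		/* false once the CPU is offline */
	int retries_until_dead;		/* retries before the modeled
					 * rcu_report_dead() clears the bit */
};

/* Models smp_call_function_single(): -ENXIO for an offline CPU. */
static int model_send_ipi(const struct model_cpu *cpu)
{
	return cpu->ipi_reachable ? 0 : -ENXIO;
}

/* Models the tail of rcu_report_dead(): clear the ->qsmaskinitnext bit. */
static void model_report_dead(struct model_node *rnp,
			      const struct model_cpu *cpu)
{
	rnp->qsmaskinitnext &= ~cpu->mask;
}

/*
 * Models the retry loop. Returns the mask whose expedited QS the caller
 * must report on the CPU's behalf, and stores the retry count in *retries.
 */
static unsigned long model_select_cpu(struct model_node *rnp,
				      struct model_cpu *cpu, int *retries)
{
	assert(cpu->mask != 0);
	*retries = 0;
	for (;;) {
		if (model_send_ipi(cpu) == 0)
			return 0;	/* CPU reports its own QS from the IPI. */

		/* IPI failed: raced with a CPU-hotplug operation. */
		if ((rnp->qsmaskinitnext & cpu->mask) &&
		    (rnp->expmask & cpu->mask)) {
			/* Still online as far as RCU knows, so retry. */
			(*retries)++;
			if (--cpu->retries_until_dead <= 0)
				model_report_dead(rnp, cpu);
			continue;
		}
		/* Really offline: report its QS if the GP still waits on it. */
		return rnp->expmask & cpu->mask;
	}
}
```

With an unreachable CPU whose bit is cleared after three retries, the model loops three times and then returns that CPU's mask, which corresponds to the mask handed to rcu_report_exp_cpu_mult() in the explanation above.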


Thanks
Zqiang

>The expedited grace-period code in __sync_rcu_exp_select_node_cpus()
>is of special concern here.
>
> Thanx, Paul

> ---
> kernel/rcu/tree.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 6bb8e72bc815..0ca21ac0f064 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4276,8 +4276,6 @@ void rcu_report_dead(unsigned int cpu)
> // Do any dangling deferred wakeups.
> do_nocb_deferred_wakeup(rdp);
>
> - /* QS for any half-done expedited grace period. */
> - rcu_report_exp_rdp(rdp);
> rcu_preempt_deferred_qs(current);
>
> /* Remove outgoing CPU from mask in the leaf rcu_node structure. */
> --
> 2.25.1
>

2022-08-31 17:17:23

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Remove duplicate RCU exp QS report from rcu_report_dead()

On Tue, Aug 30, 2022 at 11:30:39AM +0000, Zhang, Qiang1 wrote:
> On Tue, Aug 30, 2022 at 04:31:51PM +0800, Zqiang wrote:
> > For PREEMPT_RCU, the rcu_report_dead() is invoked means that the
> > outgoing CPU mask is clear from leaf rcu_node and has no further
> > need of RCU, so invoke rcu_preempt_depth() return value is always
> > zero in rcu_report_dead(), if the current outgoing CPU rcu_data
> > structure's cpu_no_qs.b.exp is true, the rcu_preempt_deferred_qs()
> > will invoke rcu_report_exp_rdp() to report exp QS.
> >
> > for non-PREEMPT_RCU, the rcu_preempt_deferred_qs() is equivalent to
> > rcu_report_exp_rdp().
> >
> > Signed-off-by: Zqiang <[email protected]>
> >
> >Nice!
> >
> >One question... Currently, for PREEMPT_RCU, the outgoing CPU silently
> >reports a quiescent state even if there was a bug that resulted in that
> >CPU still being in an RCU read-side critical section. With your change,
> >the outgoing CPU would silently refuse to report a quiescent state.
> >
> >Is there something along the CPU-offline code path that already complains
> >about this situation? If not, I believe that the first WARN_ON_ONCE()
> >in rcu_implicit_dynticks_qs() would complain.
>
> In the following code, the current CPU will report QS if (rnp->qsmask & mask) return true.
> it means that the WARN_ON_ONCE(!rcu_rdp_cpu_online(rdp)) is not trigger.
>
> if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
> /* Report quiescent state -before- changing ->qsmaskinitnext! */
> rcu_disable_urgency_upon_qs(rdp);
> rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> raw_spin_lock_irqsave_rcu_node(rnp, flags);
> }
> WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
>
> >
> >Could you please try this, just so we know what happens in this case?
> >One way of forcing this would be to do rcu_read_lock() just before the
> >call to rcu_report_dead(), though other diagnostics might require that
> >rcu_read_lock() to be earlier in the code.
> >
> >
> >Another question in both cases... There is a more subtle change where the
> >old code ignores rdp->cpu_no_qs.b.exp (thus invoking rcu_report_exp_rdp()
> >unconditionally) and the new code avoids invoking rcu_report_exp_rdp()
> >unless this is set. How does this interact with a new expedited
> >grace period that starts just as this CPU calls rcu_report_dead()?
>
> 1.When a new expedited grace period that starts just as this CPU call rcu_report_dead(),
> if in this time, this CPU rcu_data structure's cpu_no_qs.b.exp is not set true,
> the rcu_preempt_deferred_qs() will not call rcu_report_exp_rdp(). but when call
> rcu_report_dead(), this CPU have been offline(cpu_is_offline(this CPU) return true).
>
> 2.In __sync_rcu_exp_select_node_cpus(), invoke smp_call_function_single() for this
> CPU will return -ENXIO, and then check (rnp->qsmaskinitnext & mask) and (rnp->expmask & mask)
>
> 3.If in this time, the rcu_report_dead() has not yet clear CPU mask from rnp->qsmaskinitnext,
> we will redo 2 step, recall smp_call_function_single(), but is always return -ENXIO,
> and recheck (rnp->qsmaskinitnext & mask) and (rnp->expmask & mask), until rcu_report_dead()
> clear CPU mask from rnp->qsmaskinitnext.
>
> Therefore, the __sync_rcu_exp_select_node_cpus() will call rcu_report_exp_cpu_mult()
> to report this offline CPU's exp QS

Very good, thank you!

Please see below for the newly applied patch with the usual wordsmithing.
As always, please let me know if I messed anything up.

Thanx, Paul

------------------------------------------------------------------------

commit d51fbcdfb30dbe6e2e9c5874b9821f2c78092b2e
Author: Zqiang <[email protected]>
Date: Tue Aug 30 16:31:51 2022 +0800

rcu: Remove duplicate RCU exp QS report from rcu_report_dead()

The rcu_report_dead() function invokes rcu_report_exp_rdp() in order
to force an immediate expedited quiescent state on the outgoing
CPU, and then it invokes rcu_preempt_deferred_qs() to provide any
required deferred quiescent state of either sort. Because the call to
rcu_preempt_deferred_qs() provides the expedited RCU quiescent state if
requested, the call to rcu_report_exp_rdp() is potentially redundant.

One possible issue is a concurrent start of a new expedited RCU
grace period, but this situation is already handled correctly
by __sync_rcu_exp_select_node_cpus(). This function will detect
that the CPU is going offline via the error return from its call
to smp_call_function_single(). In that case, it will retry, and
eventually stop retrying due to rcu_report_dead() clearing the
->qsmaskinitnext bit corresponding to the target CPU. As a result,
__sync_rcu_exp_select_node_cpus() will report the necessary quiescent
state after dealing with any remaining CPU.

This change assumes that control does not enter rcu_report_dead() within
an RCU read-side critical section, but then again, the surviving call
to rcu_preempt_deferred_qs() has always made this assumption.

This commit therefore removes the call to rcu_report_exp_rdp(), thus
relying on rcu_preempt_deferred_qs() to handle both normal and expedited
quiescent states.

Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6bb8e72bc8151..0ca21ac0f0648 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4276,8 +4276,6 @@ void rcu_report_dead(unsigned int cpu)
 	// Do any dangling deferred wakeups.
 	do_nocb_deferred_wakeup(rdp);

-	/* QS for any half-done expedited grace period. */
-	rcu_report_exp_rdp(rdp);
 	rcu_preempt_deferred_qs(current);

 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */