2022-12-20 11:31:38

by Zqiang

Subject: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask

For kernels built with CONFIG_NO_HZ_FULL enabled, where the CPUs below are
nohz_full CPUs, the following race is possible:

CPU1                                           CPU2
rcu_report_exp_cpu_mult                        synchronize_rcu_expedited_wait
   acquires rnp->lock                             mask = rnp->expmask;
                                                  for_each_leaf_node_cpu_mask(rnp, cpu, mask)
   rnp->expmask = rnp->expmask & ~mask;              rdp = per_cpu_ptr(&rcu_data, cpu1);
   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
      rdp = per_cpu_ptr(&rcu_data, cpu1);
      if (!rdp->rcu_forced_tick_exp)
         continue;                                   rdp->rcu_forced_tick_exp = true;
                                                     tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);

In the above scenario, after CPU1 reports the quiescent state, the
opportunity to clear the TICK_DEP_BIT_RCU_EXP bit for CPU1 is missed;
the bit will not be cleared until the next expedited grace period starts
and CPU1's quiescent state is reported again. During this window, CPU1's
tick cannot be stopped. If CPU1 has only one runnable task and that task
has aggressive real-time response constraints, the task may suffer one
of the worst response times.
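
For reference, the clearing side of this race is rcu_report_exp_cpu_mult().
A lightly abridged sketch of its relevant portion in this era's tree_exp.h
(not a verbatim copy; consult the tree_exp.h of your kernel version):

static void rcu_report_exp_cpu_mult(struct rcu_node *rnp,
				    unsigned long mask, bool wake)
{
	int cpu;
	unsigned long flags;
	struct rcu_data *rdp;

	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	if (!(rnp->expmask & mask)) {
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
		return;
	}
	/* Clear the reported bits while holding rnp->lock. */
	WRITE_ONCE(rnp->expmask, rnp->expmask & ~mask);
	for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
		rdp = per_cpu_ptr(&rcu_data, cpu);
		if (!IS_ENABLED(CONFIG_NO_HZ_FULL) || !rdp->rcu_forced_tick_exp)
			continue;	/* Misses a concurrent lockless setter. */
		rdp->rcu_forced_tick_exp = false;
		tick_dep_clear_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
	}
	__rcu_report_exp_rnp(rnp, wake, flags); /* Releases rnp->lock. */
}

Because this clearing loop already runs entirely under rnp->lock, taking the
same lock around the setting side in synchronize_rcu_expedited_wait()
serializes the two sides and closes the window shown above.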

Therefore, this commit holds rnp->lock when setting the
TICK_DEP_BIT_RCU_EXP bit, thereby fixing this race.

Signed-off-by: Zqiang <[email protected]>
---
kernel/rcu/tree_exp.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 927abaf6c822..e5fe0099488b 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -593,6 +593,7 @@ static void synchronize_rcu_expedited_wait(void)
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root();
+	unsigned long flags;
 
 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
 	jiffies_stall = rcu_exp_jiffies_till_stall_check();
@@ -601,17 +602,17 @@ static void synchronize_rcu_expedited_wait(void)
 		if (synchronize_rcu_expedited_wait_once(1))
 			return;
 		rcu_for_each_leaf_node(rnp) {
+			raw_spin_lock_irqsave_rcu_node(rnp, flags);
 			mask = READ_ONCE(rnp->expmask);
 			for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
 				rdp = per_cpu_ptr(&rcu_data, cpu);
 				if (rdp->rcu_forced_tick_exp)
 					continue;
 				rdp->rcu_forced_tick_exp = true;
-				preempt_disable();
 				if (cpu_online(cpu))
 					tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
-				preempt_enable();
 			}
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		}
 		j = READ_ONCE(jiffies_till_first_fqs);
 		if (synchronize_rcu_expedited_wait_once(j + HZ))
--
2.25.1


2022-12-21 20:21:24

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask

On Tue, Dec 20, 2022 at 07:25:20PM +0800, Zqiang wrote:
> For kernels built with CONFIG_NO_HZ_FULL enabled, where the CPUs below are
> nohz_full CPUs, the following race is possible:
>
> CPU1                                           CPU2
> rcu_report_exp_cpu_mult                        synchronize_rcu_expedited_wait
>    acquires rnp->lock                             mask = rnp->expmask;
>                                                   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>    rnp->expmask = rnp->expmask & ~mask;              rdp = per_cpu_ptr(&rcu_data, cpu1);
>    for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>       rdp = per_cpu_ptr(&rcu_data, cpu1);
>       if (!rdp->rcu_forced_tick_exp)
>          continue;                                   rdp->rcu_forced_tick_exp = true;
>                                                      tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
>
> In the above scenario, after CPU1 reports the quiescent state, the
> opportunity to clear the TICK_DEP_BIT_RCU_EXP bit for CPU1 is missed;
> the bit will not be cleared until the next expedited grace period starts
> and CPU1's quiescent state is reported again. During this window, CPU1's
> tick cannot be stopped. If CPU1 has only one runnable task and that task
> has aggressive real-time response constraints, the task may suffer one
> of the worst response times.
>
> Therefore, this commit holds rnp->lock when setting the
> TICK_DEP_BIT_RCU_EXP bit, thereby fixing this race.
>
> Signed-off-by: Zqiang <[email protected]>

Good eyes, thank you!!!

Queued for testing and further review as follows. As always, please
check for errors.

Thanx, Paul

------------------------------------------------------------------------

commit acfe689f2e473fb59b6d2c95af5fe36198bb9a84
Author: Zqiang <[email protected]>
Date: Tue Dec 20 19:25:20 2022 +0800

rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race

For kernels built with CONFIG_NO_HZ_FULL=y, the following scenario can result
in the scheduling-clock interrupt remaining enabled on a holdout CPU after
its quiescent state has been reported:

CPU1                                           CPU2
rcu_report_exp_cpu_mult                        synchronize_rcu_expedited_wait
   acquires rnp->lock                             mask = rnp->expmask;
                                                  for_each_leaf_node_cpu_mask(rnp, cpu, mask)
   rnp->expmask = rnp->expmask & ~mask;              rdp = per_cpu_ptr(&rcu_data, cpu1);
   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
      rdp = per_cpu_ptr(&rcu_data, cpu1);
      if (!rdp->rcu_forced_tick_exp)
         continue;                                   rdp->rcu_forced_tick_exp = true;
                                                     tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);

The problem is that CPU2's sampling of rnp->expmask is obsolete by the
time it invokes tick_dep_set_cpu(), and CPU1 is not guaranteed to see
CPU2's store to ->rcu_forced_tick_exp in time to clear it. And even if
CPU1 does see that store, it might invoke tick_dep_clear_cpu() before
CPU2 got around to executing its tick_dep_set_cpu(), which would still
leave the victim CPU with its scheduler-clock tick running.

Either way, a nohz_full real-time application running on the victim
CPU would have its latency needlessly degraded.
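
To make the underlying pattern concrete, here is a hand-written LKMM-style
litmus sketch of the lockless interleaving (illustrative only, not from the
kernel tree; P0 plays the part of CPU2 and P1 that of CPU1):

C tick-dep-set-clear-race

{
	expmask=1;
	forced=0;
}

P0(int *expmask, int *forced)
{
	int r0;

	r0 = READ_ONCE(*expmask);	/* Sample ->expmask. */
	if (r0)
		WRITE_ONCE(*forced, 1);	/* Then mark ->rcu_forced_tick_exp. */
}

P1(int *expmask, int *forced)
{
	int r1;

	WRITE_ONCE(*expmask, 0);	/* Report the quiescent state. */
	r1 = READ_ONCE(*forced);	/* Then decide whether to clear the tick. */
}

exists (0:r0=1 /\ 1:r1=0)

The "exists" outcome, in which CPU2 commits to forcing the tick while CPU1
concludes there is nothing to clear, is reachable even under a sequentially
consistent interleaving, so no amount of added memory ordering can fix it;
mutual exclusion on the rcu_node structure's ->lock can.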

Note that expedited RCU grace periods look at context-tracking
information, and so if the CPU is executing in nohz_full usermode
throughout, that CPU cannot be victimized in this manner.

This commit therefore causes synchronize_rcu_expedited_wait() to hold
the rcu_node structure's ->lock when checking for holdout CPUs, setting
TICK_DEP_BIT_RCU_EXP, and invoking tick_dep_set_cpu(), thus preventing
this race.

Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 249c2967d9e6c..7cc4856da0817 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root();
+	unsigned long flags;
 
 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
 	jiffies_stall = rcu_exp_jiffies_till_stall_check();
@@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
 		if (synchronize_rcu_expedited_wait_once(1))
 			return;
 		rcu_for_each_leaf_node(rnp) {
+			raw_spin_lock_irqsave_rcu_node(rnp, flags);
 			mask = READ_ONCE(rnp->expmask);
 			for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
 				rdp = per_cpu_ptr(&rcu_data, cpu);
 				if (rdp->rcu_forced_tick_exp)
 					continue;
 				rdp->rcu_forced_tick_exp = true;
-				preempt_disable();
 				if (cpu_online(cpu))
 					tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
-				preempt_enable();
 			}
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		}
 		j = READ_ONCE(jiffies_till_first_fqs);
 		if (synchronize_rcu_expedited_wait_once(j + HZ))

2022-12-22 10:08:03

by Zqiang

Subject: RE: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask

> For kernels built with CONFIG_NO_HZ_FULL enabled, where the CPUs below are
> nohz_full CPUs, the following race is possible:
>
> CPU1                                           CPU2
> rcu_report_exp_cpu_mult                        synchronize_rcu_expedited_wait
>    acquires rnp->lock                             mask = rnp->expmask;
>                                                   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>    rnp->expmask = rnp->expmask & ~mask;              rdp = per_cpu_ptr(&rcu_data, cpu1);
>    for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>       rdp = per_cpu_ptr(&rcu_data, cpu1);
>       if (!rdp->rcu_forced_tick_exp)
>          continue;                                   rdp->rcu_forced_tick_exp = true;
>                                                      tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
>
> In the above scenario, after CPU1 reports the quiescent state, the
> opportunity to clear the TICK_DEP_BIT_RCU_EXP bit for CPU1 is missed;
> the bit will not be cleared until the next expedited grace period starts
> and CPU1's quiescent state is reported again. During this window, CPU1's
> tick cannot be stopped. If CPU1 has only one runnable task and that task
> has aggressive real-time response constraints, the task may suffer one
> of the worst response times.
>
> Therefore, this commit holds rnp->lock when setting the
> TICK_DEP_BIT_RCU_EXP bit, thereby fixing this race.
>
> Signed-off-by: Zqiang <[email protected]>
>
>Good eyes, thank you!!!
>
>Queued for testing and further review as follows. As always, please check for errors.
>

It looks much clearer now, thank you!

Thanks
Zqiang

> Thanx, Paul
>
>------------------------------------------------------------------------

commit acfe689f2e473fb59b6d2c95af5fe36198bb9a84
Author: Zqiang <[email protected]>
Date: Tue Dec 20 19:25:20 2022 +0800

rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race

For kernels built with CONFIG_NO_HZ_FULL=y, the following scenario can result
in the scheduling-clock interrupt remaining enabled on a holdout CPU after
its quiescent state has been reported:

CPU1                                           CPU2
rcu_report_exp_cpu_mult                        synchronize_rcu_expedited_wait
   acquires rnp->lock                             mask = rnp->expmask;
                                                  for_each_leaf_node_cpu_mask(rnp, cpu, mask)
   rnp->expmask = rnp->expmask & ~mask;              rdp = per_cpu_ptr(&rcu_data, cpu1);
   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
      rdp = per_cpu_ptr(&rcu_data, cpu1);
      if (!rdp->rcu_forced_tick_exp)
         continue;                                   rdp->rcu_forced_tick_exp = true;
                                                     tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);

The problem is that CPU2's sampling of rnp->expmask is obsolete by the
time it invokes tick_dep_set_cpu(), and CPU1 is not guaranteed to see
CPU2's store to ->rcu_forced_tick_exp in time to clear it. And even if
CPU1 does see that store, it might invoke tick_dep_clear_cpu() before
CPU2 got around to executing its tick_dep_set_cpu(), which would still
leave the victim CPU with its scheduler-clock tick running.

Either way, a nohz_full real-time application running on the victim
CPU would have its latency needlessly degraded.

Note that expedited RCU grace periods look at context-tracking
information, and so if the CPU is executing in nohz_full usermode
throughout, that CPU cannot be victimized in this manner.

This commit therefore causes synchronize_rcu_expedited_wait() to hold
the rcu_node structure's ->lock when checking for holdout CPUs, setting
TICK_DEP_BIT_RCU_EXP, and invoking tick_dep_set_cpu(), thus preventing
this race.

Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 249c2967d9e6c..7cc4856da0817 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root();
+	unsigned long flags;
 
 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
 	jiffies_stall = rcu_exp_jiffies_till_stall_check();
@@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
 		if (synchronize_rcu_expedited_wait_once(1))
 			return;
 		rcu_for_each_leaf_node(rnp) {
+			raw_spin_lock_irqsave_rcu_node(rnp, flags);
 			mask = READ_ONCE(rnp->expmask);
 			for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
 				rdp = per_cpu_ptr(&rcu_data, cpu);
 				if (rdp->rcu_forced_tick_exp)
 					continue;
 				rdp->rcu_forced_tick_exp = true;
-				preempt_disable();
 				if (cpu_online(cpu))
 					tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
-				preempt_enable();
 			}
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		}
 		j = READ_ONCE(jiffies_till_first_fqs);
 		if (synchronize_rcu_expedited_wait_once(j + HZ))

2022-12-22 15:52:48

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask

On Thu, Dec 22, 2022 at 09:48:14AM +0000, Zhang, Qiang1 wrote:
> > For kernels built with CONFIG_NO_HZ_FULL enabled, where the CPUs below are
> > nohz_full CPUs, the following race is possible:
> >
> > CPU1                                           CPU2
> > rcu_report_exp_cpu_mult                        synchronize_rcu_expedited_wait
> >    acquires rnp->lock                             mask = rnp->expmask;
> >                                                   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
> >    rnp->expmask = rnp->expmask & ~mask;              rdp = per_cpu_ptr(&rcu_data, cpu1);
> >    for_each_leaf_node_cpu_mask(rnp, cpu, mask)
> >       rdp = per_cpu_ptr(&rcu_data, cpu1);
> >       if (!rdp->rcu_forced_tick_exp)
> >          continue;                                   rdp->rcu_forced_tick_exp = true;
> >                                                      tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
> >
> > In the above scenario, after CPU1 reports the quiescent state, the
> > opportunity to clear the TICK_DEP_BIT_RCU_EXP bit for CPU1 is missed;
> > the bit will not be cleared until the next expedited grace period starts
> > and CPU1's quiescent state is reported again. During this window, CPU1's
> > tick cannot be stopped. If CPU1 has only one runnable task and that task
> > has aggressive real-time response constraints, the task may suffer one
> > of the worst response times.
> >
> > Therefore, this commit holds rnp->lock when setting the
> > TICK_DEP_BIT_RCU_EXP bit, thereby fixing this race.
> >
> > Signed-off-by: Zqiang <[email protected]>
> >
> >Good eyes, thank you!!!
> >
> >Queued for testing and further review as follows. As always, please check for errors.
> >
>
> It looks much clearer now, thank you!

Thank you for checking them both!

Thanx, Paul

> Thanks
> Zqiang
>
> > Thanx, Paul
> >
> >------------------------------------------------------------------------
>
> commit acfe689f2e473fb59b6d2c95af5fe36198bb9a84
> Author: Zqiang <[email protected]>
> Date: Tue Dec 20 19:25:20 2022 +0800
>
> rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
>
> For kernels built with CONFIG_NO_HZ_FULL=y, the following scenario can result
> in the scheduling-clock interrupt remaining enabled on a holdout CPU after
> its quiescent state has been reported:
>
> CPU1                                           CPU2
> rcu_report_exp_cpu_mult                        synchronize_rcu_expedited_wait
>    acquires rnp->lock                             mask = rnp->expmask;
>                                                   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>    rnp->expmask = rnp->expmask & ~mask;              rdp = per_cpu_ptr(&rcu_data, cpu1);
>    for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>       rdp = per_cpu_ptr(&rcu_data, cpu1);
>       if (!rdp->rcu_forced_tick_exp)
>          continue;                                   rdp->rcu_forced_tick_exp = true;
>                                                      tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
>
> The problem is that CPU2's sampling of rnp->expmask is obsolete by the
> time it invokes tick_dep_set_cpu(), and CPU1 is not guaranteed to see
> CPU2's store to ->rcu_forced_tick_exp in time to clear it. And even if
> CPU1 does see that store, it might invoke tick_dep_clear_cpu() before
> CPU2 got around to executing its tick_dep_set_cpu(), which would still
> leave the victim CPU with its scheduler-clock tick running.
>
> Either way, a nohz_full real-time application running on the victim
> CPU would have its latency needlessly degraded.
>
> Note that expedited RCU grace periods look at context-tracking
> information, and so if the CPU is executing in nohz_full usermode
> throughout, that CPU cannot be victimized in this manner.
>
> This commit therefore causes synchronize_rcu_expedited_wait() to hold
> the rcu_node structure's ->lock when checking for holdout CPUs, setting
> TICK_DEP_BIT_RCU_EXP, and invoking tick_dep_set_cpu(), thus preventing
> this race.
>
> Signed-off-by: Zqiang <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 249c2967d9e6c..7cc4856da0817 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
> 	struct rcu_data *rdp;
> 	struct rcu_node *rnp;
> 	struct rcu_node *rnp_root = rcu_get_root();
> +	unsigned long flags;
>
> 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
> 	jiffies_stall = rcu_exp_jiffies_till_stall_check();
> @@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
> 		if (synchronize_rcu_expedited_wait_once(1))
> 			return;
> 		rcu_for_each_leaf_node(rnp) {
> +			raw_spin_lock_irqsave_rcu_node(rnp, flags);
> 			mask = READ_ONCE(rnp->expmask);
> 			for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
> 				rdp = per_cpu_ptr(&rcu_data, cpu);
> 				if (rdp->rcu_forced_tick_exp)
> 					continue;
> 				rdp->rcu_forced_tick_exp = true;
> -				preempt_disable();
> 				if (cpu_online(cpu))
> 					tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
> -				preempt_enable();
> 			}
> +			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> 		}
> 		j = READ_ONCE(jiffies_till_first_fqs);
> 		if (synchronize_rcu_expedited_wait_once(j + HZ))

2022-12-31 18:29:17

by Frederic Weisbecker

Subject: Re: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask

On Wed, Dec 21, 2022 at 12:08:49PM -0800, Paul E. McKenney wrote:
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 249c2967d9e6c..7cc4856da0817 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
> 	struct rcu_data *rdp;
> 	struct rcu_node *rnp;
> 	struct rcu_node *rnp_root = rcu_get_root();
> +	unsigned long flags;
>
> 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
> 	jiffies_stall = rcu_exp_jiffies_till_stall_check();
> @@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
> 		if (synchronize_rcu_expedited_wait_once(1))
> 			return;
> 		rcu_for_each_leaf_node(rnp) {
> +			raw_spin_lock_irqsave_rcu_node(rnp, flags);
> 			mask = READ_ONCE(rnp->expmask);
> 			for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
> 				rdp = per_cpu_ptr(&rcu_data, cpu);
> 				if (rdp->rcu_forced_tick_exp)
> 					continue;
> 				rdp->rcu_forced_tick_exp = true;
> -				preempt_disable();
> 				if (cpu_online(cpu))
> 					tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
> -				preempt_enable();
> 			}
> +			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> 		}
> 		j = READ_ONCE(jiffies_till_first_fqs);
> 		if (synchronize_rcu_expedited_wait_once(j + HZ))

Reviewed-by: Frederic Weisbecker <[email protected]>

BTW why are we forcing the tick on the whole node?
And shouldn't we set the tick dependency from rcu_exp_handler() instead?

Thanks.

2022-12-31 19:30:12

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask

On Sat, Dec 31, 2022 at 07:25:08PM +0100, Frederic Weisbecker wrote:
> On Wed, Dec 21, 2022 at 12:08:49PM -0800, Paul E. McKenney wrote:
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 249c2967d9e6c..7cc4856da0817 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
> > 	struct rcu_data *rdp;
> > 	struct rcu_node *rnp;
> > 	struct rcu_node *rnp_root = rcu_get_root();
> > +	unsigned long flags;
> >
> > 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
> > 	jiffies_stall = rcu_exp_jiffies_till_stall_check();
> > @@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
> > 		if (synchronize_rcu_expedited_wait_once(1))
> > 			return;
> > 		rcu_for_each_leaf_node(rnp) {
> > +			raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > 			mask = READ_ONCE(rnp->expmask);
> > 			for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
> > 				rdp = per_cpu_ptr(&rcu_data, cpu);
> > 				if (rdp->rcu_forced_tick_exp)
> > 					continue;
> > 				rdp->rcu_forced_tick_exp = true;
> > -				preempt_disable();
> > 				if (cpu_online(cpu))
> > 					tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
> > -				preempt_enable();
> > 			}
> > +			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > 		}
> > 		j = READ_ONCE(jiffies_till_first_fqs);
> > 		if (synchronize_rcu_expedited_wait_once(j + HZ))
>
> Reviewed-by: Frederic Weisbecker <[email protected]>

Thank you!

> BTW why are we forcing the tick on the whole node?

Now that you mention it, that would be more precise.

> And shouldn't we set the tick dependency from rcu_exp_handler() instead?

Because it never occurred to me to check whether this could be invoked
from an interrupt handler? ;-)

But that does sound like it might be a better approach.

Zqiang, would you be willing to look into this?

Thanx, Paul

2023-01-01 10:23:54

by Zqiang

Subject: RE: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask

> > >On Sat, Dec 31, 2022 at 07:25:08PM +0100, Frederic Weisbecker wrote:
> On Wed, Dec 21, 2022 at 12:08:49PM -0800, Paul E. McKenney wrote:
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 249c2967d9e6c..7cc4856da0817 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
> > 	struct rcu_data *rdp;
> > 	struct rcu_node *rnp;
> > 	struct rcu_node *rnp_root = rcu_get_root();
> > +	unsigned long flags;
> >
> > 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
> > 	jiffies_stall = rcu_exp_jiffies_till_stall_check();
> > @@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
> > 		if (synchronize_rcu_expedited_wait_once(1))
> > 			return;
> > 		rcu_for_each_leaf_node(rnp) {
> > +			raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > 			mask = READ_ONCE(rnp->expmask);
> > 			for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
> > 				rdp = per_cpu_ptr(&rcu_data, cpu);
> > 				if (rdp->rcu_forced_tick_exp)
> > 					continue;
> > 				rdp->rcu_forced_tick_exp = true;
> > -				preempt_disable();
> > 				if (cpu_online(cpu))
> > 					tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
> > -				preempt_enable();
> > 			}
> > +			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > 		}
> > 		j = READ_ONCE(jiffies_till_first_fqs);
> > 		if (synchronize_rcu_expedited_wait_once(j + HZ))
>
> Reviewed-by: Frederic Weisbecker <[email protected]>
>
>Thank you!
>
> BTW why are we forcing the tick on the whole node?
>
>Now that you mention it, that would be more precise.
>
> And shouldn't we set the tick dependency from rcu_exp_handler() instead?
>
>Because it never occurred to me to check whether this could be invoked from an interrupt handler? ;-)
>
>But that does sound like it might be a better approach.
>
>Zqiang, would you be willing to look into this?


Yes, and I have a question. Currently the tick dependency is forced only when
the expedited grace period fails to end during the first wait, which means it
is not set on every expedited grace period. If we instead set the tick
dependency from rcu_exp_handler(), does that mean it will be set
unconditionally every time an expedited grace period starts?

Thoughts?

Thanks
Zqiang

>
> Thanx, Paul

2023-01-03 12:16:53

by Frederic Weisbecker

Subject: Re: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask

On Sun, Jan 01, 2023 at 09:41:57AM +0000, Zhang, Qiang1 wrote:
> > > >On Sat, Dec 31, 2022 at 07:25:08PM +0100, Frederic Weisbecker wrote:
>
> Yes, and I have a question. Currently the tick dependency is forced only when
> the expedited grace period fails to end during the first wait, which means it
> is not set on every expedited grace period. If we instead set the tick
> dependency from rcu_exp_handler(), does that mean it will be set
> unconditionally every time an expedited grace period starts?
>
> Thoughts?

Only if rcu_exp_handler() fails to report a quiescent state, in which case
we must poll the CPU, looking for a future one.

In fact the tick dependency should be set when rdp->cpu_no_qs.b.exp is set to
true and cleared when rdp->cpu_no_qs.b.exp is set to false.
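
A minimal sketch of that suggestion (hypothetical and untested, not a patch
from this thread; placement would need care around the CPU-online checks and
the preemptible/non-preemptible variants of rcu_exp_handler()):

	/* Where the handler finds it cannot report a quiescent state
	 * immediately, so the CPU must be polled later: */
	WRITE_ONCE(rdp->cpu_no_qs.b.exp, true);
	if (IS_ENABLED(CONFIG_NO_HZ_FULL))
		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU_EXP);

	/* And where cpu_no_qs.b.exp is cleared when the quiescent
	 * state is finally reported: */
	WRITE_ONCE(rdp->cpu_no_qs.b.exp, false);
	if (IS_ENABLED(CONFIG_NO_HZ_FULL))
		tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU_EXP);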

Thanks.

>
> Thanks
> Zqiang
>
> >
> > Thanx, Paul