2022-05-09 03:58:24

by Zqiang

Subject: [PATCH] rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()

Currently, the rnp's cbovldmask is set in call_rcu(). When a CPU goes
offline, the outgoing CPU's callbacks are migrated to the target CPU, so
the number of callbacks on my_rdp may reach the overload level. If it
does, and there is no call_rcu() call on the target CPU for a long time,
the rnp's cbovldmask is not set in time. To fix this, add
check_cb_ovld_locked() to rcutree_migrate_callbacks() to help the CPU
reach quiescent states more quickly.

Signed-off-by: Zqiang <[email protected]>
---
 kernel/rcu/tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9dc4c4e82db6..bcc5876c9753 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4577,6 +4577,7 @@ void rcutree_migrate_callbacks(int cpu)
 	needwake = needwake || rcu_advance_cbs(my_rnp, my_rdp);
 	rcu_segcblist_disable(&rdp->cblist);
 	WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) != !rcu_segcblist_n_cbs(&my_rdp->cblist));
+	check_cb_ovld_locked(my_rdp, my_rnp);
 	if (rcu_rdp_is_offloaded(my_rdp)) {
 		raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
 		__call_rcu_nocb_wake(my_rdp, true, flags);
--
2.25.1



2022-05-09 04:14:50

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()

On Fri, May 06, 2022 at 12:43:35PM +0000, Zhang, Qiang1 wrote:
>
> On Fri, May 06, 2022 at 12:40:09AM +0000, Zhang, Qiang1 wrote:
> > On Thu, May 05, 2022 at 11:52:36PM +0800, Zqiang wrote:
> > > > Currently, the rnp's cbovldmask is set in call_rcu(). When a CPU goes
> > > > offline, the outgoing CPU's callbacks are migrated to the target CPU, so
> > > > the number of callbacks on my_rdp may reach the overload level. If it
> > > > does, and there is no call_rcu() call on the target CPU for a long time,
> > > > the rnp's cbovldmask is not set in time. To fix this, add
> > > > check_cb_ovld_locked() to rcutree_migrate_callbacks() to help the CPU
> > > > reach quiescent states more quickly.
> > >
> > > Signed-off-by: Zqiang <[email protected]>
> >
> > >Doesn't this get set right at the end of the current grace period?
> > >Given that there is a callback overload, there should be a grace period in progress.
> > >
> > >See this code in rcu_gp_cleanup():
> > >
> > > >	if (rcu_is_leaf_node(rnp))
> > > >		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
> > > >			rdp = per_cpu_ptr(&rcu_data, cpu);
> > > >			check_cb_ovld_locked(rdp, rnp);
> > > >		}
> > >
> > >So what am I missing here? Or are you planning to remove the above code?
> >
> > > We only check the already-overloaded rdp at the end of the current grace
> > > period. If my_rdp becomes overloaded because callbacks were migrated to
> > > it, and my_rdp->mynode's cbovldmask is empty, then my_rdp's overload
> > > might not be checked at the end of the current grace period.
> >
> >Very good!
> >
> > > I have another question: why don't we call check_cb_ovld_locked() when the rdp's n_cbs decreases?
> > > For example, call check_cb_ovld_locked() in rcu_do_batch() rather than at the end of the grace period.
>
> >The idea (as you noted above) is that it gets cleared at the end of each grace period. We could also clear it in rcu_do_batch() as you suggest, but to make that change you would need to convince me that the extra overhead and complexity would provide a useful benefit. This will not be easy. ;-)
>
> > >If so, wouldn't you also need to clear the indication for the CPU that is going offline, being careful to handle the case where the two CPUs have different leaf rcu_node structures?
> >
> > > Yes, the offline CPU's bit needs to be cleared.
> >
> >But again, the clearing happens at the end of the next grace period.
> >Here we lose (almost) nothing by leaving the bit set because the other bit is set as well.
> >
> >Another question, as long as we brought up rcu_do_batch().
> >
> >Why have the local variable "empty" given that the local variable "count"
> >could be checked against zero?
>
> Thanks for the reminder.
> I noticed that when RCU_NOCB_CPU and DEBUG_OBJECTS_RCU_HEAD are not enabled,
> a double call_rcu() will cause the rdp->cblist's len to increase even though
> the number of objects in the rdp->cblist has not actually changed. The
> WARN_ON_ONCE(!IS_ENABLED(CONFIG_RCU_NOCB_CPU) && count != 0 && empty)
> will then be triggered.

In this case, the system is probably dead anyway due to the callback being
reused. But good point, this is a case where the counts can diverge.

Let this be a lesson to you. Never invoke call_rcu() on an rcu_head
structure that is already queued waiting for a grace period to elapse. ;-)

> When RCU_NOCB_CPU is enabled, even without a double call_rcu(), some objects
> may be in the rdp->nocb_bypass list because of nocb bypass. This causes the
> count to be non-zero when the rdp->cblist list is empty.

Exactly! Very good!!!

Thanx, Paul

> >In the meantime, I have queued your commit for v5.20, thank you and good eyes! As always, I could not resist the urge to wordsmith the commit log, so could you please check it for errors?
>
> Thank you very much.
>
> > Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 5c36f04bd460246dd28c178ce5dce6fb02f898e1
> Author: Zqiang <[email protected]>
> Date: Thu May 5 23:52:36 2022 +0800
>
> rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()
>
> Currently, the rcu_node structure's ->cbovldmask field is set in call_rcu()
> when a given CPU is suffering from callback overload. But if that CPU
> goes offline, the outgoing CPU's callbacks are migrated to the running
> CPU, which is likely to overload the running CPU. However, that CPU's
> bit in its leaf rcu_node structure's ->cbovldmask field remains zero.
>
> Initially, this is OK because the outgoing CPU's bit remains set.
> However, that bit will be cleared at the next end of a grace period,
> at which time it is quite possible that the running CPU will still
> be overloaded. If the running CPU invokes call_rcu(), then overload
> will be checked for and the bit will be set. Except that there is no
> guarantee that the running CPU will invoke call_rcu(), in which case the
> next grace period will fail to take the running CPU's overload condition
> into account. Plus, because the bit is not set, the end of the grace
> period won't check for overload on this CPU.
>
> This commit therefore adds a call to check_cb_ovld_locked() in
> rcutree_migrate_callbacks() to set the running CPU's ->cbovldmask bit
> appropriately.
>
> Signed-off-by: Zqiang <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9dc4c4e82db62..bcc5876c9753b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4577,6 +4577,7 @@ void rcutree_migrate_callbacks(int cpu)
>  	needwake = needwake || rcu_advance_cbs(my_rnp, my_rdp);
>  	rcu_segcblist_disable(&rdp->cblist);
>  	WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) != !rcu_segcblist_n_cbs(&my_rdp->cblist));
> +	check_cb_ovld_locked(my_rdp, my_rnp);
>  	if (rcu_rdp_is_offloaded(my_rdp)) {
>  		raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
>  		__call_rcu_nocb_wake(my_rdp, true, flags);