2020-05-13 16:52:03

by Frederic Weisbecker

Subject: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

Allow a CPU's rdp to quit the callback offlined mode.
The switch happens on the target with IRQs disabled and rdp->nocb_lock
held to avoid races between local callbacks handling and kthread
offloaded callbacks handling.

nocb_cb kthread is first parked to avoid any future race with
concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
so that the nocb_gp kthread ignores this rdp.

Inspired-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Joel Fernandes <[email protected]>
---
include/linux/rcupdate.h | 2 ++
kernel/rcu/rcu_segcblist.c | 4 +--
kernel/rcu/rcu_segcblist.h | 2 +-
kernel/rcu/tree_plugin.h | 50 +++++++++++++++++++++++++++++++++++++-
4 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 2678a37c3169..1d3a4c37c3c1 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -96,8 +96,10 @@ static inline void rcu_user_exit(void) { }

#ifdef CONFIG_RCU_NOCB_CPU
void rcu_init_nohz(void);
+void rcu_nocb_cpu_deoffload(int cpu);
#else /* #ifdef CONFIG_RCU_NOCB_CPU */
static inline void rcu_init_nohz(void) { }
+static inline void rcu_nocb_cpu_deoffload(int cpu) { }
#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */

/**
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index d8ea2bef5574..4bed48da7702 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -171,9 +171,9 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
* Mark the specified rcu_segcblist structure as offloaded. This
* structure must be empty.
*/
-void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
+void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
{
- rsclp->offloaded = 1;
+ rsclp->offloaded = offload;
}
#endif

diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 4c1503a82492..8f7c6c34cb1b 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -83,7 +83,7 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
void rcu_segcblist_init(struct rcu_segcblist *rsclp);
void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
#ifdef CONFIG_RCU_NOCB_CPU
-void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
+void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
#endif
bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index f19e81e0c691..c74a4df8d5f2 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1943,6 +1943,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+ if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
+ raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
+ continue;
+ }
bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
if (bypass_ncbs &&
(time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
@@ -2176,6 +2180,50 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
do_nocb_deferred_wakeup_common(rdp);
}

+static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
+{
+ unsigned long flags;
+ struct rcu_node *rnp = rdp->mynode;
+
+ printk("De-offloading %d\n", rdp->cpu);
+ kthread_park(rdp->nocb_cb_kthread);
+
+ raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+ rcu_nocb_flush_bypass(rdp, NULL, jiffies);
+ raw_spin_lock_rcu_node(rnp);
+ rcu_segcblist_offload(&rdp->cblist, false);
+ raw_spin_unlock_rcu_node(rnp);
+ raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
+}
+
+static long rcu_nocb_rdp_deoffload(void *arg)
+{
+ struct rcu_data *rdp = arg;
+
+ WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
+ __rcu_nocb_rdp_deoffload(rdp);
+
+ return 0;
+}
+
+void rcu_nocb_cpu_deoffload(int cpu)
+{
+ struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
+ mutex_lock(&rcu_state.barrier_mutex);
+ cpus_read_lock();
+ if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
+ if (cpu_online(cpu)) {
+ work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
+ } else {
+ __rcu_nocb_rdp_deoffload(rdp);
+ }
+ cpumask_clear_cpu(cpu, rcu_nocb_mask);
+ }
+ cpus_read_unlock();
+ mutex_unlock(&rcu_state.barrier_mutex);
+}
+
void __init rcu_init_nohz(void)
{
int cpu;
@@ -2218,7 +2266,7 @@ void __init rcu_init_nohz(void)
rdp = per_cpu_ptr(&rcu_data, cpu);
if (rcu_segcblist_empty(&rdp->cblist))
rcu_segcblist_init(&rdp->cblist);
- rcu_segcblist_offload(&rdp->cblist);
+ rcu_segcblist_offload(&rdp->cblist, true);
}
rcu_organize_nocb_kthreads();
}
--
2.25.0


2020-05-13 20:53:18

by Paul E. McKenney

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> Allow a CPU's rdp to quit the callback offlined mode.
> The switch happens on the target with IRQs disabled and rdp->nocb_lock
> held to avoid races between local callbacks handling and kthread
> offloaded callbacks handling.
>
> nocb_cb kthread is first parked to avoid any future race with
> concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> so that the nocb_gp kthread ignores this rdp.
>
> Inspired-by: Paul E. McKenney <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Lai Jiangshan <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> ---
> include/linux/rcupdate.h | 2 ++
> kernel/rcu/rcu_segcblist.c | 4 +--
> kernel/rcu/rcu_segcblist.h | 2 +-
> kernel/rcu/tree_plugin.h | 50 +++++++++++++++++++++++++++++++++++++-
> 4 files changed, 54 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 2678a37c3169..1d3a4c37c3c1 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -96,8 +96,10 @@ static inline void rcu_user_exit(void) { }
>
> #ifdef CONFIG_RCU_NOCB_CPU
> void rcu_init_nohz(void);
> +void rcu_nocb_cpu_deoffload(int cpu);
> #else /* #ifdef CONFIG_RCU_NOCB_CPU */
> static inline void rcu_init_nohz(void) { }
> +static inline void rcu_nocb_cpu_deoffload(int cpu) { }
> #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
>
> /**
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index d8ea2bef5574..4bed48da7702 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -171,9 +171,9 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
> * Mark the specified rcu_segcblist structure as offloaded. This
> * structure must be empty.
> */
> -void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
> +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
> {
> - rsclp->offloaded = 1;
> + rsclp->offloaded = offload;
> }
> #endif
>
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 4c1503a82492..8f7c6c34cb1b 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -83,7 +83,7 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
> void rcu_segcblist_init(struct rcu_segcblist *rsclp);
> void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
> #ifdef CONFIG_RCU_NOCB_CPU
> -void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
> #endif
> bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
> bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index f19e81e0c691..c74a4df8d5f2 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1943,6 +1943,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> + if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
> + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> + continue;
> + }
> bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> if (bypass_ncbs &&
> (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> @@ -2176,6 +2180,50 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
> do_nocb_deferred_wakeup_common(rdp);
> }
>
> +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> +{
> + unsigned long flags;
> + struct rcu_node *rnp = rdp->mynode;
> +
> + printk("De-offloading %d\n", rdp->cpu);
> + kthread_park(rdp->nocb_cb_kthread);

I am a bit concerned about this, because from this point until the
end of this function, no RCU callbacks can be invoked for this CPU.
This could be a problem if there is a callback flood in progress, and
such callback floods are in fact one reason that the sysadm might want
to be switching from offloaded to non-offloaded. Is it possible to
move this kthread_park() to the end of this function? Yes, this can
result in concurrent invocation of different callbacks for this CPU,
but because we have excluded rcu_barrier(), that should be OK.

Or is there some failure mode that I am failing to see? (Wouldn't
be the first time...)

> + raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> + rcu_nocb_flush_bypass(rdp, NULL, jiffies);
> + raw_spin_lock_rcu_node(rnp);
> + rcu_segcblist_offload(&rdp->cblist, false);
> + raw_spin_unlock_rcu_node(rnp);
> + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> +}
> +
> +static long rcu_nocb_rdp_deoffload(void *arg)
> +{
> + struct rcu_data *rdp = arg;
> +
> + WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> + __rcu_nocb_rdp_deoffload(rdp);
> +
> + return 0;
> +}

For example, is the problem caused by invocations of this
rcu_nocb_rdp_deoffload() function? But if so, do we really need to
acquire rcu_state.barrier_mutex?

That aside, is it possible to do the switch without interrupting
callback invocation? Or is your idea that because we are always executing
on the CPU being deoffloaded, that CPU has been prevented from posting
callbacks in any case? If the latter, does that mean that it is not
possible to deoffload offlined CPUs? (Not sure whether this restriction
is a real problem, but figured that I should ask.)

Thanx, Paul

> +void rcu_nocb_cpu_deoffload(int cpu)
> +{
> + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
> + mutex_lock(&rcu_state.barrier_mutex);
> + cpus_read_lock();
> + if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
> + if (cpu_online(cpu)) {
> + work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
> + } else {
> + __rcu_nocb_rdp_deoffload(rdp);
> + }
> + cpumask_clear_cpu(cpu, rcu_nocb_mask);
> + }
> + cpus_read_unlock();
> + mutex_unlock(&rcu_state.barrier_mutex);
> +}
> +
> void __init rcu_init_nohz(void)
> {
> int cpu;
> @@ -2218,7 +2266,7 @@ void __init rcu_init_nohz(void)
> rdp = per_cpu_ptr(&rcu_data, cpu);
> if (rcu_segcblist_empty(&rdp->cblist))
> rcu_segcblist_init(&rdp->cblist);
> - rcu_segcblist_offload(&rdp->cblist);
> + rcu_segcblist_offload(&rdp->cblist, true);
> }
> rcu_organize_nocb_kthreads();
> }
> --
> 2.25.0
>

2020-05-13 22:47:42

by Frederic Weisbecker

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Wed, May 13, 2020 at 11:38:31AM -0700, Paul E. McKenney wrote:
> On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > + unsigned long flags;
> > + struct rcu_node *rnp = rdp->mynode;
> > +
> > + printk("De-offloading %d\n", rdp->cpu);
> > + kthread_park(rdp->nocb_cb_kthread);
>
> I am a bit concerned about this, because from this point until the
> end of this function, no RCU callbacks can be invoked for this CPU.
> This could be a problem if there is a callback flood in progress, and
> such callback floods are in fact one reason that the sysadm might want
> to be switching from offloaded to non-offloaded. Is it possible to
> move this kthread_park() to the end of this function? Yes, this can
> result in concurrent invocation of different callbacks for this CPU,
> but because we have excluded rcu_barrier(), that should be OK.
>
> Or is there some failure mode that I am failing to see? (Wouldn't
> be the first time...)

Heh I actually worried about that. Ok the issue is that it leaves
a window where nocb_cb and local caller of rcu_do_batch() can
compete but the local caller doesn't lock the nocb_lock.

So there are two ways to solve that:

1) - set cblist->offloaded = 0 locally
- From the kthread while calling rcu_do_batch():
check the value of cblist->offloaded every time after
we call rcu_nocb_lock() and stop messing with the
cblist and return when we see cblist->offloaded == 0
- Allow to handle cblist locally without taking the nocb_lock
- Park kthread
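
To illustrate the check (just a sketch, not meant to be the final code):

        raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
        if (!rdp->cblist.offloaded) {
                /* De-offloaded under us: leave the cblist to local handling */
                raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
                return;
        }
        /* ...otherwise keep processing the cblist as usual... */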

But there I'm worried about races. Imagine we have:


Kthread                                     Local
--------                                    -------
rcu_do_batch() {
    rcu_nocb_lock()
    do stuff with cblist
    rcu_nocb_unlock()
                                            rcu_nocb_lock()
                                            set cblist->offloaded = 0
                                            rcu_nocb_unlock()
                                            ===> INT or preemption
                                            rcu_do_batch() {
                                                do stuff with cblist

Are we guaranteed that the Local CPU will see the updates from
the kthread while calling rcu_do_batch()? I would tend to say
yes but I'm not entirely sure...

Oh wait, that solution also implies that we can't re-enqueue
extracted callbacks if we spent too much time in threaded
rcu_do_batch(), as the cblist may have been offloaded while
we executed the extracted callbacks.

That's a lot of corner cases to handle, which is why I prefer
the other solution:

2) enum cblist_offloaded {
CBLIST_NOT_OFFLOADED,
CBLIST_(DE)OFFLOADING,
CBLIST_OFFLOADED
}

- Locally set cblist->offloaded = CBLIST_DEOFFLOADING
- From the kthread while calling rcu_do_batch(), do as
usual.
- Local CPU can call rcu_do_batch() and if it sees CBLIST_DEOFFLOADING,
rcu_nocb_lock() will take the lock.
- Park kthread
- Locally set cblist->offloaded = CBLIST_NOT_OFFLOADED
- Local calls to rcu_do_batch() won't take the lock anymore.
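
The idea being that rcu_segcblist_is_offloaded(), and thus the
rcu_nocb_lock() family, keeps treating the DEOFFLOADING state as
offloaded, something like (sketch only, naming still tentative):

        /*
         * Sketch: treat both OFFLOADED and (DE)OFFLOADING as "locked mode"
         * so that local rcu_do_batch() keeps taking nocb_lock until the
         * kthread is parked and we finally switch to CBLIST_NOT_OFFLOADED.
         */
        static bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
        {
                return READ_ONCE(rsclp->offloaded) != CBLIST_NOT_OFFLOADED;
        }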


> > +static long rcu_nocb_rdp_deoffload(void *arg)
> > +{
> > + struct rcu_data *rdp = arg;
> > +
> > + WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > + __rcu_nocb_rdp_deoffload(rdp);
> > +
> > + return 0;
> > +}
>
> For example, is the problem caused by invocations of this
> rcu_nocb_rdp_deoffload() function?

How so?

> But if so, do we really need to acquire rcu_state.barrier_mutex?

Indeed it was probably not needed if we parked the kthread before
anything, as we would have kept the callbacks ordering.

But now if we allow concurrent callbacks execution during the small
window, we'll need it.

>
> That aside, is it possible to do the switch without interrupting
> callback invocation? Or is your idea that because we are always executing
> on the CPU being deoffloaded, that CPU has been prevented from posting
> callbacks in any case?

No, in the tiny window between kthread_park() and the irqs being disabled,
the workqueue can be preempted and thus call_rcu() can be called anytime.

> If the latter, does that mean that it is not
> possible to deoffload offlined CPUs? (Not sure whether this restriction
> is a real problem, but figured that I should ask.)

Ah in the case of offlined CPUs I simply call the function directly from the CPU
that disables the nocb remotely. So we remotely park the kthread (that we
always do anyway) and set offlined.

Thanks.

2020-05-14 15:49:12

by Paul E. McKenney

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Thu, May 14, 2020 at 12:45:26AM +0200, Frederic Weisbecker wrote:
> On Wed, May 13, 2020 at 11:38:31AM -0700, Paul E. McKenney wrote:
> > On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> > > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > > +{
> > > + unsigned long flags;
> > > + struct rcu_node *rnp = rdp->mynode;
> > > +
> > > + printk("De-offloading %d\n", rdp->cpu);
> > > + kthread_park(rdp->nocb_cb_kthread);
> >
> > I am a bit concerned about this, because from this point until the
> > end of this function, no RCU callbacks can be invoked for this CPU.
> > This could be a problem if there is a callback flood in progress, and
> > such callback floods are in fact one reason that the sysadm might want
> > to be switching from offloaded to non-offloaded. Is it possible to
> > move this kthread_park() to the end of this function? Yes, this can
> > result in concurrent invocation of different callbacks for this CPU,
> > but because we have excluded rcu_barrier(), that should be OK.
> >
> > Or is there some failure mode that I am failing to see? (Wouldn't
> > be the first time...)
>
> Heh I actually worried about that. Ok the issue is that it leaves
> a window where nocb_cb and local caller of rcu_do_batch() can
> compete but the local caller doesn't lock the nocb_lock.

Indeed, my nightmare scenario involves some sort of preemption or
similar long delay at that point. Callbacks pile up, and then OOM!

> So there are two ways to solve that:
>
> 1) - set cblist->offloaded = 0 locally
> - From the kthread while calling rcu_do_batch():
> check the value of cblist->offloaded every time after
> we call rcu_nocb_lock() and stop messing with the
> cblist and return when we see cblist->offloaded == 0
> - Allow to handle cblist locally without taking the nocb_lock
> - Park kthread
>
> But there I'm worried about races. Imagine we have:
>
>
> Kthread                                     Local
> --------                                    -------
> rcu_do_batch() {
>     rcu_nocb_lock()
>     do stuff with cblist
>     rcu_nocb_unlock()
>                                             rcu_nocb_lock()
>                                             set cblist->offloaded = 0
>                                             rcu_nocb_unlock()
>                                             ===> INT or preemption
>                                             rcu_do_batch() {
>                                                 do stuff with cblist
>
> Are we guaranteed that the Local CPU will see the updates from
> the kthread while calling rcu_do_batch()? I would tend to say
> yes but I'm not entirely sure...
>
> Oh wait, that solution also implies that we can't re-enqueue
> extracted callbacks if we spent too much time in threaded
> rcu_do_batch(), as the cblist may have been offloaded while
> we executed the extracted callbacks.
>
> That's a lot of corner cases to handle, which is why I prefer
> the other solution:
>
> 2) enum cblist_offloaded {
> CBLIST_NOT_OFFLOADED,
> CBLIST_(DE)OFFLOADING,
> CBLIST_OFFLOADED
> }
>
> - Locally set cblist->offloaded = CBLIST_DEOFFLOADING
> - From the kthread while calling rcu_do_batch(), do as
> usual.
> - Local CPU can call rcu_do_batch() and if it sees CBLIST_DEOFFLOADING,
> rcu_nocb_lock() will take the lock.
> - Park kthread
> - Locally set cblist->offloaded = CBLIST_NOT_OFFLOADED
> - Local calls to rcu_do_batch() won't take the lock anymore.

This last seems best to me. The transition from CBLIST_NOT_OFFLOADED
to CBLIST_OFFLOADING of course needs to be on the CPU in question with
at least bh disabled. Probably best to be holding rcu_nocb_lock(),
but that might just be me being overly paranoid.

> > > +static long rcu_nocb_rdp_deoffload(void *arg)
> > > +{
> > > + struct rcu_data *rdp = arg;
> > > +
> > > + WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > > + __rcu_nocb_rdp_deoffload(rdp);
> > > +
> > > + return 0;
> > > +}
> >
> > For example, is the problem caused by invocations of this
> > rcu_nocb_rdp_deoffload() function?
>
> How so?

It looked to me like it wasn't excluding either rcu_barrier() or CPU
hotplug. It might also not have been pinning onto the CPU in question,
but that might just be me misremembering. Then again, I didn't see a
call to it, so maybe its callers set things up appropriately.

OK, I will bite... What is the purpose of rcu_nocb_rdp_deoffload()? ;-)

> > But if so, do we really need to acquire rcu_state.barrier_mutex?
>
> Indeed it was probably not needed if we parked the kthread before
> anything, as we would have kept the callbacks ordering.
>
> But now if we allow concurrent callbacks execution during the small
> window, we'll need it.

Agreed! And I do believe that concurrent callback execution will
prove better than a possibly indefinite gap in callback execution.

> > That aside, is it possible to do the switch without interrupting
> > callback invocation? Or is your idea that because we are always executing
> > on the CPU being deoffloaded, that CPU has been prevented from posting
> > callbacks in any case?
>
> No, in the tiny window between kthread_park() and the irqs being disabled,
> the workqueue can be preempted and thus call_rcu() can be called anytime.

Agreed! ;-)

> > If the latter, does that mean that it is not
> > possible to deoffload offlined CPUs? (Not sure whether this restriction
> > is a real problem, but figured that I should ask.)
>
> Ah in the case of offlined CPUs I simply call the function directly from the CPU
> that disables the nocb remotely. So we remotely park the kthread (that we
> always do anyway) and set offlined.

And the cpus_read_lock() in rcu_nocb_cpu_deoffload() prevents that CPU
from coming back online, so looks good!

Thanx, Paul

2020-05-14 23:30:51

by Frederic Weisbecker

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Thu, May 14, 2020 at 08:47:07AM -0700, Paul E. McKenney wrote:
> On Thu, May 14, 2020 at 12:45:26AM +0200, Frederic Weisbecker wrote:
> This last seems best to me. The transition from CBLIST_NOT_OFFLOADED
> to CBLIST_OFFLOADING of course needs to be on the CPU in question with
> at least bh disabled. Probably best to be holding rcu_nocb_lock(),
> but that might just be me being overly paranoid.

So that's in the case of offloading, right? Well, I don't think we'd
need to even disable bh nor lock nocb. We just need the current CPU
to see the local update of cblist->offloaded = CBLIST_OFFLOADING
before the kthread is unparked:

cblist->offloaded = CBLIST_OFFLOADING;
/* Make sure subsequent softirqs lock nocb */
barrier();
kthread_unpark(rdp->nocb_cb_kthread);

Now, although that guarantees that nocb_cb will see CBLIST_OFFLOADING
upon unparking, it's not guaranteed that the nocb_gp will see it on its
next round. Ok so eventually you're right, I should indeed lock nocb...
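
Something like this then (sketch only, exact ordering to be settled in
the next version):

        raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
        rdp->cblist.offloaded = CBLIST_OFFLOADING;
        raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
        /* nocb_gp takes nocb_lock on its next round, so it can't miss this */
        kthread_unpark(rdp->nocb_cb_kthread);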

>
> > > > +static long rcu_nocb_rdp_deoffload(void *arg)
> > > > +{
> > > > + struct rcu_data *rdp = arg;
> > > > +
> > > > + WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > > > + __rcu_nocb_rdp_deoffload(rdp);
> > > > +
> > > > + return 0;
> > > > +}
> > >
> > > For example, is the problem caused by invocations of this
> > > rcu_nocb_rdp_deoffload() function?
> >
> > How so?
>
> It looked to me like it wasn't excluding either rcu_barrier() or CPU
> hotplug. It might also not have been pinning onto the CPU in question,
> but that might just be me misremembering. Then again, I didn't see a
> call to it, so maybe its callers set things up appropriately.
>
> OK, I will bite... What is the purpose of rcu_nocb_rdp_deoffload()? ;-)

Ah it's called using work_on_cpu() which launches a workqueue on the
target and waits for completion. And that whole thing is protected
inside the barrier mutex and hotplug.

> Agreed! And I do believe that concurrent callback execution will
> prove better than a possibly indefinite gap in callback execution.

Mutual agreement! :-)

Thanks.

2020-05-14 23:36:22

by Paul E. McKenney

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Fri, May 15, 2020 at 12:30:23AM +0200, Frederic Weisbecker wrote:
> On Thu, May 14, 2020 at 08:47:07AM -0700, Paul E. McKenney wrote:
> > On Thu, May 14, 2020 at 12:45:26AM +0200, Frederic Weisbecker wrote:
> > This last seems best to me. The transition from CBLIST_NOT_OFFLOADED
> > to CBLIST_OFFLOADING of course needs to be on the CPU in question with
> > at least bh disabled. Probably best to be holding rcu_nocb_lock(),
> > but that might just be me being overly paranoid.
>
> So that's in the case of offloading, right? Well, I don't think we'd
> need to even disable bh nor lock nocb. We just need the current CPU
> to see the local update of cblist->offloaded = CBLIST_OFFLOADING
> before the kthread is unparked:
>
> cblist->offloaded = CBLIST_OFFLOADING;
> /* Make sure subsequent softirqs lock nocb */
> barrier();
> kthread_unpark(rdp->nocb_cb_kthread);
>
> Now, although that guarantees that nocb_cb will see CBLIST_OFFLOADING
> upon unparking, it's not guaranteed that the nocb_gp will see it on its
> next round. Ok so eventually you're right, I should indeed lock nocb...

I suspect that our future selves would hate us much less if we held
that lock. ;-)

> > > > > +static long rcu_nocb_rdp_deoffload(void *arg)
> > > > > +{
> > > > > + struct rcu_data *rdp = arg;
> > > > > +
> > > > > + WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > > > > + __rcu_nocb_rdp_deoffload(rdp);
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > >
> > > > For example, is the problem caused by invocations of this
> > > > rcu_nocb_rdp_deoffload() function?
> > >
> > > How so?
> >
> > It looked to me like it wasn't excluding either rcu_barrier() or CPU
> > hotplug. It might also not have been pinning onto the CPU in question,
> > but that might just be me misremembering. Then again, I didn't see a
> > call to it, so maybe its callers set things up appropriately.
> >
> > OK, I will bite... What is the purpose of rcu_nocb_rdp_deoffload()? ;-)
>
> Ah it's called using work_on_cpu() which launches a workqueue on the
> target and waits for completion. And that whole thing is protected
> inside the barrier mutex and hotplug.

Ah! Yet again, color me blind.

Thanx, Paul

> > Agreed! And I do believe that concurrent callback execution will
> > prove better than a possibly indefinite gap in callback execution.
>
> Mutual agreement! :-)
>
> Thanks.

2020-05-14 23:40:10

by Frederic Weisbecker

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Thu, May 14, 2020 at 03:47:35PM -0700, Paul E. McKenney wrote:
> On Fri, May 15, 2020 at 12:30:23AM +0200, Frederic Weisbecker wrote:
> > On Thu, May 14, 2020 at 08:47:07AM -0700, Paul E. McKenney wrote:
> > > On Thu, May 14, 2020 at 12:45:26AM +0200, Frederic Weisbecker wrote:
> > > This last seems best to me. The transition from CBLIST_NOT_OFFLOADED
> > > to CBLIST_OFFLOADING of course needs to be on the CPU in question with
> > > at least bh disabled. Probably best to be holding rcu_nocb_lock(),
> > > but that might just be me being overly paranoid.
> >
> > So that's in the case of offloading, right? Well, I don't think we'd
> > need to even disable bh nor lock nocb. We just need the current CPU
> > to see the local update of cblist->offloaded = CBLIST_OFFLOADING
> > before the kthread is unparked:
> >
> > cblist->offloaded = CBLIST_OFFLOADING;
> > /* Make sure subsequent softirqs lock nocb */
> > barrier();
> > kthread_unpark(rdp->nocb_cb_kthread);
> >
> > Now, although that guarantees that nocb_cb will see CBLIST_OFFLOADING
> > upon unparking, it's not guaranteed that the nocb_gp will see it on its
> > next round. Ok so eventually you're right, I should indeed lock nocb...
>
> I suspect that our future selves would hate us much less if we held
> that lock. ;-)

Also, taking the decision to hold that lock could teach a lesson to our
past selves. Win-win! Let us become that most welcome time bridge!

2020-05-27 00:55:02

by Joel Fernandes

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> Allow a CPU's rdp to quit the callback offlined mode.

nit: s/offlined/offloaded/ ?

> The switch happens on the target with IRQs disabled and rdp->nocb_lock
> held to avoid races between local callbacks handling and kthread
> offloaded callbacks handling.
> nocb_cb kthread is first parked to avoid any future race with
> concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> so that the nocb_gp kthread ignores this rdp.

nit: you mean cblist is set to non-offloaded mode right?

Also, could you clarify better the rcu_barrier bits in the changelog. I know
there's some issue if the cblist has both offloaded and non-offloaded
callbacks, but it would be good to clarify this here better IMHO.

[...]
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index d8ea2bef5574..4bed48da7702 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -171,9 +171,9 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
> * Mark the specified rcu_segcblist structure as offloaded. This
> * structure must be empty.
> */
> -void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
> +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
> {
> - rsclp->offloaded = 1;
> + rsclp->offloaded = offload;
> }
> #endif
>
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 4c1503a82492..8f7c6c34cb1b 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -83,7 +83,7 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
> void rcu_segcblist_init(struct rcu_segcblist *rsclp);
> void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
> #ifdef CONFIG_RCU_NOCB_CPU
> -void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
> #endif
> bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
> bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index f19e81e0c691..c74a4df8d5f2 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1943,6 +1943,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> + if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
> + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> + continue;
> + }
> bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> if (bypass_ncbs &&
> (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> @@ -2176,6 +2180,50 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
> do_nocb_deferred_wakeup_common(rdp);
> }
>
> +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> +{
> + unsigned long flags;
> + struct rcu_node *rnp = rdp->mynode;
> +
> + printk("De-offloading %d\n", rdp->cpu);

nit: s/printk/pr_debug/ ?

thanks,

- Joel

> + kthread_park(rdp->nocb_cb_kthread);
> +
> + raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> + rcu_nocb_flush_bypass(rdp, NULL, jiffies);
> + raw_spin_lock_rcu_node(rnp);
> + rcu_segcblist_offload(&rdp->cblist, false);
> + raw_spin_unlock_rcu_node(rnp);
> + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> +}
> +
> +static long rcu_nocb_rdp_deoffload(void *arg)
> +{
> + struct rcu_data *rdp = arg;
> +
> + WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> + __rcu_nocb_rdp_deoffload(rdp);
> +
> + return 0;
> +}
> +
> +void rcu_nocb_cpu_deoffload(int cpu)
> +{
> + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
> + mutex_lock(&rcu_state.barrier_mutex);
> + cpus_read_lock();
> + if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
> + if (cpu_online(cpu)) {
> + work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
> + } else {
> + __rcu_nocb_rdp_deoffload(rdp);
> + }
> + cpumask_clear_cpu(cpu, rcu_nocb_mask);
> + }
> + cpus_read_unlock();
> + mutex_unlock(&rcu_state.barrier_mutex);
> +}
> +
> void __init rcu_init_nohz(void)
> {
> int cpu;
> @@ -2218,7 +2266,7 @@ void __init rcu_init_nohz(void)
> rdp = per_cpu_ptr(&rcu_data, cpu);
> if (rcu_segcblist_empty(&rdp->cblist))
> rcu_segcblist_init(&rdp->cblist);
> - rcu_segcblist_offload(&rdp->cblist);
> + rcu_segcblist_offload(&rdp->cblist, true);
> }
> rcu_organize_nocb_kthreads();
> }
> --
> 2.25.0
>

2020-05-27 09:23:43

by Joel Fernandes

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:

> > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > held to avoid races between local callbacks handling and kthread
> > offloaded callbacks handling.
> > nocb_cb kthread is first parked to avoid any future race with
> > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > so that the nocb_gp kthread ignores this rdp.
>
> nit: you mean cblist is set to non-offloaded mode right?
>
> Also, could you clarify better the rcu_barrier bits in the changelog. I know
> there's some issue if the cblist has both offloaded and non-offloaded
> callbacks, but it would be good to clarify this here better IMHO.

And for archival purposes: rcu_barrier needs excluding here because it is
possible that for a brief period of time, the callback kthread has been
parked to do the mode-switch, and it could be executing a bunch of callbacks
when it was asked to park.

Meanwhile, more interrupts happen and more callbacks are queued which are now
executing in softirq. This ruins the ordering of callbacks that rcu_barrier
needs.

thanks,

- Joel


>
> [...]
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index d8ea2bef5574..4bed48da7702 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -171,9 +171,9 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
> > * Mark the specified rcu_segcblist structure as offloaded. This
> > * structure must be empty.
> > */
> > -void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
> > +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
> > {
> > - rsclp->offloaded = 1;
> > + rsclp->offloaded = offload;
> > }
> > #endif
> >
> > diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> > index 4c1503a82492..8f7c6c34cb1b 100644
> > --- a/kernel/rcu/rcu_segcblist.h
> > +++ b/kernel/rcu/rcu_segcblist.h
> > @@ -83,7 +83,7 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
> > void rcu_segcblist_init(struct rcu_segcblist *rsclp);
> > void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
> > #ifdef CONFIG_RCU_NOCB_CPU
> > -void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> > +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
> > #endif
> > bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
> > bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index f19e81e0c691..c74a4df8d5f2 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1943,6 +1943,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> > trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> > raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > + if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
> > + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > + continue;
> > + }
> > bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > if (bypass_ncbs &&
> > (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> > @@ -2176,6 +2180,50 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
> > do_nocb_deferred_wakeup_common(rdp);
> > }
> >
> > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > + unsigned long flags;
> > + struct rcu_node *rnp = rdp->mynode;
> > +
> > + printk("De-offloading %d\n", rdp->cpu);
>
> nit: s/printk/pr_debug/ ?
>
> thanks,
>
> - Joel
>
> > + kthread_park(rdp->nocb_cb_kthread);
> > +
> > + raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > + rcu_nocb_flush_bypass(rdp, NULL, jiffies);
> > + raw_spin_lock_rcu_node(rnp);
> > + rcu_segcblist_offload(&rdp->cblist, false);
> > + raw_spin_unlock_rcu_node(rnp);
> > + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > +}
> > +
> > +static long rcu_nocb_rdp_deoffload(void *arg)
> > +{
> > + struct rcu_data *rdp = arg;
> > +
> > + WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > + __rcu_nocb_rdp_deoffload(rdp);
> > +
> > + return 0;
> > +}
> > +
> > +void rcu_nocb_cpu_deoffload(int cpu)
> > +{
> > + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > +
> > + mutex_lock(&rcu_state.barrier_mutex);
> > + cpus_read_lock();
> > + if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
> > + if (cpu_online(cpu)) {
> > + work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
> > + } else {
> > + __rcu_nocb_rdp_deoffload(rdp);
> > + }
> > + cpumask_clear_cpu(cpu, rcu_nocb_mask);
> > + }
> > + cpus_read_unlock();
> > + mutex_unlock(&rcu_state.barrier_mutex);
> > +}
> > +
> > void __init rcu_init_nohz(void)
> > {
> > int cpu;
> > @@ -2218,7 +2266,7 @@ void __init rcu_init_nohz(void)
> > rdp = per_cpu_ptr(&rcu_data, cpu);
> > if (rcu_segcblist_empty(&rdp->cblist))
> > rcu_segcblist_init(&rdp->cblist);
> > - rcu_segcblist_offload(&rdp->cblist);
> > + rcu_segcblist_offload(&rdp->cblist, true);
> > }
> > rcu_organize_nocb_kthreads();
> > }
> > --
> > 2.25.0
> >

2020-06-04 13:15:13

by Frederic Weisbecker

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Tue, May 26, 2020 at 06:49:08PM -0400, Joel Fernandes wrote:
> On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:
>
> > > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > > held to avoid races between local callbacks handling and kthread
> > > offloaded callbacks handling.
> > > nocb_cb kthread is first parked to avoid any future race with
> > > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > > so that the nocb_gp kthread ignores this rdp.
> >
> > nit: you mean cblist is set to non-offloaded mode right?
> >
> > Also, could you clarify better the rcu_barrier bits in the changelog. I know
> > there's some issue if the cblist has both offloaded and non-offloaded
> > callbacks, but it would be good to clarify this here better IMHO.
>
> And for archival purposes: rcu_barrier needs excluding here because it is
> possible that for a brief period of time, the callback kthread has been
> parked to do the mode-switch, and it could be executing a bunch of callbacks
> when it was asked to park.
>
> Meanwhile, more interrupts happen and more callbacks are queued which are now
> executing in softirq. This ruins the ordering of callbacks that rcu_barrier
> needs.

I think in that case the callbacks would still be executed in order. We wait
for the kthread to park before switching to softirq callback execution.

Initially it was to avoid callback ordering issues but I don't recall
exactly which. Maybe it wasn't actually needed. But anyway I'll keep it
for the next version where, for a brief period of time, nocb kthread will
be able to compete with callback execution in softirq.

I'll clarify that in the changelog.

Thanks.

2020-06-04 13:17:59

by Frederic Weisbecker

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:
> On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> > Allow a CPU's rdp to quit the callback offlined mode.
>
> nit: s/offlined/offloaded/ ?

Oh, looks like I did that everywhere :)

>
> > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > held to avoid races between local callbacks handling and kthread
> > offloaded callbacks handling.
> > nocb_cb kthread is first parked to avoid any future race with
> > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > so that the nocb_gp kthread ignores this rdp.
>
> nit: you mean cblist is set to non-offloaded mode right?

Ah right!

> > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > +{
> > + unsigned long flags;
> > + struct rcu_node *rnp = rdp->mynode;
> > +
> > + printk("De-offloading %d\n", rdp->cpu);
>
> nit: s/printk/pr_debug/ ?

Ok.

Thanks!

2020-06-11 01:36:26

by Joel Fernandes

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Thu, Jun 04, 2020 at 03:10:30PM +0200, Frederic Weisbecker wrote:
> On Tue, May 26, 2020 at 06:49:08PM -0400, Joel Fernandes wrote:
> > On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:
> >
> > > > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > > > held to avoid races between local callbacks handling and kthread
> > > > offloaded callbacks handling.
> > > > nocb_cb kthread is first parked to avoid any future race with
> > > > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > > > so that the nocb_gp kthread ignores this rdp.
> > >
> > > nit: you mean cblist is set to non-offloaded mode right?
> > >
> > > Also, could you clarify better the rcu_barrier bits in the changelog. I know
> > > there's some issue if the cblist has both offloaded and non-offloaded
> > > callbacks, but it would be good to clarify this here better IMHO.
> >
> > And for archival purposes: rcu_barrier needs excluding here because it is
> > possible that for a brief period of time, the callback kthread has been
> > parked to do the mode-switch, and it could be executing a bunch of callbacks
> > when it was asked to park.
> >
> > Meanwhile, more interrupts happen and more callbacks are queued which are now
> > executing in softirq. This ruins the ordering of callbacks that rcu_barrier
> > needs.
>
> I think in that case the callbacks would still be executed in order. We wait
> for the kthread to park before switching to softirq callback execution.

Ah ok, you are parking the CB kthread after the no-cb CB's are already
invoked (that's when parkme() is called -- i.e. after rcu_do_batch() in the
CB kthread runs).

Yeah, I don't see the purpose of acquiring rcu_barrier mutex either now. Once
you park, all CBs should have been invoked by the nocb CB thread right?
kthread_park() waits for the thread to be parked before proceeding. And you
don't de-offload before it is parked.

> Initially it was to avoid callback ordering issues but I don't recall
> exactly which. Maybe it wasn't actually needed. But anyway I'll keep it
> for the next version where, for a brief period of time, nocb kthread will
> be able to compete with callback execution in softirq.

Which nocb kthread is competing? Do you mean GP or CB?

Either way, could you clarify how softirqs compete? Until the thread is
parked, you wouldn't de-offload. And once you de-offload, only then the
softirq would be executing callbacks. So at any point of time, it is
either the CB kthread executing CBs or the softirq executing CBs, not both.
Or did I miss something?

thanks,

- Joel


> I'll clarify that in the changelog.
>
> Thanks.

2020-06-11 17:05:55

by Paul E. McKenney

Subject: Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

On Wed, Jun 10, 2020 at 09:32:03PM -0400, Joel Fernandes wrote:
> On Thu, Jun 04, 2020 at 03:10:30PM +0200, Frederic Weisbecker wrote:
> > On Tue, May 26, 2020 at 06:49:08PM -0400, Joel Fernandes wrote:
> > > On Tue, May 26, 2020 at 05:20:17PM -0400, Joel Fernandes wrote:
> > >
> > > > > The switch happens on the target with IRQs disabled and rdp->nocb_lock
> > > > > held to avoid races between local callbacks handling and kthread
> > > > > offloaded callbacks handling.
> > > > > nocb_cb kthread is first parked to avoid any future race with
> > > > > concurrent rcu_do_batch() executions. Then the cblist is set to offloaded
> > > > > so that the nocb_gp kthread ignores this rdp.
> > > >
> > > > nit: you mean cblist is set to non-offloaded mode right?
> > > >
> > > > Also, could you clarify better the rcu_barrier bits in the changelog. I know
> > > > there's some issue if the cblist has both offloaded and non-offloaded
> > > > callbacks, but it would be good to clarify this here better IMHO.
> > >
> > > And for archival purposes: rcu_barrier needs excluding here because it is
> > > possible that for a brief period of time, the callback kthread has been
> > > parked to do the mode-switch, and it could be executing a bunch of callbacks
> > > when it was asked to park.
> > >
> > > Meanwhile, more interrupts happen and more callbacks are queued which are now
> > > executing in softirq. This ruins the ordering of callbacks that rcu_barrier
> > > needs.
> >
> > I think in that case the callbacks would still be executed in order. We wait
> > for the kthread to park before switching to softirq callback execution.
>
> Ah ok, you are parking the CB kthread after the no-cb CB's are already
> invoked (that's when parkme() is called -- i.e. after rcu_do_batch() in the
> CB kthread runs).
>
> Yeah, I don't see the purpose of acquiring rcu_barrier mutex either now. Once
> you park, all CBs should have been invoked by the nocb CB thread right?
> kthread_park() waits for the thread to be parked before proceeding. And you
> don't de-offload before it is parked.

We absolutely must execute callbacks out of order in order to avoid
OOM due to RCU callback floods. This is because if we don't execute
callbacks out of order, there will be a time when we are not executing
callbacks at all. If execution gets preempted at this point, it is
quite possibly game over due to OOM.

Thanx, Paul

> > Initially it was to avoid callback ordering issues but I don't recall
> > exactly which. Maybe it wasn't actually needed. But anyway I'll keep it
> > for the next version where, for a brief period of time, nocb kthread will
> > be able to compete with callback execution in softirq.
>
> Which nocb kthread is competing? Do you mean GP or CB?
>
> Either way, could you clarify how softirqs compete? Until the thread is
> parked, you wouldn't de-offload. And once you de-offload, only then the
> softirq would be executing callbacks. So at any point of time, it is
> either the CB kthread executing CBs or the softirq executing CBs, not both.
> Or did I miss something?
>
> thanks,
>
> - Joel
>
>
> > I'll clarify that in the changelog.
> >
> > Thanks.