2021-12-11 17:01:57

by Neeraj Upadhyay

Subject: [PATCH v2] rcu/nocb: Handle concurrent nocb kthreads creation

When multiple CPUs in the same nocb gp/cb group concurrently
come online, they might try to concurrently create the same
rcuog kthread. Fix this by using nocb gp CPU's spawn mutex to
provide mutual exclusion for the rcuog kthread creation code.

Signed-off-by: Neeraj Upadhyay <[email protected]>
---
Change in v2:
Fix missing mutex_unlock in nocb gp kthread creation err path.

kernel/rcu/tree.h | 2 ++
kernel/rcu/tree_nocb.h | 7 ++++++-
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4b889081f4f4..9815b7844e58 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -204,6 +204,8 @@ struct rcu_data {
 	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
 	struct timer_list nocb_timer;	/* Enforce finite deferral. */
 	unsigned long nocb_gp_adv_time;	/* Last call_rcu() CB adv (jiffies). */
+	struct mutex nocb_gp_kthread_mutex; /* Exclusion for nocb gp kthread */
+					    /* spawning */
 
 	/* The following fields are used by call_rcu, hence own cacheline. */
 	raw_spinlock_t nocb_bypass_lock ____cacheline_internodealigned_in_smp;
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index eeafb546a7a0..dd4753219dca 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1226,6 +1226,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 	raw_spin_lock_init(&rdp->nocb_gp_lock);
 	timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0);
 	rcu_cblist_init(&rdp->nocb_bypass);
+	mutex_init(&rdp->nocb_gp_kthread_mutex);
 }

/*
@@ -1248,13 +1249,17 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)

 	/* If we didn't spawn the GP kthread first, reorganize! */
 	rdp_gp = rdp->nocb_gp_rdp;
+	mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);
 	if (!rdp_gp->nocb_gp_kthread) {
 		t = kthread_run(rcu_nocb_gp_kthread, rdp_gp,
 				"rcuog/%d", rdp_gp->cpu);
-		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__))
+		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
+			mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
 			return;
+		}
 		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
 	}
+	mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
 
 	/* Spawn the kthread for this CPU. */
 	t = kthread_run(rcu_nocb_cb_kthread, rdp,
--
2.17.1



2021-12-13 08:18:39

by David Woodhouse

Subject: Re: [EXTERNAL] [PATCH v2] rcu/nocb: Handle concurrent nocb kthreads creation

On Sat, 2021-12-11 at 22:31 +0530, Neeraj Upadhyay wrote:
> When multiple CPUs in the same nocb gp/cb group concurrently
> come online, they might try to concurrently create the same
> rcuog kthread. Fix this by using nocb gp CPU's spawn mutex to
> provide mutual exclusion for the rcuog kthread creation code.
>
> Signed-off-by: Neeraj Upadhyay <[email protected]>
> ---
> Change in v2:
> Fix missing mutex_unlock in nocb gp kthread creation err path.

I think this ends up being not strictly necessary in the short term too
because we aren't currently planning to run rcutree_prepare_cpu()
concurrently anyway. But harmless and worth fixing in the longer term.

Although, if I've already added a mutex for adding the boost thread,
could we manage to use the *same* mutex instead of adding another one?

Acked-by: David Woodhouse <[email protected]>
> + mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> return;
> + }
> WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
> }
> + mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
>
> /* Spawn the kthread for this CPU. */

Some whitespace damage there.



2021-12-13 08:55:41

by Neeraj Upadhyay

Subject: Re: [PATCH v2] rcu/nocb: Handle concurrent nocb kthreads creation

Hi David,

Thanks for the review; some replies inline.

On 12/13/2021 1:48 PM, David Woodhouse wrote:
> On Sat, 2021-12-11 at 22:31 +0530, Neeraj Upadhyay wrote:
>> When multiple CPUs in the same nocb gp/cb group concurrently
>> come online, they might try to concurrently create the same
>> rcuog kthread. Fix this by using nocb gp CPU's spawn mutex to
>> provide mutual exclusion for the rcuog kthread creation code.
>>
>> Signed-off-by: Neeraj Upadhyay <[email protected]>
>> ---
>> Change in v2:
>> Fix missing mutex_unlock in nocb gp kthread creation err path.
>
> I think this ends up being not strictly necessary in the short term too
> because we aren't currently planning to run rcutree_prepare_cpu()
> concurrently anyway. But harmless and worth fixing in the longer term.
>
> Although, if I've already added a mutex for adding the boost thread,
> could we manage to use the *same* mutex instead of adding another one?
>

Let me think about it; the nocb-gp and nocb-cb kthreads are grouped
based on rcu_nocb_gp_stride, whereas boost kthreads are per rnp. So, I
need to see how we can use a common mutex for both.


> Acked-by: David Woodhouse <[email protected]>
>> + mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
>> return;
>> + }
>> WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
>> }
>> + mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
>>
>> /* Spawn the kthread for this CPU. */
>
> Some whitespace damage there.

Will fix in next version.

Thanks
Neeraj

>

2021-12-13 11:22:55

by Frederic Weisbecker

Subject: Re: [PATCH v2] rcu/nocb: Handle concurrent nocb kthreads creation

On Mon, Dec 13, 2021 at 02:25:30PM +0530, Neeraj Upadhyay wrote:
> Hi David,
>
> Thanks for the review; some replies inline.
>
> On 12/13/2021 1:48 PM, David Woodhouse wrote:
> > On Sat, 2021-12-11 at 22:31 +0530, Neeraj Upadhyay wrote:
> > > When multiple CPUs in the same nocb gp/cb group concurrently
> > > come online, they might try to concurrently create the same
> > > rcuog kthread. Fix this by using nocb gp CPU's spawn mutex to
> > > provide mutual exclusion for the rcuog kthread creation code.
> > >
> > > Signed-off-by: Neeraj Upadhyay <[email protected]>
> > > ---
> > > Change in v2:
> > > Fix missing mutex_unlock in nocb gp kthread creation err path.
> >
> > I think this ends up being not strictly necessary in the short term too
> > because we aren't currently planning to run rcutree_prepare_cpu()
> > concurrently anyway. But harmless and worth fixing in the longer term.
> >
> > Although, if I've already added a mutex for adding the boost thread,
> > could we manage to use the *same* mutex instead of adding another one?
> >
>
> Let me think about it; the nocb-gp and nocb-cb kthreads are grouped based on
> rcu_nocb_gp_stride, whereas boost kthreads are per rnp. So, I need to see
> how we can use a common mutex for both.
>
>
> > Acked-by: David Woodhouse <[email protected]>
> > > + mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> > > return;
> > > + }
> > > WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
> > > }
> > > + mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> > >
> > > /* Spawn the kthread for this CPU. */
> >
> > Some whitespace damage there.
>
> Will fix in next version.

I was about to ack the patch but, should we really add code that isn't going to
be necessary for a long while?

Thanks!


>
> Thanks
> Neeraj
>
> >

2021-12-13 11:29:07

by David Woodhouse

Subject: Re: [PATCH v2] rcu/nocb: Handle concurrent nocb kthreads creation

On Mon, 2021-12-13 at 12:22 +0100, Frederic Weisbecker wrote:
> I was about to ack the patch but, should we really add code that isn't going to
> be necessary for a long while?

Yeah, I'm torn on that. In this case it's harmless enough and it makes
the code reentrant in its own right instead of relying on the fact that
the cpuhp code won't invoke it multiple times in parallel. So I think
that's reasonable defensive programming.



2021-12-13 13:14:16

by Frederic Weisbecker

Subject: Re: [PATCH v2] rcu/nocb: Handle concurrent nocb kthreads creation

On Mon, Dec 13, 2021 at 11:28:45AM +0000, David Woodhouse wrote:
> On Mon, 2021-12-13 at 12:22 +0100, Frederic Weisbecker wrote:
> > I was about to ack the patch but, should we really add code that isn't going to
> > be necessary for a long while?
>
> Yeah, I'm torn on that. In this case it's harmless enough and it makes
> the code reentrant in its own right instead of relying on the fact that
> the cpuhp code won't invoke it multiple times in parallel. So I think
> that's reasonable defensive programming.
>

The thing is that RCU code is already quite complicated. Are we even sure
that we'll ever make CPU hotplug allow concurrent CPU onlining/offlining?

This will require much more thought and a new hotplug concurrency
infrastructure that we'll need to base RCU on. IMHO it's a bit early to handle
that in individual hotplug callbacks.

But anyway, let's see what Paul thinks about it...

2021-12-13 19:00:10

by Paul E. McKenney

Subject: Re: [PATCH v2] rcu/nocb: Handle concurrent nocb kthreads creation

On Mon, Dec 13, 2021 at 02:14:07PM +0100, Frederic Weisbecker wrote:
> On Mon, Dec 13, 2021 at 11:28:45AM +0000, David Woodhouse wrote:
> > On Mon, 2021-12-13 at 12:22 +0100, Frederic Weisbecker wrote:
> > > I was about to ack the patch but, should we really add code that isn't going to
> > > be necessary before a long while?
> >
> > Yeah, I'm torn on that. In this case it's harmless enough and it makes
> > the code reentrant in its own right instead of relying on the fact that
> > the cpuhp code won't invoke it multiple times in parallel. So I think
> > that's reasonable defensive programming.
>
> The thing is that RCU code is already quite complicated. Are we even sure
> that we'll ever make CPU hotplug allow concurrent CPU onlining/offlining?
>
> This will require much more thought and a new hotplug concurrency
> infrastructure that we'll need to base RCU on. IMHO it's a bit early to handle
> that in individual hotplug callbacks.
>
> But anyway, let's see what Paul thinks about it...

We need to at least think through parallelizing the various RCU
CPU-hotplug notifiers. For one thing, it might turn out to be necessary
to parallelize those notifiers, perhaps on some non-x86 architecture or
on some non-Amazon .config. For another thing, doing so might suggest
some simplifications, as has happened in my ongoing rcu_barrier() work.

My thought is to pull in the patches and at the very least leave a
tag recording them for later possible use.

Thanx, Paul