2020-03-13 23:32:22

by Joel Fernandes

[permalink] [raw]
Subject: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic

rcu_read_unlock() can incur an infrequent deadlock in
sched_core_balance(). Fix this by using the RCU-sched flavor instead.

This fixes the following spinlock recursion observed when testing the
core scheduling patches on PREEMPT=y kernel on ChromeOS:

[ 3.240891] BUG: spinlock recursion on CPU#2, swapper/2/0
[ 3.240900] lock: 0xffff9cd1eeb28e40, .magic: dead4ead, .owner: swapper/2/0, .owner_cpu: 2
[ 3.240905] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.22htcore #4
[ 3.240908] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.174.0 05/29/2018
[ 3.240910] Call Trace:
[ 3.240919] dump_stack+0x97/0xdb
[ 3.240924] ? spin_bug+0xa4/0xb1
[ 3.240927] do_raw_spin_lock+0x79/0x98
[ 3.240931] try_to_wake_up+0x367/0x61b
[ 3.240935] rcu_read_unlock_special+0xde/0x169
[ 3.240938] ? sched_core_balance+0xd9/0x11e
[ 3.240941] __rcu_read_unlock+0x48/0x4a
[ 3.240945] __balance_callback+0x50/0xa1
[ 3.240949] __schedule+0x55a/0x61e
[ 3.240952] schedule_idle+0x21/0x2d
[ 3.240956] do_idle+0x1d5/0x1f8
[ 3.240960] cpu_startup_entry+0x1d/0x1f
[ 3.240964] start_secondary+0x159/0x174
[ 3.240967] secondary_startup_64+0xa4/0xb0
[ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]

Cc: vpillai <[email protected]>
Cc: Aaron Lu <[email protected]>
Cc: Aubrey Li <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/sched/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3045bd50e249..037e8f2e2686 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4735,7 +4735,7 @@ static void sched_core_balance(struct rq *rq)
struct sched_domain *sd;
int cpu = cpu_of(rq);

- rcu_read_lock();
+ rcu_read_lock_sched();
raw_spin_unlock_irq(rq_lockp(rq));
for_each_domain(cpu, sd) {
if (!(sd->flags & SD_LOAD_BALANCE))
@@ -4748,7 +4748,7 @@ static void sched_core_balance(struct rq *rq)
break;
}
raw_spin_lock_irq(rq_lockp(rq));
- rcu_read_unlock();
+ rcu_read_unlock_sched();
}

static DEFINE_PER_CPU(struct callback_head, core_balance_head);
--
2.25.1.481.gfbce0eb801-goog


2020-03-14 00:31:44

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic

On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
> rcu_read_unlock() can incur an infrequent deadlock in
> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
>
> This fixes the following spinlock recursion observed when testing the
> core scheduling patches on PREEMPT=y kernel on ChromeOS:
>
> [ 3.240891] BUG: spinlock recursion on CPU#2, swapper/2/0
> [ 3.240900] lock: 0xffff9cd1eeb28e40, .magic: dead4ead, .owner: swapper/2/0, .owner_cpu: 2
> [ 3.240905] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.22htcore #4
> [ 3.240908] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.174.0 05/29/2018
> [ 3.240910] Call Trace:
> [ 3.240919] dump_stack+0x97/0xdb
> [ 3.240924] ? spin_bug+0xa4/0xb1
> [ 3.240927] do_raw_spin_lock+0x79/0x98
> [ 3.240931] try_to_wake_up+0x367/0x61b
> [ 3.240935] rcu_read_unlock_special+0xde/0x169
> [ 3.240938] ? sched_core_balance+0xd9/0x11e
> [ 3.240941] __rcu_read_unlock+0x48/0x4a
> [ 3.240945] __balance_callback+0x50/0xa1
> [ 3.240949] __schedule+0x55a/0x61e
> [ 3.240952] schedule_idle+0x21/0x2d
> [ 3.240956] do_idle+0x1d5/0x1f8
> [ 3.240960] cpu_startup_entry+0x1d/0x1f
> [ 3.240964] start_secondary+0x159/0x174
> [ 3.240967] secondary_startup_64+0xa4/0xb0
> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
>
> Cc: vpillai <[email protected]>
> Cc: Aaron Lu <[email protected]>
> Cc: Aubrey Li <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Joel Fernandes (Google) <[email protected]>

The original could indeed deadlock, and this would avoid that deadlock.
(The commit to solve this deadlock is sadly not yet in mainline.)

Acked-by: Paul E. McKenney <[email protected]>

> ---
> kernel/sched/core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3045bd50e249..037e8f2e2686 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4735,7 +4735,7 @@ static void sched_core_balance(struct rq *rq)
> struct sched_domain *sd;
> int cpu = cpu_of(rq);
>
> - rcu_read_lock();
> + rcu_read_lock_sched();
> raw_spin_unlock_irq(rq_lockp(rq));
> for_each_domain(cpu, sd) {
> if (!(sd->flags & SD_LOAD_BALANCE))
> @@ -4748,7 +4748,7 @@ static void sched_core_balance(struct rq *rq)
> break;
> }
> raw_spin_lock_irq(rq_lockp(rq));
> - rcu_read_unlock();
> + rcu_read_unlock_sched();
> }
>
> static DEFINE_PER_CPU(struct callback_head, core_balance_head);
> --
> 2.25.1.481.gfbce0eb801-goog
>

2020-03-23 06:59:29

by Li, Aubrey

[permalink] [raw]
Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic

On 2020/3/14 8:30, Paul E. McKenney wrote:
> On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
>> rcu_read_unlock() can incur an infrequent deadlock in
>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
>>
>> This fixes the following spinlock recursion observed when testing the
>> core scheduling patches on PREEMPT=y kernel on ChromeOS:
>>
>> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
>>
>
> The original could indeed deadlock, and this would avoid that deadlock.
> (The commit to solve this deadlock is sadly not yet in mainline.)
>
> Acked-by: Paul E. McKenney <[email protected]>

I saw this in dmesg with this patch, is it expected?

Thanks,
-Aubrey

[ 117.000905] =============================
[ 117.000907] WARNING: suspicious RCU usage
[ 117.000911] 5.5.7+ #160 Not tainted
[ 117.000913] -----------------------------
[ 117.000916] kernel/sched/core.c:4747 suspicious rcu_dereference_check() usage!
[ 117.000918]
other info that might help us debug this:

[ 117.000921]
rcu_scheduler_active = 2, debug_locks = 1
[ 117.000923] 1 lock held by swapper/52/0:
[ 117.000925] #0: ffffffff82670960 (rcu_read_lock_sched){....}, at: sched_core_balance+0x5/0x700
[ 117.000937]
stack backtrace:
[ 117.000940] CPU: 52 PID: 0 Comm: swapper/52 Kdump: loaded Not tainted 5.5.7+ #160
[ 117.000943] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0412.020920172159 02/09/2017
[ 117.000945] Call Trace:
[ 117.000955] dump_stack+0x86/0xcb
[ 117.000962] sched_core_balance+0x634/0x700
[ 117.000982] __balance_callback+0x49/0xa0
[ 117.000990] __schedule+0x1416/0x1620
[ 117.001000] ? lockdep_hardirqs_off+0xa0/0xe0
[ 117.001005] ? _raw_spin_unlock_irqrestore+0x41/0x70
[ 117.001024] schedule_idle+0x28/0x40
[ 117.001030] do_idle+0x17e/0x2a0
[ 117.001041] cpu_startup_entry+0x19/0x20
[ 117.001048] start_secondary+0x16c/0x1c0
[ 117.001055] secondary_startup_64+0xa4/0xb0

>
>> ---
>> kernel/sched/core.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 3045bd50e249..037e8f2e2686 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4735,7 +4735,7 @@ static void sched_core_balance(struct rq *rq)
>> struct sched_domain *sd;
>> int cpu = cpu_of(rq);
>>
>> - rcu_read_lock();
>> + rcu_read_lock_sched();
>> raw_spin_unlock_irq(rq_lockp(rq));
>> for_each_domain(cpu, sd) {
>> if (!(sd->flags & SD_LOAD_BALANCE))
>> @@ -4748,7 +4748,7 @@ static void sched_core_balance(struct rq *rq)
>> break;
>> }
>> raw_spin_lock_irq(rq_lockp(rq));
>> - rcu_read_unlock();
>> + rcu_read_unlock_sched();
>> }
>>
>> static DEFINE_PER_CPU(struct callback_head, core_balance_head);
>> --
>> 2.25.1.481.gfbce0eb801-goog
>>

2020-03-23 15:24:10

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic

On Mon, Mar 23, 2020 at 02:58:18PM +0800, Li, Aubrey wrote:
> On 2020/3/14 8:30, Paul E. McKenney wrote:
> > On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
> >> rcu_read_unlock() can incur an infrequent deadlock in
> >> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
> >>
> >> This fixes the following spinlock recursion observed when testing the
> >> core scheduling patches on PREEMPT=y kernel on ChromeOS:
> >>
> >> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
> >>
> >
> > The original could indeed deadlock, and this would avoid that deadlock.
> > (The commit to solve this deadlock is sadly not yet in mainline.)
> >
> > Acked-by: Paul E. McKenney <[email protected]>
>
> I saw this in dmesg with this patch, is it expected?
>
> [ 117.000905] =============================
> [ 117.000907] WARNING: suspicious RCU usage
> [ 117.000911] 5.5.7+ #160 Not tainted
> [ 117.000913] -----------------------------
> [ 117.000916] kernel/sched/core.c:4747 suspicious rcu_dereference_check() usage!
> [ 117.000918]
> other info that might help us debug this:

Sigh, this is because for_each_domain() expects rcu_read_lock(). From an RCU
PoV, the code is correct (warning doesn't cause any issue).

To silence warning, we could replace the rcu_read_lock_sched() in my patch with:
preempt_disable();
rcu_read_lock();

and replace the unlock with:

rcu_read_unlock();
preempt_enable();

That should both take care of both the warning and the scheduler-related
deadlock. Thoughts?

Does that fix the warning for you?

thanks,

- Joel

>
> [ 117.000921]
> rcu_scheduler_active = 2, debug_locks = 1
> [ 117.000923] 1 lock held by swapper/52/0:
> [ 117.000925] #0: ffffffff82670960 (rcu_read_lock_sched){....}, at: sched_core_balance+0x5/0x700
> [ 117.000937]
> stack backtrace:
> [ 117.000940] CPU: 52 PID: 0 Comm: swapper/52 Kdump: loaded Not tainted 5.5.7+ #160
> [ 117.000943] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0412.020920172159 02/09/2017
> [ 117.000945] Call Trace:
> [ 117.000955] dump_stack+0x86/0xcb
> [ 117.000962] sched_core_balance+0x634/0x700
> [ 117.000982] __balance_callback+0x49/0xa0
> [ 117.000990] __schedule+0x1416/0x1620
> [ 117.001000] ? lockdep_hardirqs_off+0xa0/0xe0
> [ 117.001005] ? _raw_spin_unlock_irqrestore+0x41/0x70
> [ 117.001024] schedule_idle+0x28/0x40
> [ 117.001030] do_idle+0x17e/0x2a0
> [ 117.001041] cpu_startup_entry+0x19/0x20
> [ 117.001048] start_secondary+0x16c/0x1c0
> [ 117.001055] secondary_startup_64+0xa4/0xb0
>
> >
> >> ---
> >> kernel/sched/core.c | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 3045bd50e249..037e8f2e2686 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -4735,7 +4735,7 @@ static void sched_core_balance(struct rq *rq)
> >> struct sched_domain *sd;
> >> int cpu = cpu_of(rq);
> >>
> >> - rcu_read_lock();
> >> + rcu_read_lock_sched();
> >> raw_spin_unlock_irq(rq_lockp(rq));
> >> for_each_domain(cpu, sd) {
> >> if (!(sd->flags & SD_LOAD_BALANCE))
> >> @@ -4748,7 +4748,7 @@ static void sched_core_balance(struct rq *rq)
> >> break;
> >> }
> >> raw_spin_lock_irq(rq_lockp(rq));
> >> - rcu_read_unlock();
> >> + rcu_read_unlock_sched();
> >> }
> >>
> >> static DEFINE_PER_CPU(struct callback_head, core_balance_head);
> >> --
> >> 2.25.1.481.gfbce0eb801-goog
> >>
>

2020-03-24 03:02:05

by Li, Aubrey

[permalink] [raw]
Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic

On 2020/3/23 23:21, Joel Fernandes wrote:
> On Mon, Mar 23, 2020 at 02:58:18PM +0800, Li, Aubrey wrote:
>> On 2020/3/14 8:30, Paul E. McKenney wrote:
>>> On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
>>>> rcu_read_unlock() can incur an infrequent deadlock in
>>>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
>>>>
>>>> This fixes the following spinlock recursion observed when testing the
>>>> core scheduling patches on PREEMPT=y kernel on ChromeOS:
>>>>
>>>> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
>>>>
>>>
>>> The original could indeed deadlock, and this would avoid that deadlock.
>>> (The commit to solve this deadlock is sadly not yet in mainline.)
>>>
>>> Acked-by: Paul E. McKenney <[email protected]>
>>
>> I saw this in dmesg with this patch, is it expected?
>>
>> [ 117.000905] =============================
>> [ 117.000907] WARNING: suspicious RCU usage
>> [ 117.000911] 5.5.7+ #160 Not tainted
>> [ 117.000913] -----------------------------
>> [ 117.000916] kernel/sched/core.c:4747 suspicious rcu_dereference_check() usage!
>> [ 117.000918]
>> other info that might help us debug this:
>
> Sigh, this is because for_each_domain() expects rcu_read_lock(). From an RCU
> PoV, the code is correct (warning doesn't cause any issue).
>
> To silence warning, we could replace the rcu_read_lock_sched() in my patch with:
> preempt_disable();
> rcu_read_lock();
>
> and replace the unlock with:
>
> rcu_read_unlock();
> preempt_enable();
>
> That should both take care of both the warning and the scheduler-related
> deadlock. Thoughts?
>

How about this?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a01df3e..7ff694e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4743,7 +4743,6 @@ static void sched_core_balance(struct rq *rq)
int cpu = cpu_of(rq);

rcu_read_lock();
- raw_spin_unlock_irq(rq_lockp(rq));
for_each_domain(cpu, sd) {
if (!(sd->flags & SD_LOAD_BALANCE))
break;
@@ -4754,7 +4753,6 @@ static void sched_core_balance(struct rq *rq)
if (steal_cookie_task(cpu, sd))
break;
}
- raw_spin_lock_irq(rq_lockp(rq));
rcu_read_unlock();
}

Thanks,
-Aubrey

2020-03-24 13:31:51

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic

On Tue, Mar 24, 2020 at 11:01:27AM +0800, Li, Aubrey wrote:
> On 2020/3/23 23:21, Joel Fernandes wrote:
> > On Mon, Mar 23, 2020 at 02:58:18PM +0800, Li, Aubrey wrote:
> >> On 2020/3/14 8:30, Paul E. McKenney wrote:
> >>> On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
> >>>> rcu_read_unlock() can incur an infrequent deadlock in
> >>>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
> >>>>
> >>>> This fixes the following spinlock recursion observed when testing the
> >>>> core scheduling patches on PREEMPT=y kernel on ChromeOS:
> >>>>
> >>>> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
> >>>>
> >>>
> >>> The original could indeed deadlock, and this would avoid that deadlock.
> >>> (The commit to solve this deadlock is sadly not yet in mainline.)
> >>>
> >>> Acked-by: Paul E. McKenney <[email protected]>
> >>
> >> I saw this in dmesg with this patch, is it expected?
> >>
> >> [ 117.000905] =============================
> >> [ 117.000907] WARNING: suspicious RCU usage
> >> [ 117.000911] 5.5.7+ #160 Not tainted
> >> [ 117.000913] -----------------------------
> >> [ 117.000916] kernel/sched/core.c:4747 suspicious rcu_dereference_check() usage!
> >> [ 117.000918]
> >> other info that might help us debug this:
> >
> > Sigh, this is because for_each_domain() expects rcu_read_lock(). From an RCU
> > PoV, the code is correct (warning doesn't cause any issue).
> >
> > To silence warning, we could replace the rcu_read_lock_sched() in my patch with:
> > preempt_disable();
> > rcu_read_lock();
> >
> > and replace the unlock with:
> >
> > rcu_read_unlock();
> > preempt_enable();
> >
> > That should both take care of both the warning and the scheduler-related
> > deadlock. Thoughts?
> >
>
> How about this?
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a01df3e..7ff694e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4743,7 +4743,6 @@ static void sched_core_balance(struct rq *rq)
> int cpu = cpu_of(rq);
>
> rcu_read_lock();
> - raw_spin_unlock_irq(rq_lockp(rq));
> for_each_domain(cpu, sd) {
> if (!(sd->flags & SD_LOAD_BALANCE))
> break;
> @@ -4754,7 +4753,6 @@ static void sched_core_balance(struct rq *rq)
> if (steal_cookie_task(cpu, sd))
> break;
> }
> - raw_spin_lock_irq(rq_lockp(rq));
> rcu_read_unlock();
> }

As an alternative, I am backporting the -rcu commit 2b5e19e20fc2 ("rcu:
Make rcu_read_unlock_special() safe for rq/pi locks") to v5.6-rc6 and
testing it. The other support for this is already in mainline. I just
now started testing it, and will let you know how it goes.

If that works for you, and if the bug you are looking to fix is also
in v5.5 or earlier, please let me know so that we can work out how to
deal with the older releases.

Thanx, Paul

2020-03-24 15:14:42

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic

On Tue, Mar 24, 2020 at 06:30:12AM -0700, Paul E. McKenney wrote:
> On Tue, Mar 24, 2020 at 11:01:27AM +0800, Li, Aubrey wrote:
> > On 2020/3/23 23:21, Joel Fernandes wrote:
> > > On Mon, Mar 23, 2020 at 02:58:18PM +0800, Li, Aubrey wrote:
> > >> On 2020/3/14 8:30, Paul E. McKenney wrote:
> > >>> On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
> > >>>> rcu_read_unlock() can incur an infrequent deadlock in
> > >>>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
> > >>>>
> > >>>> This fixes the following spinlock recursion observed when testing the
> > >>>> core scheduling patches on PREEMPT=y kernel on ChromeOS:
> > >>>>
> > >>>> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
> > >>>>
> > >>>
> > >>> The original could indeed deadlock, and this would avoid that deadlock.
> > >>> (The commit to solve this deadlock is sadly not yet in mainline.)
> > >>>
> > >>> Acked-by: Paul E. McKenney <[email protected]>
> > >>
> > >> I saw this in dmesg with this patch, is it expected?
> > >>
> > >> [ 117.000905] =============================
> > >> [ 117.000907] WARNING: suspicious RCU usage
> > >> [ 117.000911] 5.5.7+ #160 Not tainted
> > >> [ 117.000913] -----------------------------
> > >> [ 117.000916] kernel/sched/core.c:4747 suspicious rcu_dereference_check() usage!
> > >> [ 117.000918]
> > >> other info that might help us debug this:
> > >
> > > Sigh, this is because for_each_domain() expects rcu_read_lock(). From an RCU
> > > PoV, the code is correct (warning doesn't cause any issue).
> > >
> > > To silence warning, we could replace the rcu_read_lock_sched() in my patch with:
> > > preempt_disable();
> > > rcu_read_lock();
> > >
> > > and replace the unlock with:
> > >
> > > rcu_read_unlock();
> > > preempt_enable();
> > >
> > > That should both take care of both the warning and the scheduler-related
> > > deadlock. Thoughts?
> > >
> >
> > How about this?
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index a01df3e..7ff694e 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4743,7 +4743,6 @@ static void sched_core_balance(struct rq *rq)
> > int cpu = cpu_of(rq);
> >
> > rcu_read_lock();
> > - raw_spin_unlock_irq(rq_lockp(rq));
> > for_each_domain(cpu, sd) {
> > if (!(sd->flags & SD_LOAD_BALANCE))
> > break;
> > @@ -4754,7 +4753,6 @@ static void sched_core_balance(struct rq *rq)
> > if (steal_cookie_task(cpu, sd))
> > break;
> > }
> > - raw_spin_lock_irq(rq_lockp(rq));
> > rcu_read_unlock();
> > }
>
> As an alternative, I am backporting the -rcu commit 2b5e19e20fc2 ("rcu:
> Make rcu_read_unlock_special() safe for rq/pi locks") to v5.6-rc6 and
> testing it. The other support for this is already in mainline. I just
> now started testing it, and will let you know how it goes.
>
> If that works for you, and if the bug you are looking to fix is also
> in v5.5 or earlier, please let me know so that we can work out how to
> deal with the older releases.

And the backported patch below passes moderate rcutorture testing.
But the big question... Does it work for you?

Thanx, Paul

------------------------------------------------------------------------

commit 0ed0648e23f6aca2a6543fee011d74ab674e08e8
Author: Paul E. McKenney <[email protected]>
Date: Tue Mar 24 06:20:14 2020 -0700

rcu: Make rcu_read_unlock_special() safe for rq/pi locks

The scheduler is currently required to hold rq/pi locks across the entire
RCU read-side critical section or not at all. This is inconvenient and
leaves traps for the unwary, including the author of this commit.

But now that excessively ong grace periods enable scheduling-clock
interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in
rcu_read_unlock_special() can be dispensed with. In other words, the
rcu_read_unlock_special() function can refrain from doing wakeups unless
such wakeups are guaranteed safe.

This commit therefore avoids unsafe wakeups, freeing the scheduler to
hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU
read-side critical section might have been preempted.

This commit is inspired by a patch from Lai Jiangshan:
https://lore.kernel.org/lkml/[email protected]
This commit is further intended to be a step towards his goal of permitting
the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock().

Cc: Lai Jiangshan <[email protected]>
[ paulmck: Backported to v5.6-rc6. ]
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c6ea81cd4189..2e1bfa0cc393 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -614,18 +614,16 @@ static void rcu_read_unlock_special(struct task_struct *t)
struct rcu_node *rnp = rdp->mynode;

exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
- (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
- tick_nohz_full_cpu(rdp->cpu);
+ (rdp->grpmask & READ_ONCE(rnp->expmask));
// Need to defer quiescent state until everything is enabled.
- if (irqs_were_disabled && use_softirq &&
- (in_interrupt() ||
- (exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
- // Using softirq, safe to awaken, and we get
- // no help from enabling irqs, unlike bh/preempt.
+ if (use_softirq && (in_irq() || (exp && !irqs_were_disabled))) {
+ // Using softirq, safe to awaken, and either the
+ // wakeup is free or there is an expedited GP.
raise_softirq_irqoff(RCU_SOFTIRQ);
} else {
// Enabling BH or preempt does reschedule, so...
- // Also if no expediting or NO_HZ_FULL, slow is OK.
+ // Also if no expediting, slow is OK.
+ // Plus nohz_full CPUs eventually get tick enabled.
set_tsk_need_resched(current);
set_preempt_need_resched();
if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&

2020-03-24 18:51:18

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic

On Tue, Mar 24, 2020 at 11:01:27AM +0800, Li, Aubrey wrote:
> On 2020/3/23 23:21, Joel Fernandes wrote:
> > On Mon, Mar 23, 2020 at 02:58:18PM +0800, Li, Aubrey wrote:
> >> On 2020/3/14 8:30, Paul E. McKenney wrote:
> >>> On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
> >>>> rcu_read_unlock() can incur an infrequent deadlock in
> >>>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
> >>>>
> >>>> This fixes the following spinlock recursion observed when testing the
> >>>> core scheduling patches on PREEMPT=y kernel on ChromeOS:
> >>>>
> >>>> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
> >>>>
> >>>
> >>> The original could indeed deadlock, and this would avoid that deadlock.
> >>> (The commit to solve this deadlock is sadly not yet in mainline.)
> >>>
> >>> Acked-by: Paul E. McKenney <[email protected]>
> >>
> >> I saw this in dmesg with this patch, is it expected?
> >>
> >> [ 117.000905] =============================
> >> [ 117.000907] WARNING: suspicious RCU usage
> >> [ 117.000911] 5.5.7+ #160 Not tainted
> >> [ 117.000913] -----------------------------
> >> [ 117.000916] kernel/sched/core.c:4747 suspicious rcu_dereference_check() usage!
> >> [ 117.000918]
> >> other info that might help us debug this:
> >
> > Sigh, this is because for_each_domain() expects rcu_read_lock(). From an RCU
> > PoV, the code is correct (warning doesn't cause any issue).
> >
> > To silence warning, we could replace the rcu_read_lock_sched() in my patch with:
> > preempt_disable();
> > rcu_read_lock();
> >
> > and replace the unlock with:
> >
> > rcu_read_unlock();
> > preempt_enable();
> >
> > That should both take care of both the warning and the scheduler-related
> > deadlock. Thoughts?
> >
>
> How about this?
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a01df3e..7ff694e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4743,7 +4743,6 @@ static void sched_core_balance(struct rq *rq)
> int cpu = cpu_of(rq);
>
> rcu_read_lock();
> - raw_spin_unlock_irq(rq_lockp(rq));
> for_each_domain(cpu, sd) {
> if (!(sd->flags & SD_LOAD_BALANCE))
> break;
> @@ -4754,7 +4753,6 @@ static void sched_core_balance(struct rq *rq)
> if (steal_cookie_task(cpu, sd))
> break;
> }
> - raw_spin_lock_irq(rq_lockp(rq));

try_steal_cookie() does a double_rq_lock(). Would this change not deadlock
with that?

thanks,

- Joel

2020-03-25 00:40:45

by Li, Aubrey

[permalink] [raw]
Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic

On 2020/3/25 2:49, Joel Fernandes wrote:
> On Tue, Mar 24, 2020 at 11:01:27AM +0800, Li, Aubrey wrote:
>> On 2020/3/23 23:21, Joel Fernandes wrote:
>>> On Mon, Mar 23, 2020 at 02:58:18PM +0800, Li, Aubrey wrote:
>>>> On 2020/3/14 8:30, Paul E. McKenney wrote:
>>>>> On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
>>>>>> rcu_read_unlock() can incur an infrequent deadlock in
>>>>>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
>>>>>>
>>>>>> This fixes the following spinlock recursion observed when testing the
>>>>>> core scheduling patches on PREEMPT=y kernel on ChromeOS:
>>>>>>
>>>>>> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
>>>>>>
>>>>>
>>>>> The original could indeed deadlock, and this would avoid that deadlock.
>>>>> (The commit to solve this deadlock is sadly not yet in mainline.)
>>>>>
>>>>> Acked-by: Paul E. McKenney <[email protected]>
>>>>
>>>> I saw this in dmesg with this patch, is it expected?
>>>>
>>>> [ 117.000905] =============================
>>>> [ 117.000907] WARNING: suspicious RCU usage
>>>> [ 117.000911] 5.5.7+ #160 Not tainted
>>>> [ 117.000913] -----------------------------
>>>> [ 117.000916] kernel/sched/core.c:4747 suspicious rcu_dereference_check() usage!
>>>> [ 117.000918]
>>>> other info that might help us debug this:
>>>
>>> Sigh, this is because for_each_domain() expects rcu_read_lock(). From an RCU
>>> PoV, the code is correct (warning doesn't cause any issue).
>>>
>>> To silence warning, we could replace the rcu_read_lock_sched() in my patch with:
>>> preempt_disable();
>>> rcu_read_lock();
>>>
>>> and replace the unlock with:
>>>
>>> rcu_read_unlock();
>>> preempt_enable();
>>>
>>> That should both take care of both the warning and the scheduler-related
>>> deadlock. Thoughts?
>>>
>>
>> How about this?
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index a01df3e..7ff694e 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4743,7 +4743,6 @@ static void sched_core_balance(struct rq *rq)
>> int cpu = cpu_of(rq);
>>
>> rcu_read_lock();
>> - raw_spin_unlock_irq(rq_lockp(rq));
>> for_each_domain(cpu, sd) {
>> if (!(sd->flags & SD_LOAD_BALANCE))
>> break;
>> @@ -4754,7 +4753,6 @@ static void sched_core_balance(struct rq *rq)
>> if (steal_cookie_task(cpu, sd))
>> break;
>> }
>> - raw_spin_lock_irq(rq_lockp(rq));
>
> try_steal_cookie() does a double_rq_lock(). Would this change not deadlock
> with that?
>
Oh yes, missed double_rq_lock() inside, just want to keep local intr disabled
to avoid preemption. will try Paul's patch and report back.

Thanks,
-Aubrey