2022-03-09 01:45:42

by Steven Rostedt

Subject: sched_core_balance() releasing interrupts with pi_lock held

Hi Peter,

A ChromeOS bug report showed a lockdep splat that I first thought was a bad
backport. But when looking at upstream, I don't see how it would work there
either. The lockdep splat had:

[56064.673346] Call Trace:
[56064.676066] dump_stack+0xb9/0x117
[56064.679861] ? print_usage_bug+0x2af/0x2c2
[56064.684434] mark_lock_irq+0x25e/0x27d
[56064.688618] mark_lock+0x11a/0x16c
[56064.692412] mark_held_locks+0x57/0x87
[56064.696595] ? _raw_spin_unlock_irq+0x2c/0x40
[56064.701460] lockdep_hardirqs_on+0xb1/0x19d
[56064.706130] _raw_spin_unlock_irq+0x2c/0x40
[56064.710799] sched_core_balance+0x8a/0x4af
[56064.715369] ? __balance_callback+0x1f/0x9a
[56064.720030] __balance_callback+0x4f/0x9a
[56064.724506] rt_mutex_setprio+0x43a/0x48b
[56064.728982] task_blocks_on_rt_mutex+0x14d/0x1d5

Where I see:

task_blocks_on_rt_mutex() {
  spin_lock(pi_lock);
  rt_mutex_setprio() {
    balance_callback() {
      sched_core_balance() {
        spin_unlock_irq(rq);
Where spin_unlock_irq() enables interrupts while holding the pi_lock, and
BOOM, lockdep (rightfully) complains.

The above was me looking at mainline, not the kernel that blew up. So, I'm
guessing that this is a bug in mainline as well.

-- Steve
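
For readers less familiar with the lockdep rule being tripped here, below is a
minimal illustration of the general class of bug, using a hypothetical lock and
IRQ handler rather than anything from this thread: a lock that is also taken
from hard-IRQ context must never be held with interrupts enabled, otherwise the
IRQ can fire on the same CPU and deadlock on the lock that CPU already holds.
In the sequence above, pi_lock (also taken from wakeup paths that may run in
IRQ context) plays the role of demo_lock, and the spin_unlock_irq() inside
sched_core_balance() is what re-enables interrupts.

#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_RAW_SPINLOCK(demo_lock);

/* Hard-IRQ context user of the lock. */
static irqreturn_t demo_irq_handler(int irq, void *data)
{
	raw_spin_lock(&demo_lock);
	/* ... */
	raw_spin_unlock(&demo_lock);
	return IRQ_HANDLED;
}

/* Task context user of the lock. */
static void demo_task_context(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&demo_lock, flags);	/* IRQs off, lock held */
	/*
	 * BAD: anything that re-enables interrupts here -- for example a
	 * spin_unlock_irq() on some *other* lock, as sched_core_balance()
	 * does above -- lets demo_irq_handler() run on this CPU and spin
	 * forever on the demo_lock this CPU already holds.
	 */
	raw_spin_unlock_irqrestore(&demo_lock, flags);
}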


2022-03-16 10:44:04

by Steven Rostedt

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On Tue, 8 Mar 2022 16:14:55 -0500
Steven Rostedt <[email protected]> wrote:

> Hi Peter,

Have you had time to look into this?

Thanks,

-- Steve

>
> A ChromeOS bug report showed a lockdep splat that I first thought was a bad
> backport. But when looking at upstream, I don't see how it would work there
> either. The lockdep splat had:
>
> [56064.673346] Call Trace:
> [56064.676066] dump_stack+0xb9/0x117
> [56064.679861] ? print_usage_bug+0x2af/0x2c2
> [56064.684434] mark_lock_irq+0x25e/0x27d
> [56064.688618] mark_lock+0x11a/0x16c
> [56064.692412] mark_held_locks+0x57/0x87
> [56064.696595] ? _raw_spin_unlock_irq+0x2c/0x40
> [56064.701460] lockdep_hardirqs_on+0xb1/0x19d
> [56064.706130] _raw_spin_unlock_irq+0x2c/0x40
> [56064.710799] sched_core_balance+0x8a/0x4af
> [56064.715369] ? __balance_callback+0x1f/0x9a
> [56064.720030] __balance_callback+0x4f/0x9a
> [56064.724506] rt_mutex_setprio+0x43a/0x48b
> [56064.728982] task_blocks_on_rt_mutex+0x14d/0x1d5
>
> Where I see:
>
> task_blocks_on_rt_mutex() {
>   spin_lock(pi_lock);
>   rt_mutex_setprio() {
>     balance_callback() {
>       sched_core_balance() {
>         spin_unlock_irq(rq);
>
> Where spin_unlock_irq() enables interrupts while holding the pi_lock, and
> BOOM, lockdep (rightfully) complains.
>
> The above was me looking at mainline, not the kernel that blew up. So, I'm
> guessing that this is a bug in mainline as well.
>
> -- Steve

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On 2022-03-16 17:03:14 [+0100], To Steven Rostedt wrote:

> which looked right but RT still falls apart:

> | =====================================
> | WARNING: bad unlock balance detected!
> | 5.17.0-rc8-rt14+ #10 Not tainted
> | -------------------------------------


> It is always the local-lock that breaks apart. Based on "locks held" and
> the lock it tries to release, it looks like the lock was acquired on CPU-A
> and released on CPU-B.

with

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 33ce5cd113d8..f4675bd8f878 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5968,6 +5967,9 @@ static bool try_steal_cookie(int this, int that)
 		if (p == src->core_pick || p == src->curr)
 			goto next;
 
+		if (p->migration_disabled)
+			goto next;
+
 		if (!cpumask_test_cpu(this, &p->cpus_mask))
 			goto next;

on top, my problems are gone. Let me do some testing and then I will post a
patch, unless PeterZ does the yelling :)

> > Thanks,
> >
> > -- Steve
>
Sebastian
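
The p->migration_disabled test above targets tasks inside a migrate_disable()
section. A short sketch of such a section (hypothetical per-CPU variable, not
code from this thread): unlike preempt_disable(), migrate_disable() leaves the
task preemptible, so it can be scheduled out and sit on a runqueue with the
flag set; the only guarantee is that it runs next on the same CPU, which is
exactly the guarantee a balancer breaks by stealing it.

#include <linux/preempt.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(int, demo_pcpu_state);

static void demo_update(int val)
{
	migrate_disable();
	/*
	 * The task can be preempted right here and wait on the runqueue
	 * with ->migration_disabled set; it must still be run next on
	 * this same CPU.
	 */
	this_cpu_write(demo_pcpu_state, val);
	migrate_enable();
}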

2022-03-17 04:24:49

by Peter Zijlstra

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On Tue, Mar 15, 2022 at 05:46:06PM -0400, Steven Rostedt wrote:
> On Tue, 8 Mar 2022 16:14:55 -0500
> Steven Rostedt <[email protected]> wrote:
>
> > Hi Peter,
>
> Have you had time to look into this?

Not since I talked to you on IRC about it last week.

Like I wrote, the balance_callback should be run under whichever
rq->lock instance it gets queued under. As per:

565790d28b1e ("sched: Fix balance_callback()")

Now, we only do queue_core_balance() from set_next_task_idle(), which
*should* only happen from pick_next_task(), and as such the callback
should only ever get called from finish_lock_switch() or the 'prev ==
next' case in __schedule().

Neither of these two sites holds pi_lock.


This is about as far as I got explaining things, and it being late, it's
about as far as I got looking at things.

Now that also makes conceptual sense, we only want to pull a core-cookie
task when we're scheduling an idle task.

Now, clearly this gets triggered from the PI path, but that's not making
immediate sense to me, it would mean we're boosting the idle task, which
is wrong too.

So it would be useful for someone that can reproduce this to provide a
trace of where queue_core_balance() gets called, because that *should*
only be in __schedule().
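
As background for the mechanism described above -- queue a callback now, run it
later under the same rq->lock -- here is a generic sketch of the pattern with
hypothetical names; it is not the scheduler's actual code (that lives in
kernel/sched/), only the shape of it:

#include <linux/spinlock.h>

struct demo_rq;

struct demo_callback {
	struct demo_callback *next;
	void (*func)(struct demo_rq *rq);
};

struct demo_rq {
	raw_spinlock_t lock;
	struct demo_callback *callbacks;
};

/* Queued while rq->lock is held ... */
static void demo_queue_callback(struct demo_rq *rq, struct demo_callback *cb,
				void (*func)(struct demo_rq *rq))
{
	lockdep_assert_held(&rq->lock);

	cb->func = func;
	cb->next = rq->callbacks;
	rq->callbacks = cb;
}

/* ... and run later, still under the same rq->lock instance. */
static void demo_run_callbacks(struct demo_rq *rq)
{
	struct demo_callback *cb, *next;

	lockdep_assert_held(&rq->lock);

	cb = rq->callbacks;
	rq->callbacks = NULL;

	while (cb) {
		next = cb->next;
		cb->func(rq);	/* the callback may drop and retake rq->lock */
		cb = next;
	}
}

The oddity in the splat is that sched_core_balance() was instead reached via
rt_mutex_setprio(), i.e. with pi_lock held and different assumptions about the
IRQ state.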

2022-03-17 04:50:48

by Peter Zijlstra

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On Wed, Mar 16, 2022 at 09:27:34PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 15, 2022 at 05:46:06PM -0400, Steven Rostedt wrote:
> > On Tue, 8 Mar 2022 16:14:55 -0500
> > Steven Rostedt <[email protected]> wrote:
> >
> > > Hi Peter,
> >
> > Have you had time to look into this?
>
> Not since I talked to you on IRC about it last week.
>
> Like I wrote, the balance_callback should be run under whichever
> rq->lock instance it gets queued under. As per:
>
> 565790d28b1e ("sched: Fix balance_callback()")
>
> Now, we only do queue_core_balance() from set_next_task_idle(), which
> *should* only happen from pick_next_task(), and as such the callback
> should only ever get called from finish_lock_switch() or the 'prev ==
> next' case in __schedule().
>
> Neither of these two sites holds pi_lock.
>
>
> This is about as far as I got explaining things, and it being late, it's
> about as far as I got looking at things.
>
> Now that also makes conceptual sense, we only want to pull a core-cookie
> task when we're scheduling an idle task.
>
> Now, clearly this gets triggered from the PI path, but that's not making
> immediate sense to me, it would mean we're boosting the idle task, which
> is wrong too.
>
> So it would be useful for someone that can reproduce this to provide a
> trace of where queue_core_balance() gets called, because that *should*
> only be in __schedule().

Does something like the below (untested in the extreme) help?

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 83872f95a1ea..18163454bb47 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5665,6 +5665,8 @@ static inline struct task_struct *pick_task(struct rq *rq)

 extern void task_vruntime_update(struct rq *rq, struct task_struct *p, bool in_fi);
 
+static void queue_core_balance(struct rq *rq);
+
 static struct task_struct *
 pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
@@ -5714,7 +5716,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 		}
 
 		rq->core_pick = NULL;
-		return next;
+		goto out;
 	}
 
 	put_prev_task_balance(rq, prev, rf);
@@ -5764,7 +5766,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			 */
 			WARN_ON_ONCE(fi_before);
 			task_vruntime_update(rq, next, false);
-			goto done;
+			goto out_set_next;
 		}
 	}

@@ -5884,8 +5886,12 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 		resched_curr(rq_i);
 	}
 
-done:
+out_set_next:
 	set_next_task(rq, next);
+out:
+	if (rq->core->core_forceidle_count && next == rq->idle)
+		queue_core_balance(rq);
+
 	return next;
 }

@@ -5914,7 +5920,7 @@ static bool try_steal_cookie(int this, int that)
 		if (p == src->core_pick || p == src->curr)
 			goto next;
 
-		if (!cpumask_test_cpu(this, &p->cpus_mask))
+		if (!is_cpu_allowed(p, this))
 			goto next;
 
 		if (p->core_occupation > dst->idle->core_occupation)
@@ -5980,7 +5986,7 @@ static void sched_core_balance(struct rq *rq)

 static DEFINE_PER_CPU(struct callback_head, core_balance_head);
 
-void queue_core_balance(struct rq *rq)
+static void queue_core_balance(struct rq *rq)
 {
 	if (!sched_core_enabled(rq))
 		return;
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index d17b0a5ce6ac..314c36fc9c42 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -437,7 +437,6 @@ static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool fir
 {
 	update_idle_core(rq);
 	schedstat_inc(rq->sched_goidle);
-	queue_core_balance(rq);
 }

#ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index de53be905739..b85c9344779a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1247,8 +1247,6 @@ static inline bool sched_group_cookie_match(struct rq *rq,
 	return false;
 }
 
-extern void queue_core_balance(struct rq *rq);
-
 static inline bool sched_core_enqueued(struct task_struct *p)
 {
 	return !RB_EMPTY_NODE(&p->core_node);
@@ -1282,10 +1280,6 @@ static inline raw_spinlock_t *__rq_lockp(struct rq *rq)
 	return &rq->__lock;
 }
 
-static inline void queue_core_balance(struct rq *rq)
-{
-}
-
 static inline bool sched_cpu_cookie_match(struct rq *rq, struct task_struct *p)
 {
 	return true;

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On 2022-03-16 17:18:41 [+0100], To Steven Rostedt wrote:
> on top, my problems are gone. Let me do some testing and then I will post a
> patch, unless PeterZ does the yelling :)

so the long irq-off section is latency-wise less than optimal. A
testcase went from ~40us to 140us. And then there is this:
| NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #80!!!
| NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #10!!!

which might be RT-only.
I *think* we want the buggy part gone first and then the latency fixed.

> > > Thanks,
> > >
> > > -- Steve

Sebastian

2022-03-17 05:35:21

by Peter Zijlstra

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On Wed, Mar 16, 2022 at 05:18:40PM +0100, Sebastian Andrzej Siewior wrote:

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 33ce5cd113d8..f4675bd8f878 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5968,6 +5967,9 @@ static bool try_steal_cookie(int this, int that)
>  		if (p == src->core_pick || p == src->curr)
>  			goto next;
> 
> +		if (p->migration_disabled)
> +			goto next;
> +
>  		if (!cpumask_test_cpu(this, &p->cpus_mask))
>  			goto next;
>
> on top my problems are gone. Let me do some testing and then I would
> patch unless PeterZ does the yelling :)

The previous thing is wrong because it tries to solve the wrong thing;
the above makes sense, except I would write it like so:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 83872f95a1ea..04c05bc4062e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5914,7 +5914,7 @@ static bool try_steal_cookie(int this, int that)
 		if (p == src->core_pick || p == src->curr)
 			goto next;
 
-		if (!cpumask_test_cpu(this, &p->cpus_mask))
+		if (!is_cpu_allowed(p, this))
 			goto next;
 
 		if (p->core_occupation > dst->idle->core_occupation)
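
For context, a rough paraphrase of why is_cpu_allowed() is the stronger check
here. This is written from memory of kernel/sched/core.c of that era, not
quoted from the thread, and the real helper has more cases (hotplug state,
per-CPU kthreads), so treat it purely as a sketch:

/*
 * Sketch only -- see is_cpu_allowed() in kernel/sched/core.c for the real
 * thing.  The key difference from cpumask_test_cpu(this, &p->cpus_mask) is
 * that it consults p->cpus_ptr, which is narrowed to the task's current CPU
 * once a migrate_disable()d task is scheduled out, so such a task parked on
 * another runqueue is rejected here.
 */
static inline bool is_cpu_allowed_sketch(struct task_struct *p, int cpu)
{
	/* Effective affinity; a single CPU while migration is disabled. */
	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
		return false;

	/* migrate_disable() sections must be allowed to finish. */
	if (is_migration_disabled(p))
		return cpu_online(cpu);

	/* The real helper also checks hotplug state, per-CPU kthreads, etc. */
	return cpu_active(cpu);
}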

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On 2022-03-15 17:46:06 [-0400], Steven Rostedt wrote:
> On Tue, 8 Mar 2022 16:14:55 -0500
> Steven Rostedt <[email protected]> wrote:
>
> > Hi Peter,
>
> Have you had time to look into this?

yes, I can confirm that it is a problem ;) So I did this:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 33ce5cd113d8..56c286aaa01f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5950,7 +5950,6 @@ static bool try_steal_cookie(int this, int that)
 	unsigned long cookie;
 	bool success = false;
 
-	local_irq_disable();
 	double_rq_lock(dst, src);
 
 	cookie = dst->core->core_cookie;
@@ -5989,7 +5988,6 @@ static bool try_steal_cookie(int this, int that)

 unlock:
 	double_rq_unlock(dst, src);
-	local_irq_enable();
 
 	return success;
 }
@@ -6019,7 +6017,7 @@ static void sched_core_balance(struct rq *rq)

 	preempt_disable();
 	rcu_read_lock();
-	raw_spin_rq_unlock_irq(rq);
+	raw_spin_rq_unlock(rq);
 	for_each_domain(cpu, sd) {
 		if (need_resched())
 			break;
@@ -6027,7 +6025,7 @@ static void sched_core_balance(struct rq *rq)
 		if (steal_cookie_task(cpu, sd))
 			break;
 	}
-	raw_spin_rq_lock_irq(rq);
+	raw_spin_rq_lock(rq);
 	rcu_read_unlock();
 	preempt_enable();
 }


which looked right but RT still falls apart:

| =====================================
| WARNING: bad unlock balance detected!
| 5.17.0-rc8-rt14+ #10 Not tainted
| -------------------------------------
| gcc/2608 is trying to release lock ((lock)) at:
| [<ffffffff8135a150>] folio_add_lru+0x60/0x90
| but there are no more locks to release!
|
| other info that might help us debug this:
| 4 locks held by gcc/2608:
| #0: ffff88826ea6efe0 (&sb->s_type->i_mutex_key#12){++++}-{3:3}, at: xfs_ilock+0x90/0xd0
| #1: ffff88826ea6f1a0 (mapping.invalidate_lock#2){++++}-{3:3}, at: page_cache_ra_unbounded+0x8e/0x1f0
| #2: ffff88852aba8d18 ((lock)#3){+.+.}-{2:2}, at: folio_add_lru+0x2a/0x90
| #3: ffffffff829a5140 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xe0
|
| stack backtrace:
| CPU: 18 PID: 2608 Comm: gcc Not tainted 5.17.0-rc8-rt14+ #10
| Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
| Call Trace:
| <TASK>
| dump_stack_lvl+0x4a/0x62
| lock_release.cold+0x32/0x37
| rt_spin_unlock+0x17/0x80
| folio_add_lru+0x60/0x90
| filemap_add_folio+0x53/0xa0
| page_cache_ra_unbounded+0x1c3/0x1f0
| filemap_get_pages+0xe3/0x5b0
| filemap_read+0xc5/0x2f0
| xfs_file_buffered_read+0x6b/0x1a0
| xfs_file_read_iter+0x6a/0xd0
| new_sync_read+0x11b/0x1a0
| vfs_read+0x134/0x1d0
| ksys_read+0x68/0xf0
| do_syscall_64+0x59/0x80
| entry_SYSCALL_64_after_hwframe+0x44/0xae
| RIP: 0033:0x7f3feab7310e

It is always the local-lock that breaks apart. Based on "locks held" and
the lock it tries to release, it looks like the lock was acquired on CPU-A
and released on CPU-B.

> Thanks,
>
> -- Steve

Sebastian
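
The local lock Sebastian refers to is the per-CPU LRU batching lock taken in
folio_add_lru() (the lock acquired at folio_add_lru+0x2a in the "locks held"
list above). A generic sketch of the local_lock pattern, with hypothetical
names rather than the mm code from the splat: on PREEMPT_RT a local_lock is a
per-CPU sleeping lock and implies migrate_disable(), so acquire and release
must happen on the same CPU -- precisely the invariant that breaks when a
migration-disabled task is stolen onto another CPU.

#include <linux/local_lock.h>
#include <linux/percpu.h>

struct demo_batch {
	local_lock_t lock;
	int nr;
};

static DEFINE_PER_CPU(struct demo_batch, demo_batch) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static void demo_batch_add(void)
{
	/* Locks this CPU's instance; on RT this also pins the task here. */
	local_lock(&demo_batch.lock);
	this_cpu_inc(demo_batch.nr);
	/*
	 * If the task were migrated in between, this would unlock a
	 * *different* CPU's lock instance -- the "bad unlock balance"
	 * reported above.
	 */
	local_unlock(&demo_batch.lock);
}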

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On 2022-03-16 21:35:24 [+0100], Peter Zijlstra wrote:
>
> The previous thing is wrong because it tries to solve the wrong thing;
> the above makes sense, except I would write it like so:

Perfect. Do you want me to dress it up as a patch, or have you already done so?

Sebastian

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On 2022-03-16 21:27:34 [+0100], Peter Zijlstra wrote:
> Now, we only do queue_core_balance() from set_next_task_idle(), which
> *should* only happen from pick_next_task(), and as such the callback
> should only ever get called from finish_lock_switch() or the 'prev ==
> next' case in __schedule().
>
> Neither of these two sites holds pi_lock.

I've been trying to reproduce it and didn't manage to. I only see the
idle/scheduler path.

> This is about as far as I got explaining things, and it being late, it's
> about as far as I got looking at things.
>
> Now that also makes conceptual sense, we only want to pull a core-cookie
> task when we're scheduling an idle task.
>
> Now, clearly this gets triggered from the PI path, but that's not making
> immediate sense to me, it would mean we're boosting the idle task, which
> is wrong too.

Looking at the idle task, it shouldn't be possible for !RT due to the lack
of boostable locks, and I don't see any sleeping locks here on RT either.

> So it would be useful for someone that can reproduce this to provide a
> trace of where queue_core_balance() gets called, because that *should*
> only be in __schedule().

I failed so far.

Sebastian
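
One low-tech way to capture the trace asked for above (a debugging sketch, not
a patch that was posted in this thread) is a temporary hunk in
kernel/sched/core.c that dumps the first few call chains reaching
queue_core_balance():

static void queue_core_balance(struct rq *rq)
{
	/* Temporary debugging aid: report the first few callers. */
	static atomic_t dumps = ATOMIC_INIT(3);

	if (atomic_add_unless(&dumps, -1, 0))
		dump_stack();

	/* ... existing body of queue_core_balance() ... */
}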

Subject: [PATCH] sched: Teach the forced-newidle balancer about CPU affinity limitation.

try_steal_cookie() looks at task_struct::cpus_mask to decide if the
task could be moved to `this' CPU. It ignores that the task might be in
a migration-disabled section while not on the CPU. In this case the task
must not be moved, otherwise per-CPU assumptions are broken.

Use is_cpu_allowed(), as suggested by Peter Zijlstra, to decide if the
task can be moved.

Fixes: d2dfa17bc7de6 ("sched: Trivial forced-newidle balancer")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a44414946de3d..421ace2e007de 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5923,7 +5923,7 @@ static bool try_steal_cookie(int this, int that)
 		if (p == src->core_pick || p == src->curr)
 			goto next;
 
-		if (!cpumask_test_cpu(this, &p->cpus_mask))
+		if (!is_cpu_allowed(p, this))
 			goto next;
 
 		if (p->core_occupation > dst->idle->core_occupation)
--
2.35.1

2022-03-21 23:09:56

by Steven Rostedt

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On Wed, 16 Mar 2022 22:03:41 +0100
Peter Zijlstra <[email protected]> wrote:

> Does something like the below (untested in the extreme) help?

Hi Peter,

This has been tested extensively by the ChromeOS team, who said that it does
appear to fix the problem.

Could you get this into mainline, and tag it for stable so that it can be
backported to the appropriate stable releases?

Thanks for the fix!

-- Steve

2022-03-30 08:18:32

by Steven Rostedt

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On Mon, 21 Mar 2022 13:30:37 -0400
Steven Rostedt <[email protected]> wrote:

> On Wed, 16 Mar 2022 22:03:41 +0100
> Peter Zijlstra <[email protected]> wrote:
>
> > Does something like the below (untested in the extreme) help?
>
> Hi Peter,
>
> This has been tested extensively by the ChromeOS team, who said that it does
> appear to fix the problem.
>
> Could you get this into mainline, and tag it for stable so that it can be
> backported to the appropriate stable releases?
>
> Thanks for the fix!
>

Hi Peter,

I just don't want you to forget about this :-)

-- Steve

2022-04-05 02:27:56

by T.J. Alumbaugh

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held


On 3/29/22 17:22, Steven Rostedt wrote:
> On Mon, 21 Mar 2022 13:30:37 -0400
> Steven Rostedt <[email protected]> wrote:
>
>> On Wed, 16 Mar 2022 22:03:41 +0100
>> Peter Zijlstra <[email protected]> wrote:
>>
>>> Does something like the below (untested in the extreme) help?
>> Hi Peter,
>>
>> This has been tested extensively by the ChromeOS team, who said that it does
>> appear to fix the problem.
>>
>> Could you get this into mainline, and tag it for stable so that it can be
>> backported to the appropriate stable releases?
>>
>> Thanks for the fix!
>>
> Hi Peter,
>
> I just don't want you to forget about this :-)
>
> -- Steve
>
Hi Peter,

Just a note that if/when you send this out as a patch, feel free to add:

Tested-by: T.J. Alumbaugh <[email protected]>

Thanks,

- T.J.

Subject: [tip: sched/urgent] sched: Teach the forced-newidle balancer about CPU affinity limitation.

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 386ef214c3c6ab111d05e1790e79475363abaa05
Gitweb: https://git.kernel.org/tip/386ef214c3c6ab111d05e1790e79475363abaa05
Author: Sebastian Andrzej Siewior <[email protected]>
AuthorDate: Thu, 17 Mar 2022 15:51:32 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 05 Apr 2022 09:59:36 +02:00

sched: Teach the forced-newidle balancer about CPU affinity limitation.

try_steal_cookie() looks at task_struct::cpus_mask to decide if the
task could be moved to `this' CPU. It ignores that the task might be in
a migration-disabled section while not on the CPU. In this case the task
must not be moved, otherwise per-CPU assumptions are broken.

Use is_cpu_allowed(), as suggested by Peter Zijlstra, to decide if the
task can be moved.

Fixes: d2dfa17bc7de6 ("sched: Trivial forced-newidle balancer")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 017ee78..51efaab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6006,7 +6006,7 @@ static bool try_steal_cookie(int this, int that)
 		if (p == src->core_pick || p == src->curr)
 			goto next;
 
-		if (!cpumask_test_cpu(this, &p->cpus_mask))
+		if (!is_cpu_allowed(p, this))
 			goto next;
 
 		if (p->core_occupation > dst->idle->core_occupation)

2022-04-06 01:00:21

by Peter Zijlstra

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On Mon, Apr 04, 2022 at 04:17:54PM -0400, T.J. Alumbaugh wrote:
>
> On 3/29/22 17:22, Steven Rostedt wrote:
> > On Mon, 21 Mar 2022 13:30:37 -0400
> > Steven Rostedt <[email protected]> wrote:
> >
> > > On Wed, 16 Mar 2022 22:03:41 +0100
> > > Peter Zijlstra <[email protected]> wrote:
> > >
> > > > Does something like the below (untested in the extreme) help?
> > > Hi Peter,
> > >
> > > This has been tested extensively by the ChromeOS team, who said that it does
> > > appear to fix the problem.
> > >
> > > Could you get this into mainline, and tag it for stable so that it can be
> > > backported to the appropriate stable releases?
> > >
> > > Thanks for the fix!
> > >
> > Hi Peter,
> >
> > I just don't want you to forget about this :-)
> >
> > -- Steve
> >
> Hi Peter,
>
> Just a note that if/when you send this out as a patch, feel free to add:
>
> Tested-by: T.J. Alumbaugh <[email protected]>

https://lkml.kernel.org/r/[email protected]

2022-04-06 13:12:51

by Dietmar Eggemann

Subject: Re: sched_core_balance() releasing interrupts with pi_lock held

On 05/04/2022 09:48, Peter Zijlstra wrote:
> On Mon, Apr 04, 2022 at 04:17:54PM -0400, T.J. Alumbaugh wrote:
>>
>> On 3/29/22 17:22, Steven Rostedt wrote:
>>> On Mon, 21 Mar 2022 13:30:37 -0400
>>> Steven Rostedt <[email protected]> wrote:
>>>
>>>> On Wed, 16 Mar 2022 22:03:41 +0100
>>>> Peter Zijlstra <[email protected]> wrote:
>>>>
>>>>> Does something like the below (untested in the extreme) help?
>>>> Hi Peter,
>>>>
>>>> This has been tested extensively by the ChromeOS team, who said that it does
>>>> appear to fix the problem.
>>>>
>>>> Could you get this into mainline, and tag it for stable so that it can be
>>>> backported to the appropriate stable releases?
>>>>
>>>> Thanks for the fix!
>>>>
>>> Hi Peter,
>>>
>>> I just don't want you to forget about this :-)
>>>
>>> -- Steve
>>>
>> Hi Peter,
>>
>> Just a note that if/when you send this out as a patch, feel free to add:
>>
>> Tested-by: T.J. Alumbaugh <[email protected]>
>
> https://lkml.kernel.org/r/[email protected]

I still wonder if this issue happened on a system w/o:

565790d28b1e ("sched: Fix balance_callback()")

Maybe chromeos-5.10 or earlier? In this case applying 565790d28b1e could
fix it as well.

The reason why I think the original issue happened on a system w/o
565790d28b1e is the call-stack in:

https://lkml.kernel.org/r/[email protected]

[56064.673346] Call Trace:
[56064.676066] dump_stack+0xb9/0x117
[56064.679861] ? print_usage_bug+0x2af/0x2c2
[56064.684434] mark_lock_irq+0x25e/0x27d
[56064.688618] mark_lock+0x11a/0x16c
[56064.692412] mark_held_locks+0x57/0x87
[56064.696595] ? _raw_spin_unlock_irq+0x2c/0x40
[56064.701460] lockdep_hardirqs_on+0xb1/0x19d
[56064.706130] _raw_spin_unlock_irq+0x2c/0x40
[56064.710799] sched_core_balance+0x8a/0x4af
[56064.715369] ? __balance_callback+0x1f/0x9a <--- !!!
[56064.720030] __balance_callback+0x4f/0x9a
[56064.724506] rt_mutex_setprio+0x43a/0x48b
[56064.728982] task_blocks_on_rt_mutex+0x14d/0x1d5

has __balance_callback().

565790d28b1e changes __balance_callback() to __balance_callbacks() (note the
trailing 's').