2022-04-22 10:36:23

by Paul E. McKenney

Subject: "Dying CPU not properly vacated" splat

Hello!

The rcutorture TREE03 scenario got the following splat, which appears
to be a one-off or, if not, to have an MTBF in the thousands of hours,
even assuming that it is specific to TREE03. (If it is not specific to
TREE03, we are talking tens of thousands of hours of rcutorture runtime.)

So just in case this rings any bells or there are some diagnostics I
should add in case this ever happens again. ;-)

Thanx, Paul

------------------------------------------------------------------------

[29213.585252] ------------[ cut here ]------------
[29213.586287] Dying CPU not properly vacated!
[29213.586314] WARNING: CPU: 7 PID: 52 at kernel/sched/core.c:9386 sched_cpu_dying.cold.167+0xc/0xc3
[29213.589181] Modules linked in:
[29213.589864] CPU: 7 PID: 52 Comm: migration/7 Not tainted 5.18.0-rc1-00196-g24f4daa283b8 #5468
[29213.591720] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module_el8.5.0+746+bbd5d70c 04/01/2014
[29213.593698] Stopper: multi_cpu_stop+0x0/0xf0 <- stop_machine_cpuslocked+0xea/0x120
[29213.595333] RIP: 0010:sched_cpu_dying.cold.167+0xc/0xc3
[29213.596463] Code: 1f ea 00 49 89 e8 48 8d 93 38 07 00 00 48 c7 c7 60 2b d8 b9 e8 be 2a 00 00 e9 4b 7d 4a ff 48 c7 c7 18 2c d8 b9 e8 80 ed ff ff <0f> 0b 44 8b ab d0 09 00 00 8b 4b 04 48 c7 c6 95 d5 d6 b9 48 c7 c7
[29213.600479] RSP: 0018:ffff9cbbc023bd88 EFLAGS: 00010082
[29213.601630] RAX: 0000000000000000 RBX: ffff9b98df3e8cc0 RCX: 0000000000000003
[29213.603175] RDX: 0000000000000003 RSI: 0000000100021495 RDI: 00000000ffffffff
[29213.604699] RBP: 0000000000000086 R08: 0000000000000000 R09: c000000100021495
[29213.606230] R10: 000000000081dc88 R11: ffff9cbbc023bba8 R12: ffffffffb889d6c0
[29213.607755] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cbbc05bfd01
[29213.609297] FS: 0000000000000000(0000) GS:ffff9b98df3c0000(0000) knlGS:0000000000000000
[29213.611045] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[29213.612279] CR2: 0000000000000000 CR3: 0000000016c0c000 CR4: 00000000000006e0
[29213.613821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[29213.615433] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[29213.616972] Call Trace:
[29213.617516] <TASK>
[29213.617992] ? sched_cpu_wait_empty+0x70/0x70
[29213.618935] cpuhp_invoke_callback+0x100/0x4a0
[29213.619899] cpuhp_invoke_callback_range+0x3b/0x80
[29213.620933] take_cpu_down+0x55/0x80
[29213.621703] multi_cpu_stop+0x61/0xf0
[29213.622503] ? stop_machine_yield+0x10/0x10
[29213.623408] cpu_stopper_thread+0x84/0x120
[29213.624300] smpboot_thread_fn+0x181/0x220
[29213.625192] ? sort_range+0x20/0x20
[29213.625961] kthread+0xe3/0x110
[29213.626641] ? kthread_complete_and_exit+0x20/0x20
[29213.627777] ret_from_fork+0x22/0x30
[29213.628655] </TASK>
[29213.629185] ---[ end trace 0000000000000000 ]---


Attachments:
.config (135.50 kB)

2022-04-25 18:47:29

by Paul E. McKenney

Subject: Re: "Dying CPU not properly vacated" splat

On Mon, Apr 25, 2022 at 05:15:13PM +0100, Valentin Schneider wrote:
>
> Hi Paul,
>
> On 21/04/22 12:38, Paul E. McKenney wrote:
> > Hello!
> >
> > The rcutorture TREE03 scenario got the following splat, which appears
> > to be a one-off or, if not, to have an MTBF in the thousands of hours,
> > even assuming that it is specific to TREE03. (If it is not specific to
> > TREE03, we are talking tens of thousands of hours of rcutorture runtime.)
> >
> > So just in case this rings any bells or there are some diagnostics I
> > should add in case this ever happens again. ;-)
>
> There should be a dump of the enqueued tasks right after the snippet you've
> sent; any chance you could share that if it's there? That should tell us
> which tasks are potentially misbehaving.

And now that I know to look for them, there they are! Thank you!!!

CPU7 enqueued tasks (2 total):
pid: 52, name: migration/7
pid: 135, name: rcu_torture_rea
smpboot: CPU 7 is now offline

So what did rcu_torture_reader() do wrong here? ;-)

Thanx, Paul

> > ------------------------------------------------------------------------
> >
> > [29213.585252] ------------[ cut here ]------------
> > [29213.586287] Dying CPU not properly vacated!
> > [29213.586314] WARNING: CPU: 7 PID: 52 at kernel/sched/core.c:9386 sched_cpu_dying.cold.167+0xc/0xc3
> > [29213.589181] Modules linked in:
> > [29213.589864] CPU: 7 PID: 52 Comm: migration/7 Not tainted 5.18.0-rc1-00196-g24f4daa283b8 #5468
> > [29213.591720] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module_el8.5.0+746+bbd5d70c 04/01/2014
> > [29213.593698] Stopper: multi_cpu_stop+0x0/0xf0 <- stop_machine_cpuslocked+0xea/0x120
> > [29213.595333] RIP: 0010:sched_cpu_dying.cold.167+0xc/0xc3
> > [29213.596463] Code: 1f ea 00 49 89 e8 48 8d 93 38 07 00 00 48 c7 c7 60 2b d8 b9 e8 be 2a 00 00 e9 4b 7d 4a ff 48 c7 c7 18 2c d8 b9 e8 80 ed ff ff <0f> 0b 44 8b ab d0 09 00 00 8b 4b 04 48 c7 c6 95 d5 d6 b9 48 c7 c7
> > [29213.600479] RSP: 0018:ffff9cbbc023bd88 EFLAGS: 00010082
> > [29213.601630] RAX: 0000000000000000 RBX: ffff9b98df3e8cc0 RCX: 0000000000000003
> > [29213.603175] RDX: 0000000000000003 RSI: 0000000100021495 RDI: 00000000ffffffff
> > [29213.604699] RBP: 0000000000000086 R08: 0000000000000000 R09: c000000100021495
> > [29213.606230] R10: 000000000081dc88 R11: ffff9cbbc023bba8 R12: ffffffffb889d6c0
> > [29213.607755] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cbbc05bfd01
> > [29213.609297] FS: 0000000000000000(0000) GS:ffff9b98df3c0000(0000) knlGS:0000000000000000
> > [29213.611045] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [29213.612279] CR2: 0000000000000000 CR3: 0000000016c0c000 CR4: 00000000000006e0
> > [29213.613821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [29213.615433] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [29213.616972] Call Trace:
> > [29213.617516] <TASK>
> > [29213.617992] ? sched_cpu_wait_empty+0x70/0x70
> > [29213.618935] cpuhp_invoke_callback+0x100/0x4a0
> > [29213.619899] cpuhp_invoke_callback_range+0x3b/0x80
> > [29213.620933] take_cpu_down+0x55/0x80
> > [29213.621703] multi_cpu_stop+0x61/0xf0
> > [29213.622503] ? stop_machine_yield+0x10/0x10
> > [29213.623408] cpu_stopper_thread+0x84/0x120
> > [29213.624300] smpboot_thread_fn+0x181/0x220
> > [29213.625192] ? sort_range+0x20/0x20
> > [29213.625961] kthread+0xe3/0x110
> > [29213.626641] ? kthread_complete_and_exit+0x20/0x20
> > [29213.627777] ret_from_fork+0x22/0x30
> > [29213.628655] </TASK>
> > [29213.629185] ---[ end trace 0000000000000000 ]---
>

2022-04-26 00:56:40

by Valentin Schneider

Subject: Re: "Dying CPU not properly vacated" splat


Hi Paul,

On 21/04/22 12:38, Paul E. McKenney wrote:
> Hello!
>
> The rcutorture TREE03 scenario got the following splat, which appears
> to be a one-off or, if not, to have an MTBF in the thousands of hours,
> even assuming that it is specific to TREE03. (If it is not specific to
> TREE03, we are talking tens of thousands of hours of rcutorture runtime.)
>
> So just in case this rings any bells or there are some diagnostics I
> should add in case this ever happens again. ;-)
>

There should be a dump of the enqueued tasks right after the snippet you've
sent; any chance you could share that if it's there? That should tell us
which tasks are potentially misbehaving.

> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> [29213.585252] ------------[ cut here ]------------
> [29213.586287] Dying CPU not properly vacated!
> [29213.586314] WARNING: CPU: 7 PID: 52 at kernel/sched/core.c:9386 sched_cpu_dying.cold.167+0xc/0xc3
> [29213.589181] Modules linked in:
> [29213.589864] CPU: 7 PID: 52 Comm: migration/7 Not tainted 5.18.0-rc1-00196-g24f4daa283b8 #5468
> [29213.591720] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module_el8.5.0+746+bbd5d70c 04/01/2014
> [29213.593698] Stopper: multi_cpu_stop+0x0/0xf0 <- stop_machine_cpuslocked+0xea/0x120
> [29213.595333] RIP: 0010:sched_cpu_dying.cold.167+0xc/0xc3
> [29213.596463] Code: 1f ea 00 49 89 e8 48 8d 93 38 07 00 00 48 c7 c7 60 2b d8 b9 e8 be 2a 00 00 e9 4b 7d 4a ff 48 c7 c7 18 2c d8 b9 e8 80 ed ff ff <0f> 0b 44 8b ab d0 09 00 00 8b 4b 04 48 c7 c6 95 d5 d6 b9 48 c7 c7
> [29213.600479] RSP: 0018:ffff9cbbc023bd88 EFLAGS: 00010082
> [29213.601630] RAX: 0000000000000000 RBX: ffff9b98df3e8cc0 RCX: 0000000000000003
> [29213.603175] RDX: 0000000000000003 RSI: 0000000100021495 RDI: 00000000ffffffff
> [29213.604699] RBP: 0000000000000086 R08: 0000000000000000 R09: c000000100021495
> [29213.606230] R10: 000000000081dc88 R11: ffff9cbbc023bba8 R12: ffffffffb889d6c0
> [29213.607755] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cbbc05bfd01
> [29213.609297] FS: 0000000000000000(0000) GS:ffff9b98df3c0000(0000) knlGS:0000000000000000
> [29213.611045] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [29213.612279] CR2: 0000000000000000 CR3: 0000000016c0c000 CR4: 00000000000006e0
> [29213.613821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [29213.615433] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [29213.616972] Call Trace:
> [29213.617516] <TASK>
> [29213.617992] ? sched_cpu_wait_empty+0x70/0x70
> [29213.618935] cpuhp_invoke_callback+0x100/0x4a0
> [29213.619899] cpuhp_invoke_callback_range+0x3b/0x80
> [29213.620933] take_cpu_down+0x55/0x80
> [29213.621703] multi_cpu_stop+0x61/0xf0
> [29213.622503] ? stop_machine_yield+0x10/0x10
> [29213.623408] cpu_stopper_thread+0x84/0x120
> [29213.624300] smpboot_thread_fn+0x181/0x220
> [29213.625192] ? sort_range+0x20/0x20
> [29213.625961] kthread+0xe3/0x110
> [29213.626641] ? kthread_complete_and_exit+0x20/0x20
> [29213.627777] ret_from_fork+0x22/0x30
> [29213.628655] </TASK>
> [29213.629185] ---[ end trace 0000000000000000 ]---

2022-04-26 08:32:35

by Paul E. McKenney

Subject: Re: "Dying CPU not properly vacated" splat

On Mon, Apr 25, 2022 at 10:59:44PM +0100, Valentin Schneider wrote:
> On 25/04/22 10:33, Paul E. McKenney wrote:
> > On Mon, Apr 25, 2022 at 05:15:13PM +0100, Valentin Schneider wrote:
> >>
> >> Hi Paul,
> >>
> >> On 21/04/22 12:38, Paul E. McKenney wrote:
> >> > Hello!
> >> >
> >> > The rcutorture TREE03 scenario got the following splat, which appears
> >> > to be a one-off or, if not, to have an MTBF in the thousands of hours,
> >> > even assuming that it is specific to TREE03. (If it is not specific to
> >> > TREE03, we are talking tens of thousands of hours of rcutorture runtime.)
> >> >
> >> > So just in case this rings any bells or there are some diagnostics I
> >> > should add in case this ever happens again. ;-)
> >>
> >> There should be a dump of the enqueued tasks right after the snippet you've
> >> sent; any chance you could share that if it's there? That should tell us
> >> which tasks are potentially misbehaving.
> >
> > And now that I know to look for them, there they are! Thank you!!!
> >
> > CPU7 enqueued tasks (2 total):
> > pid: 52, name: migration/7
> > pid: 135, name: rcu_torture_rea
> > smpboot: CPU 7 is now offline
> >
> > So what did rcu_torture_reader() do wrong here? ;-)
> >
>
> So on teardown, CPUHP_AP_SCHED_WAIT_EMPTY->sched_cpu_wait_empty() waits for
> the rq to be empty. Tasks must *not* be enqueued onto that CPU after that
> step has been run - if there are per-CPU tasks bound to that CPU, they must
> be unbound in their respective hotplug callback.
>
> For instance for workqueue.c, we have workqueue_offline_cpu() as a hotplug
> callback which invokes unbind_workers(cpu), the interesting bit being:
>
> 	for_each_pool_worker(worker, pool) {
> 		kthread_set_per_cpu(worker->task, -1);
> 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
> 	}
>
> The rcu_torture_reader() kthreads aren't bound to any particular CPU, are
> they? I can't find any code that would indicate they are - and in that case
> it means we have a problem with is_cpu_allowed() or related.

I did not intend that the rcu_torture_reader() kthreads be bound, and
I am not seeing anything that binds them.

Thoughts? (Other than that validating any alleged fix will be quite
"interesting".)

Thanx, Paul

2022-04-26 08:38:28

by Valentin Schneider

Subject: Re: "Dying CPU not properly vacated" splat

On 25/04/22 10:33, Paul E. McKenney wrote:
> On Mon, Apr 25, 2022 at 05:15:13PM +0100, Valentin Schneider wrote:
>>
>> Hi Paul,
>>
>> On 21/04/22 12:38, Paul E. McKenney wrote:
>> > Hello!
>> >
>> > The rcutorture TREE03 scenario got the following splat, which appears
>> > to be a one-off or, if not, to have an MTBF in the thousands of hours,
>> > even assuming that it is specific to TREE03. (If it is not specific to
>> > TREE03, we are talking tens of thousands of hours of rcutorture runtime.)
>> >
>> > So just in case this rings any bells or there are some diagnostics I
>> > should add in case this ever happens again. ;-)
>>
>> There should be a dump of the enqueued tasks right after the snippet you've
>> sent; any chance you could share that if it's there? That should tell us
>> which tasks are potentially misbehaving.
>
> And now that I know to look for them, there they are! Thank you!!!
>
> CPU7 enqueued tasks (2 total):
> pid: 52, name: migration/7
> pid: 135, name: rcu_torture_rea
> smpboot: CPU 7 is now offline
>
> So what did rcu_torture_reader() do wrong here? ;-)
>

So on teardown, CPUHP_AP_SCHED_WAIT_EMPTY->sched_cpu_wait_empty() waits for
the rq to be empty. Tasks must *not* be enqueued onto that CPU after that
step has been run - if there are per-CPU tasks bound to that CPU, they must
be unbound in their respective hotplug callback.
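
(For reference, the "Dying CPU not properly vacated!" check lives in
sched_cpu_dying(); roughly the following, paraphrased from a v5.18-era
kernel/sched/core.c, so treat it as a sketch rather than the exact code:)

	int sched_cpu_dying(unsigned int cpu)
	{
		struct rq *rq = cpu_rq(cpu);
		struct rq_flags rf;

		/* Handle pending wakeups and then migrate everything off */
		sched_tick_stop(cpu);

		rq_lock_irqsave(rq, &rf);
		/* Only the stopper task (rq->curr) may still be enqueued here. */
		if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
			WARN(true, "Dying CPU not properly vacated!");
			dump_rq_tasks(rq, KERN_WARNING);
		}
		rq_unlock_irqrestore(rq, &rf);
		/* ... load accounting and cleanup elided ... */
		return 0;
	}

(dump_rq_tasks() is what printed the "CPU7 enqueued tasks" lines quoted
above.)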

For instance for workqueue.c, we have workqueue_offline_cpu() as a hotplug
callback which invokes unbind_workers(cpu), the interesting bit being:

	for_each_pool_worker(worker, pool) {
		kthread_set_per_cpu(worker->task, -1);
		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
	}

The rcu_torture_reader() kthreads aren't bound to any particular CPU, are
they? I can't find any code that would indicate they are - and in that case
it means we have a problem with is_cpu_allowed() or related.

2022-04-27 01:06:09

by Paul E. McKenney

Subject: Re: "Dying CPU not properly vacated" splat

On Tue, Apr 26, 2022 at 03:48:06PM +0100, Valentin Schneider wrote:
> On 25/04/22 17:03, Paul E. McKenney wrote:
> > On Mon, Apr 25, 2022 at 10:59:44PM +0100, Valentin Schneider wrote:
> >> On 25/04/22 10:33, Paul E. McKenney wrote:
> >> >
> >> > So what did rcu_torture_reader() do wrong here? ;-)
> >> >
> >>
> >> So on teardown, CPUHP_AP_SCHED_WAIT_EMPTY->sched_cpu_wait_empty() waits for
> >> the rq to be empty. Tasks must *not* be enqueued onto that CPU after that
> >> step has been run - if there are per-CPU tasks bound to that CPU, they must
> >> be unbound in their respective hotplug callback.
> >>
> >> For instance for workqueue.c, we have workqueue_offline_cpu() as a hotplug
> >> callback which invokes unbind_workers(cpu), the interesting bit being:
> >>
> >> 	for_each_pool_worker(worker, pool) {
> >> 		kthread_set_per_cpu(worker->task, -1);
> >> 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
> >> 	}
> >>
> >> The rcu_torture_reader() kthreads aren't bound to any particular CPU, are
> >> they? I can't find any code that would indicate they are - and in that case
> >> it means we have a problem with is_cpu_allowed() or related.
> >
> > I did not intend that the rcu_torture_reader() kthreads be bound, and
> > I am not seeing anything that binds them.
> >
> > Thoughts? (Other than that validating any alleged fix will be quite
> > "interesting".)
>
> IIUC the bogus scenario is that is_cpu_allowed() lets one of those kthreads be
> enqueued on the outgoing CPU *after* CPUHP_AP_SCHED_WAIT_EMPTY.teardown() has
> been run, and hilarity ensues.
>
> The cpu_dying() condition should prevent a regular kthread from getting
> enqueued there; most of the details have been evicted from my brain, but I
> recall we got the ordering conditions right...
>
> The only other "obvious" thing here is migrate_disable() which lets the
> enqueue happen, but then balance_push()->select_fallback_rq() should punt
> it away on context switch.
>
> I need to rediscover those paths; I don't see any obvious clue right now.

Thank you for looking into this!

The only thought that came to me was to record that is_cpu_allowed()
returned true due to migration being disabled, and then use that in later
traces, printk()s or whatever.

My own favorite root-cause hypothesis was invalidated by the fact that
is_cpu_allowed() returns cpu_online(cpu) rather than just true. ;-)
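
(For anyone following along, v5.18's is_cpu_allowed() reads roughly as
follows, lightly paraphrased; the migrate-disabled branch is the one that
returns cpu_online(cpu) rather than just true:)

	static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
	{
		/* When not in the task's cpumask, no point in looking further. */
		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
			return false;

		/* migrate_disable()d tasks must be allowed to finish. */
		if (is_migration_disabled(p))
			return cpu_online(cpu);

		/* Non-kernel threads are not allowed during either online or offline. */
		if (!(p->flags & PF_KTHREAD))
			return cpu_active(cpu) && task_cpu_possible(cpu, p);

		/* KTHREAD_IS_PER_CPU is always allowed. */
		if (kthread_is_per_cpu(p))
			return cpu_online(cpu);

		/* Regular kernel threads don't get to stay during offline. */
		if (cpu_dying(cpu))
			return false;

		/* But are allowed during online. */
		return cpu_online(cpu);
	}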

Thanx, Paul

2022-04-27 09:31:43

by Valentin Schneider

Subject: Re: "Dying CPU not properly vacated" splat

On 25/04/22 17:03, Paul E. McKenney wrote:
> On Mon, Apr 25, 2022 at 10:59:44PM +0100, Valentin Schneider wrote:
>> On 25/04/22 10:33, Paul E. McKenney wrote:
>> >
>> > So what did rcu_torture_reader() do wrong here? ;-)
>> >
>>
>> So on teardown, CPUHP_AP_SCHED_WAIT_EMPTY->sched_cpu_wait_empty() waits for
>> the rq to be empty. Tasks must *not* be enqueued onto that CPU after that
>> step has been run - if there are per-CPU tasks bound to that CPU, they must
>> be unbound in their respective hotplug callback.
>>
>> For instance for workqueue.c, we have workqueue_offline_cpu() as a hotplug
>> callback which invokes unbind_workers(cpu), the interesting bit being:
>>
>> 	for_each_pool_worker(worker, pool) {
>> 		kthread_set_per_cpu(worker->task, -1);
>> 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
>> 	}
>>
>> The rcu_torture_reader() kthreads aren't bound to any particular CPU, are
>> they? I can't find any code that would indicate they are - and in that case
>> it means we have a problem with is_cpu_allowed() or related.
>
> I did not intend that the rcu_torture_reader() kthreads be bound, and
> I am not seeing anything that binds them.
>
> Thoughts? (Other than that validating any alleged fix will be quite
> "interesting".)
>
> Thanx, Paul

IIUC the bogus scenario is that is_cpu_allowed() lets one of those kthreads be
enqueued on the outgoing CPU *after* CPUHP_AP_SCHED_WAIT_EMPTY.teardown() has
been run, and hilarity ensues.

The cpu_dying() condition should prevent a regular kthread from getting
enqueued there; most of the details have been evicted from my brain, but I
recall we got the ordering conditions right...

The only other "obvious" thing here is migrate_disable() which lets the
enqueue happen, but then balance_push()->select_fallback_rq() should punt
it away on context switch.
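
Sketch of that path, heavily simplified and from memory, so don't trust
the details:

	/* Scheduler callback on the outgoing CPU around context-switch time. */
	static void balance_push(struct rq *rq)
	{
		struct task_struct *push_task = rq->curr;

		/* Only active while going offline and on the outgoing CPU. */
		if (!cpu_dying(rq->cpu) || rq != this_rq())
			return;

		/*
		 * Per-CPU kthreads and migrate_disable()d tasks get to stay;
		 * the latter should be pushed once they re-enable migration.
		 */
		if (kthread_is_per_cpu(push_task) ||
		    is_migration_disabled(push_task))
			return;

		/*
		 * Everything else is handed to the stopper, whose callback
		 * ends up in select_fallback_rq() to pick a new home.
		 */
		get_task_struct(push_task);
		stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
				    this_cpu_ptr(&push_work));
	}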

I need to rediscover those paths, I don't see any obvious clue right now.

2022-06-22 20:50:12

by Paul E. McKenney

Subject: Re: "Dying CPU not properly vacated" splat

On Tue, Apr 26, 2022 at 09:24:45AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 26, 2022 at 03:48:06PM +0100, Valentin Schneider wrote:
> > On 25/04/22 17:03, Paul E. McKenney wrote:
> > > On Mon, Apr 25, 2022 at 10:59:44PM +0100, Valentin Schneider wrote:
> > >> On 25/04/22 10:33, Paul E. McKenney wrote:
> > >> >
> > >> > So what did rcu_torture_reader() do wrong here? ;-)
> > >> >
> > >>
> > >> So on teardown, CPUHP_AP_SCHED_WAIT_EMPTY->sched_cpu_wait_empty() waits for
> > >> the rq to be empty. Tasks must *not* be enqueued onto that CPU after that
> > >> step has been run - if there are per-CPU tasks bound to that CPU, they must
> > >> be unbound in their respective hotplug callback.
> > >>
> > >> For instance for workqueue.c, we have workqueue_offline_cpu() as a hotplug
> > >> callback which invokes unbind_workers(cpu), the interesting bit being:
> > >>
> > >> 	for_each_pool_worker(worker, pool) {
> > >> 		kthread_set_per_cpu(worker->task, -1);
> > >> 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
> > >> 	}
> > >>
> > >> The rcu_torture_reader() kthreads aren't bound to any particular CPU, are
> > >> they? I can't find any code that would indicate they are - and in that case
> > >> it means we have a problem with is_cpu_allowed() or related.
> > >
> > > I did not intend that the rcu_torture_reader() kthreads be bound, and
> > > I am not seeing anything that binds them.
> > >
> > > Thoughts? (Other than that validating any alleged fix will be quite
> > > "interesting".)
> >
> > IIUC the bogus scenario is that is_cpu_allowed() lets one of those kthreads be
> > enqueued on the outgoing CPU *after* CPUHP_AP_SCHED_WAIT_EMPTY.teardown() has
> > been run, and hilarity ensues.
> >
> > The cpu_dying() condition should prevent a regular kthread from getting
> > enqueued there; most of the details have been evicted from my brain, but I
> > recall we got the ordering conditions right...
> >
> > The only other "obvious" thing here is migrate_disable() which lets the
> > enqueue happen, but then balance_push()->select_fallback_rq() should punt
> > it away on context switch.
> >
> > I need to rediscover those paths; I don't see any obvious clue right now.
>
> Thank you for looking into this!
>
> The only thought that came to me was to record that is_cpu_allowed()
> returned true due to migration being disabled, and then use that in later
> traces, printk()s or whatever.
>
> My own favorite root-cause hypothesis was invalidated by the fact that
> is_cpu_allowed() returns cpu_online(cpu) rather than just true. ;-)

And I hit this on two of fifteen TREE03 runs on a merge of -rcu with
yesterday's linus/master. Might be a fluke, but I thought I should
at least report it. This is the first time since my last email in
this thread.

Thanx, Paul

2022-07-05 07:47:26

by Valentin Schneider

Subject: Re: "Dying CPU not properly vacated" splat

Hi Paul,

On 22/06/22 12:58, Paul E. McKenney wrote:
>
> And I hit this on two of fifteen TREE03 runs on a merge of -rcu with
> yesterday's linus/master. Might be a fluke, but I thought I should
> at least report it. This is the first time since my last email in
> this thread.
>
> Thanx, Paul

Did you get anything other than rcu_torture_reader in the task dump this
time?

2022-07-05 17:31:16

by Paul E. McKenney

Subject: Re: "Dying CPU not properly vacated" splat

On Tue, Jul 05, 2022 at 08:45:35AM +0100, Valentin Schneider wrote:
> Hi Paul,
>
> On 22/06/22 12:58, Paul E. McKenney wrote:
> >
> > And I hit this on two of fifteen TREE03 runs on a merge of -rcu with
> > yesterday's linus/master. Might be a fluke, but I thought I should
> > at least report it. This is the first time since my last email in
> > this thread.
> >
> > Thanx, Paul
>
> Did you get anything other than rcu_torture_reader in the task dump this
> time?

I did. For my future reference, this is devserver rcutorture run
2022.06.21-21.13.37-remote, scenarios TREE03.11 and TREE03.5.

Here are the splats, the first including rcu_torture_reader() as before,
but the second instead containing rcu_torture_fwd():

------------------------------------------------------------------------

TREE03.5:

[13599.132723] ------------[ cut here ]------------
[13599.132725] Dying CPU not properly vacated!
[13599.132740] WARNING: CPU: 3 PID: 31 at kernel/sched/core.c:9494 sched_cpu_dying.cold.167+0xc/0xc3
[13599.132756] Modules linked in:
[13599.132759] CPU: 3 PID: 31 Comm: migration/3 Not tainted 5.19.0-rc3-00143-g1622bc5a1d51-dirty #64
[13599.132766] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-1.module_el8.7.0+1140+ff0772f9 04/01/2014
[13599.132768] Stopper: multi_cpu_stop+0x0/0xf0 <- __stop_cpus.constprop.9+0x3c/0x80
[13599.132777] RIP: 0010:sched_cpu_dying.cold.167+0xc/0xc3
[13599.132782] Code: b9 e8 00 49 89 e8 48 8d 93 38 07 00 00 48 c7 c7 70 60 99 b3 e8 f4 2d 00 00 e9 ed 65 49 ff 48 c7 c7 28 61 99 b3 e8 0e ee ff ff <0f> 0b 44 8b ab d0 09 00 00 8b 4b 04 48 c7 c6 35 0b 98 b3 48 c7 c7
[13599.132785] RSP: 0000:ffff9e4080183d88 EFLAGS: 00010082
[13599.132788] RAX: 0000000000000000 RBX: ffff8b641f2e8cc0 RCX: 0000000000000003
[13599.132790] RDX: 0000000000000003 RSI: 4000000100010677 RDI: 00000000ffffffff
[13599.132791] RBP: 0000000000000086 R08: ffffffffb3c55d00 R09: 727020746f6e2055
[13599.132793] R10: ffffffffb3c55d00 R11: 2164657461636176 R12: ffffffffb249ed00
[13599.132794] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9e40805c7c01
[13599.132796] FS: 0000000000000000(0000) GS:ffff8b641f2c0000(0000) knlGS:0000000000000000
[13599.132800] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13599.132801] CR2: 0000000000000000 CR3: 000000000d00c000 CR4: 00000000000006e0
[13599.132803] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13599.132804] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13599.132805] Call Trace:
[13599.132813] <TASK>
[13599.132822] ? sched_cpu_wait_empty+0x70/0x70
[13599.132828] cpuhp_invoke_callback+0x100/0x4a0
[13599.132832] cpuhp_invoke_callback_range+0x3b/0x80
[13599.132835] take_cpu_down+0x55/0x80
[13599.132837] multi_cpu_stop+0x61/0xf0
[13599.132840] ? stop_machine_yield+0x10/0x10
[13599.132843] cpu_stopper_thread+0x84/0x120
[13599.132847] smpboot_thread_fn+0x181/0x220
[13599.132851] ? sort_range+0x20/0x20
[13599.132853] kthread+0xe3/0x110
[13599.132857] ? kthread_complete_and_exit+0x20/0x20
[13599.132860] ret_from_fork+0x22/0x30
[13599.132865] </TASK>
[13599.132866] ---[ end trace 0000000000000000 ]---
[13599.132868] CPU3 enqueued tasks (2 total):
[13599.132873] pid: 31, name: migration/3
[13599.132883] pid: 127, name: rcu_torture_rea

TREE03.11:

[14440.605298] ------------[ cut here ]------------
[14440.605300] Dying CPU not properly vacated!
[14440.605316] WARNING: CPU: 8 PID: 58 at kernel/sched/core.c:9494 sched_cpu_dying.cold.167+0xc/0xc3
[14440.610797] Modules linked in:
[14440.611395] CPU: 8 PID: 58 Comm: migration/8 Not tainted 5.19.0-rc3-00143-g1622bc5a1d51-dirty #64
[14440.613116] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-1.module_el8.7.0+1140+ff0772f9 04/01/2014
[14440.614895] Stopper: multi_cpu_stop+0x0/0xf0 <- __stop_cpus.constprop.9+0x3c/0x80
[14440.616341] RIP: 0010:sched_cpu_dying.cold.167+0xc/0xc3
[14440.617360] Code: b9 e8 00 49 89 e8 48 8d 93 38 07 00 00 48 c7 c7 70 60 19 a8 e8 f4 2d 00 00 e9 ed 65 49 ff 48 c7 c7 28 61 19 a8 e8 0e ee ff ff <0f> 0b 44 8b ab d0 09 00 00 8b 4b 04 48 c7 c6 35 0b 18 a8 48 c7 c7
[14440.620974] RSP: 0000:ffffb68a0026fd88 EFLAGS: 00010082
[14440.621990] RAX: 0000000000000000 RBX: ffffa20adf428cc0 RCX: 0000000000000003
[14440.623386] RDX: 0000000080000003 RSI: 4000000100011647 RDI: 00000000ffffffff
[14440.624770] RBP: 0000000000000086 R08: ffffffffa8455d00 R09: 727020746f6e2055
[14440.626151] R10: ffffffffa8455d00 R11: 2164657461636176 R12: ffffffffa6c9ed00
[14440.627528] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb68a005bfc01
[14440.628912] FS: 0000000000000000(0000) GS:ffffa20adf400000(0000) knlGS:0000000000000000
[14440.630479] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14440.631599] CR2: 0000000000000000 CR3: 000000001dc0c000 CR4: 00000000000006e0
[14440.632987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[14440.634368] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[14440.635754] Call Trace:
[14440.636244] <TASK>
[14440.636676] ? sched_cpu_wait_empty+0x70/0x70
[14440.637529] cpuhp_invoke_callback+0x100/0x4a0
[14440.638410] cpuhp_invoke_callback_range+0x3b/0x80
[14440.639347] take_cpu_down+0x55/0x80
[14440.640056] multi_cpu_stop+0x61/0xf0
[14440.640774] ? stop_machine_yield+0x10/0x10
[14440.641584] cpu_stopper_thread+0x84/0x120
[14440.642386] smpboot_thread_fn+0x181/0x220
[14440.643187] ? sort_range+0x20/0x20
[14440.643917] kthread+0xe3/0x110
[14440.644535] ? kthread_complete_and_exit+0x20/0x20
[14440.645466] ret_from_fork+0x22/0x30
[14440.646170] </TASK>
[14440.646599] ---[ end trace 0000000000000000 ]---
[14440.647495] CPU8 enqueued tasks (2 total):
[14440.647506] pid: 58, name: migration/8
[14440.647518] pid: 161, name: rcu_torture_fwd

------------------------------------------------------------------------

The second of these occurred near shutdown, but the first was quite some
time before shutdown. In case that makes a difference.

I have not seen this since.

Any other diagnostics I should add?

Thanx, Paul

2022-08-02 09:51:17

by Valentin Schneider

Subject: Re: "Dying CPU not properly vacated" splat

On 05/07/22 10:23, Paul E. McKenney wrote:
> The second of these occurred near shutdown, but the first was quite some
> time before shutdown. In case that makes a difference.
>
> I have not seen this since.
>
> Any other diagnostics I should add?
>

Sorry, I let this get buried at the bottom of my inbox :(

I've had another look at rcutorture.c but, just like for
rcu_torture_reader(), I don't see any obvious culprit (no
kthread_set_per_cpu() usage).

One thing I think would help is a scheduling trace (say sched_switch,
sched_wakeup and cpuhp*, combined with ftrace_dump_on_oops +
panic_on_warn?) - that should at least tell us if the issue is in the
wakeup placement (if the task gets placed on a dying CPU *after*
CPUHP_AP_ACTIVE), or in the balance_push() mechanism (the task was
*already* on the CPU when it started dying and never moved away).
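
Something like the following on the kernel command line ought to do it
(untested, just a sketch):

	trace_event=sched:sched_switch,sched:sched_wakeup,cpuhp:cpuhp_enter,cpuhp:cpuhp_exit ftrace_dump_on_oops panic_on_warn=1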

Neither makes sense to me, but it has to be somewhere in there...

> Thanx, Paul


2023-09-06 16:03:49

by Paul E. McKenney

Subject: Re: "Dying CPU not properly vacated" splat

On Tue, Aug 02, 2022 at 10:30:02AM +0100, Valentin Schneider wrote:
> On 05/07/22 10:23, Paul E. McKenney wrote:
> > The second of these occurred near shutdown, but the first was quite some
> > time before shutdown. In case that makes a difference.
> >
> > I have not seen this since.
> >
> > Any other diagnostics I should add?
> >
>
> Sorry, I let this get buried at the bottom of my inbox :(
>
> I've had another look at rcutorture.c but, just like for
> rcu_torture_reader(), I don't see any obvious culprit (no
> kthread_set_per_cpu() usage).
>
> One thing I think would help is a scheduling trace (say sched_switch,
> sched_wakeup and cpuhp*, combined with ftrace_dump_on_oops +
> panic_on_warn?) - that should at least tell us if the issue is in the
> wakeup placement (if the task gets placed on a dying CPU *after*
> CPUHP_AP_ACTIVE), or in the balance_push() mechanism (the task was
> *already* on the CPU when it started dying and never moved away).
>
> Neither makes sense to me, but it has to be somewhere in there...

And given that it has been more than a year since I have seen this,
I am considering it to be fixed, whether purposefully or accidentally.

Thanx, Paul