2022-07-13 13:12:16

by Schspa Shi

[permalink] [raw]
Subject: [PATCH v2] workqueue: Use active mask for new worker when pool is DISASSOCIATED

When CPU-[un]hotplugs, all workers will be bound to active CPU via
unbind_workers().

But the unbound worker still has a chance to create a new worker, which
has bound the newly created task to pool->attrs->cpumask. But the CPU has
been unplugged.

Please refer to the following scenarios.

CPU0 CPU1
------------------------------------------------------------------
sched_cpu_deactivate(cpu_active_mask clear)
workqueue_offline_cpu(work pool POOL_DISASSOCIATED)
-- all worker will migrate to another cpu --
worker_thread
-- will create new worker if
pool->worklist is not empty
create_worker()
-- new kworker will bound to CPU0
(pool->attrs->cpumask will be mask of CPU0).
kworker/0:x will running on rq

sched_cpu_dying
if (rq->nr_running != 1 || rq_has_pinned_tasks(rq))
WARN(true, "Dying CPU not properly vacated!");
---------OOPS-------------

And the crash log as fellowing.

The crash log is as follows:
[ 1622.829074] ------------[ cut here ]------------
[ 1622.829081] Dying CPU not properly vacated!
[ 1622.829091] WARNING: CPU: 3 PID: 31 at kernel/sched/core.c:7756 sched_cpu_dying+0x74/0x204
[ 1622.829374] CPU: 3 PID: 31 Comm: migration/3 Tainted: P O 5.10.59-rt52 #2
[ 1622.829386] Stopper: multi_cpu_stop+0x0/0x160 <- 0x0
[ 1622.829400] pstate: 60c00009 (nZCv daif +PAN +UAO -TCO BTYPE=--)
[ 1622.829408] pc : sched_cpu_dying+0x74/0x204
[ 1622.829415] lr : sched_cpu_dying+0x74/0x204
[ 1622.829421] sp : ffff800012933c70
[ 1622.829424] pmr_save: 00000060
[ 1622.829426] x29: ffff800012933c70 x28: 0000000000000000
[ 1622.829435] x27: 0000000000000000 x26: 0000000000000001
[ 1622.829444] x25: 0000000000000000 x24: ffff800018353c2c
[ 1622.829452] x23: 0000000000000003 x22: 0000000000000003
[ 1622.829460] x21: 0000000000000059 x20: 0000000000000000
[ 1622.829468] x19: ffff00027ee68a80 x18: 0000000000000000
[ 1622.829477] x17: 0000000000000000 x16: 0000000000000000
[ 1622.829485] x15: ffffffffffffffff x14: ffff80001169ae30
[ 1622.829493] x13: ffffffffffc38c07 x12: ffffffffffffffff
[ 1622.829501] x11: ffffffffffe00000 x10: ffff80001169ae58
[ 1622.829510] x9 : 000000000000001e x8 : ffff80001169ae30
[ 1622.829518] x7 : ffff800012933ab0 x6 : 00000000ffff0e20
[ 1622.829526] x5 : ffff00027ee62988 x4 : 00000000ffff0e20
[ 1622.829535] x3 : ffff800011e30180 x2 : 0000000100000002
[ 1622.829543] x1 : 0000000000000000 x0 : 0000000000000000
[ 1622.829552] Call trace:
[ 1622.829555] sched_cpu_dying+0x74/0x204
[ 1622.829562] cpuhp_invoke_callback+0xc0/0x1b0
[ 1622.829571] take_cpu_down+0xbc/0xd4
[ 1622.829577] multi_cpu_stop+0x138/0x160
[ 1622.829584] cpu_stopper_thread+0x9c/0x118
[ 1622.829591] smpboot_thread_fn+0x1e8/0x1ec
[ 1622.829598] kthread+0x114/0x124
[ 1622.829606] ret_from_fork+0x10/0x30
[ 1622.829615] ---[ end trace 0000000000000002 ]---
[ 1623.830273] CPU3 enqueued tasks (2 total):
[ 1623.830291] pid: 31, name: migration/3
[ 1623.830440] pid: 25654, name: kworker/3:0
[ 1623.830444] task:kworker/3:0 state:R running task stack: 0 pid:25654 ppid: 2 flags:0x00000028
[ 1623.830458] Call trace:
[ 1623.830460] __switch_to+0x164/0x17c
[ 1623.830472] __schedule+0x4cc/0x5c0
[ 1623.830483] schedule+0x7c/0xcc
[ 1623.830491] schedule_preempt_disabled+0x14/0x24
[ 1623.830500] kthread+0xd8/0x124
[ 1623.830509] ret_from_fork+0x10/0x30

In the crash log, the error enqueued kworker(pid: 25654, name: kworker/3:0)
is a newly created thread, and have bind to a unpluged CPU 3.

Can crash dump can also verify this.
crash> task -R nr_cpus_allowed,cpus_mask 25654
PID: 25654 TASK: ffff000181ff0000 CPU: 3 COMMAND: "kworker/3:0"
nr_cpus_allowed = 1,
cpus_mask = {
bits = {8, 0}
},

crash> struct worker_pool.cpu,nr_workers,attrs 0xffff00027ee68380
cpu = 3,
nr_workers = 3,
attrs = 0xffff000180004480,
crash> struct workqueue_attrs 0xffff000180004480
struct workqueue_attrs {
nice = 0,
cpumask = {{
bits = {8, 0}
}},
no_numa = false
}

The stack trace of the bad running task was dumped via the following patch:
Link: https://lore.kernel.org/all/[email protected]/
And I think this debug patch needs to be added to the mainline,
it can help us to debug this kind of problem

To fix it, we can use cpu_active_mask when work pool is DISASSOCIATED.

Signed-off-by: Schspa Shi <[email protected]>

--

Changelog:
v1 -> v2:
- Move worker task bind to worker_attach_to_pool, remove extra
wq_pool_attach_mutex added.
- Add a timing diagram to make this question clearer.
---
kernel/workqueue.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1ea50f6be843..3f6e6d884793 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1867,13 +1867,13 @@ static void worker_attach_to_pool(struct worker *worker,
* stable across this function. See the comments above the flag
* definition for details.
*/
- if (pool->flags & POOL_DISASSOCIATED)
+ if (pool->flags & POOL_DISASSOCIATED) {
worker->flags |= WORKER_UNBOUND;
- else
+ set_cpus_allowed_ptr(worker->task, cpu_active_mask);
+ } else {
kthread_set_per_cpu(worker->task, pool->cpu);
-
- if (worker->rescue_wq)
set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+ }

list_add_tail(&worker->node, &pool->workers);
worker->pool = pool;
@@ -1952,7 +1952,6 @@ static struct worker *create_worker(struct worker_pool *pool)
goto fail;

set_user_nice(worker->task, pool->attrs->nice);
- kthread_bind_mask(worker->task, pool->attrs->cpumask);

/* successful, attach the worker to the pool */
worker_attach_to_pool(worker, pool);
--
2.29.0


2022-07-14 00:34:56

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH v2] workqueue: Use active mask for new worker when pool is DISASSOCIATED

On Wed, Jul 13, 2022 at 8:51 PM Schspa Shi <[email protected]> wrote:
>
> When CPU-[un]hotplugs, all workers will be bound to active CPU via
> unbind_workers().
>
> But the unbound worker still has a chance to create a new worker, which
> has bound the newly created task to pool->attrs->cpumask. But the CPU has
> been unplugged.
>
> Please refer to the following scenarios.
>
> CPU0 CPU1
> ------------------------------------------------------------------
> sched_cpu_deactivate(cpu_active_mask clear)
> workqueue_offline_cpu(work pool POOL_DISASSOCIATED)
> -- all worker will migrate to another cpu --
> worker_thread
> -- will create new worker if
> pool->worklist is not empty
> create_worker()
> -- new kworker will bound to CPU0
> (pool->attrs->cpumask will be mask of CPU0).
> kworker/0:x will running on rq
>
> sched_cpu_dying
> if (rq->nr_running != 1 || rq_has_pinned_tasks(rq))
> WARN(true, "Dying CPU not properly vacated!");
> ---------OOPS-------------
>
> And the crash log as fellowing.
>
> The crash log is as follows:
> [ 1622.829074] ------------[ cut here ]------------
> [ 1622.829081] Dying CPU not properly vacated!
> [ 1622.829091] WARNING: CPU: 3 PID: 31 at kernel/sched/core.c:7756 sched_cpu_dying+0x74/0x204
> [ 1622.829374] CPU: 3 PID: 31 Comm: migration/3 Tainted: P O 5.10.59-rt52 #2
> [ 1622.829386] Stopper: multi_cpu_stop+0x0/0x160 <- 0x0
> [ 1622.829400] pstate: 60c00009 (nZCv daif +PAN +UAO -TCO BTYPE=--)
> [ 1622.829408] pc : sched_cpu_dying+0x74/0x204
> [ 1622.829415] lr : sched_cpu_dying+0x74/0x204
> [ 1622.829421] sp : ffff800012933c70
> [ 1622.829424] pmr_save: 00000060
> [ 1622.829426] x29: ffff800012933c70 x28: 0000000000000000
> [ 1622.829435] x27: 0000000000000000 x26: 0000000000000001
> [ 1622.829444] x25: 0000000000000000 x24: ffff800018353c2c
> [ 1622.829452] x23: 0000000000000003 x22: 0000000000000003
> [ 1622.829460] x21: 0000000000000059 x20: 0000000000000000
> [ 1622.829468] x19: ffff00027ee68a80 x18: 0000000000000000
> [ 1622.829477] x17: 0000000000000000 x16: 0000000000000000
> [ 1622.829485] x15: ffffffffffffffff x14: ffff80001169ae30
> [ 1622.829493] x13: ffffffffffc38c07 x12: ffffffffffffffff
> [ 1622.829501] x11: ffffffffffe00000 x10: ffff80001169ae58
> [ 1622.829510] x9 : 000000000000001e x8 : ffff80001169ae30
> [ 1622.829518] x7 : ffff800012933ab0 x6 : 00000000ffff0e20
> [ 1622.829526] x5 : ffff00027ee62988 x4 : 00000000ffff0e20
> [ 1622.829535] x3 : ffff800011e30180 x2 : 0000000100000002
> [ 1622.829543] x1 : 0000000000000000 x0 : 0000000000000000
> [ 1622.829552] Call trace:
> [ 1622.829555] sched_cpu_dying+0x74/0x204
> [ 1622.829562] cpuhp_invoke_callback+0xc0/0x1b0
> [ 1622.829571] take_cpu_down+0xbc/0xd4
> [ 1622.829577] multi_cpu_stop+0x138/0x160
> [ 1622.829584] cpu_stopper_thread+0x9c/0x118
> [ 1622.829591] smpboot_thread_fn+0x1e8/0x1ec
> [ 1622.829598] kthread+0x114/0x124
> [ 1622.829606] ret_from_fork+0x10/0x30
> [ 1622.829615] ---[ end trace 0000000000000002 ]---
> [ 1623.830273] CPU3 enqueued tasks (2 total):
> [ 1623.830291] pid: 31, name: migration/3
> [ 1623.830440] pid: 25654, name: kworker/3:0
> [ 1623.830444] task:kworker/3:0 state:R running task stack: 0 pid:25654 ppid: 2 flags:0x00000028
> [ 1623.830458] Call trace:
> [ 1623.830460] __switch_to+0x164/0x17c
> [ 1623.830472] __schedule+0x4cc/0x5c0
> [ 1623.830483] schedule+0x7c/0xcc
> [ 1623.830491] schedule_preempt_disabled+0x14/0x24
> [ 1623.830500] kthread+0xd8/0x124
> [ 1623.830509] ret_from_fork+0x10/0x30
>
> In the crash log, the error enqueued kworker(pid: 25654, name: kworker/3:0)
> is a newly created thread, and have bind to a unpluged CPU 3.
>
> Can crash dump can also verify this.
> crash> task -R nr_cpus_allowed,cpus_mask 25654
> PID: 25654 TASK: ffff000181ff0000 CPU: 3 COMMAND: "kworker/3:0"
> nr_cpus_allowed = 1,
> cpus_mask = {
> bits = {8, 0}
> },
>
> crash> struct worker_pool.cpu,nr_workers,attrs 0xffff00027ee68380
> cpu = 3,
> nr_workers = 3,
> attrs = 0xffff000180004480,
> crash> struct workqueue_attrs 0xffff000180004480
> struct workqueue_attrs {
> nice = 0,
> cpumask = {{
> bits = {8, 0}
> }},
> no_numa = false
> }
>
> The stack trace of the bad running task was dumped via the following patch:
> Link: https://lore.kernel.org/all/[email protected]/
> And I think this debug patch needs to be added to the mainline,
> it can help us to debug this kind of problem
>
> To fix it, we can use cpu_active_mask when work pool is DISASSOCIATED.
>
> Signed-off-by: Schspa Shi <[email protected]>
>
> --
>
> Changelog:
> v1 -> v2:
> - Move worker task bind to worker_attach_to_pool, remove extra
> wq_pool_attach_mutex added.
> - Add a timing diagram to make this question clearer.
> ---
> kernel/workqueue.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1ea50f6be843..3f6e6d884793 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1867,13 +1867,13 @@ static void worker_attach_to_pool(struct worker *worker,
> * stable across this function. See the comments above the flag
> * definition for details.
> */
> - if (pool->flags & POOL_DISASSOCIATED)
> + if (pool->flags & POOL_DISASSOCIATED) {
> worker->flags |= WORKER_UNBOUND;
> - else
> + set_cpus_allowed_ptr(worker->task, cpu_active_mask);

> + } else {
> kthread_set_per_cpu(worker->task, pool->cpu);
> -
> - if (worker->rescue_wq)
> set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> + }



For unbound pools (get_unbound_pool(), which also has POOL_DISASSOCIATED),
pool->attrs->cpumask should be used if pool->attrs->cpumask has active cpu.

if (cpumask_intersects(pool->attrs->cpumask, cpu_active_mask))
set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
else
set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask);

And put it above the "if (pool->flags & POOL_DISASSOCIATED)" code block.

>
> list_add_tail(&worker->node, &pool->workers);
> worker->pool = pool;
> @@ -1952,7 +1952,6 @@ static struct worker *create_worker(struct worker_pool *pool)
> goto fail;
>
> set_user_nice(worker->task, pool->attrs->nice);
> - kthread_bind_mask(worker->task, pool->attrs->cpumask);

worker->flags |= PF_NO_SETAFFINITY;
PF_NO_SETAFFINITY needs to be set for workers.

>
> /* successful, attach the worker to the pool */
> worker_attach_to_pool(worker, pool);
> --
> 2.29.0
>

2022-07-14 02:12:33

by Schspa Shi

[permalink] [raw]
Subject: Re: [PATCH v2] workqueue: Use active mask for new worker when pool is DISASSOCIATED

Lai Jiangshan <[email protected]> writes:

> On Wed, Jul 13, 2022 at 8:51 PM Schspa Shi <[email protected]> wrote:
>>
>> When CPU-[un]hotplugs, all workers will be bound to active CPU via
>> unbind_workers().
>>
>> But the unbound worker still has a chance to create a new worker, which
>> has bound the newly created task to pool->attrs->cpumask. But the CPU has
>> been unplugged.
>>
>> Please refer to the following scenarios.
>>
>> CPU0 CPU1
>> ------------------------------------------------------------------
>> sched_cpu_deactivate(cpu_active_mask clear)
>> workqueue_offline_cpu(work pool POOL_DISASSOCIATED)
>> -- all worker will migrate to another cpu --
>> worker_thread
>> -- will create new worker if
>> pool->worklist is not empty
>> create_worker()
>> -- new kworker will bound to CPU0
>> (pool->attrs->cpumask will be mask of CPU0).
>> kworker/0:x will running on rq
>>
>> sched_cpu_dying
>> if (rq->nr_running != 1 || rq_has_pinned_tasks(rq))
>> WARN(true, "Dying CPU not properly vacated!");
>> ---------OOPS-------------
>>
>> And the crash log as fellowing.
>>
>> The crash log is as follows:
>> [ 1622.829074] ------------[ cut here ]------------
>> [ 1622.829081] Dying CPU not properly vacated!
>> [ 1622.829091] WARNING: CPU: 3 PID: 31 at kernel/sched/core.c:7756 sched_cpu_dying+0x74/0x204
>> [ 1622.829374] CPU: 3 PID: 31 Comm: migration/3 Tainted: P O 5.10.59-rt52 #2
>> [ 1622.829386] Stopper: multi_cpu_stop+0x0/0x160 <- 0x0
>> [ 1622.829400] pstate: 60c00009 (nZCv daif +PAN +UAO -TCO BTYPE=--)
>> [ 1622.829408] pc : sched_cpu_dying+0x74/0x204
>> [ 1622.829415] lr : sched_cpu_dying+0x74/0x204
>> [ 1622.829421] sp : ffff800012933c70
>> [ 1622.829424] pmr_save: 00000060
>> [ 1622.829426] x29: ffff800012933c70 x28: 0000000000000000
>> [ 1622.829435] x27: 0000000000000000 x26: 0000000000000001
>> [ 1622.829444] x25: 0000000000000000 x24: ffff800018353c2c
>> [ 1622.829452] x23: 0000000000000003 x22: 0000000000000003
>> [ 1622.829460] x21: 0000000000000059 x20: 0000000000000000
>> [ 1622.829468] x19: ffff00027ee68a80 x18: 0000000000000000
>> [ 1622.829477] x17: 0000000000000000 x16: 0000000000000000
>> [ 1622.829485] x15: ffffffffffffffff x14: ffff80001169ae30
>> [ 1622.829493] x13: ffffffffffc38c07 x12: ffffffffffffffff
>> [ 1622.829501] x11: ffffffffffe00000 x10: ffff80001169ae58
>> [ 1622.829510] x9 : 000000000000001e x8 : ffff80001169ae30
>> [ 1622.829518] x7 : ffff800012933ab0 x6 : 00000000ffff0e20
>> [ 1622.829526] x5 : ffff00027ee62988 x4 : 00000000ffff0e20
>> [ 1622.829535] x3 : ffff800011e30180 x2 : 0000000100000002
>> [ 1622.829543] x1 : 0000000000000000 x0 : 0000000000000000
>> [ 1622.829552] Call trace:
>> [ 1622.829555] sched_cpu_dying+0x74/0x204
>> [ 1622.829562] cpuhp_invoke_callback+0xc0/0x1b0
>> [ 1622.829571] take_cpu_down+0xbc/0xd4
>> [ 1622.829577] multi_cpu_stop+0x138/0x160
>> [ 1622.829584] cpu_stopper_thread+0x9c/0x118
>> [ 1622.829591] smpboot_thread_fn+0x1e8/0x1ec
>> [ 1622.829598] kthread+0x114/0x124
>> [ 1622.829606] ret_from_fork+0x10/0x30
>> [ 1622.829615] ---[ end trace 0000000000000002 ]---
>> [ 1623.830273] CPU3 enqueued tasks (2 total):
>> [ 1623.830291] pid: 31, name: migration/3
>> [ 1623.830440] pid: 25654, name: kworker/3:0
>> [ 1623.830444] task:kworker/3:0 state:R running task stack: 0 pid:25654 ppid: 2 flags:0x00000028
>> [ 1623.830458] Call trace:
>> [ 1623.830460] __switch_to+0x164/0x17c
>> [ 1623.830472] __schedule+0x4cc/0x5c0
>> [ 1623.830483] schedule+0x7c/0xcc
>> [ 1623.830491] schedule_preempt_disabled+0x14/0x24
>> [ 1623.830500] kthread+0xd8/0x124
>> [ 1623.830509] ret_from_fork+0x10/0x30
>>
>> In the crash log, the error enqueued kworker(pid: 25654, name: kworker/3:0)
>> is a newly created thread, and have bind to a unpluged CPU 3.
>>
>> Can crash dump can also verify this.
>> crash> task -R nr_cpus_allowed,cpus_mask 25654
>> PID: 25654 TASK: ffff000181ff0000 CPU: 3 COMMAND: "kworker/3:0"
>> nr_cpus_allowed = 1,
>> cpus_mask = {
>> bits = {8, 0}
>> },
>>
>> crash> struct worker_pool.cpu,nr_workers,attrs 0xffff00027ee68380
>> cpu = 3,
>> nr_workers = 3,
>> attrs = 0xffff000180004480,
>> crash> struct workqueue_attrs 0xffff000180004480
>> struct workqueue_attrs {
>> nice = 0,
>> cpumask = {{
>> bits = {8, 0}
>> }},
>> no_numa = false
>> }
>>
>> The stack trace of the bad running task was dumped via the following patch:
>> Link: https://lore.kernel.org/all/[email protected]/
>> And I think this debug patch needs to be added to the mainline,
>> it can help us to debug this kind of problem
>>
>> To fix it, we can use cpu_active_mask when work pool is DISASSOCIATED.
>>
>> Signed-off-by: Schspa Shi <[email protected]>
>>
>> --
>>
>> Changelog:
>> v1 -> v2:
>> - Move worker task bind to worker_attach_to_pool, remove extra
>> wq_pool_attach_mutex added.
>> - Add a timing diagram to make this question clearer.
>> ---
>> kernel/workqueue.c | 9 ++++-----
>> 1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 1ea50f6be843..3f6e6d884793 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -1867,13 +1867,13 @@ static void worker_attach_to_pool(struct worker *worker,
>> * stable across this function. See the comments above the flag
>> * definition for details.
>> */
>> - if (pool->flags & POOL_DISASSOCIATED)
>> + if (pool->flags & POOL_DISASSOCIATED) {
>> worker->flags |= WORKER_UNBOUND;
>> - else
>> + set_cpus_allowed_ptr(worker->task, cpu_active_mask);
>
>> + } else {
>> kthread_set_per_cpu(worker->task, pool->cpu);
>> -
>> - if (worker->rescue_wq)
>> set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
>> + }
>
>
>
> For unbound pools (get_unbound_pool(), which also has POOL_DISASSOCIATED),
> pool->attrs->cpumask should be used if pool->attrs->cpumask has active cpu.
>
> if (cpumask_intersects(pool->attrs->cpumask, cpu_active_mask))
> set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> else
> set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask);
>
> And put it above the "if (pool->flags & POOL_DISASSOCIATED)" code block.
>

Thanks for remainding me. will change to this.

>>
>> list_add_tail(&worker->node, &pool->workers);
>> worker->pool = pool;
>> @@ -1952,7 +1952,6 @@ static struct worker *create_worker(struct worker_pool *pool)
>> goto fail;
>>
>> set_user_nice(worker->task, pool->attrs->nice);
>> - kthread_bind_mask(worker->task, pool->attrs->cpumask);
>
> worker->flags |= PF_NO_SETAFFINITY;
> PF_NO_SETAFFINITY needs to be set for workers.
>

Sorry for missing this step, it will be added in the next version of
the patch.


>>
>> /* successful, attach the worker to the pool */
>> worker_attach_to_pool(worker, pool);
>> --
>> 2.29.0
>>

--
BRs
Schspa Shi