2020-12-18 16:10:17

by Lai Jiangshan

Subject: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

From: Lai Jiangshan <[email protected]>

06249738a41a ("workqueue: Manually break affinity on hotplug")
said that scheduler will not force break affinity for us.

But workqueue highly depends on the old behavior. Many parts of the codes
relies on it, 06249738a41a ("workqueue: Manually break affinity on hotplug")
is not enough to change it, and the commit has flaws in itself too.

It doesn't handle worker detachment.
It doesn't handle worker attachment, mainly worker creation,
which is handled by Valentin Schneider's patch [1].
It doesn't handle unbound workers which might effectively be
per-cpu kthreads.

We need to thoroughly update the way workqueue handles affinity
in cpu hot[un]plug, which is what this patchset intends to do, and
to replace Valentin Schneider's patch [1]. The equivalent patch
is patch 10.

Patch 1 fixes a flaw reported by Hillf Danton <[email protected]>.
I have to include this fix because later patches depend on it.

The patchset is based on tip/master rather than workqueue tree,
because the patchset is a complement for 06249738a41a ("workqueue:
Manually break affinity on hotplug") which is only in tip/master by now.

And TJ acked routing the series through tip.

Changed from V1:
Add TJ's acked-by for the whole patchset

Add more words to the comments and the changelog, mainly derived
from discussion with Peter.

Update the comments as TJ suggested.

Update a line of code as Valentin suggested.

Add Valentin's ack for patch 10 because "Seems alright to me." and
add Valentin's comments to the changelog which is integral.

[1]: https://lore.kernel.org/r/[email protected]
[V1 patchset]: https://lore.kernel.org/lkml/[email protected]/

Cc: Hillf Danton <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Tejun Heo <[email protected]>

Lai Jiangshan (10):
workqueue: restore unbound_workers' cpumask correctly
workqueue: use cpu_possible_mask instead of cpu_active_mask to break
affinity
workqueue: Manually break affinity on pool detachment
workqueue: don't set the worker's cpumask when kthread_bind_mask()
workqueue: introduce wq_online_cpumask
workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
workqueue: Manually break affinity on hotplug for unbound pool
workqueue: reorganize workqueue_online_cpu()
workqueue: reorganize workqueue_offline_cpu() unbind_workers()
workqueue: Fix affinity of kworkers when attaching into pool

kernel/workqueue.c | 214 ++++++++++++++++++++++++++++-----------------
1 file changed, 132 insertions(+), 82 deletions(-)

--
2.19.1.6.gb485710b


2020-12-18 16:10:46

by Lai Jiangshan

Subject: [PATCH -tip V2 02/10] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity

From: Lai Jiangshan <[email protected]>

The scheduler won't break affinity for us any more, so we should
"emulate" the behavior it used to apply when it broke affinity for
us, which is changing the cpumask to cpu_possible_mask.

Also, other CPUs might come online later while the worker is still
running with pending work items. The worker should be allowed to use
those later-onlined CPUs as before and process the work items ASAP.
Using cpu_active_mask here can't achieve this goal, but using
cpu_possible_mask can.
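
For illustration, here is a minimal user-space model of the two choices
(plain C with toy bitmask values; this is not the kernel's cpumask API,
and the masks are made up):

#include <stdio.h>

int main(void)
{
        unsigned long possible_mask = 0xffUL;  /* CPUs 0-7 exist in the system */
        unsigned long active_mask   = 0x0fUL;  /* CPUs 0-3 active when affinity is broken */

        /* The affinity the worker would be left with under either choice. */
        unsigned long allowed_if_active   = active_mask;
        unsigned long allowed_if_possible = possible_mask;

        /* Later, CPU 5 comes online while the worker still has pending work. */
        int late_cpu = 5;

        printf("active-mask affinity can use cpu%d:   %s\n", late_cpu,
               (allowed_if_active & (1UL << late_cpu)) ? "yes" : "no");
        printf("possible-mask affinity can use cpu%d: %s\n", late_cpu,
               (allowed_if_possible & (1UL << late_cpu)) ? "yes" : "no");
        return 0;
}

Breaking affinity to cpu_possible_mask leaves the worker free to migrate
onto whatever CPUs come online later.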

Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aba71ab359dd..fa71520822f0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4910,7 +4910,7 @@ static void unbind_workers(int cpu)
raw_spin_unlock_irq(&pool->lock);

for_each_pool_worker(worker, pool)
- WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_active_mask) < 0);
+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);

mutex_unlock(&wq_pool_attach_mutex);

--
2.19.1.6.gb485710b

2020-12-18 16:11:26

by Lai Jiangshan

Subject: [PATCH -tip V2 05/10] workqueue: introduce wq_online_cpumask

From: Lai Jiangshan <[email protected]>

wq_online_cpumask is the cached result of cpu_online_mask with the
going-down cpu cleared. It is needed by later patches to set the
correct cpumask for workers and to break affinity initiatively.

The first usage of wq_online_cpumask is also in this patch.
wq_calc_node_cpumask() and wq_update_unbound_numa() can be simplified
a little.
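
For reference, a rough user-space sketch of the intended bookkeeping
(toy bitmasks and simplified stand-ins for workqueue_init_early(),
workqueue_online_cpu() and workqueue_offline_cpu(); not kernel code):

#include <stdio.h>

static unsigned long toy_online_mask = 0x0fUL;  /* CPUs 0-3 online at boot */
static unsigned long toy_wq_online_cpumask;     /* stands in for wq_online_cpumask */

static void toy_online_cpu(int cpu)
{
        toy_wq_online_cpumask |= 1UL << cpu;
}

static void toy_offline_cpu(int cpu)
{
        /* cleared before any teardown work for @cpu runs */
        toy_wq_online_cpumask &= ~(1UL << cpu);
}

int main(void)
{
        /* init_early: start from the boot-time online mask */
        toy_wq_online_cpumask = toy_online_mask;

        toy_online_cpu(4);   /* CPU 4 comes up */
        printf("after online(4):  %#lx\n", toy_wq_online_cpumask);  /* 0x1f */

        toy_offline_cpu(2);  /* CPU 2 starts going down */
        printf("after offline(2): %#lx\n", toy_wq_online_cpumask);  /* 0x1b */
        return 0;
}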

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5f3c86eaed7a..84842f10e6a2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -310,6 +310,9 @@ static bool workqueue_freezing; /* PL: have wqs started freezing? */
/* PL: allowable cpus for unbound wqs and work items */
static cpumask_var_t wq_unbound_cpumask;

+/* PL: online cpus (cpu_online_mask with the going-down cpu cleared) */
+static cpumask_var_t wq_online_cpumask;
+
/* CPU where unbound work was last round robin scheduled from this CPU */
static DEFINE_PER_CPU(int, wq_rr_cpu_last);

@@ -3833,12 +3836,10 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
* wq_calc_node_cpumask - calculate a wq_attrs' cpumask for the specified node
* @attrs: the wq_attrs of the default pwq of the target workqueue
* @node: the target NUMA node
- * @cpu_going_down: if >= 0, the CPU to consider as offline
* @cpumask: outarg, the resulting cpumask
*
- * Calculate the cpumask a workqueue with @attrs should use on @node. If
- * @cpu_going_down is >= 0, that cpu is considered offline during
- * calculation. The result is stored in @cpumask.
+ * Calculate the cpumask a workqueue with @attrs should use on @node.
+ * The result is stored in @cpumask.
*
* If NUMA affinity is not enabled, @attrs->cpumask is always used. If
* enabled and @node has online CPUs requested by @attrs, the returned
@@ -3852,15 +3853,14 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
* %false if equal.
*/
static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
- int cpu_going_down, cpumask_t *cpumask)
+ cpumask_t *cpumask)
{
if (!wq_numa_enabled || attrs->no_numa)
goto use_dfl;

/* does @node have any online CPUs @attrs wants? */
cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
- if (cpu_going_down >= 0)
- cpumask_clear_cpu(cpu_going_down, cpumask);
+ cpumask_and(cpumask, cpumask, wq_online_cpumask);

if (cpumask_empty(cpumask))
goto use_dfl;
@@ -3969,7 +3969,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
goto out_free;

for_each_node(node) {
- if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) {
+ if (wq_calc_node_cpumask(new_attrs, node, tmp_attrs->cpumask)) {
ctx->pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
if (!ctx->pwq_tbl[node])
goto out_free;
@@ -4094,7 +4094,6 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
* wq_update_unbound_numa - update NUMA affinity of a wq for CPU hot[un]plug
* @wq: the target workqueue
* @cpu: the CPU coming up or going down
- * @online: whether @cpu is coming up or going down
*
* This function is to be called from %CPU_DOWN_PREPARE, %CPU_ONLINE and
* %CPU_DOWN_FAILED. @cpu is being hot[un]plugged, update NUMA affinity of
@@ -4112,11 +4111,9 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
* affinity, it's the user's responsibility to flush the work item from
* CPU_DOWN_PREPARE.
*/
-static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
- bool online)
+static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu)
{
int node = cpu_to_node(cpu);
- int cpu_off = online ? -1 : cpu;
struct pool_workqueue *old_pwq = NULL, *pwq;
struct workqueue_attrs *target_attrs;
cpumask_t *cpumask;
@@ -4144,7 +4141,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
* and create a new one if they don't match. If the target cpumask
* equals the default pwq's, the default pwq should be used.
*/
- if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask)) {
+ if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpumask)) {
if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
return;
} else {
@@ -5081,6 +5078,7 @@ int workqueue_online_cpu(unsigned int cpu)
int pi;

mutex_lock(&wq_pool_mutex);
+ cpumask_set_cpu(cpu, wq_online_cpumask);

for_each_pool(pool, pi) {
mutex_lock(&wq_pool_attach_mutex);
@@ -5095,7 +5093,7 @@ int workqueue_online_cpu(unsigned int cpu)

/* update NUMA affinity of unbound workqueues */
list_for_each_entry(wq, &workqueues, list)
- wq_update_unbound_numa(wq, cpu, true);
+ wq_update_unbound_numa(wq, cpu);

mutex_unlock(&wq_pool_mutex);
return 0;
@@ -5113,8 +5111,9 @@ int workqueue_offline_cpu(unsigned int cpu)

/* update NUMA affinity of unbound workqueues */
mutex_lock(&wq_pool_mutex);
+ cpumask_clear_cpu(cpu, wq_online_cpumask);
list_for_each_entry(wq, &workqueues, list)
- wq_update_unbound_numa(wq, cpu, false);
+ wq_update_unbound_numa(wq, cpu);
mutex_unlock(&wq_pool_mutex);

return 0;
@@ -5951,6 +5950,9 @@ void __init workqueue_init_early(void)

BUILD_BUG_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));

+ BUG_ON(!alloc_cpumask_var(&wq_online_cpumask, GFP_KERNEL));
+ cpumask_copy(wq_online_cpumask, cpu_online_mask);
+
BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(hk_flags));

@@ -6047,7 +6049,7 @@ void __init workqueue_init(void)
}

list_for_each_entry(wq, &workqueues, list) {
- wq_update_unbound_numa(wq, smp_processor_id(), true);
+ wq_update_unbound_numa(wq, smp_processor_id());
WARN(init_rescuer(wq),
"workqueue: failed to create early rescuer for %s",
wq->name);
--
2.19.1.6.gb485710b

2020-12-18 16:11:32

by Lai Jiangshan

Subject: [PATCH -tip V2 03/10] workqueue: Manually break affinity on pool detachment

From: Lai Jiangshan <[email protected]>

The pool->attrs->cpumask might be a single CPU and that CPU may go
down after detachment, and the scheduler won't force-break
affinity for us since the worker is a per-cpu kthread. So we have to
do it on our own and unbind this worker, which can't be unbound
by workqueue_offline_cpu() since it doesn't belong to any pool
after detachment. Do it unconditionally, as there is no harm in
breaking affinity for a non-per-cpu kthread, and we don't want to
rely on the scheduler's policy on when to break affinity.

Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fa71520822f0..4d7575311198 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1885,6 +1885,19 @@ static void worker_detach_from_pool(struct worker *worker)

if (list_empty(&pool->workers))
detach_completion = pool->detach_completion;
+
+ /*
+ * The pool->attrs->cpumask might be a single CPU and it may go
+ * down after detachment, and the scheduler won't force to break
+ * affinity for us since it is a per-cpu-kthread. So we have to
+ * do it on our own and unbind this worker which can't be unbound
+ * by workqueue_offline_cpu() since it doesn't belong to any pool
+ * after detachment. Do it unconditionally for there is no harm
+ * to break affinity for non-per-cpu-kthread and we don't need to
+ * rely on the scheduler's policy on when to break affinity.
+ */
+ set_cpus_allowed_ptr(worker->task, cpu_possible_mask);
+
mutex_unlock(&wq_pool_attach_mutex);

/* clear leftover flags without pool->lock after it is detached */
--
2.19.1.6.gb485710b

2020-12-18 16:11:33

by Lai Jiangshan

Subject: [PATCH -tip V2 01/10] workqueue: restore unbound_workers' cpumask correctly

From: Lai Jiangshan <[email protected]>

When we restore the workers' cpumask, we should restore it to the
designated pool->attrs->cpumask. And we need to do it only when the
first CPU of the pool comes online.

Cc: Hillf Danton <[email protected]>
Reported-by: Hillf Danton <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c71da2a59e12..aba71ab359dd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5031,9 +5031,13 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)

cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);

+ /* is @cpu the first one onlined for the @pool? */
+ if (cpumask_weight(&cpumask) > 1)
+ return;
+
/* as we're called from CPU_ONLINE, the following shouldn't fail */
for_each_pool_worker(worker, pool)
- WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
}

int workqueue_prepare_cpu(unsigned int cpu)
--
2.19.1.6.gb485710b

2020-12-18 16:11:56

by Lai Jiangshan

Subject: [PATCH -tip V2 04/10] workqueue: don't set the worker's cpumask when kthread_bind_mask()

From: Lai Jiangshan <[email protected]>

There might be no online CPU in pool->attrs->cpumask, so don't bind
the new worker to it here. The worker's cpumask will be set properly
later in worker_attach_to_pool().

Cc: Peter Zijlstra <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4d7575311198..5f3c86eaed7a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1948,7 +1948,15 @@ static struct worker *create_worker(struct worker_pool *pool)
goto fail;

set_user_nice(worker->task, pool->attrs->nice);
- kthread_bind_mask(worker->task, pool->attrs->cpumask);
+
+ /*
+ * Set PF_NO_SETAFFINITY via kthread_bind_mask(). We use
+ * cpu_possible_mask instead of pool->attrs->cpumask, because
+ * there might not be any online cpu in the pool->attrs->cpumask.
+ * The cpumask of the worker will be set properly later in
+ * worker_attach_to_pool().
+ */
+ kthread_bind_mask(worker->task, cpu_possible_mask);

/* successful, attach the worker to the pool */
worker_attach_to_pool(worker, pool);
--
2.19.1.6.gb485710b

2020-12-18 16:12:34

by Lai Jiangshan

Subject: [PATCH -tip V2 06/10] workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()

From: Lai Jiangshan <[email protected]>

restore_unbound_workers_cpumask() is called from CPU_ONLINE, where
wq_online_cpumask equals cpu_online_mask. So no functionality is
changed.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 84842f10e6a2..eda293097fe1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5041,13 +5041,14 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
static cpumask_t cpumask;
struct worker *worker;

+ lockdep_assert_held(&wq_pool_mutex);
lockdep_assert_held(&wq_pool_attach_mutex);

/* is @cpu allowed for @pool? */
if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
return;

- cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
+ cpumask_and(&cpumask, pool->attrs->cpumask, wq_online_cpumask);

/* is @cpu the first one onlined for the @pool? */
if (cpumask_weight(&cpumask) > 1)
--
2.19.1.6.gb485710b

2020-12-18 16:13:10

by Lai Jiangshan

Subject: [PATCH -tip V2 07/10] workqueue: Manually break affinity on hotplug for unbound pool

From: Lai Jiangshan <[email protected]>

It is possible that a per-node pool/worker's affinity is a single
CPU. It can happen when wq_unbound_cpumask is changed by the system
admin via /sys/devices/virtual/workqueue/cpumask. pool->attrs->cpumask
is wq_unbound_cpumask & possible_cpumask_of_the_node, which can be a
single CPU and makes the pool's workers effectively "per-cpu kthreads".

And the scheduler won't break affinity on such "per-cpu kthread" workers
when the CPU is going down, so we have to do it on our own.

We do it by reusing the existing restore_unbound_workers_cpumask(),
renamed to update_unbound_workers_cpumask(). When the number of online
CPUs of the pool goes from 1 to 0, we break the affinity initiatively.

Note that we break the affinity even for non-per-cpu-kthread workers,
because, first, this code path is a slow path which is not worth much
optimization, and second, we don't want to rely on the code/conditions
under which the scheduler forces breaking affinity for us.

The way to break affinity is to set the workers' affinity to
cpu_possible_mask, so that we preserve the same behavior as when
the scheduler breaks affinity for us.
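
For illustration, a small user-space walk-through of the weight-based
decision described above (toy bitmasks and a simplified stand-in for the
renamed helper; not the actual kernel code):

#include <stdio.h>

static unsigned long pool_cpumask      = 1UL << 3;  /* single-CPU unbound pool */
static unsigned long wq_online_cpumask = 0x0fUL;    /* CPUs 0-3 online */

static void toy_update_workers_cpumask(unsigned long possible_mask)
{
        unsigned long online_in_pool = pool_cpumask & wq_online_cpumask;

        switch (__builtin_popcountl(online_in_pool)) {
        case 0:  /* the last CPU of the pool went down: break affinity */
                printf("workers -> possible mask %#lx\n", possible_mask);
                break;
        case 1:  /* the first CPU of the pool came up: restore affinity */
                printf("workers -> pool mask %#lx\n", pool_cpumask);
                break;
        default: /* other cases, nothing to do */
                break;
        }
}

int main(void)
{
        unsigned long possible_mask = 0xffUL;

        wq_online_cpumask &= ~(1UL << 3);   /* CPU 3 goes down: 1 -> 0 online in pool */
        toy_update_workers_cpumask(possible_mask);

        wq_online_cpumask |= 1UL << 3;      /* CPU 3 comes back: 0 -> 1 online in pool */
        toy_update_workers_cpumask(possible_mask);
        return 0;
}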

Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 49 ++++++++++++++++++++++++++++++++--------------
1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index eda293097fe1..c2b66679c0aa 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5027,16 +5027,16 @@ static void rebind_workers(struct worker_pool *pool)
}

/**
- * restore_unbound_workers_cpumask - restore cpumask of unbound workers
+ * update_unbound_workers_cpumask - update cpumask of unbound workers
* @pool: unbound pool of interest
- * @cpu: the CPU which is coming up
+ * @cpu: the CPU which is coming up or going down
*
* An unbound pool may end up with a cpumask which doesn't have any online
- * CPUs. When a worker of such pool get scheduled, the scheduler resets
- * its cpus_allowed. If @cpu is in @pool's cpumask which didn't have any
- * online CPU before, cpus_allowed of all its workers should be restored.
+ * CPUs. We have to reset workers' cpus_allowed of such pool. And we
+ * restore the workers' cpus_allowed when the pool's cpumask has online
+ * CPU for the first time after reset.
*/
-static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
+static void update_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
{
static cpumask_t cpumask;
struct worker *worker;
@@ -5050,13 +5050,19 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)

cpumask_and(&cpumask, pool->attrs->cpumask, wq_online_cpumask);

- /* is @cpu the first one onlined for the @pool? */
- if (cpumask_weight(&cpumask) > 1)
- return;
-
- /* as we're called from CPU_ONLINE, the following shouldn't fail */
- for_each_pool_worker(worker, pool)
- WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
+ switch (cpumask_weight(&cpumask)) {
+ case 0: /* @cpu is the last one going down for the @pool. */
+ for_each_pool_worker(worker, pool)
+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
+ break;
+ case 1: /* @cpu is the first one onlined for the @pool. */
+ /* as we're called from CPU_ONLINE, the following shouldn't fail */
+ for_each_pool_worker(worker, pool)
+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
+ break;
+ default: /* other cases, nothing to do */
+ break;
+ }
}

int workqueue_prepare_cpu(unsigned int cpu)
@@ -5087,7 +5093,7 @@ int workqueue_online_cpu(unsigned int cpu)
if (pool->cpu == cpu)
rebind_workers(pool);
else if (pool->cpu < 0)
- restore_unbound_workers_cpumask(pool, cpu);
+ update_unbound_workers_cpumask(pool, cpu);

mutex_unlock(&wq_pool_attach_mutex);
}
@@ -5102,7 +5108,9 @@ int workqueue_online_cpu(unsigned int cpu)

int workqueue_offline_cpu(unsigned int cpu)
{
+ struct worker_pool *pool;
struct workqueue_struct *wq;
+ int pi;

/* unbinding per-cpu workers should happen on the local CPU */
if (WARN_ON(cpu != smp_processor_id()))
@@ -5110,9 +5118,20 @@ int workqueue_offline_cpu(unsigned int cpu)

unbind_workers(cpu);

- /* update NUMA affinity of unbound workqueues */
mutex_lock(&wq_pool_mutex);
cpumask_clear_cpu(cpu, wq_online_cpumask);
+
+ /* update CPU affinity of workers of unbound pools */
+ for_each_pool(pool, pi) {
+ mutex_lock(&wq_pool_attach_mutex);
+
+ if (pool->cpu < 0)
+ update_unbound_workers_cpumask(pool, cpu);
+
+ mutex_unlock(&wq_pool_attach_mutex);
+ }
+
+ /* update NUMA affinity of unbound workqueues */
list_for_each_entry(wq, &workqueues, list)
wq_update_unbound_numa(wq, cpu);
mutex_unlock(&wq_pool_mutex);
--
2.19.1.6.gb485710b

2020-12-18 16:13:16

by Lai Jiangshan

Subject: [PATCH -tip V2 08/10] workqueue: reorganize workqueue_online_cpu()

From: Lai Jiangshan <[email protected]>

Just move the code around; no functionality is changed.

It prepares for a later patch protecting wq_online_cpumask
with wq_pool_attach_mutex.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c2b66679c0aa..dc891b5c0868 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5087,12 +5087,17 @@ int workqueue_online_cpu(unsigned int cpu)
mutex_lock(&wq_pool_mutex);
cpumask_set_cpu(cpu, wq_online_cpumask);

+ for_each_cpu_worker_pool(pool, cpu) {
+ mutex_lock(&wq_pool_attach_mutex);
+ rebind_workers(pool);
+ mutex_unlock(&wq_pool_attach_mutex);
+ }
+
+ /* update CPU affinity of workers of unbound pools */
for_each_pool(pool, pi) {
mutex_lock(&wq_pool_attach_mutex);

- if (pool->cpu == cpu)
- rebind_workers(pool);
- else if (pool->cpu < 0)
+ if (pool->cpu < 0)
update_unbound_workers_cpumask(pool, cpu);

mutex_unlock(&wq_pool_attach_mutex);
--
2.19.1.6.gb485710b

2020-12-18 16:13:39

by Lai Jiangshan

Subject: [PATCH -tip V2 10/10] workqueue: Fix affinity of kworkers when attaching into pool

From: Lai Jiangshan <[email protected]>

When worker_attach_to_pool() is called, we should not bind the worker
to pool->attrs->cpumask when there is no CPU online in it.

We have to use wq_online_cpumask in worker_attach_to_pool() to check
whether pool->attrs->cpumask is valid, rather than cpu_online_mask or
cpu_active_mask, due to the gaps between stages in cpu hot[un]plug.

So for the late-spawned per-CPU kworker case: the outgoing CPU should have
already been cleared from wq_online_cpumask, so the worker gets its affinity
reset to the possible mask and the subsequent wakeup will ensure it's put on
an active CPU.

To use wq_online_cpumask in worker_attach_to_pool(), we need to protect
wq_online_cpumask with wq_pool_attach_mutex, and we modify
workqueue_online_cpu() and workqueue_offline_cpu() to enlarge the
wq_pool_attach_mutex protected region. We also put the update of
wq_online_cpumask and [re|un]bind_workers() in the same
wq_pool_attach_mutex protected region to make the update for per-cpu
workqueues atomic.

Cc: Qian Cai <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Acked-by: Valentin Schneider <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 65270729454c..eeb726598f80 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -310,7 +310,7 @@ static bool workqueue_freezing; /* PL: have wqs started freezing? */
/* PL: allowable cpus for unbound wqs and work items */
static cpumask_var_t wq_unbound_cpumask;

-/* PL: online cpus (cpu_online_mask with the going-down cpu cleared) */
+/* PL&A: online cpus (cpu_online_mask with the going-down cpu cleared) */
static cpumask_var_t wq_online_cpumask;

/* CPU where unbound work was last round robin scheduled from this CPU */
@@ -1848,11 +1848,11 @@ static void worker_attach_to_pool(struct worker *worker,
{
mutex_lock(&wq_pool_attach_mutex);

- /*
- * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
- * online CPUs. It'll be re-applied when any of the CPUs come up.
- */
- set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+ /* Is there any cpu in pool->attrs->cpumask online? */
+ if (cpumask_intersects(pool->attrs->cpumask, wq_online_cpumask))
+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
+ else
+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);

/*
* The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
@@ -5081,13 +5081,12 @@ int workqueue_online_cpu(unsigned int cpu)
int pi;

mutex_lock(&wq_pool_mutex);
- cpumask_set_cpu(cpu, wq_online_cpumask);

- for_each_cpu_worker_pool(pool, cpu) {
- mutex_lock(&wq_pool_attach_mutex);
+ mutex_lock(&wq_pool_attach_mutex);
+ cpumask_set_cpu(cpu, wq_online_cpumask);
+ for_each_cpu_worker_pool(pool, cpu)
rebind_workers(pool);
- mutex_unlock(&wq_pool_attach_mutex);
- }
+ mutex_unlock(&wq_pool_attach_mutex);

/* update CPU affinity of workers of unbound pools */
for_each_pool(pool, pi) {
@@ -5117,14 +5116,13 @@ int workqueue_offline_cpu(unsigned int cpu)
if (WARN_ON(cpu != smp_processor_id()))
return -1;

- for_each_cpu_worker_pool(pool, cpu) {
- mutex_lock(&wq_pool_attach_mutex);
- unbind_workers(pool);
- mutex_unlock(&wq_pool_attach_mutex);
- }
-
mutex_lock(&wq_pool_mutex);
+
+ mutex_lock(&wq_pool_attach_mutex);
cpumask_clear_cpu(cpu, wq_online_cpumask);
+ for_each_cpu_worker_pool(pool, cpu)
+ unbind_workers(pool);
+ mutex_unlock(&wq_pool_attach_mutex);

/* update CPU affinity of workers of unbound pools */
for_each_pool(pool, pi) {
--
2.19.1.6.gb485710b

2020-12-18 16:13:55

by Lai Jiangshan

Subject: [PATCH -tip V2 09/10] workqueue: reorganize workqueue_offline_cpu() unbind_workers()

From: Lai Jiangshan <[email protected]>

Just move the code around; no functionality is changed.
Only the wq_pool_attach_mutex protected region becomes a little larger.

It prepares for a later patch protecting wq_online_cpumask
with wq_pool_attach_mutex.

Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 90 +++++++++++++++++++++++-----------------------
1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dc891b5c0868..65270729454c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4904,61 +4904,57 @@ void wq_worker_comm(char *buf, size_t size, struct task_struct *task)
* cpu comes back online.
*/

-static void unbind_workers(int cpu)
+static void unbind_workers(struct worker_pool *pool)
{
- struct worker_pool *pool;
struct worker *worker;

- for_each_cpu_worker_pool(pool, cpu) {
- mutex_lock(&wq_pool_attach_mutex);
- raw_spin_lock_irq(&pool->lock);
+ lockdep_assert_held(&wq_pool_attach_mutex);

- /*
- * We've blocked all attach/detach operations. Make all workers
- * unbound and set DISASSOCIATED. Before this, all workers
- * except for the ones which are still executing works from
- * before the last CPU down must be on the cpu. After
- * this, they may become diasporas.
- */
- for_each_pool_worker(worker, pool)
- worker->flags |= WORKER_UNBOUND;
+ raw_spin_lock_irq(&pool->lock);

- pool->flags |= POOL_DISASSOCIATED;
+ /*
+ * We've blocked all attach/detach operations. Make all workers
+ * unbound and set DISASSOCIATED. Before this, all workers
+ * except for the ones which are still executing works from
+ * before the last CPU down must be on the cpu. After
+ * this, they may become diasporas.
+ */
+ for_each_pool_worker(worker, pool)
+ worker->flags |= WORKER_UNBOUND;

- raw_spin_unlock_irq(&pool->lock);
+ pool->flags |= POOL_DISASSOCIATED;

- for_each_pool_worker(worker, pool)
- WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
+ raw_spin_unlock_irq(&pool->lock);

- mutex_unlock(&wq_pool_attach_mutex);
+ for_each_pool_worker(worker, pool)
+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);

- /*
- * Call schedule() so that we cross rq->lock and thus can
- * guarantee sched callbacks see the %WORKER_UNBOUND flag.
- * This is necessary as scheduler callbacks may be invoked
- * from other cpus.
- */
- schedule();
+ /*
+ * Call schedule() so that we cross rq->lock and thus can
+ * guarantee sched callbacks see the %WORKER_UNBOUND flag.
+ * This is necessary as scheduler callbacks may be invoked
+ * from other cpus.
+ */
+ schedule();

- /*
- * Sched callbacks are disabled now. Zap nr_running.
- * After this, nr_running stays zero and need_more_worker()
- * and keep_working() are always true as long as the
- * worklist is not empty. This pool now behaves as an
- * unbound (in terms of concurrency management) pool which
- * are served by workers tied to the pool.
- */
- atomic_set(&pool->nr_running, 0);
+ /*
+ * Sched callbacks are disabled now. Zap nr_running.
+ * After this, nr_running stays zero and need_more_worker()
+ * and keep_working() are always true as long as the
+ * worklist is not empty. This pool now behaves as an
+ * unbound (in terms of concurrency management) pool which
+ * are served by workers tied to the pool.
+ */
+ atomic_set(&pool->nr_running, 0);

- /*
- * With concurrency management just turned off, a busy
- * worker blocking could lead to lengthy stalls. Kick off
- * unbound chain execution of currently pending work items.
- */
- raw_spin_lock_irq(&pool->lock);
- wake_up_worker(pool);
- raw_spin_unlock_irq(&pool->lock);
- }
+ /*
+ * With concurrency management just turned off, a busy
+ * worker blocking could lead to lengthy stalls. Kick off
+ * unbound chain execution of currently pending work items.
+ */
+ raw_spin_lock_irq(&pool->lock);
+ wake_up_worker(pool);
+ raw_spin_unlock_irq(&pool->lock);
}

/**
@@ -5121,7 +5117,11 @@ int workqueue_offline_cpu(unsigned int cpu)
if (WARN_ON(cpu != smp_processor_id()))
return -1;

- unbind_workers(cpu);
+ for_each_cpu_worker_pool(pool, cpu) {
+ mutex_lock(&wq_pool_attach_mutex);
+ unbind_workers(pool);
+ mutex_unlock(&wq_pool_attach_mutex);
+ }

mutex_lock(&wq_pool_mutex);
cpumask_clear_cpu(cpu, wq_online_cpumask);
--
2.19.1.6.gb485710b

2020-12-18 18:03:59

by Valentin Schneider

Subject: Re: [PATCH -tip V2 10/10] workqueue: Fix affinity of kworkers when attaching into pool


On 18/12/20 17:09, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> When worker_attach_to_pool() is called, we should not put the workers
> to pool->attrs->cpumask when there is not CPU online in it.
>
> We have to use wq_online_cpumask in worker_attach_to_pool() to check
> if pool->attrs->cpumask is valid rather than cpu_online_mask or
> cpu_active_mask due to gaps between stages in cpu hot[un]plug.
>
> So for that late-spawned per-CPU kworker case: the outgoing CPU should have
> already been cleared from wq_online_cpumask, so it gets its affinity reset
> to the possible mask and the subsequent wakeup will ensure it's put on an
> active CPU.
>
> To use wq_online_cpumask in worker_attach_to_pool(), we need to protect
> wq_online_cpumask in wq_pool_attach_mutex and we modify workqueue_online_cpu()
> and workqueue_offline_cpu() to enlarge wq_pool_attach_mutex protected
> region. We also put updating wq_online_cpumask and [re|un]bind_workers()
> in the same wq_pool_attach_mutex protected region to make the update
> for percpu workqueue atomically.
>
> Cc: Qian Cai <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Vincent Donnefort <[email protected]>
> Link: https://lore.kernel.org/lkml/[email protected]/
> Acked-by: Valentin Schneider <[email protected]>

So an etiquette thing: I never actually gave an Acked-by. I did say it
looked good to me, and that probably should've been bundled with a
Reviewed-by, but it wasn't (I figured I'd wait for v2). Forging is bad,
m'kay.

When in doubt (e.g. someone says they're ok with your patch but don't give
any Ack/Reviewed-by), just ask via mail or on IRC.

For now, please make this a:

Reviewed-by: Valentin Schneider <[email protected]>

> Acked-by: Tejun Heo <[email protected]>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> kernel/workqueue.c | 32 +++++++++++++++-----------------
> 1 file changed, 15 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 65270729454c..eeb726598f80 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -310,7 +310,7 @@ static bool workqueue_freezing; /* PL: have wqs started freezing? */
> /* PL: allowable cpus for unbound wqs and work items */
> static cpumask_var_t wq_unbound_cpumask;
>
> -/* PL: online cpus (cpu_online_mask with the going-down cpu cleared) */
> +/* PL&A: online cpus (cpu_online_mask with the going-down cpu cleared) */
> static cpumask_var_t wq_online_cpumask;
>
> /* CPU where unbound work was last round robin scheduled from this CPU */
> @@ -1848,11 +1848,11 @@ static void worker_attach_to_pool(struct worker *worker,
> {
> mutex_lock(&wq_pool_attach_mutex);
>
> - /*
> - * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
> - * online CPUs. It'll be re-applied when any of the CPUs come up.
> - */
> - set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> + /* Is there any cpu in pool->attrs->cpumask online? */
> + if (cpumask_intersects(pool->attrs->cpumask, wq_online_cpumask))
> + WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
> + else
> + WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
>
> /*
> * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
> @@ -5081,13 +5081,12 @@ int workqueue_online_cpu(unsigned int cpu)
> int pi;
>
> mutex_lock(&wq_pool_mutex);
> - cpumask_set_cpu(cpu, wq_online_cpumask);
>
> - for_each_cpu_worker_pool(pool, cpu) {
> - mutex_lock(&wq_pool_attach_mutex);
> + mutex_lock(&wq_pool_attach_mutex);
> + cpumask_set_cpu(cpu, wq_online_cpumask);
> + for_each_cpu_worker_pool(pool, cpu)
> rebind_workers(pool);
> - mutex_unlock(&wq_pool_attach_mutex);
> - }
> + mutex_unlock(&wq_pool_attach_mutex);
>
> /* update CPU affinity of workers of unbound pools */
> for_each_pool(pool, pi) {
> @@ -5117,14 +5116,13 @@ int workqueue_offline_cpu(unsigned int cpu)
> if (WARN_ON(cpu != smp_processor_id()))
> return -1;
>
> - for_each_cpu_worker_pool(pool, cpu) {
> - mutex_lock(&wq_pool_attach_mutex);
> - unbind_workers(pool);
> - mutex_unlock(&wq_pool_attach_mutex);
> - }
> -
> mutex_lock(&wq_pool_mutex);
> +
> + mutex_lock(&wq_pool_attach_mutex);
> cpumask_clear_cpu(cpu, wq_online_cpumask);
> + for_each_cpu_worker_pool(pool, cpu)
> + unbind_workers(pool);
> + mutex_unlock(&wq_pool_attach_mutex);
>
> /* update CPU affinity of workers of unbound pools */
> for_each_pool(pool, pi) {

2020-12-19 01:13:44

by Lai Jiangshan

Subject: Re: [PATCH -tip V2 10/10] workqueue: Fix affinity of kworkers when attaching into pool

On Sat, Dec 19, 2020 at 1:59 AM Valentin Schneider
<[email protected]> wrote:
>
>
> On 18/12/20 17:09, Lai Jiangshan wrote:
> > From: Lai Jiangshan <[email protected]>
> >
> > When worker_attach_to_pool() is called, we should not put the workers
> > to pool->attrs->cpumask when there is not CPU online in it.
> >
> > We have to use wq_online_cpumask in worker_attach_to_pool() to check
> > if pool->attrs->cpumask is valid rather than cpu_online_mask or
> > cpu_active_mask due to gaps between stages in cpu hot[un]plug.
> >
> > So for that late-spawned per-CPU kworker case: the outgoing CPU should have
> > already been cleared from wq_online_cpumask, so it gets its affinity reset
> > to the possible mask and the subsequent wakeup will ensure it's put on an
> > active CPU.
> >
> > To use wq_online_cpumask in worker_attach_to_pool(), we need to protect
> > wq_online_cpumask in wq_pool_attach_mutex and we modify workqueue_online_cpu()
> > and workqueue_offline_cpu() to enlarge wq_pool_attach_mutex protected
> > region. We also put updating wq_online_cpumask and [re|un]bind_workers()
> > in the same wq_pool_attach_mutex protected region to make the update
> > for percpu workqueue atomically.
> >
> > Cc: Qian Cai <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Vincent Donnefort <[email protected]>
> > Link: https://lore.kernel.org/lkml/[email protected]/
> > Acked-by: Valentin Schneider <[email protected]>
>
> So an etiquette thing: I never actually gave an Acked-by. I did say it
> looked good to me, and that probably should've been bundled with a
> Reviewed-by, but it wasn't (I figured I'd wait for v2). Forging is bad,
> m'kay.
>
> When in doubt (e.g. someone says they're ok with your patch but don't give
> any Ack/Reviewed-by), just ask via mail or on IRC.

Hello, Valentin

I'm sorry not to have asked for your opinion. When I saw
"Seems alright to me." I felt greatly encouraged and rushed.

I was in doubt whether I should promote "Seems alright to me." to an
"Ack". Instead of asking, I wrongly did it right away. I knew it might
just be forging, and added a note in the cover letter:

> Add Valentin's ack for patch 10 because "Seems alright to me." and
> add Valentin's comments to the changelog which is integral.

Anyway, it is my bad and I learnt.

>
> For now, please make this a:
>
> Reviewed-by: Valentin Schneider <[email protected]>

Hello Peter, could you help change it if there is no other
feedback that requires a V3 patchset to be made?

Thanks
Lai

>
> > Acked-by: Tejun Heo <[email protected]>
> > Signed-off-by: Lai Jiangshan <[email protected]>
> > ---
> > kernel/workqueue.c | 32 +++++++++++++++-----------------
> > 1 file changed, 15 insertions(+), 17 deletions(-)
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 65270729454c..eeb726598f80 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -310,7 +310,7 @@ static bool workqueue_freezing; /* PL: have wqs started freezing? */
> > /* PL: allowable cpus for unbound wqs and work items */
> > static cpumask_var_t wq_unbound_cpumask;
> >
> > -/* PL: online cpus (cpu_online_mask with the going-down cpu cleared) */
> > +/* PL&A: online cpus (cpu_online_mask with the going-down cpu cleared) */
> > static cpumask_var_t wq_online_cpumask;
> >
> > /* CPU where unbound work was last round robin scheduled from this CPU */
> > @@ -1848,11 +1848,11 @@ static void worker_attach_to_pool(struct worker *worker,
> > {
> > mutex_lock(&wq_pool_attach_mutex);
> >
> > - /*
> > - * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
> > - * online CPUs. It'll be re-applied when any of the CPUs come up.
> > - */
> > - set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> > + /* Is there any cpu in pool->attrs->cpumask online? */
> > + if (cpumask_intersects(pool->attrs->cpumask, wq_online_cpumask))
> > + WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
> > + else
> > + WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
> >
> > /*
> > * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
> > @@ -5081,13 +5081,12 @@ int workqueue_online_cpu(unsigned int cpu)
> > int pi;
> >
> > mutex_lock(&wq_pool_mutex);
> > - cpumask_set_cpu(cpu, wq_online_cpumask);
> >
> > - for_each_cpu_worker_pool(pool, cpu) {
> > - mutex_lock(&wq_pool_attach_mutex);
> > + mutex_lock(&wq_pool_attach_mutex);
> > + cpumask_set_cpu(cpu, wq_online_cpumask);
> > + for_each_cpu_worker_pool(pool, cpu)
> > rebind_workers(pool);
> > - mutex_unlock(&wq_pool_attach_mutex);
> > - }
> > + mutex_unlock(&wq_pool_attach_mutex);
> >
> > /* update CPU affinity of workers of unbound pools */
> > for_each_pool(pool, pi) {
> > @@ -5117,14 +5116,13 @@ int workqueue_offline_cpu(unsigned int cpu)
> > if (WARN_ON(cpu != smp_processor_id()))
> > return -1;
> >
> > - for_each_cpu_worker_pool(pool, cpu) {
> > - mutex_lock(&wq_pool_attach_mutex);
> > - unbind_workers(pool);
> > - mutex_unlock(&wq_pool_attach_mutex);
> > - }
> > -
> > mutex_lock(&wq_pool_mutex);
> > +
> > + mutex_lock(&wq_pool_attach_mutex);
> > cpumask_clear_cpu(cpu, wq_online_cpumask);
> > + for_each_cpu_worker_pool(pool, cpu)
> > + unbind_workers(pool);
> > + mutex_unlock(&wq_pool_attach_mutex);
> >
> > /* update CPU affinity of workers of unbound pools */
> > for_each_pool(pool, pi) {

2020-12-22 21:42:28

by Dexuan-Linux Cui

Subject: Re: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

On Fri, Dec 18, 2020 at 8:11 AM Lai Jiangshan <[email protected]> wrote:
>
> From: Lai Jiangshan <[email protected]>
>
> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> said that scheduler will not force break affinity for us.
>
> But workqueue highly depends on the old behavior. Many parts of the codes
> relies on it, 06249738a41a ("workqueue: Manually break affinity on hotplug")
> is not enough to change it, and the commit has flaws in itself too.
>
> It doesn't handle for worker detachment.
> It doesn't handle for worker attachement, mainly worker creation
> which is handled by Valentin Schneider's patch [1].
> It doesn't handle for unbound workers which might be possible
> per-cpu-kthread.
>
> We need to thoroughly update the way workqueue handles affinity
> in cpu hot[un]plug, what is this patchset intends to do and
> replace the Valentin Schneider's patch [1]. The equivalent patch
> is patch 10.
>
> Patch 1 fixes a flaw reported by Hillf Danton <[email protected]>.
> I have to include this fix because later patches depends on it.
>
> The patchset is based on tip/master rather than workqueue tree,
> because the patchset is a complement for 06249738a41a ("workqueue:
> Manually break affinity on hotplug") which is only in tip/master by now.
>
> And TJ acked to route the series through tip.
>
> Changed from V1:
> Add TJ's acked-by for the whole patchset
>
> Add more words to the comments and the changelog, mainly derived
> from discussion with Peter.
>
> Update the comments as TJ suggested.
>
> Update a line of code as Valentin suggested.
>
> Add Valentin's ack for patch 10 because "Seems alright to me." and
> add Valentin's comments to the changelog which is integral.
>
> [1]: https://lore.kernel.org/r/[email protected]
> [V1 patcheset]: https://lore.kernel.org/lkml/[email protected]/
>
> Cc: Hillf Danton <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Vincent Donnefort <[email protected]>
> Cc: Tejun Heo <[email protected]>
>
> Lai Jiangshan (10):
> workqueue: restore unbound_workers' cpumask correctly
> workqueue: use cpu_possible_mask instead of cpu_active_mask to break
> affinity
> workqueue: Manually break affinity on pool detachment
> workqueue: don't set the worker's cpumask when kthread_bind_mask()
> workqueue: introduce wq_online_cpumask
> workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
> workqueue: Manually break affinity on hotplug for unbound pool
> workqueue: reorganize workqueue_online_cpu()
> workqueue: reorganize workqueue_offline_cpu() unbind_workers()
> workqueue: Fix affinity of kworkers when attaching into pool
>
> kernel/workqueue.c | 214 ++++++++++++++++++++++++++++-----------------
> 1 file changed, 132 insertions(+), 82 deletions(-)
>
> --
> 2.19.1.6.gb485710b

Hi,
I tested this patchset on today's tip.git's master branch
(981316394e35 ("Merge branch 'locking/urgent'")).

Every time the kernel boots with 32 CPUs (I'm running the Linux VM on
Hyper-V), I get the below warning.
(BTW, with 8 or 16 CPUs, I don't see the warning).
By printing the cpumasks with "%*pbl", I know the warning happens because:
new_mask = 16-31
cpu_online_mask= 0-16
cpu_active_mask= 0-15
p->nr_cpus_allowed=16

2374 if (p->flags & PF_KTHREAD) {
2375 /*
2376 * For kernel threads that do indeed end up on online &&
2377 * !active we want to ensure they are strict
per-CPU threads.
2378 */
2379 WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
2380 !cpumask_intersects(new_mask, cpu_active_mask) &&
2381 p->nr_cpus_allowed != 1);
2382 }
2383
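
Plugging the reported masks into that condition (a quick stand-alone C
check with the values hard-coded from above, just to illustrate) confirms
why the WARN fires:

#include <stdio.h>

int main(void)
{
        /* bit i represents CPU i; values taken from the report above */
        unsigned long new_mask        = 0xffff0000UL;  /* CPUs 16-31 */
        unsigned long cpu_online_mask = 0x0001ffffUL;  /* CPUs 0-16  */
        unsigned long cpu_active_mask = 0x0000ffffUL;  /* CPUs 0-15  */
        int nr_cpus_allowed = 16;

        int warn = (new_mask & cpu_online_mask) &&   /* intersects online (CPU 16) */
                   !(new_mask & cpu_active_mask) &&  /* but no active CPU in it    */
                   nr_cpus_allowed != 1;             /* and not a per-CPU thread   */

        printf("WARN_ON condition: %s\n", warn ? "true" : "false");  /* true */
        return 0;
}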

(FWIW, it looks like this patchset can fix a panic I noticed during
hibernation:
https://lkml.org/lkml/2020/12/22/141, though I see the same warning
during hibernation.)

[ 1.698042] smp: Bringing up secondary CPUs ...
[ 1.701707] x86: Booting SMP configuration:
[ 1.705368] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
#8 #9 #10 #11 #12 #13 #14 #15
[ 1.721589] .... node #1, CPUs: #16
[ 1.013388] smpboot: CPU 16 Converting physical 0 to logical die 1
[ 1.809716] ------------[ cut here ]------------
[ 1.813553] WARNING: CPU: 16 PID: 90 at kernel/sched/core.c:2381
__set_cpus_allowed_ptr+0x19e/0x1b0
[ 1.813553] Modules linked in:
[ 1.813553] CPU: 16 PID: 90 Comm: cpuhp/16 Not tainted 5.10.0+ #1
[ 1.813553] Hardware name: Microsoft Corporation Virtual
Machine/Virtual Machine, BIOS 090008 12/07/2018
[ 1.813553] RIP: 0010:__set_cpus_allowed_ptr+0x19e/0x1b0
[ 1.813553] Code: e8 e7 a3 39 00 85 c0 74 a7 ba 00 02 00 00 48 c7
c6 20 4b 9b 84 4c 89 ff e8 cf a3 39 00 85 c0 75 8f 83 bb a0 03 00 00
01 74 86 <0f> 0b eb 82 e8 49 ba 74 00 66 0f 1f 84 00 00 00 00 00 0f 1f
44 00
[ 1.813553] RSP: 0000:ffffba9bc1ca7cf8 EFLAGS: 00010016
[ 1.813553] RAX: 0000000000000000 RBX: ffff98ed48d58000 RCX: 0000000000000008
[ 1.813553] RDX: 0000000000000200 RSI: ffffffff849b4b20 RDI: ffff98ed48d035a8
[ 1.813553] RBP: ffff98ed42a2ac00 R08: 0000000000000008 R09: 0000000000000008
[ 1.813553] R10: ffff98ed48d035a8 R11: ffffffff8484da40 R12: 0000000000000000
[ 1.813553] R13: 0000000000000010 R14: ffffffff849b4ba0 R15: ffff98ed48d035a8
[ 1.813553] FS: 0000000000000000(0000) GS:ffff98ee3aa00000(0000)
knlGS:0000000000000000
[ 1.813553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.813553] CR2: 0000000000000000 CR3: 000000019980a001 CR4: 00000000003706e0
[ 1.813553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.813553] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.813553] Call Trace:
[ 1.813553] worker_attach_to_pool+0x53/0xd0
[ 1.813553] create_worker+0xf9/0x190
[ 1.813553] alloc_unbound_pwq+0x3a5/0x3b0
[ 1.813553] wq_update_unbound_numa+0x112/0x1c0
[ 1.813553] workqueue_online_cpu+0x1d0/0x220
[ 1.813553] ? workqueue_prepare_cpu+0x70/0x70
[ 1.813553] cpuhp_invoke_callback+0x82/0x4a0
[ 1.813553] ? sort_range+0x20/0x20
[ 1.813553] cpuhp_thread_fun+0xb8/0x120
[ 1.813553] smpboot_thread_fn+0x198/0x230
[ 1.813553] kthread+0x13d/0x160
[ 1.813553] ? kthread_create_on_node+0x60/0x60
[ 1.813553] ret_from_fork+0x22/0x30
[ 1.813553] ---[ end trace bc73d8bab71235fe ]---
[ 1.817553] #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31
[ 1.826499] smp: Brought up 2 nodes, 32 CPUs
[ 1.833345] smpboot: Max logical packages: 2
[ 1.833574] smpboot: Total of 32 processors activated (146959.07 BogoMIPS)


Thanks,
Dexuan

2020-12-23 11:35:47

by Lai Jiangshan

Subject: Re: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

On Wed, Dec 23, 2020 at 5:39 AM Dexuan-Linux Cui <[email protected]> wrote:
>
> On Fri, Dec 18, 2020 at 8:11 AM Lai Jiangshan <[email protected]> wrote:
> >
> > From: Lai Jiangshan <[email protected]>
> >
> > 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > said that scheduler will not force break affinity for us.
> >
> > But workqueue highly depends on the old behavior. Many parts of the codes
> > relies on it, 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > is not enough to change it, and the commit has flaws in itself too.
> >
> > It doesn't handle for worker detachment.
> > It doesn't handle for worker attachement, mainly worker creation
> > which is handled by Valentin Schneider's patch [1].
> > It doesn't handle for unbound workers which might be possible
> > per-cpu-kthread.
> >
> > We need to thoroughly update the way workqueue handles affinity
> > in cpu hot[un]plug, what is this patchset intends to do and
> > replace the Valentin Schneider's patch [1]. The equivalent patch
> > is patch 10.
> >
> > Patch 1 fixes a flaw reported by Hillf Danton <[email protected]>.
> > I have to include this fix because later patches depends on it.
> >
> > The patchset is based on tip/master rather than workqueue tree,
> > because the patchset is a complement for 06249738a41a ("workqueue:
> > Manually break affinity on hotplug") which is only in tip/master by now.
> >
> > And TJ acked to route the series through tip.
> >
> > Changed from V1:
> > Add TJ's acked-by for the whole patchset
> >
> > Add more words to the comments and the changelog, mainly derived
> > from discussion with Peter.
> >
> > Update the comments as TJ suggested.
> >
> > Update a line of code as Valentin suggested.
> >
> > Add Valentin's ack for patch 10 because "Seems alright to me." and
> > add Valentin's comments to the changelog which is integral.
> >
> > [1]: https://lore.kernel.org/r/[email protected]
> > [V1 patcheset]: https://lore.kernel.org/lkml/[email protected]/
> >
> > Cc: Hillf Danton <[email protected]>
> > Cc: Valentin Schneider <[email protected]>
> > Cc: Qian Cai <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Vincent Donnefort <[email protected]>
> > Cc: Tejun Heo <[email protected]>
> >
> > Lai Jiangshan (10):
> > workqueue: restore unbound_workers' cpumask correctly
> > workqueue: use cpu_possible_mask instead of cpu_active_mask to break
> > affinity
> > workqueue: Manually break affinity on pool detachment
> > workqueue: don't set the worker's cpumask when kthread_bind_mask()
> > workqueue: introduce wq_online_cpumask
> > workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
> > workqueue: Manually break affinity on hotplug for unbound pool
> > workqueue: reorganize workqueue_online_cpu()
> > workqueue: reorganize workqueue_offline_cpu() unbind_workers()
> > workqueue: Fix affinity of kworkers when attaching into pool
> >
> > kernel/workqueue.c | 214 ++++++++++++++++++++++++++++-----------------
> > 1 file changed, 132 insertions(+), 82 deletions(-)
> >
> > --
> > 2.19.1.6.gb485710b
>
> Hi,

Hello,

Thanks for reporting.

I have only been debugging it for a short time; I will continue tomorrow.


> I tested this patchset on today's tip.git's master branch
> (981316394e35 ("Merge branch 'locking/urgent'")).
>
> Every time the kernel boots with 32 CPUs (I'm running the Linux VM on
> Hyper-V), I get the below warning.
> (BTW, with 8 or 16 CPUs, I don't see the warning).
> By printing the cpumasks with "%*pbl", I know the warning happens because:
> new_mask = 16-31
> cpu_online_mask= 0-16
> cpu_active_mask= 0-15
> p->nr_cpus_allowed=16


From the call stack, we can see that we are bringing cpu#16 up,
workqueue_online_cpu() is being called, and sched_cpu_activate()
has not been called yet. So cpu_online_mask = 0-16 and cpu_active_mask = 0-15.

Why isn't it legitimate to set the worker's cpumask
to new_mask (16-31), since cpu#16 is being brought up?

Anyway, it reveals that there must be a problem in the patchset
which raised the warning.

>
> 2374 if (p->flags & PF_KTHREAD) {
> 2375 /*
> 2376 * For kernel threads that do indeed end up on online &&
> 2377 * !active we want to ensure they are strict
> per-CPU threads.
> 2378 */
> 2379 WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
> 2380 !cpumask_intersects(new_mask, cpu_active_mask) &&
> 2381 p->nr_cpus_allowed != 1);
> 2382 }
> 2383
>
> (FWIW, it looks like this patchset can fix a panic I noticed during
> hibernation:
> https://lkml.org/lkml/2020/12/22/141, though I see the same warning
> during hibernation.)
>
> [ 1.698042] smp: Bringing up secondary CPUs ...
> [ 1.701707] x86: Booting SMP configuration:
> [ 1.705368] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
> #8 #9 #10 #11 #12 #13 #14 #15
> [ 1.721589] .... node #1, CPUs: #16
> [ 1.013388] smpboot: CPU 16 Converting physical 0 to logical die 1
> [ 1.809716] ------------[ cut here ]------------
> [ 1.813553] WARNING: CPU: 16 PID: 90 at kernel/sched/core.c:2381
> __set_cpus_allowed_ptr+0x19e/0x1b0
> [ 1.813553] Modules linked in:
> [ 1.813553] CPU: 16 PID: 90 Comm: cpuhp/16 Not tainted 5.10.0+ #1
> [ 1.813553] Hardware name: Microsoft Corporation Virtual
> Machine/Virtual Machine, BIOS 090008 12/07/2018
> [ 1.813553] RIP: 0010:__set_cpus_allowed_ptr+0x19e/0x1b0
> [ 1.813553] Code: e8 e7 a3 39 00 85 c0 74 a7 ba 00 02 00 00 48 c7
> c6 20 4b 9b 84 4c 89 ff e8 cf a3 39 00 85 c0 75 8f 83 bb a0 03 00 00
> 01 74 86 <0f> 0b eb 82 e8 49 ba 74 00 66 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00
> [ 1.813553] RSP: 0000:ffffba9bc1ca7cf8 EFLAGS: 00010016
> [ 1.813553] RAX: 0000000000000000 RBX: ffff98ed48d58000 RCX: 0000000000000008
> [ 1.813553] RDX: 0000000000000200 RSI: ffffffff849b4b20 RDI: ffff98ed48d035a8
> [ 1.813553] RBP: ffff98ed42a2ac00 R08: 0000000000000008 R09: 0000000000000008
> [ 1.813553] R10: ffff98ed48d035a8 R11: ffffffff8484da40 R12: 0000000000000000
> [ 1.813553] R13: 0000000000000010 R14: ffffffff849b4ba0 R15: ffff98ed48d035a8
> [ 1.813553] FS: 0000000000000000(0000) GS:ffff98ee3aa00000(0000)
> knlGS:0000000000000000
> [ 1.813553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1.813553] CR2: 0000000000000000 CR3: 000000019980a001 CR4: 00000000003706e0
> [ 1.813553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1.813553] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1.813553] Call Trace:
> [ 1.813553] worker_attach_to_pool+0x53/0xd0
> [ 1.813553] create_worker+0xf9/0x190
> [ 1.813553] alloc_unbound_pwq+0x3a5/0x3b0
> [ 1.813553] wq_update_unbound_numa+0x112/0x1c0
> [ 1.813553] workqueue_online_cpu+0x1d0/0x220
> [ 1.813553] ? workqueue_prepare_cpu+0x70/0x70
> [ 1.813553] cpuhp_invoke_callback+0x82/0x4a0
> [ 1.813553] ? sort_range+0x20/0x20
> [ 1.813553] cpuhp_thread_fun+0xb8/0x120
> [ 1.813553] smpboot_thread_fn+0x198/0x230
> [ 1.813553] kthread+0x13d/0x160
> [ 1.813553] ? kthread_create_on_node+0x60/0x60
> [ 1.813553] ret_from_fork+0x22/0x30
> [ 1.813553] ---[ end trace bc73d8bab71235fe ]---
> [ 1.817553] #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31
> [ 1.826499] smp: Brought up 2 nodes, 32 CPUs
> [ 1.833345] smpboot: Max logical packages: 2
> [ 1.833574] smpboot: Total of 32 processors activated (146959.07 BogoMIPS)
>
>
> Thanks,
> Dexuan

2020-12-23 15:04:15

by Lai Jiangshan

Subject: Re: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

On Wed, Dec 23, 2020 at 5:39 AM Dexuan-Linux Cui <[email protected]> wrote:
>
> On Fri, Dec 18, 2020 at 8:11 AM Lai Jiangshan <[email protected]> wrote:
> >
> > From: Lai Jiangshan <[email protected]>
> >
> > 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > said that scheduler will not force break affinity for us.
> >
> > But workqueue highly depends on the old behavior. Many parts of the codes
> > relies on it, 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > is not enough to change it, and the commit has flaws in itself too.
> >
> > It doesn't handle for worker detachment.
> > It doesn't handle for worker attachement, mainly worker creation
> > which is handled by Valentin Schneider's patch [1].
> > It doesn't handle for unbound workers which might be possible
> > per-cpu-kthread.
> >
> > We need to thoroughly update the way workqueue handles affinity
> > in cpu hot[un]plug, what is this patchset intends to do and
> > replace the Valentin Schneider's patch [1]. The equivalent patch
> > is patch 10.
> >
> > Patch 1 fixes a flaw reported by Hillf Danton <[email protected]>.
> > I have to include this fix because later patches depends on it.
> >
> > The patchset is based on tip/master rather than workqueue tree,
> > because the patchset is a complement for 06249738a41a ("workqueue:
> > Manually break affinity on hotplug") which is only in tip/master by now.
> >
> > And TJ acked to route the series through tip.
> >
> > Changed from V1:
> > Add TJ's acked-by for the whole patchset
> >
> > Add more words to the comments and the changelog, mainly derived
> > from discussion with Peter.
> >
> > Update the comments as TJ suggested.
> >
> > Update a line of code as Valentin suggested.
> >
> > Add Valentin's ack for patch 10 because "Seems alright to me." and
> > add Valentin's comments to the changelog which is integral.
> >
> > [1]: https://lore.kernel.org/r/[email protected]
> > [V1 patcheset]: https://lore.kernel.org/lkml/[email protected]/
> >
> > Cc: Hillf Danton <[email protected]>
> > Cc: Valentin Schneider <[email protected]>
> > Cc: Qian Cai <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Vincent Donnefort <[email protected]>
> > Cc: Tejun Heo <[email protected]>
> >
> > Lai Jiangshan (10):
> > workqueue: restore unbound_workers' cpumask correctly
> > workqueue: use cpu_possible_mask instead of cpu_active_mask to break
> > affinity
> > workqueue: Manually break affinity on pool detachment
> > workqueue: don't set the worker's cpumask when kthread_bind_mask()
> > workqueue: introduce wq_online_cpumask
> > workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
> > workqueue: Manually break affinity on hotplug for unbound pool
> > workqueue: reorganize workqueue_online_cpu()
> > workqueue: reorganize workqueue_offline_cpu() unbind_workers()
> > workqueue: Fix affinity of kworkers when attaching into pool
> >
> > kernel/workqueue.c | 214 ++++++++++++++++++++++++++++-----------------
> > 1 file changed, 132 insertions(+), 82 deletions(-)
> >
> > --
> > 2.19.1.6.gb485710b
>
> Hi,
> I tested this patchset on today's tip.git's master branch
> (981316394e35 ("Merge branch 'locking/urgent'")).
>
> Every time the kernel boots with 32 CPUs (I'm running the Linux VM on
> Hyper-V), I get the below warning.
> (BTW, with 8 or 16 CPUs, I don't see the warning).
> By printing the cpumasks with "%*pbl", I know the warning happens because:
> new_mask = 16-31
> cpu_online_mask= 0-16
> cpu_active_mask= 0-15
> p->nr_cpus_allowed=16
>
> 2374         if (p->flags & PF_KTHREAD) {
> 2375                 /*
> 2376                  * For kernel threads that do indeed end up on online &&
> 2377                  * !active we want to ensure they are strict per-CPU threads.
> 2378                  */
> 2379                 WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
> 2380                         !cpumask_intersects(new_mask, cpu_active_mask) &&
> 2381                         p->nr_cpus_allowed != 1);
> 2382         }
> 2383
>

Hello, Dexuan

Could you omit patch4 of the patchset and test it again, please?
("workqueue: don't set the worker's cpumask when kthread_bind_mask()")

kthread_bind_mask() sets the worker task's cpumask to the pool's cpumask without
any check. set_cpus_allowed_ptr() then finds that the task's cpumask is
unchanged (it was already set by kthread_bind_mask()) and skips all the checks.
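
For the record, a rough sketch of that sequence, loosely based on
kernel/workqueue.c before this series; the *_sketch names and the abridged
bodies are for illustration only, not the actual implementations:

static void worker_attach_to_pool_sketch(struct worker *worker,
					 struct worker_pool *pool)
{
	/*
	 * __set_cpus_allowed_ptr() starts with a cpumask_equal() test and
	 * returns early when the requested mask is already in place.  With
	 * the mask already set by kthread_bind_mask() in
	 * create_worker_sketch() below, the PF_KTHREAD online/active
	 * WARN_ON() is never reached; patch 4 stops pre-setting the mask,
	 * so the full path runs and warns when the pool's CPUs are online
	 * but not yet active.
	 */
	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
}

static struct worker *create_worker_sketch(struct worker_pool *pool)
{
	struct worker *worker = alloc_worker(pool->node);

	worker->task = kthread_create_on_node(worker_thread, worker,
					      pool->node, "kworker/sketch");
	/* Copies pool->attrs->cpumask straight into the task, no checks. */
	kthread_bind_mask(worker->task, pool->attrs->cpumask);

	worker_attach_to_pool_sketch(worker, pool);
	return worker;
}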

I also found that numa=fake=2U seems to give broken cpumask_of_node() results on my box.

Thanks,
Lai

2020-12-23 19:51:31

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

On Sat, Dec 19, 2020 at 01:09:09AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> said that scheduler will not force break affinity for us.
>
> But workqueue highly depends on the old behavior. Many parts of the codes
> relies on it, 06249738a41a ("workqueue: Manually break affinity on hotplug")
> is not enough to change it, and the commit has flaws in itself too.
>
> It doesn't handle for worker detachment.
> It doesn't handle for worker attachement, mainly worker creation
> which is handled by Valentin Schneider's patch [1].
> It doesn't handle for unbound workers which might be possible
> per-cpu-kthread.
>
> We need to thoroughly update the way workqueue handles affinity
> in cpu hot[un]plug, what is this patchset intends to do and
> replace the Valentin Schneider's patch [1]. The equivalent patch
> is patch 10.
>
> Patch 1 fixes a flaw reported by Hillf Danton <[email protected]>.
> I have to include this fix because later patches depends on it.
>
> The patchset is based on tip/master rather than workqueue tree,
> because the patchset is a complement for 06249738a41a ("workqueue:
> Manually break affinity on hotplug") which is only in tip/master by now.
>
> And TJ acked to route the series through tip.
>
> Changed from V1:
> Add TJ's acked-by for the whole patchset
>
> Add more words to the comments and the changelog, mainly derived
> from discussion with Peter.
>
> Update the comments as TJ suggested.
>
> Update a line of code as Valentin suggested.
>
> Add Valentin's ack for patch 10 because "Seems alright to me." and
> add Valentin's comments to the changelog which is integral.
>
> [1]: https://lore.kernel.org/r/[email protected]
> [V1 patcheset]: https://lore.kernel.org/lkml/[email protected]/
>
> Cc: Hillf Danton <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Vincent Donnefort <[email protected]>
> Cc: Tejun Heo <[email protected]>

And rcutorture hits this, so thank you for the fix!

Tested-by: Paul E. McKenney <[email protected]>

> Lai Jiangshan (10):
> workqueue: restore unbound_workers' cpumask correctly
> workqueue: use cpu_possible_mask instead of cpu_active_mask to break
> affinity
> workqueue: Manually break affinity on pool detachment
> workqueue: don't set the worker's cpumask when kthread_bind_mask()
> workqueue: introduce wq_online_cpumask
> workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
> workqueue: Manually break affinity on hotplug for unbound pool
> workqueue: reorganize workqueue_online_cpu()
> workqueue: reorganize workqueue_offline_cpu() unbind_workers()
> workqueue: Fix affinity of kworkers when attaching into pool
>
> kernel/workqueue.c | 214 ++++++++++++++++++++++++++++-----------------
> 1 file changed, 132 insertions(+), 82 deletions(-)
>
> --
> 2.19.1.6.gb485710b
>

2020-12-23 20:28:57

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

> From: Lai Jiangshan <[email protected]>
> Sent: Wednesday, December 23, 2020 7:02 AM
> >
> > Hi,
> > I tested this patchset on today's tip.git's master branch
> > (981316394e35 ("Merge branch 'locking/urgent'")).
> >
> > Every time the kernel boots with 32 CPUs (I'm running the Linux VM on
> > Hyper-V), I get the below warning.
> > (BTW, with 8 or 16 CPUs, I don't see the warning).
> > By printing the cpumasks with "%*pbl", I know the warning happens
> > because:
> > new_mask = 16-31
> > cpu_online_mask= 0-16
> > cpu_active_mask= 0-15
> > p->nr_cpus_allowed=16
> >
>
> Hello, Dexuan
>
> Could you omit patch4 of the patchset and test it again, please?
> ("workqueue: don't set the worker's cpumask when kthread_bind_mask()")
>
> kthread_bind_mask() set the worker task to the pool's cpumask without
> any check. And set_cpus_allowed_ptr() finds that the task's cpumask
> is unchanged (already set by kthread_bind_mask()) and skips all the checks.
>
> And I found that numa=fake=2U seems broken on cpumask_of_node() in my
> box.
>
> Thanks,
> Lai

Looks like your analysis is correct: the warning does not reproduce if I configure
all 32 vCPUs into one virtual NUMA node (and I don't see the message
"smpboot: CPU 16 Converting physical 0 to logical die 1"):

[ 1.495440] smp: Bringing up secondary CPUs ...
[ 1.499207] x86: Booting SMP configuration:
[ 1.503038] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
#8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26
#27 #28 #29 #30 #31
[ 1.531930] smp: Brought up 1 node, 32 CPUs
[ 1.538779] smpboot: Max logical packages: 1
[ 1.539041] smpboot: Total of 32 processors activated (146859.90 BogoMIPS)

The warning only reproduces when there is more than one node, and it prints only
once, for the first vCPU of the second node (i.e. node #1).

With more than one node, if I drop patch4, the warning does not reproduce.

Thanks,
-- Dexuan

2020-12-23 20:42:15

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

> From: Dexuan Cui
> Sent: Wednesday, December 23, 2020 12:27 PM
> ...
> The warning only repros if there are more than 1 node, and it only prints once
> for the first vCPU of the second node (i.e. node #1).

A correction: if I configure the 32 vCPUs evenly into 4 nodes, I get the warning
once for each of nodes #1, #2 and #3.

Thanks,
-- Dexuan

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2376,9 +2376,14 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		 * For kernel threads that do indeed end up on online &&
 		 * !active we want to ensure they are strict per-CPU threads.
 		 */
-		WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
+		WARN(cpumask_intersects(new_mask, cpu_online_mask) &&
 			!cpumask_intersects(new_mask, cpu_active_mask) &&
-			p->nr_cpus_allowed != 1);
+			p->nr_cpus_allowed != 1, "%*pbl, %*pbl, %*pbl, %d\n",
+			cpumask_pr_args(new_mask),
+			cpumask_pr_args(cpu_online_mask),
+			cpumask_pr_args(cpu_active_mask),
+			p->nr_cpus_allowed
+			);
 	}
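
For reference (this snippet is not part of the patch): "%*pbl" prints a bitmap
as a CPU list, and cpumask_pr_args() expands to the (nr_cpu_ids, bits) pair the
specifier expects, so the same masks could be dumped anywhere with e.g.:

	pr_warn("new=%*pbl online=%*pbl active=%*pbl nr_allowed=%d\n",
		cpumask_pr_args(new_mask),
		cpumask_pr_args(cpu_online_mask),
		cpumask_pr_args(cpu_active_mask),
		p->nr_cpus_allowed);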

[ 1.791611] smp: Bringing up secondary CPUs ...
[ 1.795225] x86: Booting SMP configuration:
[ 1.798964] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
[ 1.807068] .... node #1, CPUs: #8
[ 1.094226] smpboot: CPU 8 Converting physical 0 to logical die 1
[ 1.895211] ------------[ cut here ]------------
[ 1.899058] 8-15, 0-8, 0-7, 8
[ 1.899058] WARNING: CPU: 8 PID: 50 at kernel/sched/core.c:2386 __set_cpus_allowed_ptr+0x1c7/0x1e0
[ 1.899058] CPU: 8 PID: 50 Comm: cpuhp/8 Not tainted 5.10.0+ #4
[ 1.899058] RIP: 0010:__set_cpus_allowed_ptr+0x1c7/0x1e0
[ 1.899058] Call Trace:
[ 1.899058] worker_attach_to_pool+0x53/0xd0
[ 1.899058] create_worker+0xf9/0x190
[ 1.899058] alloc_unbound_pwq+0x3a5/0x3b0
[ 1.899058] wq_update_unbound_numa+0x112/0x1c0
[ 1.899058] workqueue_online_cpu+0x1d0/0x220
[ 1.899058] cpuhp_invoke_callback+0x82/0x4a0
[ 1.899058] cpuhp_thread_fun+0xb8/0x120
[ 1.899058] smpboot_thread_fn+0x198/0x230
[ 1.899058] kthread+0x13d/0x160
[ 1.899058] ret_from_fork+0x22/0x30
[ 1.903058] #9 #10 #11 #12 #13 #14 #15
[ 1.907092] .... node #2, CPUs: #16
[ 1.094226] smpboot: CPU 16 Converting physical 0 to logical die 2
[ 1.995205] ------------[ cut here ]------------
[ 1.999058] 16-23, 0-16, 0-15, 8
[ 1.999058] WARNING: CPU: 16 PID: 91 at kernel/sched/core.c:2386 __set_cpus_allowed_ptr+0x1c7/0x1e0
[ 1.999058] CPU: 16 PID: 91 Comm: cpuhp/16 Tainted: G W 5.10.0+ #4
[ 1.999058] RIP: 0010:__set_cpus_allowed_ptr+0x1c7/0x1e0
[ 1.999058] Call Trace:
[ 1.999058] worker_attach_to_pool+0x53/0xd0
[ 1.999058] create_worker+0xf9/0x190
[ 1.999058] alloc_unbound_pwq+0x3a5/0x3b0
[ 1.999058] wq_update_unbound_numa+0x112/0x1c0
[ 1.999058] workqueue_online_cpu+0x1d0/0x220
[ 1.999058] cpuhp_invoke_callback+0x82/0x4a0
[ 1.999058] cpuhp_thread_fun+0xb8/0x120
[ 1.999058] smpboot_thread_fn+0x198/0x230
[ 1.999058] kthread+0x13d/0x160
[ 1.999058] ret_from_fork+0x22/0x30
[ 2.003058] #17 #18 #19 #20 #21 #22 #23
[ 2.007092] .... node #3, CPUs: #24
[ 1.094226] smpboot: CPU 24 Converting physical 0 to logical die 3
[ 2.095220] ------------[ cut here ]------------
[ 2.099058] 24-31, 0-24, 0-23, 8
[ 2.099058] WARNING: CPU: 24 PID: 132 at kernel/sched/core.c:2386 __set_cpus_allowed_ptr+0x1c7/0x1e0
[ 2.099058] CPU: 24 PID: 132 Comm: cpuhp/24 Tainted: G W 5.10.0+ #4
[ 2.099058] Call Trace:
[ 2.099058] worker_attach_to_pool+0x53/0xd0
[ 2.099058] create_worker+0xf9/0x190
[ 2.099058] alloc_unbound_pwq+0x3a5/0x3b0
[ 2.099058] wq_update_unbound_numa+0x112/0x1c0
[ 2.099058] workqueue_online_cpu+0x1d0/0x220
[ 2.099058] cpuhp_invoke_callback+0x82/0x4a0
[ 2.099058] cpuhp_thread_fun+0xb8/0x120
[ 2.099058] smpboot_thread_fn+0x198/0x230
[ 2.099058] kthread+0x13d/0x160
[ 2.099058] ret_from_fork+0x22/0x30
[ 2.103058] #25 #26 #27 #28 #29 #30 #31
[ 2.108091] smp: Brought up 4 nodes, 32 CPUs
[ 2.115065] smpboot: Max logical packages: 4
[ 2.119067] smpboot: Total of 32 processors activated (146992.31 BogoMIPS)

2020-12-24 06:20:13

by kernel test robot

[permalink] [raw]
Subject: [workqueue] 6094661b16: WARNING:at_kernel/sched/core.c:#__set_cpus_allowed_ptr


Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: 6094661b1679bd1d59eb620371dfbf327e9feca7 ("[PATCH -tip V2 04/10] workqueue: don't set the worker's cpumask when kthread_bind_mask()")
url: https://github.com/0day-ci/linux/commits/Lai-Jiangshan/workqueue-break-affinity-initiatively/20201219-001548
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git a409ed156a90093a03fe6a93721ddf4c591eac87

in testcase: will-it-scale
version: will-it-scale-x86_64-b695a1b-1_20201217
with following parameters:

nr_task: 50%
mode: process
test: context_switch1
cpufreq_governor: performance
ucode: 0x2006a08

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale


on test machine: 104 threads Skylake with 192G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


kern :warn : [ 2.900184] WARNING: CPU: 26 PID: 140 at kernel/sched/core.c:2379 __set_cpus_allowed_ptr+0x1c4/0x1e0
kern :warn : [ 2.900184] Modules linked in:
kern :warn : [ 2.900184] CPU: 26 PID: 140 Comm: cpuhp/26 Not tainted 5.10.0-11829-g6094661b1679 #1
kern :warn : [ 2.900184] RIP: 0010:__set_cpus_allowed_ptr+0x1c4/0x1e0
kern :warn : [ 2.900184] Code: 74 99 8b 15 d2 d6 ad 01 48 c7 c6 a0 30 bf 82 4c 89 f7 e8 8f bd 48 00 85 c0 75 80 41 83 bc 24 c0 03 00 00 01 0f 84 71 ff ff ff <0f> 0b e9 6a ff ff ff e8 b0 5c ac 00 66 66 2e 0f 1f 84 00 00 00 00
kern :warn : [ 2.900184] RSP: 0000:ffffc9000cebfd00 EFLAGS: 00010006
kern :warn : [ 2.900184] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffff98
kern :warn : [ 2.900184] RDX: 00000000ffffff98 RSI: 0000000000000000 RDI: ffff889846721ed0
kern :warn : [ 2.900184] RBP: ffffc9000cebfd40 R08: 0000000000000001 R09: 0000000000000002
kern :warn : [ 2.900184] R10: ffff889846721ed0 R11: ffffc9010cebfcd7 R12: ffff889846730000
kern :warn : [ 2.900184] R13: ffff8897e042b040 R14: ffff889846721ed0 R15: 000000000000001a
kern :warn : [ 2.900184] FS: 0000000000000000(0000) GS:ffff88afbc600000(0000) knlGS:0000000000000000
kern :warn : [ 2.900184] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern :warn : [ 2.900184] CR2: 0000000000000000 CR3: 000000303e20a001 CR4: 00000000007706e0
kern :warn : [ 2.900184] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kern :warn : [ 2.900184] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kern :warn : [ 2.900184] PKRU: 55555554
kern :warn : [ 2.900184] Call Trace:
kern :warn : [ 2.900184] worker_attach_to_pool+0x33/0xa0
kern :warn : [ 2.900184] create_worker+0xff/0x1a0
kern :warn : [ 2.900184] alloc_unbound_pwq+0x444/0x460
kern :warn : [ 2.900184] wq_update_unbound_numa+0x1a3/0x1e0
kern :warn : [ 2.900184] workqueue_online_cpu+0x1eb/0x240
kern :warn : [ 2.900184] ? workqueue_prepare_cpu+0x80/0x80
kern :warn : [ 2.900184] cpuhp_invoke_callback+0x82/0x440
kern :warn : [ 2.900184] ? smpboot_thread_fn+0x26/0x1e0
kern :warn : [ 2.900184] cpuhp_thread_fun+0xa4/0x100
kern :warn : [ 2.900184] smpboot_thread_fn+0x10b/0x1e0
kern :warn : [ 2.900184] ? sort_range+0x20/0x20
kern :warn : [ 2.900184] kthread+0x116/0x160
kern :warn : [ 2.900184] ? kthread_park+0xa0/0xa0
kern :warn : [ 2.900184] ret_from_fork+0x22/0x30
kern :warn : [ 2.900184] ---[ end trace ee287c8df3029920 ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
Oliver Sang


Attachments:
(No filename) (3.98 kB)
config-5.10.0-11829-g6094661b1679 (174.94 kB)
job-script (7.70 kB)
kmsg.xz (29.49 kB)
will-it-scale (206.00 B)
job.yaml (5.15 kB)
reproduce (357.00 B)

2020-12-26 14:55:59

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

On Sat, Dec 26, 2020 at 06:34:21PM +0800, Hillf Danton wrote:
> On Wed, 23 Dec 2020 11:49:51 -0800 "Paul E. McKenney" wrote:
> >On Sat, Dec 19, 2020 at 01:09:09AM +0800, Lai Jiangshan wrote:
> >> From: Lai Jiangshan <[email protected]>
> >>
> >> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> >> said that scheduler will not force break affinity for us.
> >>
> >> But workqueue highly depends on the old behavior. Many parts of the codes
> >> relies on it, 06249738a41a ("workqueue: Manually break affinity on hotplug")
> >> is not enough to change it, and the commit has flaws in itself too.
> >>
> >> It doesn't handle for worker detachment.
> >> It doesn't handle for worker attachement, mainly worker creation
> >> which is handled by Valentin Schneider's patch [1].
> >> It doesn't handle for unbound workers which might be possible
> >> per-cpu-kthread.
> >>
> >> We need to thoroughly update the way workqueue handles affinity
> >> in cpu hot[un]plug, what is this patchset intends to do and
> >> replace the Valentin Schneider's patch [1]. The equivalent patch
> >> is patch 10.
> >>
> >> Patch 1 fixes a flaw reported by Hillf Danton <[email protected]>.
> >> I have to include this fix because later patches depends on it.
> >>
> >> The patchset is based on tip/master rather than workqueue tree,
> >> because the patchset is a complement for 06249738a41a ("workqueue:
> >> Manually break affinity on hotplug") which is only in tip/master by now.
> >>
> >> And TJ acked to route the series through tip.
> >>
> >> Changed from V1:
> >> Add TJ's acked-by for the whole patchset
> >>
> >> Add more words to the comments and the changelog, mainly derived
> >> from discussion with Peter.
> >>
> >> Update the comments as TJ suggested.
> >>
> >> Update a line of code as Valentin suggested.
> >>
> >> Add Valentin's ack for patch 10 because "Seems alright to me." and
> >> add Valentin's comments to the changelog which is integral.
> >>
> >> [1]: https://lore.kernel.org/r/[email protected]
> >> [V1 patcheset]: https://lore.kernel.org/lkml/[email protected]/
> >>
> >> Cc: Hillf Danton <[email protected]>
> >> Cc: Valentin Schneider <[email protected]>
> >> Cc: Qian Cai <[email protected]>
> >> Cc: Peter Zijlstra <[email protected]>
> >> Cc: Vincent Donnefort <[email protected]>
> >> Cc: Tejun Heo <[email protected]>
> >
> >And rcutorture hits this, so thank you for the fix!
>
> Can you please specify a bit what you encountered in rcutorture
> before this patchset? You know we can't correctly estimate the scope
> of the fix without your help.

It triggers the following in sched_cpu_dying() in kernel/sched/core.c,
exactly the same as for Lai Jiangshan:

BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq))

Which is in fact the "this" in my earlier "rcutorture hits this". ;-)
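
For context, that assertion sits at the end of CPU offlining in
sched_cpu_dying(); a simplified sketch (close to, though not necessarily
verbatim, the -tip code of the time):

int sched_cpu_dying(unsigned int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	struct rq_flags rf;

	/* Handle pending wakeups and then migrate everything off. */
	sched_tick_stop(cpu);

	rq_lock_irqsave(rq, &rf);
	/*
	 * By now only the idle task may remain on the runqueue and no task
	 * may still be pinned to the dying CPU; kworkers left bound to it
	 * (the case this series fixes) trip this assertion.
	 */
	BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq));
	rq_unlock_irqrestore(rq, &rf);

	calc_load_migrate(rq);
	update_max_interval();
	hrtick_clear(rq);
	return 0;
}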

Thanx, Paul

> >Tested-by: Paul E. McKenney <[email protected]>
> >
> >> Lai Jiangshan (10):
> >> workqueue: restore unbound_workers' cpumask correctly
> >> workqueue: use cpu_possible_mask instead of cpu_active_mask to break
> >> affinity
> >> workqueue: Manually break affinity on pool detachment
> >> workqueue: don't set the worker's cpumask when kthread_bind_mask()
> >> workqueue: introduce wq_online_cpumask
> >> workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
> >> workqueue: Manually break affinity on hotplug for unbound pool
> >> workqueue: reorganize workqueue_online_cpu()
> >> workqueue: reorganize workqueue_offline_cpu() unbind_workers()
> >> workqueue: Fix affinity of kworkers when attaching into pool
> >>
> >> kernel/workqueue.c | 214 ++++++++++++++++++++++++++++-----------------
> >> 1 file changed, 132 insertions(+), 82 deletions(-)
> >>
> >> --
> >> 2.19.1.6.gb485710b

2020-12-27 14:10:58

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

On Sat, Dec 26, 2020 at 10:52 PM Paul E. McKenney <[email protected]> wrote:

> >
> > Can you please specify a bit what you encountered in rcutorture
> > before this patchset? You know we can't correctly estimate the scope
> > of the fix without your help.

>
> It triggers the following in sched_cpu_dying() in kernel/sched/core.c,
> exactly the same as for Lai Jiangshan:
>
> BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq))
>
> Which is in fact the "this" in my earlier "rcutorture hits this". ;-)
>
> Thanx, Paul
>

Hi, Hillf

https://lkml.org/lkml/2020/12/22/141

From the email, I think rcutorture encountered the same problem.

Hi, Paul

I'm sorry I forgot to add your Tested-by.

Thanks
Lai

2020-12-27 16:05:06

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH -tip V2 00/10] workqueue: break affinity initiatively

On Sun, Dec 27, 2020 at 10:08:51PM +0800, Lai Jiangshan wrote:
> On Sat, Dec 26, 2020 at 10:52 PM Paul E. McKenney <[email protected]> wrote:
>
> > >
> > > Can you please specify a bit what you encountered in rcutorture
> > > before this patchset? You know we cant have a correct estimation
> > > of the fix diameter without your help.
>
> >
> > It triggers the following in sched_cpu_dying() in kernel/sched/core.c,
> > exactly the same as for Lai Jiangshan:
> >
> > BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq))
> >
> > Which is in fact the "this" in my earlier "rcutorture hits this". ;-)
> >
> > Thanx, Paul
> >
>
> Hi, Hillf
>
> https://lkml.org/lkml/2020/12/22/141
>
> From the email, I think rcutorture encountered the same problem.
>
> Hi, Paul
>
> I'm sorry I forgot to add your Tested-by.

No need to apologize, especially given that I didn't get around to
testing it until after it was pulled into -tip. ;-)

Thank you for the patch series!

Thanx, Paul