2021-03-11 12:07:18

by Valentin Schneider

Subject: [PATCH v3 1/7] sched/fair: Ignore percpu threads for imbalance pulls

From: Lingutla Chandrasekhar <[email protected]>

During load balancing, when the balancing group is unable to pull a task
from the busy group due to ->cpus_ptr constraints, it sets
LBF_SOME_PINNED in the lb env flags. As a consequence, sgc->imbalance
is set at the parent domain level, which classifies the group as
imbalanced so that it can get help from another balancing CPU.

Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and
CPUs 2-3 as Bigs with below scenario:
- CPU0 doing newly_idle balancing
- CPU1 running percpu kworker and RT task (small tasks)
- CPU2 running 2 big tasks
- CPU3 running 1 medium task

While CPU0 is doing newly_idle load balance at MC level, it fails to
pull the percpu kworker from CPU1, sets LBF_SOME_PINNED in the lb env
flags, and sets sgc->imbalance at the DIE level domain. As LBF_ALL_PINNED
is not cleared, it tries to redo the balancing with CPU1 cleared from
the env cpus, but it doesn't find another busiest_group, so CPU0 stops
balancing at MC level without clearing 'sgc->imbalance' and restarts
load balancing at DIE level.

CPU0 (the balancing CPU) then finds the LITTLEs' group to be the
busiest_group, with group type imbalanced. The Bigs' group, which is
classified at a level below imbalanced, is ignored as a candidate for
busiest, and the balancing is aborted without pulling any tasks (by
that time, CPU1 might not have any running tasks).

It is a suboptimal decision to classify the group as imbalanced because
of percpu threads, so don't set LBF_SOME_PINNED for percpu threads.

Signed-off-by: Lingutla Chandrasekhar <[email protected]>
[Use kthread_is_per_cpu() rather than p->nr_cpus_allowed]
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/fair.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2e2ab1e00ef9..83aea97fbf22 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7565,6 +7565,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
 		return 0;

+	/* Disregard pcpu kthreads; they are where they need to be. */
+	if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))
+		return 0;
+
 	if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
 		int cpu;

--
2.25.1
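
To make the effect of the early return concrete, here is a minimal userspace sketch of the ordering inside can_migrate_task(). It is not kernel code: every model_* name, the flag values, and the skip_pcpu_kthreads switch are assumptions made up for illustration, and the per_cpu_kthread field merely stands in for kthread_is_per_cpu(). It only shows that bailing out before the affinity test keeps the "some pinned" flag from being set for a per-CPU kthread.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins; none of these are the kernel's definitions. */
#define MODEL_PF_KTHREAD      0x1u  /* task is a kernel thread */
#define MODEL_LBF_SOME_PINNED 0x2u  /* affinity blocked a pull */

struct model_task {
	unsigned int flags;       /* MODEL_PF_KTHREAD or 0 */
	bool per_cpu_kthread;     /* stands in for kthread_is_per_cpu() */
	unsigned long cpus_mask;  /* bit n set: task may run on CPU n */
};

struct model_env {
	int dst_cpu;
	unsigned int flags;
};

/* Returns 1 if the task may move to env->dst_cpu, 0 otherwise. */
int model_can_migrate(const struct model_task *p, struct model_env *env,
		      bool skip_pcpu_kthreads)
{
	/* The early exit added by the patch: bail out before the affinity test. */
	if (skip_pcpu_kthreads &&
	    (p->flags & MODEL_PF_KTHREAD) && p->per_cpu_kthread)
		return 0;

	if (!(p->cpus_mask & (1UL << env->dst_cpu))) {
		/* Affinity forbids the move; record that something was pinned. */
		env->flags |= MODEL_LBF_SOME_PINNED;
		return 0;
	}

	return 1;
}

/* Hypothetical self-check: a kworker bound to CPU1, with CPU0 trying to pull. */
void model_can_migrate_demo(void)
{
	struct model_task kworker = { MODEL_PF_KTHREAD, true, 1UL << 1 };
	struct model_env without_patch = { 0, 0 };
	struct model_env with_patch = { 0, 0 };

	assert(model_can_migrate(&kworker, &without_patch, false) == 0);
	assert(without_patch.flags & MODEL_LBF_SOME_PINNED);

	assert(model_can_migrate(&kworker, &with_patch, true) == 0);
	assert(with_patch.flags == 0);
}
```

In both cases the kworker stays put; the only difference in this model is whether the failed pull leaves the pinned flag behind for the parent domain level to act on.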


2021-03-16 16:55:59

by Dietmar Eggemann

Subject: Re: [PATCH v3 1/7] sched/fair: Ignore percpu threads for imbalance pulls

On 11/03/2021 13:05, Valentin Schneider wrote:
> From: Lingutla Chandrasekhar <[email protected]>
>
> During load balancing, when the balancing group is unable to pull a task
> from the busy group due to ->cpus_ptr constraints, it sets
> LBF_SOME_PINNED in the lb env flags. As a consequence, sgc->imbalance
> is set at the parent domain level, which classifies the group as
> imbalanced so that it can get help from another balancing CPU.
>
> Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and

Does it have to be a big.LITTLE system? I assume this issue also happens
on an SMP system.

> CPUs 2-3 as Bigs with below scenario:
> - CPU0 doing newly_idle balancing
> - CPU1 running percpu kworker and RT task (small tasks)

What's the role of the small RT task here in the story?

> - CPU2 running 2 big tasks
> - CPU3 running 1 medium task
>
> While CPU0 is doing newly_idle load balance at MC level, it fails to
> pull the percpu kworker from CPU1, sets LBF_SOME_PINNED in the lb env
> flags, and sets sgc->imbalance at the DIE level domain. As LBF_ALL_PINNED
> is not cleared, it tries to redo the balancing with CPU1 cleared from
> the env cpus, but it doesn't find another busiest_group, so CPU0 stops
> balancing at MC level without clearing 'sgc->imbalance' and restarts
> load balancing at DIE level.
>
> CPU0 (the balancing CPU) then finds the LITTLEs' group to be the
> busiest_group, with group type imbalanced. The Bigs' group, which is
> classified at a level below imbalanced, is ignored as a candidate for
> busiest, and the balancing is aborted without pulling any tasks (by
> that time, CPU1 might not have any running tasks).
>
> It is a suboptimal decision to classify the group as imbalanced because
> of percpu threads, so don't set LBF_SOME_PINNED for percpu threads.

This sentence mentioned per-cpu threads (and so does the patch name) but
the implementation (only) deals with per-cpu kernel threads. IMHO, it
would be good to align this.


2021-03-16 17:05:43

by Lingutla Chandrasekhar

Subject: Re: [PATCH v3 1/7] sched/fair: Ignore percpu threads for imbalance pulls

Hi Dietmar,

On 3/16/2021 9:19 PM, Dietmar Eggemann wrote:
> On 11/03/2021 13:05, Valentin Schneider wrote:
>> From: Lingutla Chandrasekhar <[email protected]>
>>
>> During load balancing, when the balancing group is unable to pull a task
>> from the busy group due to ->cpus_ptr constraints, it sets
>> LBF_SOME_PINNED in the lb env flags. As a consequence, sgc->imbalance
>> is set at the parent domain level, which classifies the group as
>> imbalanced so that it can get help from another balancing CPU.
>>
>> Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and
> Does it have to be a big.LITTLE system? I assume this issue also happens
> on an SMP system.

Yes, the issue can happen on an SMP system as well. I will let Valentin
update the commit text in the next version of this series.

>> CPUs 2-3 as Bigs with below scenario:
>> - CPU0 doing newly_idle balancing
>> - CPU1 running percpu kworker and RT task (small tasks)
> What's the role of the small RT task here in the story?

This is to satisfy the 'busiest->nr_running > 1' check.

>> - CPU2 running 2 big tasks
>> - CPU3 running 1 medium task
>>
>> While CPU0 is doing newly_idle load balance at MC level, it fails to
>> pull the percpu kworker from CPU1, sets LBF_SOME_PINNED in the lb env
>> flags, and sets sgc->imbalance at the DIE level domain. As LBF_ALL_PINNED
>> is not cleared, it tries to redo the balancing with CPU1 cleared from
>> the env cpus, but it doesn't find another busiest_group, so CPU0 stops
>> balancing at MC level without clearing 'sgc->imbalance' and restarts
>> load balancing at DIE level.
>>
>> CPU0 (the balancing CPU) then finds the LITTLEs' group to be the
>> busiest_group, with group type imbalanced. The Bigs' group, which is
>> classified at a level below imbalanced, is ignored as a candidate for
>> busiest, and the balancing is aborted without pulling any tasks (by
>> that time, CPU1 might not have any running tasks).
>>
>> It is a suboptimal decision to classify the group as imbalanced because
>> of percpu threads, so don't set LBF_SOME_PINNED for percpu threads.
> This sentence mentioned per-cpu threads (and so does the patch name) but
> the implementation (only) deals with per-cpu kernel threads. IMHO, it
> would be good to align this.

I will let Valentin update this in the next version.

2021-03-16 17:35:43

by Valentin Schneider

Subject: Re: [PATCH v3 1/7] sched/fair: Ignore percpu threads for imbalance pulls

On 16/03/21 16:49, Dietmar Eggemann wrote:
> On 11/03/2021 13:05, Valentin Schneider wrote:
>> From: Lingutla Chandrasekhar <[email protected]>
>>
>> During load balancing, when the balancing group is unable to pull a task
>> from the busy group due to ->cpus_ptr constraints, it sets
>> LBF_SOME_PINNED in the lb env flags. As a consequence, sgc->imbalance
>> is set at the parent domain level, which classifies the group as
>> imbalanced so that it can get help from another balancing CPU.
>>
>> Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and
>
> Does it have to be a big.LITTLE system? I assume this issue also happens
> on an SMP system.
>

Aye, though the consequences are "worse" on asym CPU capacity systems.

>> CPUs 2-3 as Bigs with below scenario:
>> - CPU0 doing newly_idle balancing
>> - CPU1 running percpu kworker and RT task (small tasks)
>
> What's the role of the small RT task here in the story?
>

I don't think it matters much here.

>> - CPU2 running 2 big tasks
>> - CPU3 running 1 medium task
>>
>> While CPU0 is doing newly_idle load balance at MC level, it fails to
>> pull the percpu kworker from CPU1, sets LBF_SOME_PINNED in the lb env
>> flags, and sets sgc->imbalance at the DIE level domain. As LBF_ALL_PINNED
>> is not cleared, it tries to redo the balancing with CPU1 cleared from
>> the env cpus, but it doesn't find another busiest_group, so CPU0 stops
>> balancing at MC level without clearing 'sgc->imbalance' and restarts
>> load balancing at DIE level.
>>
>> CPU0 (the balancing CPU) then finds the LITTLEs' group to be the
>> busiest_group, with group type imbalanced. The Bigs' group, which is
>> classified at a level below imbalanced, is ignored as a candidate for
>> busiest, and the balancing is aborted without pulling any tasks (by
>> that time, CPU1 might not have any running tasks).
>>
>> It is a suboptimal decision to classify the group as imbalanced because
>> of percpu threads, so don't set LBF_SOME_PINNED for percpu threads.
>
> This sentence mentioned per-cpu threads (and so does the patch name) but
> the implementation (only) deals with per-cpu kernel threads. IMHO, it
> would be good to align this.
>

Tell you what, I'll go for:
1) how can pcpu kthreads cause LBF_SOME_PINNED
2) why we may not want this, but still ignore !kthread pcpu tasks
3) why this is even more important for big.LITTLE

2021-03-16 21:21:56

by Dietmar Eggemann

Subject: Re: [PATCH v3 1/7] sched/fair: Ignore percpu threads for imbalance pulls

On 16/03/2021 18:31, Valentin Schneider wrote:
> On 16/03/21 16:49, Dietmar Eggemann wrote:
>> On 11/03/2021 13:05, Valentin Schneider wrote:
>>> From: Lingutla Chandrasekhar <[email protected]>
>>>
>>> During load balancing, when the balancing group is unable to pull a task
>>> from the busy group due to ->cpus_ptr constraints, it sets
>>> LBF_SOME_PINNED in the lb env flags. As a consequence, sgc->imbalance
>>> is set at the parent domain level, which classifies the group as
>>> imbalanced so that it can get help from another balancing CPU.
>>>
>>> Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and
>>
>> Does it have to be a big.LITTLE system? I assume this issue also happens
>> on an SMP system.
>>
>
> Aye, though the consequences are "worse" on asym CPU capacity systems.

The only case I can think of is the higher group_type 'group_imbalanced'
eclipsing 'group_misfit_task' here?
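
A rough sketch of what I mean by "eclipses", assuming the ordering of enum group_type in kernel/sched/fair.c at the time of this series. The model_* names are made up, and the comparison is only the first-order rule (a higher type wins); the real update_sd_pick_busiest() has additional checks and tie-breaking within the same type that are left out here.

```c
#include <assert.h>

/*
 * Assumed to mirror the ordering of enum group_type in fair.c:
 * a higher value means a higher priority for being pulled from.
 */
enum model_group_type {
	MODEL_GROUP_HAS_SPARE = 0,
	MODEL_GROUP_FULLY_BUSY,
	MODEL_GROUP_MISFIT_TASK,
	MODEL_GROUP_ASYM_PACKING,
	MODEL_GROUP_IMBALANCED,
	MODEL_GROUP_OVERLOADED,
};

/* Returns 1 if 'candidate' should replace 'busiest', judged on type alone. */
int model_pick_busiest(enum model_group_type candidate,
		       enum model_group_type busiest)
{
	return candidate > busiest;
}

/* Hypothetical self-check for the big.LITTLE scenario in the changelog. */
void model_pick_busiest_demo(void)
{
	/* LITTLE group flagged imbalanced, Big group holding a misfit task. */
	assert(model_pick_busiest(MODEL_GROUP_IMBALANCED,
				  MODEL_GROUP_MISFIT_TASK) == 1);
	assert(model_pick_busiest(MODEL_GROUP_MISFIT_TASK,
				  MODEL_GROUP_IMBALANCED) == 0);
}
```

So once the LITTLE group carries the imbalanced type, a Big group that is "only" misfit cannot displace it as busiest in this model.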

>
>>> CPUs 2-3 as Bigs with below scenario:
>>> - CPU0 doing newly_idle balancing
>>> - CPU1 running percpu kworker and RT task (small tasks)
>>
>> What's the role of the small RT task here in the story?
>>
>
> I don't think it matters much here.

Chandra already mentioned that this is part of the story: it is what
makes the MC-level load balance start trying to move a task from CPU1
to CPU0 (the 'if (busiest->nr_running > 1)' check).

[...]

>> This sentence mentioned per-cpu threads (and so does the patch name) but
>> the implementation (only) deals with per-cpu kernel threads. IMHO, it
>> would be good to align this.
>>
>
> Tell you what, I'll go for:
> 1) how can pcpu kthreads cause LBF_SOME_PINNED
> 2) why we may not want this, but still ignore !kthread pcpu tasks
> 3) why this is even more important for big.LITTLE

LGTM.

2021-03-16 21:23:19

by Dietmar Eggemann

Subject: Re: [PATCH v3 1/7] sched/fair: Ignore percpu threads for imbalance pulls

Hi Chandra,

On 16/03/2021 17:03, Chandra Sekhar Lingutla wrote:
> Hi Dietmar,
>
> On 3/16/2021 9:19 PM, Dietmar Eggemann wrote:
>> On 11/03/2021 13:05, Valentin Schneider wrote:
>>> From: Lingutla Chandrasekhar <[email protected]>

[...]

>>> CPUs 2-3 as Bigs with below scenario:
>>> - CPU0 doing newly_idle balancing
>>> - CPU1 running percpu kworker and RT task (small tasks)
>> What's the role of the small RT task here in the story?
> This is to satisfy 'busiest->nr_running > 1' checks.

Ah, I see. Forgot about this bit of the story, the 'if
(busiest->nr_running > 1)' in load_balance().
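
For the record, a toy model of that gate, just to spell out why the RT task matters in the scenario. The struct and function names are invented for illustration and nothing here is the actual load_balance() code; the only thing taken from the thread is the 'busiest->nr_running > 1' condition.

```c
#include <assert.h>

/* Invented stand-in for a runqueue; only the task count matters here. */
struct model_rq {
	unsigned int nr_running;
};

/*
 * Returns 1 if the balancer would go on to look at tasks on the busiest
 * runqueue, i.e. the point at which a pinned task can be noticed at all.
 */
int model_attempts_pull(const struct model_rq *busiest)
{
	return busiest->nr_running > 1;
}

/* Hypothetical self-check for CPU1 in the changelog scenario. */
void model_attempts_pull_demo(void)
{
	struct model_rq kworker_only = { 1 };
	struct model_rq kworker_and_rt = { 2 };

	assert(model_attempts_pull(&kworker_only) == 0);
	assert(model_attempts_pull(&kworker_and_rt) == 1);
}
```

With only the kworker on CPU1 the pull is never attempted in this model, so the pinned-task path is not reached; the RT task is what brings the count above one.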

[...]