2024-06-12 11:55:34

by zhangwei123171

[permalink] [raw]
Subject: [PATCH] sched/fair: prefer available idle cpu in select_idle_core

From: zhangwei123171 <[email protected]>

When the idle core cannot be found, the first sched idle cpu
or first available idle cpu will be used if exsit.

We can use the available idle cpu detected later to ensure it
can be used if exsit.

Signed-off-by: zhangwei123171 <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 41b58387023d..653ca3ea09b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7341,7 +7341,7 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
}
break;
}
- if (*idle_cpu == -1 && cpumask_test_cpu(cpu, cpus))
+ if (cpumask_test_cpu(cpu, cpus))
*idle_cpu = cpu;
}

--
2.33.0



2024-06-13 04:41:10

by K Prateek Nayak

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: prefer available idle cpu in select_idle_core

Hello there,

On 6/12/2024 5:24 PM, [email protected] wrote:
> From: zhangwei123171 <[email protected]>
>
> When the idle core cannot be found, the first sched idle cpu
> or first available idle cpu will be used if exsit.
>
> We can use the available idle cpu detected later to ensure it
> can be used if exsit.

Is there any particular advantage of the same? Based on my understanding
the check exists to prevent unnecessary calls to cpumask_test_cpu() if
an idle CPU is already found. On a large core count system with a large
number of cores in the LLC domain, this may result in a lot more calls
to cpumask_test_cpu() if only one core is in fact idle and there is a
storm of wakeups.

For SMT-2 system, I believe any idle thread on a busy core would be the
same (if we consider all task to have same behavior). On a larger SMT
system, it takes more overhead to consider which core is the most idle.
Consider the following case:

o CPUs of core: 0-7; Only CPU1 is busy (i is idle, b is busy)

+---+---+---+---+---+---+---+---+
| i | b | i | i | i | i | i | i |
+---+---+---+---+---+---+---+---+
^
select idle core bails out at first busy CPU which is CPU1 however
this core is only 1/8th busy.

o CPUs of core: 8-15; CPU10 to CPU15 are busy (i is idle, b is busy)

+---+---+---+---+---+---+---+---+
| i | i | b | b | b | b | b | b |
+---+---+---+---+---+---+---+---+
^
select idle core bails out at first busy CPU which is CPU10 however
this core is in fact 5/8th busy.

Technically, core with CPU0 is better but with your change, we'll select
core of CPU8. Bottom line being, there does not seem to exist a good
case where selecting the last idle thread is better than selecting the
first one. The best the scheduler can do is reduce the number of calls
to cpumask_test_cpu() once an idle CPU is found unless it decides to
scan all the CPUs of the core to find the core which is the idlest and
in a large, busy system, that is a big hammer.

Thoughts?

>
> Signed-off-by: zhangwei123171 <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 41b58387023d..653ca3ea09b6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7341,7 +7341,7 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpu
> }
> break;
> }
> - if (*idle_cpu == -1 && cpumask_test_cpu(cpu, cpus))
> + if (cpumask_test_cpu(cpu, cpus))
> *idle_cpu = cpu;
> }
>

--
Thanks and Regards,
Prateek