2022-09-09 06:19:48

by Abel Wu

[permalink] [raw]
Subject: [PATCH v5 2/5] sched/fair: Limited scan for idle cores when overloaded

The has_idle_cores hint could be misleading due to some kind of
rapid idling workloads, especially when LLC is overloaded. If this
is the case, then there will be some full scan cost incurred that
often fails to find a core.

So limit the scan depth for idle cores in such case to make a
speculative inspection at a reasonable cost.

Benchmark
=========

Tests are done in a dual socket (2 x 24C/48T) machine modeled Intel
Xeon(R) Platinum 8260, with SNC configuration:

SNC on: 4 NUMA nodes each of which has 12C/24T
SNC off: 2 NUMA nodes each of which has 24C/48T

All of the benchmarks are done inside a normal cpu cgroup in a clean
environment with cpu turbo disabled.

Based on tip sched/core 0fba527e959d (v5.19.0) plus previous patches
of this series.

Results
=======

hackbench-process-pipes
unpatched patched
(SNC on)
Amean 1 0.4470 ( 0.00%) 0.4557 ( -1.94%)
Amean 4 0.5947 ( 0.00%) 0.6033 ( -1.46%)
Amean 7 0.7450 ( 0.00%) 0.7627 ( -2.37%)
Amean 12 1.1053 ( 0.00%) 1.0653 ( 3.62%)
Amean 21 1.9420 ( 0.00%) 2.0283 * -4.45%*
Amean 30 2.9267 ( 0.00%) 2.9670 ( -1.38%)
Amean 48 4.7027 ( 0.00%) 4.6863 ( 0.35%)
Amean 79 7.7097 ( 0.00%) 7.9443 * -3.04%*
Amean 110 10.0680 ( 0.00%) 10.2393 ( -1.70%)
Amean 141 12.5450 ( 0.00%) 12.6343 ( -0.71%)
Amean 172 15.0297 ( 0.00%) 14.9957 ( 0.23%)
Amean 203 16.8827 ( 0.00%) 16.9133 ( -0.18%)
Amean 234 19.1183 ( 0.00%) 19.2433 ( -0.65%)
Amean 265 20.9893 ( 0.00%) 21.6917 ( -3.35%)
Amean 296 23.3920 ( 0.00%) 23.8743 ( -2.06%)
(SNC off)
Amean 1 0.2717 ( 0.00%) 0.3143 ( -15.71%)
Amean 4 0.6257 ( 0.00%) 0.6070 ( 2.98%)
Amean 7 0.7740 ( 0.00%) 0.7960 ( -2.84%)
Amean 12 1.2410 ( 0.00%) 1.1947 ( 3.73%)
Amean 21 2.6410 ( 0.00%) 2.4837 ( 5.96%)
Amean 30 3.7620 ( 0.00%) 3.4577 ( 8.09%)
Amean 48 6.7757 ( 0.00%) 5.5227 * 18.49%*
Amean 79 8.8827 ( 0.00%) 9.2933 ( -4.62%)
Amean 110 11.0583 ( 0.00%) 11.0443 ( 0.13%)
Amean 141 13.3387 ( 0.00%) 13.1360 ( 1.52%)
Amean 172 15.9583 ( 0.00%) 15.7770 ( 1.14%)
Amean 203 17.8757 ( 0.00%) 17.9557 ( -0.45%)
Amean 234 20.0543 ( 0.00%) 20.4373 * -1.91%*
Amean 265 22.6643 ( 0.00%) 23.6053 * -4.15%*
Amean 296 25.6677 ( 0.00%) 25.6803 ( -0.05%)

Run to run variations is large in the 1 group test, so can be safely
ignored.

With limited scan for idle cores when the LLC is overloaded, a slight
regression can be seen on the smaller LLC machine. It is because the
cost of full scan on these LLCs is much smaller than the machines with
bigger LLCs. So when comes to the SNC off case, the limited scan can
provide obvious benefit especially when the frequency of such scan is
relatively high, e.g. <48 groups.

It's not a universal win, but considering the LLCs are getting bigger
nowadays, we should be careful on the scan depth and limited scan on
certain circumstance is indeed necessary.

tbench4 Throughput
unpatched patched
(SNC on)
Hmean 1 309.43 ( 0.00%) 301.54 * -2.55%*
Hmean 2 613.92 ( 0.00%) 607.81 * -0.99%*
Hmean 4 1227.84 ( 0.00%) 1210.64 * -1.40%*
Hmean 8 2379.04 ( 0.00%) 2381.73 * 0.11%*
Hmean 16 4634.66 ( 0.00%) 4601.21 * -0.72%*
Hmean 32 7592.12 ( 0.00%) 7626.84 * 0.46%*
Hmean 64 9241.11 ( 0.00%) 9251.51 * 0.11%*
Hmean 128 17870.37 ( 0.00%) 20620.98 * 15.39%*
Hmean 256 19370.92 ( 0.00%) 20406.51 * 5.35%*
Hmean 384 19413.92 ( 0.00%) 20312.97 * 4.63%*
(SNC off)
Hmean 1 287.90 ( 0.00%) 292.37 * 1.55%*
Hmean 2 575.52 ( 0.00%) 583.29 * 1.35%*
Hmean 4 1137.94 ( 0.00%) 1155.83 * 1.57%*
Hmean 8 2250.42 ( 0.00%) 2297.63 * 2.10%*
Hmean 16 4363.41 ( 0.00%) 4562.44 * 4.56%*
Hmean 32 7338.06 ( 0.00%) 7425.69 * 1.19%*
Hmean 64 8914.66 ( 0.00%) 9021.77 * 1.20%*
Hmean 128 19978.59 ( 0.00%) 20257.76 * 1.40%*
Hmean 256 20057.49 ( 0.00%) 20043.54 * -0.07%*
Hmean 384 19846.74 ( 0.00%) 19528.03 * -1.61%*

Conclusion
==========

Limited scan for idle cores when LLC is overloaded is almost neutral
compared to full scan given smaller LLCs, but is an obvious win at
the bigger ones which are future-oriented.

Suggested-by: Mel Gorman <[email protected]>
Signed-off-by: Abel Wu <[email protected]>
---
kernel/sched/fair.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5af9bf246274..7abe188a1533 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6437,26 +6437,42 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
time = cpu_clock(this);
}

- if (sched_feat(SIS_UTIL) && !has_idle_core) {
+ if (sched_feat(SIS_UTIL)) {
sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
if (sd_share) {
/* because !--nr is the condition to stop scan */
nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
- /* overloaded LLC is unlikely to have idle cpu/core */
- if (nr == 1)
+
+ /*
+ * Overloaded LLC is unlikely to have idle cpus.
+ * But if has_idle_core hint is true, a limited
+ * speculative scan might help without incurring
+ * much overhead.
+ */
+ if (has_idle_core)
+ nr = nr > 1 ? INT_MAX : 3;
+ else if (nr == 1)
return -1;
}
}

for_each_cpu_wrap(cpu, cpus, target + 1) {
+ /*
+ * This might get the has_idle_cores hint cleared for a
+ * partial scan for idle cores but the hint is probably
+ * wrong anyway. What more important is that not clearing
+ * the hint may result in excessive partial scan for idle
+ * cores introducing innegligible overhead.
+ */
+ if (!--nr)
+ break;
+
if (has_idle_core) {
i = select_idle_core(p, cpu, cpus, &idle_cpu);
if ((unsigned int)i < nr_cpumask_bits)
return i;

} else {
- if (!--nr)
- return -1;
idle_cpu = __select_idle_cpu(cpu, p);
if ((unsigned int)idle_cpu < nr_cpumask_bits)
break;
--
2.37.3


2022-09-09 09:50:26

by Chen Yu

[permalink] [raw]
Subject: Re: [PATCH v5 2/5] sched/fair: Limited scan for idle cores when overloaded

On 2022-09-09 at 13:53:01 +0800, Abel Wu wrote:
> The has_idle_cores hint could be misleading due to some kind of
> rapid idling workloads, especially when LLC is overloaded. If this
> is the case, then there will be some full scan cost incurred that
> often fails to find a core.
>
> So limit the scan depth for idle cores in such case to make a
> speculative inspection at a reasonable cost.
>
> Benchmark
> =========
>
> Tests are done in a dual socket (2 x 24C/48T) machine modeled Intel
> Xeon(R) Platinum 8260, with SNC configuration:
>
> SNC on: 4 NUMA nodes each of which has 12C/24T
> SNC off: 2 NUMA nodes each of which has 24C/48T
>
> All of the benchmarks are done inside a normal cpu cgroup in a clean
> environment with cpu turbo disabled.
>
> Based on tip sched/core 0fba527e959d (v5.19.0) plus previous patches
> of this series.
>
> Results
> =======
>
> hackbench-process-pipes
> unpatched patched
> (SNC on)
> Amean 1 0.4470 ( 0.00%) 0.4557 ( -1.94%)
> Amean 4 0.5947 ( 0.00%) 0.6033 ( -1.46%)
> Amean 7 0.7450 ( 0.00%) 0.7627 ( -2.37%)
> Amean 12 1.1053 ( 0.00%) 1.0653 ( 3.62%)
> Amean 21 1.9420 ( 0.00%) 2.0283 * -4.45%*
> Amean 30 2.9267 ( 0.00%) 2.9670 ( -1.38%)
> Amean 48 4.7027 ( 0.00%) 4.6863 ( 0.35%)
> Amean 79 7.7097 ( 0.00%) 7.9443 * -3.04%*
> Amean 110 10.0680 ( 0.00%) 10.2393 ( -1.70%)
> Amean 141 12.5450 ( 0.00%) 12.6343 ( -0.71%)
> Amean 172 15.0297 ( 0.00%) 14.9957 ( 0.23%)
> Amean 203 16.8827 ( 0.00%) 16.9133 ( -0.18%)
> Amean 234 19.1183 ( 0.00%) 19.2433 ( -0.65%)
> Amean 265 20.9893 ( 0.00%) 21.6917 ( -3.35%)
> Amean 296 23.3920 ( 0.00%) 23.8743 ( -2.06%)
> (SNC off)
> Amean 1 0.2717 ( 0.00%) 0.3143 ( -15.71%)
> Amean 4 0.6257 ( 0.00%) 0.6070 ( 2.98%)
> Amean 7 0.7740 ( 0.00%) 0.7960 ( -2.84%)
> Amean 12 1.2410 ( 0.00%) 1.1947 ( 3.73%)
> Amean 21 2.6410 ( 0.00%) 2.4837 ( 5.96%)
> Amean 30 3.7620 ( 0.00%) 3.4577 ( 8.09%)
> Amean 48 6.7757 ( 0.00%) 5.5227 * 18.49%*
> Amean 79 8.8827 ( 0.00%) 9.2933 ( -4.62%)
> Amean 110 11.0583 ( 0.00%) 11.0443 ( 0.13%)
> Amean 141 13.3387 ( 0.00%) 13.1360 ( 1.52%)
> Amean 172 15.9583 ( 0.00%) 15.7770 ( 1.14%)
> Amean 203 17.8757 ( 0.00%) 17.9557 ( -0.45%)
> Amean 234 20.0543 ( 0.00%) 20.4373 * -1.91%*
> Amean 265 22.6643 ( 0.00%) 23.6053 * -4.15%*
> Amean 296 25.6677 ( 0.00%) 25.6803 ( -0.05%)
>
> Run to run variations is large in the 1 group test, so can be safely
> ignored.
>
> With limited scan for idle cores when the LLC is overloaded, a slight
> regression can be seen on the smaller LLC machine. It is because the
> cost of full scan on these LLCs is much smaller than the machines with
> bigger LLCs. So when comes to the SNC off case, the limited scan can
> provide obvious benefit especially when the frequency of such scan is
> relatively high, e.g. <48 groups.
>
> It's not a universal win, but considering the LLCs are getting bigger
> nowadays, we should be careful on the scan depth and limited scan on
> certain circumstance is indeed necessary.
>
> tbench4 Throughput
> unpatched patched
> (SNC on)
> Hmean 1 309.43 ( 0.00%) 301.54 * -2.55%*
> Hmean 2 613.92 ( 0.00%) 607.81 * -0.99%*
> Hmean 4 1227.84 ( 0.00%) 1210.64 * -1.40%*
> Hmean 8 2379.04 ( 0.00%) 2381.73 * 0.11%*
> Hmean 16 4634.66 ( 0.00%) 4601.21 * -0.72%*
> Hmean 32 7592.12 ( 0.00%) 7626.84 * 0.46%*
> Hmean 64 9241.11 ( 0.00%) 9251.51 * 0.11%*
> Hmean 128 17870.37 ( 0.00%) 20620.98 * 15.39%*
> Hmean 256 19370.92 ( 0.00%) 20406.51 * 5.35%*
> Hmean 384 19413.92 ( 0.00%) 20312.97 * 4.63%*
> (SNC off)
> Hmean 1 287.90 ( 0.00%) 292.37 * 1.55%*
> Hmean 2 575.52 ( 0.00%) 583.29 * 1.35%*
> Hmean 4 1137.94 ( 0.00%) 1155.83 * 1.57%*
> Hmean 8 2250.42 ( 0.00%) 2297.63 * 2.10%*
> Hmean 16 4363.41 ( 0.00%) 4562.44 * 4.56%*
> Hmean 32 7338.06 ( 0.00%) 7425.69 * 1.19%*
> Hmean 64 8914.66 ( 0.00%) 9021.77 * 1.20%*
> Hmean 128 19978.59 ( 0.00%) 20257.76 * 1.40%*
> Hmean 256 20057.49 ( 0.00%) 20043.54 * -0.07%*
> Hmean 384 19846.74 ( 0.00%) 19528.03 * -1.61%*
>
> Conclusion
> ==========
>
> Limited scan for idle cores when LLC is overloaded is almost neutral
> compared to full scan given smaller LLCs, but is an obvious win at
> the bigger ones which are future-oriented.
>
> Suggested-by: Mel Gorman <[email protected]>
> Signed-off-by: Abel Wu <[email protected]>
> ---
> kernel/sched/fair.c | 26 +++++++++++++++++++++-----
> 1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5af9bf246274..7abe188a1533 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6437,26 +6437,42 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> time = cpu_clock(this);
> }
>
> - if (sched_feat(SIS_UTIL) && !has_idle_core) {
> + if (sched_feat(SIS_UTIL)) {
[1/5] patch added !has_idle_core, but this patch removes the check.
I'm trying to figure out the reason. Is it to better illustrating the
benchmark difference?

thanks,
Chenyu

2022-09-09 10:25:25

by Abel Wu

[permalink] [raw]
Subject: Re: [PATCH v5 2/5] sched/fair: Limited scan for idle cores when overloaded

On 9/9/22 5:29 PM, Chen Yu wrote:
> On 2022-09-09 at 13:53:01 +0800, Abel Wu wrote:
>> The has_idle_cores hint could be misleading due to some kind of
>> rapid idling workloads, especially when LLC is overloaded. If this
>> is the case, then there will be some full scan cost incurred that
>> often fails to find a core.
>>
>> So limit the scan depth for idle cores in such case to make a
>> speculative inspection at a reasonable cost.
>>
>> Benchmark
>> =========
>>
>> Tests are done in a dual socket (2 x 24C/48T) machine modeled Intel
>> Xeon(R) Platinum 8260, with SNC configuration:
>>
>> SNC on: 4 NUMA nodes each of which has 12C/24T
>> SNC off: 2 NUMA nodes each of which has 24C/48T
>>
>> All of the benchmarks are done inside a normal cpu cgroup in a clean
>> environment with cpu turbo disabled.
>>
>> Based on tip sched/core 0fba527e959d (v5.19.0) plus previous patches
>> of this series.
>>
>> Results
>> =======
>>
>> hackbench-process-pipes
>> unpatched patched
>> (SNC on)
>> Amean 1 0.4470 ( 0.00%) 0.4557 ( -1.94%)
>> Amean 4 0.5947 ( 0.00%) 0.6033 ( -1.46%)
>> Amean 7 0.7450 ( 0.00%) 0.7627 ( -2.37%)
>> Amean 12 1.1053 ( 0.00%) 1.0653 ( 3.62%)
>> Amean 21 1.9420 ( 0.00%) 2.0283 * -4.45%*
>> Amean 30 2.9267 ( 0.00%) 2.9670 ( -1.38%)
>> Amean 48 4.7027 ( 0.00%) 4.6863 ( 0.35%)
>> Amean 79 7.7097 ( 0.00%) 7.9443 * -3.04%*
>> Amean 110 10.0680 ( 0.00%) 10.2393 ( -1.70%)
>> Amean 141 12.5450 ( 0.00%) 12.6343 ( -0.71%)
>> Amean 172 15.0297 ( 0.00%) 14.9957 ( 0.23%)
>> Amean 203 16.8827 ( 0.00%) 16.9133 ( -0.18%)
>> Amean 234 19.1183 ( 0.00%) 19.2433 ( -0.65%)
>> Amean 265 20.9893 ( 0.00%) 21.6917 ( -3.35%)
>> Amean 296 23.3920 ( 0.00%) 23.8743 ( -2.06%)
>> (SNC off)
>> Amean 1 0.2717 ( 0.00%) 0.3143 ( -15.71%)
>> Amean 4 0.6257 ( 0.00%) 0.6070 ( 2.98%)
>> Amean 7 0.7740 ( 0.00%) 0.7960 ( -2.84%)
>> Amean 12 1.2410 ( 0.00%) 1.1947 ( 3.73%)
>> Amean 21 2.6410 ( 0.00%) 2.4837 ( 5.96%)
>> Amean 30 3.7620 ( 0.00%) 3.4577 ( 8.09%)
>> Amean 48 6.7757 ( 0.00%) 5.5227 * 18.49%*
>> Amean 79 8.8827 ( 0.00%) 9.2933 ( -4.62%)
>> Amean 110 11.0583 ( 0.00%) 11.0443 ( 0.13%)
>> Amean 141 13.3387 ( 0.00%) 13.1360 ( 1.52%)
>> Amean 172 15.9583 ( 0.00%) 15.7770 ( 1.14%)
>> Amean 203 17.8757 ( 0.00%) 17.9557 ( -0.45%)
>> Amean 234 20.0543 ( 0.00%) 20.4373 * -1.91%*
>> Amean 265 22.6643 ( 0.00%) 23.6053 * -4.15%*
>> Amean 296 25.6677 ( 0.00%) 25.6803 ( -0.05%)
>>
>> Run to run variations is large in the 1 group test, so can be safely
>> ignored.
>>
>> With limited scan for idle cores when the LLC is overloaded, a slight
>> regression can be seen on the smaller LLC machine. It is because the
>> cost of full scan on these LLCs is much smaller than the machines with
>> bigger LLCs. So when comes to the SNC off case, the limited scan can
>> provide obvious benefit especially when the frequency of such scan is
>> relatively high, e.g. <48 groups.
>>
>> It's not a universal win, but considering the LLCs are getting bigger
>> nowadays, we should be careful on the scan depth and limited scan on
>> certain circumstance is indeed necessary.
>>
>> tbench4 Throughput
>> unpatched patched
>> (SNC on)
>> Hmean 1 309.43 ( 0.00%) 301.54 * -2.55%*
>> Hmean 2 613.92 ( 0.00%) 607.81 * -0.99%*
>> Hmean 4 1227.84 ( 0.00%) 1210.64 * -1.40%*
>> Hmean 8 2379.04 ( 0.00%) 2381.73 * 0.11%*
>> Hmean 16 4634.66 ( 0.00%) 4601.21 * -0.72%*
>> Hmean 32 7592.12 ( 0.00%) 7626.84 * 0.46%*
>> Hmean 64 9241.11 ( 0.00%) 9251.51 * 0.11%*
>> Hmean 128 17870.37 ( 0.00%) 20620.98 * 15.39%*
>> Hmean 256 19370.92 ( 0.00%) 20406.51 * 5.35%*
>> Hmean 384 19413.92 ( 0.00%) 20312.97 * 4.63%*
>> (SNC off)
>> Hmean 1 287.90 ( 0.00%) 292.37 * 1.55%*
>> Hmean 2 575.52 ( 0.00%) 583.29 * 1.35%*
>> Hmean 4 1137.94 ( 0.00%) 1155.83 * 1.57%*
>> Hmean 8 2250.42 ( 0.00%) 2297.63 * 2.10%*
>> Hmean 16 4363.41 ( 0.00%) 4562.44 * 4.56%*
>> Hmean 32 7338.06 ( 0.00%) 7425.69 * 1.19%*
>> Hmean 64 8914.66 ( 0.00%) 9021.77 * 1.20%*
>> Hmean 128 19978.59 ( 0.00%) 20257.76 * 1.40%*
>> Hmean 256 20057.49 ( 0.00%) 20043.54 * -0.07%*
>> Hmean 384 19846.74 ( 0.00%) 19528.03 * -1.61%*
>>
>> Conclusion
>> ==========
>>
>> Limited scan for idle cores when LLC is overloaded is almost neutral
>> compared to full scan given smaller LLCs, but is an obvious win at
>> the bigger ones which are future-oriented.
>>
>> Suggested-by: Mel Gorman <[email protected]>
>> Signed-off-by: Abel Wu <[email protected]>
>> ---
>> kernel/sched/fair.c | 26 +++++++++++++++++++++-----
>> 1 file changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 5af9bf246274..7abe188a1533 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6437,26 +6437,42 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>> time = cpu_clock(this);
>> }
>>
>> - if (sched_feat(SIS_UTIL) && !has_idle_core) {
>> + if (sched_feat(SIS_UTIL)) {
> [1/5] patch added !has_idle_core, but this patch removes the check.
> I'm trying to figure out the reason. Is it to better illustrating the
> benchmark difference?

This patch moves the check inside the SIS_UTIL block to limit the scan
depth when (!nr_idle_scan && has_idle_core), but if !nr_idle_scan which
means the LLC is not overloaded, a full scan will be triggered like
before. So in general, this patch turns nr_idle_scan to a boolean value
if has_idle_core.

Best Regards,
Abel

2022-09-14 22:44:58

by Tim Chen

[permalink] [raw]
Subject: Re: [PATCH v5 2/5] sched/fair: Limited scan for idle cores when overloaded

On Fri, 2022-09-09 at 13:53 +0800, Abel Wu wrote:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5af9bf246274..7abe188a1533 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6437,26 +6437,42 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> time = cpu_clock(this);
> }
>
> - if (sched_feat(SIS_UTIL) && !has_idle_core) {
> + if (sched_feat(SIS_UTIL)) {
> sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
> if (sd_share) {
> /* because !--nr is the condition to stop scan */
> nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
> - /* overloaded LLC is unlikely to have idle cpu/core */
> - if (nr == 1)
> +
> + /*
> + * Overloaded LLC is unlikely to have idle cpus.
> + * But if has_idle_core hint is true, a limited
> + * speculative scan might help without incurring
> + * much overhead.
> + */
> + if (has_idle_core)
> + nr = nr > 1 ? INT_MAX : 3;

The choice of nr is a very abrupt function of utilization when has_idle_core==true,
it is either feast or famine. Why is such choice better than a smoother
reduction of nr vs utilization? I agree that we want to scan more aggressively than
!has_idle_core, but it is not obvious why the above work better, versus something
like nr = nr*2+1.

Tim

> + else if (nr == 1)
> return -1;
> }
> }
>
> for_each_cpu_wrap(cpu, cpus, target + 1) {
> + /*
> + * This might get the has_idle_cores hint cleared for a
> + * partial scan for idle cores but the hint is probably
> + * wrong anyway. What more important is that not clearing
> + * the hint may result in excessive partial scan for idle
> + * cores introducing innegligible overhead.
> + */
> + if (!--nr)
> + break;
> +
> if (has_idle_core) {
> i = select_idle_core(p, cpu, cpus, &idle_cpu);
> if ((unsigned int)i < nr_cpumask_bits)
> return i;
>
> } else {
> - if (!--nr)
> - return -1;
> idle_cpu = __select_idle_cpu(cpu, p);
> if ((unsigned int)idle_cpu < nr_cpumask_bits)
> break;

2022-09-15 03:17:32

by Abel Wu

[permalink] [raw]
Subject: Re: [PATCH v5 2/5] sched/fair: Limited scan for idle cores when overloaded

Hi Tim, thanks for your reviewing!

On 9/15/22 6:25 AM, Tim Chen wrote:
> On Fri, 2022-09-09 at 13:53 +0800, Abel Wu wrote:
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 5af9bf246274..7abe188a1533 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6437,26 +6437,42 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>> time = cpu_clock(this);
>> }
>>
>> - if (sched_feat(SIS_UTIL) && !has_idle_core) {
>> + if (sched_feat(SIS_UTIL)) {
>> sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
>> if (sd_share) {
>> /* because !--nr is the condition to stop scan */
>> nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
>> - /* overloaded LLC is unlikely to have idle cpu/core */
>> - if (nr == 1)
>> +
>> + /*
>> + * Overloaded LLC is unlikely to have idle cpus.
>> + * But if has_idle_core hint is true, a limited
>> + * speculative scan might help without incurring
>> + * much overhead.
>> + */
>> + if (has_idle_core)
>> + nr = nr > 1 ? INT_MAX : 3;
>
> The choice of nr is a very abrupt function of utilization when has_idle_core==true,
> it is either feast or famine. Why is such choice better than a smoother
> reduction of nr vs utilization? I agree that we want to scan more aggressively than
> !has_idle_core, but it is not obvious why the above work better, versus something
> like nr = nr*2+1.
This has been discussed with Mel, and he suggested do simple things
first before scaling the depth.

https://lore.kernel.org/all/[email protected]/

Thanks and BR,
Abel