2019-06-27 01:36:15

by Subhra Mazumdar

Subject: [PATCH v3 3/7] sched: rotate the cpu search window for better spread

Rotate the cpu search window for better spread of threads. This will ensure
an idle cpu will quickly be found if one exists.

Signed-off-by: subhra mazumdar <[email protected]>
---
kernel/sched/fair.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b58f08f..c1ca88e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6188,7 +6188,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
u64 avg_cost, avg_idle;
u64 time, cost;
s64 delta;
- int cpu, limit, floor, nr = INT_MAX;
+ int cpu, limit, floor, target_tmp, nr = INT_MAX;

this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
if (!this_sd)
@@ -6219,9 +6219,15 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
}
}

+ if (per_cpu(next_cpu, target) != -1)
+ target_tmp = per_cpu(next_cpu, target);
+ else
+ target_tmp = target;
+
time = local_clock();

- for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
+ for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
+ per_cpu(next_cpu, target) = cpu;
if (!--nr)
return -1;
if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
--
2.9.3
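
For context, the per_cpu(next_cpu, target) accessor in this hunk relies on a
per-cpu variable introduced elsewhere in the series and not shown here. A
minimal sketch of what that declaration and its setup might look like,
assuming the -1 "no search yet" sentinel visible above:

	/* Sketch only -- the real declaration lives in another patch. */
	DEFINE_PER_CPU(int, next_cpu);

	/* hypothetical init, e.g. from sched_init() */
	for_each_possible_cpu(i)
		per_cpu(next_cpu, i) = -1;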


2019-06-28 11:55:21

by Srikar Dronamraju

Subject: Re: [PATCH v3 3/7] sched: rotate the cpu search window for better spread

* subhra mazumdar <[email protected]> [2019-06-26 18:29:15]:

> Rotate the cpu search window for better spread of threads. This will ensure
> an idle cpu will quickly be found if one exists.

While rotating the cpu search window is good, I am not sure this can find an
idle cpu quickly. The probability of finding an idle cpu should still remain
the same. No?

>
> Signed-off-by: subhra mazumdar <[email protected]>
> ---
> kernel/sched/fair.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> @@ -6219,9 +6219,15 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> }
> }
>
> + if (per_cpu(next_cpu, target) != -1)
> + target_tmp = per_cpu(next_cpu, target);
> + else
> + target_tmp = target;
> +
> time = local_clock();
>
> - for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
> + for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
> + per_cpu(next_cpu, target) = cpu;

Shouldn't this assignment be outside the for loop?
With the current code,
1. We keep reassigning multiple times.
2. The last assignment happens for an idle cpu, and sometimes the
assignment is for a non-idle cpu.
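
A sketch of the variant being suggested, assuming the available_idle_cpu()
check that follows in the unquoted part of the loop (illustrative only; the
loop-exhaustion case would need a guard, since 'cpu' ends up past the last
cpu in the span when nothing matches):

	for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
		if (!--nr)
			break;			/* search budget exhausted */
		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
			continue;
		if (available_idle_cpu(cpu))
			break;			/* found an idle cpu */
	}
	/* one store after the loop instead of one per visited cpu */
	per_cpu(next_cpu, target) = cpu;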

> if (!--nr)
> return -1;
> if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
> --
> 2.9.3
>

--
Thanks and Regards
Srikar Dronamraju

2019-06-28 18:37:35

by Parth Shah

Subject: Re: [PATCH v3 3/7] sched: rotate the cpu search window for better spread

Hi Subhra,

I ran your patch series on IBM POWER systems and this is what I have observed.

On 6/27/19 6:59 AM, subhra mazumdar wrote:
> Rotate the cpu search window for better spread of threads. This will ensure
> an idle cpu will quickly be found if one exists.
>
> Signed-off-by: subhra mazumdar <[email protected]>
> ---
> kernel/sched/fair.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b58f08f..c1ca88e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6188,7 +6188,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> u64 avg_cost, avg_idle;
> u64 time, cost;
> s64 delta;
> - int cpu, limit, floor, nr = INT_MAX;
> + int cpu, limit, floor, target_tmp, nr = INT_MAX;
>
> this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
> if (!this_sd)
> @@ -6219,9 +6219,15 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> }
> }
>
> + if (per_cpu(next_cpu, target) != -1)
> + target_tmp = per_cpu(next_cpu, target);
> + else
> + target_tmp = target;
> +
> time = local_clock();
>
> - for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
> + for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
> + per_cpu(next_cpu, target) = cpu;

This leads to a problem of cache hotness.
AFAIU, in most cases, `target = prev_cpu` of the task being woken up, and
selecting an idle CPU nearest to prev_cpu is favorable.
But since this doesn't keep track of the last idle cpu per task, it fails to find
the nearest possible idle CPU when the task is woken up after another task
has been scheduled.

Consider the below scenario:
============================
- System: 44 cores, 88 CPUs
- 44 CPU-intensive tasks, one pinned to some CPU in each core. This makes 'select_idle_core' return -1.
- Consider the timeline shown below:
- Task T1 runs for time 0-5 on CPU0
- Then task T2 runs for time 6-10 on CPU0
- T1 wakes at time 7 with target=0, setting
per_cpu(next_cpu, 0) = 4 (let's say CPUs 0-3 are busy at the time)
- So T1 runs for time 7-12 on CPU4.
- Meanwhile, T2 wakes at time 11 with target=0, but per_cpu(next_cpu, 0) is 4. So it starts
searching from CPU4 and ends up at CPU8 or so, even though CPU0 is free at that time.
- The search start drifts further from prev_cpu on each such iteration until it wraps around after 44 CPUs.


        ^T1           T1$ ^T2           T2$
CPU 0   |               | |               |
---------------------------------------------------------------- time ----->
        0               5 6              10

                            ^T1           T1$
CPU 4                       |               |
---------------------------------------------------------------- time ----->
                            7              12

                                              ^T2        T2$
CPU 8                                         |            |
---------------------------------------------------------------- time ----->
                                             11

Symbols: ^Tn: task Tn wakes up, Tn$: task Tn sleeps

The above example indicates that both tasks T1 and T2 suffer from cache hotness in further iterations.
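
To make the drift concrete, here is a small userspace simulation of the
per-cpu cursor behavior described above (the names, the 44-CPU wrap, and the
fixed count of busy CPUs are illustrative assumptions, not kernel code):

	#include <stdio.h>

	#define NCPUS 44

	static int next_cpu = -1;	/* stands in for per_cpu(next_cpu, 0) */

	/*
	 * Model one search with target = 0: start from the cursor if set,
	 * skip 'busy' occupied cpus, land on the next one, and advance the
	 * cursor to where the search ended.
	 */
	static int search(int target, int busy)
	{
		int start = (next_cpu != -1) ? next_cpu : target;
		int cpu = (start + busy) % NCPUS;

		next_cpu = cpu;
		return cpu;
	}

	int main(void)
	{
		/* Each wakeup with target=0 lands further from cpu 0:
		 * 4, 8, 12, 16, 20 ... even if cpu 0 is free again. */
		for (int i = 0; i < 5; i++)
			printf("wakeup %d lands on cpu %d\n", i, search(0, 4));
		return 0;
	}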


> if (!--nr)
> return -1;
> if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
>


Best
Parth

2019-06-28 22:22:14

by Subhra Mazumdar

Subject: Re: [PATCH v3 3/7] sched: rotate the cpu search window for better spread


On 6/28/19 11:36 AM, Parth Shah wrote:
> Hi Subhra,
>
> I ran your patch series on IBM POWER systems and this is what I have observed.
>
> On 6/27/19 6:59 AM, subhra mazumdar wrote:
>> Rotate the cpu search window for better spread of threads. This will ensure
>> an idle cpu will quickly be found if one exists.
>>
>> Signed-off-by: subhra mazumdar <[email protected]>
>> ---
>> kernel/sched/fair.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index b58f08f..c1ca88e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6188,7 +6188,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>> u64 avg_cost, avg_idle;
>> u64 time, cost;
>> s64 delta;
>> - int cpu, limit, floor, nr = INT_MAX;
>> + int cpu, limit, floor, target_tmp, nr = INT_MAX;
>>
>> this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
>> if (!this_sd)
>> @@ -6219,9 +6219,15 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>> }
>> }
>>
>> + if (per_cpu(next_cpu, target) != -1)
>> + target_tmp = per_cpu(next_cpu, target);
>> + else
>> + target_tmp = target;
>> +
>> time = local_clock();
>>
>> - for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
>> + for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
>> + per_cpu(next_cpu, target) = cpu;
> This leads to a problem of cache hotness.
> AFAIU, in most cases, `target = prev_cpu` of the task being woken up, and
> selecting an idle CPU nearest to prev_cpu is favorable.
> But since this doesn't keep track of the last idle cpu per task, it fails to find
> the nearest possible idle CPU when the task is woken up after another task
> has been scheduled.
>
I had tested hackbench on SPARC SMT8 (see numbers in the cover letter) and it
showed improvement with this. Firstly, it's a tradeoff between cache effects
and time spent searching for an idle CPU, and both the x86 and SPARC numbers
showed the tradeoff is worth it. Secondly, there is a lot of cache-affinity
logic at the beginning of select_idle_sibling. If select_idle_cpu is still
called, that means we are past that and want any idle cpu. I don't think
waking up close to the prev cpu is the intention of starting the search from
there; rather, it is to spread threads across all cpus so that no cpu gets
victimized, as there is no atomicity. Prev cpu just acts as a good seed to do
the spreading.
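
For reference, for_each_cpu_wrap(cpu, mask, start) visits every cpu in the
mask exactly once, starting at 'start' and wrapping past the end, so the seed
only decides where the walk begins, not what is visited. A simplified
userspace model of that seeding behavior (illustrative, not the kernel
implementation):

	#include <stdio.h>

	#define NR 8

	int main(void)
	{
		int start = 5;

		/* prints 5 6 7 0 1 2 3 4 -- different seeds send
		 * different wakers to different parts of the LLC */
		for (int i = 0; i < NR; i++)
			printf("%d ", (start + i) % NR);
		printf("\n");
		return 0;
	}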

Thanks,
Subhra

2019-06-28 22:41:20

by Subhra Mazumdar

Subject: Re: [PATCH v3 3/7] sched: rotate the cpu search window for better spread


On 6/28/19 4:54 AM, Srikar Dronamraju wrote:
> * subhra mazumdar <[email protected]> [2019-06-26 18:29:15]:
>
>> Rotate the cpu search window for better spread of threads. This will ensure
>> an idle cpu will quickly be found if one exists.
> While rotating the cpu search window is good, I am not sure this can find an
> idle cpu quickly. The probability of finding an idle cpu should still remain
> the same. No?
>
>> Signed-off-by: subhra mazumdar <[email protected]>
>> ---
>> kernel/sched/fair.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> @@ -6219,9 +6219,15 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>> }
>> }
>>
>> + if (per_cpu(next_cpu, target) != -1)
>> + target_tmp = per_cpu(next_cpu, target);
>> + else
>> + target_tmp = target;
>> +
>> time = local_clock();
>>
>> - for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
>> + for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
>> + per_cpu(next_cpu, target) = cpu;
> Shouldn't this assignment be outside the for loop?
> With the current code,
> 1. We keep reassigning multiple times.
> 2. The last assignment happens for an idle cpu, and sometimes the
> assignment is for a non-idle cpu.
We want the last assignment irrespective of whether it was an idle cpu or
not, since in both cases we want to track the boundary of the search.
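
An annotated reading of the hunk, with comments restating that rationale (the
comments are an interpretation, not part of the submission):

	for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
		/*
		 * Record every cpu as it is visited. However the function
		 * exits -- on an idle cpu or via the !--nr cutoff -- the
		 * per-cpu cursor holds the last cpu examined, so the next
		 * search seeded from 'target' resumes at that boundary
		 * and the window keeps rotating.
		 */
		per_cpu(next_cpu, target) = cpu;
		if (!--nr)
			return -1;
		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
			continue;
		/* ... idle check and return follow in the full function */
	}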

Thanks,
Subhra