2020-03-12 16:55:43

by Vincent Guittot

Subject: [PATCH] sched/fair: improve spreading of utilization

During load balancing, a group with spare capacity will try to pull some
utilization from an overloaded group. In such a case, the load balancer
looks for the runqueue with the highest utilization. Nevertheless, it
should also ensure that there are some pending tasks to pull; otherwise
the load balance will fail to pull a task and the spreading of the load
will be delayed.

This situation is quite transient, but it's possible to highlight the
effect with a short run of a sysbench test, so that the time to spread
the tasks impacts the global result significantly.

Below are the average results for 15 iterations on an arm64 octo core:
sysbench --test=cpu --num-threads=8 --max-requests=1000 run

                           tip/sched/core  +patchset
total time:                172ms           158ms
per-request statistics:
         avg:              1.337ms         1.244ms
         max:              21.191ms        10.753ms

The average max doesn't fully reflect the wide spread of the values,
which range from 1.350ms to more than 41ms for tip/sched/core and from
1.350ms to 21ms with the patch.

Other factors, like waiting for an idle load balance or cache hotness,
can delay the spreading of the tasks, which explains why we can still
see up to 21ms with the patch.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3c8a379c357e..97a0307312d9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		case migrate_util:
 			util = cpu_util(cpu_of(rq));
 
+			/*
+			 * Don't try to pull utilization from a CPU with one
+			 * running task. Whatever its utilization, we will fail
+			 * to detach the task.
+			 */
+			if (nr_running <= 1)
+				continue;
+
 			if (busiest_util < util) {
 				busiest_util = util;
 				busiest = rq;
--
2.17.1
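
(For context: "nr_running" in the hunk above is computed earlier in
find_busiest_queue()'s per-CPU loop. A minimal sketch of the v5.6-era
loop, paraphrased rather than quoted exactly:)

	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
		struct rq *rq = cpu_rq(i);
		unsigned long util;
		unsigned int nr_running;

		/*
		 * Only CFS tasks count here: h_nr_running is the
		 * number of queued CFS tasks, including the one
		 * currently running.
		 */
		nr_running = rq->cfs.h_nr_running;

		switch (env->migration_type) {
		case migrate_util:
			util = cpu_util(cpu_of(rq));

			/*
			 * The new check: an rq with a single running
			 * task has nothing that detach_tasks() could
			 * actually pull.
			 */
			if (nr_running <= 1)
				continue;

			if (busiest_util < util) {
				busiest_util = util;
				busiest = rq;
			}
			break;
		/* ... migrate_load, migrate_task, migrate_misfit ... */
		}
	}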


2020-03-13 10:27:40

by Peter Zijlstra

Subject: Re: [PATCH] sched/fair: improve spreading of utilization

On Thu, Mar 12, 2020 at 05:54:29PM +0100, Vincent Guittot wrote:
> During load balancing, a group with spare capacity will try to pull some
> utilization from an overloaded group. In such a case, the load balancer
> looks for the runqueue with the highest utilization. Nevertheless, it
> should also ensure that there are some pending tasks to pull; otherwise
> the load balance will fail to pull a task and the spreading of the load
> will be delayed.
>
> This situation is quite transient, but it's possible to highlight the
> effect with a short run of a sysbench test, so that the time to spread
> the tasks impacts the global result significantly.
>
> Below are the average results for 15 iterations on an arm64 octo core:
> sysbench --test=cpu --num-threads=8 --max-requests=1000 run
>
>                            tip/sched/core  +patchset
> total time:                172ms           158ms
> per-request statistics:
>          avg:              1.337ms         1.244ms
>          max:              21.191ms        10.753ms
>
> The average max doesn't fully reflect the wide spread of the values,
> which range from 1.350ms to more than 41ms for tip/sched/core and from
> 1.350ms to 21ms with the patch.
>
> Other factors, like waiting for an idle load balance or cache hotness,
> can delay the spreading of the tasks, which explains why we can still
> see up to 21ms with the patch.

Thanks!

2020-03-13 11:02:48

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: improve spreading of utilization


On Thu, Mar 12 2020, Vincent Guittot wrote:
> kernel/sched/fair.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3c8a379c357e..97a0307312d9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> case migrate_util:
> util = cpu_util(cpu_of(rq));
>
> + /*
> + * Don't try to pull utilization from a CPU with one
> + * running task. Whatever its utilization, we will fail to
> + * detach the task.
> + */
> + if (nr_running <= 1)
> + continue;
> +

Doesn't this break misfit? If the busiest group is group_misfit_task, it
is totally valid for the runqueues to have a single running task -
that's the CPU-bound task we want to upmigrate.

If the busiest rq has only a single running task, we'll skip the
detach_tasks() block and go straight to the active balance bits.
Misfit balancing totally relies on this, and IMO ASYM_PACKING does
too. Looking at voluntary_active_balance(), it seems your change also
goes against the one added by
1aaf90a4b88a ("sched: Move CFS tasks to CPUs with higher capacity")
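
(For reference, a rough sketch of voluntary_active_balance() as of
v5.6-era code, paraphrased rather than quoted exactly; the middle block
is the check added by 1aaf90a4b88a:)

	static inline bool voluntary_active_balance(struct lb_env *env)
	{
		struct sched_domain *sd = env->sd;

		if (asym_active_balance(env))
			return 1;

		/*
		 * The dst_cpu is idle and the src CPU has only 1 CFS
		 * task. It's worth migrating the task if the src_cpu's
		 * capacity is reduced because of other sched_class or
		 * IRQs, while more capacity stays available on dst_cpu.
		 */
		if ((env->idle != CPU_NOT_IDLE) &&
		    (env->src_rq->cfs.h_nr_running == 1)) {
			if ((check_cpu_capacity(env->src_rq, sd)) &&
			    (capacity_of(env->src_cpu) * sd->imbalance_pct <
			     capacity_of(env->dst_cpu) * 100))
				return 1;
		}

		if (env->migration_type == migrate_misfit)
			return 1;

		return 0;
	}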

The bandaid here would be to gate this 'continue' with checks against the
busiest_group_type, but that's only a loose link wrt
voluntary_active_balance().

> if (busiest_util < util) {
> busiest_util = util;
> busiest = rq;

2020-03-13 11:25:33

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: improve spreading of utilization

On Fri, 13 Mar 2020 at 12:00, Valentin Schneider
<[email protected]> wrote:
>
>
> On Thu, Mar 12 2020, Vincent Guittot wrote:
> > kernel/sched/fair.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3c8a379c357e..97a0307312d9 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> > case migrate_util:
> > util = cpu_util(cpu_of(rq));
> >
> > + /*
> > + * Don't try to pull utilization from a CPU with one
> > + * running task. Whatever its utilization, we will fail to
> > + * detach the task.
> > + */
> > + if (nr_running <= 1)
> > + continue;
> > +
>
> Doesn't this break misfit? If the busiest group is group_misfit_task, it
> is totally valid for the runqueues to have a single running task -
> that's the CPU-bound task we want to upmigrate.

group_misfit_task has its dedicated migrate_misfit case
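
(For reference, the dedicated case in find_busiest_queue() looks
roughly like this in v5.6-era code, paraphrased:)

	case migrate_misfit:
		/*
		 * For ASYM_CPUCAPACITY domains with misfit tasks we
		 * simply seek the "biggest" misfit task.
		 */
		if (rq->misfit_task_load > busiest_load) {
			busiest_load = rq->misfit_task_load;
			busiest = rq;
		}
		break;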

>
> If the busiest rq has only a single running task, we'll skip the
> detach_tasks() block and go straight to the active balance bits.
> Misfit balancing totally relies on this, and IMO ASYM_PACKING does
> too. Looking at voluntary_active_balance(), it seems your change also
> goes against the one added by
> 1aaf90a4b88a ("sched: Move CFS tasks to CPUs with higher capacity")
>
> The bandaid here would be to gate this 'continue' with checks against the
> busiest_group_type, but that's only a loose link wrt
> voluntary_active_balance().
>
> > if (busiest_util < util) {
> > busiest_util = util;
> > busiest = rq;

2020-03-13 11:29:46

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: improve spreading of utilization


On Fri, Mar 13 2020, Vincent Guittot wrote:
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 3c8a379c357e..97a0307312d9 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>> > case migrate_util:
>> > util = cpu_util(cpu_of(rq));
>> >
>> > + /*
>> > + * Don't try to pull utilization from a CPU with one
>> > + * running task. Whatever its utilization, we will fail to
>> > + * detach the task.
>> > + */
>> > + if (nr_running <= 1)
>> > + continue;
>> > +
>>
>> Doesn't this break misfit? If the busiest group is group_misfit_task, it
>> is totally valid for the runqueues to have a single running task -
>> that's the CPU-bound task we want to upmigrate.
>
> group_misfit_task has its dedicated migrate_misfit case
>

Doh, yes, sorry. I think my rambling on ASYM_PACKING / reduced capacity
migration is still relevant, though.

>>
>> If the busiest rq has only a single running task, we'll skip the
>> detach_tasks() block and go straight to the active balance bits.
>> Misfit balancing totally relies on this, and IMO ASYM_PACKING does
>> too. Looking at voluntary_active_balance(), it seems your change also
>> goes against the one added by
>> 1aaf90a4b88a ("sched: Move CFS tasks to CPUs with higher capacity")
>>
>> The bandaid here would be to gate this 'continue' with checks against the
>> busiest_group_type, but that's only a loose link wrt
>> voluntary_active_balance().
>>
>> > if (busiest_util < util) {
>> > busiest_util = util;
>> > busiest = rq;

2020-03-13 12:43:55

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: improve spreading of utilization


On Fri, Mar 13 2020, Valentin Schneider wrote:
> On Fri, Mar 13 2020, Vincent Guittot wrote:
>>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> > index 3c8a379c357e..97a0307312d9 100644
>>> > --- a/kernel/sched/fair.c
>>> > +++ b/kernel/sched/fair.c
>>> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>>> > case migrate_util:
>>> > util = cpu_util(cpu_of(rq));
>>> >
>>> > + /*
>>> > + * Don't try to pull utilization from a CPU with one
>>> > + * running task. Whatever its utilization, we will fail to
>>> > + * detach the task.
>>> > + */
>>> > + if (nr_running <= 1)
>>> > + continue;
>>> > +
>>>
>>> Doesn't this break misfit? If the busiest group is group_misfit_task, it
>>> is totally valid for the runqueues to have a single running task -
>>> that's the CPU-bound task we want to upmigrate.
>>
>> group_misfit_task has its dedicated migrate_misfit case
>>
>
> Doh, yes, sorry. I think my rambling on ASYM_PACKING / reduced capacity
> migration is still relevant, though.
>

And with more coffee that's another Doh, ASYM_PACKING would end up as
migrate_task. So this only affects the reduced capacity migration, which
might be hard to notice in benchmarks.
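
(I.e., roughly, in v5.6-era calculate_imbalance(), paraphrased:)

	if (busiest->group_type == group_asym_packing) {
		/*
		 * In case of asym packing, we will try to migrate all
		 * load to the preferred CPU.
		 */
		env->migration_type = migrate_task;
		env->imbalance = busiest->sum_h_nr_running;
		return;
	}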

2020-03-13 12:56:37

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: improve spreading of utilization

On Fri, 13 Mar 2020 at 13:42, Valentin Schneider
<[email protected]> wrote:
>
>
> On Fri, Mar 13 2020, Valentin Schneider wrote:
> > On Fri, Mar 13 2020, Vincent Guittot wrote:
> >>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> > index 3c8a379c357e..97a0307312d9 100644
> >>> > --- a/kernel/sched/fair.c
> >>> > +++ b/kernel/sched/fair.c
> >>> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >>> > case migrate_util:
> >>> > util = cpu_util(cpu_of(rq));
> >>> >
> >>> > + /*
> >>> > + * Don't try to pull utilization from a CPU with one
> >>> > + * running task. Whatever its utilization, we will fail to
> >>> > + * detach the task.
> >>> > + */
> >>> > + if (nr_running <= 1)
> >>> > + continue;
> >>> > +
> >>>
> >>> Doesn't this break misfit? If the busiest group is group_misfit_task, it
> >>> is totally valid for the runqueues to have a single running task -
> >>> that's the CPU-bound task we want to upmigrate.
> >>
> >> group_misfit_task has its dedicated migrate_misfit case
> >>
> >
> > Doh, yes, sorry. I think my rambling on ASYM_PACKING / reduced capacity
> > migration is still relevant, though.
> >
>
> And with more coffee that's another Doh, ASYM_PACKING would end up as
> migrate_task. So this only affects the reduced capacity migration, which

Yes, ASYM_PACKING uses migrate_task, and the reduced-capacity case
would use it too, so it would not be impacted by this patch. I say
"would" because the original rework of load balance got rid of this
case. I'm going to prepare a separate fix for it.

> might be hard to notice in benchmarks.

2020-03-13 14:27:37

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: improve spreading of utilization

On Fri, 13 Mar 2020 at 13:55, Vincent Guittot
<[email protected]> wrote:
>
> On Fri, 13 Mar 2020 at 13:42, Valentin Schneider
> <[email protected]> wrote:
> >
> >
> > On Fri, Mar 13 2020, Valentin Schneider wrote:
> > > On Fri, Mar 13 2020, Vincent Guittot wrote:
> > >>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >>> > index 3c8a379c357e..97a0307312d9 100644
> > >>> > --- a/kernel/sched/fair.c
> > >>> > +++ b/kernel/sched/fair.c
> > >>> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> > >>> > case migrate_util:
> > >>> > util = cpu_util(cpu_of(rq));
> > >>> >
> > >>> > + /*
> > >>> > + * Don't try to pull utilization from a CPU with one
> > >>> > + * running task. Whatever its utilization, we will fail to
> > >>> > + * detach the task.
> > >>> > + */
> > >>> > + if (nr_running <= 1)
> > >>> > + continue;
> > >>> > +
> > >>>
> > >>> Doesn't this break misfit? If the busiest group is group_misfit_task, it
> > >>> is totally valid for the runqueues to have a single running task -
> > >>> that's the CPU-bound task we want to upmigrate.
> > >>
> > >> group_misfit_task has its dedicated migrate_misfit case
> > >>
> > >
> > > Doh, yes, sorry. I think my rambling on ASYM_PACKING / reduced capacity
> > > migration is still relevant, though.
> > >
> >
> > And with more coffee that's another Doh, ASYM_PACKING would end up as
> > migrate_task. So this only affects the reduced capacity migration, which
>
> Yes, ASYM_PACKING uses migrate_task, and the reduced-capacity case
> would use it too, so it would not be impacted by this patch. I say
> "would" because the original rework of load balance got rid of this
> case. I'm going to prepare a separate fix for it.

After more thought, I think that we are safe for reduced capacity too,
because this is handled in the migrate_load case. In my previous
reply, I was thinking of the case where the rq is not overloaded but
the CPU has reduced capacity, which is not handled. But in such a
case, we don't have to force the migration of the task, because there
is still enough capacity; otherwise the rq would be overloaded and we
would be back to the case that is already handled.
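
(The migrate_load case in find_busiest_queue() looks roughly like this
in v5.6-era code, paraphrased; note the check_cpu_capacity() test,
which keeps a single-task rq eligible when its capacity is reduced:)

	case migrate_load:
		/*
		 * When comparing with load imbalance, use cpu_load()
		 * which is not scaled with the CPU capacity.
		 */
		load = cpu_load(rq);

		if (nr_running == 1 && load > env->imbalance &&
		    !check_cpu_capacity(rq, env->sd))
			break;

		/*
		 * For the load comparisons with the other CPUs,
		 * consider the cpu_load() scaled with the CPU
		 * capacity, so that the load can be moved away from
		 * the CPU that is potentially running at a lower
		 * capacity.
		 */
		if (load * busiest_capacity > busiest_load * capacity) {
			busiest_load = load;
			busiest_capacity = capacity;
			busiest = rq;
		}
		break;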

>
> > might be hard to notice in benchmarks.

2020-03-13 15:49:07

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: improve spreading of utilization


On Fri, Mar 13 2020, Vincent Guittot wrote:
>> > And with more coffee that's another Doh, ASYM_PACKING would end up as
>> > migrate_task. So this only affects the reduced capacity migration, which
>>
>> Yes, ASYM_PACKING uses migrate_task, and the reduced-capacity case
>> would use it too, so it would not be impacted by this patch. I say
>> "would" because the original rework of load balance got rid of this
>> case. I'm going to prepare a separate fix for it.
>
> After more thought, I think that we are safe for reduced capacity too,
> because this is handled in the migrate_load case. In my previous
> reply, I was thinking of the case where the rq is not overloaded but
> the CPU has reduced capacity, which is not handled. But in such a
> case, we don't have to force the migration of the task, because there
> is still enough capacity; otherwise the rq would be overloaded and we
> would be back to the case that is already handled.
>

Good point on the capacity reduction vs group_is_overloaded.

That said, can't we also reach this with migrate_task? Say the local
group is entirely idle, and the busiest group has a few non-idle CPUs
but they all have at most 1 running task. AFAICT we would still go to
calculate_imbalance(), and try to balance out the number of idle CPUs.

If the migration_type is migrate_util, that can't happen because of this
change. Since we have this progressive balancing strategy (tasks -> util
-> load), it's a bit odd to have this "gap" in the middle where we get
one less possibility to trigger active balance, don't you think? That
is, providing I didn't say nonsense again :)

It's not a super big deal, but I think it's nice if we can maintain a
consistent / gradual migration policy.

>>
>> > might be hard to notice in benchmarks.

2020-03-13 16:11:06

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: improve spreading of utilization

On Fri, 13 Mar 2020 at 16:47, Valentin Schneider
<[email protected]> wrote:
>
>
> On Fri, Mar 13 2020, Vincent Guittot wrote:
> >> > And with more coffee that's another Doh, ASYM_PACKING would end up as
> >> > migrate_task. So this only affects the reduced capacity migration, which
> >>
> >> Yes, ASYM_PACKING uses migrate_task, and the reduced-capacity case
> >> would use it too, so it would not be impacted by this patch. I say
> >> "would" because the original rework of load balance got rid of this
> >> case. I'm going to prepare a separate fix for it.
> >
> > After more thought, I think that we are safe for reduced capacity too,
> > because this is handled in the migrate_load case. In my previous
> > reply, I was thinking of the case where the rq is not overloaded but
> > the CPU has reduced capacity, which is not handled. But in such a
> > case, we don't have to force the migration of the task, because there
> > is still enough capacity; otherwise the rq would be overloaded and we
> > would be back to the case that is already handled.
> >
>
> Good point on the capacity reduction vs group_is_overloaded.
>
> That said, can't we also reach this with migrate_task? Say the local

The test has only been added for migrate_util so migrate_task is not impacted

> group is entirely idle, and the busiest group has a few non-idle CPUs
> but they all have at most 1 running task. AFAICT we would still go to
> calculate_imbalance(), and try to balance out the number of idle CPUs.

Such a case is handled by migrate_task, when we try to even out the
number of tasks between groups.

>
> If the migration_type is migrate_util, that can't happen because of this
> change. Since we have this progressive balancing strategy (tasks -> util
> -> load), it's a bit odd to have this "gap" in the middle where we get
> one less possibility to trigger active balance, don't you think? That
> is, providing I didn't say nonsense again :)

Right now, I can't think of a use case that could trigger such a
situation, because we use migrate_util when the source is overloaded,
which means that there is at least one waiting task, and we favor
pulling that waiting task first.
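
(Paraphrasing the v5.6-era calculate_imbalance() branch that picks
migrate_util; by this point the misfit/asym/imbalanced group types have
already returned, so "> group_fully_busy" means overloaded:)

	if (local->group_type == group_has_spare) {
		if (busiest->group_type > group_fully_busy) {
			/*
			 * If busiest is overloaded, try to fill the
			 * spare capacity of the local group.
			 */
			env->migration_type = migrate_util;
			env->imbalance = max(local->group_capacity,
					     local->group_util) -
					 local->group_util;
			return;
		}
		/* ... otherwise fall back to migrate_task ... */
	}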

>
> It's not a super big deal, but I think it's nice if we can maintain a
> consistent / gradual migration policy.
>
> >>
> >> > might be hard to notice in benchmarks.

2020-03-13 16:59:05

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: improve spreading of utilization


On Fri, Mar 13 2020, Vincent Guittot wrote:

>> Good point on the capacity reduction vs group_is_overloaded.
>>
>> That said, can't we also reach this with migrate_task? Say the local
>
> The test has only been added for migrate_util so migrate_task is not impacted
>
>> group is entirely idle, and the busiest group has a few non-idle CPUs
>> but they all have at most 1 running task. AFAICT we would still go to
>> calculate_imbalance(), and try to balance out the number of idle CPUs.
>
> Such a case is handled by migrate_task, when we try to even out the
> number of tasks between groups.
>
>>
>> If the migration_type is migrate_util, that can't happen because of this
>> change. Since we have this progressive balancing strategy (tasks -> util
>> -> load), it's a bit odd to have this "gap" in the middle where we get
>> one less possibility to trigger active balance, don't you think? That
>> is, providing I didn't say nonsense again :)
>
> Right now, I can't think of a use case that could trigger such a
> situation, because we use migrate_util when the source is overloaded,
> which means that there is at least one waiting task, and we favor
> pulling that waiting task first.
>

Right, what I was trying to say is that AIUI migration_type ==
migrate_task with <= 1 running task per CPU in the busiest group can
*currently* lead to a balance attempt, and thus a potential active
balance.

Consider a local group of 4 idle CPUs, and a busiest group of 3 busy
CPUs plus 1 idle CPU, each busy CPU having only 1 running task. That
busiest group would be group_has_spare, so we would compute an
imbalance of (4-1) / 2 == 1 task to move. We'll proceed with the load
balance, but we'll only move things if we go through an active_balance.
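
(That's this tail of the group_has_spare handling in v5.6-era
calculate_imbalance(), paraphrased:)

	/*
	 * If there is no overload, we just want to even the number of
	 * idle CPUs.
	 */
	env->migration_type = migrate_task;
	env->imbalance = max_t(long, 0,
			       (local->idle_cpus - busiest->idle_cpus) >> 1);

With local->idle_cpus == 4 and busiest->idle_cpus == 1, that gives
(4 - 1) >> 1 == 1.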

My point is that if we prevent this for migrate_util, it would make
sense to prevent it for migrate_task, but it's not straightforward since
we have things like ASYM_PACKING.

>>
>> It's not a super big deal, but I think it's nice if we can maintain a
>> consistent / gradual migration policy.
>>
>> >>
>> >> > might be hard to notice in benchmarks.

2020-03-13 17:14:08

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: improve spreading of utilization

On Fri, 13 Mar 2020 at 17:57, Valentin Schneider
<[email protected]> wrote:
>
>
> On Fri, Mar 13 2020, Vincent Guittot wrote:
>
> >> Good point on the capacity reduction vs group_is_overloaded.
> >>
> >> That said, can't we also reach this with migrate_task? Say the local
> >
> > The test has only been added for migrate_util so migrate_task is not impacted
> >
> >> group is entirely idle, and the busiest group has a few non-idle CPUs
> >> but they all have at most 1 running task. AFAICT we would still go to
> >> calculate_imbalance(), and try to balance out the number of idle CPUs.
> >
> > Such a case is handled by migrate_task, when we try to even out the
> > number of tasks between groups.
> >
> >>
> >> If the migration_type is migrate_util, that can't happen because of this
> >> change. Since we have this progressive balancing strategy (tasks -> util
> >> -> load), it's a bit odd to have this "gap" in the middle where we get
> >> one less possibility to trigger active balance, don't you think? That
> >> is, providing I didn't say nonsense again :)
> >
> > Right now, I can't think of a use case that could trigger such a
> > situation, because we use migrate_util when the source is overloaded,
> > which means that there is at least one waiting task, and we favor
> > pulling that waiting task first.
> >
>
> Right, what I was trying to say is that AIUI migration_type ==
> migrate_task with <= 1 running task per CPU in the busiest group can
> *currently* lead to a balance attempt, and thus a potential active
> balance.
>
> Consider a local group of 4 idle CPUs, and a busiest group of 3 busy
> CPUs plus 1 idle CPU, each busy CPU having only 1 running task. That
> busiest group would be group_has_spare, so we would compute an
> imbalance of (4-1) / 2 == 1 task to move. We'll proceed with the load
> balance, but we'll only move things if we go through an active_balance.

Yes, because we want to even out the number of tasks per group as much
as possible.

>
> My point is that if we prevent this for migrate_util, it would make
> sense to prevent it for migrate_task, but it's not straightforward since

Hmm, but we don't want to prevent this active balance for
migrate_task, because of cases like the one you mentioned above.

We might consider falling back to a CPU with only 1 running task for
migrate_util when there is no other CPU with more than 1 task. But
this would complicate the code, and I don't think the situation can
arise anyway, because migrate_util is used to pull some utilization
from an overloaded group, which must have a CPU with a waiting task to
be overloaded.

> we have things like ASYM_PACKING.
>
> >>
> >> It's not a super big deal, but I think it's nice if we can maintain a
> >> consistent / gradual migration policy.
> >>
> >> >>
> >> >> > might be hard to notice in benchmarks.

2020-03-13 17:34:43

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: improve spreading of utilization


On Fri, Mar 13 2020, Vincent Guittot wrote:
>> My point is that if we prevent this for migrate_util, it would make
>> sense to prevent it for migrate_task, but it's not straightforward since
>
> Hmm, but we don't want to prevent this active balance for
> migrate_task, because of cases like the one you mentioned above.
>
> We might consider falling back to a CPU with only 1 running task for
> migrate_util when there is no other CPU with more than 1 task. But
> this would complicate the code, and I don't think the situation can
> arise anyway, because migrate_util is used to pull some utilization
> from an overloaded group, which must have a CPU with a waiting task to
> be overloaded.
>

OK, so what we may want in the future is a tighter link between
find_busiest_queue() and voluntary_active_balance(). I don't see a neat
way of doing this right now; I'll ponder over it.

Thanks for keeping up with my rambling.

Subject: [tip: sched/core] sched/fair: Improve spreading of utilization

The following commit has been merged into the sched/core branch of tip:

Commit-ID: c32b4308295aaaaedd5beae56cb42e205ae63e58
Gitweb: https://git.kernel.org/tip/c32b4308295aaaaedd5beae56cb42e205ae63e58
Author: Vincent Guittot <[email protected]>
AuthorDate: Thu, 12 Mar 2020 17:54:29 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 20 Mar 2020 13:06:20 +01:00

sched/fair: Improve spreading of utilization

During load balancing, a group with spare capacity will try to pull some
utilization from an overloaded group. In such a case, the load balancer
looks for the runqueue with the highest utilization. Nevertheless, it
should also ensure that there are some pending tasks to pull; otherwise
the load balance will fail to pull a task and the spreading of the load
will be delayed.

This situation is quite transient, but it's possible to highlight the
effect with a short run of a sysbench test, so that the time to spread
the tasks impacts the global result significantly.

Below are the average results for 15 iterations on an arm64 octo core:
sysbench --test=cpu --num-threads=8 --max-requests=1000 run

                           tip/sched/core  +patchset
total time:                172ms           158ms
per-request statistics:
         avg:              1.337ms         1.244ms
         max:              21.191ms        10.753ms

The average max doesn't fully reflect the wide spread of the values,
which range from 1.350ms to more than 41ms for tip/sched/core and from
1.350ms to 21ms with the patch.

Other factors, like waiting for an idle load balance or cache hotness,
can delay the spreading of the tasks, which explains why we can still
see up to 21ms with the patch.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7aaae2..783356f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9313,6 +9313,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		case migrate_util:
 			util = cpu_util(cpu_of(rq));
 
+			/*
+			 * Don't try to pull utilization from a CPU with one
+			 * running task. Whatever its utilization, we will fail
+			 * to detach the task.
+			 */
+			if (nr_running <= 1)
+				continue;
+
 			if (busiest_util < util) {
 				busiest_util = util;
 				busiest = rq;