2020-06-24 15:45:21

by Vincent Guittot

Subject: [PATCH] sched/cfs: change initial value of runnable_avg

Some performance regressions on the reaim benchmark have been raised with
commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group").

The problem comes from the init value of runnable_avg, which is initialized
to the max value. This can be a problem if the newly forked task turns out
to be a short task, because the group of CPUs is wrongly classified as
overloaded and tasks are pulled less aggressively.

Set the initial value of runnable_avg equal to util_avg to reflect that there
has been no waiting time so far.
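
To illustrate where this bites, here is a simplified sketch of the
group_is_overloaded() check that commit 070f5e860ee2 extended (recalled
from memory; a sketch, not the exact mainline code):

	struct sg_lb_stats {
		unsigned long group_capacity;	/* CPU capacity of the group */
		unsigned long group_util;	/* sum of util_avg */
		unsigned long group_runnable;	/* sum of runnable_avg */
		unsigned int  sum_nr_running;
		unsigned int  group_weight;	/* number of CPUs */
	};

	static inline bool
	group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
	{
		if (sgs->sum_nr_running <= sgs->group_weight)
			return false;

		if ((sgs->group_capacity * 100) <
				(sgs->group_util * imbalance_pct))
			return true;

		/*
		 * group_runnable sums the runnable_avg of the group's
		 * entities. A newly forked task whose runnable_avg starts
		 * at the max value inflates this sum and can trip the
		 * check below, marking the group overloaded even if the
		 * task turns out to be short.
		 */
		if ((sgs->group_capacity * imbalance_pct) <
				(sgs->group_runnable * 100))
			return true;

		return false;
	}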

Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0424a0af5f87..45e467bf42fc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -806,7 +806,7 @@ void post_init_entity_util_avg(struct task_struct *p)
 		}
 	}
 
-	sa->runnable_avg = cpu_scale;
+	sa->runnable_avg = sa->util_avg;
 
 	if (p->sched_class != &fair_sched_class) {
 		/*
--
2.17.1


2020-06-24 16:35:02

by Valentin Schneider

Subject: Re: [PATCH] sched/cfs: change initial value of runnable_avg


On 24/06/20 16:44, Vincent Guittot wrote:
> Some performance regressions on the reaim benchmark have been raised with
> commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group").
>
> The problem comes from the init value of runnable_avg, which is initialized
> to the max value. This can be a problem if the newly forked task turns out
> to be a short task, because the group of CPUs is wrongly classified as
> overloaded and tasks are pulled less aggressively.
>
> Set the initial value of runnable_avg equal to util_avg to reflect that there
> has been no waiting time so far.
>
> Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0424a0af5f87..45e467bf42fc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -806,7 +806,7 @@ void post_init_entity_util_avg(struct task_struct *p)
> }
> }
>
> - sa->runnable_avg = cpu_scale;
> + sa->runnable_avg = sa->util_avg;

IIRC we didn't go for this initially because hackbench behaved slightly
worse with it. Did we end up re-evaluating this? Also, how does this reaim
benchmark behave with it? I *think* the table from that regression thread
says it behaves better, but I had a hard time parsing it (seems like it got
damaged by line wrapping).

Conceptually I'm all for it, so as long as the tests back it up:
Reviewed-by: Valentin Schneider <[email protected]>

>
> if (p->sched_class != &fair_sched_class) {
> /*

2020-06-24 16:42:14

by Vincent Guittot

Subject: Re: [PATCH] sched/cfs: change initial value of runnable_avg

On Wed, 24 Jun 2020 at 18:32, Valentin Schneider
<[email protected]> wrote:
>
>
> On 24/06/20 16:44, Vincent Guittot wrote:
> > Some performance regressions on the reaim benchmark have been raised with
> > commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group").
> >
> > The problem comes from the init value of runnable_avg, which is initialized
> > to the max value. This can be a problem if the newly forked task turns out
> > to be a short task, because the group of CPUs is wrongly classified as
> > overloaded and tasks are pulled less aggressively.
> >
> > Set the initial value of runnable_avg equal to util_avg to reflect that there
> > has been no waiting time so far.
> >
> > Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
> > Reported-by: kernel test robot <[email protected]>
> > Signed-off-by: Vincent Guittot <[email protected]>
> > ---
> > kernel/sched/fair.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0424a0af5f87..45e467bf42fc 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -806,7 +806,7 @@ void post_init_entity_util_avg(struct task_struct *p)
> > }
> > }
> >
> > - sa->runnable_avg = cpu_scale;
> > + sa->runnable_avg = sa->util_avg;
>
> IIRC we didn't go for this initially because hackbench behaved slightly
> worse with it. Did we end up re-evaluating this? Also, how does this reaim

Yes. hackbench was slightly worse, and it was the only input at that
time, which is why we decided to keep the other init. Since then, Rong
has reported a significant regression for reaim, which is fixed by this
patch.

> benchmark behave with it? I *think* the table from that regression thread
> says it behaves better, but I had a hard time parsing it (seems like it got
> damaged by line wrapping).
>
> Conceptually I'm all for it, so as long as the tests back it up:
> Reviewed-by: Valentin Schneider <[email protected]>
>
> >
> > if (p->sched_class != &fair_sched_class) {
> > /*
>

2020-06-25 09:57:16

by Holger Hoffstätte

Subject: Re: [PATCH] sched/cfs: change initial value of runnable_avg

On 2020-06-24 17:44, Vincent Guittot wrote:
> Some performance regressions on the reaim benchmark have been raised with
> commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group").
>
> The problem comes from the init value of runnable_avg, which is initialized
> to the max value. This can be a problem if the newly forked task turns out
> to be a short task, because the group of CPUs is wrongly classified as
> overloaded and tasks are pulled less aggressively.
>
> Set the initial value of runnable_avg equal to util_avg to reflect that there
> has been no waiting time so far.
>
> Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0424a0af5f87..45e467bf42fc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -806,7 +806,7 @@ void post_init_entity_util_avg(struct task_struct *p)
> }
> }
>
> - sa->runnable_avg = cpu_scale;
> + sa->runnable_avg = sa->util_avg;
>
> if (p->sched_class != &fair_sched_class) {
> /*
>

Something is wrong here. I woke up my machine from suspend-to-RAM this morning
and saw that a completely idle machine had a loadavg of ~7. According to my
monitoring system this happened to be the loadavg right before I suspended.
I've reverted this, rebooted, created a loadavg >0, suspended, and after wake-up
loadavg again correctly ranges between 0 and whatever, as expected.

-h

2020-06-25 10:09:40

by Vincent Guittot

Subject: Re: [PATCH] sched/cfs: change initial value of runnable_avg

On Thu, 25 Jun 2020 at 11:24, Holger Hoffstätte
<[email protected]> wrote:
>
> On 2020-06-24 17:44, Vincent Guittot wrote:
> > Some performance regressions on the reaim benchmark have been raised with
> > commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group").
> >
> > The problem comes from the init value of runnable_avg, which is initialized
> > to the max value. This can be a problem if the newly forked task turns out
> > to be a short task, because the group of CPUs is wrongly classified as
> > overloaded and tasks are pulled less aggressively.
> >
> > Set the initial value of runnable_avg equal to util_avg to reflect that there
> > has been no waiting time so far.
> >
> > Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
> > Reported-by: kernel test robot <[email protected]>
> > Signed-off-by: Vincent Guittot <[email protected]>
> > ---
> > kernel/sched/fair.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0424a0af5f87..45e467bf42fc 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -806,7 +806,7 @@ void post_init_entity_util_avg(struct task_struct *p)
> > }
> > }
> >
> > - sa->runnable_avg = cpu_scale;
> > + sa->runnable_avg = sa->util_avg;
> >
> > if (p->sched_class != &fair_sched_class) {
> > /*
> >
>
> Something is wrong here. I woke up my machine from suspend-to-RAM this morning
> and saw that a completely idle machine had a loadavg of ~7. According to my

Just to make sure: Are you speaking about the loadavg that is output by
/proc/loadavg, or load_avg, which is the PELT load?
The output of /proc/loadavg has no link with runnable_avg. The first
one samples nr_running at a 5 sec interval, whereas the other one is a
geometric series over the weight of the runnable tasks, with a
half-life of 32ms.
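
If it helps, here is a rough user-space sketch of the /proc/loadavg
side (constants and rounding recalled from kernel/sched/loadavg.c, so
treat this as an approximation, not the exact mainline code):

	/*
	 * Both 'load' and 'active' are in FIXED_1 fixed point;
	 * 'active' is nr_running scaled by FIXED_1, sampled every ~5s.
	 */
	#define FIXED_1	(1 << 11)	/* fixed-point 1.0 */
	#define EXP_1	1884		/* ~= FIXED_1 * exp(-5s/1min) */

	static unsigned long calc_load(unsigned long load, unsigned long exp,
				       unsigned long active)
	{
		unsigned long newload;

		/* decay the old average, blend in the current sample */
		newload = load * exp + active * (FIXED_1 - exp);
		if (active >= load)
			newload += FIXED_1 - 1;	/* round up when rising */

		return newload / FIXED_1;
	}

runnable_avg never enters this computation; only nr_running does.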

> monitoring system this happened to be the loadavg right before I suspended.
> I've reverted this, rebooted, created a loadavg >0, suspended, and after wake-up
> loadavg again correctly ranges between 0 and whatever, as expected.

I'm not sure I catch why ~7 is bad compared to "correctly ranges
between 0 and whatever". Isn't ~7 part of the whatever?

Vincent

>
> -h

2020-06-25 10:45:07

by Holger Hoffstätte

Subject: Re: [PATCH] sched/cfs: change initial value of runnable_avg

On 2020-06-25 11:56, Vincent Guittot wrote:
> On Thu, 25 Jun 2020 at 11:24, Holger Hoffstätte
> <[email protected]> wrote:
>>
>> On 2020-06-24 17:44, Vincent Guittot wrote:
>>> Some performance regressions on the reaim benchmark have been raised with
>>> commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group").
>>>
>>> The problem comes from the init value of runnable_avg, which is initialized
>>> to the max value. This can be a problem if the newly forked task turns out
>>> to be a short task, because the group of CPUs is wrongly classified as
>>> overloaded and tasks are pulled less aggressively.
>>>
>>> Set the initial value of runnable_avg equal to util_avg to reflect that there
>>> has been no waiting time so far.
>>>
>>> Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
>>> Reported-by: kernel test robot <[email protected]>
>>> Signed-off-by: Vincent Guittot <[email protected]>
>>> ---
>>> kernel/sched/fair.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 0424a0af5f87..45e467bf42fc 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -806,7 +806,7 @@ void post_init_entity_util_avg(struct task_struct *p)
>>> }
>>> }
>>>
>>> - sa->runnable_avg = cpu_scale;
>>> + sa->runnable_avg = sa->util_avg;
>>>
>>> if (p->sched_class != &fair_sched_class) {
>>> /*
>>>
>>
>> Something is wrong here. I woke up my machine from suspend-to-RAM this morning
>> and saw that a completely idle machine had a loadavg of ~7. According to my
>
> Just to make sure: Are you speaking about the loadavg that is output by
> /proc/loadavg, or load_avg, which is the PELT load?

/proc/loadavg

>> monitoring system this happened to be the loadavg right before I suspended.
>> I've reverted this, rebooted, created a loadavg >0, suspended, and after wake-up
>> loadavg again correctly ranges between 0 and whatever, as expected.
>
> I'm not sure I catch why ~7 is bad compared to "correctly ranges
> between 0 and whatever". Isn't ~7 part of the whatever?

After wakeup the _baseline_ for loadavg seemed to be the last value before suspend,
not 0. The 7 then was the base loadavg for a _mostly idle machine_ (just reading
mail etc.), i.e. it never went below said baseline again, no matter the
_actual_ load.

Here's an image: https://imgur.com/a/kd2stqO

Before 02:00 last night the load was ~7 (compiled something), then all processes
were terminated and the machine was suspended. After wakeup the machine was mostly
idle (9am..11am), yet the measured loadavg continued at the same value as before.
I didn't notice this right away since my CPU meter on the desktop didn't show any
*actual* activity (because there was none). The spike at ~11am is the revert/reboot.
After that loadavg became normal again, i.e. representative of the actual load,
even after suspend/resume cycles.
I suspend/resume every night and the only thing that changed recently was this
patch, so.. :)

-h

2020-06-25 11:58:17

by tip-bot2 for Vincent Guittot

Subject: [tip: sched/urgent] sched/cfs: change initial value of runnable_avg

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 68f7b5cc835de7d5b6c7696533c126018171e793
Gitweb: https://git.kernel.org/tip/68f7b5cc835de7d5b6c7696533c126018171e793
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Jun 2020 17:44:22 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 25 Jun 2020 13:45:38 +02:00

sched/cfs: change initial value of runnable_avg

Some performance regressions on the reaim benchmark have been raised with
commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group").

The problem comes from the init value of runnable_avg, which is initialized
to the max value. This can be a problem if the newly forked task turns out
to be a short task, because the group of CPUs is wrongly classified as
overloaded and tasks are pulled less aggressively.

Set the initial value of runnable_avg equal to util_avg to reflect that there
has been no waiting time so far.

Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cbcb2f7..658aa7a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -806,7 +806,7 @@ void post_init_entity_util_avg(struct task_struct *p)
 		}
 	}
 
-	sa->runnable_avg = cpu_scale;
+	sa->runnable_avg = sa->util_avg;
 
 	if (p->sched_class != &fair_sched_class) {
 		/*

2020-06-25 12:09:50

by Vincent Guittot

Subject: Re: [PATCH] sched/cfs: change initial value of runnable_avg

On Thu, 25 Jun 2020 at 12:42, Holger Hoffstätte
<[email protected]> wrote:
>
> On 2020-06-25 11:56, Vincent Guittot wrote:
> > On Thu, 25 Jun 2020 at 11:24, Holger Hoffstätte
> > <[email protected]> wrote:
> >>
> >> On 2020-06-24 17:44, Vincent Guittot wrote:
> >>> Some performance regressions on the reaim benchmark have been raised with
> >>> commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group").
> >>>
> >>> The problem comes from the init value of runnable_avg, which is initialized
> >>> to the max value. This can be a problem if the newly forked task turns out
> >>> to be a short task, because the group of CPUs is wrongly classified as
> >>> overloaded and tasks are pulled less aggressively.
> >>>
> >>> Set the initial value of runnable_avg equal to util_avg to reflect that there
> >>> has been no waiting time so far.
> >>>
> >>> Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
> >>> Reported-by: kernel test robot <[email protected]>
> >>> Signed-off-by: Vincent Guittot <[email protected]>
> >>> ---
> >>> kernel/sched/fair.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 0424a0af5f87..45e467bf42fc 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -806,7 +806,7 @@ void post_init_entity_util_avg(struct task_struct *p)
> >>> }
> >>> }
> >>>
> >>> - sa->runnable_avg = cpu_scale;
> >>> + sa->runnable_avg = sa->util_avg;
> >>>
> >>> if (p->sched_class != &fair_sched_class) {
> >>> /*
> >>>
> >>
> >> Something is wrong here. I woke up my machine from suspend-to-RAM this morning
> >> and saw that a completely idle machine had a loadavg of ~7. According to my
> >
> > Just to make sure: Are you speaking about the loadavg that is output by
> > /proc/loadavg, or load_avg, which is the PELT load?
>
> /proc/loadavg
>
> >> monitoring system this happened to be the loadavg right before I suspended.
> >> I've reverted this, rebooted, created a loadavg >0, suspended, and after wake-up
> >> loadavg again correctly ranges between 0 and whatever, as expected.
> >
> > I'm not sure I catch why ~7 is bad compared to "correctly ranges
> > between 0 and whatever". Isn't ~7 part of the whatever?
>
> After wakeup the _baseline_ for loadavg seemed to be the last value before suspend,
> not 0. The 7 then was the base loadavg for a _mostly idle machine_ (just reading
> mail etc.), i.e. it never went below said baseline again, no matter the
> _actual_ load.
>
> Here's an image: https://imgur.com/a/kd2stqO
>
> Before 02:00 last night the load was ~7 (compiled something), then all processes
> were terminated and the machine was suspended. After wakeup the machine was mostly
> idle (9am..11am), yet the measured loadavg continued at the same value as before.
> I didn't notice this right away since my CPU meter on the desktop didn't show any
> *actual* activity (because there was none). The spike at ~11am is the revert/reboot.

Have you reverted only this patch?

TBH, there is no link between these 2 metrics and I don't see how the
init value of runnable_avg can impact loadavg. As explained, loadavg
takes a snapshot of nr_running every 5 seconds, whereas the impact of
changing this init value will have disappeared in far less than 300ms
most of the time.
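
Back-of-the-envelope, assuming the usual 32ms PELT half-life: after
300ms the initial contribution has decayed by a factor of
0.5^(300/32) ~= 0.0015, i.e. well under 1% of it is left.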

Let me try to reproduce this on my system.


> After that loadavg became normal again, i.e. representative of the actual load,
> even after suspend/resume cycles.
> I suspend/resume every night and the only thing that changed recently was this
> patch, so.. :)
>
> -h

2020-06-25 23:47:17

by Holger Hoffstätte

Subject: Re: [PATCH] sched/cfs: change initial value of runnable_avg

On 2020-06-25 12:42, Holger Hoffstätte wrote:
<loadavg weirdness>

Never mind, it turned out to be something else after all.
I'm not entirely sure *what* exactly yet, but my loadavg is high again
for no good reason.
Sorry for the false alarm.

-h