2020-04-22 15:15:40

by Vincent Guittot

Subject: [PATCH] sched/pelt: sync util/runnable_sum with PELT window when propagating

update_tg_cfs_util/runnable() propagate the impact of the attach/detach of
an entity down the cfs_rq hierarchy, which must be kept in sync with
the current PELT window.

Even if we can't sync the child cfs_rq and its group se, we can sync the
group se and the parent cfs_rq with the current PELT window. In fact, we
must keep them in sync in order to stay aligned with the other se and
group se that are already attached to the cfs_rq.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 26 ++++++--------------------
1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..ca6aa89c88f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3441,52 +3441,38 @@ static inline void
update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
+ u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;

/* Nothing to update */
if (!delta)
return;

- /*
- * The relation between sum and avg is:
- *
- * LOAD_AVG_MAX - 1024 + sa->period_contrib
- *
- * however, the PELT windows are not aligned between grq and gse.
- */
-
/* Set new sched_entity's utilization */
se->avg.util_avg = gcfs_rq->avg.util_avg;
- se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;
+ se->avg.util_sum = se->avg.util_avg * divider;

/* Update parent cfs_rq utilization */
add_positive(&cfs_rq->avg.util_avg, delta);
- cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX;
+ cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
}

static inline void
update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
long delta = gcfs_rq->avg.runnable_avg - se->avg.runnable_avg;
+ u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;

/* Nothing to update */
if (!delta)
return;

- /*
- * The relation between sum and avg is:
- *
- * LOAD_AVG_MAX - 1024 + sa->period_contrib
- *
- * however, the PELT windows are not aligned between grq and gse.
- */
-
/* Set new sched_entity's runnable */
se->avg.runnable_avg = gcfs_rq->avg.runnable_avg;
- se->avg.runnable_sum = se->avg.runnable_avg * LOAD_AVG_MAX;
+ se->avg.runnable_sum = se->avg.runnable_avg * divider;

/* Update parent cfs_rq runnable */
add_positive(&cfs_rq->avg.runnable_avg, delta);
- cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * LOAD_AVG_MAX;
+ cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
}

static inline void
--
2.17.1
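
For readers following along, the _sum/_avg relation the patch relies on can be sketched standalone like this (a hypothetical mock for illustration; the field names mirror the kernel's struct sched_avg, but the layout and helpers here are simplified, not the kernel's actual code):

```c
#include <assert.h>
#include <stdint.h>

/* Converged maximum of the PELT geometric series (y^32 = 0.5). */
#define LOAD_AVG_MAX 47742

/* Illustrative mock of the fields involved; the real struct
 * sched_avg has more members. */
struct sched_avg {
	uint64_t util_sum;
	unsigned long util_avg;
	uint32_t period_contrib;	/* elapsed part of the current 1024us window */
};

/* _sum and _avg relate through a divider that replaces the last,
 * only partially elapsed 1024us window with period_contrib. */
static uint32_t pelt_divider(const struct sched_avg *sa)
{
	return LOAD_AVG_MAX - 1024 + sa->period_contrib;
}

/* What the patch does in update_tg_cfs_util(): resync the group
 * se's util_sum against the parent cfs_rq's PELT window instead
 * of assuming a fully elapsed window (LOAD_AVG_MAX). */
static void sync_util_sum(struct sched_avg *se,
			  const struct sched_avg *cfs_rq)
{
	se->util_sum = (uint64_t)se->util_avg * pelt_divider(cfs_rq);
}
```

The divider is taken from cfs_rq (not gcfs_rq) because the group se is attached to cfs_rq and must share its window.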


2020-04-23 14:32:47

by Dietmar Eggemann

Subject: Re: [PATCH] sched/pelt: sync util/runnable_sum with PELT window when propagating



On 22/04/2020 17:14, Vincent Guittot wrote:
> update_tg_cfs_util/runnable() propagate the impact of the attach/detach of
> an entity down into the cfs_rq hierarchy which must keep the sync with
> the current pelt window.
>
> Even if we can't sync child rq and its group se, we can sync the group se

So we have

gcfs --> tg --> gse
________________|
|
V

cfs ---> tg (root)

|
V

rq


here. What is 'child rq' for 'group se' here? I guess 'parent cfs_rq' is
cfs_rq.

> and parent cfs_rq with current PELT window. In fact, we must keep them sync
> in order to stay also synced with others se and group se that are already
> attached to the cfs_rq.
>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
> kernel/sched/fair.c | 26 ++++++--------------------
> 1 file changed, 6 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..ca6aa89c88f2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3441,52 +3441,38 @@ static inline void
> update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
> {
> long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
> + u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;
>
> /* Nothing to update */
> if (!delta)
> return;
>
> - /*
> - * The relation between sum and avg is:
> - *
> - * LOAD_AVG_MAX - 1024 + sa->period_contrib
> - *
> - * however, the PELT windows are not aligned between grq and gse.
> - */
> -
> /* Set new sched_entity's utilization */
> se->avg.util_avg = gcfs_rq->avg.util_avg;
> - se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;
> + se->avg.util_sum = se->avg.util_avg * divider;

divider uses cfs_rq, but we sync se->avg.util_avg with gcfs_rq here.

But since avg.period_contrib of cfs_rq and gcfs_rq are the same, this
should work.

> /* Update parent cfs_rq utilization */
> add_positive(&cfs_rq->avg.util_avg, delta);
> - cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX;
> + cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;

Looks like avg.last_update_time of the se (group entity), its gcfs_rq
and the cfs_rq is always the same in update_tg_cfs_[util|runnable]().

So that means the PELT windows are aligned for cfs_rqs and group se's?

And if we want to enforce this for cfs_rq and task, we have
sync_entity_load_avg().

[...]

2020-04-23 16:22:56

by Vincent Guittot

Subject: Re: [PATCH] sched/pelt: sync util/runnable_sum with PELT window when propagating

On Thu, 23 Apr 2020 at 16:30, Dietmar Eggemann <[email protected]> wrote:
>
>
>
> On 22/04/2020 17:14, Vincent Guittot wrote:
> > update_tg_cfs_util/runnable() propagate the impact of the attach/detach of
> > an entity down into the cfs_rq hierarchy which must keep the sync with
> > the current pelt window.
> >
> > Even if we can't sync child rq and its group se, we can sync the group se
>
> So we have
>
> gcfs --> tg --> gse
> ________________|
> |
> V
>
> cfs ---> tg (root)
>
> |
> V
>
> rq
>

child cfs_rq aka gcfs_rq
|
gse: group entity that represents child cfs_rq in parent cfs_rq
|
v
parent cfs_rq aka cfs_rq

>
> here. What is 'child rq' for 'group se' here? I guess 'parent cfs_rq' is
> cfs_rq.
>
> > and parent cfs_rq with current PELT window. In fact, we must keep them sync
> > in order to stay also synced with others se and group se that are already
> > attached to the cfs_rq.
> >
> > Signed-off-by: Vincent Guittot <[email protected]>
> > ---
> > kernel/sched/fair.c | 26 ++++++--------------------
> > 1 file changed, 6 insertions(+), 20 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 02f323b85b6d..ca6aa89c88f2 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3441,52 +3441,38 @@ static inline void
> > update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
> > {
> > long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
> > + u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;
> >
> > /* Nothing to update */
> > if (!delta)
> > return;
> >
> > - /*
> > - * The relation between sum and avg is:
> > - *
> > - * LOAD_AVG_MAX - 1024 + sa->period_contrib
> > - *
> > - * however, the PELT windows are not aligned between grq and gse.
> > - */
> > -
> > /* Set new sched_entity's utilization */
> > se->avg.util_avg = gcfs_rq->avg.util_avg;
> > - se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;
> > + se->avg.util_sum = se->avg.util_avg * divider;
>
> divider uses cfs_rq but we sync se->avg.util_avg with gcfs_rq here.

we sync the util_avg of gse with the new util_avg of gcfs_rq, but gse
is attached to cfs_rq and as a result we have to use cfs_rq's
period_contrib.

>
> But since avg.period_contrib of cfs_rq and gcfs_rq are the same this
> should work.
>
> > /* Update parent cfs_rq utilization */
> > add_positive(&cfs_rq->avg.util_avg, delta);
> > - cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX;
> > + cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
>
> Looks like that avg.last_update_time of se (group entity), it's gcfs_rq
> and cfs_rq is always the same in update_tg_cfs_[util\|runnable].
>
> So that means the PELT windows are aligned for cfs_rqs and group se's?

We want to align util_avg with util_sum and period_contrib, otherwise
we might end up with some misalignment. It's quite similar to what is
done in attach_entity_load_avg().

>
> And if we want to enforce this for cfs_rq and task, we have
> sync_entity_load_avg().

It's not a matter of syncing the last_update_time

>
> [...]

2020-04-23 19:44:04

by Peter Zijlstra

Subject: Re: [PATCH] sched/pelt: sync util/runnable_sum with PELT window when propagating

On Wed, Apr 22, 2020 at 05:14:01PM +0200, Vincent Guittot wrote:
> update_tg_cfs_util/runnable() propagate the impact of the attach/detach of
> an entity down into the cfs_rq hierarchy which must keep the sync with
> the current pelt window.
>
> Even if we can't sync child rq and its group se, we can sync the group se
> and parent cfs_rq with current PELT window. In fact, we must keep them sync
> in order to stay also synced with others se and group se that are already
> attached to the cfs_rq.
>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
> kernel/sched/fair.c | 26 ++++++--------------------
> 1 file changed, 6 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..ca6aa89c88f2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3441,52 +3441,38 @@ static inline void
> update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
> {
> long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
> + u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;
>
> /* Nothing to update */
> if (!delta)
> return;
>
> - /*
> - * The relation between sum and avg is:
> - *
> - * LOAD_AVG_MAX - 1024 + sa->period_contrib
> - *
> - * however, the PELT windows are not aligned between grq and gse.
> - */

Instead of deleting this, could we perhaps extend it?

2020-04-24 07:39:37

by Vincent Guittot

Subject: Re: [PATCH] sched/pelt: sync util/runnable_sum with PELT window when propagating

On Thu, 23 Apr 2020 at 21:29, Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Apr 22, 2020 at 05:14:01PM +0200, Vincent Guittot wrote:
> > update_tg_cfs_util/runnable() propagate the impact of the attach/detach of
> > an entity down into the cfs_rq hierarchy which must keep the sync with
> > the current pelt window.
> >
> > Even if we can't sync child rq and its group se, we can sync the group se
> > and parent cfs_rq with current PELT window. In fact, we must keep them sync
> > in order to stay also synced with others se and group se that are already
> > attached to the cfs_rq.
> >
> > Signed-off-by: Vincent Guittot <[email protected]>
> > ---
> > kernel/sched/fair.c | 26 ++++++--------------------
> > 1 file changed, 6 insertions(+), 20 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 02f323b85b6d..ca6aa89c88f2 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3441,52 +3441,38 @@ static inline void
> > update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
> > {
> > long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
> > + u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;
> >
> > /* Nothing to update */
> > if (!delta)
> > return;
> >
> > - /*
> > - * The relation between sum and avg is:
> > - *
> > - * LOAD_AVG_MAX - 1024 + sa->period_contrib
> > - *
> > - * however, the PELT windows are not aligned between grq and gse.
> > - */
>
> Instead of deleting this, could we perhaps extend it?

In fact, this is not the only place in fair.c that uses this rule to
align _avg and _sum, but the others don't have any special comment.

I can add a more detailed description of this relation for
___update_load_avg() in pelt.c and make a reference to it in all the
places in fair.c that use this rule, which are:
- update_tg_cfs_util
- update_tg_cfs_runnable
- update_cfs_rq_load_avg
- attach_entity_load_avg
- reweight_entity

2020-04-24 08:44:22

by Dietmar Eggemann

Subject: Re: [PATCH] sched/pelt: sync util/runnable_sum with PELT window when propagating

On 24/04/2020 09:37, Vincent Guittot wrote:
> On Thu, 23 Apr 2020 at 21:29, Peter Zijlstra <[email protected]> wrote:
>>
>> On Wed, Apr 22, 2020 at 05:14:01PM +0200, Vincent Guittot wrote:
>>> update_tg_cfs_util/runnable() propagate the impact of the attach/detach of
>>> an entity down into the cfs_rq hierarchy which must keep the sync with
>>> the current pelt window.
>>>
>>> Even if we can't sync child rq and its group se, we can sync the group se
>>> and parent cfs_rq with current PELT window. In fact, we must keep them sync
>>> in order to stay also synced with others se and group se that are already
>>> attached to the cfs_rq.
>>>
>>> Signed-off-by: Vincent Guittot <[email protected]>
>>> ---
>>> kernel/sched/fair.c | 26 ++++++--------------------
>>> 1 file changed, 6 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 02f323b85b6d..ca6aa89c88f2 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -3441,52 +3441,38 @@ static inline void
>>> update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
>>> {
>>> long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
>>> + u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;
>>>
>>> /* Nothing to update */
>>> if (!delta)
>>> return;
>>>
>>> - /*
>>> - * The relation between sum and avg is:
>>> - *
>>> - * LOAD_AVG_MAX - 1024 + sa->period_contrib
>>> - *
>>> - * however, the PELT windows are not aligned between grq and gse.
>>> - */
>>
>> Instead of deleting this, could we perhaps extend it?
>
> In fact, this is not the only place in fair.c that uses this rule to
> align _avg and _sum but others don't have any special comment.
>
> I can add a more detailed description of this relation for
> ___update_load_avg() in pelt.c and make a ref to this in all places in
> fair.c that use this rule which are :
> - update_tg_cfs_util
> - update_tg_cfs_runnable
> - update_cfs_rq_load_avg
> - attach_entity_load_avg
> - reweight_entity

But IMHO the

"* however, the PELT windows are not aligned between grq and gse."

should only apply to update_tg_cfs_util() and update_tg_cfs_runnable().
And attach_entity_load_avg() (for cfs_rq and se).

They seem to be special since we derive divider from a cfs_rq PELT value
and use it for a se PELT value.

I assume this fact is specifically worth highlighting with a comment. I
mean the fact that we can do this because the decay windows are actually
aligned.

2020-04-24 08:57:14

by Vincent Guittot

Subject: Re: [PATCH] sched/pelt: sync util/runnable_sum with PELT window when propagating

On Fri, 24 Apr 2020 at 10:41, Dietmar Eggemann <[email protected]> wrote:
>
> On 24/04/2020 09:37, Vincent Guittot wrote:
> > On Thu, 23 Apr 2020 at 21:29, Peter Zijlstra <[email protected]> wrote:
> >>
> >> On Wed, Apr 22, 2020 at 05:14:01PM +0200, Vincent Guittot wrote:
> >>> update_tg_cfs_util/runnable() propagate the impact of the attach/detach of
> >>> an entity down into the cfs_rq hierarchy which must keep the sync with
> >>> the current pelt window.
> >>>
> >>> Even if we can't sync child rq and its group se, we can sync the group se
> >>> and parent cfs_rq with current PELT window. In fact, we must keep them sync
> >>> in order to stay also synced with others se and group se that are already
> >>> attached to the cfs_rq.
> >>>
> >>> Signed-off-by: Vincent Guittot <[email protected]>
> >>> ---
> >>> kernel/sched/fair.c | 26 ++++++--------------------
> >>> 1 file changed, 6 insertions(+), 20 deletions(-)
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 02f323b85b6d..ca6aa89c88f2 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -3441,52 +3441,38 @@ static inline void
> >>> update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
> >>> {
> >>> long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
> >>> + u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;
> >>>
> >>> /* Nothing to update */
> >>> if (!delta)
> >>> return;
> >>>
> >>> - /*
> >>> - * The relation between sum and avg is:
> >>> - *
> >>> - * LOAD_AVG_MAX - 1024 + sa->period_contrib
> >>> - *
> >>> - * however, the PELT windows are not aligned between grq and gse.
> >>> - */
> >>
> >> Instead of deleting this, could we perhaps extend it?
> >
> > In fact, this is not the only place in fair.c that uses this rule to
> > align _avg and _sum but others don't have any special comment.
> >
> > I can add a more detailed description of this relation for
> > ___update_load_avg() in pelt.c and make a ref to this in all places in
> > fair.c that use this rule which are :
> > - update_tg_cfs_util
> > - update_tg_cfs_runnable
> > - update_cfs_rq_load_avg
> > - attach_entity_load_avg
> > - reweight_entity
>
> But IMHO the
>
> "* however, the PELT windows are not aligned between grq and gse."
>
> should only apply to update_tg_cfs_util() and update_tg_cfs_runnable().
> And attach_entity_load_avg() (for cfs_rq and se).
>
> They seem to be special since we derive divider from a cfs_rq PELT value
> and use it for a se PELT value.

hmmm... There is nothing special here.

When se is attached to cfs_rq, they both have the same divider because
they use the same clock.
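
As a quick standalone illustration of why the old `* LOAD_AVG_MAX` overestimates the sum (hypothetical numbers, not kernel code):

```c
#include <assert.h>

/* Converged maximum of the PELT geometric series. */
#define LOAD_AVG_MAX 47742

/* util_sum = util_avg * LOAD_AVG_MAX assumes the current 1024us
 * window is fully elapsed. The fixed divider only counts the
 * elapsed part (period_contrib), so the old value overshoots by
 * util_avg * (1024 - period_contrib). */
static unsigned long overshoot(unsigned long util_avg,
			       unsigned int period_contrib)
{
	unsigned long old_sum = util_avg * LOAD_AVG_MAX;
	unsigned long new_sum =
		util_avg * (LOAD_AVG_MAX - 1024 + period_contrib);

	return old_sum - new_sum;
}
```

For util_avg = 100 and a half-elapsed window (period_contrib = 512), the old code overstates util_sum by 51200.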

>
> I assume this fact is specifically worth highlighting with a comment. I
> mean the fact we can do this because the decay windows are actually aligned.

2020-04-24 09:09:17

by Dietmar Eggemann

Subject: Re: [PATCH] sched/pelt: sync util/runnable_sum with PELT window when propagating



On 24/04/2020 10:54, Vincent Guittot wrote:
> On Fri, 24 Apr 2020 at 10:41, Dietmar Eggemann <[email protected]> wrote:
>>
>> On 24/04/2020 09:37, Vincent Guittot wrote:
>>> On Thu, 23 Apr 2020 at 21:29, Peter Zijlstra <[email protected]> wrote:
>>>>
>>>> On Wed, Apr 22, 2020 at 05:14:01PM +0200, Vincent Guittot wrote:

[...]

>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 02f323b85b6d..ca6aa89c88f2 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -3441,52 +3441,38 @@ static inline void
>>>>> update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
>>>>> {
>>>>> long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
>>>>> + u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;
>>>>>
>>>>> /* Nothing to update */
>>>>> if (!delta)
>>>>> return;
>>>>>
>>>>> - /*
>>>>> - * The relation between sum and avg is:
>>>>> - *
>>>>> - * LOAD_AVG_MAX - 1024 + sa->period_contrib
>>>>> - *
>>>>> - * however, the PELT windows are not aligned between grq and gse.
>>>>> - */
>>>>
>>>> Instead of deleting this, could we perhaps extend it?
>>>
>>> In fact, this is not the only place in fair.c that uses this rule to
>>> align _avg and _sum but others don't have any special comment.
>>>
>>> I can add a more detailed description of this relation for
>>> ___update_load_avg() in pelt.c and make a ref to this in all places in
>>> fair.c that use this rule which are :
>>> - update_tg_cfs_util
>>> - update_tg_cfs_runnable
>>> - update_cfs_rq_load_avg
>>> - attach_entity_load_avg
>>> - reweight_entity
>>
>> But IMHO the
>>
>> "* however, the PELT windows are not aligned between grq and gse."
>>
>> should only apply to update_tg_cfs_util() and update_tg_cfs_runnable().
>> And attach_entity_load_avg() (for cfs_rq and se).
>>
>> They seem to be special since we derive divider from a cfs_rq PELT value
>> and use it for a se PELT value.
>
> hmmm... There is nothing special here.
>
> When se is attached to cfs_rq, they both have the same divider because
> they use the same clock.

That's true.

But exactly this might deserve a comment. Otherwise, people might
wonder why you can do a

u32 divider = LOAD_AVG_MAX - 1024 + *cfs_rq*->avg.period_contrib;

and use it for instance in:

*se*->avg.util_sum = se->avg.util_avg * divider;

In update_cfs_rq_load_avg() and reweight_entity() we derive 'divider'
from the same 'sched_avg' we use it on later.

u32 divider = LOAD_AVG_MAX - 1024 + sa->period_contrib;

sub_positive(&sa->load_sum, r * divider);

[...]

2020-04-24 12:12:28

by Dietmar Eggemann

Subject: Re: [PATCH] sched/pelt: sync util/runnable_sum with PELT window when propagating

On 23/04/2020 18:17, Vincent Guittot wrote:
> On Thu, 23 Apr 2020 at 16:30, Dietmar Eggemann <[email protected]> wrote:
>>
>> On 22/04/2020 17:14, Vincent Guittot wrote:

[..]

>> gcfs --> tg --> gse
>> ________________|
>> |
>> V
>>
>> cfs ---> tg (root)
>>
>> |
>> V
>>
>> rq
>>
>
> child cfs_rq aka gcfs_rq
> |
> gse: group entity that represents child cfs_rq in parent cfs_rq
> |
> v
> parent cfs_rq aka cfs_rq

OK, I see. Maybe it's clearer to refer to the child cfs_rq as gcfs_rq in
this context.

[...]

>>> /* Set new sched_entity's utilization */
>>> se->avg.util_avg = gcfs_rq->avg.util_avg;
>>> - se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;
>>> + se->avg.util_sum = se->avg.util_avg * divider;
>>
>> divider uses cfs_rq but we sync se->avg.util_avg with gcfs_rq here.
>
> we sync the util_avg of gse with the new util_avg of gcfs_rq but gse
> is attached to cfs_rq and as a result we have to use cfs_rq's
> period_contrib

I agree.

But the decay windows (avg.last_update_time, avg.period_contrib) of
cfs_rq and gcfs_rq are not always aligned, I guess?

I see they are not in sync right after online_fair_sched_group() ->
attach_entity_cfs_rq(), but later they are in sync as well.

I ran a couple of different rt-app taskgroup tests and tried to

BUG_ON(se->avg.period_contrib != gcfs_rq->avg.period_contrib);
BUG_ON(se->avg.last_update_time != gcfs_rq->avg.last_update_time);

in update_tg_cfs_util(), but they haven't triggered so far.

Both cfs_rq and gcfs_rq are in sync in update_tg_cfs_util() before and
while a task runs in gcfs_rq.

Are there cases where this wouldn't necessarily happen in
update_tg_cfs_util(), maybe a more complicated test case?

>> But since avg.period_contrib of cfs_rq and gcfs_rq are the same this
>> should work.
>>
>>> /* Update parent cfs_rq utilization */
>>> add_positive(&cfs_rq->avg.util_avg, delta);
>>> - cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX;
>>> + cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
>>
>> Looks like that avg.last_update_time of se (group entity), it's gcfs_rq
>> and cfs_rq is always the same in update_tg_cfs_[util\|runnable].
>>
>> So that means the PELT windows are aligned for cfs_rqs and group se's?
>
> We want to align util_avg with util_sum and period_contrib otherwise
> we might have some unalignment. It's quite similarly to what is done
> in attach_entity_load_avg()

I agree.

>> And if we want to enforce this for cfs_rq and task, we have
>> sync_entity_load_avg().
>
> It's not a matter of syncing the last_update_time

I agree, this is not what you want to achieve here.
But syncing 'last_update_time' and 'period_contrib' is what I understand
as aligning the decay window (like in attach_entity_load_avg()).

2020-04-24 12:51:02

by Vincent Guittot

Subject: Re: [PATCH] sched/pelt: sync util/runnable_sum with PELT window when propagating

On Fri, 24 Apr 2020 at 14:07, Dietmar Eggemann <[email protected]> wrote:
>
> On 23/04/2020 18:17, Vincent Guittot wrote:
> > On Thu, 23 Apr 2020 at 16:30, Dietmar Eggemann <[email protected]> wrote:
> >>
> >> On 22/04/2020 17:14, Vincent Guittot wrote:
>
> [..]
>
> >> gcfs --> tg --> gse
> >> ________________|
> >> |
> >> V
> >>
> >> cfs ---> tg (root)
> >>
> >> |
> >> V
> >>
> >> rq
> >>
> >
> > child cfs_rq aka gcfs_rq
> > |
> > gse: group entity that represents child cfs_rq in parent cfs_rq
> > |
> > v
> > parent cfs_rq aka cfs_rq
>
> OK, I see. Maybe it's clearer to refer to child cfs_rq as gcfs_rq in
> this context.
>
> [...]
>
> >>> /* Set new sched_entity's utilization */
> >>> se->avg.util_avg = gcfs_rq->avg.util_avg;
> >>> - se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;
> >>> + se->avg.util_sum = se->avg.util_avg * divider;
> >>
> >> divider uses cfs_rq but we sync se->avg.util_avg with gcfs_rq here.
> >
> > we sync the util_avg of gse with the new util_avg of gcfs_rq but gse
> > is attached to cfs_rq and as a result we have to use cfs_rq's
> > period_contrib
>
> I agree.
>
> But the decay windows (avg.last_update_time, avg.period_contrib) of
> cfs_rq and gcfs_rq are not always aligned, I guess?
>
> I see they are not after the online_fair_sched_group() ->
> attach_entity_cfs_rq() but later the are in sync as well.

cfs_rq and gcfs_rq use 2 different clocks:
- cfs_rq_clock_pelt(cfs_rq)
- cfs_rq_clock_pelt(gcfs_rq)

And they can differ in some cases, like with CFS bandwidth control.

>
> I ran a couple of different rt-app taskgroup tests and try to
>
> BUG_ON(se->avg.period_contrib != gcfs_rq->avg.period_contrib);
> BUG_ON(se->avg.last_update_time != gcfs_rq->avg.last_update_time)
>
> in update_tg_cfs_util() but they didn't trigger so far.
>
> Both, cfs_rq and gcfs_rq are in sync in update_tg_cfs_util() before and
> during a task runs in gcfs_rq.
>
> Are there cases where this wouldn't necessary happen in
> update_tg_cfs_util(), maybe a more complicated testcase?
>
> >> But since avg.period_contrib of cfs_rq and gcfs_rq are the same this
> >> should work.
> >>
> >>> /* Update parent cfs_rq utilization */
> >>> add_positive(&cfs_rq->avg.util_avg, delta);
> >>> - cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX;
> >>> + cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
> >>
> >> Looks like that avg.last_update_time of se (group entity), it's gcfs_rq
> >> and cfs_rq is always the same in update_tg_cfs_[util\|runnable].
> >>
> >> So that means the PELT windows are aligned for cfs_rqs and group se's?
> >
> > We want to align util_avg with util_sum and period_contrib otherwise
> > we might have some unalignment. It's quite similarly to what is done
> > in attach_entity_load_avg()
>
> I agree.
>
> >> And if we want to enforce this for cfs_rq and task, we have
> >> sync_entity_load_avg().
> >
> > It's not a matter of syncing the last_update_time
>
> I agree, this is not what you want to achieve here.
> But syncing 'last_update_time' and 'period_contrib' is what I understand
> as aligning the decay window (like in attach_entity_load_avg()).