2014-01-21 16:13:04

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH] sched: fix sched_entity avg statistics update

With the current implementation, the load average statistics of a sched
entity change according to other activity on the CPU, even if this activity
occurs between the running windows of the sched entity and has no influence
on the running duration of the task.

When a task wakes up on the same CPU, we currently update last_runnable_update
with the return value of __synchronize_entity_decay without updating
runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
the load_contrib of the se with the rq's blocked_load_contrib before removing
it from the latter (with __synchronize_entity_decay), but we must keep
last_runnable_update unchanged so that the next update_entity_load_avg
accounts the full elapsed time in runnable_avg_sum/period.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e64b079..5b0ef90 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2370,8 +2370,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
* would have made count negative); we must be careful to avoid
* double-accounting blocked time after synchronizing decays.
*/
- se->avg.last_runnable_update += __synchronize_entity_decay(se)
- << 20;
+ __synchronize_entity_decay(se);
}

/* migrated tasks did not contribute to our blocked load */
--
1.7.9.5
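
For context, __synchronize_entity_decay() only ages the entity's cached
load_avg_contrib by the number of whole decay periods it spent off the rq and
returns that count; it does not touch runnable_avg_sum, runnable_avg_period or
last_runnable_update. A sketch, paraphrased from the fair.c of this era (the
exact code in the tree may differ slightly):

	static inline u64 __synchronize_entity_decay(struct sched_entity *se)
	{
		struct cfs_rq *cfs_rq = cfs_rq_of(se);
		u64 decays = atomic64_read(&cfs_rq->decay_counter);

		/* whole ~1ms decay periods elapsed since we last synced */
		decays -= se->avg.decay_count;
		if (!decays)
			return 0;

		/* age the cached contribution; the quotient itself is untouched */
		se->avg.load_avg_contrib = decay_load(se->avg.load_avg_contrib,
						      decays);
		se->avg.decay_count = 0;

		return decays;
	}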


2014-01-21 18:46:16

by Benjamin Segall

[permalink] [raw]
Subject: Re: [PATCH] sched: fix sched_entity avg statistics update

Vincent Guittot <[email protected]> writes:

> With the current implementation, the load average statistics of a sched entity
> change according to other activity on the CPU even if this activity is done
> between the running window of the sched entity and have no influence on the
> running duration of the task.
>
> When a task wakes up on the same CPU, we currently update last_runnable_update
> with the return of __synchronize_entity_decay without updating the
> runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
> the load_contrib of the se with the rq's blocked_load_contrib before removing
> it from the latter (with __synchronize_entity_decay) but we must keep
> last_runnable_update unchanged for updating runnable_avg_sum/period during the
> next update_entity_load_avg.

... Gah, that's correct, we had this right the first time. Could you do
this as a full revert of 282cf499f03ec1754b6c8c945c9674b02631fb0f (i.e.
remove the now-inaccurate comment, or maybe replace it with a correct one)?
>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
> kernel/sched/fair.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e64b079..5b0ef90 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2370,8 +2370,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
> * would have made count negative); we must be careful to avoid
> * double-accounting blocked time after synchronizing decays.
> */
> - se->avg.last_runnable_update += __synchronize_entity_decay(se)
> - << 20;
> + __synchronize_entity_decay(se);
> }
>
> /* migrated tasks did not contribute to our blocked load */

2014-01-21 20:06:48

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched: fix sched_entity avg statistics update

On 21 January 2014 19:38, <[email protected]> wrote:
> Vincent Guittot <[email protected]> writes:
>
>> With the current implementation, the load average statistics of a sched entity
>> change according to other activity on the CPU even if this activity is done
>> between the running window of the sched entity and have no influence on the
>> running duration of the task.
>>
>> When a task wakes up on the same CPU, we currently update last_runnable_update
>> with the return of __synchronize_entity_decay without updating the
>> runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
>> the load_contrib of the se with the rq's blocked_load_contrib before removing
>> it from the latter (with __synchronize_entity_decay) but we must keep
>> last_runnable_update unchanged for updating runnable_avg_sum/period during the
>> next update_entity_load_avg.
>
> ... Gah, that's correct, we had this right the first time. Could you do
> this as a full revert of 282cf499f03ec1754b6c8c945c9674b02631fb0f (ie
> remove the now inaccurate comment, or maybe replace it with a correct one).

OK, I'm going to remove the comment as well and replace it with a new description.

Vincent

>>
>> Signed-off-by: Vincent Guittot <[email protected]>
>> ---
>> kernel/sched/fair.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index e64b079..5b0ef90 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2370,8 +2370,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
>> * would have made count negative); we must be careful to avoid
>> * double-accounting blocked time after synchronizing decays.
>> */
>> - se->avg.last_runnable_update += __synchronize_entity_decay(se)
>> - << 20;
>> + __synchronize_entity_decay(se);
>> }
>>
>> /* migrated tasks did not contribute to our blocked load */

2014-01-21 20:31:53

by Paul Turner

[permalink] [raw]
Subject: Re: [PATCH] sched: fix sched_entity avg statistics update

On Tue, Jan 21, 2014 at 12:00 PM, Vincent Guittot
<[email protected]> wrote:
>
> On 21 Jan 2014 19:39, <[email protected]> wrote:
>
>
>>
>> Vincent Guittot <[email protected]> writes:
>>
>> > With the current implementation, the load average statistics of a sched
>> > entity
>> > change according to other activity on the CPU even if this activity is
>> > done
>> > between the running window of the sched entity and have no influence on
>> > the
>> > running duration of the task.
>> >
>> > When a task wakes up on the same CPU, we currently update
>> > last_runnable_update
>> > with the return of __synchronize_entity_decay without updating the
>> > runnable_avg_sum and runnable_avg_period accordingly. In fact, we have
>> > to sync
>> > the load_contrib of the se with the rq's blocked_load_contrib before
>> > removing
>> > it from the latter (with __synchronize_entity_decay) but we must keep
>> > last_runnable_update unchanged for updating runnable_avg_sum/period
>> > during the
>> > next update_entity_load_avg.
>>
>> ... Gah, that's correct, we had this right the first time. Could you do
>> this as a full revert of 282cf499f03ec1754b6c8c945c9674b02631fb0f (ie
>> remove the now inaccurate comment, or maybe replace it with a correct
>> one).
>
> Ok i'm going to remove comment as well and replace it with a new description
>

I think I need to go through and do a comments patch like we did with
the wake-affine math; it's too easy to make a finicky mistake like
this when not touching this path for a while.

OK, so there are two numerical components we're juggling here:

1) The actual quotient for the current runnable average, stored as
(runnable_avg_sum / runnable_avg period). Last updated at
last_runnable_update.
2) The value of that quotient as of the last time we accumulated it
within cfs_rq->{runnable, blocked}_load_avg; this is stored in
load_avg_contrib. We track the passage of off-rq time and migrations
against this value using decay_count.
[ All of the values above are stored on se / se->avg ]
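
For reference, all of these live in struct sched_avg; roughly (field list
paraphrased from the include/linux/sched.h of this kernel generation,
comments added here to map onto (1) and (2)):

	struct sched_avg {
		/* (1): the running quotient runnable_avg_sum /
		 * runnable_avg_period, referenced to last_runnable_update (ns) */
		u32 runnable_avg_sum, runnable_avg_period;
		u64 last_runnable_update;
		/* (2): aging state for the cached contribution below */
		s64 decay_count;
		unsigned long load_avg_contrib;
	};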

When we are re-enqueuing something and we wish to remove its
contribution from blocked_load_avg, we must update load_avg_contrib in
(2) using the total time it spent off rq (using a jiffy rounded
approximation in decay_count). However, Alex's patch (which this
reverts) also adjusted the quotient by modifying its last update time
so as to make it look up-to-date, effectively skipping the most recent
idle span.
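
Numerically, the "<< 20" in the reverted line is what couples the two:
__synchronize_entity_decay() returns whole decay periods of 1024*1024 ns
(~1ms each), while last_runnable_update is a raw nanosecond timestamp, so
advancing it by decays << 20 makes the quotient look up to date and skips the
idle span in the next sum/period update. An illustrative fragment (the values
are made up):

	u64 decays = __synchronize_entity_decay(se);	/* say, 5 periods asleep */

	/* reverted behaviour: quotient pretends the idle span never happened */
	se->avg.last_runnable_update += decays << 20;	/* += 5 * 1048576 ns */

	/* restored behaviour: leave last_runnable_update alone, so the next
	 * update_entity_load_avg() decays runnable_avg_sum/period over the
	 * full idle span */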

I think we could make the connection between (1) and (2) more explicit
if we moved the subsequent "if (wakeup)" logic within the else. We
could then have a comment that refers to (1) and (2) explicitly, perhaps
something like:

Task re-woke on same cpu (or else migrate_task_rq_fair() would have
made count negative). Perform an approximate decay on
load_avg_contrib to match blocked_load_avg, and compute a precise
runnable_avg_sum quotient update that will be accumulated into
runnable_load_avg below.
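
One way that restructuring could look (a sketch only, not a submitted patch;
the subtract_blocked_load_contrib()/update_entity_load_avg() calls are
paraphrased from the existing wakeup handling, and moving them is equivalent
because the migration branch already forces wakeup = 0):

	} else {
		/*
		 * Task re-woke on same cpu (or else migrate_task_rq_fair()
		 * would have made count negative).  Perform an approximate
		 * decay on load_avg_contrib to match blocked_load_avg, and
		 * compute a precise runnable_avg_sum quotient update that
		 * will be accumulated into runnable_load_avg below.
		 */
		__synchronize_entity_decay(se);

		if (wakeup) {
			subtract_blocked_load_contrib(cfs_rq,
						      se->avg.load_avg_contrib);
			update_entity_load_avg(se, 0);
		}
	}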


>
>> >
>> > Signed-off-by: Vincent Guittot <[email protected]>
>> > ---
>> > kernel/sched/fair.c | 3 +--
>> > 1 file changed, 1 insertion(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index e64b079..5b0ef90 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -2370,8 +2370,7 @@ static inline void enqueue_entity_load_avg(struct
>> > cfs_rq *cfs_rq,
>> > * would have made count negative); we must be careful to
>> > avoid
>> > * double-accounting blocked time after synchronizing
>> > decays.
>> > */
>> > - se->avg.last_runnable_update +=
>> > __synchronize_entity_decay(se)
>> > - << 20;
>> > + __synchronize_entity_decay(se);
>> > }
>> >
>> > /* migrated tasks did not contribute to our blocked load */

2014-01-21 20:45:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: fix sched_entity avg statistics update

On Tue, Jan 21, 2014 at 12:31:18PM -0800, Paul Turner wrote:
> I think I need to go through and do a comments patch like we did with
> the wake-affine math; it's too easy to make a finicky mistake like
> this when not touching this path for a while.

If you're going to do that, please consider fixing the XXX on line 4783:

* [XXX write more on how we solve this.. _after_ merging pjt's patches that
* rewrite all of this once again.]

2014-01-22 07:46:46

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH] Revert "sched: Fix sleep time double accounting in enqueue entity"

This reverts commit 282cf499f03ec1754b6c8c945c9674b02631fb0f.

With the current implementation, the load average statistics of a sched
entity change according to other activity on the CPU, even if this activity
occurs between the running windows of the sched entity and has no influence
on the running duration of the task.

When a task wakes up on the same CPU, we currently update last_runnable_update
with the return value of __synchronize_entity_decay without updating
runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
the load_contrib of the se with the rq's blocked_load_contrib before removing
it from the latter (with __synchronize_entity_decay), but we must keep
last_runnable_update unchanged so that the next update_entity_load_avg
accounts the full elapsed time in runnable_avg_sum/period.

Signed-off-by: Vincent Guittot <[email protected]>

---
kernel/sched/fair.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e64b079..6d61f20 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2365,13 +2365,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
}
wakeup = 0;
} else {
- /*
- * Task re-woke on same cpu (or else migrate_task_rq_fair()
- * would have made count negative); we must be careful to avoid
- * double-accounting blocked time after synchronizing decays.
- */
- se->avg.last_runnable_update += __synchronize_entity_decay(se)
- << 20;
+ __synchronize_entity_decay(se);
}

/* migrated tasks did not contribute to our blocked load */
--
1.7.9.5

2014-01-22 07:50:34

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] Revert "sched: Fix sleep time double accounting in enqueue entity"

Paul,

I'll let you send a patch that adds the comment and moves the "if (wakeup)" logic?

Regards
Vincent

On 22 January 2014 08:45, Vincent Guittot <[email protected]> wrote:
> This reverts commit 282cf499f03ec1754b6c8c945c9674b02631fb0f.
>
> With the current implementation, the load average statistics of a sched entity
> change according to other activity on the CPU even if this activity is done
> between the running window of the sched entity and have no influence on the
> running duration of the task.
>
> When a task wakes up on the same CPU, we currently update last_runnable_update
> with the return of __synchronize_entity_decay without updating the
> runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
> the load_contrib of the se with the rq's blocked_load_contrib before removing
> it from the latter (with __synchronize_entity_decay) but we must keep
> last_runnable_update unchanged for updating runnable_avg_sum/period during the
> next update_entity_load_avg.
>
> Signed-off-by: Vincent Guittot <[email protected]>
>
> ---
> kernel/sched/fair.c | 8 +-------
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e64b079..6d61f20 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2365,13 +2365,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
> }
> wakeup = 0;
> } else {
> - /*
> - * Task re-woke on same cpu (or else migrate_task_rq_fair()
> - * would have made count negative); we must be careful to avoid
> - * double-accounting blocked time after synchronizing decays.
> - */
> - se->avg.last_runnable_update += __synchronize_entity_decay(se)
> - << 20;
> + __synchronize_entity_decay(se);
> }
>
> /* migrated tasks did not contribute to our blocked load */
> --
> 1.7.9.5
>

2014-01-22 10:10:23

by Chris Redpath

[permalink] [raw]
Subject: Re: [PATCH] sched: fix sched_entity avg statistics update

On 21/01/14 16:12, Vincent Guittot wrote:
> With the current implementation, the load average statistics of a sched entity
> change according to other activity on the CPU even if this activity is done
> between the running window of the sched entity and have no influence on the
> running duration of the task.
>
> When a task wakes up on the same CPU, we currently update last_runnable_update
> with the return of __synchronize_entity_decay without updating the
> runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
> the load_contrib of the se with the rq's blocked_load_contrib before removing
> it from the latter (with __synchronize_entity_decay) but we must keep
> last_runnable_update unchanged for updating runnable_avg_sum/period during the
> next update_entity_load_avg.
>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
> kernel/sched/fair.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e64b079..5b0ef90 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2370,8 +2370,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
> * would have made count negative); we must be careful to avoid
> * double-accounting blocked time after synchronizing decays.
> */
> - se->avg.last_runnable_update += __synchronize_entity_decay(se)
> - << 20;
> + __synchronize_entity_decay(se);
> }
>
> /* migrated tasks did not contribute to our blocked load */
>

I've noticed this problem too. It becomes more apparent if you are
closely inspecting load signals and comparing against ideal signals
generated from task runtime traces. IMO it should be fixed.
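
If anyone wants to reproduce that kind of comparison, the "ideal" signal can
be generated offline from a runnable/sleeping trace using the same geometric
series the kernel's tracking uses (decay factor y with y^32 = 1/2, ~1ms
periods). A standalone, illustrative userspace sketch, not kernel code:

	#include <math.h>
	#include <stdio.h>

	int main(void)
	{
		double y = pow(0.5, 1.0 / 32.0); /* per-period decay, y^32 = 1/2 */
		double sum = 0.0, period = 0.0;
		/* toy trace: 1 = runnable for that ~1ms period, 0 = sleeping */
		int trace[] = { 1, 1, 1, 0, 0, 0, 0, 1, 1, 1 };
		int i, n = (int)(sizeof(trace) / sizeof(trace[0]));

		for (i = 0; i < n; i++) {
			sum = sum * y + (trace[i] ? 1024 : 0);
			period = period * y + 1024;
		}
		printf("ideal runnable_avg = %.0f / %.0f = %.3f\n",
		       sum, period, sum / period);
		return 0;
	}

	(build with: cc ideal.c -lm)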

2014-01-22 17:53:45

by Benjamin Segall

[permalink] [raw]
Subject: Re: [PATCH] Revert "sched: Fix sleep time double accounting in enqueue entity"

Vincent Guittot <[email protected]> writes:

> This reverts commit 282cf499f03ec1754b6c8c945c9674b02631fb0f.
>
> With the current implementation, the load average statistics of a sched entity
> change according to other activity on the CPU even if this activity is done
> between the running window of the sched entity and have no influence on the
> running duration of the task.
>
> When a task wakes up on the same CPU, we currently update last_runnable_update
> with the return of __synchronize_entity_decay without updating the
> runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
> the load_contrib of the se with the rq's blocked_load_contrib before removing
> it from the latter (with __synchronize_entity_decay) but we must keep
> last_runnable_update unchanged for updating runnable_avg_sum/period during the
> next update_entity_load_avg.
>
> Signed-off-by: Vincent Guittot <[email protected]>
Unless Paul wants to squash this into a possible change to the if
(wakeup) stuff:
Reviewed-by: Ben Segall <[email protected]>

>
> ---
> kernel/sched/fair.c | 8 +-------
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e64b079..6d61f20 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2365,13 +2365,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
> }
> wakeup = 0;
> } else {
> - /*
> - * Task re-woke on same cpu (or else migrate_task_rq_fair()
> - * would have made count negative); we must be careful to avoid
> - * double-accounting blocked time after synchronizing decays.
> - */
> - se->avg.last_runnable_update += __synchronize_entity_decay(se)
> - << 20;
> + __synchronize_entity_decay(se);
> }
>
> /* migrated tasks did not contribute to our blocked load */

2014-01-22 19:55:13

by Paul Turner

[permalink] [raw]
Subject: Re: [PATCH] Revert "sched: Fix sleep time double accounting in enqueue entity"

On Wed, Jan 22, 2014 at 9:53 AM, <[email protected]> wrote:
> Vincent Guittot <[email protected]> writes:
>
>> This reverts commit 282cf499f03ec1754b6c8c945c9674b02631fb0f.
>>
>> With the current implementation, the load average statistics of a sched entity
>> change according to other activity on the CPU even if this activity is done
>> between the running window of the sched entity and have no influence on the
>> running duration of the task.
>>
>> When a task wakes up on the same CPU, we currently update last_runnable_update
>> with the return of __synchronize_entity_decay without updating the
>> runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
>> the load_contrib of the se with the rq's blocked_load_contrib before removing
>> it from the latter (with __synchronize_entity_decay) but we must keep
>> last_runnable_update unchanged for updating runnable_avg_sum/period during the
>> next update_entity_load_avg.
>>
>> Signed-off-by: Vincent Guittot <[email protected]>
> Unless paul wants to squash this into a possible change to the if
> (wakeup) stuff:
> Reviewed-by: Ben Segall <[email protected]>
>

I can send that separately as Vincent suggests. This is good to go.

>>
>> ---
>> kernel/sched/fair.c | 8 +-------
>> 1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index e64b079..6d61f20 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2365,13 +2365,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
>> }
>> wakeup = 0;
>> } else {
>> - /*
>> - * Task re-woke on same cpu (or else migrate_task_rq_fair()
>> - * would have made count negative); we must be careful to avoid
>> - * double-accounting blocked time after synchronizing decays.
>> - */
>> - se->avg.last_runnable_update += __synchronize_entity_decay(se)
>> - << 20;
>> + __synchronize_entity_decay(se);
>> }
>>
>> /* migrated tasks did not contribute to our blocked load */

Subject: [tip:sched/urgent] Revert "sched: Fix sleep time double accounting in enqueue entity"

Commit-ID: 9390675af0835ae1d654d33bfcf16096028550ad
Gitweb: http://git.kernel.org/tip/9390675af0835ae1d654d33bfcf16096028550ad
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 22 Jan 2014 08:45:34 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 23 Jan 2014 14:48:34 +0100

Revert "sched: Fix sleep time double accounting in enqueue entity"

This reverts commit 282cf499f03ec1754b6c8c945c9674b02631fb0f.

With the current implementation, the load average statistics of a sched
entity change according to other activity on the CPU, even if this activity
occurs between the running windows of the sched entity and has no influence
on the running duration of the task.

When a task wakes up on the same CPU, we currently update last_runnable_update
with the return value of __synchronize_entity_decay without updating
runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
the load_contrib of the se with the rq's blocked_load_contrib before removing
it from the latter (with __synchronize_entity_decay), but we must keep
last_runnable_update unchanged so that the next update_entity_load_avg
accounts the full elapsed time in runnable_avg_sum/period.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Ben Segall <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b24b6cf..efe6457 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2356,13 +2356,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
}
wakeup = 0;
} else {
- /*
- * Task re-woke on same cpu (or else migrate_task_rq_fair()
- * would have made count negative); we must be careful to avoid
- * double-accounting blocked time after synchronizing decays.
- */
- se->avg.last_runnable_update += __synchronize_entity_decay(se)
- << 20;
+ __synchronize_entity_decay(se);
}

/* migrated tasks did not contribute to our blocked load */