From: Byungchul Park <[email protected]>
hello paul,
can i ask you something?
when a sched entity is both woken and migrated, it looks like it gets decayed twice.
did you do it on purpose?
or am i missing something? :(
thanks,
byungchul
--------------->8---------------
From 793c963d0b29977a0f6f9330291a9ea469cc54f0 Mon Sep 17 00:00:00 2001
From: Byungchul Park <[email protected]>
Date: Thu, 16 Jul 2015 16:49:48 +0900
Subject: [PATCH] sched: prevent sched entity from being decayed twice when
both waking and migrating it
current code decays load average variables with the sleep time twice when a
sched entity is both woken and migrated. the first decay happens in the call
path "migrate_task_rq_fair() -> __synchronize_entity_decay()". the second
decay happens in the call path "enqueue_entity_load_avg() ->
update_entity_load_avg()". so make it happen only once.
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/sched/fair.c | 29 +++--------------------------
1 file changed, 3 insertions(+), 26 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 09456fc..c86cca0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2873,32 +2873,9 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
struct sched_entity *se,
int wakeup)
{
- /*
- * We track migrations using entity decay_count <= 0, on a wake-up
- * migration we use a negative decay count to track the remote decays
- * accumulated while sleeping.
- *
- * Newly forked tasks are enqueued with se->avg.decay_count == 0, they
- * are seen by enqueue_entity_load_avg() as a migration with an already
- * constructed load_avg_contrib.
- */
- if (unlikely(se->avg.decay_count <= 0)) {
+ /* we track migrations using entity decay_count == 0 */
+ if (unlikely(!se->avg.decay_count)) {
se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
- if (se->avg.decay_count) {
- /*
- * In a wake-up migration we have to approximate the
- * time sleeping. This is because we can't synchronize
- * clock_task between the two cpus, and it is not
- * guaranteed to be read-safe. Instead, we can
- * approximate this using our carried decays, which are
- * explicitly atomically readable.
- */
- se->avg.last_runnable_update -= (-se->avg.decay_count)
- << 20;
- update_entity_load_avg(se, 0);
- /* Indicate that we're now synchronized and on-rq */
- se->avg.decay_count = 0;
- }
wakeup = 0;
} else {
__synchronize_entity_decay(se);
@@ -5114,7 +5091,7 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
* be negative here since on-rq tasks have decay-count == 0.
*/
if (se->avg.decay_count) {
- se->avg.decay_count = -__synchronize_entity_decay(se);
+ __synchronize_entity_decay(se);
atomic_long_add(se->avg.load_avg_contrib,
&cfs_rq->removed_load);
}
--
1.7.9.5
[email protected] writes:
> From: Byungchul Park <[email protected]>
>
> hello paul,
>
> can i ask you something?
>
> when a sched entity is both waken and migrated, it looks being decayed twice.
> did you do it on purpose?
> or am i missing something? :(
>
> thanks,
> byungchul
__synchronize_entity_decay() updates only se->avg.load_avg_contrib, so that
removing it from blocked_load is done correctly.
update_entity_load_avg() accounts that (approximation of) time blocked
against runnable_avg/running_avg (and then recomputes load_avg_contrib
to match while load_avg_contrib isn't part of any cfs_rq's sum).
>
> > [ patch snipped ]
On Thu, Jul 16, 2015 at 10:00:00AM -0700, [email protected] wrote:
hello,
> [email protected] writes:
>
> > [ original question snipped ]
>
> __synchronize_entity_decay() updates only se->avg.load_avg_contrib so
> that removing from blocked_load is done correctly.
as you said, it should be done here. :)
> update_entity_load_avg() accounts that (approximation of) time blocked
i mean the blocked time was already accounted for the entity in
__synchronize_entity_decay().
> against runnable_avg/running_avg (and then recomputes load_avg_contrib
> to match while load_avg_contrib isn't part of any cfs_rq's sum).
the thing to keep in mind is that load tracking is currently done per-entity.
that is, after __synchronize_entity_decay() the entity already has its own
whole load_avg_contrib, with its blocked time taken into account. and cfs_rq
can account the se's load by adding se->avg.load_avg_contrib to
cfs_rq->runnable_load_avg, as the enqueue_entity_load_avg() code does.
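the per-entity accounting i mean can be sketched like this (a hypothetical
userspace model, not the kernel code; `enqueue_model()`/`dequeue_model()`
are made-up names):

```c
/* hypothetical sketch of per-entity load accounting: each entity carries
 * its own load_avg_contrib, and the cfs_rq's runnable load is simply the
 * sum of the contribs of the currently enqueued entities */
struct se_model {
	long load_avg_contrib;
};

struct cfs_rq_model {
	long runnable_load_avg;
};

/* like enqueue_entity_load_avg(): add the entity's contrib to the rq sum */
static void enqueue_model(struct cfs_rq_model *cfs_rq, struct se_model *se)
{
	cfs_rq->runnable_load_avg += se->load_avg_contrib;
}

/* like dequeue_entity_load_avg(): remove it again on dequeue */
static void dequeue_model(struct cfs_rq_model *cfs_rq, struct se_model *se)
{
	cfs_rq->runnable_load_avg -= se->load_avg_contrib;
}
```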
wrong?
thanks,
byungchul
>
> > [ patch snipped ]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Byungchul Park <[email protected]> writes:
> On Thu, Jul 16, 2015 at 10:00:00AM -0700, [email protected] wrote:
>
> hello,
>
>> [email protected] writes:
>>
>> > [ original question snipped ]
>>
>> __synchronize_entity_decay() updates only se->avg.load_avg_contrib so
>> that removing from blocked_load is done correctly.
>
> as you said, it should done here. :)
>
>> update_entity_load_avg() accounts that (approximation of) time blocked
>
> i mean the entity was already accounted the blocked time in
> __synchronize_entity_decay().
>
>> against runnable_avg/running_avg (and then recomputes load_avg_contrib
>> to match while load_avg_contrib isn't part of any cfs_rq's sum).
>
> the thing to keep in mind is that, currently load tracking is done by
> per-entity. that is, the entity already has its own whole load_avg_contrib
> with considering the entity's blocked time, after __synchronize_entity_decay().
> and cfs_rq can account the se's load by adding se->avg.load_avg_contrib to
> cfs_rq->runnable_load_avg, like enqueue_entity_load_avg() code.
>
> wrong?
load_avg_contrib is computed from runnable_avg, which is not updated by
__synchronize_entity_decay, only by update_entity_load_avg ->
__update_entity_runnable_avg. __synchronize_entity_decay is used in this path
because update_entity_load_avg needs the rq lock (along with some other
reasons), and migrate_task_rq_fair generally doesn't have the lock.
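Concretely, the derivation looks roughly like this (a simplified userspace
sketch modeled loosely on __update_task_entity_contrib(); details such as
scaling are omitted, so treat it as an approximation, not the exact kernel
code). Decaying the cached contrib directly, as __synchronize_entity_decay()
does, leaves runnable_avg_sum/runnable_avg_period themselves untouched:

```c
/* simplified sketch: load_avg_contrib is derived from the runnable
 * average, by scaling the entity weight with the fraction of time the
 * entity was runnable over the tracked window */
static unsigned long compute_contrib(unsigned long weight,
				     unsigned long runnable_avg_sum,
				     unsigned long runnable_avg_period)
{
	/* "+ 1" avoids dividing by zero for a freshly created entity */
	return weight * runnable_avg_sum / (runnable_avg_period + 1);
}
```

For example, a weight-1024 entity runnable for 500 of 999 tracked units
yields a contrib of 512.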
>
> thanks,
> byungchul
>
>>
>> > [ patch snipped ]
On Fri, Jul 17, 2015 at 10:02:22AM -0700, [email protected] wrote:
> Byungchul Park <[email protected]> writes:
>
> > On Thu, Jul 16, 2015 at 10:00:00AM -0700, [email protected] wrote:
> >
> > hello,
> >
> >
> > [ earlier exchange snipped ]
>
> load_avg_contrib is computed from runnable_avg, which is not updated by
> __synchronize_entity_decay, only by update_entity_load_avg ->
> __update_entity_runnable_avg. __synchronize_entity_decay is used in this path
> because update_entity_load_avg needs the rq lock (along with some other
> reasons), and migrate_task_rq_fair generally doesn't have the lock.
hello ben,
i see...
i missed the fact that __synchronize_entity_decay() decays only the blocked contribution.
i am sorry for bothering you. ;(
thank you very much,
byungchul
>
> >
> > thanks,
> > byungchul
> >
> >>
> >> > [ patch snipped ]