From: bsegall@google.com
To: Byungchul Park <byungchul.park@lge.com>
Cc: pjt@google.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched: prevent sched entity from being decayed twice when both waking and migrating it
References: <1437034317-15120-1-git-send-email-byungchul.park@lge.com>
	<xm26r3o8jawf.fsf@sword-of-the-dawn.mtv.corp.google.com>
	<20150717061949.GD3956@byungchulpark-X58A-UD3R>
Date: Fri, 17 Jul 2015 10:02:22 -0700
In-Reply-To: <20150717061949.GD3956@byungchulpark-X58A-UD3R> (Byungchul Park's
	message of "Fri, 17 Jul 2015 15:19:49 +0900")
Message-ID: <xm26k2tyk99d.fsf@sword-of-the-dawn.mtv.corp.google.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5199
Lines: 130

Byungchul Park <byungchul.park@lge.com> writes:

> On Thu, Jul 16, 2015 at 10:00:00AM -0700, bsegall@google.com wrote:
>
> hello,
>
>> byungchul.park@lge.com writes:
>> 
>> > From: Byungchul Park <byungchul.park@lge.com>
>> >
>> > hello paul,
>> >
>> > can i ask you something?
>> >
>> > when a sched entity is both woken and migrated, it looks like it is decayed twice.
>> > did you do it on purpose?
>> > or am i missing something? :(
>> >
>> > thanks,
>> > byungchul
>> 
>> __synchronize_entity_decay() updates only se->avg.load_avg_contrib so
>> that removing from blocked_load is done correctly.
>
> as you said, it should be done here. :)
>
>> update_entity_load_avg() accounts that (approximation of) time blocked
>
> i mean the entity's blocked time was already accounted for in
> __synchronize_entity_decay().
>
>> against runnable_avg/running_avg (and then recomputes load_avg_contrib
>> to match while load_avg_contrib isn't part of any cfs_rq's sum).
>
> the thing to keep in mind is that load tracking is currently done
> per-entity. that is, after __synchronize_entity_decay() the entity already
> has its whole load_avg_contrib with the blocked time taken into account,
> and cfs_rq can account the se's load by adding se->avg.load_avg_contrib to
> cfs_rq->runnable_load_avg, as enqueue_entity_load_avg() does.
>
> wrong?

load_avg_contrib is computed from runnable_avg, which is not updated by
__synchronize_entity_decay, only by update_entity_load_avg ->
__update_entity_runnable_avg. __synchronize_entity_decay is used in this path
because update_entity_load_avg needs the rq lock (along with some other
reasons), and migrate_task_rq_fair generally doesn't have the lock.
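
As a rough illustration of why this is not a double decay, here is a toy
model in C. It is a sketch under loud assumptions, not the fair.c code:
struct toy_avg and every toy_* helper are hypothetical names, the decay is
simplified to whole half-lives (the kernel decays per period), and only the
"divide the runnable sum by period + 1" shape of the contrib computation is
borrowed from the real thing.

```c
#include <assert.h>
#include <stdint.h>

/* Saturation value of the geometric series; mirrors LOAD_AVG_MAX. */
#define TOY_LOAD_AVG_MAX 47742u

/* Hypothetical, trimmed-down stand-in for the per-entity averages. */
struct toy_avg {
	uint32_t runnable_sum;          /* decayed time spent runnable */
	uint32_t period;                /* decayed total time observed */
	unsigned long load_avg_contrib; /* value summed into the cfs_rq */
};

/* Toy decay: each step is one half-life, so it is just a shift. */
static unsigned long toy_decay(unsigned long val, unsigned int half_lives)
{
	assert(half_lives < 32);
	return val >> half_lives;
}

/* Models __synchronize_entity_decay(): only load_avg_contrib is aged. */
static void toy_synchronize_entity_decay(struct toy_avg *sa,
					 unsigned int half_lives)
{
	sa->load_avg_contrib = toy_decay(sa->load_avg_contrib, half_lives);
}

/* Recompute the contribution from the sums, ignoring its old value. */
static void toy_recompute_contrib(struct toy_avg *sa, unsigned long weight)
{
	uint64_t contrib = (uint64_t)sa->runnable_sum * weight;

	sa->load_avg_contrib = (unsigned long)(contrib / (sa->period + 1));
}

/*
 * Models update_entity_load_avg() over a blocked interval: the sums are
 * aged, the period also gains the (non-runnable) elapsed time, and the
 * contribution is then recomputed from the sums.
 */
static void toy_update_entity_load_avg(struct toy_avg *sa,
				       unsigned long weight,
				       unsigned int half_lives)
{
	uint32_t elapsed = TOY_LOAD_AVG_MAX -
			   (uint32_t)toy_decay(TOY_LOAD_AVG_MAX, half_lives);

	sa->runnable_sum = (uint32_t)toy_decay(sa->runnable_sum, half_lives);
	sa->period = (uint32_t)toy_decay(sa->period, half_lives) + elapsed;
	toy_recompute_contrib(sa, weight);
}
```

Starting from runnable_sum == period == TOY_LOAD_AVG_MAX with weight 1024,
the contribution is 1023. toy_synchronize_entity_decay(&sa, 1) brings it to
511. Recomputing from the still-stale sums at that point gives 1023 again,
i.e. the blocked time is lost, whereas toy_update_entity_load_avg(&sa, 1024, 1)
ages the sums and recomputes 511: it matches the decayed value instead of
halving it a second time.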

>
> thanks,
> byungchul
>
>> 
>> >
>> > --------------->8---------------
>> > From 793c963d0b29977a0f6f9330291a9ea469cc54f0 Mon Sep 17 00:00:00 2001
>> > From: Byungchul Park <byungchul.park@lge.com>
>> > Date: Thu, 16 Jul 2015 16:49:48 +0900
>> > Subject: [PATCH] sched: prevent sched entity from being decayed twice when
>> >  both waking and migrating it
>> >
>> > the current code decays an entity's load average variables by its sleep
>> > time twice when the entity is both woken and migrated. the first decay
>> > happens in the call path "migrate_task_rq_fair() ->
>> > __synchronize_entity_decay()". the second decay happens in the call path
>> > "enqueue_entity_load_avg() -> update_entity_load_avg()". so make it
>> > happen once.
>> >
>> > Signed-off-by: Byungchul Park <byungchul.park@lge.com>
>> > ---
>> >  kernel/sched/fair.c |   29 +++--------------------------
>> >  1 file changed, 3 insertions(+), 26 deletions(-)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 09456fc..c86cca0 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -2873,32 +2873,9 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
>> >  						  struct sched_entity *se,
>> >  						  int wakeup)
>> >  {
>> > -	/*
>> > -	 * We track migrations using entity decay_count <= 0, on a wake-up
>> > -	 * migration we use a negative decay count to track the remote decays
>> > -	 * accumulated while sleeping.
>> > -	 *
>> > -	 * Newly forked tasks are enqueued with se->avg.decay_count == 0, they
>> > -	 * are seen by enqueue_entity_load_avg() as a migration with an already
>> > -	 * constructed load_avg_contrib.
>> > -	 */
>> > -	if (unlikely(se->avg.decay_count <= 0)) {
>> > +	/* we track migrations using entity decay_count == 0 */
>> > +	if (unlikely(!se->avg.decay_count)) {
>> >  		se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
>> > -		if (se->avg.decay_count) {
>> > -			/*
>> > -			 * In a wake-up migration we have to approximate the
>> > -			 * time sleeping.  This is because we can't synchronize
>> > -			 * clock_task between the two cpus, and it is not
>> > -			 * guaranteed to be read-safe.  Instead, we can
>> > -			 * approximate this using our carried decays, which are
>> > -			 * explicitly atomically readable.
>> > -			 */
>> > -			se->avg.last_runnable_update -= (-se->avg.decay_count)
>> > -							<< 20;
>> > -			update_entity_load_avg(se, 0);
>> > -			/* Indicate that we're now synchronized and on-rq */
>> > -			se->avg.decay_count = 0;
>> > -		}
>> >  		wakeup = 0;
>> >  	} else {
>> >  		__synchronize_entity_decay(se);
>> > @@ -5114,7 +5091,7 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
>> >  	 * be negative here since on-rq tasks have decay-count == 0.
>> >  	 */
>> >  	if (se->avg.decay_count) {
>> > -		se->avg.decay_count = -__synchronize_entity_decay(se);
>> > +		__synchronize_entity_decay(se);
>> >  		atomic_long_add(se->avg.load_avg_contrib,
>> >  						&cfs_rq->removed_load);
>> >  	}
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/