2016-10-04 20:11:10

by Matt Fleming

Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue

On Wed, 28 Sep, at 12:14:22PM, Peter Zijlstra wrote:
> On Fri, Sep 23, 2016 at 12:58:08PM +0100, Matt Fleming wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8fb4d1942c14..4a2d3ff772f8 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3142,7 +3142,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > int migrated, decayed;
> >
> > migrated = !sa->last_update_time;
> > - if (!migrated) {
> > + if (!migrated && se->sum_exec_runtime) {
> > __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> > se->on_rq * scale_load_down(se->load.weight),
> > cfs_rq->curr == se, NULL);
>
>
> Hrmm,.. so I see the problem, but I think we're working around it.
>
> So the problem is that time moves between wake_up_new_task() doing
> post_init_entity_util_avg(), which attaches us to the cfs_rq, and
> activate_task() which enqueues us.
>
> Part of the problem is that we do not in fact seem to do
> update_rq_clock() before post_init_entity_util_avg(), which makes the
> delta larger than it should be.
>
> The other problem is that activate_task()->enqueue_task() does do
> update_rq_clock() (again, after fixing), creating the delta.
>
> Which suggests we do something like the below (not compile tested or
> anything, also I ran out of tea again).

This patch causes a regression on some machines with low CPU counts
(4 or 8 CPUs). It turns out they regress with my patch too. I'm now
running your patch without the enqueue_task() hunk to see if that
makes a difference.
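
For anyone following along at home, my reading of your suggestion is
that the fix amounts to advancing the rq clock before the new task's
load is attached, so the attach and the subsequent enqueue see the
same clock. A rough sketch only, not the actual patch; the placement
of the update_rq_clock() call is my assumption:

	void wake_up_new_task(struct task_struct *p)
	{
		struct rq_flags rf;
		struct rq *rq;

		raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
		p->state = TASK_RUNNING;

		/* ... fork balancing elided ... */

		rq = __task_rq_lock(p, &rf);

		/*
		 * Advance the rq clock *before* attaching the new
		 * task's util/load to the cfs_rq, so the delta
		 * between here and activate_task()'s enqueue isn't
		 * charged against a freshly initialized sched entity.
		 */
		update_rq_clock(rq);

		post_init_entity_util_avg(&p->se);
		activate_task(rq, p, 0);

		/* ... wakeup preemption, task_woken(), unlock ... */
	}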

> While staring at this, I don't think we can still hit
> vruntime_normalized() with a new task, so I _think_ we can remove that
> !se->sum_exec_runtime clause there (and rejoice), no?

Looks that way to me, yeah.
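
For reference, the clause in question, paraphrased from
vruntime_normalized() in kernel/sched/fair.c with the other comments
trimmed:

	static inline bool vruntime_normalized(struct task_struct *p)
	{
		struct sched_entity *se = &p->se;

		if (p->on_rq)
			return true;

		/*
		 * A forked child waiting in wake_up_new_task() has an
		 * already-normalized vruntime (task_fork_fair()
		 * subtracted min_vruntime), which is what the
		 * !se->sum_exec_runtime test catches. If a new task
		 * can no longer reach this path, that half of the
		 * test can go.
		 */
		if (!se->sum_exec_runtime || p->state == TASK_WAKING)
			return true;

		return false;
	}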