MIME-Version: 1.0
In-Reply-To: <20160411203733.GI8697@intel.com>
References: <1460327765-18024-1-git-send-email-yuyang.du@intel.com>
 <1460327765-18024-5-git-send-email-yuyang.du@intel.com> <CAKfTPtBS-MZRyq-+DSmcEr7Yi4Fh2y50hOvEN=S9P2tiTA7k8A@mail.gmail.com>
 <20160411203733.GI8697@intel.com>
From: Vincent Guittot <vincent.guittot@linaro.org>
Date: Wed, 13 Apr 2016 13:27:21 +0200
Message-ID: <CAKfTPtDVRy=RCJaLBa2_KVx2tk3SsuUPDhDLcovf+kpT-artbQ@mail.gmail.com>
Subject: Re: [PATCH 4/4] sched/fair: Implement flat hierarchical structure for util_avg
To: Yuyang Du <yuyang.du@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>,
        linux-kernel <linux-kernel@vger.kernel.org>,
        Benjamin Segall <bsegall@google.com>, Paul Turner <pjt@google.com>,
        Morten Rasmussen <morten.rasmussen@arm.com>,
        Dietmar Eggemann <dietmar.eggemann@arm.com>,
        Juri Lelli <juri.lelli@arm.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4140
Lines: 92

On 11 April 2016 at 22:37, Yuyang Du <yuyang.du@intel.com> wrote:
> Hi Vincent,
>
> On Mon, Apr 11, 2016 at 02:29:25PM +0200, Vincent Guittot wrote:
>>
>> > update any group entity's util, so the net overhead should not be
>> > formidable. In addition, rq's util updates can be very frequent,
>> > but the updates in a period will be skipped, this is mostly effective
>> > in update_blocked_averages().
>>
>> Not sure to catch your point here but the rq->util of an idle CPU will
>> never be updated before a (idle) load balance as we call
>> update_cfs_rq_load_avg which doesn't update the cfs_rq->avg.util_avg
>> anymore nor rq->avg.util_avg.
>
> I meant in update_blocked_averages(), the rq's util won't be updated
> every time a cfs is updated if they are in the same period (~1ms), probabaly
> this is the case.
>
> The idle CPU thing is no difference, so it is anyway the responsibility of
> any other active CPU to take the idle CPU's idle time into account for any
> purpose.
>
>>
>> > +       if (update_util)
>> > +               sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>> > +
>> > +       if (!cfs_rq)
>> > +               return 1;
>> > +
>> > +       /*
>> > +        * Update rq's util_sum and util_avg
>> > +        */
>>
>> Do we really have to update the utilization of rq each time we update
>> a sched_entity ?  IMHO, we don't need to do this update so often even
>> more if the se is a task_group. But even if it's a task, we don't
>> especially need to update it right now but we can wait for the update
>> of the rq->cfs like previously or we explicilty update it when we have
>> to do a direct sub/add on the util_avg value
>> See also my comment on removed_util_avg below
>>
>
> No, we only update a rq's util if the update is for cfs, not the sched_entity,
> which is the if (!cfs_rq) condition's job

ah yes i have skiped the ! when reading

>
>> Why not using the sched_avg of the rq->cfs in order to track the
>> utilization of the root cfs_rq instead of adding a new sched_avg into
>> the rq ? Then you call update_cfs_rq_load_avg(rq->cfs) when you want
>> to update/sync the utilization of the rq->cfs and for one call you
>> will update both the load_avg and the util_avg of the root cfs instead
>> of duplicating the sequence in _update_load_avg
>
> This is the approach taken by Dietmar in his patch, a fairly easy approach.
> The problem is though, if so, we update the root cfs_rq only when it is
> the root cfs_rq to update. A simple contrived case will make it never
> updated except in update_blocked_averages(). My impression is that this
> might be too much precision lost.
>
> And thus we take this alternative approach, and thus I revisited
> __update_load_avg() to optimize it.
>
> [snip]
>
>> > -       if (atomic_long_read(&cfs_rq->removed_util_avg)) {
>> > -               long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
>> > -               sa->util_avg = max_t(long, sa->util_avg - r, 0);
>> > -               sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
>> > +       if (atomic_long_read(&rq->removed_util_avg)) {
>> > +               long r = atomic_long_xchg(&rq->removed_util_avg, 0);
>> > +               rq->avg.util_avg = max_t(long, rq->avg.util_avg - r, 0);
>> > +               rq->avg.util_sum = max_t(s32, rq->avg.util_sum - r * LOAD_AVG_MAX, 0);
>>
>> I see one potential issue here because the rq->util_avg may (surely)
>> have been already updated and decayed during the update of a
>> sched_entity but before we substract the removed_util_avg
>
> This is the same now, because cfs_rq will be regularly updated in
> update_blocked_averages(), which basically means cfs_rq will be newer
> than task for sure, although task tries to catch up when removed.

I don't agree on that part. At now, we check and substract
removed_util_avg before calling __update_load_avg for a cfs_rq, so it
will be removed before changing last_update_time.
With your patch,  we update rq->avg.util_avg and last_update_time
without checking removed_util_avg.

> Well, yes, in this patch with rq, the situation is more so. But,
> hopefully, this is a formidable concern.