Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966285AbcDML1p (ORCPT ); Wed, 13 Apr 2016 07:27:45 -0400 Received: from mail-lf0-f50.google.com ([209.85.215.50]:35582 "EHLO mail-lf0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964912AbcDML1m (ORCPT ); Wed, 13 Apr 2016 07:27:42 -0400 MIME-Version: 1.0 In-Reply-To: <20160411203733.GI8697@intel.com> References: <1460327765-18024-1-git-send-email-yuyang.du@intel.com> <1460327765-18024-5-git-send-email-yuyang.du@intel.com> <20160411203733.GI8697@intel.com> From: Vincent Guittot Date: Wed, 13 Apr 2016 13:27:21 +0200 Message-ID: Subject: Re: [PATCH 4/4] sched/fair: Implement flat hierarchical structure for util_avg To: Yuyang Du Cc: Peter Zijlstra , Ingo Molnar , linux-kernel , Benjamin Segall , Paul Turner , Morten Rasmussen , Dietmar Eggemann , Juri Lelli Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4140 Lines: 92 On 11 April 2016 at 22:37, Yuyang Du wrote: > Hi Vincent, > > On Mon, Apr 11, 2016 at 02:29:25PM +0200, Vincent Guittot wrote: >> >> > update any group entity's util, so the net overhead should not be >> > formidable. In addition, rq's util updates can be very frequent, >> > but the updates in a period will be skipped, this is mostly effective >> > in update_blocked_averages(). >> >> Not sure to catch your point here but the rq->util of an idle CPU will >> never be updated before a (idle) load balance as we call >> update_cfs_rq_load_avg which doesn't update the cfs_rq->avg.util_avg >> anymore nor rq->avg.util_avg. > > I meant in update_blocked_averages(), the rq's util won't be updated > every time a cfs is updated if they are in the same period (~1ms), probabaly > this is the case. > > The idle CPU thing is no difference, so it is anyway the responsibility of > any other active CPU to take the idle CPU's idle time into account for any > purpose. > >> >> > + if (update_util) >> > + sa->util_avg = sa->util_sum / LOAD_AVG_MAX; >> > + >> > + if (!cfs_rq) >> > + return 1; >> > + >> > + /* >> > + * Update rq's util_sum and util_avg >> > + */ >> >> Do we really have to update the utilization of rq each time we update >> a sched_entity ? IMHO, we don't need to do this update so often even >> more if the se is a task_group. But even if it's a task, we don't >> especially need to update it right now but we can wait for the update >> of the rq->cfs like previously or we explicilty update it when we have >> to do a direct sub/add on the util_avg value >> See also my comment on removed_util_avg below >> > > No, we only update a rq's util if the update is for cfs, not the sched_entity, > which is the if (!cfs_rq) condition's job ah yes i have skiped the ! when reading > >> Why not using the sched_avg of the rq->cfs in order to track the >> utilization of the root cfs_rq instead of adding a new sched_avg into >> the rq ? Then you call update_cfs_rq_load_avg(rq->cfs) when you want >> to update/sync the utilization of the rq->cfs and for one call you >> will update both the load_avg and the util_avg of the root cfs instead >> of duplicating the sequence in _update_load_avg > > This is the approach taken by Dietmar in his patch, a fairly easy approach. > The problem is though, if so, we update the root cfs_rq only when it is > the root cfs_rq to update. A simple contrived case will make it never > updated except in update_blocked_averages(). My impression is that this > might be too much precision lost. > > And thus we take this alternative approach, and thus I revisited > __update_load_avg() to optimize it. > > [snip] > >> > - if (atomic_long_read(&cfs_rq->removed_util_avg)) { >> > - long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0); >> > - sa->util_avg = max_t(long, sa->util_avg - r, 0); >> > - sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0); >> > + if (atomic_long_read(&rq->removed_util_avg)) { >> > + long r = atomic_long_xchg(&rq->removed_util_avg, 0); >> > + rq->avg.util_avg = max_t(long, rq->avg.util_avg - r, 0); >> > + rq->avg.util_sum = max_t(s32, rq->avg.util_sum - r * LOAD_AVG_MAX, 0); >> >> I see one potential issue here because the rq->util_avg may (surely) >> have been already updated and decayed during the update of a >> sched_entity but before we substract the removed_util_avg > > This is the same now, because cfs_rq will be regularly updated in > update_blocked_averages(), which basically means cfs_rq will be newer > than task for sure, although task tries to catch up when removed. I don't agree on that part. At now, we check and substract removed_util_avg before calling __update_load_avg for a cfs_rq, so it will be removed before changing last_update_time. With your patch, we update rq->avg.util_avg and last_update_time without checking removed_util_avg. > Well, yes, in this patch with rq, the situation is more so. But, > hopefully, this is a formidable concern.