Date: Tue, 12 Apr 2016 04:37:33 +0800
From: Yuyang Du <yuyang.du@intel.com>
To: Vincent Guittot
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Benjamin Segall,
 Paul Turner, Morten Rasmussen, Dietmar Eggemann, Juri Lelli
Subject: Re: [PATCH 4/4] sched/fair: Implement flat hierarchical structure for util_avg
Message-ID: <20160411203733.GI8697@intel.com>
References: <1460327765-18024-1-git-send-email-yuyang.du@intel.com>
 <1460327765-18024-5-git-send-email-yuyang.du@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi Vincent,

On Mon, Apr 11, 2016 at 02:29:25PM +0200, Vincent Guittot wrote:
> > update any group entity's util, so the net overhead should not be
> > formidable. In addition, rq's util updates can be very frequent,
> > but the updates in a period will be skipped, this is mostly effective
> > in update_blocked_averages().
>
> Not sure I catch your point here, but the rq->util of an idle CPU will
> never be updated before an (idle) load balance, as we call
> update_cfs_rq_load_avg(), which doesn't update the cfs_rq->avg.util_avg
> anymore, nor rq->avg.util_avg.

I meant that in update_blocked_averages(), the rq's util won't be
updated every time a cfs_rq is updated if they fall in the same period
(~1ms), which is probably the common case.

The idle CPU case is no different: it is anyway the responsibility of
some other active CPU to take the idle CPU's idle time into account for
whatever purpose.

> > +	if (update_util)
> > +		sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> > +
> > +	if (!cfs_rq)
> > +		return 1;
> > +
> > +	/*
> > +	 * Update rq's util_sum and util_avg
> > +	 */
>
> Do we really have to update the utilization of the rq each time we
> update a sched_entity? IMHO, we don't need to do this update so often,
> even more so if the se is a task_group. But even if it's a task, we
> don't especially need to update it right now; we can wait for the
> update of the rq->cfs like previously, or we explicitly update it when
> we have to do a direct sub/add on the util_avg value.
> See also my comment on removed_util_avg below.

No, we only update the rq's util if the update is for a cfs_rq, not for
a sched_entity; that is what the if (!cfs_rq) condition is for.

> Why not use the sched_avg of the rq->cfs to track the utilization of
> the root cfs_rq, instead of adding a new sched_avg into the rq? Then
> you call update_cfs_rq_load_avg(rq->cfs) when you want to update/sync
> the utilization of the rq->cfs, and for one call you will update both
> the load_avg and the util_avg of the root cfs instead of duplicating
> the sequence in __update_load_avg().

This is the approach taken by Dietmar in his patch, and it is fairly
easy. The problem, though, is that we would then update the root cfs_rq
only when the root cfs_rq itself is the one being updated; a simple
contrived case can leave it never updated except in
update_blocked_averages(). My impression is that this would lose too
much precision.

So we take this alternative approach instead, and that is also why I
revisited __update_load_avg() to optimize it.
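To make the skip concrete, here is a rough user-space sketch of the
intent, with made-up sketch_* stand-ins for the kernel types (this is
an illustration, not the actual patch code): the rq-level util is left
alone when a bare sched_entity is being updated, and otherwise it is
recomputed only when the update crosses a ~1ms (1024us) period
boundary, so frequent cfs_rq updates within one period cost nothing at
the rq level.

/*
 * Sketch only: the real __update_load_avg() also decays and accrues
 * util_sum; this keeps just the control flow that matters here.
 */
#include <stdint.h>

#define LOAD_AVG_MAX	47742	/* maximum possible util_sum */

struct sketch_avg {
	uint64_t	last_update_time;	/* ns */
	uint32_t	util_sum;
	unsigned long	util_avg;
};

struct sketch_cfs_rq;	/* opaque; NULL means "updating a bare se" */

static int sketch_update_rq_util(struct sketch_avg *rq_avg,
				 struct sketch_cfs_rq *cfs_rq,
				 uint64_t now)
{
	uint64_t delta;

	if (!cfs_rq)		/* bare sched_entity: leave rq util alone */
		return 1;

	delta = now - rq_avg->last_update_time;
	if (!(delta >> 20))	/* same 1024us period: skip the rq update */
		return 0;

	/* decay of util_sum by the elapsed periods would happen here */
	rq_avg->last_update_time = now;
	rq_avg->util_avg = rq_avg->util_sum / LOAD_AVG_MAX;
	return 1;
}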
[snip]

> > -	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> > -		long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
> > -		sa->util_avg = max_t(long, sa->util_avg - r, 0);
> > -		sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
> > +	if (atomic_long_read(&rq->removed_util_avg)) {
> > +		long r = atomic_long_xchg(&rq->removed_util_avg, 0);
> > +		rq->avg.util_avg = max_t(long, rq->avg.util_avg - r, 0);
> > +		rq->avg.util_sum = max_t(s32, rq->avg.util_sum - r * LOAD_AVG_MAX, 0);
>
> I see one potential issue here, because the rq->util_avg may (surely)
> have already been updated and decayed during the update of a
> sched_entity, but before we subtract the removed_util_avg.

The same is true now: the cfs_rq is regularly updated in
update_blocked_averages(), which basically means the cfs_rq is sure to
be newer than the task, although the task tries to catch up when it is
removed. Well, yes, with the rq in this patch the situation is even
more so. But hopefully this is not a formidable concern.
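For completeness, the handshake in question looks roughly like this,
with C11 atomics standing in for atomic_long_t (again a made-up
sketch, not the kernel code). The window Vincent points at is visible
here: r reflects the task's util as of the task's own last update,
while rq->avg may already have been updated and decayed further by the
time we do the xchg-and-subtract.

#include <stdatomic.h>
#include <stdint.h>

#define LOAD_AVG_MAX	47742

struct sketch_rq {
	atomic_long	removed_util_avg;	/* stands in for atomic_long_t */
	long		util_avg;
	int32_t		util_sum;
};

/* Migrating-out path: park the departing task's util for later removal. */
static void sketch_remove_task_util(struct sketch_rq *rq, long task_util)
{
	atomic_fetch_add(&rq->removed_util_avg, task_util);
}

/* Update path: consume whatever accumulated since the last update. */
static void sketch_sub_removed_util(struct sketch_rq *rq)
{
	long r;

	if (!atomic_load(&rq->removed_util_avg))
		return;

	/* r may be slightly stale relative to the just-decayed rq->avg */
	r = atomic_exchange(&rq->removed_util_avg, 0);
	rq->util_avg = rq->util_avg > r ? rq->util_avg - r : 0;
	rq->util_sum = rq->util_sum > r * LOAD_AVG_MAX ?
			rq->util_sum - r * LOAD_AVG_MAX : 0;
}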