Date: Wed, 7 Jun 2017 17:36:55 +0530
From: Viresh Kumar
To: Peter Zijlstra
Cc: Ingo Molnar, Rafael Wysocki, linaro-kernel@lists.linaro.org,
    linux-kernel@vger.kernel.org, Vincent Guittot,
    linux-pm@vger.kernel.org, Juri Lelli, Dietmar.Eggemann@arm.com,
    Morten.Rasmussen@arm.com, patrick.bellasi@arm.com
Subject: Re: [RFC] sched: fair: Don't update CPU frequency too frequently
Message-ID: <20170607120655.GB11126@vireshk-i7>
In-Reply-To: <20170601122224.c324h4t7y3i4wr6e@hirez.programming.kicks-ass.net>

+ Patrick,

On 01-06-17, 14:22, Peter Zijlstra wrote:
> On Thu, Jun 01, 2017 at 05:04:27PM +0530, Viresh Kumar wrote:
> > This patch relocates the call to utilization hook from
> > update_cfs_rq_load_avg() to task_tick_fair().
>
> That's not right. Consider hardware where 'setting' the DVFS is a
> 'cheap' MSR write, doing that once every 10ms (HZ=100) is absurd.

Yeah, a 10ms interval may be too long for such platforms.

Actually, Vincent and I were worried about the current location of the
utilization update hooks and believed that they are getting called way
too often. But yeah, this patch cut them down way too much.
One of the goals of this patch was to avoid small OPP updates from
update_load_avg(), which can block the significant utilization changes
(and hence big OPP changes) that happen when a task is attached or
detached, etc., because those changes then fall inside the
rate_limit_us window started by the small update.

> We spoke about this problem in Pisa, the proposed solution was having
> each driver provide a cost metric and the generic code doing a max
> filter over the window constructed from that cost metric.

So the idea is to compensate for the lost opportunities (due to the
rate_limit_us window) by changing the OPP based on what happened in the
previous rate_limit_us window. I am not sure how that will help.

Case 1: A periodic RT task runs for a short time in the rate_limit_us
window, and its timing is such that we (almost) never go to the max OPP
because of the rate_limit_us window.

Wouldn't a better solution for such a case be what Patrick [1] proposed
earlier (i.e. ignore rate_limit_us for RT/DL tasks)? That way we would
run at a high OPP exactly when we need it the most.

Case 2: A high-utilization periodic CFS task runs for a short duration
and keeps on migrating to other CPUs. We miss the opportunity to update
the OPP based on this task's utilization because of the rate_limit_us
window, and by the time we update the OPP again, the task has already
migrated and so the utilization is low again.

If the task has already migrated, why should we increase the OPP on the
assumption that it will come back to this CPU? There is a good chance
that the selected (higher) OPP will not be utilized by the current load
on the CPU.

Also, if this CFS task runs once every 2 (or more) ticks on the same
CPU, then we are back to the same problem again.

    1         2         3         4
    |---------|---------|---------|---------|
         T                   T

1, 2, 3 and 4 represent the events at which we try to update the OPP,
and they are placed rate_limit_us apart. The task T happens to run
between 1-2 and 3-4. We will not change the frequency until event 2 in
this case, as the rate_limit_us window isn't over yet.
We go to the higher OPP at 2 (which is really wasted on the current
load) because T ran in the last window. At 3 we come back to the OPP
proportional to the current load. And the next time T runs, we are
still stuck at the low OPP. So instead of fixing the problem, we made it
worse by wasting power unnecessarily.

Is there any case I am missing that you are concerned about?

--
viresh

[1] https://marc.info/?l=linux-kernel&m=148846976032099&w=2