Date: Wed, 7 Jun 2017 16:43:52 +0100
From: Morten Rasmussen
To: Viresh Kumar
Cc: Peter Zijlstra, Ingo Molnar, Rafael Wysocki, linaro-kernel@lists.linaro.org, linux-kernel@vger.kernel.org, Vincent Guittot, linux-pm@vger.kernel.org, Juri Lelli, Dietmar.Eggemann@arm.com, patrick.bellasi@arm.com
Subject: Re: [RFC] sched: fair: Don't update CPU frequency too frequently
Message-ID: <20170607154351.GA2551@e105550-lin.cambridge.arm.com>
References: <20170601122224.c324h4t7y3i4wr6e@hirez.programming.kicks-ass.net> <20170607120655.GB11126@vireshk-i7>
In-Reply-To: <20170607120655.GB11126@vireshk-i7>

On Wed, Jun 07, 2017 at 05:36:55PM +0530, Viresh Kumar wrote:
> + Patrick,
> 
> On 01-06-17, 14:22, Peter Zijlstra wrote:
> > On Thu, Jun 01, 2017 at 05:04:27PM +0530, Viresh Kumar wrote:
> > > This patch relocates the call to the utilization hook from
> > > update_cfs_rq_load_avg() to task_tick_fair().
> > 
> > That's not right. Consider hardware where 'setting' the DVFS is a
> > 'cheap' MSR write; doing that once every 10ms (HZ=100) is absurd.
> 
> Yeah, that may be too much for such platforms. Actually we (/me & Vincent)
> were worried about the current location of the utilization update hooks and
> believed that they are getting called way too often. But yeah, this patch
> optimized it way too much.
> 
> One of the goals of this patch was to avoid doing small OPP updates from
> update_load_avg() which can potentially block significant utilization changes
> (and hence big OPP changes) while a task is attached or detached, etc.

To me that sounds like you want to apply a more clever filter to the
utilization updates than the simple rate limiter Peter suggests below.
IMHO, it would be better to feed schedutil all the available information
and improve the filtering policy there, instead of trying to hack the
policy by tweaking the input data.

> > We spoke about this problem in Pisa; the proposed solution was having
> > each driver provide a cost metric and the generic code doing a max
> > filter over the window constructed from that cost metric.

Maybe it is possible to somehow let the rate at which we allow OPP
changes depend on the size of the 'error' delta between the current OPP
and what we need. So radical changes cause OPP changes immediately, and
small corrections have to wait longer?