Date: Wed, 22 Mar 2017 10:26:46 +0100
From: Peter Zijlstra
To: "Rafael J. Wysocki"
Cc: Linux PM, LKML, Srinivas Pandruvada, Viresh Kumar, Juri Lelli,
    Vincent Guittot, Patrick Bellasi, Joel Fernandes, Morten Rasmussen,
    Ingo Molnar, Thomas Gleixner
Subject: Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
Message-ID: <20170322092646.GW5680@worktop>
References: <4366682.tsferJN35u@aspire.rjw.lan> <2185243.flNrap3qq1@aspire.rjw.lan> <3300960.HE4b3sK4dn@aspire.rjw.lan> <2997922.DidfPadJuT@aspire.rjw.lan>
In-Reply-To: <2997922.DidfPadJuT@aspire.rjw.lan>

On Wed, Mar 22, 2017 at 12:08:50AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki
>
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register. Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again.
> That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> That has been attributed to CPU utilization metric updates on task
> migration that cause the total utilization value for the CPU to be
> reduced by the utilization of the migrated task. If that happens,
> the schedutil governor may see a CPU utilization reduction and will
> attempt to reduce the CPU frequency accordingly right away. That
> may be premature, though, for example if the system is generally
> busy and there are other runnable tasks waiting to be run on that
> CPU already.
>
> This is unlikely to be an issue on systems where cpufreq policies are
> shared between multiple CPUs, because in those cases the policy
> utilization is computed as the maximum of the CPU utilization values
> over the whole policy and if that turns out to be low, reducing the
> frequency for the policy most likely is a good idea anyway. On
> systems with one CPU per policy, however, it may affect performance
> adversely and even lead to increased energy consumption in some cases.
>
> On those systems it may be addressed by taking another utilization
> metric into consideration, like whether or not the CPU whose
> frequency is about to be reduced has been idle recently, because if
> that's not the case, the CPU is likely to be busy in the near future
> and its frequency should not be reduced.
>
> To that end, use the counter of idle calls in the timekeeping code.
> Namely, make the schedutil governor look at that counter for the
> current CPU every time before its frequency is about to be reduced.
> If the counter has not changed since the previous iteration of the
> governor computations for that CPU, the CPU has been busy for all
> that time and its frequency should not be decreased, so if the new
> frequency would be lower than the one set previously, the governor
> will skip the frequency update.
>
> Signed-off-by: Rafael J. Wysocki

Right; this makes sense to me.

Of course it would be good to have some more measurements on this, but
in principle:

Acked-by: Peter Zijlstra (Intel)