Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753858AbdCWBEg (ORCPT ); Wed, 22 Mar 2017 21:04:36 -0400
Received: from mail-vk0-f53.google.com ([209.85.213.53]:33133 "EHLO
        mail-vk0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751256AbdCWBE1 (ORCPT );
        Wed, 22 Mar 2017 21:04:27 -0400
MIME-Version: 1.0
In-Reply-To: <2997922.DidfPadJuT@aspire.rjw.lan>
References: <4366682.tsferJN35u@aspire.rjw.lan>
        <2185243.flNrap3qq1@aspire.rjw.lan>
        <3300960.HE4b3sK4dn@aspire.rjw.lan>
        <2997922.DidfPadJuT@aspire.rjw.lan>
From: Joel Fernandes
Date: Wed, 22 Mar 2017 18:04:25 -0700
Message-ID:
Subject: Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
To: "Rafael J. Wysocki"
Cc: Linux PM, Peter Zijlstra, LKML, Srinivas Pandruvada, Viresh Kumar,
        Juri Lelli, Vincent Guittot, Patrick Bellasi, Morten Rasmussen,
        Ingo Molnar, Thomas Gleixner
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2934
Lines: 58

On Tue, Mar 21, 2017 at 4:08 PM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki
>
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
>
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register.  Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again.
> That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
>
> That has been attributed to CPU utilization metric updates on task
> migration that cause the total utilization value for the CPU to be
> reduced by the utilization of the migrated task.  If that happens,
> the schedutil governor may see a CPU utilization reduction and will
> attempt to reduce the CPU frequency accordingly right away.  That
> may be premature, though, for example if the system is generally
> busy and there are other runnable tasks waiting to be run on that
> CPU already.
>
> This is unlikely to be an issue on systems where cpufreq policies are
> shared between multiple CPUs, because in those cases the policy
> utilization is computed as the maximum of the CPU utilization values
> over the whole policy and if that turns out to be low, reducing the
> frequency for the policy most likely is a good idea anyway.  On
> systems with one CPU per policy, however, it may affect performance
> adversely and even lead to increased energy consumption in some cases.
>
> On those systems it may be addressed by taking another utilization
> metric into consideration, like whether or not the CPU whose
> frequency is about to be reduced has been idle recently, because if
> that's not the case, the CPU is likely to be busy in the near future
> and its frequency should not be reduced.
>
> To that end, use the counter of idle calls in the timekeeping code.
> Namely, make the schedutil governor look at that counter for the
> current CPU every time before its frequency is about to be reduced.
> If the counter has not changed since the previous iteration of the
> governor computations for that CPU, the CPU has been busy for all
> that time and its frequency should not be decreased, so if the new
> frequency would be lower than the one set previously, the governor
> will skip the frequency update.
>
> Signed-off-by: Rafael J. Wysocki

Makes sense,

Reviewed-by: Joel Fernandes

Thanks,
Joel
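
For readers following the thread, the check described in the quoted
changelog can be sketched in plain user-space C as below. This is only
an illustration of the idea, not the code from the patch: the struct,
the function names and the way the idle-call counter is passed in are
all hypothetical, and the real governor reads the counter from the
timekeeping code rather than taking it as an argument.

```c
#include <stdbool.h>

/* Hypothetical per-CPU governor state; all names are illustrative only. */
struct sugov_cpu_sketch {
	unsigned long saved_idle_calls;	/* idle-call counter seen at the previous update */
	unsigned int cur_freq;		/* frequency requested last time, in kHz */
};

/*
 * Report whether the CPU stayed busy since the previous governor update,
 * i.e. whether the idle-call counter has not moved.  The saved counter is
 * refreshed on every call so the next update compares against this one.
 */
static bool cpu_was_busy(struct sugov_cpu_sketch *st, unsigned long idle_calls)
{
	bool busy = (idle_calls == st->saved_idle_calls);

	st->saved_idle_calls = idle_calls;
	return busy;
}

/*
 * Pick the frequency to request.  If the CPU has not been idle since the
 * previous update and the newly computed frequency is lower than the one
 * set previously, the reduction is skipped and the old frequency is kept.
 */
static unsigned int next_freq(struct sugov_cpu_sketch *st,
			      unsigned long idle_calls, unsigned int wanted)
{
	bool busy = cpu_was_busy(st, idle_calls);

	if (busy && wanted < st->cur_freq)
		return st->cur_freq;	/* busy CPU: do not reduce */

	st->cur_freq = wanted;
	return wanted;
}
```

With a state of { saved_idle_calls = 10, cur_freq = 3000000 }, a call
with an unchanged counter and a lower target keeps 3000000, while the
same target after the counter has advanced to 11 is accepted.
Frequency increases go through in either case.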