Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752894AbdCTD5w (ORCPT ); Sun, 19 Mar 2017 23:57:52 -0400 Received: from mail-pg0-f48.google.com ([74.125.83.48]:34611 "EHLO mail-pg0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752467AbdCTD5t (ORCPT ); Sun, 19 Mar 2017 23:57:49 -0400 Date: Mon, 20 Mar 2017 09:27:45 +0530 From: Viresh Kumar To: "Rafael J. Wysocki" Cc: Linux PM , LKML , Peter Zijlstra , Srinivas Pandruvada , Juri Lelli , Vincent Guittot , Patrick Bellasi , Joel Fernandes , Morten Rasmussen , Ingo Molnar Subject: Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs Message-ID: <20170320035745.GC25659@vireshk-i7> References: <4366682.tsferJN35u@aspire.rjw.lan> <2185243.flNrap3qq1@aspire.rjw.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2185243.flNrap3qq1@aspire.rjw.lan> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5080 Lines: 141 On 19-03-17, 14:34, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki > > The PELT metric used by the schedutil governor underestimates the > CPU utilization in some cases. The reason for that may be time spent > in interrupt handlers and similar which is not accounted for by PELT. > > That can be easily demonstrated by running kernel compilation on > a Sandy Bridge Intel processor, running turbostat in parallel with > it and looking at the values written to the MSR_IA32_PERF_CTL > register. Namely, the expected result would be that when all CPUs > were 100% busy, all of them would be requested to run in the maximum > P-state, but observation shows that this clearly isn't the case. > The CPUs run in the maximum P-state for a while and then are > requested to run slower and go back to the maximum P-state after > a while again. That causes the actual frequency of the processor to > visibly oscillate below the sustainable maximum in a jittery fashion > which clearly is not desirable. > > To work around this issue use the observation that, from the > schedutil governor's perspective, CPUs that are never idle should > always run at the maximum frequency and make that happen. > > To that end, add a counter of idle calls to struct sugov_cpu and > modify cpuidle_idle_call() to increment that counter every time it > is about to put the given CPU into an idle state. Next, make the > schedutil governor look at that counter for the current CPU every > time before it is about to start heavy computations. If the counter > has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms), > the CPU has not been idle for at least that long and the governor > will choose the maximum frequency for it without looking at the PELT > metric at all. Looks like we are fixing a PELT problem with a schedutil Hack :) > Signed-off-by: Rafael J. Wysocki > --- > include/linux/sched/cpufreq.h | 6 ++++++ > kernel/sched/cpufreq_schedutil.c | 38 ++++++++++++++++++++++++++++++++++++-- > kernel/sched/idle.c | 3 +++ > 3 files changed, 45 insertions(+), 2 deletions(-) > > Index: linux-pm/kernel/sched/cpufreq_schedutil.c > =================================================================== > --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c > +++ linux-pm/kernel/sched/cpufreq_schedutil.c > @@ -20,6 +20,7 @@ > #include "sched.h" > > #define SUGOV_KTHREAD_PRIORITY 50 > +#define SUGOV_BUSY_THRESHOLD (50 * NSEC_PER_MSEC) > > struct sugov_tunables { > struct gov_attr_set attr_set; > @@ -55,6 +56,9 @@ struct sugov_cpu { > > unsigned long iowait_boost; > unsigned long iowait_boost_max; > + unsigned long idle_calls; > + unsigned long saved_idle_calls; > + u64 busy_start; > u64 last_update; > > /* The fields below are only needed when sharing a policy. */ > @@ -192,6 +196,34 @@ static void sugov_iowait_boost(struct su > sg_cpu->iowait_boost >>= 1; > } > > +void cpufreq_schedutil_idle_call(void) > +{ > + struct sugov_cpu *sg_cpu = this_cpu_ptr(&sugov_cpu); > + > + sg_cpu->idle_calls++; > +} > + > +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) > +{ > + if (sg_cpu->idle_calls != sg_cpu->saved_idle_calls) { > + sg_cpu->busy_start = 0; > + return false; > + } > + > + if (!sg_cpu->busy_start) { > + sg_cpu->busy_start = sg_cpu->last_update; > + return false; > + } > + > + return sg_cpu->last_update - sg_cpu->busy_start > SUGOV_BUSY_THRESHOLD; > +} > + > +static void sugov_save_idle_calls(struct sugov_cpu *sg_cpu) > +{ > + if (!sg_cpu->busy_start) > + sg_cpu->saved_idle_calls = sg_cpu->idle_calls; Why aren't we doing this in sugov_cpu_is_busy() itself ? And isn't it possible for idle_calls to get incremented by this time? > +} > + > static void sugov_update_single(struct update_util_data *hook, u64 time, > unsigned int flags) > { > @@ -207,7 +239,7 @@ static void sugov_update_single(struct u > if (!sugov_should_update_freq(sg_policy, time)) > return; > > - if (flags & SCHED_CPUFREQ_RT_DL) { > + if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu)) { > next_f = policy->cpuinfo.max_freq; > } else { > sugov_get_util(&util, &max); > @@ -215,6 +247,7 @@ static void sugov_update_single(struct u > next_f = get_next_freq(sg_policy, util, max); > } > sugov_update_commit(sg_policy, time, next_f); > + sugov_save_idle_calls(sg_cpu); > } > > static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu) > @@ -278,12 +311,13 @@ static void sugov_update_shared(struct u > sg_cpu->last_update = time; > > if (sugov_should_update_freq(sg_policy, time)) { > - if (flags & SCHED_CPUFREQ_RT_DL) > + if ((flags & SCHED_CPUFREQ_RT_DL) || sugov_cpu_is_busy(sg_cpu)) What about others CPUs in this policy? > next_f = sg_policy->policy->cpuinfo.max_freq; > else > next_f = sugov_next_freq_shared(sg_cpu); > > sugov_update_commit(sg_policy, time, next_f); > + sugov_save_idle_calls(sg_cpu); > } > > raw_spin_unlock(&sg_policy->update_lock); -- viresh