From: Joel Fernandes
To: linux-kernel@vger.kernel.org
Cc: Juri Lelli, Patrick Bellasi, Andres Oportus, Dietmar Eggemann,
    Joel Fernandes, Srinivas Pandruvada, Len Brown, "Rafael J . Wysocki",
    Viresh Kumar, Ingo Molnar, Peter Zijlstra
Subject: [PATCH RFC v5] cpufreq: schedutil: Make iowait boost more energy efficient
Date: Sun, 16 Jul 2017 01:04:07 -0700
Message-Id: <20170716080407.28492-1-joelaf@google.com>

Currently the iowait_boost feature in schedutil makes the frequency go to
max on iowait wakeups. This feature was added to handle a case that Peter
described where the throughput of operations involving continuous I/O
requests [1] is reduced due to running at a lower frequency. The lower
throughput in turn keeps utilization low, which keeps the frequency low,
hence it's "stuck".

Instead of going straight to max, it's also possible to achieve the same
effect by ramping up to max if there are repeated in_iowait wakeups. This
patch is an attempt to do that. We start from a lower frequency
(policy->min) and double the boost for every consecutive iowait update
until we reach the maximum iowait boost frequency (iowait_boost_max).

I ran a synthetic test (continuous O_DIRECT writes in a loop) on an x86
machine with intel_pstate in passive mode using schedutil. In this test
the iowait_boost value ramped from 800MHz to 4GHz in 60ms. The patch
achieves the same improved throughput as the existing behavior.

Also, while at it, make iowait_boost and iowait_boost_max unsigned int
since their unit is kHz and this is consistent with struct cpufreq_policy.

[1] https://patchwork.kernel.org/patch/9735885/

Cc: Srinivas Pandruvada
Cc: Len Brown
Cc: Rafael J. Wysocki
Cc: Viresh Kumar
Cc: Ingo Molnar
Cc: Peter Zijlstra
Suggested-by: Peter Zijlstra
Signed-off-by: Joel Fernandes
---
This version is based on some ideas from Viresh and Juri in v4. Viresh, one
difference from the idea we just discussed is that I am scaling the boost
up/down only after consuming it. This has the effect of slightly delaying
the "deboost" but achieves the same boost ramp time. It's cleaner in the
code IMO to avoid scaling up and then down on the initial boost.

Note that I also dropped iowait_boost_min and now I'm just starting the
initial boost from policy->min since, as I mentioned in the commit message
above, the ramp of the iowait_boost value is very quick and it works fine
for the use case it's intended for. Hope this is acceptable. Thanks.

 kernel/sched/cpufreq_schedutil.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 622eed1b7658..4225bbada88d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -53,8 +53,9 @@ struct sugov_cpu {
 	struct update_util_data update_util;
 	struct sugov_policy *sg_policy;
 
-	unsigned long iowait_boost;
-	unsigned long iowait_boost_max;
+	bool iowait_boost_pending;
+	unsigned int iowait_boost;
+	unsigned int iowait_boost_max;
 	u64 last_update;
 
 	/* The fields below are only needed when sharing a policy. */
@@ -172,30 +173,43 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
 				   unsigned int flags)
 {
 	if (flags & SCHED_CPUFREQ_IOWAIT) {
-		sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
+		sg_cpu->iowait_boost_pending = true;
+		sg_cpu->iowait_boost = max(sg_cpu->iowait_boost,
+					   sg_cpu->sg_policy->policy->min);
 	} else if (sg_cpu->iowait_boost) {
 		s64 delta_ns = time - sg_cpu->last_update;
 
 		/* Clear iowait_boost if the CPU apprears to have been idle. */
-		if (delta_ns > TICK_NSEC)
+		if (delta_ns > TICK_NSEC) {
 			sg_cpu->iowait_boost = 0;
+			sg_cpu->iowait_boost_pending = false;
+		}
 	}
 }
 
 static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
 			       unsigned long *max)
 {
-	unsigned long boost_util = sg_cpu->iowait_boost;
-	unsigned long boost_max = sg_cpu->iowait_boost_max;
+	unsigned long boost_util, boost_max;
 
-	if (!boost_util)
+	if (!sg_cpu->iowait_boost)
 		return;
 
+	boost_util = sg_cpu->iowait_boost;
+	boost_max = sg_cpu->iowait_boost_max;
+
 	if (*util * boost_max < *max * boost_util) {
 		*util = boost_util;
 		*max = boost_max;
 	}
-	sg_cpu->iowait_boost >>= 1;
+
+	if (sg_cpu->iowait_boost_pending) {
+		sg_cpu->iowait_boost_pending = false;
+		sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
+					   sg_cpu->iowait_boost_max);
+	} else {
+		sg_cpu->iowait_boost >>= 1;
+	}
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -267,6 +281,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		delta_ns = time - j_sg_cpu->last_update;
 		if (delta_ns > TICK_NSEC) {
 			j_sg_cpu->iowait_boost = 0;
+			j_sg_cpu->iowait_boost_pending = false;
 			continue;
 		}
 		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
-- 
2.13.2.932.g7449e964c-goog
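
To see the ramp in isolation, here is a minimal userspace sketch of the
boost state machine described in the commit message. It is not kernel code
and not part of the patch: struct boost_state, note_iowait_wakeup(),
consume_boost() and updates_until_max() are hypothetical names chosen for
illustration, and the sketch deliberately leaves out the utilization
comparison and the TICK_NSEC idle reset that the real code performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the iowait fields of struct sugov_cpu. */
struct boost_state {
	bool pending;            /* an iowait wakeup has not been consumed yet */
	unsigned int boost;      /* current boost in kHz, 0 means no boost */
	unsigned int boost_max;  /* ceiling in kHz (iowait_boost_max) */
	unsigned int policy_min; /* starting point in kHz (policy->min) */
};

/* Models the SCHED_CPUFREQ_IOWAIT branch of sugov_set_iowait_boost(). */
static void note_iowait_wakeup(struct boost_state *s)
{
	s->pending = true;
	if (s->boost < s->policy_min)
		s->boost = s->policy_min;
}

/*
 * Models the tail of sugov_iowait_boost(): return the boost used for this
 * update, then scale the stored value for the next one. The boost doubles
 * (capped at boost_max) if a wakeup was pending, and halves otherwise.
 */
static unsigned int consume_boost(struct boost_state *s)
{
	unsigned int applied = s->boost;

	if (!applied)
		return 0;

	if (s->pending) {
		unsigned int doubled = s->boost << 1;

		s->pending = false;
		s->boost = doubled < s->boost_max ? doubled : s->boost_max;
	} else {
		s->boost >>= 1;
	}
	return applied;
}

/* Count consecutive iowait updates until boost_max is the applied boost. */
static unsigned int updates_until_max(unsigned int min_khz,
				      unsigned int max_khz)
{
	struct boost_state s = { false, 0, max_khz, min_khz };
	unsigned int n = 0;

	assert(min_khz > 0 && min_khz <= max_khz);
	for (;;) {
		note_iowait_wakeup(&s);
		n++;
		if (consume_boost(&s) == max_khz)
			return n;
	}
}
```

Under these assumptions, with policy->min at 800000 kHz and
iowait_boost_max at 4000000 kHz, the applied boost goes 800000, 1600000,
3200000, 4000000, so updates_until_max(800000, 4000000) returns 4. Because
scaling happens only after the boost is consumed, the first update uses
policy->min unchanged, and an update with no pending wakeup halves the
stored boost.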