From: Patrick Bellasi
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Steve Muckle, Leo Yan,
    Viresh Kumar, "Rafael J. Wysocki", Todd Kjos, Srinath Sridharan,
    Andres Oportus, Juri Lelli, Morten Rasmussen, Dietmar Eggemann,
    Chris Redpath, Robin Randhawa, Patrick Bellasi
Subject: [RFC v2 0/8] SchedTune: central, scheduler-driven, power-performance control
Date: Thu, 27 Oct 2016 18:41:00 +0100
Message-Id: <20161027174108.31139-1-patrick.bellasi@arm.com>

This RFC is an update to the initial SchedTune proposal [1] for a central,
scheduler-driven power-performance control. The posting is being made
ahead of the LPC to facilitate discussions there.

The initial proposal was refined, eventually merged into the AOSP, and is
currently used in production mobile devices [*].

This series is a scaled-down version of the complete solution, posted to
restart discussions. The focus is on a suitable user-space <-> kernel-space
interface for tuning the scheduler’s behavior at run-time. Specifically,
the intention is to highlight how the proposed interface can be used by
the scheduler to bias the selection of the CPU's operating frequency
depending on information injected from user-space.

Patch Set Organization
======================

The concept of a simple power-performance tunable that is wholly
scheduler-centric is implemented by patches [01-04].
This is where we introduce a ‘global task boosting’ knob, which is
integrated with schedutil to allow the scheduler to bias OPP selection.
These first 4 patches allow schedutil to be tuned dynamically up to the
point where it behaves like the existing ‘performance’ governor.

Patches [05-07] extend the basic mechanism to use different boost values
for different tasks. This allows informed runtimes (e.g. Android and
ChromeOS) to feed the scheduler with their knowledge about the specific
demand of different tasks and/or use-cases.

Through SchedTune’s interface, the scheduler is now able to collect simple
yet powerful information about tasks: how much the user cares about their
performance. Although it can be argued that something similar is already
provided by the existing concept of task priority, we believe that the
proposed interface is much more generic and can be further extended to
support both OPP selection and task placement, thus leading in the future
to a more comprehensive energy-aware, scheduler-driven solution.

These patches enable schedutil to service interactive workloads, such as
touch screen interaction, which thus far only out-of-tree cpufreq
governors (like the Interactive governor) were able to service.

The last patch in the series introduces the concept of ‘negative
boosting’. Negative boosting is beneficial for mobile devices in scenarios
where it is desirable to intentionally reduce the performance of a task by
running it at a lower frequency than the one selected by schedutil. For
certain tasks, like compute-intensive background operations or
memory-bound tasks, negative boosting can yield measurable energy savings.
In these cases, a negative SchedTune value biases schedutil towards the
selection of a lower OPP. Importantly, this is achieved using the same
SchedTune interface. This patch allows schedutil to be tuned dynamically
up to the point where it effectively replaces the “powersave” governor.
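To make the OPP biasing more concrete, here is a minimal user-space C
sketch of one way a boost value could be turned into a utilization margin
and then into a schedutil-style frequency request. It is not the code of
these patches. It assumes a "signal proportional" margin (a positive boost
adds that percentage of the spare capacity up to SCHED_CAPACITY_SCALE, a
negative boost removes that percentage of the utilization), a
frequency-invariant utilization in [0, 1024], schedutil's 1.25 headroom
factor, and a per-CPU boost taken as the maximum boost among the boost
groups that currently have RUNNABLE tasks on that CPU. The helpers
clamp_boost(), boost_margin(), boosted_util(), next_freq(), cpu_boost()
and struct boost_group are hypothetical names used only for illustration.

```c
/*
 * Illustrative model only, NOT the SchedTune patch code. All names are
 * hypothetical; utilization is assumed frequency-invariant in [0, 1024].
 */
#include <stdint.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Clamp a boost percentage into the assumed [-100, 100] range. */
static int clamp_boost(int boost)
{
	if (boost > 100)
		return 100;
	if (boost < -100)
		return -100;
	return boost;
}

/*
 * Assumed "signal proportional" margin:
 *   boost >= 0: add boost% of the spare capacity (SCALE - util)
 *   boost <  0: remove |boost|% of the utilization itself
 */
static long boost_margin(unsigned long util, int boost)
{
	boost = clamp_boost(boost);
	if (boost >= 0)
		return (long)((SCHED_CAPACITY_SCALE - util) * (unsigned long)boost / 100);
	return -(long)(util * (unsigned long)(-boost) / 100);
}

/* Utilization seen by frequency selection once the margin is applied. */
static unsigned long boosted_util(unsigned long util, int boost)
{
	long u = (long)util + boost_margin(util, boost);

	return u < 0 ? 0 : (unsigned long)u;
}

/* schedutil-like request: 1.25 * max_freq * util / SCALE, capped at max_freq. */
static uint64_t next_freq(unsigned long util, uint64_t max_freq)
{
	uint64_t freq = (max_freq + (max_freq >> 2)) * util / SCHED_CAPACITY_SCALE;

	return freq > max_freq ? max_freq : freq;
}

/* Hypothetical per-CPU view of a boost group: its boost and RUNNABLE tasks. */
struct boost_group {
	int boost;
	unsigned int nr_tasks;
};

/*
 * Per-CPU boost: the maximum boost among groups with RUNNABLE tasks on
 * this CPU, or 0 when no group is active.
 */
static int cpu_boost(const struct boost_group *groups, int nr_groups)
{
	int boost_max = 0;
	int found = 0;

	for (int i = 0; i < nr_groups; i++) {
		if (groups[i].nr_tasks == 0)
			continue; /* inactive groups do not affect this CPU */
		if (!found || groups[i].boost > boost_max)
			boost_max = groups[i].boost;
		found = 1;
	}
	return boost_max;
}
```

For example, with max_freq = 1200000 kHz (the HiKey's 1.2GHz) and a CPU
at half capacity (util = 512), next_freq() requests 750000 kHz with no
boost and 1125000 kHz with a 50% boost. A 100% boost saturates the request
at max_freq, mimicking the ‘performance’ governor, while a -100% boost
drives it to 0, which a real cpufreq driver would resolve to the lowest
OPP. Taking the maximum over active groups is what lets other, not
negatively boosted, tasks sharing the CPU pull the frequency back up.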
The patches are based on tip/sched/core:

   a225023 - sched/core: Explain sleep/wakeup in a better way

For testing purposes, an integration branch providing the required
dependencies, as well as a set of debugging tracepoints, is available
here:

   git://www.linux-arm.com/linux-pb eas/stune/rfcv2

Test results
============

Extensive testing of the proposed solution has already been done, as
SchedTune is shipping on a production mobile device, with benefits
observed for key use-cases (e.g. improved responsiveness and performance
of key workloads).

The following focused synthetic tests are used to show functional
benefits and report overheads. All tests have been performed on a HiKey
board, an octa-core (ARM Cortex-A53 @1.2GHz) SMP platform, running a
Debian image on a mainline kernel and using schedutil configured with a
1ms rate limit value.

Performance boosting validation
-------------------------------

The functional validation of the boost mechanism has been performed using
a ramp task generated with the rt-app provided by the LISA testing suite
[2]. The ramp is configured as a 16ms periodic task which increases its
utilization by 5% every second, starting from 5% up to 60%. The task is
pinned to a single CPU and executed with different boost values: 0%, 15%,
30%, 60% and -100%.

The following table reports:
- the value used to boost the task in each experiment
- the rt-app reported performance index, PerfIndex Avg (the higher the
  better), which expresses the average time left from the completion of a
  task activation (i.e. a fixed amount of work) until its next activation
- the CPU average frequency (FreqAvg)
- the actual boost measured for the PerfIndex and FreqAvg

  Boost | PerfIndex       | Actual | FreqAvg | Actual
  value |   Avg     Std   | Boost  |  [MHz]  | Boost
--------+-----------------+--------+---------+-------
      0 |   0.53    0.12  |    0%  |   606   |    0%
     15 |   0.61    0.07  |   17%  |   658   |    9%
     30 |   0.68    0.07  |   26%  |   739   |   22%
     60 |   0.71    0.05  |   40%  |   852   |   41%
   -100 | -98.84  120.00  |  -2K%  |   363   |  -36%

For positive boost values, SchedTune can improve the performance of a
task (i.e. reduce its time to completion) by an amount roughly
proportional to the boost value. This is reflected by the increasingly
higher values of PerfIndex Avg, as well as by the average frequencies used
to execute the task.

For negative boost values, performance is progressively reduced. In the
reported case of a -100% boost, we verified that the system runs most of
the time at one of the lowest OPPs (thus providing a behavior similar to
the powersave governor), while still running at higher OPPs when other
(not negatively boosted) tasks need to run. That is why the reported
average frequency (363MHz) is higher than the minimum OPP (208MHz).

A graphical representation of the task's behavior at different boost
values, and of the corresponding CPU frequencies, is available here:

   https://gist.github.com/derkling/8be0a8ac365c935b3df585cb24afec6c

Impact on scheduler performance
-------------------------------

Performance impact has been evaluated using the hackbench test provided by
perf, with this command line:

   perf bench sched messaging --thread --group 25 --loop 1000

Reported completion times (CTime), in seconds, are averages over 10 runs:

                  |           |   SchedTune (per-task) boost value   |
                  | Schedutil |     0%     |    10%     |    90%     |
------------------+-----------+------------+------------+------------+
CTime [s]         |   12.93   |   13.08    |   13.32    |   13.27    |
vs Schedutil [%]  |           |    1.1%    |    3.0%    |    2.7%    |

SchedTune currently introduces overheads when used on saturated systems,
such as the one generated by running the hackbench test. This is possibly
due to the locking scheme currently used, which can be further optimized.

On the other hand, the SchedTune extension is mainly useful for lightly
loaded systems (mobile devices, laptops, etc.), where the additional
overhead has been verified to be compensated by the performance benefits
due to, for example, faster task completion. Some of these benefits are
reported in the "Performance boosting validation" section above.
ChangeLog
=========

Changes since v1:
 - Rebased on tip/sched/core:
     a225023 sched/core: Explain sleep/wakeup in a better way
 - Integrated with schedutil (replacing SchedFreq)
 - Improved task accounting for correct boost group activation
 - Added support for negative boosting
 - Extensively tested on production-grade devices

Credits
=======

[*] This work has been supported by an extensive collaborative effort
    between ARM, Linaro and Google, targeting production devices.

References
==========

[1] https://lkml.org/lkml/2015/8/19/419
[2] https://github.com/ARM-software/lisa

Patrick Bellasi (8):
  sched/tune: add detailed documentation
  sched/tune: add sysctl interface to define a boost value
  sched/fair: add function to convert boost value into "margin"
  sched/fair: add boosted CPU usage
  sched/tune: add initial support for CGroups based boosting
  sched/tune: compute and keep track of per CPU boost value
  sched/{fair,tune}: track RUNNABLE tasks impact on per CPU boost value
  sched/{fair,tune}: add support for negative boosting

 Documentation/scheduler/sched-tune.txt | 426 +++++++++++++++++++++++++
 include/linux/cgroup_subsys.h          |   4 +
 include/linux/sched/sysctl.h           |  16 +
 init/Kconfig                           |  73 +++++
 kernel/exit.c                          |   5 +
 kernel/sched/Makefile                  |   1 +
 kernel/sched/cpufreq_schedutil.c       |   4 +-
 kernel/sched/fair.c                    | 119 +++++++
 kernel/sched/sched.h                   |   2 +
 kernel/sched/tune.c                    | 561 +++++++++++++++++++++++++++++++++
 kernel/sched/tune.h                    |  40 +++
 kernel/sysctl.c                        |  16 +
 12 files changed, 1265 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/scheduler/sched-tune.txt
 create mode 100644 kernel/sched/tune.c
 create mode 100644 kernel/sched/tune.h

-- 
2.10.1