From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, "Rafael J. Wysocki",
    Paul Turner, Vincent Guittot, John Stultz, Morten Rasmussen,
    Dietmar Eggemann, Juri Lelli, Tim Murray, Todd Kjos,
    Andres Oportus, Joel Fernandes, Viresh Kumar
Subject: [RFCv4 0/6] Add utilization clamping to the CPU controller
Date: Thu, 24 Aug 2017 19:08:51 +0100
Message-Id: <20170824180857.32103-1-patrick.bellasi@arm.com>

Was:
 - RFCv3: Add capacity capping support to the CPU controller
 - RFCv2: SchedTune: central, scheduler-driven, power-performance control

This is a respin of the series implementing support for per-task boosting
and capping of CPU frequency. This new version addresses most of the
comments collected since the last posting on LKML [1] and from the
discussions at the OSPM Summit [2].

What follows is a short description of the main changes since the previous
posting [1].

.:: Concept: "capacity clamping" replaced by "utilization clamping"

The previous implementation was expressed in terms of "capacity clamping",
which generated some confusion, mainly because in mainline the capacity is
currently defined as a "constant property" of a CPU. The email in [3]
summarizes the confusion generated by the previous proposal.

As Peter pointed out, the goal of this proposal is to "affect" the util_avg
metric, i.e. the CPU utilization, and the way that signal is used, for
example by schedutil. Thus, both from a conceptual and an implementation
standpoint, it makes much more sense to talk about "utilization clamping".

In this new proposal, the two new attributes added to the CPU controller
allow defining the minimum and maximum utilization which should be
considered for the set of tasks in a group. These utilization clamp values
can be used, for example, to either "boost" or "cap" the actual frequency
selected by schedutil when one of these tasks is RUNNABLE on a CPU. A
proper aggregation mechanism is also provided to handle the cases where
tasks with different utilization clamp values are co-scheduled on the same
CPU.

.:: Implementation: rb-trees replaced by reference counting

The previous implementation used a couple of rb-trees to aggregate the
different clamp values of tasks co-scheduled on the same CPU. Although
simple from the coding standpoint, Peter pointed out that this solution
added non-negligible overheads to the fast path (i.e. task
enqueue/dequeue), especially on highly loaded systems.

This new implementation is based on a much more lightweight mechanism
using reference counting. The new solution just requires {in,de}crementing
an integer counter each time a task is {en,de}queued. The most expensive
operation is now a sequential scan of a small, per-CPU array of integers,
which is also defined to easily fit into a single cache line. A minimal
sketch of this mechanism is shown below.
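For illustration only, here is a minimal C sketch of the reference
counting idea; all names and sizes (uclamp_cpu, uclamp_cpu_{get,put},
UCLAMP_GROUPS) are invented for this example and do not necessarily match
the actual patches:

   /*
    * Illustrative sketch only: names and sizes are made up here and
    * do not necessarily match the patch series.
    */
   #define UCLAMP_GROUPS  5   /* small, sized to fit a single cache line */

   struct uclamp_cpu {
           int group_count[UCLAMP_GROUPS]; /* RUNNABLE tasks per clamp group */
           int group_value[UCLAMP_GROUPS]; /* clamp value of each group */
           int value;                      /* max clamp among active groups */
   };

   /* Fast path: task enqueue */
   static void uclamp_cpu_get(struct uclamp_cpu *uc, int group_id)
   {
           uc->group_count[group_id]++;
           /* A (possibly) higher clamp value just became active */
           if (uc->group_value[group_id] > uc->value)
                   uc->value = uc->group_value[group_id];
   }

   /* Fast path: task dequeue */
   static void uclamp_cpu_put(struct uclamp_cpu *uc, int group_id)
   {
           int value = 0, i;

           uc->group_count[group_id]--;
           /* The current max may have gone inactive: rescan the small array */
           for (i = 0; i < UCLAMP_GROUPS; i++)
                   if (uc->group_count[i] && uc->group_value[i] > value)
                           value = uc->group_value[i];
           uc->value = value;
   }

The point of the sketch is that the enqueue side is a counter increment
plus one compare, while the dequeue side is at worst a scan of a
cache-line-sized array, instead of an rb-tree update on both paths.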
Scheduler performance overheads have been measured using the performance
governor to run 20 iterations of:

   perf bench sched messaging --pipe --thread --group 2 --loop 5000

on a Juno R2 board (4xA53, 2xA72). With this new implementation we cannot
measure any noticeable impact compared to the same benchmark running on
tip/sched/core (as in 9c8783201). For the record, the previous
implementation showed ~1.5% overhead in the same test.

.:: Other comments: use-cases description

People had concerns about use-cases. In a previous posting [4] I
summarized the main use cases we are targeting with this proposal.
Further discussion went on at OSPM, outside of the official tracks, and I
got the feeling that people (at least Peter and Rafael) seem to recognize
the value of supporting both boosting and capping of CPU frequencies,
based on the currently active tasks.

The main use cases discussed were (refer to [4] for further details):

- boosting: better interactive response for small tasks which affect the
  user experience. Consider for example the case of a small control
  thread for an external accelerator (e.g. GPU, DSP, other devices). Here
  the scheduler does not have a complete view of the task's bandwidth
  requirements and, since it's a small task, schedutil will keep selecting
  a lower frequency, thus increasing the overall time required to complete
  its activations.

- capping: increased energy efficiency for background tasks not directly
  affecting the user experience. Since running at a lower frequency is in
  general more energy efficient, when completion time is not a main goal,
  capping the maximum frequency to be used by certain (possibly big) tasks
  can have positive effects on both power dissipation and energy
  consumption. Moreover, this support also allows making RT tasks more
  energy friendly on mobile systems, whenever running them at the maximum
  frequency is not strictly required.

.:: Other comments: usage of CGroups as a main interface

The current implementation is based on cgroups but does not strictly
depend on that API. We do not propose a different main interface simply
because, so far, all the use-cases we have at hand can take advantage of
a cgroups API (notably the Android run-time). Should a different API be
needed, the current implementation can easily be extended to hook its
internals to it. However, we believe it's not worth adding the maintenance
burden of an additional API until there is a real demand for it.

.:: Patches organization

The first three patches of this series introduce util_{min,max} tracking
in the core scheduler, as an extension of the CPU controller. The fourth
patch is dedicated to the synchronization between the cgroup interface
(slow-path) and the core scheduler (fast-path). The last two patches
integrate the utilization clamping support with schedutil, for FAIR tasks
as well as RT/DL tasks. A sketch of how schedutil could consume the clamp
values follows.
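As an illustration of the schedutil integration, here is a minimal sketch
of what the clamping step could look like; uclamp_value() is a
hypothetical helper returning the aggregated clamp value for a CPU, and
is not necessarily the interface used by the patches:

   #include <linux/kernel.h>   /* for the clamp() macro */

   enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX };

   /* Hypothetical helper: aggregated clamp value for @cpu */
   unsigned long uclamp_value(int cpu, enum uclamp_id clamp_id);

   /*
    * Clamp the utilization seen by the frequency selection logic:
    * util_min "boosts" small tasks, util_max "caps" background ones.
    */
   static unsigned long uclamp_util(int cpu, unsigned long util)
   {
           unsigned long min_util = uclamp_value(cpu, UCLAMP_MIN);
           unsigned long max_util = uclamp_value(cpu, UCLAMP_MAX);

           return clamp(util, min_util, max_util);
   }

The net effect is that schedutil selects a frequency as if the CPU
utilization were never below util_min nor above util_max while the
clamped tasks are RUNNABLE.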
A detailed validation and analysis of the proposed features is available
in this notebook:

   https://gist.github.com/7f9170e613dea25fe248e14157e6cb23

Cheers,
Patrick

.:: References

[1] https://lkml.org/lkml/2017/2/28/355
[2] slides: http://retis.sssup.it/ospm-summit/Downloads/OSPM_PELT_DecayClampingVsUtilEst.pdf
    video:  http://youtu.be/6MC1jbYbQTo
[3] https://lkml.org/lkml/2017/4/11/670
[4] https://lkml.org/lkml/2017/3/20/688

Patrick Bellasi (6):
  sched/core: add utilization clamping to CPU controller
  sched/core: map cpu's task groups to clamp groups
  sched/core: reference count active tasks's clamp groups
  sched/core: sync task_group's with CPU's clamp groups
  cpufreq: schedutil: add util clamp for FAIR tasks
  cpufreq: schedutil: add util clamp for RT/DL tasks

 include/linux/sched.h            |  12 +
 init/Kconfig                     |  36 ++
 kernel/sched/core.c              | 706 +++++++++++++++++++++++++++++++++++++++
 kernel/sched/cpufreq_schedutil.c |  49 ++-
 kernel/sched/sched.h             | 199 +++++++++++
 5 files changed, 998 insertions(+), 4 deletions(-)

--
2.14.1