Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S942210AbcJ0Rlc (ORCPT ); Thu, 27 Oct 2016 13:41:32 -0400
Received: from foss.arm.com ([217.140.101.70]:43262 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S935747AbcJ0Rl0 (ORCPT ); Thu, 27 Oct 2016 13:41:26 -0400
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Steve Muckle, Leo Yan,
	Viresh Kumar, "Rafael J . Wysocki", Todd Kjos, Srinath Sridharan,
	Andres Oportus, Juri Lelli, Morten Rasmussen, Dietmar Eggemann,
	Chris Redpath, Robin Randhawa, Patrick Bellasi, Ingo Molnar
Subject: [RFC v2 2/8] sched/tune: add sysctl interface to define a boost value
Date: Thu, 27 Oct 2016 18:41:02 +0100
Message-Id: <20161027174108.31139-3-patrick.bellasi@arm.com>
X-Mailer: git-send-email 2.10.1
In-Reply-To: <20161027174108.31139-1-patrick.bellasi@arm.com>
References: <20161027174108.31139-1-patrick.bellasi@arm.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5938
Lines: 172

The current (CFS) scheduler implementation does not allow "boosting"
task performance by running tasks at a higher OPP than the minimum
required to meet their workload demands.

To support task performance boosting, the scheduler should provide a
"knob" that tunes how much the system is optimised for energy
efficiency versus performance boosting. Here, energy efficiency means
running a CPU at the minimum OPP which satisfies its utilization, while
performance boosting means running a task as fast as possible.

This patch is the first of a series which provides a simple interface
to define a tuning knob.
One system-wide "boost" tunable is exposed via:
  /proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
  0%   boost operates in "standard" mode, scheduling tasks at the
       minimum capacities required by the workload demand
  100% boost pushes task performance to the maximum, "regardless" of
       the incurred energy consumption

A boost value in between these two boundaries is used to bias the
power/performance trade-off: the higher the boost value, the more the
scheduler is biased toward performance boosting instead of energy
efficiency.

Cc: Ingo Molnar
Cc: Peter Zijlstra
Signed-off-by: Patrick Bellasi
---
 include/linux/sched/sysctl.h | 16 ++++++++++++++++
 init/Kconfig                 | 31 +++++++++++++++++++++++++++++++
 kernel/sched/Makefile        |  1 +
 kernel/sched/tune.c          | 23 +++++++++++++++++++++++
 kernel/sysctl.c              | 11 +++++++++++
 5 files changed, 82 insertions(+)
 create mode 100644 kernel/sched/tune.c

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 4411453..5bfbb14 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -55,6 +55,22 @@ extern int sysctl_sched_rt_runtime;
 extern unsigned int sysctl_sched_cfs_bandwidth_slice;
 #endif
 
+#ifdef CONFIG_SCHED_TUNE
+extern unsigned int sysctl_sched_cfs_boost;
+int sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
+				   void __user *buffer, size_t *length,
+				   loff_t *ppos);
+static inline unsigned int get_sysctl_sched_cfs_boost(void)
+{
+	return sysctl_sched_cfs_boost;
+}
+#else
+static inline unsigned int get_sysctl_sched_cfs_boost(void)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SCHED_AUTOGROUP
 extern unsigned int sysctl_sched_autogroup_enabled;
 #endif
diff --git a/init/Kconfig b/init/Kconfig
index 34407f1..461e052 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1248,6 +1248,37 @@ config SCHED_AUTOGROUP
 	  desktop applications.  Task group autogeneration is currently based
 	  upon task session.
 
+config SCHED_TUNE
+	bool "Boosting for CFS tasks (EXPERIMENTAL)"
+	depends on SMP
+	help
+	  This option enables the system-wide support for task boosting.
+	  When this support is enabled a new sysctl interface is exposed to
+	  user-space via:
+	     /proc/sys/kernel/sched_cfs_boost
+	  which allows setting a system-wide boost value in range [0..100].
+
+	  The current boosting strategy is implemented in such a way that:
+	  - a 0% boost value operates in "standard" mode by
+	    scheduling all tasks at the minimum capacities required by their
+	    workload demand
+	  - a 100% boost value pushes task performance to the
+	    maximum, "regardless" of the incurred energy consumption
+
+	  A boost value in between these two boundaries is used to bias the
+	  power/performance trade-off: the higher the boost value, the more
+	  the scheduler is biased toward performance boosting instead of
+	  energy efficiency.
+
+	  Since this support exposes a single system-wide knob, the specified
+	  boost value is applied to all (CFS) tasks in the system.
+
+	  NOTE: SchedTune support is available only on SMP systems, since
+	  the utilization signal for RQs and SEs is currently defined and
+	  tracked only on those systems.
+
+	  If unsure, say N.
+
 config SYSFS_DEPRECATED
 	bool "Enable deprecated sysfs features to support old userspace tools"
 	depends on SYSFS
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 5e59b83..26ab2a6 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
+obj-$(CONFIG_SCHED_TUNE) += tune.o
 obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
 obj-$(CONFIG_CPU_FREQ) += cpufreq.o
 obj-$(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) += cpufreq_schedutil.o
diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
new file mode 100644
index 0000000..7336118
--- /dev/null
+++ b/kernel/sched/tune.c
@@ -0,0 +1,23 @@
+/*
+ * Scheduler Tunability (SchedTune) Extensions for CFS
+ *
+ * Copyright (C) 2016 ARM Ltd, Patrick Bellasi
+ */
+
+#include "sched.h"
+
+unsigned int sysctl_sched_cfs_boost __read_mostly;
+
+int
+sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
+			       void __user *buffer, size_t *lenp,
+			       loff_t *ppos)
+{
+	int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	if (ret || !write)
+		return ret;
+
+	return 0;
+}
+
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 739fb17..43b6d14 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -442,6 +442,17 @@ static struct ctl_table kern_table[] = {
 		.extra1		= &one,
 	},
 #endif
+#ifdef CONFIG_SCHED_TUNE
+	{
+		.procname	= "sched_cfs_boost",
+		.data		= &sysctl_sched_cfs_boost,
+		.maxlen		= sizeof(sysctl_sched_cfs_boost),
+		.mode		= 0644,
+		.proc_handler	= &sysctl_sched_cfs_boost_handler,
+		.extra1		= &zero,
+		.extra2		= &one_hundred,
+	},
+#endif
 #ifdef CONFIG_PROVE_LOCKING
 	{
 		.procname	= "prove_locking",
-- 
2.10.1
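
For anyone who wants to exercise the knob from user-space, here is a
minimal sketch. It assumes a kernel built with CONFIG_SCHED_TUNE, so
that /proc/sys/kernel/sched_cfs_boost exists. The helper names
set_cfs_boost() and get_cfs_boost() are hypothetical and not part of
this patch. The path is a parameter so the helpers can be tried against
a regular file before touching the real sysctl.

```c
#include <errno.h>
#include <stdio.h>

/* Path taken from the patch; present only with CONFIG_SCHED_TUNE=y. */
#define SCHED_CFS_BOOST_PATH "/proc/sys/kernel/sched_cfs_boost"

/*
 * Hypothetical helper: write a boost percentage to `path`.
 * Rejects values outside [0..100] up front, mirroring the range the
 * sysctl table enforces via extra1/extra2.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int set_cfs_boost(const char *path, unsigned int boost)
{
	FILE *f;
	int ok;

	if (boost > 100) {
		errno = EINVAL;
		return -1;
	}

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", boost) > 0;
	/* A rejected write may only be reported when the stream is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: read the current boost percentage from `path`.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int get_cfs_boost(const char *path, unsigned int *boost)
{
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -1;

	n = fscanf(f, "%u", boost);
	fclose(f);

	if (n != 1) {
		errno = EIO;
		return -1;
	}
	return 0;
}
```

A caller with sufficient privileges (the entry is created with mode
0644) would use set_cfs_boost(SCHED_CFS_BOOST_PATH, 10) to request a
10% boost, then get_cfs_boost() to read the value back.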