2020-05-07 18:12:17

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 00/14] Modularize schedutil

Android is trying very hard to use a single kernel image (commonly
called Generic Kernel Image, or GKI), closely aligned with mainline, to
run on all Android devices regardless of the vendor.

The GKI project intends to not only improve the status quo for Android
users directly (less fragmentation simplifies updatability), but also
to benefit upstream by forcing all vendors to agree on one common
kernel, that we push hard to be aligned with mainline.

One challenge to implement GKI is to avoid bloating the kernel by
compiling too many things in, especially given that different devices
need different things. As such, anything that can be turned into a
module helps GKI, by offering an alternative to having that component
built-in. This is true for pretty much anything that can be made
modular, including drivers as well as other kernel components, such as
CPUFreq governors.

Indeed, in practice, Android devices often ship with only one CPUFreq
governor enabled, and don't require any other that would simply waste
memory for no benefits. All CPUFreq governors can already be built as
modules with one notable exception: schedutil. Though popular in
Android, some devices do not use schedutil, which is why it would be
preferable to not have it unconditionally built in GKI. This series is
an attempt to solve this problem, by making schedutil tristate.

While modularization is usually not something we want to see near the
scheduler, it appeared to me as I wrote those patches that the
particular case of schedutil was actually not too bad to implement.
We already have to support switching governors at run-time, simply
because userspace is free to do that, so the infrastructure for turning
sugov on and off dynamically is already there. Loading the code a little
later doesn't seem to make that a lot worse.

Patches 01-05 refactor some code to break the few dependencies on
schedutil being builtin (notably EAS). Patches 06-12 export various
symbols that schedutil needs when compiled as a module. And finally,
patches 13-14 finish off the work by making the Kconfig tristate.

---
The series is based on Peter's sched/fifo [1] branch (because sugov
uses sched_setscheduler_nocheck()).

[1] https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/fifo

Quentin Perret (14):
sched: Provide sched_set_deadline()
sched: cpufreq: Use sched_set_deadline() from sugov
sched: cpufreq: Introduce 'want_eas' governor flag
sched: cpufreq: Move sched_cpufreq_governor_change()
sched: cpufreq: Move schedutil_cpu_util()
arch_topology: Export cpu_scale per-cpu array
kthread: Export kthread_bind_mask()
sched/core: Export runqueues per-cpu array
sched/cpufreq: Export cpufreq_this_cpu_can_update()
sched/fair: Export cpu_util_freq()
tick/sched: Export tick_nohz_get_idle_calls_cpu
x86: Export arch_scale_freq_key
sched: cpufreq: Use IS_ENABLED() for schedutil
sched: cpufreq: Modularize schedutil

arch/x86/kernel/smpboot.c | 1 +
drivers/base/arch_topology.c | 1 +
drivers/cpufreq/Kconfig | 2 +-
include/linux/cpufreq.h | 6 +-
include/linux/sched.h | 2 +
include/linux/sched/sysctl.h | 2 +-
kernel/kthread.c | 1 +
kernel/sched/core.c | 18 ++++
kernel/sched/cpufreq.c | 34 ++++++
kernel/sched/cpufreq_schedutil.c | 176 +++----------------------------
kernel/sched/fair.c | 119 ++++++++++++++++++++-
kernel/sched/sched.h | 36 ++-----
kernel/sched/topology.c | 16 +--
kernel/sysctl.c | 2 +-
kernel/time/tick-sched.c | 1 +
15 files changed, 212 insertions(+), 205 deletions(-)

--
2.26.2.526.g744177e7f7-goog


2020-05-07 18:12:40

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 08/14] sched/core: Export runqueues per-cpu array

It will be needed by schedutil once modularized, export it.

Signed-off-by: Quentin Perret <[email protected]>
---
This is only needed for cpu_rq() -> cpu_bw_dl() in schedutil, so there is
probably an alternative if exporting this isn't desirable.
---
kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dbaf3f63df22..537eb45b4274 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -36,6 +36,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_se_tp);
EXPORT_TRACEPOINT_SYMBOL_GPL(sched_overutilized_tp);

DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+EXPORT_SYMBOL_GPL(runqueues);

#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_JUMP_LABEL)
/*
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:12:45

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 09/14] sched/cpufreq: Export cpufreq_this_cpu_can_update()

It will be needed by schedutil once modularized, export it.

Signed-off-by: Quentin Perret <[email protected]>
---
kernel/sched/cpufreq.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index 82f2dda61a55..f4abe4e4e927 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -75,6 +75,7 @@ bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
(policy->dvfs_possible_from_any_cpu &&
rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
}
+EXPORT_SYMBOL_GPL(cpufreq_this_cpu_can_update);

#ifdef CONFIG_ENERGY_MODEL
extern bool sched_energy_update;
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:12:49

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 10/14] sched/fair: Export cpu_util_freq()

It will be needed by schedutil once modularized, export it.

Signed-off-by: Quentin Perret <[email protected]>
---
kernel/sched/fair.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dadf0a060abe..b7b43aeff969 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6488,6 +6488,7 @@ unsigned long cpu_util_freq(int cpu, unsigned long max)
{
return aggregate_cpu_util(cpu, cpu_util(cpu), max, FREQUENCY_UTIL, NULL);
}
+EXPORT_SYMBOL_GPL(cpu_util_freq);

/*
* Predicts what cpu_util(@cpu) would return if @p was migrated (and enqueued)
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:12:55

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 13/14] sched: cpufreq: Use IS_ENABLED() for schedutil

The IS_ENABLED() macro evaluates to true when an option is set to =y or
=m. As such, it is a good fit for tristate options.

In preparation for modularizing schedutil, change all the related ifdefs
to use IS_ENABLED().

Signed-off-by: Quentin Perret <[email protected]>
---
include/linux/cpufreq.h | 2 +-
include/linux/sched/sysctl.h | 2 +-
kernel/sched/sched.h | 4 ++--
kernel/sched/topology.c | 4 ++--
kernel/sysctl.c | 2 +-
5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 267cc3b624da..c1176b8a0f61 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -983,7 +983,7 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
}
#endif

-#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
+#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
struct cpufreq_governor *old_gov);
#else
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index d4f6215ee03f..704d971f204f 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -94,7 +94,7 @@ extern int sysctl_schedstats(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);

-#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
+#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
extern unsigned int sysctl_sched_energy_aware;
extern int sched_energy_aware_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 60592cde80e8..087508723e58 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -217,7 +217,7 @@ static inline void update_avg(u64 *avg, u64 sample)

static inline bool dl_entity_is_special(struct sched_dl_entity *dl_se)
{
-#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
+#if IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
return unlikely(dl_se->flags & SCHED_FLAG_SUGOV);
#else
return false;
@@ -2459,7 +2459,7 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned
}
#endif

-#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
+#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)

#define perf_domain_span(pd) (to_cpumask(((pd)->em_pd->cpus)))

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b905f2e8d9b2..5f49d25730bd 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -201,7 +201,7 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
return 1;
}

-#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
+#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
DEFINE_STATIC_KEY_FALSE(sched_energy_present);
unsigned int sysctl_sched_energy_aware = 1;
DEFINE_MUTEX(sched_energy_mutex);
@@ -2287,7 +2287,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
;
}

-#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
+#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
/* Build perf. domains: */
for (i = 0; i < ndoms_new; i++) {
for (j = 0; j < n && !sched_energy_update; j++) {
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a176d8727a3..e115bf26cdb0 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -475,7 +475,7 @@ static struct ctl_table kern_table[] = {
.extra1 = SYSCTL_ONE,
},
#endif
-#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
+#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
{
.procname = "sched_energy_aware",
.data = &sysctl_sched_energy_aware,
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:12:56

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 12/14] x86: Export arch_scale_freq_key

It will be needed by schedutil once modularized, export it.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/x86/kernel/smpboot.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8c89e4d9ad28..a8ccd69ba5ff 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1804,6 +1804,7 @@ void native_play_dead(void)
*/

DEFINE_STATIC_KEY_FALSE(arch_scale_freq_key);
+EXPORT_SYMBOL_GPL(arch_scale_freq_key);

static DEFINE_PER_CPU(u64, arch_prev_aperf);
static DEFINE_PER_CPU(u64, arch_prev_mperf);
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:13:07

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 14/14] sched: cpufreq: Modularize schedutil

Now that all requirements to modularize schedutil are met, make the
Kconfig option tristate and add the missing MODULE_*() declarations in
cpufreq_schedutil.c.

Signed-off-by: Quentin Perret <[email protected]>
---
drivers/cpufreq/Kconfig | 2 +-
kernel/sched/cpufreq_schedutil.c | 15 ++++++++++-----
2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index c3e6bd59e920..49942f422a57 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -185,7 +185,7 @@ config CPU_FREQ_GOV_CONSERVATIVE
If in doubt, say N.

config CPU_FREQ_GOV_SCHEDUTIL
- bool "'schedutil' cpufreq policy governor"
+ tristate "'schedutil' cpufreq policy governor"
depends on CPU_FREQ && SMP
select CPU_FREQ_GOV_ATTR_SET
select IRQ_WORK
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index aeb04cc5b740..ea2778422efd 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -786,15 +786,20 @@ struct cpufreq_governor schedutil_gov = {
#endif /* CONFIG_ENERGY_MODEL */
};

+static int __init sugov_register(void)
+{
+ return cpufreq_register_governor(&schedutil_gov);
+}
+
#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL
struct cpufreq_governor *cpufreq_default_governor(void)
{
return &schedutil_gov;
}
+core_initcall(sugov_register);
+#else
+module_init(sugov_register);
#endif

-static int __init sugov_register(void)
-{
- return cpufreq_register_governor(&schedutil_gov);
-}
-core_initcall(sugov_register);
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Schedutil CPUFreq Governor");
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:13:44

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 05/14] sched: cpufreq: Move schedutil_cpu_util()

The CPU util aggregation code for schedutil and EAS currently resides in
schedutil for historical reasons. However, this complicates modularizing
schedutil as that would make the core kernel call into a (potential)
module. While this could probably be done, it seems significantly
simpler to move that function in fair.c directly, as it has no strong
dependencies on schedutil anyway. And as an added benefit, that also
enables that function to be marked as static, hence giving the compiler
a chance to remove a few function calls from the wake-up path.

While at it, rename the function to reflect the fact that it is not
actually schedutil-specific, and remove the now unnecessary
cpu_util_cfs().

Signed-off-by: Quentin Perret <[email protected]>
---
kernel/sched/cpufreq_schedutil.c | 109 +---------------------------
kernel/sched/fair.c | 118 ++++++++++++++++++++++++++++++-
kernel/sched/sched.h | 32 ++-------
3 files changed, 120 insertions(+), 139 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 33e67c48f668..aeb04cc5b740 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -183,122 +183,15 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
return cpufreq_driver_resolve_freq(policy, freq);
}

-/*
- * This function computes an effective utilization for the given CPU, to be
- * used for frequency selection given the linear relation: f = u * f_max.
- *
- * The scheduler tracks the following metrics:
- *
- * cpu_util_{cfs,rt,dl,irq}()
- * cpu_bw_dl()
- *
- * Where the cfs,rt and dl util numbers are tracked with the same metric and
- * synchronized windows and are thus directly comparable.
- *
- * The cfs,rt,dl utilization are the running times measured with rq->clock_task
- * which excludes things like IRQ and steal-time. These latter are then accrued
- * in the irq utilization.
- *
- * The DL bandwidth number otoh is not a measured metric but a value computed
- * based on the task model parameters and gives the minimal utilization
- * required to meet deadlines.
- */
-unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
- unsigned long max, enum schedutil_type type,
- struct task_struct *p)
-{
- unsigned long dl_util, util, irq;
- struct rq *rq = cpu_rq(cpu);
-
- if (!IS_BUILTIN(CONFIG_UCLAMP_TASK) &&
- type == FREQUENCY_UTIL && rt_rq_is_runnable(&rq->rt)) {
- return max;
- }
-
- /*
- * Early check to see if IRQ/steal time saturates the CPU, can be
- * because of inaccuracies in how we track these -- see
- * update_irq_load_avg().
- */
- irq = cpu_util_irq(rq);
- if (unlikely(irq >= max))
- return max;
-
- /*
- * Because the time spend on RT/DL tasks is visible as 'lost' time to
- * CFS tasks and we use the same metric to track the effective
- * utilization (PELT windows are synchronized) we can directly add them
- * to obtain the CPU's actual utilization.
- *
- * CFS and RT utilization can be boosted or capped, depending on
- * utilization clamp constraints requested by currently RUNNABLE
- * tasks.
- * When there are no CFS RUNNABLE tasks, clamps are released and
- * frequency will be gracefully reduced with the utilization decay.
- */
- util = util_cfs + cpu_util_rt(rq);
- if (type == FREQUENCY_UTIL)
- util = uclamp_rq_util_with(rq, util, p);
-
- dl_util = cpu_util_dl(rq);
-
- /*
- * For frequency selection we do not make cpu_util_dl() a permanent part
- * of this sum because we want to use cpu_bw_dl() later on, but we need
- * to check if the CFS+RT+DL sum is saturated (ie. no idle time) such
- * that we select f_max when there is no idle time.
- *
- * NOTE: numerical errors or stop class might cause us to not quite hit
- * saturation when we should -- something for later.
- */
- if (util + dl_util >= max)
- return max;
-
- /*
- * OTOH, for energy computation we need the estimated running time, so
- * include util_dl and ignore dl_bw.
- */
- if (type == ENERGY_UTIL)
- util += dl_util;
-
- /*
- * There is still idle time; further improve the number by using the
- * irq metric. Because IRQ/steal time is hidden from the task clock we
- * need to scale the task numbers:
- *
- * max - irq
- * U' = irq + --------- * U
- * max
- */
- util = scale_irq_capacity(util, irq, max);
- util += irq;
-
- /*
- * Bandwidth required by DEADLINE must always be granted while, for
- * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
- * to gracefully reduce the frequency when no tasks show up for longer
- * periods of time.
- *
- * Ideally we would like to set bw_dl as min/guaranteed freq and util +
- * bw_dl as requested freq. However, cpufreq is not yet ready for such
- * an interface. So, we only do the latter for now.
- */
- if (type == FREQUENCY_UTIL)
- util += cpu_bw_dl(rq);
-
- return min(max, util);
-}
-
static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
{
struct rq *rq = cpu_rq(sg_cpu->cpu);
- unsigned long util = cpu_util_cfs(rq);
unsigned long max = arch_scale_cpu_capacity(sg_cpu->cpu);

sg_cpu->max = max;
sg_cpu->bw_dl = cpu_bw_dl(rq);

- return schedutil_cpu_util(sg_cpu->cpu, util, max, FREQUENCY_UTIL, NULL);
+ return cpu_util_freq(sg_cpu->cpu, max);
}

/**
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..dadf0a060abe 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6377,6 +6377,118 @@ static unsigned long cpu_util_without(int cpu, struct task_struct *p)
return min_t(unsigned long, util, capacity_orig_of(cpu));
}

+/*
+ * This function computes an effective utilization for the given CPU, to be
+ * used for frequency selection given the linear relation: f = u * f_max.
+ *
+ * The scheduler tracks the following metrics:
+ *
+ * cpu_util_{cfs,rt,dl,irq}()
+ * cpu_bw_dl()
+ *
+ * Where the cfs,rt and dl util numbers are tracked with the same metric and
+ * synchronized windows and are thus directly comparable.
+ *
+ * The cfs,rt,dl utilization are the running times measured with rq->clock_task
+ * which excludes things like IRQ and steal-time. These latter are then accrued
+ * in the irq utilization.
+ *
+ * The DL bandwidth number otoh is not a measured metric but a value computed
+ * based on the task model parameters and gives the minimal utilization
+ * required to meet deadlines.
+ */
+static unsigned long aggregate_cpu_util(int cpu, unsigned long util_cfs,
+ unsigned long max,
+ enum cpu_util_type type,
+ struct task_struct *p)
+{
+ unsigned long dl_util, util, irq;
+ struct rq *rq = cpu_rq(cpu);
+
+ if (!IS_BUILTIN(CONFIG_UCLAMP_TASK) &&
+ type == FREQUENCY_UTIL && rt_rq_is_runnable(&rq->rt)) {
+ return max;
+ }
+
+ /*
+ * Early check to see if IRQ/steal time saturates the CPU, can be
+ * because of inaccuracies in how we track these -- see
+ * update_irq_load_avg().
+ */
+ irq = cpu_util_irq(rq);
+ if (unlikely(irq >= max))
+ return max;
+
+ /*
+ * Because the time spend on RT/DL tasks is visible as 'lost' time to
+ * CFS tasks and we use the same metric to track the effective
+ * utilization (PELT windows are synchronized) we can directly add them
+ * to obtain the CPU's actual utilization.
+ *
+ * CFS and RT utilization can be boosted or capped, depending on
+ * utilization clamp constraints requested by currently RUNNABLE
+ * tasks.
+ * When there are no CFS RUNNABLE tasks, clamps are released and
+ * frequency will be gracefully reduced with the utilization decay.
+ */
+ util = util_cfs + cpu_util_rt(rq);
+ if (type == FREQUENCY_UTIL)
+ util = uclamp_rq_util_with(rq, util, p);
+
+ dl_util = cpu_util_dl(rq);
+
+ /*
+ * For frequency selection we do not make cpu_util_dl() a permanent part
+ * of this sum because we want to use cpu_bw_dl() later on, but we need
+ * to check if the CFS+RT+DL sum is saturated (ie. no idle time) such
+ * that we select f_max when there is no idle time.
+ *
+ * NOTE: numerical errors or stop class might cause us to not quite hit
+ * saturation when we should -- something for later.
+ */
+ if (util + dl_util >= max)
+ return max;
+
+ /*
+ * OTOH, for energy computation we need the estimated running time, so
+ * include util_dl and ignore dl_bw.
+ */
+ if (type == ENERGY_UTIL)
+ util += dl_util;
+
+ /*
+ * There is still idle time; further improve the number by using the
+ * irq metric. Because IRQ/steal time is hidden from the task clock we
+ * need to scale the task numbers:
+ *
+ * max - irq
+ * U' = irq + --------- * U
+ * max
+ */
+ util = scale_irq_capacity(util, irq, max);
+ util += irq;
+
+ /*
+ * Bandwidth required by DEADLINE must always be granted while, for
+ * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
+ * to gracefully reduce the frequency when no tasks show up for longer
+ * periods of time.
+ *
+ * Ideally we would like to set bw_dl as min/guaranteed freq and util +
+ * bw_dl as requested freq. However, cpufreq is not yet ready for such
+ * an interface. So, we only do the latter for now.
+ */
+ if (type == FREQUENCY_UTIL)
+ util += cpu_bw_dl(rq);
+
+ return min(max, util);
+}
+
+unsigned long cpu_util_freq(int cpu, unsigned long max)
+{
+ return aggregate_cpu_util(cpu, cpu_util(cpu), max, FREQUENCY_UTIL, NULL);
+}
+
/*
* Predicts what cpu_util(@cpu) would return if @p was migrated (and enqueued)
* to @dst_cpu.
@@ -6449,7 +6561,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
* is already enough to scale the EM reported power
* consumption at the (eventually clamped) cpu_capacity.
*/
- sum_util += schedutil_cpu_util(cpu, util_cfs, cpu_cap,
+ sum_util += aggregate_cpu_util(cpu, util_cfs, cpu_cap,
ENERGY_UTIL, NULL);

/*
@@ -6459,7 +6571,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
* NOTE: in case RT tasks are running, by default the
* FREQUENCY_UTIL's utilization can be max OPP.
*/
- cpu_util = schedutil_cpu_util(cpu, util_cfs, cpu_cap,
+ cpu_util = aggregate_cpu_util(cpu, util_cfs, cpu_cap,
FREQUENCY_UTIL, tsk);
max_util = max(max_util, cpu_util);
}
@@ -6556,7 +6668,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
* IOW, placing the task there would make the CPU
* overutilized. Take uclamp into account to see how
* much capacity we can get out of the CPU; this is
- * aligned with schedutil_cpu_util().
+ * aligned with aggregate_cpu_util().
*/
util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
if (!fits_capacity(util, cpu_cap))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index db3a57675ccf..60592cde80e8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2397,10 +2397,9 @@ static inline unsigned long capacity_orig_of(int cpu)
{
return cpu_rq(cpu)->cpu_capacity_orig;
}
-#endif

/**
- * enum schedutil_type - CPU utilization type
+ * enum cpu_util_type - CPU utilization type
* @FREQUENCY_UTIL: Utilization used to select frequency
* @ENERGY_UTIL: Utilization used during energy calculation
*
@@ -2409,16 +2408,12 @@ static inline unsigned long capacity_orig_of(int cpu)
* enum is used within schedutil_freq_util() to differentiate the types of
* utilization expected by the callers, and adjust the aggregation accordingly.
*/
-enum schedutil_type {
+enum cpu_util_type {
FREQUENCY_UTIL,
ENERGY_UTIL,
};

-#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
-
-unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
- unsigned long max, enum schedutil_type type,
- struct task_struct *p);
+unsigned long cpu_util_freq(int cpu, unsigned long max);

static inline unsigned long cpu_bw_dl(struct rq *rq)
{
@@ -2430,30 +2425,11 @@ static inline unsigned long cpu_util_dl(struct rq *rq)
return READ_ONCE(rq->avg_dl.util_avg);
}

-static inline unsigned long cpu_util_cfs(struct rq *rq)
-{
- unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);
-
- if (sched_feat(UTIL_EST)) {
- util = max_t(unsigned long, util,
- READ_ONCE(rq->cfs.avg.util_est.enqueued));
- }
-
- return util;
-}
-
static inline unsigned long cpu_util_rt(struct rq *rq)
{
return READ_ONCE(rq->avg_rt.util_avg);
}
-#else /* CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
-static inline unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
- unsigned long max, enum schedutil_type type,
- struct task_struct *p)
-{
- return 0;
-}
-#endif /* CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
+#endif /* CONFIG_SMP */

#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
static inline unsigned long cpu_util_irq(struct rq *rq)
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:14:01

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 04/14] sched: cpufreq: Move sched_cpufreq_governor_change()

CPUFreq calls into sched_cpufreq_governor_change() when switching
governors, which triggers a sched domain rebuild when entering or
exiting schedutil.

Move the function to sched/cpufreq.c to prepare the ground for the
modularization of schedutil.

Signed-off-by: Quentin Perret <[email protected]>
---
kernel/sched/cpufreq.c | 33 ++++++++++++++++++++++++++++++++
kernel/sched/cpufreq_schedutil.c | 33 --------------------------------
2 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index 7c2fe50fd76d..82f2dda61a55 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -75,3 +75,36 @@ bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
(policy->dvfs_possible_from_any_cpu &&
rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
}
+
+#ifdef CONFIG_ENERGY_MODEL
+extern bool sched_energy_update;
+extern struct mutex sched_energy_mutex;
+
+static void rebuild_sd_workfn(struct work_struct *work)
+{
+ mutex_lock(&sched_energy_mutex);
+ sched_energy_update = true;
+ rebuild_sched_domains();
+ sched_energy_update = false;
+ mutex_unlock(&sched_energy_mutex);
+}
+static DECLARE_WORK(rebuild_sd_work, rebuild_sd_workfn);
+
+/*
+ * EAS shouldn't be attempted without sugov, so rebuild the sched_domains
+ * on governor changes to make sure the scheduler knows about it.
+ */
+void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
+ struct cpufreq_governor *old_gov)
+{
+ if ((old_gov && old_gov->want_eas) || policy->governor->want_eas) {
+ /*
+ * When called from the cpufreq_register_driver() path, the
+ * cpu_hotplug_lock is already held, so use a work item to
+ * avoid nested locking in rebuild_sched_domains().
+ */
+ schedule_work(&rebuild_sd_work);
+ }
+
+}
+#endif
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index c5e5045f7c81..33e67c48f668 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -905,36 +905,3 @@ static int __init sugov_register(void)
return cpufreq_register_governor(&schedutil_gov);
}
core_initcall(sugov_register);
-
-#ifdef CONFIG_ENERGY_MODEL
-extern bool sched_energy_update;
-extern struct mutex sched_energy_mutex;
-
-static void rebuild_sd_workfn(struct work_struct *work)
-{
- mutex_lock(&sched_energy_mutex);
- sched_energy_update = true;
- rebuild_sched_domains();
- sched_energy_update = false;
- mutex_unlock(&sched_energy_mutex);
-}
-static DECLARE_WORK(rebuild_sd_work, rebuild_sd_workfn);
-
-/*
- * EAS shouldn't be attempted without sugov, so rebuild the sched_domains
- * on governor changes to make sure the scheduler knows about it.
- */
-void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
- struct cpufreq_governor *old_gov)
-{
- if ((old_gov && old_gov->want_eas) || policy->governor->want_eas) {
- /*
- * When called from the cpufreq_register_driver() path, the
- * cpu_hotplug_lock is already held, so use a work item to
- * avoid nested locking in rebuild_sched_domains().
- */
- schedule_work(&rebuild_sd_work);
- }
-
-}
-#endif
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:14:34

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 02/14] sched: cpufreq: Use sched_set_deadline() from sugov

sched_set_deadline() is an exported symbol, use it instead of
sched_setattr_nocheck() to elevate the sugov kthreads to DL.

Signed-off-by: Quentin Perret <[email protected]>
---
kernel/sched/cpufreq_schedutil.c | 18 +++---------------
1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 7fbaee24c824..ebd5d30f0861 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -654,20 +654,6 @@ static void sugov_policy_free(struct sugov_policy *sg_policy)
static int sugov_kthread_create(struct sugov_policy *sg_policy)
{
struct task_struct *thread;
- struct sched_attr attr = {
- .size = sizeof(struct sched_attr),
- .sched_policy = SCHED_DEADLINE,
- .sched_flags = SCHED_FLAG_SUGOV,
- .sched_nice = 0,
- .sched_priority = 0,
- /*
- * Fake (unused) bandwidth; workaround to "fix"
- * priority inheritance.
- */
- .sched_runtime = 1000000,
- .sched_deadline = 10000000,
- .sched_period = 10000000,
- };
struct cpufreq_policy *policy = sg_policy->policy;
int ret;

@@ -685,7 +671,9 @@ static int sugov_kthread_create(struct sugov_policy *sg_policy)
return PTR_ERR(thread);
}

- ret = sched_setattr_nocheck(thread, &attr);
+ /* Fake (unused) bandwidth; workaround to "fix" priority inheritance. */
+ ret = sched_set_deadline(thread, 1000000, 10000000, 10000000,
+ SCHED_FLAG_SUGOV);
if (ret) {
kthread_stop(thread);
pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:14:51

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 06/14] arch_topology: Export cpu_scale per-cpu array

It will be needed by schedutil once modularized, export it.

Signed-off-by: Quentin Perret <[email protected]>
---
drivers/base/arch_topology.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 4d0a0038b476..b1f0c272da67 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -48,6 +48,7 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
}

DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
+EXPORT_SYMBOL_GPL(cpu_scale);

void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
{
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:15:40

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 11/14] tick/sched: Export tick_nohz_get_idle_calls_cpu

It will be needed by schedutil once modularized, export it.

Signed-off-by: Quentin Perret <[email protected]>
---
kernel/time/tick-sched.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3e2dc9b8858c..3b1050cabb58 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1122,6 +1122,7 @@ unsigned long tick_nohz_get_idle_calls_cpu(int cpu)

return ts->idle_calls;
}
+EXPORT_SYMBOL_GPL(tick_nohz_get_idle_calls_cpu);

/**
* tick_nohz_get_idle_calls - return the current idle calls counter value
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:15:55

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 07/14] kthread: Export kthread_bind_mask()

It will be needed by schedutil once modularized, export it.

Signed-off-by: Quentin Perret <[email protected]>
---
kernel/kthread.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index bfbfa481be3a..c0d84e1fbec2 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -430,6 +430,7 @@ void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
{
__kthread_bind_mask(p, mask, TASK_UNINTERRUPTIBLE);
}
+EXPORT_SYMBOL_GPL(kthread_bind_mask);

/**
* kthread_bind - bind a just-created kthread to a cpu.
--
2.26.2.526.g744177e7f7-goog

2020-05-07 18:16:12

by Quentin Perret

[permalink] [raw]
Subject: [PATCH 03/14] sched: cpufreq: Introduce 'want_eas' governor flag

The EAS topology code requires the usage of schedutil on all CPUs of an
rd to actually enable EAS balancing. However, the check implementing
this references the schedutil_gov struct directly, which makes having
schedutil as a module impractical.

To prepare the ground for this modularization, introduce a new
'want_eas' flag in the cpufreq_governor struct, set it for schedutil
only, and make sure to check it from the EAS topology code.

Signed-off-by: Quentin Perret <[email protected]>
---
include/linux/cpufreq.h | 4 ++++
kernel/sched/cpufreq_schedutil.c | 5 ++++-
kernel/sched/topology.c | 12 ++++++------
3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index f7240251a949..267cc3b624da 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -560,6 +560,10 @@ struct cpufreq_governor {
bool dynamic_switching;
struct list_head governor_list;
struct module *owner;
+
+#ifdef CONFIG_ENERGY_MODEL
+ bool want_eas;
+#endif /* CONFIG_ENERGY_MODEL */
};

/* Pass a target to the cpufreq driver */
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index ebd5d30f0861..c5e5045f7c81 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -888,6 +888,9 @@ struct cpufreq_governor schedutil_gov = {
.start = sugov_start,
.stop = sugov_stop,
.limits = sugov_limits,
+#ifdef CONFIG_ENERGY_MODEL
+ .want_eas = true,
+#endif /* CONFIG_ENERGY_MODEL */
};

#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL
@@ -924,7 +927,7 @@ static DECLARE_WORK(rebuild_sd_work, rebuild_sd_workfn);
void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
struct cpufreq_governor *old_gov)
{
- if (old_gov == &schedutil_gov || policy->governor == &schedutil_gov) {
+ if ((old_gov && old_gov->want_eas) || policy->governor->want_eas) {
/*
* When called from the cpufreq_register_driver() path, the
* cpu_hotplug_lock is already held, so use a work item to
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8344757bba6e..b905f2e8d9b2 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -319,7 +319,8 @@ static void sched_energy_set(bool has_eas)
* 2. the SD_ASYM_CPUCAPACITY flag is set in the sched_domain hierarchy.
* 3. no SMT is detected.
* 4. the EM complexity is low enough to keep scheduling overheads low;
- * 5. schedutil is driving the frequency of all CPUs of the rd;
+ * 5. an EAS-compatible CPUfreq governor (schedutil) is driving the frequency
+ * of all CPUs of the rd;
*
* The complexity of the Energy Model is defined as:
*
@@ -339,7 +340,6 @@ static void sched_energy_set(bool has_eas)
*/
#define EM_MAX_COMPLEXITY 2048

-extern struct cpufreq_governor schedutil_gov;
static bool build_perf_domains(const struct cpumask *cpu_map)
{
int i, nr_pd = 0, nr_cs = 0, nr_cpus = cpumask_weight(cpu_map);
@@ -347,7 +347,7 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
int cpu = cpumask_first(cpu_map);
struct root_domain *rd = cpu_rq(cpu)->rd;
struct cpufreq_policy *policy;
- struct cpufreq_governor *gov;
+ bool want_eas;

if (!sysctl_sched_energy_aware)
goto free;
@@ -377,11 +377,11 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
policy = cpufreq_cpu_get(i);
if (!policy)
goto free;
- gov = policy->governor;
+ want_eas = policy->governor && policy->governor->want_eas;
cpufreq_cpu_put(policy);
- if (gov != &schedutil_gov) {
+ if (!want_eas) {
if (rd->pd)
- pr_warn("rd %*pbl: Disabling EAS, schedutil is mandatory\n",
+ pr_warn("rd %*pbl: Disabling EAS because of incompatible CPUFreq governor\n",
cpumask_pr_args(cpu_map));
goto free;
}
--
2.26.2.526.g744177e7f7-goog

2020-05-07 21:38:12

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil


(+Ionela)

Hi Quentin,

Apologies for sidetracking this a bit.

On 07/05/20 19:09, Quentin Perret wrote:
> Android is trying very hard to use a single kernel image (commonly
> called Generic Kernel Image, or GKI), closely aligned with mainline, to
> run on all Android devices regardless of the vendor.
>
> The GKI project intends to not only improve the status quo for Android
> users directly (less fragmentation simplifies updatability), but also
> to benefit upstream by forcing all vendors to agree on one common
> kernel, that we push hard to be aligned with mainline.
>
> One challenge to implement GKI is to avoid bloating the kernel by
> compiling too many things in, especially given that different devices
> need different things. As such, anything that can be turned into a
> module helps GKI, by offering an alternative to having that component
> built-in. This is true for pretty much anything that can be made
> modular, including drivers as well as other kernel components, such as
> CPUFreq governors.
>
> Indeed, in practice, Android devices often ship with only one CPUFreq
> governor enabled, and don't require any other that would simply waste
> memory for no benefits. All CPUFreq governors can already be built as
> modules with one notable exception: schedutil. Though popular in
> Android, some devices do not use schedutil, which is why it would be
> preferable to not have it unconditionally built in GKI. This series is
> an attempt to solve this problem, by making schedutil tristate.
>

I'm curious; why would some Android device not want to roll with schedutil?

When it comes to dynamic policies (i.e. forget performance / powersave, and
put userspace in a corner), I'd be willing to take a stand and say you
should only really be using schedutil nowadays - alignment with the
scheduler, uclamp, yadda yadda.

AFAIA the only schedutil-related quirk we oughta fix for arm/arm64 is that
arch_scale_freq_invariant() thingie, and FWIW I'm hoping to get something
regarding this out sometime soonish. After that, I'd actually want to make
schedutil the default governor for arm/arm64.

I'm not opiniated on the modularization, but if you can, could you please
share some more details as to why schedutil cannot fulfill its role of holy
messiah of governors for GKI?

2020-05-08 05:35:26

by Pavankumar Kondeti

[permalink] [raw]
Subject: Re: [PATCH 13/14] sched: cpufreq: Use IS_ENABLED() for schedutil

Hi Quentin

On Thu, May 07, 2020 at 07:10:11PM +0100, Quentin Perret wrote:
> The IS_ENABLED() macro evaluates to true when an option is set to =y or
> =m. As such, it is a good fit for tristate options.
>
> In preparation for modularizing schedutil, change all the related ifdefs
> to use IS_ENABLED().
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> include/linux/cpufreq.h | 2 +-
> include/linux/sched/sysctl.h | 2 +-
> kernel/sched/sched.h | 4 ++--
> kernel/sched/topology.c | 4 ++--
> kernel/sysctl.c | 2 +-
> 5 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 267cc3b624da..c1176b8a0f61 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -983,7 +983,7 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
> }
> #endif
>
> -#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> +#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
> struct cpufreq_governor *old_gov);
> #else
> diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
> index d4f6215ee03f..704d971f204f 100644
> --- a/include/linux/sched/sysctl.h
> +++ b/include/linux/sched/sysctl.h
> @@ -94,7 +94,7 @@ extern int sysctl_schedstats(struct ctl_table *table, int write,
> void __user *buffer, size_t *lenp,
> loff_t *ppos);
>
> -#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> +#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> extern unsigned int sysctl_sched_energy_aware;
> extern int sched_energy_aware_handler(struct ctl_table *table, int write,
> void __user *buffer, size_t *lenp,
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 60592cde80e8..087508723e58 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -217,7 +217,7 @@ static inline void update_avg(u64 *avg, u64 sample)
>
> static inline bool dl_entity_is_special(struct sched_dl_entity *dl_se)
> {
> -#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
> +#if IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> return unlikely(dl_se->flags & SCHED_FLAG_SUGOV);
> #else
> return false;
> @@ -2459,7 +2459,7 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned
> }
> #endif
>
> -#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> +#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
>
> #define perf_domain_span(pd) (to_cpumask(((pd)->em_pd->cpus)))
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index b905f2e8d9b2..5f49d25730bd 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -201,7 +201,7 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
> return 1;
> }
>
> -#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> +#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> unsigned int sysctl_sched_energy_aware = 1;
> DEFINE_MUTEX(sched_energy_mutex);
> @@ -2287,7 +2287,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
> ;
> }
>
> -#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> +#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> /* Build perf. domains: */
> for (i = 0; i < ndoms_new; i++) {
> for (j = 0; j < n && !sched_energy_update; j++) {

Now that scheduler does not have any references to schedutil_gov and cpufreq
has want_eas flag, do we need this CONFIG_CPU_FREQ_GOV_SCHEDUTIL checks here?

Thanks,
Pavan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-05-08 05:37:55

by Pavankumar Kondeti

[permalink] [raw]
Subject: Re: [PATCH 04/14] sched: cpufreq: Move sched_cpufreq_governor_change()

On Thu, May 07, 2020 at 07:10:02PM +0100, Quentin Perret wrote:
> CPUFreq calls into sched_cpufreq_governor_change() when switching
> governors, which triggers a sched domain rebuild when entering or
> exiting schedutil.
>
> Move the function to sched/cpufreq.c to prepare the ground for the
> modularization of schedutil.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> kernel/sched/cpufreq.c | 33 ++++++++++++++++++++++++++++++++
> kernel/sched/cpufreq_schedutil.c | 33 --------------------------------
> 2 files changed, 33 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
> index 7c2fe50fd76d..82f2dda61a55 100644
> --- a/kernel/sched/cpufreq.c
> +++ b/kernel/sched/cpufreq.c
> @@ -75,3 +75,36 @@ bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> (policy->dvfs_possible_from_any_cpu &&
> rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
> }
> +
> +#ifdef CONFIG_ENERGY_MODEL
> +extern bool sched_energy_update;
> +extern struct mutex sched_energy_mutex;
> +
> +static void rebuild_sd_workfn(struct work_struct *work)
> +{
> + mutex_lock(&sched_energy_mutex);
> + sched_energy_update = true;
> + rebuild_sched_domains();
> + sched_energy_update = false;
> + mutex_unlock(&sched_energy_mutex);
> +}
> +static DECLARE_WORK(rebuild_sd_work, rebuild_sd_workfn);
> +
> +/*
> + * EAS shouldn't be attempted without sugov, so rebuild the sched_domains
> + * on governor changes to make sure the scheduler knows about it.
> + */

In the previous patch, you removed reference to schedutil and replaced it with
" an EAS-compatible CPUfreq governor (schedutil)". May be you could do the
same here.

Thanks,
Pavan

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-05-08 05:38:05

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On 07-05-20, 19:09, Quentin Perret wrote:
> Android is trying very hard to use a single kernel image (commonly
> called Generic Kernel Image, or GKI), closely aligned with mainline, to
> run on all Android devices regardless of the vendor.
>
> The GKI project intends to not only improve the status quo for Android
> users directly (less fragmentation simplifies updatability), but also
> to benefit upstream by forcing all vendors to agree on one common
> kernel, that we push hard to be aligned with mainline.
>
> One challenge to implement GKI is to avoid bloating the kernel by
> compiling too many things in, especially given that different devices
> need different things. As such, anything that can be turned into a
> module helps GKI, by offering an alternative to having that component
> built-in. This is true for pretty much anything that can be made
> modular, including drivers as well as other kernel components, such as
> CPUFreq governors.
>
> Indeed, in practice, Android devices often ship with only one CPUFreq
> governor enabled, and don't require any other that would simply waste
> memory for no benefits. All CPUFreq governors can already be built as
> modules with one notable exception: schedutil. Though popular in
> Android, some devices do not use schedutil, which is why it would be
> preferable to not have it unconditionally built in GKI. This series is
> an attempt to solve this problem, by making schedutil tristate.
>
> While modularization is usually not something we want to see near the
> scheduler, it appeared to me as I wrote those patches that the
> particular case of schedutil was actually not too bad to implement.
> We already have to support switching governors at run-time, simply
> because userspace is free to do that, so the infrastructure for turning
> sugov on and off dynamically is already there. Loading the code a little
> later doesn't seem to make that a lot worse.
>
> Patches 01-05 refactor some code to break the few dependencies on
> schedutil being builtin (notably EAS). Patches 06-12 export various
> symbols that schedutil needs when compiled as a module. And finally,
> patches 13-14 finish off the work by making the Kconfig tristate.

IMHO, you have over-broken the patches, like first two could be merged
together and all exports could have been done in a single patch, etc.
i.e. all related or similar changes together...

Acked-by: Viresh Kumar <[email protected]>

--
viresh

2020-05-08 08:10:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 08/14] sched/core: Export runqueues per-cpu array

On Thu, May 07, 2020 at 07:10:06PM +0100, Quentin Perret wrote:
> It will be needed by schedutil once modularized, export it.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> This is only needed for cpu_rq() -> cpu_bw_dl() in schedutil, so there is
> probably an alternative if exporting this isn't desirable.
> ---
> kernel/sched/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index dbaf3f63df22..537eb45b4274 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -36,6 +36,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_se_tp);
> EXPORT_TRACEPOINT_SYMBOL_GPL(sched_overutilized_tp);
>
> DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
> +EXPORT_SYMBOL_GPL(runqueues);

NAK, never going to happen.

2020-05-08 08:14:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Thu, May 07, 2020 at 07:09:58PM +0100, Quentin Perret wrote:
> One challenge to implement GKI is to avoid bloating the kernel by
> compiling too many things in, especially given that different devices
> need different things. As such, anything that can be turned into a
> module helps GKI, by offering an alternative to having that component
> built-in. This is true for pretty much anything that can be made
> modular, including drivers as well as other kernel components, such as
> CPUFreq governors.

The idea is to move to 1 governor, schedutil. Also, I abhor all the
exports this thing does. Modules have no business touching most of that.

Look at every EXPORT you do and wonder ask yourself how you can abuse
it. Modules are not a good thing, they're horrible pieces of crap.

2020-05-08 10:08:34

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 08/14] sched/core: Export runqueues per-cpu array

On Friday 08 May 2020 at 10:07:59 (+0200), Peter Zijlstra wrote:
> On Thu, May 07, 2020 at 07:10:06PM +0100, Quentin Perret wrote:
> > It will be needed by schedutil once modularized, export it.
> >
> > Signed-off-by: Quentin Perret <[email protected]>
> > ---
> > This is only needed for cpu_rq() -> cpu_bw_dl() in schedutil, so there is
> > probably an alternative if exporting this isn't desirable.
> > ---
> > kernel/sched/core.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index dbaf3f63df22..537eb45b4274 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -36,6 +36,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_se_tp);
> > EXPORT_TRACEPOINT_SYMBOL_GPL(sched_overutilized_tp);
> >
> > DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
> > +EXPORT_SYMBOL_GPL(runqueues);
>
> NAK, never going to happen.

Well, I should have seen that one coming :-)

As mentioned in the commit message, we might be able to work around.
I'll cook something for v2.

Thanks,
Quentin

2020-05-08 10:41:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Fri, May 08, 2020 at 10:11:28AM +0200, Peter Zijlstra wrote:
> On Thu, May 07, 2020 at 07:09:58PM +0100, Quentin Perret wrote:
> > One challenge to implement GKI is to avoid bloating the kernel by
> > compiling too many things in, especially given that different devices
> > need different things. As such, anything that can be turned into a
> > module helps GKI, by offering an alternative to having that component
> > built-in. This is true for pretty much anything that can be made
> > modular, including drivers as well as other kernel components, such as
> > CPUFreq governors.
>
> The idea is to move to 1 governor, schedutil. Also, I abhor all the
> exports this thing does. Modules have no business touching most of that.
>
> Look at every EXPORT you do and wonder ask yourself how you can abuse
> it. Modules are not a good thing, they're horrible pieces of crap.

Quentin, what is missing from schedutil that warrants the need for a
totally different governor? Is there specific problems with the
existing ones or is this just an instance of "we wrote our own a long
time ago and never pushed it upstream" from various SoC companies?

thanks,

greg k-h

2020-05-08 11:20:31

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Friday 08 May 2020 at 12:37:21 (+0200), Greg KH wrote:
> On Fri, May 08, 2020 at 10:11:28AM +0200, Peter Zijlstra wrote:
> > On Thu, May 07, 2020 at 07:09:58PM +0100, Quentin Perret wrote:
> > > One challenge to implement GKI is to avoid bloating the kernel by
> > > compiling too many things in, especially given that different devices
> > > need different things. As such, anything that can be turned into a
> > > module helps GKI, by offering an alternative to having that component
> > > built-in. This is true for pretty much anything that can be made
> > > modular, including drivers as well as other kernel components, such as
> > > CPUFreq governors.
> >
> > The idea is to move to 1 governor, schedutil. Also, I abhor all the
> > exports this thing does. Modules have no business touching most of that.
> >
> > Look at every EXPORT you do and wonder ask yourself how you can abuse
> > it. Modules are not a good thing, they're horrible pieces of crap.
>
> Quentin, what is missing from schedutil that warrants the need for a
> totally different governor? Is there specific problems with the
> existing ones or is this just an instance of "we wrote our own a long
> time ago and never pushed it upstream" from various SoC companies?

In all honesty, this is probably a mix. Some vendors definitely have
their out of tree crap that they'll load anyways, and some others use
other mainline governors for semi-reasonable reasons (e.g, for lower
tier devices, they often don't want to go though the costly and tedious
process of tuning uclamp, so they'll go for a more 'aggressive' governor
like ondemand). Note that this is the minority, though. The majority in
Android use schedutil, and I'm quite happy with that.

However, the point I tried to make here is orthogonal to that. As of
today using another governor than schedutil is fully supported upstream,
and in fact it isn't even enabled by default for most archs. If vendors
feel like using something else makes their product better, then I don't
see why I need to argue with them about that. And frankly I don't see
that support being removed from upstream any time soon.

Don't get me wrong, I do think schedutil is the way to go. And we do
recommend it in Android. I'm simply saying that /mandating/ it in GKI
would only add more friction than really necessary. Making it modular,
OTOH, might ease things a bit.

I guess it all boils down to the code. If modularizing schedutil really
is too ugly, then I'll drop the series and we'll just build it in and
waste memory on devices that don't use it. It's not the end of the
world, but it's a shame if we don't have strong technical reasons to do
that -- all other governors _are_ modular.

IMO the changes in this series aren't *too* bad -- moving
schedutil_cpu_util to fair.c is probably a sensible change in its own
right for instance. The biggest 'issue' is probably the exports, but
these will need discussions on a case by case basis. And I'm happy to
try and rework the code to work around when possible (as is the case for
runqueues for instance).

I hope that helps!

Thanks,
Quentin

2020-05-08 11:29:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Fri, May 08, 2020 at 12:37:21PM +0200, Greg KH wrote:
> On Fri, May 08, 2020 at 10:11:28AM +0200, Peter Zijlstra wrote:
> > On Thu, May 07, 2020 at 07:09:58PM +0100, Quentin Perret wrote:
> > > One challenge to implement GKI is to avoid bloating the kernel by
> > > compiling too many things in, especially given that different devices
> > > need different things. As such, anything that can be turned into a
> > > module helps GKI, by offering an alternative to having that component
> > > built-in. This is true for pretty much anything that can be made
> > > modular, including drivers as well as other kernel components, such as
> > > CPUFreq governors.
> >
> > The idea is to move to 1 governor, schedutil. Also, I abhor all the
> > exports this thing does. Modules have no business touching most of that.
> >
> > Look at every EXPORT you do and wonder ask yourself how you can abuse
> > it. Modules are not a good thing, they're horrible pieces of crap.
>
> Quentin, what is missing from schedutil that warrants the need for a
> totally different governor? Is there specific problems with the
> existing ones or is this just an instance of "we wrote our own a long
> time ago and never pushed it upstream" from various SoC companies?

At the very least there's that interactive governor that's really
popular with Android. But IIRC there's a whole scala of home-brew
governors and tweaks out there.

And instead of enabling that crap, we should be discouraging it.
Consolidate and kill the governor interface.

2020-05-08 11:34:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Fri, May 08, 2020 at 12:16:12PM +0100, Quentin Perret wrote:
> However, the point I tried to make here is orthogonal to that. As of
> today using another governor than schedutil is fully supported upstream,
> and in fact it isn't even enabled by default for most archs. If vendors
> feel like using something else makes their product better, then I don't
> see why I need to argue with them about that. And frankly I don't see
> that support being removed from upstream any time soon.

Right, it'll take a while to get there. But that doesn't mean we
shouldn't encourage schedutil usage wherever and whenever possible. That
includes not making it easier to not use it.

In that respect making it modular goes against our ultimate goal (world
domination, <mad giggles here>).

2020-05-08 13:12:00

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Friday 08 May 2020 at 13:31:41 (+0200), Peter Zijlstra wrote:
> On Fri, May 08, 2020 at 12:16:12PM +0100, Quentin Perret wrote:
> > However, the point I tried to make here is orthogonal to that. As of
> > today using another governor than schedutil is fully supported upstream,
> > and in fact it isn't even enabled by default for most archs. If vendors
> > feel like using something else makes their product better, then I don't
> > see why I need to argue with them about that. And frankly I don't see
> > that support being removed from upstream any time soon.
>
> Right, it'll take a while to get there. But that doesn't mean we
> shouldn't encourage schedutil usage wherever and whenever possible. That
> includes not making it easier to not use it.
>
> In that respect making it modular goes against our ultimate goal (world
> domination, <mad giggles here>).

Right, I definitely understand the sentiment. OTOH, things like that
give vendors weapons against GKI ('you-force-us-to-build-in-things-we-dont't-want'
etc etc). That _is_ true to some extent, but it's important we make sure
to keep this to an absolute minimum, otherwise GKI just won't happen
(and I really think that'd be a shame, GKI _is_ a good thing for
upstream).

And sure, schedutil isn't that big, and we can make an exception. But
I'm sure you know what happens when you starting making exceptions ;)

So, all in all, I don't think the series actively makes schedutil worse
by adding out-of-line calls in the hot path or anything like that, and
having it as a module helps with GKI which I'm arguing is a good thing
in the grand scheme of things. That of course is only true if we can
agree on a reasonable set of exported symbols, so I'll give others some
time to complain and see if I can post a v2 addressing these issues!

Cheers,
Quentin

2020-05-08 13:17:11

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

Hey Valentin,

On Thursday 07 May 2020 at 22:34:17 (+0100), Valentin Schneider wrote:
> I'm curious; why would some Android device not want to roll with schedutil?
>
> When it comes to dynamic policies (i.e. forget performance / powersave, and
> put userspace in a corner), I'd be willing to take a stand and say you
> should only really be using schedutil nowadays - alignment with the
> scheduler, uclamp, yadda yadda.
>
> AFAIA the only schedutil-related quirk we oughta fix for arm/arm64 is that
> arch_scale_freq_invariant() thingie, and FWIW I'm hoping to get something
> regarding this out sometime soonish. After that, I'd actually want to make
> schedutil the default governor for arm/arm64.

As in setting CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y in the arm64
defconfig? If so, you have my Acked-by already :)

> I'm not opiniated on the modularization, but if you can, could you please
> share some more details as to why schedutil cannot fulfill its role of holy
> messiah of governors for GKI?

I guess I answered some of that in the other thread with Peter, but all
in all I'm definitely not trying to make an argument that schedutil
isn't good enough here. I'm trying to say that mandating it in *GKI* is
just likely to cause unnecessary friction, and trust me there is already
enough of that with other topics. Giving the option of having sugov as a
module doesn't prevent us from making it a default for a few arches, so
I think there is ground for an agreement!

Cheers,
Quentin

2020-05-08 13:20:06

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

Hi Viresh,

On Friday 08 May 2020 at 11:03:59 (+0530), Viresh Kumar wrote:
> IMHO, you have over-broken the patches, like first two could be merged
> together and all exports could have been done in a single patch, etc.
> i.e. all related or similar changes together...

Right, I don't mind squashing the first patches. For the exports, I'm
guessing they'll need a case by case discussion, so it's probably
reasonable to keep them separate, at least for now.

> Acked-by: Viresh Kumar <[email protected]>

Thanks!
Quentin

2020-05-08 13:20:32

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 04/14] sched: cpufreq: Move sched_cpufreq_governor_change()

On Friday 08 May 2020 at 11:05:23 (+0530), Pavan Kondeti wrote:
> In the previous patch, you removed reference to schedutil and replaced it with
> " an EAS-compatible CPUfreq governor (schedutil)". May be you could do the
> same here.

Good point, I add it to the todo list for v2 ;)

Thanks,
Quentin

2020-05-08 13:23:28

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 13/14] sched: cpufreq: Use IS_ENABLED() for schedutil

On Friday 08 May 2020 at 11:00:53 (+0530), Pavan Kondeti wrote:
> > -#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> > +#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> > /* Build perf. domains: */
> > for (i = 0; i < ndoms_new; i++) {
> > for (j = 0; j < n && !sched_energy_update; j++) {
>
> Now that scheduler does not have any references to schedutil_gov and cpufreq
> has want_eas flag, do we need this CONFIG_CPU_FREQ_GOV_SCHEDUTIL checks here?

Right, they're not absolutely required, but given that sugov is the only
one to have 'want_eas' set I guess there is no need to compile that in
without it, no?

Cheers,
Quentin

2020-05-08 13:44:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Fri, May 8, 2020 at 3:05 PM Quentin Perret <[email protected]> wrote:
>
> On Friday 08 May 2020 at 13:31:41 (+0200), Peter Zijlstra wrote:
> > On Fri, May 08, 2020 at 12:16:12PM +0100, Quentin Perret wrote:
> > > However, the point I tried to make here is orthogonal to that. As of
> > > today using another governor than schedutil is fully supported upstream,
> > > and in fact it isn't even enabled by default for most archs. If vendors
> > > feel like using something else makes their product better, then I don't
> > > see why I need to argue with them about that. And frankly I don't see
> > > that support being removed from upstream any time soon.
> >
> > Right, it'll take a while to get there. But that doesn't mean we
> > shouldn't encourage schedutil usage wherever and whenever possible. That
> > includes not making it easier to not use it.
> >
> > In that respect making it modular goes against our ultimate goal (world
> > domination, <mad giggles here>).
>
> Right, I definitely understand the sentiment. OTOH, things like that
> give vendors weapons against GKI ('you-force-us-to-build-in-things-we-dont't-want'
> etc etc). That _is_ true to some extent, but it's important we make sure
> to keep this to an absolute minimum, otherwise GKI just won't happen
> (and I really think that'd be a shame, GKI _is_ a good thing for
> upstream).
>
> And sure, schedutil isn't that big, and we can make an exception. But
> I'm sure you know what happens when you starting making exceptions ;)

This is a very weak argument, if it can be regarded as an argument at all.

You will have to make exceptions, the question is how many and on what
criteria and you really need to figure that out for the GKI plan.

> So, all in all, I don't think the series actively makes schedutil worse
> by adding out-of-line calls in the hot path or anything like that, and
> having it as a module helps with GKI which I'm arguing is a good thing
> in the grand scheme of things.

Frankly, I'm not sure if it really helps.

The idea of making schedutil modular seems to be based on the
observation that it is not part of the core kernel, which I don't
agree with. Arguably, things like util clamps need it to work as
expected.

> That of course is only true if we can
> agree on a reasonable set of exported symbols, so I'll give others some
> time to complain and see if I can post a v2 addressing these issues!

This isn't just about exported symbols, it is about what is regarded
as essential and what isn't.

Cheers!

2020-05-08 14:16:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Fri, May 08, 2020 at 02:05:07PM +0100, Quentin Perret wrote:
> So, all in all, I don't think the series actively makes schedutil worse
> by adding out-of-line calls in the hot path or anything like that, and

Do note that (afaik) ARM64 (and Power and probably others) has modules
in an address space that is not reachable from regular kernel text and
needs those intermediate trampolines.

While that's probably not a very big deal, it does increase L1$ pressure
and things.

2020-05-08 14:54:19

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil


On 08/05/20 14:15, Quentin Perret wrote:
> Hey Valentin,
>
> On Thursday 07 May 2020 at 22:34:17 (+0100), Valentin Schneider wrote:
>> I'm curious; why would some Android device not want to roll with schedutil?
>>
>> When it comes to dynamic policies (i.e. forget performance / powersave, and
>> put userspace in a corner), I'd be willing to take a stand and say you
>> should only really be using schedutil nowadays - alignment with the
>> scheduler, uclamp, yadda yadda.
>>
>> AFAIA the only schedutil-related quirk we oughta fix for arm/arm64 is that
>> arch_scale_freq_invariant() thingie, and FWIW I'm hoping to get something
>> regarding this out sometime soonish. After that, I'd actually want to make
>> schedutil the default governor for arm/arm64.
>
> As in setting CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y in the arm64
> defconfig? If so, you have my Acked-by already :)
>

I'm actually thinking of making it the unconditional default for arm/arm64
in cpufreq's Kconfig, following what has been recently done for
intel_pstate.

>> I'm not opiniated on the modularization, but if you can, could you please
>> share some more details as to why schedutil cannot fulfill its role of holy
>> messiah of governors for GKI?
>
> I guess I answered some of that in the other thread with Peter, but all
> in all I'm definitely not trying to make an argument that schedutil
> isn't good enough here. I'm trying to say that mandating it in *GKI* is
> just likely to cause unnecessary friction, and trust me there is already
> enough of that with other topics.

Right, I appreciate it must be an "interesting" tug of war. My own opinion
has also already been expanded in the rest of the thread; i.e. we should
strive to make schedutil good enough that folks don't feel like they still
need to use ondemand/whatever frankengov. That said, even without GKI, I
get that making some vendors change their already tested-and-tuned setup is
an obstacle course in and of itself.

> Giving the option of having sugov as a
> module doesn't prevent us from making it a default for a few arches, so
> I think there is ground for an agreement!
>
> Cheers,
> Quentin

2020-05-09 02:44:57

by Pavankumar Kondeti

[permalink] [raw]
Subject: Re: [PATCH 13/14] sched: cpufreq: Use IS_ENABLED() for schedutil

On Fri, May 08, 2020 at 02:21:29PM +0100, Quentin Perret wrote:
> On Friday 08 May 2020 at 11:00:53 (+0530), Pavan Kondeti wrote:
> > > -#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> > > +#if defined(CONFIG_ENERGY_MODEL) && IS_ENABLED(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> > > /* Build perf. domains: */
> > > for (i = 0; i < ndoms_new; i++) {
> > > for (j = 0; j < n && !sched_energy_update; j++) {
> >
> > Now that scheduler does not have any references to schedutil_gov and cpufreq
> > has want_eas flag, do we need this CONFIG_CPU_FREQ_GOV_SCHEDUTIL checks here?
>
> Right, they're not absolutely required, but given that sugov is the only
> one to have 'want_eas' set I guess there is no need to compile that in
> without it, no?
>
Right.

Since you removed all compile time dependencies on schedutil, I thought the
#ifdef check around schedutil can be removed too.

Thanks,
Pavan

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-05-11 05:24:17

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On 08-05-20, 13:26, Peter Zijlstra wrote:
> At the very least there's that interactive governor that's really
> popular with Android. But IIRC there's a whole scala of home-brew
> governors and tweaks out there.

I removed interactive governor from Android long time back :)

--
viresh

2020-05-11 09:05:49

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

Hi Rafael,

On Friday 08 May 2020 at 15:40:34 (+0200), Rafael J. Wysocki wrote:
> On Fri, May 8, 2020 at 3:05 PM Quentin Perret <[email protected]> wrote:
> >
> > On Friday 08 May 2020 at 13:31:41 (+0200), Peter Zijlstra wrote:
> > > On Fri, May 08, 2020 at 12:16:12PM +0100, Quentin Perret wrote:
> > > > However, the point I tried to make here is orthogonal to that. As of
> > > > today using another governor than schedutil is fully supported upstream,
> > > > and in fact it isn't even enabled by default for most archs. If vendors
> > > > feel like using something else makes their product better, then I don't
> > > > see why I need to argue with them about that. And frankly I don't see
> > > > that support being removed from upstream any time soon.
> > >
> > > Right, it'll take a while to get there. But that doesn't mean we
> > > shouldn't encourage schedutil usage wherever and whenever possible. That
> > > includes not making it easier to not use it.
> > >
> > > In that respect making it modular goes against our ultimate goal (world
> > > domination, <mad giggles here>).
> >
> > Right, I definitely understand the sentiment. OTOH, things like that
> > give vendors weapons against GKI ('you-force-us-to-build-in-things-we-dont't-want'
> > etc etc). That _is_ true to some extent, but it's important we make sure
> > to keep this to an absolute minimum, otherwise GKI just won't happen
> > (and I really think that'd be a shame, GKI _is_ a good thing for
> > upstream).
> >
> > And sure, schedutil isn't that big, and we can make an exception. But
> > I'm sure you know what happens when you starting making exceptions ;)
>
> This is a very weak argument, if it can be regarded as an argument at all.

Well, fair enough :)

> You will have to make exceptions, the question is how many and on what
> criteria and you really need to figure that out for the GKI plan.

The base idea is, anything that we know from experience is used by
everybody can be built in, anything else will need investigation. And as
you've understood, schedutil falls in that second category.

> > So, all in all, I don't think the series actively makes schedutil worse
> > by adding out-of-line calls in the hot path or anything like that, and
> > having it as a module helps with GKI which I'm arguing is a good thing
> > in the grand scheme of things.
>
> Frankly, I'm not sure if it really helps.

Oh, why not?

> The idea of making schedutil modular seems to be based on the
> observation that it is not part of the core kernel, which I don't
> agree with.

Right, so that I think is the core of the discussion.

> Arguably, things like util clamps need it to work as
> expected.

Sure, but loading sugov dynamically as a module doesn't change much does
it?

If you are referring to the Kconfig dependency of uclamp on schedutil,
then that is a good point and I will argue that it should be removed.
In fact I'll add a patch to v2 that does just that, with the following
rationale:
- it is obsolete: the reason that dependency was added originally was
because sugov was the only place where util clamps where taken into
accounts. But that is no longer true -- we check them in the capacity
aware wake-up path as well, which is entirely independent from the
currently running governor;
- because of the above, it is (now) largely useless: a compile time
dependency doesn't guarantee we are actually running with schedutil
at all;
- it is artificial: there are no actual compilation dependencies
between sugov and uclamp, everything will compile just fine without
that 'depends on';

Or maybe you were thinking of something else?

> > That of course is only true if we can
> > agree on a reasonable set of exported symbols, so I'll give others some
> > time to complain and see if I can post a v2 addressing these issues!
>
> This isn't just about exported symbols, it is about what is regarded
> as essential and what isn't.

Right, the exported symbols are, IMO, quite interesting because they
show how 'core' the governor is. But what exactly do you mean by
'essential' here? Essential in what sense?

Thanks,
Quentin

2020-05-11 09:17:44

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

Hi Peter,

On Friday 08 May 2020 at 16:09:38 (+0200), Peter Zijlstra wrote:
> Do note that (afaik) ARM64 (and Power and probably others) has modules
> in an address space that is not reachable from regular kernel text and
> needs those intermediate trampolines.

Right, but I meant this series in its current form doesn't actively make
things worse for existing users. That is, the act of modularizing
schedutil has no impact on those who compile it in (because I agree that
would be a solid reason for refusing that series), and they are still
perfectly free to do so.

The 'new' users who deliberately decide to compile it as a module will
have to pay that price, but that is their choice. And in the Android
case, as you've understood, this is a price we are ready to pay.

Thanks,
Quentin

2020-05-11 15:28:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Mon, May 11, 2020 at 11:00 AM Quentin Perret <[email protected]> wrote:
>
> Hi Rafael,
>
> On Friday 08 May 2020 at 15:40:34 (+0200), Rafael J. Wysocki wrote:
> > On Fri, May 8, 2020 at 3:05 PM Quentin Perret <[email protected]> wrote:
> > >
> > > On Friday 08 May 2020 at 13:31:41 (+0200), Peter Zijlstra wrote:
> > > > On Fri, May 08, 2020 at 12:16:12PM +0100, Quentin Perret wrote:
> > > > > However, the point I tried to make here is orthogonal to that. As of
> > > > > today using another governor than schedutil is fully supported upstream,
> > > > > and in fact it isn't even enabled by default for most archs. If vendors
> > > > > feel like using something else makes their product better, then I don't
> > > > > see why I need to argue with them about that. And frankly I don't see
> > > > > that support being removed from upstream any time soon.
> > > >
> > > > Right, it'll take a while to get there. But that doesn't mean we
> > > > shouldn't encourage schedutil usage wherever and whenever possible. That
> > > > includes not making it easier to not use it.
> > > >
> > > > In that respect making it modular goes against our ultimate goal (world
> > > > domination, <mad giggles here>).
> > >
> > > Right, I definitely understand the sentiment. OTOH, things like that
> > > give vendors weapons against GKI ('you-force-us-to-build-in-things-we-dont't-want'
> > > etc etc). That _is_ true to some extent, but it's important we make sure
> > > to keep this to an absolute minimum, otherwise GKI just won't happen
> > > (and I really think that'd be a shame, GKI _is_ a good thing for
> > > upstream).
> > >
> > > And sure, schedutil isn't that big, and we can make an exception. But
> > > I'm sure you know what happens when you starting making exceptions ;)
> >
> > This is a very weak argument, if it can be regarded as an argument at all.
>
> Well, fair enough :)
>
> > You will have to make exceptions, the question is how many and on what
> > criteria and you really need to figure that out for the GKI plan.
>
> The base idea is, anything that we know from experience is used by
> everybody can be built in, anything else will need investigation. And as
> you've understood, schedutil falls in that second category.

The fact that the vendor sets up a different governor by default
doesn't mean that there should be no way to switch over to schedutil
IMO.

> > > So, all in all, I don't think the series actively makes schedutil worse
> > > by adding out-of-line calls in the hot path or anything like that, and
> > > having it as a module helps with GKI which I'm arguing is a good thing
> > > in the grand scheme of things.
> >
> > Frankly, I'm not sure if it really helps.
>
> Oh, why not?
>
> > The idea of making schedutil modular seems to be based on the
> > observation that it is not part of the core kernel, which I don't
> > agree with.
>
> Right, so that I think is the core of the discussion.
>
> > Arguably, things like util clamps need it to work as
> > expected.
>
> Sure, but loading sugov dynamically as a module doesn't change much does
> it?
>
> If you are referring to the Kconfig dependency of uclamp on schedutil,
> then that is a good point and I will argue that it should be removed.
> In fact I'll add a patch to v2 that does just that, with the following
> rationale:

Which isn't correct AFAICS.

> - it is obsolete:

No, it isn't IMO.

> the reason that dependency was added originally was
> because sugov was the only place where util clamps where taken into
> accounts. But that is no longer true -- we check them in the capacity
> aware wake-up path as well, which is entirely independent from the
> currently running governor;

But this is done under the assumption that the governor will also take
the clamps into account, isn't it?

Otherwise you can see your "low util" tasks running at high
frequencies and "high util" ones running slowly. Surely, that's not
desirable?

IIUC, the task placement needs to be consistent with the governor's
decisions for things to work as expected.

> - because of the above, it is (now) largely useless: a compile time
> dependency doesn't guarantee we are actually running with schedutil
> at all;
> - it is artificial: there are no actual compilation dependencies
> between sugov and uclamp, everything will compile just fine without
> that 'depends on';

That actually is the case, but it doesn't mean that there is no
dependency in there.

> Or maybe you were thinking of something else?
>
> > > That of course is only true if we can
> > > agree on a reasonable set of exported symbols, so I'll give others some
> > > time to complain and see if I can post a v2 addressing these issues!
> >
> > This isn't just about exported symbols, it is about what is regarded
> > as essential and what isn't.
>
> Right, the exported symbols are, IMO, quite interesting because they
> show how 'core' the governor is. But what exactly do you mean by
> 'essential' here? Essential in what sense?

IMO the question is how much value there is in making it possible to
avoid loading a particular piece of kernel code into memory.

You've demonstrated that it can be done with schedutil, but does that
matter that it needs to be done?

I thought that the original idea was to make it closely integrated
with the scheduler, so it could access the scheduler's data structures
(that we specifically didn't want to expose to the *other* governors)
and so as to avoid forgetting about any dependencies when making
changes to either the scheduler or schedutil. Allowing it to be build
as a module would make make us have to worry about those things again,
so is it really worth it?

2020-05-12 09:23:48

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

Hi Rafael,

On Monday 11 May 2020 at 17:26:26 (+0200), Rafael J. Wysocki wrote:
> On Mon, May 11, 2020 at 11:00 AM Quentin Perret <[email protected]> wrote:
> > The base idea is, anything that we know from experience is used by
> > everybody can be built in, anything else will need investigation. And as
> > you've understood, schedutil falls in that second category.
>
> The fact that the vendor sets up a different governor by default
> doesn't mean that there should be no way to switch over to schedutil
> IMO.

Well, there will always be the option to load the schedutil module ;-)

<snip>
> > the reason that dependency was added originally was
> > because sugov was the only place where util clamps where taken into
> > accounts. But that is no longer true -- we check them in the capacity
> > aware wake-up path as well, which is entirely independent from the
> > currently running governor;
>
> But this is done under the assumption that the governor will also take
> the clamps into account, isn't it?

Even if that was correct, it's not clear a compile-time dependency makes
that assumption true, right?

For governors and the like, if the option is =n, then you can hard-rely
on it not being used. But if it is =y, you cannot assume anything
what-so-ever. EAS does a run-time check for that exact reason -- a
sole Kconfig dependency typically doesn't work for that.

> Otherwise you can see your "low util" tasks running at high
> frequencies and "high util" ones running slowly. Surely, that's not
> desirable?
>
> IIUC, the task placement needs to be consistent with the governor's
> decisions for things to work as expected.

Sure, but, say, the 'performance' governor could give you some of that
too. That is, you could use uclamp.min on some tasks to ensure they are
biased to bigger CPUs, and just stick the frequency to max. I wouldn't
be surprised to see setups like that on non-battery-powered devices for
instance. And yes, there are non-battery-powered devices that use big
little out there (TVs and such, often because the only SOCs matching
their requirements are mobile SOCs).

> > - because of the above, it is (now) largely useless: a compile time
> > dependency doesn't guarantee we are actually running with schedutil
> > at all;
> > - it is artificial: there are no actual compilation dependencies
> > between sugov and uclamp, everything will compile just fine without
> > that 'depends on';
>
> That actually is the case, but it doesn't mean that there is no
> dependency in there.

Sure, and the dependency did make sense when uclamp was first introduced.
At the time, the clamp values where used _only_ in schedutil. So, it
was fair to say "if schedutil is =n, there is no way the clamps will ever
be useful to anything else, so the uclamp code can be safely compiled
out". That is no longer true, and if you want to make uclamp work only
with schedutil (which I would advise against for the above reason), then
a Kconfig dependency doesn't seem to be the right tool for that anyway.

> > Or maybe you were thinking of something else?
> >
> > > > That of course is only true if we can
> > > > agree on a reasonable set of exported symbols, so I'll give others some
> > > > time to complain and see if I can post a v2 addressing these issues!
> > >
> > > This isn't just about exported symbols, it is about what is regarded
> > > as essential and what isn't.
> >
> > Right, the exported symbols are, IMO, quite interesting because they
> > show how 'core' the governor is. But what exactly do you mean by
> > 'essential' here? Essential in what sense?
>
> IMO the question is how much value there is in making it possible to
> avoid loading a particular piece of kernel code into memory.
>
> You've demonstrated that it can be done with schedutil, but does that
> matter that it needs to be done?
>
> I thought that the original idea was to make it closely integrated
> with the scheduler, so it could access the scheduler's data structures
> (that we specifically didn't want to expose to the *other* governors)
> and so as to avoid forgetting about any dependencies when making
> changes to either the scheduler or schedutil. Allowing it to be build
> as a module would make make us have to worry about those things again,
> so is it really worth it?

Right, so, if there is a strong technical reason to keep schedutil a
bool option (such as accessing data structures we really don't want to
export), then sure, I'll have no choice but to accept it. Now, assuming
that I fix the usage of 'runqueues', is there anything in particular
that you think is wrong in the series?

Note that if one day keeping schedutil modular becomes a blocker for a
new feature, then we'll have the option to make it bool again. But is
there something like that already?

Thanks,
Quentin

2020-05-12 10:27:45

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Tue, May 12, 2020 at 11:21 AM Quentin Perret <[email protected]> wrote:
>
> Hi Rafael,
>
> On Monday 11 May 2020 at 17:26:26 (+0200), Rafael J. Wysocki wrote:
> > On Mon, May 11, 2020 at 11:00 AM Quentin Perret <[email protected]> wrote:
> > > The base idea is, anything that we know from experience is used by
> > > everybody can be built in, anything else will need investigation. And as
> > > you've understood, schedutil falls in that second category.
> >
> > The fact that the vendor sets up a different governor by default
> > doesn't mean that there should be no way to switch over to schedutil
> > IMO.
>
> Well, there will always be the option to load the schedutil module ;-)

Yeah, fair enough.

> <snip>
> > > the reason that dependency was added originally was
> > > because sugov was the only place where util clamps where taken into
> > > accounts. But that is no longer true -- we check them in the capacity
> > > aware wake-up path as well, which is entirely independent from the
> > > currently running governor;
> >
> > But this is done under the assumption that the governor will also take
> > the clamps into account, isn't it?
>
> Even if that was correct, it's not clear a compile-time dependency makes
> that assumption true, right?

It indicates that the functional dependency is there if you will.

Otherwise it would be kind of hidden and likely to become confusing.

> For governors and the like, if the option is =n, then you can hard-rely
> on it not being used. But if it is =y, you cannot assume anything
> what-so-ever. EAS does a run-time check for that exact reason -- a
> sole Kconfig dependency typically doesn't work for that.

Obviously Kconfig dependencies cannot replace runtime checks, but OTOH
the latter are guaranteed to fail if the given piece of code is not
there in the kernel even.

> > Otherwise you can see your "low util" tasks running at high
> > frequencies and "high util" ones running slowly. Surely, that's not
> > desirable?
> >
> > IIUC, the task placement needs to be consistent with the governor's
> > decisions for things to work as expected.
>
> Sure, but, say, the 'performance' governor could give you some of that too.

Well, kind of, and the 'powersave' governor can do that too in theory.

> That is, you could use uclamp.min on some tasks to ensure they are
> biased to bigger CPUs, and just stick the frequency to max. I wouldn't
> be surprised to see setups like that on non-battery-powered devices for
> instance. And yes, there are non-battery-powered devices that use big
> little out there (TVs and such, often because the only SOCs matching
> their requirements are mobile SOCs).

I would not conflate the use cases for uclamps and big-little.

Anyway, there are some cases in which using uclamps without schedutil
might make theoretical sense, but I'm not sure how useful that would
be in practice.

> > > - because of the above, it is (now) largely useless: a compile time
> > > dependency doesn't guarantee we are actually running with schedutil
> > > at all;
> > > - it is artificial: there are no actual compilation dependencies
> > > between sugov and uclamp, everything will compile just fine without
> > > that 'depends on';
> >
> > That actually is the case, but it doesn't mean that there is no
> > dependency in there.
>
> Sure, and the dependency did make sense when uclamp was first introduced.
> At the time, the clamp values where used _only_ in schedutil. So, it
> was fair to say "if schedutil is =n, there is no way the clamps will ever
> be useful to anything else, so the uclamp code can be safely compiled
> out". That is no longer true, and if you want to make uclamp work only
> with schedutil (which I would advise against for the above reason), then
> a Kconfig dependency doesn't seem to be the right tool for that anyway.

Still, IMO it would be fair to say that if uclamps are used, schedutil
is very likely to be preferred.

Kconfig can be made select schedutil when enabling uclamps or similar
to express that preference.

> > > Or maybe you were thinking of something else?
> > >
> > > > > That of course is only true if we can
> > > > > agree on a reasonable set of exported symbols, so I'll give others some
> > > > > time to complain and see if I can post a v2 addressing these issues!
> > > >
> > > > This isn't just about exported symbols, it is about what is regarded
> > > > as essential and what isn't.
> > >
> > > Right, the exported symbols are, IMO, quite interesting because they
> > > show how 'core' the governor is. But what exactly do you mean by
> > > 'essential' here? Essential in what sense?
> >
> > IMO the question is how much value there is in making it possible to
> > avoid loading a particular piece of kernel code into memory.
> >
> > You've demonstrated that it can be done with schedutil, but does that
> > matter that it needs to be done?
> >
> > I thought that the original idea was to make it closely integrated
> > with the scheduler, so it could access the scheduler's data structures
> > (that we specifically didn't want to expose to the *other* governors)
> > and so as to avoid forgetting about any dependencies when making
> > changes to either the scheduler or schedutil. Allowing it to be build
> > as a module would make make us have to worry about those things again,
> > so is it really worth it?
>
> Right, so, if there is a strong technical reason to keep schedutil a
> bool option (such as accessing data structures we really don't want to
> export),

The bool option is the status quo and there needs to be a strong
technical reason to allow it to be modular (as that would cause the
complexity to increase).

> then sure, I'll have no choice but to accept it. Now, assuming
> that I fix the usage of 'runqueues', is there anything in particular
> that you think is wrong in the series?

Well, define "wrong". :-)

Some pieces of the series look like general improvements, but some of
them don't.

And the fact that something can be done alone should not be regarded
as a good enough reason for doing it in my view.

> Note that if one day keeping schedutil modular becomes a blocker for a
> new feature, then we'll have the option to make it bool again. But is
> there something like that already?

Putting it this way isn't entirely fair IMO.

What you are proposing is basically to add complexity and the reason
for doing that seems to be convenience (and that's not the users'
convenience for that matter) which is not really super-convincing.

Cheers!

2020-05-12 14:02:33

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Tuesday 12 May 2020 at 12:25:17 (+0200), Rafael J. Wysocki wrote:
> Still, IMO it would be fair to say that if uclamps are used, schedutil
> is very likely to be preferred.
>
> Kconfig can be made select schedutil when enabling uclamps or similar
> to express that preference.

Right, fair enough. Making schedutil default to y when uclamp is
compiled in should do the trick (and avoid using 'select'). Would that
work for you?

> What you are proposing is basically to add complexity and the reason
> for doing that seems to be convenience (and that's not the users'
> convenience for that matter) which is not really super-convincing.

Forcing our users to build in their products something they don't want
to use tends to be a very real problem for what we're trying to achieve,
so it's certainly not just convenience from our perspective. I can
understand that yours might be different, though.

Thanks,
Quentin

2020-05-12 14:11:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Tue, May 12, 2020 at 3:58 PM Quentin Perret <[email protected]> wrote:
>
> On Tuesday 12 May 2020 at 12:25:17 (+0200), Rafael J. Wysocki wrote:
> > Still, IMO it would be fair to say that if uclamps are used, schedutil
> > is very likely to be preferred.
> >
> > Kconfig can be made select schedutil when enabling uclamps or similar
> > to express that preference.
>
> Right, fair enough. Making schedutil default to y when uclamp is
> compiled in should do the trick (and avoid using 'select'). Would that
> work for you?

I think so.

> > What you are proposing is basically to add complexity and the reason
> > for doing that seems to be convenience (and that's not the users'
> > convenience for that matter) which is not really super-convincing.
>
> Forcing our users to build in their products something they don't want
> to use tends to be a very real problem for what we're trying to achieve,
> so it's certainly not just convenience from our perspective. I can
> understand that yours might be different, though.

I would like to understand the nature of the problem here.

If some piece of kernel code is modular, it still needs to be build.
The difference is when and how it gets loaded, so can you possibly
elaborate here?

Cheers!

2020-05-12 15:13:27

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Tuesday 12 May 2020 at 16:08:56 (+0200), Rafael J. Wysocki wrote:
> If some piece of kernel code is modular, it still needs to be build.
> The difference is when and how it gets loaded, so can you possibly
> elaborate here?

Sure thing, sorry if that wasn't clear.

The end goal with GKI is the following: Google will release a single
binary kernel image (signed, etc etc) that all devices using a given
Android version will be required to use. That image is however going to
be only for the core of the kernel (no drivers or anything of the sort).
Vendors and OEMs, on their end, will be responsible to build and ship
GKI-compatible modules for their respective devices. So, Android devices
will eventually ship with a Google-issued GKI, plus a bunch of
vendor-provided modules loaded during boot.

This is a significant shift from the current model where vendors
completely own the kernel, and are largely free to use the kernel config
they want. Today, those who don't use schedutil are free to turn the
config off, for example.

But GKI changes that. The 'core' GKI config is effectively imposed to
the entire ecosystem. As of now, because it is 'bool' we have no choice
but to compile schedutil in the core GKI as some (most) partners use it.
But as you can imagine, that is not the preferred option of those who
_don't_ use schedutil. Modularizing avoids any potential friction since
the vendors who want to use it will be able load the module, and the
others will simply not. That really is the reason for that series.

Then there is an important question: why should upstream care about all
that stuff? That's obviously debatable, but my biased opinion is that
GKI is a good thing(TM). It's our opportunity to put some order in the
android ecosystem and to reduce the delta with mainline. That'll
definitely take time, and there will be Android-specific churn in GKI in
the beginning, but we'd like to keep that as small as possible, and to
converge to 0 looking forwards.

I hope that helps!

Thanks,
Quentin

2020-05-12 15:32:58

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Tue, May 12, 2020 at 5:11 PM Quentin Perret <[email protected]> wrote:
>
> On Tuesday 12 May 2020 at 16:08:56 (+0200), Rafael J. Wysocki wrote:
> > If some piece of kernel code is modular, it still needs to be build.
> > The difference is when and how it gets loaded, so can you possibly
> > elaborate here?
>
> Sure thing, sorry if that wasn't clear.

No worries.

> The end goal with GKI is the following: Google will release a single
> binary kernel image (signed, etc etc) that all devices using a given
> Android version will be required to use. That image is however going to
> be only for the core of the kernel (no drivers or anything of the sort).
> Vendors and OEMs, on their end, will be responsible to build and ship
> GKI-compatible modules for their respective devices. So, Android devices
> will eventually ship with a Google-issued GKI, plus a bunch of
> vendor-provided modules loaded during boot.

If that is the case, then I absolutely think that schedutil should be
part of the GKI.

Moreover, that would have been my opinion even if it had been modular
in the first place.

> This is a significant shift from the current model where vendors
> completely own the kernel, and are largely free to use the kernel config
> they want. Today, those who don't use schedutil are free to turn the
> config off, for example.

So why is this regarded as a good thing?

> But GKI changes that. The 'core' GKI config is effectively imposed to
> the entire ecosystem. As of now, because it is 'bool' we have no choice
> but to compile schedutil in the core GKI as some (most) partners use it.
> But as you can imagine, that is not the preferred option of those who
> _don't_ use schedutil.

OTOH, it may as well be an incentive for them to switch over and
report problems with it that they see.

I absolutely would like to make schedutil the clearly preferred option
and IMO avoiding to use it, especially for non-technical reasons,
should be clearly less attractive.

> Modularizing avoids any potential friction since
> the vendors who want to use it will be able load the module, and the
> others will simply not. That really is the reason for that series.

If the long-term target is for everyone to use schedutil, then I don't
quite see why making it easy to not include it in one's system is
going to help.

> Then there is an important question: why should upstream care about all
> that stuff? That's obviously debatable, but my biased opinion is that
> GKI is a good thing(TM). It's our opportunity to put some order in the
> android ecosystem and to reduce the delta with mainline. That'll
> definitely take time, and there will be Android-specific churn in GKI in
> the beginning, but we'd like to keep that as small as possible, and to
> converge to 0 looking forwards.

That's a good goal, but I'm not sure if the least resistance path to
it is the right one. :-)

Cheers!

2020-05-12 15:53:45

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Tue, May 12, 2020 at 11:30 AM Rafael J. Wysocki <[email protected]> wrote:
[...]
> > The end goal with GKI is the following: Google will release a single
> > binary kernel image (signed, etc etc) that all devices using a given
> > Android version will be required to use. That image is however going to
> > be only for the core of the kernel (no drivers or anything of the sort).
> > Vendors and OEMs, on their end, will be responsible to build and ship
> > GKI-compatible modules for their respective devices. So, Android devices
> > will eventually ship with a Google-issued GKI, plus a bunch of
> > vendor-provided modules loaded during boot.
>
> If that is the case, then I absolutely think that schedutil should be
> part of the GKI.
>
> Moreover, that would have been my opinion even if it had been modular
> in the first place.
>
> > This is a significant shift from the current model where vendors
> > completely own the kernel, and are largely free to use the kernel config
> > they want. Today, those who don't use schedutil are free to turn the
> > config off, for example.
>
> So why is this regarded as a good thing?
>
> > But GKI changes that. The 'core' GKI config is effectively imposed to
> > the entire ecosystem. As of now, because it is 'bool' we have no choice
> > but to compile schedutil in the core GKI as some (most) partners use it.
> > But as you can imagine, that is not the preferred option of those who
> > _don't_ use schedutil.
>
> OTOH, it may as well be an incentive for them to switch over and
> report problems with it that they see.
>
> I absolutely would like to make schedutil the clearly preferred option
> and IMO avoiding to use it, especially for non-technical reasons,
> should be clearly less attractive.

Also, does this series make it easier for vendors / oems / whoever to
carry out-of-tree schedutil hacks saying that's "Ok" because that's
not part of the core GKI? That would definitely be a bad thing to
encourage as well. schedutil should pretty much be considered a part
of the core GKI if the goal is to encourage everyone to move to it,
IMO.

- Joel

2020-05-12 16:28:31

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Tuesday 12 May 2020 at 17:30:36 (+0200), Rafael J. Wysocki wrote:
> On Tue, May 12, 2020 at 5:11 PM Quentin Perret <[email protected]> wrote:
> > The end goal with GKI is the following: Google will release a single
> > binary kernel image (signed, etc etc) that all devices using a given
> > Android version will be required to use. That image is however going to
> > be only for the core of the kernel (no drivers or anything of the sort).
> > Vendors and OEMs, on their end, will be responsible to build and ship
> > GKI-compatible modules for their respective devices. So, Android devices
> > will eventually ship with a Google-issued GKI, plus a bunch of
> > vendor-provided modules loaded during boot.
>
> If that is the case, then I absolutely think that schedutil should be
> part of the GKI.
>
> Moreover, that would have been my opinion even if it had been modular
> in the first place.

I definitely understand the feeling. Heck I contributed to schedutil, so
I'd love to see the entire world run it :-)

But my personal preference doesn't seem to matter in this world, sadly.
The truth is, we cannot afford to be arbitrary in our decisions in GKI.
Switching governors and such is a fully supported feature upstream, and
it has been for a long time. Taking that away from partners is not the
goal, nor the intention, of GKI. They will be able to choose whatever
governor they want, because there are no *objective* reasons to not let
them do that.

> > This is a significant shift from the current model where vendors
> > completely own the kernel, and are largely free to use the kernel config
> > they want. Today, those who don't use schedutil are free to turn the
> > config off, for example.
>
> So why is this regarded as a good thing?

You mean using something else than schedutil? It is not seen as a good
thing at all, at least not by me. But we have the same problem as
upstream. We cannot remove the other governors or the governor API for a
simple reason: they have users :/

Thanks,
Quentin

2020-05-12 17:34:06

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Tue, May 12, 2020 at 6:26 PM Quentin Perret <[email protected]> wrote:
>
> On Tuesday 12 May 2020 at 17:30:36 (+0200), Rafael J. Wysocki wrote:
> > On Tue, May 12, 2020 at 5:11 PM Quentin Perret <[email protected]> wrote:
> > > The end goal with GKI is the following: Google will release a single
> > > binary kernel image (signed, etc etc) that all devices using a given
> > > Android version will be required to use. That image is however going to
> > > be only for the core of the kernel (no drivers or anything of the sort).
> > > Vendors and OEMs, on their end, will be responsible to build and ship
> > > GKI-compatible modules for their respective devices. So, Android devices
> > > will eventually ship with a Google-issued GKI, plus a bunch of
> > > vendor-provided modules loaded during boot.
> >
> > If that is the case, then I absolutely think that schedutil should be
> > part of the GKI.
> >
> > Moreover, that would have been my opinion even if it had been modular
> > in the first place.
>
> I definitely understand the feeling. Heck I contributed to schedutil, so
> I'd love to see the entire world run it :-)
>
> But my personal preference doesn't seem to matter in this world, sadly.
> The truth is, we cannot afford to be arbitrary in our decisions in GKI.
> Switching governors and such is a fully supported feature upstream, and
> it has been for a long time. Taking that away from partners is not the
> goal, nor the intention, of GKI.

It still will be possible with schedutil built-in, however.

> They will be able to choose whatever
> governor they want, because there are no *objective* reasons to not let
> them do that.

Which, again, is still possible with non-modular schedutil AFAICS.

I don't see any technical reason for making schedutil modular in the
context of GKI other than to make the GKI image smaller, but I don't
expect that to be significant enough.

Is there anything else I am missing?

> > > This is a significant shift from the current model where vendors
> > > completely own the kernel, and are largely free to use the kernel config
> > > they want. Today, those who don't use schedutil are free to turn the
> > > config off, for example.
> >
> > So why is this regarded as a good thing?
>
> You mean using something else than schedutil?

I mean why allowing people to compile schedutil out is regarded as a good thing.

> It is not seen as a good
> thing at all, at least not by me. But we have the same problem as
> upstream. We cannot remove the other governors or the governor API for a
> simple reason: they have users :/

I'm not saying about removing any of that. I'm just trying to
understand why you need schedutil to be modular so as to make those
things possible.

Cheers!

2020-05-13 09:44:10

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

Hi Rafael,

On Tuesday 12 May 2020 at 19:30:52 (+0200), Rafael J. Wysocki wrote:
> I don't see any technical reason for making schedutil modular in the
> context of GKI other than to make the GKI image smaller, but I don't
> expect that to be significant enough.

The fact that we can make the image smaller, and we give vendors one
less reason to not-want GKI _is_ desirable IMO.

$ size vmlinux.*
text data bss dec hex filename
19225963 9601976 491084 29319023 1bf5f6f vmlinux.after
19230599 9603236 491084 29324919 1bf7677 vmlinux.before

^ that's with the series applied. 'before' means sugov is =y, and
'after' is sugov =m. So modularizing saves just over 4K on text, and a
bit of data too. Is it significant? Maybe not. But it's quite likely
that those who don't use schedutil will find any unnecessary byte to be
one too many.

I just checked the size of modules in the default arm64 defconfig, and
the median is ~4K of text. The average is a little bigger, but mostly
because of a small number of really large modules (nouveau being the
prime the example). So all in all, the sugov module is not particularly
small by comparison with other things that have been modularized. A lot
of small things can lead to significant savings at the end.

Thanks,
Quentin

2020-05-13 10:09:45

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Wednesday 13 May 2020 at 12:02:13 (+0200), Greg KH wrote:
> It's not significant at all, just always build it in, no one will notice
> it, it's just a page or two. Serial port drivers are way bigger :)

Alright, I give up :)

When partners will complain (and I think they will) I'll point them here
and say we tried ;)

Cheers,
Quentin

2020-05-13 10:27:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Wed, May 13, 2020 at 11:06:11AM +0100, Quentin Perret wrote:
> On Wednesday 13 May 2020 at 12:02:13 (+0200), Greg KH wrote:
> > It's not significant at all, just always build it in, no one will notice
> > it, it's just a page or two. Serial port drivers are way bigger :)
>
> Alright, I give up :)
>
> When partners will complain (and I think they will) I'll point them here
> and say we tried ;)

No problem, that's what lore.kernel.org archives are for :)

2020-05-13 11:20:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

On Wed, May 13, 2020 at 10:41:17AM +0100, Quentin Perret wrote:
> Hi Rafael,
>
> On Tuesday 12 May 2020 at 19:30:52 (+0200), Rafael J. Wysocki wrote:
> > I don't see any technical reason for making schedutil modular in the
> > context of GKI other than to make the GKI image smaller, but I don't
> > expect that to be significant enough.
>
> The fact that we can make the image smaller, and we give vendors one
> less reason to not-want GKI _is_ desirable IMO.
>
> $ size vmlinux.*
> text data bss dec hex filename
> 19225963 9601976 491084 29319023 1bf5f6f vmlinux.after
> 19230599 9603236 491084 29324919 1bf7677 vmlinux.before
>
> ^ that's with the series applied. 'before' means sugov is =y, and
> 'after' is sugov =m. So modularizing saves just over 4K on text, and a
> bit of data too. Is it significant? Maybe not. But it's quite likely
> that those who don't use schedutil will find any unnecessary byte to be
> one too many.

It's not significant at all, just always build it in, no one will notice
it, it's just a page or two. Serial port drivers are way bigger :)

thanks,

greg k-h

2020-05-13 20:27:09

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH 00/14] Modularize schedutil

Hey Joel,

On Tuesday 12 May 2020 at 11:49:32 (-0400), Joel Fernandes wrote:
> Also, does this series make it easier for vendors / oems / whoever to
> carry out-of-tree schedutil hacks saying that's "Ok" because that's
> not part of the core GKI? That would definitely be a bad thing to
> encourage as well. schedutil should pretty much be considered a part
> of the core GKI if the goal is to encourage everyone to move to it,
> IMO.

Sure, but I don't think the series makes it easier to carry out-of-tree
stuff. Vendors will have the choice to load the governor they want. Some
will use schedutil, some will use other upstream governors, and some will
use their out of tree crap. And that is orthogonal to schedutil being a
module or not.

The only thing that will happen is that they will complain about GKI,
and find examples of things like schedutil that is being forced on
them. Realistically, having schedutil built-in is unlikely to change
their mind about the governor they want to use, it is just likely to
give them reasons not to do the right thing and be GKI compliant.

Thanks,
Quentin