Hi guys,
I've combined in this series some recent work on Frequency Invariance
(FI) from Valentin and myself to create what we believe to be some nice
improvements which also enable the adoption of schedutil as the default
cpufreq governor for arm and arm64.
The series is structured as follows:
- (1) patches 1-3 move the frequency scaling updates from the cpufreq
drivers to the cpufreq core,
- (2) patch 4 fixes the FI scale factor setting for the vexpress-spc
driver, also showing that this driver is atypical,
- (3) patches 5-7 enable proper reporting on whether the system
supports frequency invariance (either cpufreq or counter-driven),
- (4) patch 8 makes schedutil the default governor for arm and arm64
systems.
Additionally, I will separately submit patches that make Energy Aware
Scheduling (EAS) enablement conditional on FI support. For now I've kept
them out of this series to keep it focused mostly on cpufreq.
The need for (1) is two-fold:
- First of all, the call to arch_set_freq_scale(), the update function
for the Frequency Invariance Engine (FIE) scale factor, is often
forgotten by cpufreq drivers, as this link to the scheduler is not an
obvious step to consider when implementing a driver.
(for example
commit ada54f35b227 ("cpufreq: qcom-hw: invoke frequency-invariance
setter function"))
Therefore, given that a majority of drivers already provide the
information needed by this function call back to the cpufreq core,
moving it there provides by default the functionality that the
scheduler's FIE needs.
- Secondly, this enables cpufreq to report support for FI (3).
The patch at (2) provides a fix for the vexpress-spc driver, when the
big.LITTLE switcher is enabled, also showing that this case is atypical
and cannot benefit from the move of the FI scale setting to the cpufreq
core. For this reason, a new flag is introduced to allow drivers to
flag custom support for FI. While other drivers could benefit from the
presence of this flag as described in patch 2/8, it is worth discussing
whether this old functionality justifies introducing an additional
flag in cpufreq. Until this is discussed, I've chosen to
handle and fix this driver in this series as well.
The functionality at (3) fixes a silent issue for arm and arm64 systems:
if we look at the definition of arch_scale_freq_invariant(), schedutil
considers these systems frequency invariant because they #define
arch_scale_freq_capacity(), even in cases where the cpufreq driver in
use doesn't drive the Frequency Invariance Engine (FIE), i.e. never
calls arch_set_freq_scale().
Therefore the patches at (3) are necessary to be able to switch to
schedutil as the default governor (4) and to later condition EAS
enablement, with the confidence that arch_scale_freq_invariant() now
properly reports FI support.
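
For illustration only (not part of the series), here is a minimal
user-space sketch of the silent issue described above. The preprocessor
pattern reflects my reading of the description, and every mock_* name is
hypothetical: the compile-time check reports invariance as soon as the
getter is #defined, regardless of whether any driver ever updates the
scale factor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU scale factor storage. */
unsigned long mock_freq_scale(int cpu)
{
	assert(cpu >= 0);
	return 1024; /* never updated: no driver calls the setter */
}

/* The architecture advertises a getter for the scale factor... */
#define arch_scale_freq_capacity(cpu) mock_freq_scale(cpu)

/* ...and that alone is what the compile-time check keys off. */
#ifdef arch_scale_freq_capacity
#define arch_scale_freq_invariant() true
#else
#define arch_scale_freq_invariant() false
#endif

/* Hypothetical runtime fact: does the driver in use call the setter? */
bool mock_driver_calls_setter = false;

bool reported_invariant(void)
{
	return arch_scale_freq_invariant();
}

bool effectively_invariant(void)
{
	return mock_driver_calls_setter;
}
```

In this sketch reported_invariant() is true while
effectively_invariant() is false, which is the mismatch that (3) is
meant to remove.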
Given that this functionality touches multiple cpufreq drivers, I've
tested on a range of platforms to ensure that functionality is
maintained.
This series applies on linux-next-20200629.
Testing:
Preliminary tests on all patches: build tests for multiple architectures,
sparse static analysis, LTP tests, LISA synthetics (test suite detailed
at [1]).
Other tests on affected platforms:
- Juno (R0) - scpi-cpufreq
root@buildroot:~# dmesg | grep -i invariance
[ 5.376523] cpufreq: cpufreq_register_driver: Driver scpi-cpufreq can provide frequency invariance
root@buildroot:~# cd /sys/devices/system/cpu/cpufreq/policy0
root@buildroot:~# cat scaling_governor
userspace
root@buildroot:~# cat scaling_available_frequencies
450000 575000 700000 775000 850000
root@buildroot:~# echo 450000 > scaling_setspeed
[ 953.861317] Scale for cpus 0,3-5: 542
root@buildroot:~# echo 850000 > scaling_setspeed
[ 971.780739] Scale for cpus 0,3-5: 1024
root@buildroot:~# cd ../policy1
root@buildroot:~# cat scaling_available_frequencies
450000 625000 800000 950000 1100000
root@buildroot:~# echo 450000 > scaling_setspeed
[ 1084.867760] Scale for cpus 1-2: 418
root@buildroot:~# echo 1100000 > scaling_setspeed
[ 1094.447724] Scale for cpus 1-2: 1024
- Juno (R2) - scmi (cpufreq driver)
root@buildroot:~# dmesg | grep scmi
[ 5.707341] arm-scmi firmware:scmi: SCMI Protocol v1.0 'arm:arm' Firmware version 0x2060000
[ 5.987594] cpufreq: cpufreq_register_driver: Driver scmi can provide frequency invariance
root@buildroot:~# dmesg | grep -i invariance
[ 5.987594] cpufreq: cpufreq_register_driver: Driver scmi can provide frequency invariance
root@buildroot:~# cd /sys/devices/system/cpu/cpufreq/policy0
root@buildroot:~# cat scaling_available_frequencies
450000 800000 950000
root@buildroot:~# echo 450000 > scaling_setspeed
[ 65.691303] Scale for cpus 0,3-5: 485
root@buildroot:~# echo 850000 > scaling_setspeed
[ 697.538250] Scale for cpus 0,3-5: 1024
root@buildroot:~# cd ../policy1
root@buildroot:~# cat scaling_available_frequencies
600000 1000000 1200000
root@buildroot:~# echo 600000 > scaling_setspeed
[ 711.874918] Scale for cpus 1-2: 512
root@buildroot:~# echo 1200000 > scaling_setspeed
[ 715.955159] Scale for cpus 1-2: 1024
- DB845c - qcom-cpufreq-hw
root@buildroot:~# dmesg | grep -i invariance
[ 5.136076] cpufreq: cpufreq_register_driver: Driver qcom-cpufreq-hw can provide frequency invariance
root@buildroot:~# cd /sys/devices/system/cpu/cpufreq/policy0
root@buildroot:~# cat scaling_available_frequencies
300000 403200 480000 576000 652800 748800 825600 902400 979200 1056000 1132800 1228800 1324800 1420800 1516800 1612800 1689600 1766400
root@buildroot:~# echo 300000 > scaling_setspeed
[ 94.027825] Scale for cpus 0-3: 173
root@buildroot:~# echo 1766400 > scaling_setspeed
[ 119.680565] Scale for cpus 0-3: 1024
root@buildroot:~# cd ../policy4/
root@buildroot:~# cat scaling_available_frequencies
825600 902400 979200 1056000 1209600 1286400 1363200 1459200 1536000 1612800 1689600 1766400 1843200 1920000 1996800 2092800 2169600 2246400 2323200 2400000 2476800 2553600 2649600
root@buildroot:~# echo 825600 > scaling_setspeed
[ 158.759343] Scale for cpus 4-7: 319
root@buildroot:~# echo 2649600 > scaling_setspeed
[ 160.766545] Scale for cpus 4-7: 1024
- Hikey960 - cpufreq-dt
root@buildroot:~# dmesg | grep -i invariance
[ 4.706909] cpufreq: cpufreq_register_driver: Driver cpufreq-dt can provide frequency invariance
root@buildroot:~# cd /sys/devices/system/cpu/cpufreq/policy0
root@buildroot:~# cat scaling_available_frequencies
533000 999000 1402000 1709000 1844000
root@buildroot:~# echo 533000 > scaling_setspeed
[ 56.732673] Scale for cpus 0-3: 295
root@buildroot:~# echo 1844000 > scaling_setspeed
[ 64.657238] Scale for cpus 0-3: 1024
root@buildroot:~# cd ../policy4/
root@buildroot:~# cat scaling_available_frequencies
903000 1421000 1805000 2112000 2362000
root@buildroot:~# echo 903000 > scaling_setspeed
[ 79.847937] Scale for cpus 4-7: 391
root@buildroot:~# echo 2362000 > scaling_setspeed
[ 90.545476] Scale for cpus 4-7: 1024
- TC2 - vexpress-spc (!bL switcher enabled)
root@buildroot:~# dmesg | grep -i invariance
[ 9.250942] cpufreq: cpufreq_register_driver: Driver vexpress-spc can provide frequency invariance
root@buildroot:~# cd /sys/devices/system/cpu/cpufreq/policy0
root@buildroot:~# cat scaling_governor
userspace
root@buildroot:~# cat scaling_available_frequencies
350000 400000 500000 600000 700000 800000 900000 1000000
root@buildroot:~# echo 350000 > scaling_setspeed
[ 809.254517] Scale for cpus 0,3-4: 358
root@buildroot:~# echo 1000000 > scaling_setspeed
[ 818.744782] Scale for cpus 0,3-4: 1024
root@buildroot:~# cd ../policy1/
root@buildroot:~# cat scaling_available_frequencies
500000 600000 700000 800000 900000 1000000 1100000 1200000
root@buildroot:~# echo 500000 > scaling_setspeed
[ 1173.785907] Scale for cpus 1-2: 426
root@buildroot:~# echo 1200000 > scaling_setspeed
[ 1180.177035] Scale for cpus 1-2: 1024
- TC2 - vexpress-spc (bL switcher enabled)
before patches:
root@buster-armhf:~# echo 175000 > policy0/scaling_setspeed
[ 376.515629] vexpress_spc_cpufreq: Setting freq scale 0: 175000*1024/1200000=149 (when it should be 350000*1024/1000000=358)
--> still using the little CPU from the pair (0 - LITTLE, 1 - big)
root@buster-armhf:~# echo 175000 > policy3/scaling_setspeed
[ 400.765352] vexpress_spc_cpufreq: Setting freq scale 3: 175000*1024/1200000=149 (when it should be 350000*1024/1000000=358)
--> still using the little CPU in the pair (3 - LITTLE, 2 - big)
root@buster-armhf:~# echo 300000 > policy0/scaling_setspeed
[ 456.155104] vexpress_spc_cpufreq: Setting freq scale 0: 300000*1024/1200000=256 (when it should be 600000*1024/1000000=614)
--> Still using the little CPU in the pair (0 - LITTLE, 1 - big)
--> Now policy 0 requested 300000 while policy1 requested 175000.
The rate in the clock domain is now virtual-300000 = 600000.
root@buster-armhf:~# echo 700000 > policy0/scaling_setspeed
[ 506.617496] vexpress_spc_cpufreq: Setting freq scale 0: 700000*1024/1200000 - correct.
--> Switch to using the big CPU in the pair. The virtual frequency
is equal to the actual frequency and the scale factor is correct.
--> But, the request of virtual-300000 from this group on the
little clock domain goes away, so the rate should reduce to
virtual-175000 = 350000.
[ 506.578275] vexpress_spc_cpufreq: ve_spc_cpufreq_set_rate recalc: clk_set_rate DONE: 0, new_cluster 0 old cluster: 1 rate 700000 new_rate 350000.
--> arch_set_freq_scale() should be called here to reduce
the scale of pair (0, 1) but it is not.
after patches:
root@buildroot:~# echo 175000 > policy0/scaling_setspeed
[ 520.078181] Scale changed for CPU 0 to 358
[ 520.130559] Scale changed for CPU 3 to 853
--> Default was 1000000/1200000 which is why CPU 3 is at 853
root@buildroot:~# echo 175000 > policy3/scaling_setspeed
[ 577.250912] Scale changed for CPU 3 to 358
root@buildroot:~# echo 300000 > policy0/scaling_setspeed
[ 670.716195] Scale changed for CPU 0 to 614
[ 670.747971] Scale changed for CPU 3 to 614
root@buildroot:~# echo 700000 > policy0/scaling_setspeed
[ 710.746860] Scale changed for CPU 0 to 597
[ 710.836288] Scale changed for CPU 3 to 358
[1] https://developer.arm.com/tools-and-software/open-source-software/linux-kernel/energy-aware-scheduling/eas-mainline-development
Ionela Voinescu (4):
cpufreq: allow drivers to flag custom support for freq invariance
cpufreq,drivers: remove setting of frequency scale factor
cpufreq,vexpress-spc: fix Frequency Invariance (FI) for bL switching
cpufreq: report whether cpufreq supports Frequency Invariance (FI)
Valentin Schneider (4):
cpufreq: move invariance setter calls in cpufreq core
arch_topology,cpufreq,sched/core: constify arch_* cpumasks
arch_topology,arm64: define arch_scale_freq_invariant()
cpufreq: make schedutil the default for arm and arm64
arch/arm64/kernel/topology.c | 9 ++++-
drivers/base/arch_topology.c | 10 ++++-
drivers/cpufreq/Kconfig | 2 +-
drivers/cpufreq/cpufreq-dt.c | 10 +----
drivers/cpufreq/cpufreq.c | 51 +++++++++++++++++++++++---
drivers/cpufreq/qcom-cpufreq-hw.c | 9 +----
drivers/cpufreq/scmi-cpufreq.c | 18 +++------
drivers/cpufreq/scpi-cpufreq.c | 3 --
drivers/cpufreq/vexpress-spc-cpufreq.c | 26 ++++++++++++-
include/linux/arch_topology.h | 8 +++-
include/linux/cpufreq.h | 18 ++++++++-
kernel/sched/core.c | 2 +-
12 files changed, 117 insertions(+), 49 deletions(-)
base-commit: c28e58ee9dadc99f79cf16ca805221feddd432ad
--
2.17.1
The scheduler's Frequency Invariance Engine (FIE) provides a
frequency scale correction factor that helps achieve more accurate
load-tracking by conveying information about the currently selected
frequency relative to the maximum supported frequency of a CPU.
In some cases this is achieved by passing information from cpufreq
drivers about the frequency selection done by cpufreq.
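
As a rough illustration (a user-space sketch, not the kernel code
itself), the scale factor can be thought of as the current frequency
normalized to the maximum frequency on a 0..1024 scale. The function name
freq_scale() is hypothetical; the shift of 10 matches the 1024 maximum
seen in the test logs of the cover letter.

```c
#include <assert.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/*
 * Normalize cur_freq to max_freq on a 0..SCHED_CAPACITY_SCALE range.
 * Frequencies are in kHz; integer division truncates the result.
 */
unsigned long freq_scale(unsigned long cur_freq, unsigned long max_freq)
{
	assert(max_freq != 0 && cur_freq <= max_freq);
	return (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
}
```

For example, freq_scale(450000, 850000) gives 542 and
freq_scale(450000, 1100000) gives 418, matching the Juno (R0) values in
the cover letter.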
Given that most drivers follow a similar process of selecting and
setting the frequency, there is a strong case for moving the setting
of the frequency scale factor from the cpufreq drivers' frequency
switch callbacks (target_index() and fast_switch()), to the cpufreq
core functions that call them.
In preparation for this, acknowledge that there are still drivers
whose frequency setting process is custom, and that these drivers
will therefore want to provide and flag custom support for setting the
scheduler's frequency invariance (FI) scale factor as well. Prepare
for this by introducing a new flag: CPUFREQ_CUSTOM_SET_FREQ_SCALE.
Examples of users of this flag are:
- drivers that do not implement the callbacks that lend themselves
to triggering the setting of the FI scale factor,
- drivers that implement the appropriate callbacks but which have
an atypical implementation.
Currently, given that all drivers call arch_set_freq_scale() directly,
flag all users with CPUFREQ_CUSTOM_SET_FREQ_SCALE. These driver changes
also keep the series bisectable while the FI setting moves from the
drivers to the core.
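
A side note on the header change in the diff below: the new flag takes
bit 8, which is why the flags field is widened from u8 to u16. The
following user-space sketch (with a simplified BIT() macro and
hypothetical helper names) shows the flag being lost in an 8-bit field
and preserved in a 16-bit one.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's BIT() macro. */
#define BIT(n)				(1U << (n))

#define CPUFREQ_IS_COOLING_DEV		BIT(7)
#define CPUFREQ_CUSTOM_SET_FREQ_SCALE	BIT(8)

/* Returns 1 if the custom flag is still set after storing it in 8 bits. */
int flag_survives_u8(void)
{
	uint8_t flags = (uint8_t)(CPUFREQ_IS_COOLING_DEV |
				  CPUFREQ_CUSTOM_SET_FREQ_SCALE);

	return (flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE) != 0;
}

/* Returns 1 if the custom flag is still set after storing it in 16 bits. */
int flag_survives_u16(void)
{
	uint16_t flags = (uint16_t)(CPUFREQ_IS_COOLING_DEV |
				    CPUFREQ_CUSTOM_SET_FREQ_SCALE);

	return (flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE) != 0;
}
```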
Signed-off-by: Ionela Voinescu <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq-dt.c | 3 ++-
drivers/cpufreq/qcom-cpufreq-hw.c | 3 ++-
drivers/cpufreq/scmi-cpufreq.c | 3 ++-
drivers/cpufreq/scpi-cpufreq.c | 3 ++-
drivers/cpufreq/vexpress-spc-cpufreq.c | 3 ++-
include/linux/cpufreq.h | 10 +++++++++-
6 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index 944d7b45afe9..8e0571a49d1e 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -331,7 +331,8 @@ static int cpufreq_exit(struct cpufreq_policy *policy)
static struct cpufreq_driver dt_cpufreq_driver = {
.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
- CPUFREQ_IS_COOLING_DEV,
+ CPUFREQ_IS_COOLING_DEV |
+ CPUFREQ_CUSTOM_SET_FREQ_SCALE,
.verify = cpufreq_generic_frequency_table_verify,
.target_index = set_target,
.get = cpufreq_generic_get,
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 573630c23aca..e13780beb373 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -337,7 +337,8 @@ static struct freq_attr *qcom_cpufreq_hw_attr[] = {
static struct cpufreq_driver cpufreq_qcom_hw_driver = {
.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
- CPUFREQ_IS_COOLING_DEV,
+ CPUFREQ_IS_COOLING_DEV |
+ CPUFREQ_CUSTOM_SET_FREQ_SCALE,
.verify = cpufreq_generic_frequency_table_verify,
.target_index = qcom_cpufreq_hw_target_index,
.get = qcom_cpufreq_hw_get,
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index fb42e3390377..16ab4ecc75e4 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -223,7 +223,8 @@ static struct cpufreq_driver scmi_cpufreq_driver = {
.name = "scmi",
.flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
CPUFREQ_NEED_INITIAL_FREQ_CHECK |
- CPUFREQ_IS_COOLING_DEV,
+ CPUFREQ_IS_COOLING_DEV |
+ CPUFREQ_CUSTOM_SET_FREQ_SCALE,
.verify = cpufreq_generic_frequency_table_verify,
.attr = cpufreq_generic_attr,
.target_index = scmi_cpufreq_set_target,
diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c
index b0f5388b8854..6b5f56dc3ca3 100644
--- a/drivers/cpufreq/scpi-cpufreq.c
+++ b/drivers/cpufreq/scpi-cpufreq.c
@@ -197,7 +197,8 @@ static struct cpufreq_driver scpi_cpufreq_driver = {
.name = "scpi-cpufreq",
.flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
CPUFREQ_NEED_INITIAL_FREQ_CHECK |
- CPUFREQ_IS_COOLING_DEV,
+ CPUFREQ_IS_COOLING_DEV |
+ CPUFREQ_CUSTOM_SET_FREQ_SCALE,
.verify = cpufreq_generic_frequency_table_verify,
.attr = cpufreq_generic_attr,
.get = scpi_cpufreq_get_rate,
diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c
index 4e8b1dee7c9a..e0a1a3367ec5 100644
--- a/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ b/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -496,7 +496,8 @@ static struct cpufreq_driver ve_spc_cpufreq_driver = {
.name = "vexpress-spc",
.flags = CPUFREQ_STICKY |
CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
- CPUFREQ_NEED_INITIAL_FREQ_CHECK,
+ CPUFREQ_NEED_INITIAL_FREQ_CHECK |
+ CPUFREQ_CUSTOM_SET_FREQ_SCALE,
.verify = cpufreq_generic_frequency_table_verify,
.target_index = ve_spc_cpufreq_set_target,
.get = ve_spc_cpufreq_get_rate,
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 3494f6763597..42668588f9f8 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
struct cpufreq_driver {
char name[CPUFREQ_NAME_LEN];
- u8 flags;
+ u16 flags;
void *driver_data;
/* needed by all drivers */
@@ -417,6 +417,14 @@ struct cpufreq_driver {
*/
#define CPUFREQ_IS_COOLING_DEV BIT(7)
+/*
+ * Set by drivers which implement the necessary calls to the scheduler's
+ * frequency invariance engine. The use of this flag will result in the
+ * default arch_set_freq_scale calls being skipped in favour of custom
+ * driver calls.
+ */
+#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8)
+
int cpufreq_register_driver(struct cpufreq_driver *driver_data);
int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
--
2.17.1
From: Valentin Schneider <[email protected]>
To properly scale its per-entity load-tracking signals, the task scheduler
needs to be given a frequency scale factor, i.e. some image of the current
frequency the CPU is running at. Currently, this scale can be computed
either by using counters (APERF/MPERF on x86, AMU on arm64), or by
piggy-backing on the frequency selection done by cpufreq.
For the latter, drivers have to explicitly set the scale factor
themselves, despite it being purely boiler-plate code: the required
information depends entirely on the kind of frequency switch callback
implemented by the driver, i.e. one of target_index(), target(),
fast_switch() or setpolicy().
The fitness of those callbacks with regard to driving the Frequency
Invariance Engine (FIE) is studied below:
target_index()
==============
Documentation states that the chosen frequency "must be determined by
freq_table[index].frequency". It isn't clear if it *has* to be that
frequency, or if it can use that frequency value to do some computation
that ultimately leads to a different frequency selection. All drivers
go for the former, except vexpress-spc-cpufreq, which has an atypical
implementation.
Therefore, the hook works on the assumption that the core can use
freq_table[index].frequency.
target()
========
This has been flagged as deprecated since:
commit 9c0ebcf78fde ("cpufreq: Implement light weight ->target_index() routine")
It also doesn't have that many users:
cpufreq-nforce2.c:371:2: .target = nforce2_target,
cppc_cpufreq.c:416:2: .target = cppc_cpufreq_set_target,
pcc-cpufreq.c:573:2: .target = pcc_cpufreq_target,
Should we care about drivers using this hook, we may be able to exploit
cpufreq_freq_transition_{begin,end}(). Otherwise, if FIE support is
desired in their current state, arch_set_freq_scale() could still be
called directly by the driver, while CPUFREQ_CUSTOM_SET_FREQ_SCALE
could be used to mark support for it.
fast_switch()
=============
This callback *has* to return the frequency that was selected.
setpolicy()
===========
This callback does not have any designated way of reporting the
frequency that was ultimately chosen. But there are only two drivers
using setpolicy(), and neither of them currently has FIE support:
drivers/cpufreq/longrun.c:281: .setpolicy = longrun_set_policy,
drivers/cpufreq/intel_pstate.c:2215: .setpolicy = intel_pstate_set_policy,
The intel_pstate driver is known to use counter-driven frequency invariance.
If FIE support is desired in their current state, arch_set_freq_scale()
could still be called directly by the driver, while
CPUFREQ_CUSTOM_SET_FREQ_SCALE could be used to mark support for it.
Conclusion
==========
Given that the significant majority of current FIE enabled drivers use
callbacks that lend themselves to triggering the setting of the FIE scale
factor in a generic way, move the invariance setter calls to cpufreq core,
while filtering drivers that flag custom support using
CPUFREQ_CUSTOM_SET_FREQ_SCALE.
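
To make the intended flow concrete, here is a minimal user-space sketch
of the fast_switch() path after the move. It mirrors the logic of the
diff below, but struct mock_driver, mock_set_freq_scale() and
core_fast_switch() are hypothetical stand-ins rather than kernel
symbols.

```c
#include <assert.h>

#define CPUFREQ_CUSTOM_SET_FREQ_SCALE	(1U << 8)

/* Hypothetical, reduced view of a cpufreq driver. */
struct mock_driver {
	unsigned int flags;
	unsigned int (*fast_switch)(unsigned int target_freq);
};

/* Stand-in for the per-CPU scale factor written by the setter. */
unsigned long mock_scale = 1024;

void mock_set_freq_scale(unsigned long cur_freq, unsigned long max_freq)
{
	assert(max_freq != 0);
	mock_scale = (cur_freq << 10) / max_freq;
}

/* A trivial driver callback that grants exactly what was requested. */
unsigned int mock_fast_switch(unsigned int target_freq)
{
	return target_freq;
}

/*
 * Core-side wrapper: call the driver, then update the scale factor on
 * its behalf unless the driver flagged custom support.
 */
unsigned int core_fast_switch(const struct mock_driver *drv,
			      unsigned int target_freq,
			      unsigned int max_freq)
{
	unsigned int freq = drv->fast_switch(target_freq);

	if (freq && !(drv->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
		mock_set_freq_scale(freq, max_freq);

	return freq;
}
```

With a driver that has no custom flag, a switch to 450000 out of 850000
leaves mock_scale at 542; with the flag set, mock_scale is left
untouched for the driver to update itself.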
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Ionela Voinescu <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 0128de3603df..83b58483a39b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2046,9 +2046,16 @@ EXPORT_SYMBOL(cpufreq_unregister_notifier);
unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
unsigned int target_freq)
{
+ unsigned int freq;
+
target_freq = clamp_val(target_freq, policy->min, policy->max);
+ freq = cpufreq_driver->fast_switch(policy, target_freq);
+
+ if (freq && !(cpufreq_driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
+ arch_set_freq_scale(policy->related_cpus, freq,
+ policy->cpuinfo.max_freq);
- return cpufreq_driver->fast_switch(policy, target_freq);
+ return freq;
}
EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
@@ -2140,7 +2147,7 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
unsigned int relation)
{
unsigned int old_target_freq = target_freq;
- int index;
+ int index, retval;
if (cpufreq_disabled())
return -ENODEV;
@@ -2171,7 +2178,14 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
index = cpufreq_frequency_table_target(policy, target_freq, relation);
- return __target_index(policy, index);
+ retval = __target_index(policy, index);
+
+ if (!retval && !(cpufreq_driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
+ arch_set_freq_scale(policy->related_cpus,
+ policy->freq_table[index].frequency,
+ policy->cpuinfo.max_freq);
+
+ return retval;
}
EXPORT_SYMBOL_GPL(__cpufreq_driver_target);
--
2.17.1
In the majority of cases, the index argument to cpufreq's target_index()
is meant to identify the frequency that is requested from the hardware,
according to the frequency table: policy->freq_table[index].frequency.
After successfully requesting it from the hardware, this value, together
with the maximum hardware frequency (policy->cpuinfo.max_freq), is used
as an argument to arch_set_freq_scale(), in order to set the task scheduler
frequency scale factor. This is a normalized indication of a CPU's
current performance.
But for the vexpress-spc-cpufreq driver, when big.LITTLE switching [1]
is enabled, there are three issues with using the above information for
setting the FI scale factor:
- cur_freq: policy->freq_table[index].frequency is not the frequency
requested from the hardware. ve_spc_cpufreq_set_rate() will convert
from this virtual frequency to an actual frequency, which is then
requested from the hardware. For the A7 cluster, the virtual frequency
is half the actual frequency. The use of the virtual policy->freq_table
frequency results in an incorrect FI scale factor.
- max_freq: policy->cpuinfo.max_freq does not correctly identify the
maximum frequency of the physical cluster. This value identifies the
maximum frequency achievable by the big.LITTLE pair, that is the
maximum frequency of the big CPU. So when the LITTLE CPU in the pair
is used, the hardware maximum frequency passed to arch_set_freq_scale()
is incorrect.
- missing a scale factor update: when switching clusters, the driver
recalculates the frequency of the old clock domain based on the
requests of the remaining CPUs in the domain and asks for a clock
change. But this does not result in an update in the scale factor.
Therefore, introduce a local function bLs_set_sched_freq_scale() that
helps call arch_set_freq_scale() with correct information for the
is_bL_switching_enabled() case, while maintaining the old, more
efficient, call site of arch_set_freq_scale() for when cluster
switching is disabled.
Also, because of these requirements in computing the scale factor, this
driver is the only one that maintains custom support for FI, which is
marked by the presence of the CPUFREQ_CUSTOM_SET_FREQ_SCALE flag.
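
As a worked example of the first two issues, here is a small user-space
sketch using the TC2 numbers from the cover letter logs (LITTLE maximum
1000000 kHz, big maximum 1200000 kHz, virtual frequency on the LITTLE
cluster equal to half the actual one). The enum, constants and function
names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

enum bl_cluster { BL_BIG, BL_LITTLE };

#define BL_BIG_MAX_KHZ		1200000UL
#define BL_LITTLE_MAX_KHZ	1000000UL

unsigned long bl_scale(unsigned long cur_khz, unsigned long max_khz)
{
	assert(max_khz != 0);
	return (cur_khz << 10) / max_khz;
}

/* On the LITTLE cluster the virtual frequency is half the actual one. */
unsigned long bl_actual_freq(enum bl_cluster c, unsigned long virt_khz)
{
	return c == BL_LITTLE ? virt_khz * 2 : virt_khz;
}

/* Old behaviour: virtual frequency over the maximum of the pair. */
unsigned long bl_naive_scale(unsigned long virt_khz)
{
	return bl_scale(virt_khz, BL_BIG_MAX_KHZ);
}

/* Intended behaviour: actual frequency over the cluster's own maximum. */
unsigned long bl_fixed_scale(enum bl_cluster c, unsigned long virt_khz)
{
	unsigned long max_khz = c == BL_LITTLE ? BL_LITTLE_MAX_KHZ
					       : BL_BIG_MAX_KHZ;

	return bl_scale(bl_actual_freq(c, virt_khz), max_khz);
}
```

For a virtual request of 175000 served by the LITTLE CPU,
bl_naive_scale() yields 149 whereas bl_fixed_scale() yields 358, the
value the cover letter identifies as correct.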
[1] https://lwn.net/Articles/481055/
Signed-off-by: Ionela Voinescu <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Liviu Dudau <[email protected]>
---
drivers/cpufreq/vexpress-spc-cpufreq.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c
index e0a1a3367ec5..f2caf67d4050 100644
--- a/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ b/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -55,6 +55,8 @@ static atomic_t cluster_usage[MAX_CLUSTERS + 1];
static unsigned int clk_big_min; /* (Big) clock frequencies */
static unsigned int clk_little_max; /* Maximum clock frequency (Little) */
+static inline u32 get_table_max(struct cpufreq_frequency_table *table);
+
static DEFINE_PER_CPU(unsigned int, physical_cluster);
static DEFINE_PER_CPU(unsigned int, cpu_last_req_freq);
@@ -87,6 +89,18 @@ static unsigned int find_cluster_maxfreq(int cluster)
return max_freq;
}
+static void bLs_set_sched_freq_scale(int cluster, unsigned long cur_freq)
+{
+ unsigned long max_freq = get_table_max(freq_table[cluster]);
+ int j;
+
+ for_each_online_cpu(j) {
+ if (cluster == per_cpu(physical_cluster, j))
+ arch_set_freq_scale(get_cpu_mask(j), cur_freq,
+ max_freq);
+ }
+}
+
static unsigned int clk_get_cpu_rate(unsigned int cpu)
{
u32 cur_cluster = per_cpu(physical_cluster, cpu);
@@ -154,6 +168,9 @@ ve_spc_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
mutex_unlock(&cluster_lock[new_cluster]);
+ if (bLs)
+ bLs_set_sched_freq_scale(new_cluster, new_rate);
+
/* Recalc freq for old cluster when switching clusters */
if (old_cluster != new_cluster) {
/* Switch cluster */
@@ -170,7 +187,11 @@ ve_spc_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
pr_err("%s: clk_set_rate failed: %d, old cluster: %d\n",
__func__, ret, old_cluster);
}
+
mutex_unlock(&cluster_lock[old_cluster]);
+
+ if (new_rate)
+ bLs_set_sched_freq_scale(old_cluster, new_rate);
}
return 0;
@@ -200,7 +221,7 @@ static int ve_spc_cpufreq_set_target(struct cpufreq_policy *policy,
ret = ve_spc_cpufreq_set_rate(cpu, actual_cluster, new_cluster,
freqs_new);
- if (!ret) {
+ if (!is_bL_switching_enabled() && !ret) {
arch_set_freq_scale(policy->related_cpus, freqs_new,
policy->cpuinfo.max_freq);
}
--
2.17.1
Now that the update of the FI scale factor is done in the cpufreq core
for selected functions (target_index() and fast_switch()), and drivers
can mark themselves as providing custom support for FI by calling
arch_set_freq_scale() themselves, we can provide feedback to the task
scheduler, and to other users, on whether cpufreq supports FI.
For this purpose, provide error and debug messages, together with an
external function to expose whether the cpufreq drivers support FI, by
using a static key.
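
The decision itself can be sketched in user space as follows. This
mirrors the condition used in the diff below, but struct mock_driver and
driver_can_set_freq_scale() are hypothetical names, and the callbacks
are reduced to booleans for brevity.

```c
#include <assert.h>
#include <stdbool.h>

#define CPUFREQ_CUSTOM_SET_FREQ_SCALE	(1U << 8)

/* Hypothetical, reduced view of which callbacks a driver implements. */
struct mock_driver {
	unsigned int flags;
	bool has_target_index;
	bool has_fast_switch;
	bool has_target;
	bool has_setpolicy;
};

/*
 * A driver is considered able to provide frequency invariance if it
 * flags custom support, or if it only uses callbacks for which the
 * core can set the scale factor on its behalf.
 */
bool driver_can_set_freq_scale(const struct mock_driver *d)
{
	assert(d);

	if (d->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE)
		return true;

	return (d->has_target_index || d->has_fast_switch) &&
	       !(d->has_target || d->has_setpolicy);
}
```

Under these assumptions a target_index()-only driver is reported as
supporting FI, while a setpolicy() driver without the custom flag is
not.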
Signed-off-by: Ionela Voinescu <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq.c | 26 ++++++++++++++++++++++++++
include/linux/cpufreq.h | 5 +++++
2 files changed, 31 insertions(+)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 83b58483a39b..60b5272c5d80 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -59,6 +59,9 @@ static struct cpufreq_driver *cpufreq_driver;
static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
static DEFINE_RWLOCK(cpufreq_driver_lock);
+/* Mark support for the scheduler's frequency invariance engine */
+static DEFINE_STATIC_KEY_FALSE(cpufreq_set_freq_scale);
+
/* Flag to suspend/resume CPUFreq governors */
static bool cpufreq_suspended;
@@ -67,6 +70,26 @@ static inline bool has_target(void)
return cpufreq_driver->target_index || cpufreq_driver->target;
}
+static inline
+void enable_cpufreq_freq_invariance(struct cpufreq_driver *driver)
+{
+ if ((driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE) ||
+ ((driver->target_index || driver->fast_switch)
+ && !(driver->target || driver->setpolicy))) {
+
+ static_branch_enable_cpuslocked(&cpufreq_set_freq_scale);
+ pr_debug("%s: Driver %s can provide frequency invariance.",
+ __func__, driver->name);
+ } else
+ pr_err("%s: Driver %s cannot provide frequency invariance.",
+ __func__, driver->name);
+}
+
+bool cpufreq_sets_freq_scale(void)
+{
+ return static_branch_likely(&cpufreq_set_freq_scale);
+}
+
/* internal prototypes */
static unsigned int __cpufreq_get(struct cpufreq_policy *policy);
static int cpufreq_init_governor(struct cpufreq_policy *policy);
@@ -2713,6 +2736,8 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
cpufreq_driver = driver_data;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ enable_cpufreq_freq_invariance(cpufreq_driver);
+
if (driver_data->setpolicy)
driver_data->flags |= CPUFREQ_CONST_LOOPS;
@@ -2782,6 +2807,7 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver)
cpus_read_lock();
subsys_interface_unregister(&cpufreq_interface);
remove_boost_sysfs_file();
+ static_branch_disable_cpuslocked(&cpufreq_set_freq_scale);
cpuhp_remove_state_nocalls_cpuslocked(hp_online);
write_lock_irqsave(&cpufreq_driver_lock, flags);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 42668588f9f8..8b6369d657bd 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -217,6 +217,7 @@ void refresh_frequency_limits(struct cpufreq_policy *policy);
void cpufreq_update_policy(unsigned int cpu);
void cpufreq_update_limits(unsigned int cpu);
bool have_governor_per_policy(void);
+bool cpufreq_sets_freq_scale(void);
struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
void cpufreq_enable_fast_switch(struct cpufreq_policy *policy);
void cpufreq_disable_fast_switch(struct cpufreq_policy *policy);
@@ -237,6 +238,10 @@ static inline unsigned int cpufreq_get_hw_max_freq(unsigned int cpu)
{
return 0;
}
+static inline bool cpufreq_sets_freq_scale(void)
+{
+ return false;
+}
static inline void disable_cpufreq(void) { }
#endif
--
2.17.1
From: Valentin Schneider <[email protected]>
The passed cpumask arguments to:
- arch_set_freq_scale(),
- arch_set_thermal_pressure(), and
- arch_freq_counters_available()
are only iterated over, so reflect this in the prototype. This also
allows passing system cpumasks like cpu_online_mask without getting
a warning.
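
A small user-space sketch of the idea, with a hypothetical
struct mock_cpumask in place of the real struct cpumask: a function that
only reads the mask can take a const pointer, and a const-qualified
system mask can then be passed without discarding qualifiers.

```c
#include <assert.h>

/* Hypothetical, single-word stand-in for struct cpumask. */
struct mock_cpumask {
	unsigned long bits;
};

/* A read-only system mask, in the spirit of cpu_online_mask. */
const struct mock_cpumask mock_online_mask = { 0x3f };

/* Only iterates over the mask, so the parameter can be const. */
unsigned int mock_count_cpus(const struct mock_cpumask *mask)
{
	unsigned int n = 0;
	unsigned long b;

	assert(mask);

	/* Clear the lowest set bit on each iteration. */
	for (b = mask->bits; b; b &= b - 1)
		n++;

	return n;
}
```

Had the parameter been a plain struct mock_cpumask pointer, passing
&mock_online_mask would trigger a discarded-qualifier warning.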
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Ionela Voinescu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/arm64/kernel/topology.c | 2 +-
drivers/base/arch_topology.c | 4 ++--
drivers/cpufreq/cpufreq.c | 5 +++--
include/linux/arch_topology.h | 4 ++--
include/linux/cpufreq.h | 3 ++-
kernel/sched/core.c | 2 +-
6 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 0801a0f3c156..9a9f2b8dedf5 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -253,7 +253,7 @@ static int __init init_amu_fie(void)
}
late_initcall_sync(init_amu_fie);
-bool arch_freq_counters_available(struct cpumask *cpus)
+bool arch_freq_counters_available(const struct cpumask *cpus)
{
return amu_freq_invariant() &&
cpumask_subset(cpus, amu_fie_cpus);
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 4d0a0038b476..8447e1f30340 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -21,13 +21,13 @@
#include <linux/sched.h>
#include <linux/smp.h>
-__weak bool arch_freq_counters_available(struct cpumask *cpus)
+__weak bool arch_freq_counters_available(const struct cpumask *cpus)
{
return false;
}
DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
-void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
+void arch_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq,
unsigned long max_freq)
{
unsigned long scale;
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 60b5272c5d80..161b8089b0f6 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -177,8 +177,9 @@ u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy)
}
EXPORT_SYMBOL_GPL(get_cpu_idle_time);
-__weak void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
- unsigned long max_freq)
+__weak void arch_set_freq_scale(const struct cpumask *cpus,
+ unsigned long cur_freq,
+ unsigned long max_freq)
{
}
EXPORT_SYMBOL_GPL(arch_set_freq_scale);
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 0566cb3314ef..4be0315700cb 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -30,7 +30,7 @@ static inline unsigned long topology_get_freq_scale(int cpu)
return per_cpu(freq_scale, cpu);
}
-bool arch_freq_counters_available(struct cpumask *cpus);
+bool arch_freq_counters_available(const struct cpumask *cpus);
DECLARE_PER_CPU(unsigned long, thermal_pressure);
@@ -39,7 +39,7 @@ static inline unsigned long topology_get_thermal_pressure(int cpu)
return per_cpu(thermal_pressure, cpu);
}
-void arch_set_thermal_pressure(struct cpumask *cpus,
+void arch_set_thermal_pressure(const struct cpumask *cpus,
unsigned long th_pressure);
struct cpu_topology {
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 8b6369d657bd..23398133c24b 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1003,7 +1003,8 @@ static inline void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
extern void arch_freq_prepare_all(void);
extern unsigned int arch_freq_get_on_cpu(int cpu);
-extern void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
+extern void arch_set_freq_scale(const struct cpumask *cpus,
+ unsigned long cur_freq,
unsigned long max_freq);
/* the following are really really optional */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f518af52d0fb..b44a42b1236c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3645,7 +3645,7 @@ unsigned long long task_sched_runtime(struct task_struct *p)
DEFINE_PER_CPU(unsigned long, thermal_pressure);
-void arch_set_thermal_pressure(struct cpumask *cpus,
+void arch_set_thermal_pressure(const struct cpumask *cpus,
unsigned long th_pressure)
{
int cpu;
--
2.17.1
From: Valentin Schneider <[email protected]>
schedutil is already a hard requirement for EAS, which has led to making
it the default on arm (when CONFIG_BIG_LITTLE), see:
commit 8fdcca8e254a ("cpufreq: Select schedutil when using big.LITTLE")
One thing worth pointing out is that schedutil isn't only relevant for
asymmetric CPU capacity systems; for instance, schedutil is the only
governor that honours util-clamp performance requests. Another good example
of this is x86 switching to using it by default in:
commit a00ec3874e7d ("cpufreq: intel_pstate: Select schedutil as the default governor")
Arguably it should be made the default for all architectures, but it seems
better to wait for them to also gain frequency invariance powers. Make it
the default for arm && arm64 for now.
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Ionela Voinescu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Russell King <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
---
drivers/cpufreq/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index e91750132552..2c7171e0b001 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -37,7 +37,7 @@ config CPU_FREQ_STAT
choice
prompt "Default CPUFreq governor"
default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || ARM_SA1110_CPUFREQ
- default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if BIG_LITTLE
+ default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if ARM64 || ARM
default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if X86_INTEL_PSTATE && SMP
default CPU_FREQ_DEFAULT_GOV_PERFORMANCE
help
--
2.17.1
From: Valentin Schneider <[email protected]>
arch_scale_freq_invariant() is used by schedutil to determine whether
the scheduler's load-tracking signals are frequency invariant. Its
definition is overridable, though by default it is hardcoded to 'true'
if arch_scale_freq_capacity() is defined ('false' otherwise).
This behaviour is not overridden on arm, arm64 and other users of the
generic arch topology driver, which is somewhat precarious:
arch_scale_freq_capacity() will always be defined, yet not all cpufreq
drivers are guaranteed to drive the frequency invariance scale factor
setting. In other words, the load-tracking signals may very well *not*
be frequency invariant.
Now that cpufreq can be queried on whether the current driver is driving
the Frequency Invariance (FI) scale setting, the current situation can
be improved. This combines the query of whether cpufreq supports the
setting of the frequency scale factor, with whether all online CPUs are
counter-based FI enabled.
While cpufreq FI enablement applies at system level, for all CPUs,
counter-based FI support could also be used for only a subset of CPUs to
set the invariance scale factor. Therefore, if cpufreq-based FI support
is present, we consider the system to be invariant. If missing, we
require all online CPUs to be counter-based FI enabled in order for the
full system to be considered invariant.
If the system ends up not being invariant, a new condition is needed in
the counter initialization code that disables all scale factor setting
based on counters.
Precedence of counters over cpufreq use is not important here. The
invariant status is only given to the system if all CPUs have at least
one method of setting the frequency scale factor.
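The decision described above can be sketched in user space as follows. This is only a model under stated assumptions: struct fi_system_model, MODEL_NR_CPUS and the model_*() helpers are hypothetical names, and the per-CPU arrays stand in for cpu_online_mask and the set of CPUs with usable counters.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical description of a system, for illustration only. */
struct fi_system_model {
	bool cpufreq_sets_scale;              /* cpufreq-based FI, system wide */
	bool cpu_online[MODEL_NR_CPUS];
	bool cpu_has_counters[MODEL_NR_CPUS]; /* counter-based FI, per CPU */
};

/* True only if every online CPU can use counters for FI. */
static bool model_counters_available(const struct fi_system_model *sys)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (sys->cpu_online[cpu] && !sys->cpu_has_counters[cpu])
			return false;
	}

	return true;
}

/* Same shape as topology_scale_freq_invariant() in this patch. */
static bool model_scale_freq_invariant(const struct fi_system_model *sys)
{
	assert(sys);

	return sys->cpufreq_sets_scale || model_counters_available(sys);
}
```

In this model, a system where only some online CPUs have counters and cpufreq does not set the scale factor is reported as not invariant, which is the case where the counter-based scale setting gets disabled.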
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Ionela Voinescu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Sudeep Holla <[email protected]>
---
arch/arm64/kernel/topology.c | 7 +++++++
drivers/base/arch_topology.c | 6 ++++++
include/linux/arch_topology.h | 4 ++++
3 files changed, 17 insertions(+)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 9a9f2b8dedf5..4064d39bb66d 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -246,6 +246,13 @@ static int __init init_amu_fie(void)
static_branch_enable(&amu_fie_key);
}
+ /*
+ * If the system is not fully invariant after AMU init, disable
+ * partial use of counters for frequency invariance.
+ */
+ if (!topology_scale_freq_invariant())
+ static_branch_disable(&amu_fie_key);
+
free_valid_mask:
free_cpumask_var(valid_cpus);
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 8447e1f30340..8686771b866d 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -21,6 +21,12 @@
#include <linux/sched.h>
#include <linux/smp.h>
+bool topology_scale_freq_invariant(void)
+{
+ return cpufreq_sets_freq_scale() ||
+ arch_freq_counters_available(cpu_online_mask);
+}
+
__weak bool arch_freq_counters_available(const struct cpumask *cpus)
{
return false;
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 4be0315700cb..6272fbc05cde 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -53,6 +53,10 @@ struct cpu_topology {
};
#ifdef CONFIG_GENERIC_ARCH_TOPOLOGY
+
+extern bool topology_scale_freq_invariant(void);
+#define arch_scale_freq_invariant() topology_scale_freq_invariant()
+
extern struct cpu_topology cpu_topology[NR_CPUS];
#define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id)
--
2.17.1
As a result of setting the frequency scale factor in the cpufreq core, after
callbacks that lend themselves to triggering it, remove this functionality
from the driver side.
For these drivers, the CPUFREQ_CUSTOM_SET_FREQ_SCALE flag is also removed,
to enable the use of the generic code on the cpufreq core side.
Signed-off-by: Ionela Voinescu <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq-dt.c | 13 ++-----------
drivers/cpufreq/qcom-cpufreq-hw.c | 12 ++----------
drivers/cpufreq/scmi-cpufreq.c | 21 ++++++---------------
drivers/cpufreq/scpi-cpufreq.c | 6 +-----
4 files changed, 11 insertions(+), 41 deletions(-)
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index 8e0571a49d1e..9fd4ce774f12 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -40,16 +40,8 @@ static int set_target(struct cpufreq_policy *policy, unsigned int index)
{
struct private_data *priv = policy->driver_data;
unsigned long freq = policy->freq_table[index].frequency;
- int ret;
-
- ret = dev_pm_opp_set_rate(priv->cpu_dev, freq * 1000);
- if (!ret) {
- arch_set_freq_scale(policy->related_cpus, freq,
- policy->cpuinfo.max_freq);
- }
-
- return ret;
+ return dev_pm_opp_set_rate(priv->cpu_dev, freq * 1000);
}
/*
@@ -331,8 +323,7 @@ static int cpufreq_exit(struct cpufreq_policy *policy)
static struct cpufreq_driver dt_cpufreq_driver = {
.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
- CPUFREQ_IS_COOLING_DEV |
- CPUFREQ_CUSTOM_SET_FREQ_SCALE,
+ CPUFREQ_IS_COOLING_DEV,
.verify = cpufreq_generic_frequency_table_verify,
.target_index = set_target,
.get = cpufreq_generic_get,
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index e13780beb373..e5d1ee7746a4 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -85,8 +85,6 @@ static int qcom_cpufreq_hw_target_index(struct cpufreq_policy *policy,
if (icc_scaling_enabled)
qcom_cpufreq_set_bw(policy, freq);
- arch_set_freq_scale(policy->related_cpus, freq,
- policy->cpuinfo.max_freq);
return 0;
}
@@ -113,7 +111,6 @@ static unsigned int qcom_cpufreq_hw_fast_switch(struct cpufreq_policy *policy,
{
void __iomem *perf_state_reg = policy->driver_data;
int index;
- unsigned long freq;
index = policy->cached_resolved_idx;
if (index < 0)
@@ -121,11 +118,7 @@ static unsigned int qcom_cpufreq_hw_fast_switch(struct cpufreq_policy *policy,
writel_relaxed(index, perf_state_reg);
- freq = policy->freq_table[index].frequency;
- arch_set_freq_scale(policy->related_cpus, freq,
- policy->cpuinfo.max_freq);
-
- return freq;
+ return policy->freq_table[index].frequency;
}
static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
@@ -337,8 +330,7 @@ static struct freq_attr *qcom_cpufreq_hw_attr[] = {
static struct cpufreq_driver cpufreq_qcom_hw_driver = {
.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
- CPUFREQ_IS_COOLING_DEV |
- CPUFREQ_CUSTOM_SET_FREQ_SCALE,
+ CPUFREQ_IS_COOLING_DEV,
.verify = cpufreq_generic_frequency_table_verify,
.target_index = qcom_cpufreq_hw_target_index,
.get = qcom_cpufreq_hw_get,
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index 16ab4ecc75e4..a91a45c90274 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -48,16 +48,11 @@ static unsigned int scmi_cpufreq_get_rate(unsigned int cpu)
static int
scmi_cpufreq_set_target(struct cpufreq_policy *policy, unsigned int index)
{
- int ret;
struct scmi_data *priv = policy->driver_data;
struct scmi_perf_ops *perf_ops = handle->perf_ops;
u64 freq = policy->freq_table[index].frequency;
- ret = perf_ops->freq_set(handle, priv->domain_id, freq * 1000, false);
- if (!ret)
- arch_set_freq_scale(policy->related_cpus, freq,
- policy->cpuinfo.max_freq);
- return ret;
+ return perf_ops->freq_set(handle, priv->domain_id, freq * 1000, false);
}
static unsigned int scmi_cpufreq_fast_switch(struct cpufreq_policy *policy,
@@ -66,14 +61,11 @@ static unsigned int scmi_cpufreq_fast_switch(struct cpufreq_policy *policy,
struct scmi_data *priv = policy->driver_data;
struct scmi_perf_ops *perf_ops = handle->perf_ops;
- if (!perf_ops->freq_set(handle, priv->domain_id,
- target_freq * 1000, true)) {
- arch_set_freq_scale(policy->related_cpus, target_freq,
- policy->cpuinfo.max_freq);
- return target_freq;
- }
+ if (perf_ops->freq_set(handle, priv->domain_id,
+ target_freq * 1000, true))
+ return 0;
- return 0;
+ return target_freq;
}
static int
@@ -223,8 +215,7 @@ static struct cpufreq_driver scmi_cpufreq_driver = {
.name = "scmi",
.flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
CPUFREQ_NEED_INITIAL_FREQ_CHECK |
- CPUFREQ_IS_COOLING_DEV |
- CPUFREQ_CUSTOM_SET_FREQ_SCALE,
+ CPUFREQ_IS_COOLING_DEV,
.verify = cpufreq_generic_frequency_table_verify,
.attr = cpufreq_generic_attr,
.target_index = scmi_cpufreq_set_target,
diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c
index 6b5f56dc3ca3..5a399fb847b9 100644
--- a/drivers/cpufreq/scpi-cpufreq.c
+++ b/drivers/cpufreq/scpi-cpufreq.c
@@ -60,9 +60,6 @@ scpi_cpufreq_set_target(struct cpufreq_policy *policy, unsigned int index)
if (clk_get_rate(priv->clk) != rate)
return -EIO;
- arch_set_freq_scale(policy->related_cpus, freq,
- policy->cpuinfo.max_freq);
-
return 0;
}
@@ -197,8 +194,7 @@ static struct cpufreq_driver scpi_cpufreq_driver = {
.name = "scpi-cpufreq",
.flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
CPUFREQ_NEED_INITIAL_FREQ_CHECK |
- CPUFREQ_IS_COOLING_DEV |
- CPUFREQ_CUSTOM_SET_FREQ_SCALE,
+ CPUFREQ_IS_COOLING_DEV,
.verify = cpufreq_generic_frequency_table_verify,
.attr = cpufreq_generic_attr,
.get = scpi_cpufreq_get_rate,
--
2.17.1
On 01-07-20, 10:07, Ionela Voinescu wrote:
> From: Valentin Schneider <[email protected]>
>
> To properly scale its per-entity load-tracking signals, the task scheduler
> needs to be given a frequency scale factor, i.e. some image of the current
> frequency the CPU is running at. Currently, this scale can be computed
> either by using counters (APERF/MPERF on x86, AMU on arm64), or by
> piggy-backing on the frequency selection done by cpufreq.
>
> For the latter, drivers have to explicitly set the scale factor
> themselves, despite it being purely boiler-plate code: the required
> information depends entirely on the kind of frequency switch callback
> implemented by the driver, i.e. either of: target_index(), target(),
> fast_switch() and setpolicy().
>
> The fitness of those callbacks with regard to driving the Frequency
> Invariance Engine (FIE) is studied below:
>
> target_index()
> ==============
> Documentation states that the chosen frequency "must be determined by
> freq_table[index].frequency". It isn't clear if it *has* to be that
> frequency, or if it can use that frequency value to do some computation
> that ultimately leads to a different frequency selection. All drivers
> go for the former, while the vexpress-spc-cpufreq has an atypical
> implementation.
>
> Therefore, the hook works on the assumption that the core can use
> freq_table[index].frequency.
>
> target()
> =======
> This has been flagged as deprecated since:
>
> commit 9c0ebcf78fde ("cpufreq: Implement light weight ->target_index() routine")
>
> It also doesn't have that many users:
>
> cpufreq-nforce2.c:371:2: .target = nforce2_target,
> cppc_cpufreq.c:416:2: .target = cppc_cpufreq_set_target,
> pcc-cpufreq.c:573:2: .target = pcc_cpufreq_target,
>
> Should we care about drivers using this hook, we may be able to exploit
> cpufreq_freq_transition_{begin, end}(). Otherwise, if FIE support is
> desired in their current state, arch_set_freq_scale() could still be
> called directly by the driver, while CPUFREQ_CUSTOM_SET_FREQ_SCALE
> could be used to mark support for it.
>
> fast_switch()
> =============
> This callback *has* to return the frequency that was selected.
>
> setpolicy()
> ===========
> This callback does not have any designated way of reporting what the
> end choice was. But there are only two drivers using setpolicy(), and
> neither of them currently has FIE support:
>
> drivers/cpufreq/longrun.c:281: .setpolicy = longrun_set_policy,
> drivers/cpufreq/intel_pstate.c:2215: .setpolicy = intel_pstate_set_policy,
>
> The intel_pstate is known to use counter-driven frequency invariance.
Is the same true for the acpi-cpufreq driver as well?
And I think we should do the freq-invariance thing for all the above categories
nevertheless.
> If FIE support is desired in their current state, arch_set_freq_scale()
> could still be called directly by the driver, while
> CPUFREQ_CUSTOM_SET_FREQ_SCALE could be used to mark support for it.
>
> Conclusion
> ==========
>
> Given that the significant majority of current FIE enabled drivers use
> callbacks that lend themselves to triggering the setting of the FIE scale
> factor in a generic way, move the invariance setter calls to cpufreq core,
> while filtering drivers that flag custom support using
> CPUFREQ_CUSTOM_SET_FREQ_SCALE.
>
> Signed-off-by: Valentin Schneider <[email protected]>
> Signed-off-by: Ionela Voinescu <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> ---
> drivers/cpufreq/cpufreq.c | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 0128de3603df..83b58483a39b 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2046,9 +2046,16 @@ EXPORT_SYMBOL(cpufreq_unregister_notifier);
> unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
> unsigned int target_freq)
> {
> + unsigned int freq;
> +
> target_freq = clamp_val(target_freq, policy->min, policy->max);
> + freq = cpufreq_driver->fast_switch(policy, target_freq);
> +
> + if (freq && !(cpufreq_driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
> + arch_set_freq_scale(policy->related_cpus, freq,
> + policy->cpuinfo.max_freq);
This needs to be a separate function.
>
> - return cpufreq_driver->fast_switch(policy, target_freq);
> + return freq;
> }
> EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
>
> @@ -2140,7 +2147,7 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
> unsigned int relation)
> {
> unsigned int old_target_freq = target_freq;
> - int index;
> + int index, retval;
>
> if (cpufreq_disabled())
> return -ENODEV;
> @@ -2171,7 +2178,14 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
>
> index = cpufreq_frequency_table_target(policy, target_freq, relation);
>
> - return __target_index(policy, index);
> + retval = __target_index(policy, index);
> +
> + if (!retval && !(cpufreq_driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
> + arch_set_freq_scale(policy->related_cpus,
> + policy->freq_table[index].frequency,
policy->cur gets updated for both target and target_index type drivers. You can
use that safely. It gets updated after the postchange notification.
> + policy->cpuinfo.max_freq);
> +
> + return retval;
> }
> EXPORT_SYMBOL_GPL(__cpufreq_driver_target);
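As a rough user-space sketch of what factoring the scale update into a separate function could look like (an illustration under assumptions, not the eventual kernel code): struct policy_model, MODEL_CUSTOM_SET_FREQ_SCALE and model_update_freq_scale() are hypothetical names, and the freq_scale field stands in for the per-CPU scale factor that arch_set_freq_scale() would update.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the driver flag introduced in this series. */
#define MODEL_CUSTOM_SET_FREQ_SCALE (1u << 8)

struct policy_model {
	unsigned int driver_flags;
	unsigned long max_freq;   /* kHz, models policy->cpuinfo.max_freq */
	unsigned long freq_scale; /* models the per-CPU freq_scale value */
};

/*
 * Single helper shared by the fast-switch and target paths.
 * Returns true if the scale factor was updated, false if the update was
 * skipped (failed switch reported as 0, or driver does it on its own).
 */
static bool model_update_freq_scale(struct policy_model *policy,
				    unsigned long new_freq)
{
	assert(policy && policy->max_freq);

	if (!new_freq || (policy->driver_flags & MODEL_CUSTOM_SET_FREQ_SCALE))
		return false;

	/* 1024 means "running at the maximum frequency". */
	policy->freq_scale = (new_freq << 10) / policy->max_freq;

	return true;
}
```

With one helper like this, both call sites reduce to a single call after the driver callback returns, and the flag check lives in one place.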
--
viresh
On 01-07-20, 10:07, Ionela Voinescu wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 83b58483a39b..60b5272c5d80 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -59,6 +59,9 @@ static struct cpufreq_driver *cpufreq_driver;
> static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
> static DEFINE_RWLOCK(cpufreq_driver_lock);
>
> +/* Mark support for the scheduler's frequency invariance engine */
> +static DEFINE_STATIC_KEY_FALSE(cpufreq_set_freq_scale);
> +
> /* Flag to suspend/resume CPUFreq governors */
> static bool cpufreq_suspended;
>
> @@ -67,6 +70,26 @@ static inline bool has_target(void)
> return cpufreq_driver->target_index || cpufreq_driver->target;
> }
>
> +static inline
> +void enable_cpufreq_freq_invariance(struct cpufreq_driver *driver)
> +{
> + if ((driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE) ||
> + ((driver->target_index || driver->fast_switch)
> + && !(driver->target || driver->setpolicy))) {
I believe this will get simplified with the approach I suggested.
> +
> + static_branch_enable_cpuslocked(&cpufreq_set_freq_scale);
> + pr_debug("%s: Driver %s can provide frequency invariance.",
> + __func__, driver->name);
> + } else
> + pr_err("%s: Driver %s cannot provide frequency invariance.",
> + __func__, driver->name);
> +}
> +
> +bool cpufreq_sets_freq_scale(void)
> +{
> + return static_branch_likely(&cpufreq_set_freq_scale);
> +}
> +
> /* internal prototypes */
> static unsigned int __cpufreq_get(struct cpufreq_policy *policy);
> static int cpufreq_init_governor(struct cpufreq_policy *policy);
> @@ -2713,6 +2736,8 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
> cpufreq_driver = driver_data;
> write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>
> + enable_cpufreq_freq_invariance(cpufreq_driver);
> +
> if (driver_data->setpolicy)
> driver_data->flags |= CPUFREQ_CONST_LOOPS;
>
> @@ -2782,6 +2807,7 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver)
> cpus_read_lock();
> subsys_interface_unregister(&cpufreq_interface);
> remove_boost_sysfs_file();
> + static_branch_disable_cpuslocked(&cpufreq_set_freq_scale);
> cpuhp_remove_state_nocalls_cpuslocked(hp_online);
>
> write_lock_irqsave(&cpufreq_driver_lock, flags);
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 42668588f9f8..8b6369d657bd 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -217,6 +217,7 @@ void refresh_frequency_limits(struct cpufreq_policy *policy);
> void cpufreq_update_policy(unsigned int cpu);
> void cpufreq_update_limits(unsigned int cpu);
> bool have_governor_per_policy(void);
> +bool cpufreq_sets_freq_scale(void);
> struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
> void cpufreq_enable_fast_switch(struct cpufreq_policy *policy);
> void cpufreq_disable_fast_switch(struct cpufreq_policy *policy);
> @@ -237,6 +238,10 @@ static inline unsigned int cpufreq_get_hw_max_freq(unsigned int cpu)
> {
> return 0;
> }
> +static inline bool cpufreq_sets_freq_scale(void)
> +{
> + return false;
> +}
> static inline void disable_cpufreq(void) { }
> #endif
>
> --
> 2.17.1
--
viresh
On 01-07-20, 10:07, Ionela Voinescu wrote:
> In the majority of cases, the index argument to cpufreq's target_index()
> is meant to identify the frequency that is requested from the hardware,
> according to the frequency table: policy->freq_table[index].frequency.
>
> After successfully requesting it from the hardware, this value, together
> with the maximum hardware frequency (policy->cpuinfo.max_freq) are used
> as arguments to arch_set_freq_scale(), in order to set the task scheduler
> frequency scale factor. This is a normalized indication of a CPU's
> current performance.
>
> But for the vexpress-spc-cpufreq driver, when big.LITTLE switching [1]
> is enabled, there are three issues with using the above information for
> setting the FI scale factor:
>
> - cur_freq: policy->freq_table[index].frequency is not the frequency
> requested from the hardware. ve_spc_cpufreq_set_rate() will convert
> from this virtual frequency to an actual frequency, which is then
> requested from the hardware. For the A7 cluster, the virtual frequency
> is half the actual frequency. The use of the virtual policy->freq_table
> frequency results in an incorrect FI scale factor.
>
> - max_freq: policy->cpuinfo.max_freq does not correctly identify the
> maximum frequency of the physical cluster. This value identifies the
> maximum frequency achievable by the big-LITTLE pair, that is the
> maximum frequency of the big CPU. But when the LITTLE CPU in the group
> is used, the hardware maximum frequency passed to arch_set_freq_scale()
> is incorrect.
>
> - missing a scale factor update: when switching clusters, the driver
> recalculates the frequency of the old clock domain based on the
> requests of the remaining CPUs in the domain and asks for a clock
> change. But this does not result in an update in the scale factor.
>
> Therefore, introduce a local function bLs_set_sched_freq_scale() that
> helps call arch_set_freq_scale() with correct information for the
> is_bL_switching_enabled() case, while maintaining the old, more
> efficient, call site of arch_set_freq_scale() for when cluster
> switching is disabled.
>
> Also, because of these requirements in computing the scale factor, this
> driver is the only one that maintains custom support for FI, which is
> marked by the presence of the CPUFREQ_CUSTOM_SET_FREQ_SCALE flag.
>
> [1] https://lwn.net/Articles/481055/
>
> Signed-off-by: Ionela Voinescu <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Cc: Sudeep Holla <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Liviu Dudau <[email protected]>
> ---
> drivers/cpufreq/vexpress-spc-cpufreq.c | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
Is there anyone who cares about this driver and EAS? I will just skip doing the
FIE thing here and mark it skipped.
--
viresh
On 01-07-20, 10:07, Ionela Voinescu wrote:
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 3494f6763597..42668588f9f8 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
>
> struct cpufreq_driver {
> char name[CPUFREQ_NAME_LEN];
> - u8 flags;
> + u16 flags;
Let's make it u32.
> void *driver_data;
>
> /* needed by all drivers */
> @@ -417,6 +417,14 @@ struct cpufreq_driver {
> */
> #define CPUFREQ_IS_COOLING_DEV BIT(7)
>
> +/*
> + * Set by drivers which implement the necessary calls to the scheduler's
> + * frequency invariance engine. The use of this flag will result in the
> + * default arch_set_freq_scale calls being skipped in favour of custom
> + * driver calls.
> + */
> +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8)
I would rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
functionality. We need to give drivers a choice if they do not want
the core to do it on their behalf, because they are doing it on their
own or they don't want to do it.
--
viresh
Hi,
Thank you for taking a look over these so quickly.
On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> On 01-07-20, 10:07, Ionela Voinescu wrote:
> > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > index 3494f6763597..42668588f9f8 100644
> > --- a/include/linux/cpufreq.h
> > +++ b/include/linux/cpufreq.h
> > @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
> >
> > struct cpufreq_driver {
> > char name[CPUFREQ_NAME_LEN];
> > - u8 flags;
> > + u16 flags;
>
> Let's make it u32.
>
> > void *driver_data;
> >
> > /* needed by all drivers */
> > @@ -417,6 +417,14 @@ struct cpufreq_driver {
> > */
> > #define CPUFREQ_IS_COOLING_DEV BIT(7)
> >
> > +/*
> > + * Set by drivers which implement the necessary calls to the scheduler's
> > + * frequency invariance engine. The use of this flag will result in the
> > + * default arch_set_freq_scale calls being skipped in favour of custom
> > + * driver calls.
> > + */
> > +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8)
>
> I would rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
> functionality. We need to give drivers a choice if they do not want
> the core to do it on their behalf, because they are doing it on their
> own or they don't want to do it.
>
In this case we would not be able to tell if cpufreq (driver or core)
can provide the frequency scale factor, so we would not be able to tell
if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE
would be set if either:
- the driver calls arch_set_freq_scale() on its own
- the driver does not want arch_set_freq_scale() to be called.
So at the core level we would not be able to distinguish between the
two, and return whether cpufreq-based invariance is supported.
I don't really see a reason why a driver would not want to set the
frequency scale factor, if it has the proper mechanisms to do so
(therefore excluding the exceptions mentioned in 2/8). I think the
cpufreq core or drivers should produce the information (set the scale
factor) and it should be up to the users to decide whether to use it or
not. But being invariant should always be the default.
Therefore, there are a few reasons I went for
CPUFREQ_CUSTOM_SET_FREQ_SCALE instead:
- It tells us if the driver has custom mechanisms to set the scale
factor to filter the setting in cpufreq core and to inform the
core on whether the system is frequency invariant.
- It does have a user in the vexpress-spc driver.
- Currently there aren't drivers that could but choose not to set
the frequency scale factor, and in my opinion this should not be
the case.
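To make the difference concrete, here is a small user-space sketch of the two flag semantics. Everything in it is hypothetical (struct driver_model, the MODEL_FLAG_* values and both helpers); the "custom" check copies the condition from enable_cpufreq_freq_invariance() in the quoted patch, while the "skip" variant assumes the core has to answer conservatively because it cannot know why the flag was set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values and driver description; not the kernel's. */
#define MODEL_FLAG_SKIP   (1u << 0) /* "core must not set the scale" */
#define MODEL_FLAG_CUSTOM (1u << 1) /* "driver sets the scale itself" */

struct driver_model {
	unsigned int flags;
	bool has_target_index;
	bool has_fast_switch;
	bool has_target;
	bool has_setpolicy;
};

/* CUSTOM semantics: the flag itself tells the core that FI is provided. */
static bool model_fi_supported_custom(const struct driver_model *drv)
{
	assert(drv);

	return (drv->flags & MODEL_FLAG_CUSTOM) ||
	       ((drv->has_target_index || drv->has_fast_switch) &&
		!(drv->has_target || drv->has_setpolicy));
}

/*
 * SKIP semantics: the core only knows it must stay out of the way. It
 * cannot tell whether the driver sets the scale factor or nobody does,
 * so this model assumes it must report "not supported".
 */
static bool model_fi_supported_skip(const struct driver_model *drv)
{
	assert(drv);

	if (drv->flags & MODEL_FLAG_SKIP)
		return false;

	return (drv->has_target_index || drv->has_fast_switch) &&
	       !(drv->has_target || drv->has_setpolicy);
}
```

A driver that sets the scale factor itself is reported as FI capable under the CUSTOM semantics, but not under the SKIP semantics, even though its behaviour is identical in both cases.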
Thanks,
Ionela.
> --
> viresh
On Wednesday 01 Jul 2020 at 16:16:19 (+0530), Viresh Kumar wrote:
> On 01-07-20, 10:07, Ionela Voinescu wrote:
> > In the majority of cases, the index argument to cpufreq's target_index()
> > is meant to identify the frequency that is requested from the hardware,
> > according to the frequency table: policy->freq_table[index].frequency.
> >
> > After successfully requesting it from the hardware, this value, together
> > with the maximum hardware frequency (policy->cpuinfo.max_freq) are used
> > as arguments to arch_set_freq_scale(), in order to set the task scheduler
> > frequency scale factor. This is a normalized indication of a CPU's
> > current performance.
> >
> > But for the vexpress-spc-cpufreq driver, when big.LITTLE switching [1]
> > is enabled, there are three issues with using the above information for
> > setting the FI scale factor:
> >
> > - cur_freq: policy->freq_table[index].frequency is not the frequency
> > requested from the hardware. ve_spc_cpufreq_set_rate() will convert
> > from this virtual frequency to an actual frequency, which is then
> > requested from the hardware. For the A7 cluster, the virtual frequency
> > is half the actual frequency. The use of the virtual policy->freq_table
> > frequency results in an incorrect FI scale factor.
> >
> > - max_freq: policy->cpuinfo.max_freq does not correctly identify the
> > maximum frequency of the physical cluster. This value identifies the
> > maximum frequency achievable by the big-LITTLE pair, that is the
> > maximum frequency of the big CPU. But when the LITTLE CPU in the group
> > is used, the hardware maximum frequency passed to arch_set_freq_scale()
> > is incorrect.
> >
> > - missing a scale factor update: when switching clusters, the driver
> > recalculates the frequency of the old clock domain based on the
> > requests of the remaining CPUs in the domain and asks for a clock
> > change. But this does not result in an update in the scale factor.
> >
> > Therefore, introduce a local function bLs_set_sched_freq_scale() that
> > helps call arch_set_freq_scale() with correct information for the
> > is_bL_switching_enabled() case, while maintaining the old, more
> > efficient, call site of arch_set_freq_scale() for when cluster
> > switching is disabled.
> >
> > Also, because of these requirements in computing the scale factor, this
> > driver is the only one that maintains custom support for FI, which is
> > marked by the presence of the CPUFREQ_CUSTOM_SET_FREQ_SCALE flag.
> >
> > [1] https://lwn.net/Articles/481055/
> >
> > Signed-off-by: Ionela Voinescu <[email protected]>
> > Cc: Viresh Kumar <[email protected]>
> > Cc: Sudeep Holla <[email protected]>
> > Cc: Rafael J. Wysocki <[email protected]>
> > Cc: Liviu Dudau <[email protected]>
> > ---
> > drivers/cpufreq/vexpress-spc-cpufreq.c | 23 ++++++++++++++++++++++-
> > 1 file changed, 22 insertions(+), 1 deletion(-)
>
> Is there anyone who cares about this driver and EAS? I will just skip doing the
> FIE thing here and mark it skipped.
>
That is a good question. The vexpress driver is still used for TC2, but
I don't know of any users of this bL switcher functionality that's part
of the driver. I think there were a few people wondering recently if
it's still used [1].
If we disregard the bL switcher functionality, then the vexpress
driver itself gets in line with the other drivers supported by this
series. Therefore no flag would need to be set here. Also, to
maintain current functionality, we would not need to introduce a flag
at all.
But, the frequency invariance fix is also useful for schedutil to
better select a frequency based on invariant utilization. So if we
independently decide having a flag like the one introduced at 1/8 is
useful, I would recommend to consider this patch, as it does fix some
current functionality in the kernel (whether we can determine if it's
used much or not).
Therefore, IMO, if it's not used any more it should be removed, but if
it's kept it should be fixed.
[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
Thanks,
Ionela.
> --
> viresh
Hey,
On Wednesday 01 Jul 2020 at 16:16:19 (+0530), Viresh Kumar wrote:
> On 01-07-20, 10:07, Ionela Voinescu wrote:
> > From: Valentin Schneider <[email protected]>
> >
> > To properly scale its per-entity load-tracking signals, the task scheduler
> > needs to be given a frequency scale factor, i.e. some image of the current
> > frequency the CPU is running at. Currently, this scale can be computed
> > either by using counters (APERF/MPERF on x86, AMU on arm64), or by
> > piggy-backing on the frequency selection done by cpufreq.
> >
> > For the latter, drivers have to explicitly set the scale factor
> > themselves, despite it being purely boiler-plate code: the required
> > information depends entirely on the kind of frequency switch callback
> > implemented by the driver, i.e. either of: target_index(), target(),
> > fast_switch() and setpolicy().
> >
> > The fitness of those callbacks with regard to driving the Frequency
> > Invariance Engine (FIE) is studied below:
> >
> > target_index()
> > ==============
> > Documentation states that the chosen frequency "must be determined by
> > freq_table[index].frequency". It isn't clear if it *has* to be that
> > frequency, or if it can use that frequency value to do some computation
> > that ultimately leads to a different frequency selection. All drivers
> > go for the former, while the vexpress-spc-cpufreq has an atypical
> > implementation.
> >
> > Therefore, the hook works on the assumption the core can use
> > freq_table[index].frequency.
> >
> > target()
> > =======
> > This has been flagged as deprecated since:
> >
> > commit 9c0ebcf78fde ("cpufreq: Implement light weight ->target_index() routine")
> >
> > It also doesn't have that many users:
> >
> > cpufreq-nforce2.c:371:2: .target = nforce2_target,
> > cppc_cpufreq.c:416:2: .target = cppc_cpufreq_set_target,
> > pcc-cpufreq.c:573:2: .target = pcc_cpufreq_target,
> >
> > Should we care about drivers using this hook, we may be able to exploit
> > cpufreq_freq_transition_{begin, end}(). Otherwise, if FIE support is
> > desired in their current state, arch_set_freq_scale() could still be
> > called directly by the driver, while CPUFREQ_CUSTOM_SET_FREQ_SCALE
> > could be used to mark support for it.
> >
> > fast_switch()
> > =============
> > This callback *has* to return the frequency that was selected.
> >
> > setpolicy()
> > ===========
> > This callback does not have any designated way of informing what was the
> > end choice. But there are only two drivers using setpolicy(), and none
> > of them have current FIE support:
> >
> > drivers/cpufreq/longrun.c:281: .setpolicy = longrun_set_policy,
> > drivers/cpufreq/intel_pstate.c:2215: .setpolicy = intel_pstate_set_policy,
> >
> > The intel_pstate is known to use counter-driven frequency invariance.
>
> Same for acpi-cpufreq driver as well ?
>
The acpi-cpufreq driver defines target_index() and fast_switch(), so it
should go through the setting in cpufreq core. But x86 does not actually
define arch_set_freq_scale(), so the call won't do anything (it won't
set any frequency scale factor); x86 relies instead on counters to set
it through arch_scale_freq_tick(). But this cpufreq functionality could
potentially be used.
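For illustration only, here is a minimal user-space sketch (not kernel code) of how such a weak default behaves. The stand-in struct cpumask, the freq_scale variable and read_freq_scale() are assumptions made up for this example; only the arch_set_freq_scale() name and argument list follow the series.

```c
/* Stand-in for the kernel's struct cpumask; illustrative only. */
struct cpumask { unsigned long bits; };

/* Pretend scale factor: 1024 means "running at the reference frequency". */
static unsigned long freq_scale = 1024;

/*
 * Weak default: an architecture that does not provide its own definition
 * gets this empty body, so calls coming from the cpufreq core are no-ops.
 */
__attribute__((weak))
void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
			 unsigned long max_freq)
{
	(void)cpus;
	(void)cur_freq;
	(void)max_freq;
}

/* Helper for the example: lets a caller observe that nothing changed. */
unsigned long read_freq_scale(void)
{
	return freq_scale;
}
```

With only the weak definition linked in, the scale factor is left for some other mechanism (counters, in the x86 case) to maintain.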
> And I think we should do the freq-invariance thing for all the above categories
> nevertheless.
>
I'm not sure what you mean by this. You mean we should also (try to) set
the frequency scale factor for drivers defining setpolicy() and target()?
> > If FIE support is desired in their current state, arch_set_freq_scale()
> > could still be called directly by the driver, while
> > CPUFREQ_CUSTOM_SET_FREQ_SCALE could be used to mark support for it.
> >
> > Conclusion
> > ==========
> >
> > Given that the significant majority of current FIE enabled drivers use
> > callbacks that lend themselves to triggering the setting of the FIE scale
> > factor in a generic way, move the invariance setter calls to cpufreq core,
> > while filtering drivers that flag custom support using
> > CPUFREQ_CUSTOM_SET_FREQ_SCALE.
> >
> > Signed-off-by: Valentin Schneider <[email protected]>
> > Signed-off-by: Ionela Voinescu <[email protected]>
> > Cc: Rafael J. Wysocki <[email protected]>
> > Cc: Viresh Kumar <[email protected]>
> > ---
> > drivers/cpufreq/cpufreq.c | 20 +++++++++++++++++---
> > 1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 0128de3603df..83b58483a39b 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -2046,9 +2046,16 @@ EXPORT_SYMBOL(cpufreq_unregister_notifier);
> > unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
> > unsigned int target_freq)
> > {
> > + unsigned int freq;
> > +
> > target_freq = clamp_val(target_freq, policy->min, policy->max);
> > + freq = cpufreq_driver->fast_switch(policy, target_freq);
> > +
>
> > + if (freq && !(cpufreq_driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
> > + arch_set_freq_scale(policy->related_cpus, freq,
> > + policy->cpuinfo.max_freq);
>
> This needs to be a separate function.
>
Yes, that would be nicer.
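Roughly what I have in mind, as a user-space sketch with trimmed-down stand-in types rather than the real kernel structures. The helper name cpufreq_update_freq_scale(), the recording variables and the explicit drv argument are assumptions for the example (the core uses a global cpufreq_driver); the flag is the one proposed in 1/8.

```c
#define BIT(n) (1UL << (n))
/* Flag proposed in patch 1/8 of this series. */
#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8)

/* Trimmed-down stand-ins for the kernel structures (illustrative only). */
struct cpumask { unsigned long bits; };
struct cpufreq_cpuinfo { unsigned int max_freq; };
struct cpufreq_policy {
	struct cpumask *related_cpus;
	struct cpufreq_cpuinfo cpuinfo;
};
struct cpufreq_driver { unsigned long flags; };

/* Record what was handed to the scheduler so the sketch can be checked. */
static unsigned long last_cur_freq, last_max_freq;
static unsigned int nr_scale_updates;

static void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
				unsigned long max_freq)
{
	(void)cpus;
	last_cur_freq = cur_freq;
	last_max_freq = max_freq;
	nr_scale_updates++;
}

/* Hypothetical helper: one place for the check shared by both call sites. */
static void cpufreq_update_freq_scale(const struct cpufreq_driver *drv,
				      struct cpufreq_policy *policy,
				      unsigned int freq)
{
	/* As in the patch, a zero frequency is not reported. */
	if (!freq)
		return;

	/* Drivers flagging custom support make their own calls. */
	if (drv->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE)
		return;

	arch_set_freq_scale(policy->related_cpus, freq,
			    policy->cpuinfo.max_freq);
}
```

Both cpufreq_driver_fast_switch() and __cpufreq_driver_target() could then call the one helper instead of open-coding the flag test.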
> >
> > - return cpufreq_driver->fast_switch(policy, target_freq);
> > + return freq;
> > }
> > EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
> >
> > @@ -2140,7 +2147,7 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
> > unsigned int relation)
> > {
> > unsigned int old_target_freq = target_freq;
> > - int index;
> > + int index, retval;
> >
> > if (cpufreq_disabled())
> > return -ENODEV;
> > @@ -2171,7 +2178,14 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
> >
> > index = cpufreq_frequency_table_target(policy, target_freq, relation);
> >
> > - return __target_index(policy, index);
> > + retval = __target_index(policy, index);
> > +
> > + if (!retval && !(cpufreq_driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
> > + arch_set_freq_scale(policy->related_cpus,
> > + policy->freq_table[index].frequency,
>
> policy->cur gets updated for both target and target_index type drivers. You can
> use that safely. It gets updated after the postchange notification.
>
This would allow us to cover the drivers that define target() as well (not
only target_index() and fast_switch()). Looking over the code we only take
that path (calling cpufreq_freq_transition_end()), for
!CPUFREQ_ASYNC_NOTIFICATION. But again, that's only used for
powernow-k8 which is deprecated.
I'll attempt a nice way to use this.
Thank you very much for the review,
Ionela.
> > + policy->cpuinfo.max_freq);
> > +
> > + return retval;
> > }
> > EXPORT_SYMBOL_GPL(__cpufreq_driver_target);
>
> --
> viresh
On Wed, Jul 1, 2020 at 5:28 PM Ionela Voinescu <[email protected]> wrote:
>
> Hey,
>
> On Wednesday 01 Jul 2020 at 16:16:19 (+0530), Viresh Kumar wrote:
> > On 01-07-20, 10:07, Ionela Voinescu wrote:
> > > From: Valentin Schneider <[email protected]>
> > >
> > > To properly scale its per-entity load-tracking signals, the task scheduler
> > > needs to be given a frequency scale factor, i.e. some image of the current
> > > frequency the CPU is running at. Currently, this scale can be computed
> > > either by using counters (APERF/MPERF on x86, AMU on arm64), or by
> > > piggy-backing on the frequency selection done by cpufreq.
> > >
> > > For the latter, drivers have to explicitly set the scale factor
> > > themselves, despite it being purely boiler-plate code: the required
> > > information depends entirely on the kind of frequency switch callback
> > > implemented by the driver, i.e. either of: target_index(), target(),
> > > fast_switch() and setpolicy().
> > >
> > > The fitness of those callbacks with regard to driving the Frequency
> > > Invariance Engine (FIE) is studied below:
> > >
> > > target_index()
> > > ==============
> > > Documentation states that the chosen frequency "must be determined by
> > > freq_table[index].frequency". It isn't clear if it *has* to be that
> > > frequency, or if it can use that frequency value to do some computation
> > > that ultimately leads to a different frequency selection. All drivers
> > > go for the former, while the vexpress-spc-cpufreq has an atypical
> > > implementation.
> > >
> > > Therefore, the hook works on the assumption the core can use
> > > freq_table[index].frequency.
> > >
> > > target()
> > > =======
> > > This has been flagged as deprecated since:
> > >
> > > commit 9c0ebcf78fde ("cpufreq: Implement light weight ->target_index() routine")
> > >
> > > It also doesn't have that many users:
> > >
> > > cpufreq-nforce2.c:371:2: .target = nforce2_target,
> > > cppc_cpufreq.c:416:2: .target = cppc_cpufreq_set_target,
> > > pcc-cpufreq.c:573:2: .target = pcc_cpufreq_target,
> > >
> > > Should we care about drivers using this hook, we may be able to exploit
> > > cpufreq_freq_transition_{begin, end}(). Otherwise, if FIE support is
> > > desired in their current state, arch_set_freq_scale() could still be
> > > called directly by the driver, while CPUFREQ_CUSTOM_SET_FREQ_SCALE
> > > could be used to mark support for it.
> > >
> > > fast_switch()
> > > =============
> > > This callback *has* to return the frequency that was selected.
> > >
> > > setpolicy()
> > > ===========
> > > This callback does not have any designated way of informing what was the
> > > end choice. But there are only two drivers using setpolicy(), and none
> > > of them have current FIE support:
> > >
> > > drivers/cpufreq/longrun.c:281: .setpolicy = longrun_set_policy,
> > > drivers/cpufreq/intel_pstate.c:2215: .setpolicy = intel_pstate_set_policy,
> > >
> > > The intel_pstate is known to use counter-driven frequency invariance.
> >
> > Same for acpi-cpufreq driver as well ?
> >
>
> The acpi-cpufreq driver defines target_index() and fast_switch() so it
> should go through the setting in cpufreq core. But x86 does not actually
> define arch_set_freq_scale() so when called it won't do anything (won't
> set any frequency scale factor), but rely on counters to set it through
> the arch_scale_freq_tick().
Right.
So on x86 (Intel flavor of it at least), cpufreq has nothing to do
with this regardless of what driver is in use.
> But this cpufreq functionality could potentially be used.
How so?
>
> > And I think we should do the freq-invariance thing for all the above categories
> > nevertheless.
> >
>
> I'm not sure what you mean by this. You mean we should also (try to) set
> the frequency scale factor for drivers defining setpolicy() and target()?
No, we shouldn't.
The sched tick potentially does that already and nothing more needs to
be done unless we know for a fact that the scale factor is not
set by the tick.
> > > If FIE support is desired in their current state, arch_set_freq_scale()
> > > could still be called directly by the driver, while
> > > CPUFREQ_CUSTOM_SET_FREQ_SCALE could be used to mark support for it.
> > >
> > > Conclusion
> > > ==========
> > >
> > > Given that the significant majority of current FIE enabled drivers use
> > > callbacks that lend themselves to triggering the setting of the FIE scale
> > > factor in a generic way, move the invariance setter calls to cpufreq core,
> > > while filtering drivers that flag custom support using
> > > CPUFREQ_CUSTOM_SET_FREQ_SCALE.
> > >
> > > Signed-off-by: Valentin Schneider <[email protected]>
> > > Signed-off-by: Ionela Voinescu <[email protected]>
> > > Cc: Rafael J. Wysocki <[email protected]>
> > > Cc: Viresh Kumar <[email protected]>
> > > ---
> > > drivers/cpufreq/cpufreq.c | 20 +++++++++++++++++---
> > > 1 file changed, 17 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > index 0128de3603df..83b58483a39b 100644
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -2046,9 +2046,16 @@ EXPORT_SYMBOL(cpufreq_unregister_notifier);
> > > unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
> > > unsigned int target_freq)
> > > {
> > > + unsigned int freq;
> > > +
> > > target_freq = clamp_val(target_freq, policy->min, policy->max);
> > > + freq = cpufreq_driver->fast_switch(policy, target_freq);
> > > +
> >
> > > + if (freq && !(cpufreq_driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
> > > + arch_set_freq_scale(policy->related_cpus, freq,
> > > + policy->cpuinfo.max_freq);
policy->cpuinfo.max_freq need not be the one to use in all cases when
boost is supported.
policy->cpuinfo.max_freq may be the max boost freq and you may want to
scale with respect to the max sustainable one anyway.
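To put numbers on it, a small sketch of the arithmetic, assuming the usual fixed-point form where 1024 means "at the reference frequency". The function name and the frequencies used below are made up for the example; the reference frequency must be non-zero.

```c
#define SCHED_CAPACITY_SHIFT 10

/*
 * Scale factor of cur_khz relative to ref_khz, in 1/1024 units.
 * ref_khz must be non-zero.
 */
unsigned long example_freq_scale(unsigned long cur_khz, unsigned long ref_khz)
{
	return (cur_khz << SCHED_CAPACITY_SHIFT) / ref_khz;
}
```

Take a hypothetical CPU with a 1600000 kHz sustainable maximum and a 2000000 kHz boost maximum, running at 1600000 kHz. Scaling against the boost value gives 819, i.e. the CPU looks about 80% fast even though it cannot sustain anything higher; scaling against the sustainable maximum gives 1024.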
> > This needs to be a separate function.
> >
>
> Yes, that would be nicer.
>
> > >
> > > - return cpufreq_driver->fast_switch(policy, target_freq);
> > > + return freq;
> > > }
> > > EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
> > >
> > > @@ -2140,7 +2147,7 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
> > > unsigned int relation)
> > > {
> > > unsigned int old_target_freq = target_freq;
> > > - int index;
> > > + int index, retval;
> > >
> > > if (cpufreq_disabled())
> > > return -ENODEV;
> > > @@ -2171,7 +2178,14 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
> > >
> > > index = cpufreq_frequency_table_target(policy, target_freq, relation);
> > >
> > > - return __target_index(policy, index);
> > > + retval = __target_index(policy, index);
> > > +
> > > + if (!retval && !(cpufreq_driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
> > > + arch_set_freq_scale(policy->related_cpus,
> > > + policy->freq_table[index].frequency,
> >
> > policy->cur gets updated for both target and target_index type drivers. You can
> > use that safely. It gets updated after the postchange notification.
> >
>
> This would allow us to cover the drivers that define target() as well (not
> only target_index() and fast_switch()). Looking over the code we only take
> that path (calling cpufreq_freq_transition_end()), for
> !CPUFREQ_ASYNC_NOTIFICATION. But again, that's only used for
> powernow-k8 which is deprecated.
>
> I'll attempt a nice way to use this.
On arches like x86, policy->cur may not be the current frequency of
the CPU, though. On relatively recent systems it actually isn't that
frequency most of the time.
Thanks!
On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <[email protected]> wrote:
>
> Hi,
>
> Thank you for taking a look over these so quickly.
>
> On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> > On 01-07-20, 10:07, Ionela Voinescu wrote:
> > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > > index 3494f6763597..42668588f9f8 100644
> > > --- a/include/linux/cpufreq.h
> > > +++ b/include/linux/cpufreq.h
> > > @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
> > >
> > > struct cpufreq_driver {
> > > char name[CPUFREQ_NAME_LEN];
> > > - u8 flags;
> > > + u16 flags;
> >
> > > Let's make it u32.
> >
> > > void *driver_data;
> > >
> > > /* needed by all drivers */
> > > @@ -417,6 +417,14 @@ struct cpufreq_driver {
> > > */
> > > #define CPUFREQ_IS_COOLING_DEV BIT(7)
> > >
> > > +/*
> > > + * Set by drivers which implement the necessary calls to the scheduler's
> > > + * frequency invariance engine. The use of this flag will result in the
> > > + * default arch_set_freq_scale calls being skipped in favour of custom
> > > + * driver calls.
> > > + */
> > > +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8)
> >
> > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
> > functionality. We need to give drivers a choice if they do not want
> > the core to do it on their behalf, because they are doing it on their
> > own or they don't want to do it.
Well, this would go backwards to me, as we seem to be designing an
opt-out flag for something that's not even implemented yet.
I would go for an opt-in instead. That would be much cleaner and less
prone to regressions IMO.
>
> In this case we would not be able to tell if cpufreq (driver or core)
> can provide the frequency scale factor, so we would not be able to tell
> if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE
> would be set if either:
> - the driver calls arch_set_freq_scale() on its own
> - the driver does not want arch_set_freq_scale() to be called.
>
> So at the core level we would not be able to distinguish between the
> two, and return whether cpufreq-based invariance is supported.
>
> I don't really see a reason why a driver would not want to set the
> frequency scale factor, if it has the proper mechanisms to do so
> (therefore excluding the exceptions mentioned in 2/8). I think the
> cpufreq core or drivers should produce the information (set the scale
> factor) and it should be up to the users to decide whether to use it or
> not. But being invariant should always be the default.
So instead of what is being introduced by this patch, there should be
an opt-in mechanism for drivers to tell the core to do the freq-scale
factor setting on behalf of the driver.
Then, the driver would be responsible to only opt-in for that if it
knows it for a fact that the sched tick doesn't set the freq-scale
factor.
> Therefore, there are a few reasons I went for
> CPUFREQ_CUSTOM_SET_FREQ_SCALE instead:
> - It tells us if the driver has custom mechanisms to set the scale
> factor to filter the setting in cpufreq core and to inform the
> core on whether the system is frequency invariant.
> - It does have a user in the vexpress-spc driver.
> - Currently there aren't drivers that could but choose not to set
> the frequency scale factor, and in my opinion this should not be
> the case.
Well, that depends on what you mean by "could".
For example, it doesn't really make sense to set the freq-scale factor
in either the ACPI cpufreq driver or intel_pstate, because the
frequency (or P-state to be precise) requested by them may not be the
one the CPU ends up running at and even so it may change at any time
for various reasons (eg. in the turbo range). However, the ACPI
cpufreq driver as well as intel_pstate in the passive mode both set
policy->cur, so that might be used for setting the freq-scale factor
in principle, but that freq-scale factor may not be very useful in
practice.
Thanks!
Hi Rafael,
Thank you for the review!
On Wednesday 01 Jul 2020 at 18:05:33 (+0200), Rafael J. Wysocki wrote:
> On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <[email protected]> wrote:
> >
> > Hi,
> >
> > Thank you for taking a look over these so quickly.
> >
> > On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> > > On 01-07-20, 10:07, Ionela Voinescu wrote:
> > > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > > > index 3494f6763597..42668588f9f8 100644
> > > > --- a/include/linux/cpufreq.h
> > > > +++ b/include/linux/cpufreq.h
> > > > @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
> > > >
> > > > struct cpufreq_driver {
> > > > char name[CPUFREQ_NAME_LEN];
> > > > - u8 flags;
> > > > + u16 flags;
> > >
> > > > Let's make it u32.
> > >
> > > > void *driver_data;
> > > >
> > > > /* needed by all drivers */
> > > > @@ -417,6 +417,14 @@ struct cpufreq_driver {
> > > > */
> > > > #define CPUFREQ_IS_COOLING_DEV BIT(7)
> > > >
> > > > +/*
> > > > + * Set by drivers which implement the necessary calls to the scheduler's
> > > > + * frequency invariance engine. The use of this flag will result in the
> > > > + * default arch_set_freq_scale calls being skipped in favour of custom
> > > > + * driver calls.
> > > > + */
> > > > +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE BIT(8)
> > >
> > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
> > > functionality. We need to give drivers a choice if they do not want
> > > the core to do it on their behalf, because they are doing it on their
> > > own or they don't want to do it.
>
> Well, this would go backwards to me, as we seem to be designing an
> opt-out flag for something that's not even implemented already.
>
> I would go for an opt-in instead. That would be much cleaner and less
> prone to regressions IMO.
>
> >
> > In this case we would not be able to tell if cpufreq (driver or core)
> > can provide the frequency scale factor, so we would not be able to tell
> > if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE
> > would be set if either:
> > - the driver calls arch_set_freq_scale() on its own
> > - the driver does not want arch_set_freq_scale() to be called.
> >
> > So at the core level we would not be able to distinguish between the
> > two, and return whether cpufreq-based invariance is supported.
> >
> > I don't really see a reason why a driver would not want to set the
> > frequency scale factor, if it has the proper mechanisms to do so
> > (therefore excluding the exceptions mentioned in 2/8). I think the
> > cpufreq core or drivers should produce the information (set the scale
> > factor) and it should be up to the users to decide whether to use it or
> > not. But being invariant should always be the default.
>
> So instead of what is being introduced by this patch, there should be
> an opt-in mechanism for drivers to tell the core to do the freq-scale
> factor setting on behalf of the driver.
>
This could work better as it covers the following scenarios:
- All the drivers in patch 3/8 would just use the flag to inform the
core that it can call arch_set_freq_scale() on their behalf.
- Its omission truly conveys the message that cpufreq information
should not be used for frequency invariance, no matter the
implementation of arch_set_freq_scale() (more details below).
The only case that it does not cover is the scenario in patch 4/8:
one in which the driver is atypical and it needs its own calls to
arch_set_freq_scale(), while it still wants to be able to report support
for frequency invariance through cpufreq_sets_freq_scale() and later
arch_scale_freq_invariant(). But the jury is still out on whether that
part of the vexpress-spc driver should be given that much consideration.
My choice of flag was considering this case and potentially other future
ones like it, but this alternative also sounds good to me.
> Then, the driver would be responsible to only opt-in for that if it
> knows it for a fact that the sched tick doesn't set the freq-scale
> factor.
>
I think that would create a tight coupling between the driver and the
architecture, when arch_set_freq_scale() is already meant to have the
same purpose, but it also provides some flexibility. Let me expand on
this below.
> > Therefore, there are a few reasons I went for
> > CPUFREQ_CUSTOM_SET_FREQ_SCALE instead:
> > - It tells us if the driver has custom mechanisms to set the scale
> > factor to filter the setting in cpufreq core and to inform the
> > core on whether the system is frequency invariant.
> > - It does have a user in the vexpress-spc driver.
> > - Currently there aren't drivers that could but choose not to set
> > the frequency scale factor, and in my opinion this should not be
> > the case.
>
> Well, that depends on what you mean by "could".
>
> For example, it doesn't really make sense to set the freq-scale factor
> in either the ACPI cpufreq driver or intel_pstate, because the
> frequency (or P-state to be precise) requested by them may not be the
> one the CPU ends up running at and even so it may change at any time
> for various reasons (eg. in the turbo range). However, the ACPI
> cpufreq driver as well as intel_pstate in the passive mode both set
> policy->cur, so that might be used for setting the freq-scale factor
> in principle, but that freq-scale factor may not be very useful in
> practice.
>
Yes, this completely makes sense, and if there are more accurate methods
of obtaining information about the current performance level, by using
counters for example, they should definitely be used.
But in my opinion it should not be up to the driver to choose between
the methods. The driver and core would only have some information on the
current performance level (more or less accurate) and
arch_set_freq_scale() is called to *potentially* use it to set the scale
factor. So the use of policy->cur would be entirely dependent on the
implementation of arch_set_freq_scale().
There could be a few scenarios here:
- arch_set_freq_scale() is left to its weak default that does nothing
(which would be the case when the ACPI cpufreq driver or
intel_pstate is used)
- arch_set_freq_scale() is implemented in such a way that takes into
account the presence of a counter-based method of setting the scale
factor and makes that take precedence (currently done for the users
of the arch_topology driver). This also provides support for platforms
that have partial support for counters, where the use of cpufreq
information is still useful for the CPUs that don't support counters.
For those cases, some information, although not entirely accurate,
is still better than no information at all.
So I believe cpufreq should just provide the information, if it can,
and let the user decide whether to use it, or what source of information
takes precedence. Therefore, arch_set_freq_scale() would decide
whether to filter it out.
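As a simplified illustration of the second scenario, here is a user-space sketch of a setter that lets counters take precedence. It is not the arch_topology code: the plain bitmask standing in for a cpumask, the per-CPU array, the cpus_with_counters mask and the function name are all assumptions, and the per-CPU skipping is just one way the filtering could be done.

```c
#define NR_CPUS 4
#define SCHED_CAPACITY_SHIFT 10

/* Per-CPU scale factors; 1024 means "at maximum frequency". */
static unsigned long freq_scale[NR_CPUS] = { 1024, 1024, 1024, 1024 };

/* Hypothetical: CPUs 0 and 1 have counters, CPUs 2 and 3 do not. */
static unsigned int cpus_with_counters = 0x3;

/*
 * Sketch of an arch-side setter: cpufreq offers (cur_freq, max_freq) for
 * the CPUs in 'cpus', and the architecture decides what to do with it.
 */
void sketch_set_freq_scale(unsigned int cpus, unsigned long cur_freq,
			   unsigned long max_freq)
{
	unsigned long scale;
	int cpu;

	if (!max_freq)
		return;

	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(cpus & (1u << cpu)))
			continue;
		/* Counters take precedence: ignore cpufreq information. */
		if (cpus_with_counters & (1u << cpu))
			continue;
		freq_scale[cpu] = scale;
	}
}
```

Here cpufreq always produces the information, and the filtering happens on the consumer side, so CPUs without counters still get a scale factor.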
In any case, your suggestion regarding the choice of flag would make
bypassing the use of cpufreq information in setting the scale factor
explicit, no matter the definition of arch_set_freq_scale(). But it
would also require writers of cpufreq driver code to remember to
consider the setting of that flag.
I'll consider this more while gauging interest in 4/8.
Many thanks,
Ionela.
> Thanks!
On 01-07-20, 18:05, Rafael J. Wysocki wrote:
> On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <[email protected]> wrote:
> > On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
> > > functionality. We need to give drivers a choice if they do not want
> > > the core to do it on their behalf, because they are doing it on their
> > > own or they don't want to do it.
>
> Well, this would go backwards to me, as we seem to be designing an
> opt-out flag for something that's not even implemented already.
>
> I would go for an opt-in instead. That would be much cleaner and less
> prone to regressions IMO.
That's fine, I just wanted an option for drivers to opt-out of this
thing. I felt okay with the opt-out flag as this should be enabled for
most of the drivers and so enabling by default looked okay as well.
> > In this case we would not be able to tell if cpufreq (driver or core)
> > can provide the frequency scale factor, so we would not be able to tell
> > if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE
That is easy to fix. Let the drivers call
enable_cpufreq_freq_invariance() and set the flag.
> > would be set if either:
> > - the driver calls arch_set_freq_scale() on its own
> > - the driver does not want arch_set_freq_scale() to be called.
> >
> > So at the core level we would not be able to distinguish between the
> > two, and return whether cpufreq-based invariance is supported.
> >
> > I don't really see a reason why a driver would not want to set the
> > frequency scale factor
A simple case is where the driver doesn't have any idea what the real
freq of the CPU is and it doesn't have counters to guess it either.
There can be other reasons which we aren't able to imagine at this
point in time.
--
viresh
On 01-07-20, 17:51, Rafael J. Wysocki wrote:
> On Wed, Jul 1, 2020 at 5:28 PM Ionela Voinescu <[email protected]> wrote:
> > On Wednesday 01 Jul 2020 at 16:16:19 (+0530), Viresh Kumar wrote:
> > > On 01-07-20, 10:07, Ionela Voinescu wrote:
> > > > setpolicy()
> > > > ===========
> > > > This callback does not have any designated way of informing what was the
> > > > end choice. But there are only two drivers using setpolicy(), and none
> > > > of them have current FIE support:
> > > >
> > > > drivers/cpufreq/longrun.c:281: .setpolicy = longrun_set_policy,
> > > > drivers/cpufreq/intel_pstate.c:2215: .setpolicy = intel_pstate_set_policy,
> > > >
> > > > The intel_pstate is known to use counter-driven frequency invariance.
> > >
> > > Same for acpi-cpufreq driver as well ?
> > >
> >
> > The acpi-cpufreq driver defines target_index() and fast_switch() so it
> > should go through the setting in cpufreq core. But x86 does not actually
> > define arch_set_freq_scale() so when called it won't do anything (won't
> > set any frequency scale factor), but rely on counters to set it through
> > the arch_scale_freq_tick().
>
> Right.
>
> So on x86 (Intel flavor of it at least), cpufreq has nothing to do
> with this regardless of what driver is in use.
--
viresh
On 01-07-20, 15:07, Ionela Voinescu wrote:
> On Wednesday 01 Jul 2020 at 16:16:19 (+0530), Viresh Kumar wrote:
> > Is there anyone who cares for this driver and EAS ? I will just skip doing the
> > FIE thing here and mark it skipped.
>
> That is a good question. The vexpress driver is still used for TC2, but
> I don't know of any users of this bL switcher functionality that's part
> of the driver. I think there were a few people wondering recently if
> it's still used [1].
Even if it is used by some, there is no need, I believe, to enable
freq-invariance for it, which wasn't enabled until now.
And considering that we are going to enable the flag only for the
interested parties now, as from the discussion on 1/8, this shouldn't
be required.
--
viresh
Hi,
On Thursday 02 Jul 2020 at 08:35:51 (+0530), Viresh Kumar wrote:
> On 01-07-20, 15:07, Ionela Voinescu wrote:
> > On Wednesday 01 Jul 2020 at 16:16:19 (+0530), Viresh Kumar wrote:
> > > Is there anyone who cares for this driver and EAS ? I will just skip doing the
> > > FIE thing here and mark it skipped.
> >
> > That is a good question. The vexpress driver is still used for TC2, but
> > I don't know of any users of this bL switcher functionality that's part
> > of the driver. I think there were a few people wondering recently if
> > it's still used [1].
>
> Even if it is used by some, there is no need, I believe, to enable
> freq-invariance for it, which wasn't enabled until now.
>
It was enabled until now, but it was partially broken. If you look over
the driver you'll see arch_set_freq_scale() being called for both
is_bL_switching_enabled() and for when it's not [1].
For !is_bL_switching_enabled() this is fine. But for
is_bL_switching_enabled(), it is broken as described in 4/8.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/cpufreq/vexpress-spc-cpufreq.c?h=v5.8-rc3#n203
> And considering that we are going to enable the flag only for the
> interested parties now, as from the discussion on 1/8, this shouldn't
> be required.
>
If we just don't want frequency invariance for
is_bL_switching_enabled(), I can just guard the setting of the flag
suggested by Rafael at 1/8 by !CONFIG_BL_SWITCHER.
I'll proceed to do that and remove the fix at 4/8.
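Something along these lines, as a stand-alone sketch. The opt-in flag name and its bit value are placeholders, since no name has been agreed in this thread, and the function is invented for the example; only CONFIG_BL_SWITCHER is an existing symbol.

```c
#define BIT(n) (1UL << (n))

/* Placeholder opt-in flag: name and bit are not settled in this thread. */
#define CPUFREQ_EXAMPLE_CORE_SETS_FREQ_SCALE BIT(8)

/* Driver flags for a vexpress-like driver (illustrative only). */
unsigned long example_vexpress_driver_flags(void)
{
	unsigned long flags = 0;

#ifndef CONFIG_BL_SWITCHER
	/* Without the bL switcher the driver behaves like the others. */
	flags |= CPUFREQ_EXAMPLE_CORE_SETS_FREQ_SCALE;
#endif

	return flags;
}
```

With CONFIG_BL_SWITCHER defined, the driver simply does not opt in and no scale factor is set from cpufreq.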
Many thanks!
Ionela.
> --
> viresh
Hi,
On Thursday 02 Jul 2020 at 08:28:18 (+0530), Viresh Kumar wrote:
> On 01-07-20, 18:05, Rafael J. Wysocki wrote:
> > On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <[email protected]> wrote:
> > > On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> > > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
> > > > functionality. We need to give drivers a choice if they do not want
> > > > the core to do it on their behalf, because they are doing it on their
> > > > own or they don't want to do it.
> >
> > Well, this would go backwards to me, as we seem to be designing an
> > opt-out flag for something that's not even implemented already.
> >
> > I would go for an opt-in instead. That would be much cleaner and less
> > prone to regressions IMO.
>
> That's fine, I just wanted an option for drivers to opt-out of this
> thing. I felt okay with the opt-out flag as this should be enabled for
> most of the drivers and so enabling by default looked okay as well.
>
> > > In this case we would not be able to tell if cpufreq (driver or core)
> > > can provide the frequency scale factor, so we would not be able to tell
> > > if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE
>
> That is easy to fix. Let the drivers call
> enable_cpufreq_freq_invariance() and set the flag.
>
Right! I suppose part of "the dream" :) was for drivers to be ignorant of
frequency invariance, and for the core to figure out if it has proper
information to potentially* pass to the scheduler.
*potentially = depending on the arch_set_freq_scale() definition.
> > > would be set if either:
> > > - the driver calls arch_set_freq_scale() on its own
> > > - the driver does not want arch_set_freq_scale() to be called.
> > >
> > > So at the core level we would not be able to distinguish between the
> > > two, and return whether cpufreq-based invariance is supported.
> > >
> > > I don't really see a reason why a driver would not want to set the
> > > frequency scale factor
>
> A simple case where the driver doesn't have any idea what the real
> freq
For me, this would have been filtered either by the type of callback
they use (target_index(), fast_switch() and even target() would offer
a reasonably accurate indication of the current frequency, while
setpolicy() obviously targets a range of frequencies) or by the
definition of arch_set_freq_scale().
> ..of the CPU is and it doesn't have counters to guess it as well.
>
> There can be other reasons which we aren't able to imagine at this
> point of time.
>
But I understand both the points you and Rafael raised so it's obvious
that an 'opt in' flag would be the better option.
Thank you both,
Ionela.
> --
> viresh
On 02-07-20, 12:41, Ionela Voinescu wrote:
> It was enabled until now, but it was partially broken. If you look over
> the driver you'll see arch_set_freq_scale() being called for both
> is_bL_switching_enabled() and for when it's not [1].
I missed that completely, it was indeed added here:
commit 518accf20629 ("cpufreq: arm_big_little: invoke frequency-invariance setter function")
and so this patch or a version of it is required here.
> If we just don't want frequency invariance for
> is_bL_switching_enabled(), I can just guard the setting of the flag
> suggested by Rafael at 1/8 by !CONFIG_BL_SWITCHER.
>
> I'll proceed to do that and remove the fix at 4/8.
I think it would be better to do that and avoid any unnecessarily
complicated code here.
--
viresh
Hi Rafael,
On Wednesday 01 Jul 2020 at 17:51:26 (+0200), Rafael J. Wysocki wrote:
> On Wed, Jul 1, 2020 at 5:28 PM Ionela Voinescu <[email protected]> wrote:
> >
> > Hey,
> >
> > On Wednesday 01 Jul 2020 at 16:16:19 (+0530), Viresh Kumar wrote:
> > > On 01-07-20, 10:07, Ionela Voinescu wrote:
> > > > From: Valentin Schneider <[email protected]>
> > > >
> > > > To properly scale its per-entity load-tracking signals, the task scheduler
> > > > needs to be given a frequency scale factor, i.e. some image of the current
> > > > frequency the CPU is running at. Currently, this scale can be computed
> > > > either by using counters (APERF/MPERF on x86, AMU on arm64), or by
> > > > piggy-backing on the frequency selection done by cpufreq.
> > > >
> > > > For the latter, drivers have to explicitly set the scale factor
> > > > themselves, despite it being purely boiler-plate code: the required
> > > > information depends entirely on the kind of frequency switch callback
> > > > implemented by the driver, i.e. either of: target_index(), target(),
> > > > fast_switch() and setpolicy().
> > > >
> > > > The fitness of those callbacks with regard to driving the Frequency
> > > > Invariance Engine (FIE) is studied below:
> > > >
> > > > target_index()
> > > > ==============
> > > > Documentation states that the chosen frequency "must be determined by
> > > > freq_table[index].frequency". It isn't clear if it *has* to be that
> > > > frequency, or if it can use that frequency value to do some computation
> > > > that ultimately leads to a different frequency selection. All drivers
> > > > go for the former, while the vexpress-spc-cpufreq has an atypical
> > > > implementation.
> > > >
> > > > Therefore, the hook works on the assumption the core can use
> > > > freq_table[index].frequency.
> > > >
> > > > target()
> > > > ========
> > > > This has been flagged as deprecated since:
> > > >
> > > > commit 9c0ebcf78fde ("cpufreq: Implement light weight ->target_index() routine")
> > > >
> > > > It also doesn't have that many users:
> > > >
> > > > cpufreq-nforce2.c:371:2: .target = nforce2_target,
> > > > cppc_cpufreq.c:416:2: .target = cppc_cpufreq_set_target,
> > > > pcc-cpufreq.c:573:2: .target = pcc_cpufreq_target,
> > > >
> > > > Should we care about drivers using this hook, we may be able to exploit
> > > > cpufreq_freq_transition_{begin, end}(). Otherwise, if FIE support is
> > > > desired in their current state, arch_set_freq_scale() could still be
> > > > called directly by the driver, while CPUFREQ_CUSTOM_SET_FREQ_SCALE
> > > > could be used to mark support for it.
> > > >
> > > > fast_switch()
> > > > =============
> > > > This callback *has* to return the frequency that was selected.
> > > >
> > > > setpolicy()
> > > > ===========
> > > > This callback does not have any designated way of informing what was the
> > > > end choice. But there are only two drivers using setpolicy(), and none
> > > > of them have current FIE support:
> > > >
> > > > drivers/cpufreq/longrun.c:281: .setpolicy = longrun_set_policy,
> > > > drivers/cpufreq/intel_pstate.c:2215: .setpolicy = intel_pstate_set_policy,
> > > >
> > > > The intel_pstate is known to use counter-driven frequency invariance.
> > >
> > > Same for acpi-cpufreq driver as well ?
> > >
> >
> > The acpi-cpufreq driver defines target_index() and fast_switch() so it
> > should go through the setting in cpufreq core. But x86 does not actually
> > define arch_set_freq_scale() so when called it won't do anything (won't
> > set any frequency scale factor), but rely on counters to set it through
> > the arch_scale_freq_tick().
>
> Right.
>
> So on x86 (Intel flavor of it at least), cpufreq has nothing to do
> with this regardless of what driver is in use.
>
> > But this cpufreq functionality could potentially be used.
>
> How so?
>
I was thinking of a scenario in which counters were not available and
cpufreq could give a rough indication of the current performance, if
arch_set_freq_scale() would be defined to pass that information.
It's improbable, but the implementation would allow it.
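
For illustration only, a minimal userspace sketch of the kind of rough indication cpufreq could pass: the ratio of the current frequency to the maximum one, in fixed point with 1024 meaning "running at max". The helper name and the macro are assumptions made for this sketch, not kernel code.

```c
#include <stdint.h>

/* Fixed-point shift: a scale of 1 << 10 == 1024 means "at max frequency". */
#define EXAMPLE_CAPACITY_SHIFT 10

/*
 * Hypothetical helper: scale factor derived purely from the frequency that
 * cpufreq selected. Frequencies are in kHz. Returns 0 if max_freq is 0.
 */
static uint64_t cpufreq_based_scale(uint64_t cur_freq, uint64_t max_freq)
{
	if (max_freq == 0)
		return 0;

	return (cur_freq << EXAMPLE_CAPACITY_SHIFT) / max_freq;
}
```

For example, a CPU at 1 GHz on a policy whose maximum is 2 GHz would get a scale of 512. Which maximum to divide by (boost or sustainable) is exactly the open point raised on the patch.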
> >
> > > And I think we should do the freq-invariance thing for all the above categories
> > > nevertheless.
> > >
> >
> > I'm not sure what you mean by this. You mean we should also (try to) set
> > the frequency scale factor for drivers defining setpolicy() and target()?
>
> No, we shouldn't.
>
> The sched tick potentially does that already and nothing more needs to
> > be done unless we know for a fact that the scale factor is not
> set by the tick.
>
> > > > If FIE support is desired in their current state, arch_set_freq_scale()
> > > > could still be called directly by the driver, while
> > > > CPUFREQ_CUSTOM_SET_FREQ_SCALE could be used to mark support for it.
> > > >
> > > > Conclusion
> > > > ==========
> > > >
> > > > Given that the significant majority of current FIE enabled drivers use
> > > > callbacks that lend themselves to triggering the setting of the FIE scale
> > > > factor in a generic way, move the invariance setter calls to cpufreq core,
> > > > while filtering drivers that flag custom support using
> > > > CPUFREQ_CUSTOM_SET_FREQ_SCALE.
> > > >
> > > > Signed-off-by: Valentin Schneider <[email protected]>
> > > > Signed-off-by: Ionela Voinescu <[email protected]>
> > > > Cc: Rafael J. Wysocki <[email protected]>
> > > > Cc: Viresh Kumar <[email protected]>
> > > > ---
> > > > drivers/cpufreq/cpufreq.c | 20 +++++++++++++++++---
> > > > 1 file changed, 17 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > > index 0128de3603df..83b58483a39b 100644
> > > > --- a/drivers/cpufreq/cpufreq.c
> > > > +++ b/drivers/cpufreq/cpufreq.c
> > > > @@ -2046,9 +2046,16 @@ EXPORT_SYMBOL(cpufreq_unregister_notifier);
> > > > unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
> > > > unsigned int target_freq)
> > > > {
> > > > + unsigned int freq;
> > > > +
> > > > target_freq = clamp_val(target_freq, policy->min, policy->max);
> > > > + freq = cpufreq_driver->fast_switch(policy, target_freq);
> > > > +
> > >
> > > > + if (freq && !(cpufreq_driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
> > > > + arch_set_freq_scale(policy->related_cpus, freq,
> > > > + policy->cpuinfo.max_freq);
>
> policy->cpuinfo.max_freq need not be the one to use in all cases when
> boost is supported.
>
> policy->cpuinfo.max_freq may be the max boost freq and you may want to
> scale with respect to the max sustainable one anyway.
>
> > > This needs to be a separate function.
> > >
> >
> > Yes, that would be nicer.
> >
> > > >
> > > > - return cpufreq_driver->fast_switch(policy, target_freq);
> > > > + return freq;
> > > > }
> > > > EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
> > > >
> > > > @@ -2140,7 +2147,7 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
> > > > unsigned int relation)
> > > > {
> > > > unsigned int old_target_freq = target_freq;
> > > > - int index;
> > > > + int index, retval;
> > > >
> > > > if (cpufreq_disabled())
> > > > return -ENODEV;
> > > > @@ -2171,7 +2178,14 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
> > > >
> > > > index = cpufreq_frequency_table_target(policy, target_freq, relation);
> > > >
> > > > - return __target_index(policy, index);
> > > > + retval = __target_index(policy, index);
> > > > +
> > > > + if (!retval && !(cpufreq_driver->flags & CPUFREQ_CUSTOM_SET_FREQ_SCALE))
> > > > + arch_set_freq_scale(policy->related_cpus,
> > > > + policy->freq_table[index].frequency,
> > >
> > > policy->cur gets updated for both target and target_index type drivers. You can
> > > use that safely. It gets updated after the postchange notification.
> > >
> >
> > This would allow us to cover the drivers that define target() as well (not
> > only target_index() and fast_switch()). Looking over the code we only take
> > that path (calling cpufreq_freq_transition_end()), for
> > !CPUFREQ_ASYNC_NOTIFICATION. But again, that's only used for
> > powernow-k8 which is deprecated.
> >
> > I'll attempt a nice way to use this.
>
> On arches like x86, policy->cur may not be the current frequency of
> the CPU, though. On relatively recent systems it actually isn't that
> frequency most of the time.
>
Yes, as discussed on the other patches my reasoning was that
arch_set_freq_scale() would filter less accurate information from
cpufreq and give priority to counter use.
But I understand your reasoning on this, and that both you and Viresh
would prefer a more strict 'opt in' policy for which drivers are
appropriate for use with frequency invariance.
So I'll make the suggested changes.
Kind regards,
Ionela.
> Thanks!
On 02/07/2020 13:44, Ionela Voinescu wrote:
> Hi,
>
> On Thursday 02 Jul 2020 at 08:28:18 (+0530), Viresh Kumar wrote:
>> On 01-07-20, 18:05, Rafael J. Wysocki wrote:
>>> On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <[email protected]> wrote:
>>>> On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
[...]
>> There can be other reasons which we aren't able to imagine at this
>> point of time.
>>
>
> But I understand both the points you and Rafael raised so it's obvious
> that an 'opt in' flag would be the better option.
Why can't we just move the arch_set_freq_scale() call from cpufreq
driver to cpufreq core w/o introducing a FIE related driver flag?
Current scenario for Frequency Invariance Engine (FIE) on arm/arm64.
+------------------------------+       +------------------------------+
|                              |       |                              |
| cpufreq core:                |       |   arch: (arm, arm64)         |
|                              |       |                              |
| weak arch_set_freq_scale() {}|       |                              |
|                              |       |                              |
+------------------------------+       |                              |
                                       |                              |
+------------------------------+       |                              |
|                              |       |                              |
| cpufreq driver:              |       |                              |
|                            +-----------> arch_set_freq_scale()      |
|                              |       |   {                          |
+------------------------------+       |       if (use counters)      |
                                       |           return;            |
+------------------------------+       |       ...                    |
|                              |       |   }                          |
| task scheduler:              |       |                              |
|                            +-----------> arch_scale_freq_tick()*    |
|                              |       |   {                          |
|                              |       |       if (!use counters)     |
|                              |       |           return;            |
|                              |       |       ...                    |
|                              |       |   }                          |
+------------------------------+       +------------------------------+
* defined as topology_scale_freq_tick() in arm64
Only Arm/Arm64 defines arch_set_freq_scale() to get the 'legacy' CPUfreq
based FIE. This would still be the case when we move
arch_set_freq_scale() from individual cpufreq drivers to cpufreq core.
Arm64 is the only arch which has to runtime-choose between two different
FIEs. This is currently done by bailing out early in one of the FIE
functions based on 'use counters'.
X86 (and others) will continue to not define arch_set_freq_scale().
The issue with CONFIG_BL_SWITCHER (vexpress-spc-cpufreq.c) could be
solved arm/arm64 internally (arch_topology.c) by putting
arch_set_freq_scale() under a !CONFIG_BL_SWITCHER guard.
I doubt that there are any arm bL systems out there running it. At least
I'm not aware of any complaints due to missing FIE support in bl
switcher setups so far.
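
The runtime choice between the two FIEs can be modelled in a few lines of standalone C. This is a simplified sketch, not the arch_topology.c code: all names are hypothetical, and the counter path assumes the constant counter ticks at the maximum frequency.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the 'use counters' runtime decision. */
static bool model_use_counters;

/* Current scale factor; 1024 means "running at max frequency". */
static uint64_t model_freq_scale = 1024;

/* cpufreq-driven FIE: bails out early when counters are in use. */
static void model_set_freq_scale(uint64_t cur_freq, uint64_t max_freq)
{
	if (model_use_counters || max_freq == 0)
		return;

	model_freq_scale = (cur_freq << 10) / max_freq;
}

/* Counter-driven FIE, called from the tick: bails out without counters. */
static void model_scale_freq_tick(uint64_t core_delta, uint64_t const_delta)
{
	if (!model_use_counters || const_delta == 0)
		return;

	model_freq_scale = (core_delta << 10) / const_delta;
}
```

Only one of the two paths ever updates the scale factor for a given setting of model_use_counters, which is why moving the arch_set_freq_scale() call into the cpufreq core would not change which FIE wins.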
Hi guys,
On Monday 06 Jul 2020 at 14:14:47 (+0200), Dietmar Eggemann wrote:
> On 02/07/2020 13:44, Ionela Voinescu wrote:
> > Hi,
> >
> > On Thursday 02 Jul 2020 at 08:28:18 (+0530), Viresh Kumar wrote:
> >> On 01-07-20, 18:05, Rafael J. Wysocki wrote:
> >>> On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <[email protected]> wrote:
> >>>> On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
>
> [...]
>
> >> There can be other reasons which we aren't able to imagine at this
> >> point of time.
> >>
> >
> > But I understand both the points you and Rafael raised so it's obvious
> > that an 'opt in' flag would be the better option.
>
> Why can't we just move the arch_set_freq_scale() call from cpufreq
> driver to cpufreq core w/o introducing a FIE related driver flag?
>
> Current scenario for Frequency Invariance Engine (FIE) on arm/arm64.
>
> [...]
>
> Only Arm/Arm64 defines arch_set_freq_scale() to get the 'legacy' CPUfreq
> based FIE. This would still be the case when we move
> arch_set_freq_scale() from individual cpufreq drivers to cpufreq core.
>
> Arm64 is the only arch which has to runtime-choose between two different
> FIEs. This is currently done by bailing out early in one of the FIE
> functions based on 'use counters'.
>
> X86 (and others) will continue to not define arch_set_freq_scale().
>
> The issue with CONFIG_BL_SWITCHER (vexpress-spc-cpufreq.c) could be
> solved arm/arm64 internally (arch_topology.c) by putting
> arch_set_freq_scale() under a !CONFIG_BL_SWITCHER guard.
> I doubt that there are any arm bL systems out there running it. At least
> I'm not aware of any complaints due to missing FIE support in bl
> switcher setups so far.
Thank you Dietmar, for your review.
I was trying to suggest the same in my other replies. Given that BL_SWITCHER
can be removed as an argument for introducing a flag, I would also find it
cleaner to just skip introducing a flag altogether, at least until we
have a driver/scenario in the kernel that will functionally benefit from it.
This would also give us the chance to reconsider the best meaning of the
flag we later introduce.
The introduction of the 'opt in' flag would be the next best thing as
suggested in the other replies, but currently it would not result in
anything functionally different.
Rafael, Viresh, would you mind confirming whether you still consider
an 'opt in' flag preferable here?
Many thanks,
Ionela.
On 09-07-20, 09:53, Ionela Voinescu wrote:
> On Monday 06 Jul 2020 at 14:14:47 (+0200), Dietmar Eggemann wrote:
> > Why can't we just move the arch_set_freq_scale() call from cpufreq
> > driver to cpufreq core w/o introducing a FIE related driver flag?
> >
> > Current scenario for Frequency Invariance Engine (FIE) on arm/arm64.
> >
> > [...]
> >
> > Only Arm/Arm64 defines arch_set_freq_scale() to get the 'legacy' CPUfreq
> > based FIE. This would still be the case when we move
> > arch_set_freq_scale() from individual cpufreq drivers to cpufreq core.
> >
> > Arm64 is the only arch which has to runtime-choose between two different
> > FIEs. This is currently done by bailing out early in one of the FIE
> > functions based on 'use counters'.
> >
> > X86 (and others) will continue to not define arch_set_freq_scale().
> >
> > The issue with CONFIG_BL_SWITCHER (vexpress-spc-cpufreq.c) could be
> > solved arm/arm64 internally (arch_topology.c) by putting
> > arch_set_freq_scale() under a !CONFIG_BL_SWITCHER guard.
> > I doubt that there are any arm bL systems out there running it. At least
> > I'm not aware of any complaints due to missing FIE support in bl
> > switcher setups so far.
I agree to that.
> Thank you Dietmar, for your review.
>
> I was trying to suggest the same in my other replies.
I am sorry, I must have overlooked that part in your replies,
otherwise I may have agreed to it :)
> Rafael, Viresh, would you mind confirming whether you still consider
> having an 'opt in' flag is preferable here?
Well, we wanted an opt-in flag instead of an opt-out one. And no flag
is certainly better.
--
viresh