2023-10-18 16:26:31

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH v3 0/6] consolidate and cleanup CPU capacity

This is the 1st part of consolidating how the max compute capacity is
used in the scheduler and how we calculate the frequency for a level of
utilization.

Fix some unconsistancy when computing frequency for an utilization. There
can be a mismatch between energy model and schedutil.

Next step will be to make a difference between the original
max compute capacity of a CPU and what is currently available when
there is a capping applying forever (i.e. seconds or more).


Changes since v2:
- Remove the 1st patch which has been queued in tip
- Rework how to initialize the reference frequency for cppc_cpufreq and
change topology_init_cpu_capacity_cppc() to also set capacity_ref_freq
- Add a RFC to convert AMU to use arch_scale_freq_ref and move the config
of the AMU ratio to be done when intializing cpu capacity and
capacity_ref_freq
- Added some tags

Changes since v1:
- Fix typos
- Added changes in cpufreq to use arch_scale_freq_ref() when calling
arch_set_freq_scale (patch 3).
- arch_scale_freq_ref() is always defined and returns 0 (as proposed
by Ionela) when not defined by the arch. This simplifies the code with
the addition of patch 3.
- Simplify Energy Model which always uses arch_scale_freq_ref(). The
latter returns 0 when not defined by arch instead of last item of the
perf domain. This is not a problem because the function is only defined
for compilation purpose in this case and we don't care about the
returned value. (patch 5)
- Added changes in cppc cpufreq to set capacity_ref_freq (patch 6)
- Added reviewed tag for patch 1 which got a minor change but not for
others as I did some changes which could make previous reviewed tag
no more relevant.

Vincent Guittot (6):
topology: add a new arch_scale_freq_reference
cpufreq: use the fixed and coherent frequency for scaling capacity
cpufreq/schedutil: use a fixed reference frequency
energy_model: use a fixed reference frequency
cpufreq/cppc: set the frequency used for computing the capacity
arm64/amu: use capacity_ref_freq to set AMU ratio

arch/arm/include/asm/topology.h | 1 +
arch/arm64/include/asm/topology.h | 1 +
arch/arm64/kernel/topology.c | 18 ++--
arch/riscv/include/asm/topology.h | 1 +
drivers/acpi/cppc_acpi.c | 93 ++++++++++++++++++++
drivers/base/arch_topology.c | 56 ++++++++----
drivers/cpufreq/cppc_cpufreq.c | 141 +++++-------------------------
drivers/cpufreq/cpufreq.c | 4 +-
include/acpi/cppc_acpi.h | 2 +
include/linux/arch_topology.h | 8 ++
include/linux/cpufreq.h | 9 ++
include/linux/energy_model.h | 14 ++-
kernel/sched/cpufreq_schedutil.c | 26 +++++-
13 files changed, 225 insertions(+), 149 deletions(-)

--
2.34.1


2023-10-18 16:26:31

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH v3 5/6] cpufreq/cppc: set the frequency used for computing the capacity

Save the frequency associated to the performance that has been used when
initializing the capacity of CPUs.
Also, cppc cpufreq driver can register an artificial energy model. In such
case, it needs the frequency for this compute capacity.
We moved and renamed cppc_perf_to_khz and cppc_perf_to_khz to use them
outside cppc_cpufreq in topology_init_cpu_capacity_cppc().

Signed-off-by: Vincent Guittot <[email protected]>
---
drivers/acpi/cppc_acpi.c | 93 ++++++++++++++++++++++
drivers/base/arch_topology.c | 15 +++-
drivers/cpufreq/cppc_cpufreq.c | 141 ++++++---------------------------
include/acpi/cppc_acpi.h | 2 +
4 files changed, 133 insertions(+), 118 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 7ff269a78c20..72aae5e87788 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -39,6 +39,8 @@
#include <linux/rwsem.h>
#include <linux/wait.h>
#include <linux/topology.h>
+#include <linux/dmi.h>
+#include <asm/unaligned.h>

#include <acpi/cppc_acpi.h>

@@ -1760,3 +1762,94 @@ unsigned int cppc_get_transition_latency(int cpu_num)
return latency_ns;
}
EXPORT_SYMBOL_GPL(cppc_get_transition_latency);
+
+/* Minimum struct length needed for the DMI processor entry we want */
+#define DMI_ENTRY_PROCESSOR_MIN_LENGTH 48
+
+/* Offset in the DMI processor structure for the max frequency */
+#define DMI_PROCESSOR_MAX_SPEED 0x14
+
+/* Callback function used to retrieve the max frequency from DMI */
+static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
+{
+ const u8 *dmi_data = (const u8 *)dm;
+ u16 *mhz = (u16 *)private;
+
+ if (dm->type == DMI_ENTRY_PROCESSOR &&
+ dm->length >= DMI_ENTRY_PROCESSOR_MIN_LENGTH) {
+ u16 val = (u16)get_unaligned((const u16 *)
+ (dmi_data + DMI_PROCESSOR_MAX_SPEED));
+ *mhz = val > *mhz ? val : *mhz;
+ }
+}
+
+/* Look up the max frequency in DMI */
+static u64 cppc_get_dmi_max_khz(void)
+{
+ u16 mhz = 0;
+
+ dmi_walk(cppc_find_dmi_mhz, &mhz);
+
+ /*
+ * Real stupid fallback value, just in case there is no
+ * actual value set.
+ */
+ mhz = mhz ? mhz : 1;
+
+ return (1000 * mhz);
+}
+
+/*
+ * If CPPC lowest_freq and nominal_freq registers are exposed then we can
+ * use them to convert perf to freq and vice versa. The conversion is
+ * extrapolated as an affine function passing by the 2 points:
+ * - (Low perf, Low freq)
+ * - (Nominal perf, Nominal perf)
+ */
+unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf)
+{
+ s64 retval, offset = 0;
+ static u64 max_khz;
+ u64 mul, div;
+
+ if (caps->lowest_freq && caps->nominal_freq) {
+ mul = caps->nominal_freq - caps->lowest_freq;
+ div = caps->nominal_perf - caps->lowest_perf;
+ offset = caps->nominal_freq - div64_u64(caps->nominal_perf * mul, div);
+ } else {
+ if (!max_khz)
+ max_khz = cppc_get_dmi_max_khz();
+ mul = max_khz;
+ div = caps->highest_perf;
+ }
+
+ retval = offset + div64_u64(perf * mul, div);
+ if (retval >= 0)
+ return retval;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cppc_perf_to_khz);
+
+unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq)
+{
+ s64 retval, offset = 0;
+ static u64 max_khz;
+ u64 mul, div;
+
+ if (caps->lowest_freq && caps->nominal_freq) {
+ mul = caps->nominal_perf - caps->lowest_perf;
+ div = caps->nominal_freq - caps->lowest_freq;
+ offset = caps->nominal_perf - div64_u64(caps->nominal_freq * mul, div);
+ } else {
+ if (!max_khz)
+ max_khz = cppc_get_dmi_max_khz();
+ mul = caps->highest_perf;
+ div = max_khz;
+ }
+
+ retval = offset + div64_u64(freq * mul, div);
+ if (retval >= 0)
+ return retval;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cppc_khz_to_perf);
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 9a073c2d2086..2372ce791bb4 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -350,6 +350,7 @@ bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
void topology_init_cpu_capacity_cppc(void)
{
struct cppc_perf_caps perf_caps;
+ u64 capacity, capacity_scale;
int cpu;

if (likely(!acpi_cpc_valid()))
@@ -365,6 +366,10 @@ void topology_init_cpu_capacity_cppc(void)
(perf_caps.highest_perf >= perf_caps.nominal_perf) &&
(perf_caps.highest_perf >= perf_caps.lowest_perf)) {
raw_capacity[cpu] = perf_caps.highest_perf;
+ capacity_scale = max_t(u64, capacity_scale, raw_capacity[cpu]);
+
+ per_cpu(capacity_ref_freq, cpu) = cppc_perf_to_khz(&perf_caps, raw_capacity[cpu]);
+
pr_debug("cpu_capacity: CPU%d cpu_capacity=%u (raw).\n",
cpu, raw_capacity[cpu]);
continue;
@@ -375,7 +380,15 @@ void topology_init_cpu_capacity_cppc(void)
goto exit;
}

- topology_normalize_cpu_scale();
+ for_each_possible_cpu(cpu) {
+ capacity = raw_capacity[cpu];
+ capacity = div64_u64(capacity << SCHED_CAPACITY_SHIFT,
+ capacity_scale);
+ topology_set_cpu_scale(cpu, capacity);
+ pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n",
+ cpu, topology_get_cpu_scale(cpu));
+ }
+
schedule_work(&update_topology_flags_work);
pr_debug("cpu_capacity: cpu_capacity initialization done\n");

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index fe08ca419b3d..822376b0cb78 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -27,12 +27,6 @@

#include <acpi/cppc_acpi.h>

-/* Minimum struct length needed for the DMI processor entry we want */
-#define DMI_ENTRY_PROCESSOR_MIN_LENGTH 48
-
-/* Offset in the DMI processor structure for the max frequency */
-#define DMI_PROCESSOR_MAX_SPEED 0x14
-
/*
* This list contains information parsed from per CPU ACPI _CPC and _PSD
* structures: e.g. the highest and lowest supported performance, capabilities,
@@ -291,97 +285,9 @@ static inline void cppc_freq_invariance_exit(void)
}
#endif /* CONFIG_ACPI_CPPC_CPUFREQ_FIE */

-/* Callback function used to retrieve the max frequency from DMI */
-static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
-{
- const u8 *dmi_data = (const u8 *)dm;
- u16 *mhz = (u16 *)private;
-
- if (dm->type == DMI_ENTRY_PROCESSOR &&
- dm->length >= DMI_ENTRY_PROCESSOR_MIN_LENGTH) {
- u16 val = (u16)get_unaligned((const u16 *)
- (dmi_data + DMI_PROCESSOR_MAX_SPEED));
- *mhz = val > *mhz ? val : *mhz;
- }
-}
-
-/* Look up the max frequency in DMI */
-static u64 cppc_get_dmi_max_khz(void)
-{
- u16 mhz = 0;
-
- dmi_walk(cppc_find_dmi_mhz, &mhz);
-
- /*
- * Real stupid fallback value, just in case there is no
- * actual value set.
- */
- mhz = mhz ? mhz : 1;
-
- return (1000 * mhz);
-}
-
-/*
- * If CPPC lowest_freq and nominal_freq registers are exposed then we can
- * use them to convert perf to freq and vice versa. The conversion is
- * extrapolated as an affine function passing by the 2 points:
- * - (Low perf, Low freq)
- * - (Nominal perf, Nominal perf)
- */
-static unsigned int cppc_cpufreq_perf_to_khz(struct cppc_cpudata *cpu_data,
- unsigned int perf)
-{
- struct cppc_perf_caps *caps = &cpu_data->perf_caps;
- s64 retval, offset = 0;
- static u64 max_khz;
- u64 mul, div;
-
- if (caps->lowest_freq && caps->nominal_freq) {
- mul = caps->nominal_freq - caps->lowest_freq;
- div = caps->nominal_perf - caps->lowest_perf;
- offset = caps->nominal_freq - div64_u64(caps->nominal_perf * mul, div);
- } else {
- if (!max_khz)
- max_khz = cppc_get_dmi_max_khz();
- mul = max_khz;
- div = caps->highest_perf;
- }
-
- retval = offset + div64_u64(perf * mul, div);
- if (retval >= 0)
- return retval;
- return 0;
-}
-
-static unsigned int cppc_cpufreq_khz_to_perf(struct cppc_cpudata *cpu_data,
- unsigned int freq)
-{
- struct cppc_perf_caps *caps = &cpu_data->perf_caps;
- s64 retval, offset = 0;
- static u64 max_khz;
- u64 mul, div;
-
- if (caps->lowest_freq && caps->nominal_freq) {
- mul = caps->nominal_perf - caps->lowest_perf;
- div = caps->nominal_freq - caps->lowest_freq;
- offset = caps->nominal_perf - div64_u64(caps->nominal_freq * mul, div);
- } else {
- if (!max_khz)
- max_khz = cppc_get_dmi_max_khz();
- mul = caps->highest_perf;
- div = max_khz;
- }
-
- retval = offset + div64_u64(freq * mul, div);
- if (retval >= 0)
- return retval;
- return 0;
-}
-
static int cppc_cpufreq_set_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation)
-
{
struct cppc_cpudata *cpu_data = policy->driver_data;
unsigned int cpu = policy->cpu;
@@ -389,7 +295,7 @@ static int cppc_cpufreq_set_target(struct cpufreq_policy *policy,
u32 desired_perf;
int ret = 0;

- desired_perf = cppc_cpufreq_khz_to_perf(cpu_data, target_freq);
+ desired_perf = cppc_khz_to_perf(&cpu_data->perf_caps, target_freq);
/* Return if it is exactly the same perf */
if (desired_perf == cpu_data->perf_ctrls.desired_perf)
return ret;
@@ -417,7 +323,7 @@ static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
u32 desired_perf;
int ret;

- desired_perf = cppc_cpufreq_khz_to_perf(cpu_data, target_freq);
+ desired_perf = cppc_khz_to_perf(&cpu_data->perf_caps, target_freq);
cpu_data->perf_ctrls.desired_perf = desired_perf;
ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);

@@ -530,7 +436,7 @@ static int cppc_get_cpu_power(struct device *cpu_dev,
min_step = min_cap / CPPC_EM_CAP_STEP;
max_step = max_cap / CPPC_EM_CAP_STEP;

- perf_prev = cppc_cpufreq_khz_to_perf(cpu_data, *KHz);
+ perf_prev = cppc_khz_to_perf(perf_caps, *KHz);
step = perf_prev / perf_step;

if (step > max_step)
@@ -550,8 +456,8 @@ static int cppc_get_cpu_power(struct device *cpu_dev,
perf = step * perf_step;
}

- *KHz = cppc_cpufreq_perf_to_khz(cpu_data, perf);
- perf_check = cppc_cpufreq_khz_to_perf(cpu_data, *KHz);
+ *KHz = cppc_perf_to_khz(perf_caps, perf);
+ perf_check = cppc_khz_to_perf(perf_caps, *KHz);
step_check = perf_check / perf_step;

/*
@@ -561,8 +467,8 @@ static int cppc_get_cpu_power(struct device *cpu_dev,
*/
while ((*KHz == prev_freq) || (step_check != step)) {
perf++;
- *KHz = cppc_cpufreq_perf_to_khz(cpu_data, perf);
- perf_check = cppc_cpufreq_khz_to_perf(cpu_data, *KHz);
+ *KHz = cppc_perf_to_khz(perf_caps, perf);
+ perf_check = cppc_khz_to_perf(perf_caps, *KHz);
step_check = perf_check / perf_step;
}

@@ -591,7 +497,7 @@ static int cppc_get_cpu_cost(struct device *cpu_dev, unsigned long KHz,
perf_caps = &cpu_data->perf_caps;
max_cap = arch_scale_cpu_capacity(cpu_dev->id);

- perf_prev = cppc_cpufreq_khz_to_perf(cpu_data, KHz);
+ perf_prev = cppc_khz_to_perf(perf_caps, KHz);
perf_step = CPPC_EM_CAP_STEP * perf_caps->highest_perf / max_cap;
step = perf_prev / perf_step;

@@ -643,6 +549,7 @@ static void cppc_cpufreq_register_em(struct cpufreq_policy *policy)
EM_ADV_DATA_CB(cppc_get_cpu_power, cppc_get_cpu_cost);

cpu_data = policy->driver_data;
+
em_dev_register_perf_domain(get_cpu_device(policy->cpu),
get_perf_level_count(policy), &em_cb,
cpu_data->shared_cpu_map, 0);
@@ -724,20 +631,20 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
* Set min to lowest nonlinear perf to avoid any efficiency penalty (see
* Section 8.4.7.1.1.5 of ACPI 6.1 spec)
*/
- policy->min = cppc_cpufreq_perf_to_khz(cpu_data,
- caps->lowest_nonlinear_perf);
- policy->max = cppc_cpufreq_perf_to_khz(cpu_data,
- caps->nominal_perf);
+ policy->min = cppc_perf_to_khz(caps,
+ caps->lowest_nonlinear_perf);
+ policy->max = cppc_perf_to_khz(caps,
+ caps->nominal_perf);

/*
* Set cpuinfo.min_freq to Lowest to make the full range of performance
* available if userspace wants to use any perf between lowest & lowest
* nonlinear perf
*/
- policy->cpuinfo.min_freq = cppc_cpufreq_perf_to_khz(cpu_data,
- caps->lowest_perf);
- policy->cpuinfo.max_freq = cppc_cpufreq_perf_to_khz(cpu_data,
- caps->nominal_perf);
+ policy->cpuinfo.min_freq = cppc_perf_to_khz(caps,
+ caps->lowest_perf);
+ policy->cpuinfo.max_freq = cppc_perf_to_khz(caps,
+ caps->nominal_perf);

policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu);
policy->shared_type = cpu_data->shared_type;
@@ -773,7 +680,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
boost_supported = true;

/* Set policy->cur to max now. The governors will adjust later. */
- policy->cur = cppc_cpufreq_perf_to_khz(cpu_data, caps->highest_perf);
+ policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
cpu_data->perf_ctrls.desired_perf = caps->highest_perf;

ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
@@ -863,7 +770,7 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
delivered_perf = cppc_perf_from_fbctrs(cpu_data, &fb_ctrs_t0,
&fb_ctrs_t1);

- return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
+ return cppc_perf_to_khz(&cpu_data->perf_caps, delivered_perf);
}

static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
@@ -878,11 +785,11 @@ static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
}

if (state)
- policy->max = cppc_cpufreq_perf_to_khz(cpu_data,
- caps->highest_perf);
+ policy->max = cppc_perf_to_khz(caps,
+ caps->highest_perf);
else
- policy->max = cppc_cpufreq_perf_to_khz(cpu_data,
- caps->nominal_perf);
+ policy->max = cppc_perf_to_khz(caps,
+ caps->nominal_perf);
policy->cpuinfo.max_freq = policy->max;

ret = freq_qos_update_request(policy->max_freq_req, policy->max);
@@ -937,7 +844,7 @@ static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpu)
if (ret < 0)
return -EIO;

- return cppc_cpufreq_perf_to_khz(cpu_data, desired_perf);
+ return cppc_perf_to_khz(&cpu_data->perf_caps, desired_perf);
}

static void cppc_check_hisi_workaround(void)
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 6126c977ece0..3a0995f8bce8 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -144,6 +144,8 @@ extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
extern int cppc_set_enable(int cpu, bool enable);
extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
extern bool cppc_perf_ctrs_in_pcc(void);
+extern unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf);
+extern unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq);
extern bool acpi_cpc_valid(void);
extern bool cppc_allow_fast_switch(void);
extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data);
--
2.34.1

2023-10-18 16:26:32

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH v3 2/6] cpufreq: use the fixed and coherent frequency for scaling capacity

cpuinfo.max_freq can change at runtime because of boost as an example. This
implies that the value could be different from the frequency that has been
used to compute the capacity of a CPU.

The new arch_scale_freq_ref() returns a fixed and coherent frequency
that can be used to compute the capacity for a given frequency.

Signed-off-by: Vincent Guittot <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Reviewed-by: Lukasz Luba <[email protected]>
Tested-by: Lukasz Luba <[email protected]>

---
drivers/cpufreq/cpufreq.c | 4 ++--
include/linux/cpufreq.h | 9 +++++++++
2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 60ed89000e82..8c4f9c2f9c44 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -454,7 +454,7 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy,

arch_set_freq_scale(policy->related_cpus,
policy->cur,
- policy->cpuinfo.max_freq);
+ arch_scale_freq_ref(policy->cpu));

spin_lock(&policy->transition_lock);
policy->transition_ongoing = false;
@@ -2174,7 +2174,7 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,

policy->cur = freq;
arch_set_freq_scale(policy->related_cpus, freq,
- policy->cpuinfo.max_freq);
+ arch_scale_freq_ref(policy->cpu));
cpufreq_stats_record_transition(policy, freq);

if (trace_cpu_frequency_enabled()) {
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 71d186d6933a..bbc483b4b6e5 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1211,6 +1211,15 @@ void arch_set_freq_scale(const struct cpumask *cpus,
{
}
#endif
+
+#ifndef arch_scale_freq_ref
+static __always_inline
+unsigned int arch_scale_freq_ref(int cpu)
+{
+ return 0;
+}
+#endif
+
/* the following are really really optional */
extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs;
extern struct freq_attr cpufreq_freq_attr_scaling_boost_freqs;
--
2.34.1

2023-10-18 16:26:33

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH v3 4/6] energy_model: use a fixed reference frequency

The last item of a performance domain is not always the performance point
that has been used to compute CPU's capacity. This can lead to different
target frequency compared with other part of the system like schedutil and
would result in wrong energy estimation.

A new arch_scale_freq_ref() is available to return a fixed and coherent
frequency reference that can be used when computing the CPU's frequency
for an level of utilization. Use this function to get this reference
frequency.

Energy model is never used without defining arch_scale_freq_ref() but
can be compiled. Define a default arch_scale_freq_ref() returning 0
in such case.

Signed-off-by: Vincent Guittot <[email protected]>
Reviewed-by: Lukasz Luba <[email protected]>
Tested-by: Lukasz Luba <[email protected]>

---
include/linux/energy_model.h | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index b9caa01dfac4..1b0c8490d4bd 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -204,6 +204,14 @@ struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
return ps;
}

+#ifndef arch_scale_freq_ref
+static __always_inline
+unsigned int arch_scale_freq_ref(int cpu)
+{
+ return 0;
+}
+#endif
+
/**
* em_cpu_energy() - Estimates the energy consumed by the CPUs of a
* performance domain
@@ -224,7 +232,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
unsigned long max_util, unsigned long sum_util,
unsigned long allowed_cpu_cap)
{
- unsigned long freq, scale_cpu;
+ unsigned long freq, ref_freq, scale_cpu;
struct em_perf_state *ps;
int cpu;

@@ -241,11 +249,11 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
*/
cpu = cpumask_first(to_cpumask(pd->cpus));
scale_cpu = arch_scale_cpu_capacity(cpu);
- ps = &pd->table[pd->nr_perf_states - 1];
+ ref_freq = arch_scale_freq_ref(cpu);

max_util = map_util_perf(max_util);
max_util = min(max_util, allowed_cpu_cap);
- freq = map_util_freq(max_util, ps->frequency, scale_cpu);
+ freq = map_util_freq(max_util, ref_freq, scale_cpu);

/*
* Find the lowest performance state of the Energy Model above the
--
2.34.1

2023-10-18 16:26:35

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH v3 3/6] cpufreq/schedutil: use a fixed reference frequency

cpuinfo.max_freq can change at runtime because of boost as an example. This
implies that the value could be different than the one that has been
used when computing the capacity of a CPU.

The new arch_scale_freq_ref() returns a fixed and coherent reference
frequency that can be used when computing a frequency based on utilization.

Use this arch_scale_freq_ref() when available and fallback to
policy otherwise.

Signed-off-by: Vincent Guittot <[email protected]>
Reviewed-by: Lukasz Luba <[email protected]>
Tested-by: Lukasz Luba <[email protected]>

---
kernel/sched/cpufreq_schedutil.c | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 458d359f5991..6e4030482ae8 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -114,6 +114,28 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy)
}
}

+/**
+ * cpufreq_get_capacity_ref_freq - get the reference frequency of a given CPU that
+ * has been used to correlate frequency and compute capacity.
+ * @policy: the cpufreq policy of the CPU in question.
+ * @use_current: Fallback to current freq instead of policy->cpuinfo.max_freq.
+ *
+ * Return: the reference CPU frequency to compute a capacity.
+ */
+static __always_inline
+unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
+{
+ unsigned int freq = arch_scale_freq_ref(policy->cpu);
+
+ if (freq)
+ return freq;
+
+ if (arch_scale_freq_invariant())
+ return policy->cpuinfo.max_freq;
+
+ return policy->cur;
+}
+
/**
* get_next_freq - Compute a new frequency for a given cpufreq policy.
* @sg_policy: schedutil policy object to compute the new frequency for.
@@ -140,10 +162,10 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
unsigned long util, unsigned long max)
{
struct cpufreq_policy *policy = sg_policy->policy;
- unsigned int freq = arch_scale_freq_invariant() ?
- policy->cpuinfo.max_freq : policy->cur;
+ unsigned int freq;

util = map_util_perf(util);
+ freq = get_capacity_ref_freq(policy);
freq = map_util_freq(util, freq, max);

if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
--
2.34.1

2023-10-18 16:26:37

by Vincent Guittot

[permalink] [raw]
Subject: [RFC v3 6/6] arm64/amu: use capacity_ref_freq to set AMU ratio

Use the new capacity_ref_freq to set the ratio that is used by AMU for
computing the arch_scale_freq_capacity().
This helps to keep everything aligned using the same reference for
computing CPUs capacity.

The default value of the ratio ensures that arch_scale_freq_capacity()
returns max capacity until it is set to its correct value with the
cpu capacity and capacity_ref_freq.

Signed-off-by: Vincent Guittot <[email protected]>
---
arch/arm64/kernel/topology.c | 18 ++++++++++--------
drivers/base/arch_topology.c | 14 ++++++++++++--
include/linux/arch_topology.h | 1 +
3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 817d788cd866..0f8f6e90c46d 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -82,7 +82,12 @@ int __init parse_acpi_topology(void)
#undef pr_fmt
#define pr_fmt(fmt) "AMU: " fmt

-static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
+/*
+ * Ensure that amu_scale_freq_tick() will return SCHED_CAPACITY_SHIFT until
+ * the CPU capacity and its assosciated frequency have been correctly
+ * initialized.
+ */
+static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = (2 * SCHED_CAPACITY_SHIFT);
static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
static cpumask_var_t amu_fie_cpus;
@@ -112,9 +117,9 @@ static inline bool freq_counters_valid(int cpu)
return true;
}

-static int freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate)
+int freq_inv_set_max_ratio(int cpu, u64 max_rate)
{
- u64 ratio;
+ u64 ratio, ref_rate = arch_timer_get_rate();

if (unlikely(!max_rate || !ref_rate)) {
pr_debug("CPU%d: invalid maximum or reference frequency.\n",
@@ -142,7 +147,7 @@ static int freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate)
return -EINVAL;
}

- per_cpu(arch_max_freq_scale, cpu) = (unsigned long)ratio;
+ WRITE_ONCE(per_cpu(arch_max_freq_scale, cpu), (unsigned long)ratio);

return 0;
}
@@ -195,10 +200,7 @@ static void amu_fie_setup(const struct cpumask *cpus)
return;

for_each_cpu(cpu, cpus) {
- if (!freq_counters_valid(cpu) ||
- freq_inv_set_max_ratio(cpu,
- cpufreq_get_hw_max_freq(cpu) * 1000ULL,
- arch_timer_get_rate()))
+ if (!freq_counters_valid(cpu))
return;
}

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 2372ce791bb4..3a604b77b12d 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -344,6 +344,11 @@ bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
return !ret;
}

+int __weak freq_inv_set_max_ratio(int cpu, u64 max_rate)
+{
+ return 0;
+}
+
#ifdef CONFIG_ACPI_CPPC_LIB
#include <acpi/cppc_acpi.h>

@@ -369,7 +374,6 @@ void topology_init_cpu_capacity_cppc(void)
capacity_scale = max_t(u64, capacity_scale, raw_capacity[cpu]);

per_cpu(capacity_ref_freq, cpu) = cppc_perf_to_khz(&perf_caps, raw_capacity[cpu]);
-
pr_debug("cpu_capacity: CPU%d cpu_capacity=%u (raw).\n",
cpu, raw_capacity[cpu]);
continue;
@@ -381,6 +385,9 @@ void topology_init_cpu_capacity_cppc(void)
}

for_each_possible_cpu(cpu) {
+ freq_inv_set_max_ratio(cpu,
+ per_cpu(capacity_ref_freq, cpu));
+
capacity = raw_capacity[cpu];
capacity = div64_u64(capacity << SCHED_CAPACITY_SHIFT,
capacity_scale);
@@ -422,8 +429,11 @@ init_cpu_capacity_callback(struct notifier_block *nb,

cpumask_andnot(cpus_to_visit, cpus_to_visit, policy->related_cpus);

- for_each_cpu(cpu, policy->related_cpus)
+ for_each_cpu(cpu, policy->related_cpus) {
per_cpu(capacity_ref_freq, cpu) = policy->cpuinfo.max_freq;
+ freq_inv_set_max_ratio(cpu,
+ per_cpu(capacity_ref_freq, cpu));
+ }

if (cpumask_empty(cpus_to_visit)) {
topology_normalize_cpu_scale();
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 38ca6c76af56..b6e95d763279 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -99,6 +99,7 @@ void update_siblings_masks(unsigned int cpu);
void remove_cpu_topology(unsigned int cpuid);
void reset_cpu_topology(void);
int parse_acpi_topology(void);
+int freq_inv_set_max_ratio(int cpu, u64 max_rate);
#endif

#endif /* _LINUX_ARCH_TOPOLOGY_H_ */
--
2.34.1

2023-10-18 17:27:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] cpufreq/cppc: set the frequency used for computing the capacity

On Wed, Oct 18, 2023 at 6:25 PM Vincent Guittot
<[email protected]> wrote:
>
> Save the frequency associated to the performance that has been used when
> initializing the capacity of CPUs.
> Also, cppc cpufreq driver can register an artificial energy model. In such
> case, it needs the frequency for this compute capacity.
> We moved and renamed cppc_perf_to_khz and cppc_perf_to_khz to use them
> outside cppc_cpufreq in topology_init_cpu_capacity_cppc().
>
> Signed-off-by: Vincent Guittot <[email protected]>

For the changes in drivers/acpi/cppc_acpi.c :

Acked-by: Rafael J. Wysocki <[email protected]>

> ---
> drivers/acpi/cppc_acpi.c | 93 ++++++++++++++++++++++
> drivers/base/arch_topology.c | 15 +++-
> drivers/cpufreq/cppc_cpufreq.c | 141 ++++++---------------------------
> include/acpi/cppc_acpi.h | 2 +
> 4 files changed, 133 insertions(+), 118 deletions(-)
>
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index 7ff269a78c20..72aae5e87788 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -39,6 +39,8 @@
> #include <linux/rwsem.h>
> #include <linux/wait.h>
> #include <linux/topology.h>
> +#include <linux/dmi.h>
> +#include <asm/unaligned.h>
>
> #include <acpi/cppc_acpi.h>
>
> @@ -1760,3 +1762,94 @@ unsigned int cppc_get_transition_latency(int cpu_num)
> return latency_ns;
> }
> EXPORT_SYMBOL_GPL(cppc_get_transition_latency);
> +
> +/* Minimum struct length needed for the DMI processor entry we want */
> +#define DMI_ENTRY_PROCESSOR_MIN_LENGTH 48
> +
> +/* Offset in the DMI processor structure for the max frequency */
> +#define DMI_PROCESSOR_MAX_SPEED 0x14
> +
> +/* Callback function used to retrieve the max frequency from DMI */
> +static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
> +{
> + const u8 *dmi_data = (const u8 *)dm;
> + u16 *mhz = (u16 *)private;
> +
> + if (dm->type == DMI_ENTRY_PROCESSOR &&
> + dm->length >= DMI_ENTRY_PROCESSOR_MIN_LENGTH) {
> + u16 val = (u16)get_unaligned((const u16 *)
> + (dmi_data + DMI_PROCESSOR_MAX_SPEED));
> + *mhz = val > *mhz ? val : *mhz;
> + }
> +}
> +
> +/* Look up the max frequency in DMI */
> +static u64 cppc_get_dmi_max_khz(void)
> +{
> + u16 mhz = 0;
> +
> + dmi_walk(cppc_find_dmi_mhz, &mhz);
> +
> + /*
> + * Real stupid fallback value, just in case there is no
> + * actual value set.
> + */
> + mhz = mhz ? mhz : 1;
> +
> + return (1000 * mhz);
> +}
> +
> +/*
> + * If CPPC lowest_freq and nominal_freq registers are exposed then we can
> + * use them to convert perf to freq and vice versa. The conversion is
> + * extrapolated as an affine function passing by the 2 points:
> + * - (Low perf, Low freq)
> + * - (Nominal perf, Nominal perf)
> + */
> +unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf)
> +{
> + s64 retval, offset = 0;
> + static u64 max_khz;
> + u64 mul, div;
> +
> + if (caps->lowest_freq && caps->nominal_freq) {
> + mul = caps->nominal_freq - caps->lowest_freq;
> + div = caps->nominal_perf - caps->lowest_perf;
> + offset = caps->nominal_freq - div64_u64(caps->nominal_perf * mul, div);
> + } else {
> + if (!max_khz)
> + max_khz = cppc_get_dmi_max_khz();
> + mul = max_khz;
> + div = caps->highest_perf;
> + }
> +
> + retval = offset + div64_u64(perf * mul, div);
> + if (retval >= 0)
> + return retval;
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(cppc_perf_to_khz);
> +
> +unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq)
> +{
> + s64 retval, offset = 0;
> + static u64 max_khz;
> + u64 mul, div;
> +
> + if (caps->lowest_freq && caps->nominal_freq) {
> + mul = caps->nominal_perf - caps->lowest_perf;
> + div = caps->nominal_freq - caps->lowest_freq;
> + offset = caps->nominal_perf - div64_u64(caps->nominal_freq * mul, div);
> + } else {
> + if (!max_khz)
> + max_khz = cppc_get_dmi_max_khz();
> + mul = caps->highest_perf;
> + div = max_khz;
> + }
> +
> + retval = offset + div64_u64(freq * mul, div);
> + if (retval >= 0)
> + return retval;
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(cppc_khz_to_perf);
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 9a073c2d2086..2372ce791bb4 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -350,6 +350,7 @@ bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
> void topology_init_cpu_capacity_cppc(void)
> {
> struct cppc_perf_caps perf_caps;
> + u64 capacity, capacity_scale;
> int cpu;
>
> if (likely(!acpi_cpc_valid()))
> @@ -365,6 +366,10 @@ void topology_init_cpu_capacity_cppc(void)
> (perf_caps.highest_perf >= perf_caps.nominal_perf) &&
> (perf_caps.highest_perf >= perf_caps.lowest_perf)) {
> raw_capacity[cpu] = perf_caps.highest_perf;
> + capacity_scale = max_t(u64, capacity_scale, raw_capacity[cpu]);
> +
> + per_cpu(capacity_ref_freq, cpu) = cppc_perf_to_khz(&perf_caps, raw_capacity[cpu]);
> +
> pr_debug("cpu_capacity: CPU%d cpu_capacity=%u (raw).\n",
> cpu, raw_capacity[cpu]);
> continue;
> @@ -375,7 +380,15 @@ void topology_init_cpu_capacity_cppc(void)
> goto exit;
> }
>
> - topology_normalize_cpu_scale();
> + for_each_possible_cpu(cpu) {
> + capacity = raw_capacity[cpu];
> + capacity = div64_u64(capacity << SCHED_CAPACITY_SHIFT,
> + capacity_scale);
> + topology_set_cpu_scale(cpu, capacity);
> + pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n",
> + cpu, topology_get_cpu_scale(cpu));
> + }
> +
> schedule_work(&update_topology_flags_work);
> pr_debug("cpu_capacity: cpu_capacity initialization done\n");
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index fe08ca419b3d..822376b0cb78 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -27,12 +27,6 @@
>
> #include <acpi/cppc_acpi.h>
>
> -/* Minimum struct length needed for the DMI processor entry we want */
> -#define DMI_ENTRY_PROCESSOR_MIN_LENGTH 48
> -
> -/* Offset in the DMI processor structure for the max frequency */
> -#define DMI_PROCESSOR_MAX_SPEED 0x14
> -
> /*
> * This list contains information parsed from per CPU ACPI _CPC and _PSD
> * structures: e.g. the highest and lowest supported performance, capabilities,
> @@ -291,97 +285,9 @@ static inline void cppc_freq_invariance_exit(void)
> }
> #endif /* CONFIG_ACPI_CPPC_CPUFREQ_FIE */
>
> -/* Callback function used to retrieve the max frequency from DMI */
> -static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
> -{
> - const u8 *dmi_data = (const u8 *)dm;
> - u16 *mhz = (u16 *)private;
> -
> - if (dm->type == DMI_ENTRY_PROCESSOR &&
> - dm->length >= DMI_ENTRY_PROCESSOR_MIN_LENGTH) {
> - u16 val = (u16)get_unaligned((const u16 *)
> - (dmi_data + DMI_PROCESSOR_MAX_SPEED));
> - *mhz = val > *mhz ? val : *mhz;
> - }
> -}
> -
> -/* Look up the max frequency in DMI */
> -static u64 cppc_get_dmi_max_khz(void)
> -{
> - u16 mhz = 0;
> -
> - dmi_walk(cppc_find_dmi_mhz, &mhz);
> -
> - /*
> - * Real stupid fallback value, just in case there is no
> - * actual value set.
> - */
> - mhz = mhz ? mhz : 1;
> -
> - return (1000 * mhz);
> -}
> -
> -/*
> - * If CPPC lowest_freq and nominal_freq registers are exposed then we can
> - * use them to convert perf to freq and vice versa. The conversion is
> - * extrapolated as an affine function passing by the 2 points:
> - * - (Low perf, Low freq)
> - * - (Nominal perf, Nominal perf)
> - */
> -static unsigned int cppc_cpufreq_perf_to_khz(struct cppc_cpudata *cpu_data,
> - unsigned int perf)
> -{
> - struct cppc_perf_caps *caps = &cpu_data->perf_caps;
> - s64 retval, offset = 0;
> - static u64 max_khz;
> - u64 mul, div;
> -
> - if (caps->lowest_freq && caps->nominal_freq) {
> - mul = caps->nominal_freq - caps->lowest_freq;
> - div = caps->nominal_perf - caps->lowest_perf;
> - offset = caps->nominal_freq - div64_u64(caps->nominal_perf * mul, div);
> - } else {
> - if (!max_khz)
> - max_khz = cppc_get_dmi_max_khz();
> - mul = max_khz;
> - div = caps->highest_perf;
> - }
> -
> - retval = offset + div64_u64(perf * mul, div);
> - if (retval >= 0)
> - return retval;
> - return 0;
> -}
> -
> -static unsigned int cppc_cpufreq_khz_to_perf(struct cppc_cpudata *cpu_data,
> - unsigned int freq)
> -{
> - struct cppc_perf_caps *caps = &cpu_data->perf_caps;
> - s64 retval, offset = 0;
> - static u64 max_khz;
> - u64 mul, div;
> -
> - if (caps->lowest_freq && caps->nominal_freq) {
> - mul = caps->nominal_perf - caps->lowest_perf;
> - div = caps->nominal_freq - caps->lowest_freq;
> - offset = caps->nominal_perf - div64_u64(caps->nominal_freq * mul, div);
> - } else {
> - if (!max_khz)
> - max_khz = cppc_get_dmi_max_khz();
> - mul = caps->highest_perf;
> - div = max_khz;
> - }
> -
> - retval = offset + div64_u64(freq * mul, div);
> - if (retval >= 0)
> - return retval;
> - return 0;
> -}
> -
> static int cppc_cpufreq_set_target(struct cpufreq_policy *policy,
> unsigned int target_freq,
> unsigned int relation)
> -
> {
> struct cppc_cpudata *cpu_data = policy->driver_data;
> unsigned int cpu = policy->cpu;
> @@ -389,7 +295,7 @@ static int cppc_cpufreq_set_target(struct cpufreq_policy *policy,
> u32 desired_perf;
> int ret = 0;
>
> - desired_perf = cppc_cpufreq_khz_to_perf(cpu_data, target_freq);
> + desired_perf = cppc_khz_to_perf(&cpu_data->perf_caps, target_freq);
> /* Return if it is exactly the same perf */
> if (desired_perf == cpu_data->perf_ctrls.desired_perf)
> return ret;
> @@ -417,7 +323,7 @@ static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
> u32 desired_perf;
> int ret;
>
> - desired_perf = cppc_cpufreq_khz_to_perf(cpu_data, target_freq);
> + desired_perf = cppc_khz_to_perf(&cpu_data->perf_caps, target_freq);
> cpu_data->perf_ctrls.desired_perf = desired_perf;
> ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>
> @@ -530,7 +436,7 @@ static int cppc_get_cpu_power(struct device *cpu_dev,
> min_step = min_cap / CPPC_EM_CAP_STEP;
> max_step = max_cap / CPPC_EM_CAP_STEP;
>
> - perf_prev = cppc_cpufreq_khz_to_perf(cpu_data, *KHz);
> + perf_prev = cppc_khz_to_perf(perf_caps, *KHz);
> step = perf_prev / perf_step;
>
> if (step > max_step)
> @@ -550,8 +456,8 @@ static int cppc_get_cpu_power(struct device *cpu_dev,
> perf = step * perf_step;
> }
>
> - *KHz = cppc_cpufreq_perf_to_khz(cpu_data, perf);
> - perf_check = cppc_cpufreq_khz_to_perf(cpu_data, *KHz);
> + *KHz = cppc_perf_to_khz(perf_caps, perf);
> + perf_check = cppc_khz_to_perf(perf_caps, *KHz);
> step_check = perf_check / perf_step;
>
> /*
> @@ -561,8 +467,8 @@ static int cppc_get_cpu_power(struct device *cpu_dev,
> */
> while ((*KHz == prev_freq) || (step_check != step)) {
> perf++;
> - *KHz = cppc_cpufreq_perf_to_khz(cpu_data, perf);
> - perf_check = cppc_cpufreq_khz_to_perf(cpu_data, *KHz);
> + *KHz = cppc_perf_to_khz(perf_caps, perf);
> + perf_check = cppc_khz_to_perf(perf_caps, *KHz);
> step_check = perf_check / perf_step;
> }
>
> @@ -591,7 +497,7 @@ static int cppc_get_cpu_cost(struct device *cpu_dev, unsigned long KHz,
> perf_caps = &cpu_data->perf_caps;
> max_cap = arch_scale_cpu_capacity(cpu_dev->id);
>
> - perf_prev = cppc_cpufreq_khz_to_perf(cpu_data, KHz);
> + perf_prev = cppc_khz_to_perf(perf_caps, KHz);
> perf_step = CPPC_EM_CAP_STEP * perf_caps->highest_perf / max_cap;
> step = perf_prev / perf_step;
>
> @@ -643,6 +549,7 @@ static void cppc_cpufreq_register_em(struct cpufreq_policy *policy)
> EM_ADV_DATA_CB(cppc_get_cpu_power, cppc_get_cpu_cost);
>
> cpu_data = policy->driver_data;
> +
> em_dev_register_perf_domain(get_cpu_device(policy->cpu),
> get_perf_level_count(policy), &em_cb,
> cpu_data->shared_cpu_map, 0);
> @@ -724,20 +631,20 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> * Set min to lowest nonlinear perf to avoid any efficiency penalty (see
> * Section 8.4.7.1.1.5 of ACPI 6.1 spec)
> */
> - policy->min = cppc_cpufreq_perf_to_khz(cpu_data,
> - caps->lowest_nonlinear_perf);
> - policy->max = cppc_cpufreq_perf_to_khz(cpu_data,
> - caps->nominal_perf);
> + policy->min = cppc_perf_to_khz(caps,
> + caps->lowest_nonlinear_perf);
> + policy->max = cppc_perf_to_khz(caps,
> + caps->nominal_perf);
>
> /*
> * Set cpuinfo.min_freq to Lowest to make the full range of performance
> * available if userspace wants to use any perf between lowest & lowest
> * nonlinear perf
> */
> - policy->cpuinfo.min_freq = cppc_cpufreq_perf_to_khz(cpu_data,
> - caps->lowest_perf);
> - policy->cpuinfo.max_freq = cppc_cpufreq_perf_to_khz(cpu_data,
> - caps->nominal_perf);
> + policy->cpuinfo.min_freq = cppc_perf_to_khz(caps,
> + caps->lowest_perf);
> + policy->cpuinfo.max_freq = cppc_perf_to_khz(caps,
> + caps->nominal_perf);
>
> policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu);
> policy->shared_type = cpu_data->shared_type;
> @@ -773,7 +680,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> boost_supported = true;
>
> /* Set policy->cur to max now. The governors will adjust later. */
> - policy->cur = cppc_cpufreq_perf_to_khz(cpu_data, caps->highest_perf);
> + policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
> cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
>
> ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
> @@ -863,7 +770,7 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
> delivered_perf = cppc_perf_from_fbctrs(cpu_data, &fb_ctrs_t0,
> &fb_ctrs_t1);
>
> - return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
> + return cppc_perf_to_khz(&cpu_data->perf_caps, delivered_perf);
> }
>
> static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
> @@ -878,11 +785,11 @@ static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
> }
>
> if (state)
> - policy->max = cppc_cpufreq_perf_to_khz(cpu_data,
> - caps->highest_perf);
> + policy->max = cppc_perf_to_khz(caps,
> + caps->highest_perf);
> else
> - policy->max = cppc_cpufreq_perf_to_khz(cpu_data,
> - caps->nominal_perf);
> + policy->max = cppc_perf_to_khz(caps,
> + caps->nominal_perf);
> policy->cpuinfo.max_freq = policy->max;
>
> ret = freq_qos_update_request(policy->max_freq_req, policy->max);
> @@ -937,7 +844,7 @@ static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpu)
> if (ret < 0)
> return -EIO;
>
> - return cppc_cpufreq_perf_to_khz(cpu_data, desired_perf);
> + return cppc_perf_to_khz(&cpu_data->perf_caps, desired_perf);
> }
>
> static void cppc_check_hisi_workaround(void)
> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> index 6126c977ece0..3a0995f8bce8 100644
> --- a/include/acpi/cppc_acpi.h
> +++ b/include/acpi/cppc_acpi.h
> @@ -144,6 +144,8 @@ extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
> extern int cppc_set_enable(int cpu, bool enable);
> extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
> extern bool cppc_perf_ctrs_in_pcc(void);
> +extern unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf);
> +extern unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq);
> extern bool acpi_cpc_valid(void);
> extern bool cppc_allow_fast_switch(void);
> extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data);
> --
> 2.34.1
>

2023-10-18 17:29:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v3 3/6] cpufreq/schedutil: use a fixed reference frequency

On Wed, Oct 18, 2023 at 6:25 PM Vincent Guittot
<[email protected]> wrote:
>
> cpuinfo.max_freq can change at runtime because of boost as an example. This
> implies that the value could be different than the one that has been
> used when computing the capacity of a CPU.
>
> The new arch_scale_freq_ref() returns a fixed and coherent reference
> frequency that can be used when computing a frequency based on utilization.
>
> Use this arch_scale_freq_ref() when available and fallback to
> policy otherwise.
>
> Signed-off-by: Vincent Guittot <[email protected]>
> Reviewed-by: Lukasz Luba <[email protected]>
> Tested-by: Lukasz Luba <[email protected]>

Acked-by: Rafael J. Wysocki <[email protected]>

>
> ---
> kernel/sched/cpufreq_schedutil.c | 26 ++++++++++++++++++++++++--
> 1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 458d359f5991..6e4030482ae8 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -114,6 +114,28 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy)
> }
> }
>
> +/**
> + * cpufreq_get_capacity_ref_freq - get the reference frequency of a given CPU that
> + * has been used to correlate frequency and compute capacity.
> + * @policy: the cpufreq policy of the CPU in question.
> + * @use_current: Fallback to current freq instead of policy->cpuinfo.max_freq.
> + *
> + * Return: the reference CPU frequency to compute a capacity.
> + */
> +static __always_inline
> +unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> +{
> + unsigned int freq = arch_scale_freq_ref(policy->cpu);
> +
> + if (freq)
> + return freq;
> +
> + if (arch_scale_freq_invariant())
> + return policy->cpuinfo.max_freq;
> +
> + return policy->cur;
> +}
> +
> /**
> * get_next_freq - Compute a new frequency for a given cpufreq policy.
> * @sg_policy: schedutil policy object to compute the new frequency for.
> @@ -140,10 +162,10 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> unsigned long util, unsigned long max)
> {
> struct cpufreq_policy *policy = sg_policy->policy;
> - unsigned int freq = arch_scale_freq_invariant() ?
> - policy->cpuinfo.max_freq : policy->cur;
> + unsigned int freq;
>
> util = map_util_perf(util);
> + freq = get_capacity_ref_freq(policy);
> freq = map_util_freq(util, freq, max);
>
> if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> --
> 2.34.1
>

2023-10-18 17:31:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v3 2/6] cpufreq: use the fixed and coherent frequency for scaling capacity

On Wed, Oct 18, 2023 at 6:25 PM Vincent Guittot
<[email protected]> wrote:
>
> cpuinfo.max_freq can change at runtime because of boost as an example. This
> implies that the value could be different from the frequency that has been
> used to compute the capacity of a CPU.
>
> The new arch_scale_freq_ref() returns a fixed and coherent frequency
> that can be used to compute the capacity for a given frequency.
>
> Signed-off-by: Vincent Guittot <[email protected]>
> Acked-by: Viresh Kumar <[email protected]>
> Reviewed-by: Lukasz Luba <[email protected]>
> Tested-by: Lukasz Luba <[email protected]>

Acked-by: Rafael J. Wysocki <[email protected]>

> ---
> drivers/cpufreq/cpufreq.c | 4 ++--
> include/linux/cpufreq.h | 9 +++++++++
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 60ed89000e82..8c4f9c2f9c44 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -454,7 +454,7 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
>
> arch_set_freq_scale(policy->related_cpus,
> policy->cur,
> - policy->cpuinfo.max_freq);
> + arch_scale_freq_ref(policy->cpu));
>
> spin_lock(&policy->transition_lock);
> policy->transition_ongoing = false;
> @@ -2174,7 +2174,7 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
>
> policy->cur = freq;
> arch_set_freq_scale(policy->related_cpus, freq,
> - policy->cpuinfo.max_freq);
> + arch_scale_freq_ref(policy->cpu));
> cpufreq_stats_record_transition(policy, freq);
>
> if (trace_cpu_frequency_enabled()) {
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 71d186d6933a..bbc483b4b6e5 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -1211,6 +1211,15 @@ void arch_set_freq_scale(const struct cpumask *cpus,
> {
> }
> #endif
> +
> +#ifndef arch_scale_freq_ref
> +static __always_inline
> +unsigned int arch_scale_freq_ref(int cpu)
> +{
> + return 0;
> +}
> +#endif
> +
> /* the following are really really optional */
> extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs;
> extern struct freq_attr cpufreq_freq_attr_scaling_boost_freqs;
> --
> 2.34.1
>

2023-10-20 16:07:39

by Pierre Gondois

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] cpufreq/cppc: set the frequency used for computing the capacity

Hello Vincent,

On 10/18/23 19:26, Rafael J. Wysocki wrote:
> On Wed, Oct 18, 2023 at 6:25 PM Vincent Guittot
> <[email protected]> wrote:
>>
>> Save the frequency associated to the performance that has been used when
>> initializing the capacity of CPUs.
>> Also, cppc cpufreq driver can register an artificial energy model. In such
>> case, it needs the frequency for this compute capacity.
>> We moved and renamed cppc_perf_to_khz and cppc_perf_to_khz to use them
>> outside cppc_cpufreq in topology_init_cpu_capacity_cppc().
>>
>> Signed-off-by: Vincent Guittot <[email protected]>
>
> For the changes in drivers/acpi/cppc_acpi.c :
>
> Acked-by: Rafael J. Wysocki <[email protected]>
>
>> ---
>> drivers/acpi/cppc_acpi.c | 93 ++++++++++++++++++++++
>> drivers/base/arch_topology.c | 15 +++-
>> drivers/cpufreq/cppc_cpufreq.c | 141 ++++++---------------------------
>> include/acpi/cppc_acpi.h | 2 +
>> 4 files changed, 133 insertions(+), 118 deletions(-)
>>

[snip]

>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>> index 9a073c2d2086..2372ce791bb4 100644
>> --- a/drivers/base/arch_topology.c
>> +++ b/drivers/base/arch_topology.c
>> @@ -350,6 +350,7 @@ bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
>> void topology_init_cpu_capacity_cppc(void)
>> {
>> struct cppc_perf_caps perf_caps;
>> + u64 capacity, capacity_scale;

I think capacity_scale should be initialized to 0 here,
since it is used to find the max value of raw_capacity[cpu].

>> int cpu;
>>
>> if (likely(!acpi_cpc_valid()))
>> @@ -365,6 +366,10 @@ void topology_init_cpu_capacity_cppc(void)
>> (perf_caps.highest_perf >= perf_caps.nominal_perf) &&
>> (perf_caps.highest_perf >= perf_caps.lowest_perf)) {
>> raw_capacity[cpu] = perf_caps.highest_perf;
>> + capacity_scale = max_t(u64, capacity_scale, raw_capacity[cpu]);
>> +
>> + per_cpu(capacity_ref_freq, cpu) = cppc_perf_to_khz(&perf_caps, raw_capacity[cpu]);

I think capacity_ref_freq in in Hz, so the freq should be multiplied by 1000 .

With these two modifications, the patches worked well on a cppc-based platform.

Sorry I forgot to detail what it was. It's a modified Juno with CPPC enabled. AMUs are not
enabled, so the CPPC performance counters are not handled correctly and FIE cannot be enabled,
but it is possible to change frequencies.

The _CPC objects are setup as:
little CPUs:
- lowest_freq = 450 (MHz)
- nominal_freq = 800 (MHz)
- highest_perf = 383 * 1000
- nominal_perf = 322 * 1000
- lowest_perf = 181 * 1000
- lowest_nonlinear_perf = 181 * 1000

big CPUs:
- lowest_freq = 600 (MHz)
- nominal_freq = 1200 (MHz)
- highest_perf = 1024 * 1000
- nominal_perf = 833 * 1000
- lowest_perf = 512 * 1000
- lowest_nonlinear_perf = 512 * 1000

As a remainder, available frequencies are:
- little CPUs: 450, 800, 950 MHz
- big CPUs: 600, 1000, 1200 Mhz
So the platform is setup to have the last frequency as a boost frequency (for testing).

----

Just to make a note of 2 potential side-issues for later (independent from these patches):

- When testing with boosted/non-bossted frequencies, it didn't seem that cpu_overutilized()
was taking the maximum frequency into consideration. This might mean that when lowering the
maximum frequency of a policy, the maximum capacity of the CPUs of this policy is used
to detect over-utilization.
I would have thought that the over-utilization threshold would be lowered at the same time.

- Similarly for EAS, the energy computation doesn't take into account the maximum frequency
of the policy. This should mean that EAS is taking into consideration frequencies that
are not actually available.


Regards,
Pierre

2023-10-23 21:01:26

by Ionela Voinescu

[permalink] [raw]
Subject: Re: [RFC v3 6/6] arm64/amu: use capacity_ref_freq to set AMU ratio

Hi,

On Wednesday 18 Oct 2023 at 18:25:40 (+0200), Vincent Guittot wrote:
> Use the new capacity_ref_freq to set the ratio that is used by AMU for
> computing the arch_scale_freq_capacity().
> This helps to keep everything aligned using the same reference for
> computing CPUs capacity.
>
> The default value of the ratio ensures that arch_scale_freq_capacity()
> returns max capacity until it is set to its correct value with the
> cpu capacity and capacity_ref_freq.
>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
> arch/arm64/kernel/topology.c | 18 ++++++++++--------
> drivers/base/arch_topology.c | 14 ++++++++++++--
> include/linux/arch_topology.h | 1 +
> 3 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 817d788cd866..0f8f6e90c46d 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -82,7 +82,12 @@ int __init parse_acpi_topology(void)
> #undef pr_fmt
> #define pr_fmt(fmt) "AMU: " fmt
>
> -static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
> +/*
> + * Ensure that amu_scale_freq_tick() will return SCHED_CAPACITY_SHIFT until
> + * the CPU capacity and its assosciated frequency have been correctly
> + * initialized.

s/SCHED_CAPACITY_SHIFT/SCHED_CAPACITY_SCALE

> + */
> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = (2 * SCHED_CAPACITY_SHIFT);

It should be 1UL << (2 * SCHED_CAPACITY_SHIFT).

> static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> static cpumask_var_t amu_fie_cpus;
> @@ -112,9 +117,9 @@ static inline bool freq_counters_valid(int cpu)
> return true;
> }
>
> -static int freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate)
> +int freq_inv_set_max_ratio(int cpu, u64 max_rate)
> {
> - u64 ratio;
> + u64 ratio, ref_rate = arch_timer_get_rate();
>
> if (unlikely(!max_rate || !ref_rate)) {
> pr_debug("CPU%d: invalid maximum or reference frequency.\n",
> @@ -142,7 +147,7 @@ static int freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate)
> return -EINVAL;
> }

The return value is no longer used so it might be worth declaring it to
return void and WARN_ONCE for unlikely(!max_rate || !ref_rate).

>
> - per_cpu(arch_max_freq_scale, cpu) = (unsigned long)ratio;
> + WRITE_ONCE(per_cpu(arch_max_freq_scale, cpu), (unsigned long)ratio);
>
> return 0;
> }
> @@ -195,10 +200,7 @@ static void amu_fie_setup(const struct cpumask *cpus)
> return;
>
> for_each_cpu(cpu, cpus) {
> - if (!freq_counters_valid(cpu) ||
> - freq_inv_set_max_ratio(cpu,
> - cpufreq_get_hw_max_freq(cpu) * 1000ULL,
> - arch_timer_get_rate()))
> + if (!freq_counters_valid(cpu))
> return;
> }
>
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 2372ce791bb4..3a604b77b12d 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -344,6 +344,11 @@ bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
> return !ret;
> }
>
> +int __weak freq_inv_set_max_ratio(int cpu, u64 max_rate)
> +{
> + return 0;
> +}
> +
> #ifdef CONFIG_ACPI_CPPC_LIB
> #include <acpi/cppc_acpi.h>
>
> @@ -369,7 +374,6 @@ void topology_init_cpu_capacity_cppc(void)
> capacity_scale = max_t(u64, capacity_scale, raw_capacity[cpu]);
>
> per_cpu(capacity_ref_freq, cpu) = cppc_perf_to_khz(&perf_caps, raw_capacity[cpu]);
> -
> pr_debug("cpu_capacity: CPU%d cpu_capacity=%u (raw).\n",
> cpu, raw_capacity[cpu]);
> continue;
> @@ -381,6 +385,9 @@ void topology_init_cpu_capacity_cppc(void)
> }
>
> for_each_possible_cpu(cpu) {
> + freq_inv_set_max_ratio(cpu,
> + per_cpu(capacity_ref_freq, cpu));
> +
> capacity = raw_capacity[cpu];
> capacity = div64_u64(capacity << SCHED_CAPACITY_SHIFT,
> capacity_scale);
> @@ -422,8 +429,11 @@ init_cpu_capacity_callback(struct notifier_block *nb,
>
> cpumask_andnot(cpus_to_visit, cpus_to_visit, policy->related_cpus);
>
> - for_each_cpu(cpu, policy->related_cpus)
> + for_each_cpu(cpu, policy->related_cpus) {
> per_cpu(capacity_ref_freq, cpu) = policy->cpuinfo.max_freq;
> + freq_inv_set_max_ratio(cpu,
> + per_cpu(capacity_ref_freq, cpu));

capacity_ref_freq is in KHz while arch_timer_get_rate() is in Hz. One or
the other needs to be scaled.

The same in topology_init_cpu_capacity_cppc().

Thanks,
Ionela.

> + }
>
> if (cpumask_empty(cpus_to_visit)) {
> topology_normalize_cpu_scale();
> diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
> index 38ca6c76af56..b6e95d763279 100644
> --- a/include/linux/arch_topology.h
> +++ b/include/linux/arch_topology.h
> @@ -99,6 +99,7 @@ void update_siblings_masks(unsigned int cpu);
> void remove_cpu_topology(unsigned int cpuid);
> void reset_cpu_topology(void);
> int parse_acpi_topology(void);
> +int freq_inv_set_max_ratio(int cpu, u64 max_rate);
> #endif
>
> #endif /* _LINUX_ARCH_TOPOLOGY_H_ */
> --
> 2.34.1
>

2023-10-24 09:57:19

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] cpufreq/cppc: set the frequency used for computing the capacity

On Fri, 20 Oct 2023 at 18:05, Pierre Gondois <[email protected]> wrote:
>
> Hello Vincent,
>
> On 10/18/23 19:26, Rafael J. Wysocki wrote:
> > On Wed, Oct 18, 2023 at 6:25 PM Vincent Guittot
> > <[email protected]> wrote:
> >>
> >> Save the frequency associated to the performance that has been used when
> >> initializing the capacity of CPUs.
> >> Also, cppc cpufreq driver can register an artificial energy model. In such
> >> case, it needs the frequency for this compute capacity.
> >> We moved and renamed cppc_perf_to_khz and cppc_perf_to_khz to use them
> >> outside cppc_cpufreq in topology_init_cpu_capacity_cppc().
> >>
> >> Signed-off-by: Vincent Guittot <[email protected]>
> >
> > For the changes in drivers/acpi/cppc_acpi.c :
> >
> > Acked-by: Rafael J. Wysocki <[email protected]>
> >
> >> ---
> >> drivers/acpi/cppc_acpi.c | 93 ++++++++++++++++++++++
> >> drivers/base/arch_topology.c | 15 +++-
> >> drivers/cpufreq/cppc_cpufreq.c | 141 ++++++---------------------------
> >> include/acpi/cppc_acpi.h | 2 +
> >> 4 files changed, 133 insertions(+), 118 deletions(-)
> >>
>
> [snip]
>
> >> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> >> index 9a073c2d2086..2372ce791bb4 100644
> >> --- a/drivers/base/arch_topology.c
> >> +++ b/drivers/base/arch_topology.c
> >> @@ -350,6 +350,7 @@ bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
> >> void topology_init_cpu_capacity_cppc(void)
> >> {
> >> struct cppc_perf_caps perf_caps;
> >> + u64 capacity, capacity_scale;
>
> I think capacity_scale should be initialized to 0 here,
> since it is used to find the max value of raw_capacity[cpu].

yes

>
> >> int cpu;
> >>
> >> if (likely(!acpi_cpc_valid()))
> >> @@ -365,6 +366,10 @@ void topology_init_cpu_capacity_cppc(void)
> >> (perf_caps.highest_perf >= perf_caps.nominal_perf) &&
> >> (perf_caps.highest_perf >= perf_caps.lowest_perf)) {
> >> raw_capacity[cpu] = perf_caps.highest_perf;
> >> + capacity_scale = max_t(u64, capacity_scale, raw_capacity[cpu]);
> >> +
> >> + per_cpu(capacity_ref_freq, cpu) = cppc_perf_to_khz(&perf_caps, raw_capacity[cpu]);
>
> I think capacity_ref_freq in in Hz, so the freq should be multiplied by 1000 .

yes, I forgot the *1000. I'm going to add * HZ_PER_KHZ

>
> With these two modifications, the patches worked well on a cppc-based platform.
>
> Sorry I forgot to detail what it was. It's a modified Juno with CPPC enabled. AMUs are not
> enabled, so the CPPC performance counters are not handled correctly and FIE cannot be enabled,
> but it is possible to change frequencies.
>
> The _CPC objects are setup as:
> little CPUs:
> - lowest_freq = 450 (MHz)
> - nominal_freq = 800 (MHz)
> - highest_perf = 383 * 1000
> - nominal_perf = 322 * 1000
> - lowest_perf = 181 * 1000
> - lowest_nonlinear_perf = 181 * 1000
>
> big CPUs:
> - lowest_freq = 600 (MHz)
> - nominal_freq = 1200 (MHz)
> - highest_perf = 1024 * 1000
> - nominal_perf = 833 * 1000
> - lowest_perf = 512 * 1000
> - lowest_nonlinear_perf = 512 * 1000
>
> As a remainder, available frequencies are:
> - little CPUs: 450, 800, 950 MHz
> - big CPUs: 600, 1000, 1200 Mhz
> So the platform is setup to have the last frequency as a boost frequency (for testing).
>
> ----
>
> Just to make a note of 2 potential side-issues for later (independent from these patches):
>
> - When testing with boosted/non-bossted frequencies, it didn't seem that cpu_overutilized()
> was taking the maximum frequency into consideration. This might mean that when lowering the
> maximum frequency of a policy, the maximum capacity of the CPUs of this policy is used
> to detect over-utilization.
> I would have thought that the over-utilization threshold would be lowered at the same time.

No it's not, It will be part of a next step patchset. This patchset
aims to consolidate and use the same reference so we can then easily
propagate changes if needed

>
> - Similarly for EAS, the energy computation doesn't take into account the maximum frequency
> of the policy. This should mean that EAS is taking into consideration frequencies that
> are not actually available.
>
>
> Regards,
> Pierre

2023-10-24 09:58:35

by Vincent Guittot

[permalink] [raw]
Subject: Re: [RFC v3 6/6] arm64/amu: use capacity_ref_freq to set AMU ratio

On Mon, 23 Oct 2023 at 22:58, Ionela Voinescu <[email protected]> wrote:
>
> Hi,
>
> On Wednesday 18 Oct 2023 at 18:25:40 (+0200), Vincent Guittot wrote:
> > Use the new capacity_ref_freq to set the ratio that is used by AMU for
> > computing the arch_scale_freq_capacity().
> > This helps to keep everything aligned using the same reference for
> > computing CPUs capacity.
> >
> > The default value of the ratio ensures that arch_scale_freq_capacity()
> > returns max capacity until it is set to its correct value with the
> > cpu capacity and capacity_ref_freq.
> >
> > Signed-off-by: Vincent Guittot <[email protected]>
> > ---
> > arch/arm64/kernel/topology.c | 18 ++++++++++--------
> > drivers/base/arch_topology.c | 14 ++++++++++++--
> > include/linux/arch_topology.h | 1 +
> > 3 files changed, 23 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index 817d788cd866..0f8f6e90c46d 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -82,7 +82,12 @@ int __init parse_acpi_topology(void)
> > #undef pr_fmt
> > #define pr_fmt(fmt) "AMU: " fmt
> >
> > -static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
> > +/*
> > + * Ensure that amu_scale_freq_tick() will return SCHED_CAPACITY_SHIFT until
> > + * the CPU capacity and its assosciated frequency have been correctly
> > + * initialized.
>
> s/SCHED_CAPACITY_SHIFT/SCHED_CAPACITY_SCALE
>
> > + */
> > +static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = (2 * SCHED_CAPACITY_SHIFT);
>
> It should be 1UL << (2 * SCHED_CAPACITY_SHIFT).

yes

>
> > static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> > static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> > static cpumask_var_t amu_fie_cpus;
> > @@ -112,9 +117,9 @@ static inline bool freq_counters_valid(int cpu)
> > return true;
> > }
> >
> > -static int freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate)
> > +int freq_inv_set_max_ratio(int cpu, u64 max_rate)
> > {
> > - u64 ratio;
> > + u64 ratio, ref_rate = arch_timer_get_rate();
> >
> > if (unlikely(!max_rate || !ref_rate)) {
> > pr_debug("CPU%d: invalid maximum or reference frequency.\n",
> > @@ -142,7 +147,7 @@ static int freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate)
> > return -EINVAL;
> > }
>
> The return value is no longer used so it might be worth declaring it to
> return void and WARN_ONCE for unlikely(!max_rate || !ref_rate).
>
> >
> > - per_cpu(arch_max_freq_scale, cpu) = (unsigned long)ratio;
> > + WRITE_ONCE(per_cpu(arch_max_freq_scale, cpu), (unsigned long)ratio);
> >
> > return 0;
> > }
> > @@ -195,10 +200,7 @@ static void amu_fie_setup(const struct cpumask *cpus)
> > return;
> >
> > for_each_cpu(cpu, cpus) {
> > - if (!freq_counters_valid(cpu) ||
> > - freq_inv_set_max_ratio(cpu,
> > - cpufreq_get_hw_max_freq(cpu) * 1000ULL,
> > - arch_timer_get_rate()))
> > + if (!freq_counters_valid(cpu))
> > return;
> > }
> >
> > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > index 2372ce791bb4..3a604b77b12d 100644
> > --- a/drivers/base/arch_topology.c
> > +++ b/drivers/base/arch_topology.c
> > @@ -344,6 +344,11 @@ bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
> > return !ret;
> > }
> >
> > +int __weak freq_inv_set_max_ratio(int cpu, u64 max_rate)
> > +{
> > + return 0;
> > +}
> > +
> > #ifdef CONFIG_ACPI_CPPC_LIB
> > #include <acpi/cppc_acpi.h>
> >
> > @@ -369,7 +374,6 @@ void topology_init_cpu_capacity_cppc(void)
> > capacity_scale = max_t(u64, capacity_scale, raw_capacity[cpu]);
> >
> > per_cpu(capacity_ref_freq, cpu) = cppc_perf_to_khz(&perf_caps, raw_capacity[cpu]);
> > -
> > pr_debug("cpu_capacity: CPU%d cpu_capacity=%u (raw).\n",
> > cpu, raw_capacity[cpu]);
> > continue;
> > @@ -381,6 +385,9 @@ void topology_init_cpu_capacity_cppc(void)
> > }
> >
> > for_each_possible_cpu(cpu) {
> > + freq_inv_set_max_ratio(cpu,
> > + per_cpu(capacity_ref_freq, cpu));
> > +
> > capacity = raw_capacity[cpu];
> > capacity = div64_u64(capacity << SCHED_CAPACITY_SHIFT,
> > capacity_scale);
> > @@ -422,8 +429,11 @@ init_cpu_capacity_callback(struct notifier_block *nb,
> >
> > cpumask_andnot(cpus_to_visit, cpus_to_visit, policy->related_cpus);
> >
> > - for_each_cpu(cpu, policy->related_cpus)
> > + for_each_cpu(cpu, policy->related_cpus) {
> > per_cpu(capacity_ref_freq, cpu) = policy->cpuinfo.max_freq;
> > + freq_inv_set_max_ratio(cpu,
> > + per_cpu(capacity_ref_freq, cpu));
>
> capacity_ref_freq is in KHz while arch_timer_get_rate() is in Hz. One or
> the other needs to be scaled.
>
> The same in topology_init_cpu_capacity_cppc().
>
> Thanks,
> Ionela.
>
> > + }
> >
> > if (cpumask_empty(cpus_to_visit)) {
> > topology_normalize_cpu_scale();
> > diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
> > index 38ca6c76af56..b6e95d763279 100644
> > --- a/include/linux/arch_topology.h
> > +++ b/include/linux/arch_topology.h
> > @@ -99,6 +99,7 @@ void update_siblings_masks(unsigned int cpu);
> > void remove_cpu_topology(unsigned int cpuid);
> > void reset_cpu_topology(void);
> > int parse_acpi_topology(void);
> > +int freq_inv_set_max_ratio(int cpu, u64 max_rate);
> > #endif
> >
> > #endif /* _LINUX_ARCH_TOPOLOGY_H_ */
> > --
> > 2.34.1
> >

2023-10-25 11:54:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 3/6] cpufreq/schedutil: use a fixed reference frequency

On Wed, Oct 18, 2023 at 06:25:37PM +0200, Vincent Guittot wrote:

> +static __always_inline
> +unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> +{
> + unsigned int freq = arch_scale_freq_ref(policy->cpu);
> +
> + if (freq)
> + return freq;
> +
> + if (arch_scale_freq_invariant())
> + return policy->cpuinfo.max_freq;
> +
> + return policy->cur;
> +}

Hmm, what should x86 do here? I know it mostly doesn't use these things,
but would it make sense to stick the base frequency in ?

2023-10-25 11:55:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] energy_model: use a fixed reference frequency

On Wed, Oct 18, 2023 at 06:25:38PM +0200, Vincent Guittot wrote:
> The last item of a performance domain is not always the performance point
> that has been used to compute CPU's capacity. This can lead to different
> target frequency compared with other part of the system like schedutil and
> would result in wrong energy estimation.
>
> A new arch_scale_freq_ref() is available to return a fixed and coherent
> frequency reference that can be used when computing the CPU's frequency
> for an level of utilization. Use this function to get this reference
> frequency.
>
> Energy model is never used without defining arch_scale_freq_ref() but
> can be compiled. Define a default arch_scale_freq_ref() returning 0
> in such case.
>
> Signed-off-by: Vincent Guittot <[email protected]>
> Reviewed-by: Lukasz Luba <[email protected]>
> Tested-by: Lukasz Luba <[email protected]>
>
> ---
> include/linux/energy_model.h | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index b9caa01dfac4..1b0c8490d4bd 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -204,6 +204,14 @@ struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
> return ps;
> }
>
> +#ifndef arch_scale_freq_ref
> +static __always_inline
> +unsigned int arch_scale_freq_ref(int cpu)
> +{
> + return 0;
> +}
> +#endif

Hmm, did I not see the exact same thing in cpufreq.h two patches ago?

2023-10-25 12:12:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] energy_model: use a fixed reference frequency

On Wed, Oct 25, 2023 at 01:54:56PM +0200, Peter Zijlstra wrote:

Also, can we pretty please not cross-post to moderated lists, that's
rude.

Rafael, can you please mark this acpica-devel crap as moderated in
MAINTAINERS so that it can be auto-magically excluded from receiving
mail?

2023-10-25 12:25:12

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] energy_model: use a fixed reference frequency

On Wed, Oct 25, 2023 at 2:11 PM Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Oct 25, 2023 at 01:54:56PM +0200, Peter Zijlstra wrote:
>
> Also, can we pretty please not cross-post to moderated lists, that's
> rude.
>
> Rafael, can you please mark this acpica-devel crap as moderated in
> MAINTAINERS so that it can be auto-magically excluded from receiving
> mail?

I'm actually not sure why it is moderated. I'll see if that can be changed.

2023-10-25 12:25:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] cpufreq/cppc: set the frequency used for computing the capacity

On Wed, Oct 18, 2023 at 06:25:39PM +0200, Vincent Guittot wrote:
> Save the frequency associated to the performance that has been used when
> initializing the capacity of CPUs.
> Also, cppc cpufreq driver can register an artificial energy model. In such
> case, it needs the frequency for this compute capacity.
> We moved and renamed cppc_perf_to_khz and cppc_perf_to_khz to use them

perf_to_khz and khz_to_perf presumably :-)

> outside cppc_cpufreq in topology_init_cpu_capacity_cppc().

2023-10-25 12:52:27

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3 3/6] cpufreq/schedutil: use a fixed reference frequency

On Wed, 25 Oct 2023 at 13:53, Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Oct 18, 2023 at 06:25:37PM +0200, Vincent Guittot wrote:
>
> > +static __always_inline
> > +unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> > +{
> > + unsigned int freq = arch_scale_freq_ref(policy->cpu);
> > +
> > + if (freq)
> > + return freq;
> > +
> > + if (arch_scale_freq_invariant())
> > + return policy->cpuinfo.max_freq;
> > +
> > + return policy->cur;
> > +}
>
> Hmm, what should x86 do here? I know it mostly doesn't use these things,
> but would it make sense to stick the base frequency in ?

get_capacity_ref_freq() should return the frequency that is used as
the reference for the max compute capacity.

On Arm64, we have seen some inconsistency especially because we use
the energy model, we compute the CPU's capacity at boot and we can
have different compute capacity in our system whereas x86 always uses
SCHED_CAPACITY_SCALE even on hybrid system if I'm not wrong

So I was not sure that this will make any difference for x86 platform

>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-10-25 12:53:08

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] cpufreq/cppc: set the frequency used for computing the capacity

On Wed, Oct 18, 2023 at 06:25:39PM +0200, Vincent Guittot wrote:
> Save the frequency associated to the performance that has been used when
> initializing the capacity of CPUs.
> Also, cppc cpufreq driver can register an artificial energy model. In such
> case, it needs the frequency for this compute capacity.
> We moved and renamed cppc_perf_to_khz and cppc_perf_to_khz to use them
> outside cppc_cpufreq in topology_init_cpu_capacity_cppc().
>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
> drivers/acpi/cppc_acpi.c | 93 ++++++++++++++++++++++
> drivers/base/arch_topology.c | 15 +++-
> drivers/cpufreq/cppc_cpufreq.c | 141 ++++++---------------------------
> include/acpi/cppc_acpi.h | 2 +
> 4 files changed, 133 insertions(+), 118 deletions(-)

Perhaps split this patch into code movement and actual change for ease
of review? As in, I'm having trouble finding the actual changes ;-)

2023-10-25 12:54:57

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] energy_model: use a fixed reference frequency

On Wed, 25 Oct 2023 at 13:55, Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Oct 18, 2023 at 06:25:38PM +0200, Vincent Guittot wrote:
> > The last item of a performance domain is not always the performance point
> > that has been used to compute CPU's capacity. This can lead to different
> > target frequency compared with other part of the system like schedutil and
> > would result in wrong energy estimation.
> >
> > A new arch_scale_freq_ref() is available to return a fixed and coherent
> > frequency reference that can be used when computing the CPU's frequency
> > for an level of utilization. Use this function to get this reference
> > frequency.
> >
> > Energy model is never used without defining arch_scale_freq_ref() but
> > can be compiled. Define a default arch_scale_freq_ref() returning 0
> > in such case.
> >
> > Signed-off-by: Vincent Guittot <[email protected]>
> > Reviewed-by: Lukasz Luba <[email protected]>
> > Tested-by: Lukasz Luba <[email protected]>
> >
> > ---
> > include/linux/energy_model.h | 14 +++++++++++---
> > 1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> > index b9caa01dfac4..1b0c8490d4bd 100644
> > --- a/include/linux/energy_model.h
> > +++ b/include/linux/energy_model.h
> > @@ -204,6 +204,14 @@ struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
> > return ps;
> > }
> >
> > +#ifndef arch_scale_freq_ref
> > +static __always_inline
> > +unsigned int arch_scale_freq_ref(int cpu)
> > +{
> > + return 0;
> > +}
> > +#endif
>
> Hmm, did I not see the exact same thing in cpufreq.h two patches ago?

Probably, this has been added because of error returned by some
allyes/randconfig on x86

>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-10-25 12:58:50

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3 5/6] cpufreq/cppc: set the frequency used for computing the capacity

On Wed, 25 Oct 2023 at 14:52, Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Oct 18, 2023 at 06:25:39PM +0200, Vincent Guittot wrote:
> > Save the frequency associated to the performance that has been used when
> > initializing the capacity of CPUs.
> > Also, cppc cpufreq driver can register an artificial energy model. In such
> > case, it needs the frequency for this compute capacity.
> > We moved and renamed cppc_perf_to_khz and cppc_perf_to_khz to use them
> > outside cppc_cpufreq in topology_init_cpu_capacity_cppc().
> >
> > Signed-off-by: Vincent Guittot <[email protected]>
> > ---
> > drivers/acpi/cppc_acpi.c | 93 ++++++++++++++++++++++
> > drivers/base/arch_topology.c | 15 +++-
> > drivers/cpufreq/cppc_cpufreq.c | 141 ++++++---------------------------
> > include/acpi/cppc_acpi.h | 2 +
> > 4 files changed, 133 insertions(+), 118 deletions(-)
>
> Perhaps split this patch into code movement and actual change for ease
> of review? As in, I'm having trouble finding the actual changes ;-)

yes, I can split it.

The actual change is located in drivers/base/arch_topology.c

>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-10-25 20:13:35

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH v3 3/6] cpufreq/schedutil: use a fixed reference frequency

On 18/10/2023 18:25, Vincent Guittot wrote:
> cpuinfo.max_freq can change at runtime because of boost as an example. This
> implies that the value could be different than the one that has been
> used when computing the capacity of a CPU.
>
> The new arch_scale_freq_ref() returns a fixed and coherent reference
> frequency that can be used when computing a frequency based on utilization.
>
> Use this arch_scale_freq_ref() when available and fallback to
> policy otherwise.
>
> Signed-off-by: Vincent Guittot <[email protected]>
> Reviewed-by: Lukasz Luba <[email protected]>
> Tested-by: Lukasz Luba <[email protected]>
>
> ---
> kernel/sched/cpufreq_schedutil.c | 26 ++++++++++++++++++++++++--
> 1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 458d359f5991..6e4030482ae8 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -114,6 +114,28 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy)
> }
> }
>
> +/**
> + * cpufreq_get_capacity_ref_freq - get the reference frequency of a given CPU that

s/cpufreq_get_capacity_ref_freq/get_capacity_ref_freq

s/of a given CPU/for a given cpufreq policy ? (of which the CPU managing
it is used for the arch_scale_freq_ref() call in the function.

> + * has been used to correlate frequency and compute capacity.
> + * @policy: the cpufreq policy of the CPU in question.
> + * @use_current: Fallback to current freq instead of policy->cpuinfo.max_freq.

Looks like use_current does not exists as a parameter.

> + *
> + * Return: the reference CPU frequency to compute a capacity.
> + */
> +static __always_inline
> +unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> +{
> + unsigned int freq = arch_scale_freq_ref(policy->cpu);
> +
> + if (freq)
> + return freq;
> +
> + if (arch_scale_freq_invariant())
> + return policy->cpuinfo.max_freq;
> +
> + return policy->cur;
> +}

Reviewed-by: Dietmar Eggemann <[email protected]>



2023-10-26 11:20:10

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [RFC v3 6/6] arm64/amu: use capacity_ref_freq to set AMU ratio

On 24/10/2023 11:58, Vincent Guittot wrote:
> On Mon, 23 Oct 2023 at 22:58, Ionela Voinescu <[email protected]> wrote:
>>
>> Hi,
>>
>> On Wednesday 18 Oct 2023 at 18:25:40 (+0200), Vincent Guittot wrote:
>>> Use the new capacity_ref_freq to set the ratio that is used by AMU for
>>> computing the arch_scale_freq_capacity().
>>> This helps to keep everything aligned using the same reference for
>>> computing CPUs capacity.
>>>
>>> The default value of the ratio ensures that arch_scale_freq_capacity()
>>> returns max capacity until it is set to its correct value with the
>>> cpu capacity and capacity_ref_freq.

Nitpick: Could you mention that arch_max_freq_scale is the default value
for this ratio? Took me a while to recreate the (not so simple) story
for this change, i.e. make the connection between ratio and
arch_max_freq_scale.

init_cpu_capacity_callback()

freq_inv_set_max_ratio()

u64 ratio
...
per_cpu(arch_max_freq_scale, cpu) = (unsigned long)ratio
^^^^^^^^^^^^^^^^^^^


static struct scale_freq_data amu_sfd = {
.set_freq_scale = amu_scale_freq_tick,
}

#define arch_scale_freq_tick topology_scale_freq_tick

topology_scale_freq_tick()

sfd->set_freq_scale()


amu_scale_freq_tick()

...
scale *= this_cpu_read(arch_max_freq_scale)
^^^^^^^^^^^^^^^^^^^
...
this_cpu_write(arch_freq_scale, (unsigned long)scale);


#define arch_scale_freq_capacity topology_get_freq_scale

topology_get_freq_scale(cpu)

return per_cpu(arch_freq_scale, cpu)

[...]

2023-10-26 14:33:24

by Vincent Guittot

[permalink] [raw]
Subject: Re: [RFC v3 6/6] arm64/amu: use capacity_ref_freq to set AMU ratio

On Thu, 26 Oct 2023 at 13:19, Dietmar Eggemann <[email protected]> wrote:
>
> On 24/10/2023 11:58, Vincent Guittot wrote:
> > On Mon, 23 Oct 2023 at 22:58, Ionela Voinescu <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> On Wednesday 18 Oct 2023 at 18:25:40 (+0200), Vincent Guittot wrote:
> >>> Use the new capacity_ref_freq to set the ratio that is used by AMU for
> >>> computing the arch_scale_freq_capacity().
> >>> This helps to keep everything aligned using the same reference for
> >>> computing CPUs capacity.
> >>>
> >>> The default value of the ratio ensures that arch_scale_freq_capacity()
> >>> returns max capacity until it is set to its correct value with the
> >>> cpu capacity and capacity_ref_freq.
>
> Nitpick: Could you mention that arch_max_freq_scale is the default value
> for this ratio? Took me a while to recreate the (not so simple) story
> for this change, i.e. make the connection between ratio and
> arch_max_freq_scale.

something like :
"
The default value of the ratio (saved in per_cpu(arch_max_freq_scale)
ensures that arch_scale_freq_capacity() returns max capacity until it
is set to its correct value with the cpu capacity and
capacity_ref_freq.
"

Or I can rename the variable per_cpu(arch_max_freq_ratio)

>
> init_cpu_capacity_callback()
>
> freq_inv_set_max_ratio()
>
> u64 ratio
> ...
> per_cpu(arch_max_freq_scale, cpu) = (unsigned long)ratio
> ^^^^^^^^^^^^^^^^^^^
>
>
> static struct scale_freq_data amu_sfd = {
> .set_freq_scale = amu_scale_freq_tick,
> }
>
> #define arch_scale_freq_tick topology_scale_freq_tick
>
> topology_scale_freq_tick()
>
> sfd->set_freq_scale()
>
>
> amu_scale_freq_tick()
>
> ...
> scale *= this_cpu_read(arch_max_freq_scale)
> ^^^^^^^^^^^^^^^^^^^
> ...
> this_cpu_write(arch_freq_scale, (unsigned long)scale);
>
>
> #define arch_scale_freq_capacity topology_get_freq_scale
>
> topology_get_freq_scale(cpu)
>
> return per_cpu(arch_freq_scale, cpu)
>
> [...]
>

2023-10-26 15:14:29

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3 3/6] cpufreq/schedutil: use a fixed reference frequency

On Wed, 25 Oct 2023 at 22:13, Dietmar Eggemann <[email protected]> wrote:
>
> On 18/10/2023 18:25, Vincent Guittot wrote:
> > cpuinfo.max_freq can change at runtime because of boost as an example. This
> > implies that the value could be different than the one that has been
> > used when computing the capacity of a CPU.
> >
> > The new arch_scale_freq_ref() returns a fixed and coherent reference
> > frequency that can be used when computing a frequency based on utilization.
> >
> > Use this arch_scale_freq_ref() when available and fallback to
> > policy otherwise.
> >
> > Signed-off-by: Vincent Guittot <[email protected]>
> > Reviewed-by: Lukasz Luba <[email protected]>
> > Tested-by: Lukasz Luba <[email protected]>
> >
> > ---
> > kernel/sched/cpufreq_schedutil.c | 26 ++++++++++++++++++++++++--
> > 1 file changed, 24 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 458d359f5991..6e4030482ae8 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -114,6 +114,28 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy)
> > }
> > }
> >
> > +/**
> > + * cpufreq_get_capacity_ref_freq - get the reference frequency of a given CPU that
>
> s/cpufreq_get_capacity_ref_freq/get_capacity_ref_freq
>
> s/of a given CPU/for a given cpufreq policy ? (of which the CPU managing
> it is used for the arch_scale_freq_ref() call in the function.
>
> > + * has been used to correlate frequency and compute capacity.
> > + * @policy: the cpufreq policy of the CPU in question.
> > + * @use_current: Fallback to current freq instead of policy->cpuinfo.max_freq.
>
> Looks like use_current does not exists as a parameter.

Thanks for the review. I'm going to apply your comments

>
> > + *
> > + * Return: the reference CPU frequency to compute a capacity.
> > + */
> > +static __always_inline
> > +unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> > +{
> > + unsigned int freq = arch_scale_freq_ref(policy->cpu);
> > +
> > + if (freq)
> > + return freq;
> > +
> > + if (arch_scale_freq_invariant())
> > + return policy->cpuinfo.max_freq;
> > +
> > + return policy->cur;
> > +}
>
> Reviewed-by: Dietmar Eggemann <[email protected]>
>
>
>