LinuxLists.cc - [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

2021-06-21 09:21:29

by Viresh Kumar

[permalink] [raw]

Subject: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

Hello,

Changes since V2:

- We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it
work using policy ->init() and exit() alone.

- Two new cleanup patches 1/4 and 2/4.

- Improved commit log of 3/4.

- Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's
overlap (seen with Vincent's setup).

- Handle stuff from init/exit() callbacks only.

Changes since V1:

- Few of the patches migrating users to ->exit() callback are posted separately.

- The CPPC patch was completely reverted and so the support for FIE is again
added here from scratch.

- The start_cpu() and stop_cpu() interface is reworked a little so stop_cpu() is
only ever called for a CPU if start_cpu() was called for it earlier.

- A new patch to implement RCU locking in arch_topology core to avoid some
races.

- Some cleanup and very clear/separate paths for FIE in cppc driver now.

-------------------------8<-------------------------

CPPC cpufreq driver is used for ARM servers and this patch series tries to
provide counter-based frequency invariance support for them in the absence for
architecture specific counters (like AMUs).

This was reverted earlier for the 5.13 kernel after Qian Cai reported kernel
oops during suspend/resume.

This is based of v5.13-rc7 + a cleanup patchset:

https://lore.kernel.org/linux-pm/[email protected]/

All the patches are pushed here together for people to run.

https://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git/log/?h=cpufreq/cppc

This is tested on my Hikey platform (without the actual read/write to
performance counters), with this script for over an hour:

while true; do
for i in `seq 1 7`;
do
echo 0 > /sys/devices/system/cpu/cpu$i/online;
done;

for i in `seq 1 7`;
do
echo 1 > /sys/devices/system/cpu/cpu$i/online;
done;
done

The same is done by Vincent on ThunderX2 and no issues were seen.

I would like to get this merged for 5.14, since it was recently reverted from
5.13. And that it is still an independent change to a single driver and topology
APIs that no one is using apart from arm64 topology stuff.

Thanks.

--
Viresh

Viresh Kumar (4):
cpufreq: cppc: Fix potential memleak in cppc_cpufreq_cpu_init
cpufreq: cppc: Pass structure instance by reference
arch_topology: Avoid use-after-free for scale_freq_data
cpufreq: CPPC: Add support for frequency invariance

drivers/base/arch_topology.c | 27 +++-
drivers/cpufreq/Kconfig.arm | 10 ++
drivers/cpufreq/cppc_cpufreq.c | 287 +++++++++++++++++++++++++++++----
include/linux/arch_topology.h | 1 +
kernel/sched/core.c | 1 +
5 files changed, 292 insertions(+), 34 deletions(-)

--
2.31.1.272.g89b43f80a514

2021-06-21 09:21:41

by Viresh Kumar

[permalink] [raw]

Subject: [PATCH V3 1/4] cpufreq: cppc: Fix potential memleak in cppc_cpufreq_cpu_init

It's a classic example of memleak, we allocate something, we fail and
never free the resources.

Make sure we free all resources on policy ->init() failures.

Fixes: a28b2bfc099c ("cppc_cpufreq: replace per-cpu data array with a list")
Tested-by: Vincent Guittot <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cppc_cpufreq.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index be4f62e2c5f1..35b8ae66d1fb 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -256,6 +256,16 @@ static struct cppc_cpudata *cppc_cpufreq_get_cpu_data(unsigned int cpu)
return NULL;
}

+static void cppc_cpufreq_put_cpu_data(struct cpufreq_policy *policy)
+{
+ struct cppc_cpudata *cpu_data = policy->driver_data;
+
+ list_del(&cpu_data->node);
+ free_cpumask_var(cpu_data->shared_cpu_map);
+ kfree(cpu_data);
+ policy->driver_data = NULL;
+}
+
static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
{
unsigned int cpu = policy->cpu;
@@ -309,7 +319,8 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
default:
pr_debug("Unsupported CPU co-ord type: %d\n",
policy->shared_type);
- return -EFAULT;
+ ret = -EFAULT;
+ goto out;
}

/*
@@ -324,10 +335,14 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
cpu_data->perf_ctrls.desired_perf = caps->highest_perf;

ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
- if (ret)
- pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
- caps->highest_perf, cpu, ret);
+ if (!ret)
+ return 0;

+ pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
+ caps->highest_perf, cpu, ret);
+
+out:
+ cppc_cpufreq_put_cpu_data(policy);
return ret;
}

@@ -345,12 +360,7 @@ static int cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
caps->lowest_perf, cpu, ret);

- /* Remove CPU node from list and free driver data for policy */
- free_cpumask_var(cpu_data->shared_cpu_map);
- list_del(&cpu_data->node);
- kfree(policy->driver_data);
- policy->driver_data = NULL;
-
+ cppc_cpufreq_put_cpu_data(policy);
return 0;
}

--
2.31.1.272.g89b43f80a514

2021-06-21 09:21:45

by Viresh Kumar

[permalink] [raw]

Subject: [PATCH V3 2/4] cpufreq: cppc: Pass structure instance by reference

Don't pass structure instance by value, pass it by reference instead.

Tested-by: Vincent Guittot <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cppc_cpufreq.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 35b8ae66d1fb..490175d65082 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -373,18 +373,18 @@ static inline u64 get_delta(u64 t1, u64 t0)
}

static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
- struct cppc_perf_fb_ctrs fb_ctrs_t0,
- struct cppc_perf_fb_ctrs fb_ctrs_t1)
+ struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+ struct cppc_perf_fb_ctrs *fb_ctrs_t1)
{
u64 delta_reference, delta_delivered;
u64 reference_perf, delivered_perf;

- reference_perf = fb_ctrs_t0.reference_perf;
+ reference_perf = fb_ctrs_t0->reference_perf;

- delta_reference = get_delta(fb_ctrs_t1.reference,
- fb_ctrs_t0.reference);
- delta_delivered = get_delta(fb_ctrs_t1.delivered,
- fb_ctrs_t0.delivered);
+ delta_reference = get_delta(fb_ctrs_t1->reference,
+ fb_ctrs_t0->reference);
+ delta_delivered = get_delta(fb_ctrs_t1->delivered,
+ fb_ctrs_t0->delivered);

/* Check to avoid divide-by zero */
if (delta_reference || delta_delivered)
@@ -415,7 +415,7 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
if (ret)
return ret;

- return cppc_get_rate_from_fbctrs(cpu_data, fb_ctrs_t0, fb_ctrs_t1);
+ return cppc_get_rate_from_fbctrs(cpu_data, &fb_ctrs_t0, &fb_ctrs_t1);
}

static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
--
2.31.1.272.g89b43f80a514

2021-06-21 09:22:27

by Viresh Kumar

[permalink] [raw]

Subject: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance

The Frequency Invariance Engine (FIE) is providing a frequency scaling
correction factor that helps achieve more accurate load-tracking.

Normally, this scaling factor can be obtained directly with the help of
the cpufreq drivers as they know the exact frequency the hardware is
running at. But that isn't the case for CPPC cpufreq driver.

Another way of obtaining that is using the arch specific counter
support, which is already present in kernel, but that hardware is
optional for platforms.

This patch updates the CPPC driver to register itself with the topology
core to provide its own implementation (cppc_scale_freq_tick()) of
topology_scale_freq_tick() which gets called by the scheduler on every
tick. Note that the arch specific counters have higher priority than
CPPC counters, if available, though the CPPC driver doesn't need to have
any special handling for that.

On an invocation of cppc_scale_freq_tick(), we schedule an irq work
(since we reach here from hard-irq context), which then schedules a
normal work item and cppc_scale_freq_workfn() updates the per_cpu
arch_freq_scale variable based on the counter updates since the last
tick.

To allow platforms to disable this CPPC counter-based frequency
invariance support, this is all done under CONFIG_ACPI_CPPC_CPUFREQ_FIE,
which is enabled by default.

This also exports sched_setattr_nocheck() as the CPPC driver can be
built as a module.

Cc: [email protected]
Tested-by: Vincent Guittot <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/Kconfig.arm | 10 ++
drivers/cpufreq/cppc_cpufreq.c | 249 +++++++++++++++++++++++++++++++--
include/linux/arch_topology.h | 1 +
kernel/sched/core.c | 1 +
4 files changed, 247 insertions(+), 14 deletions(-)

diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index e65e0a43be64..a5c5f70acfc9 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -19,6 +19,16 @@ config ACPI_CPPC_CPUFREQ

If in doubt, say N.

+config ACPI_CPPC_CPUFREQ_FIE
+ bool "Frequency Invariance support for CPPC cpufreq driver"
+ depends on ACPI_CPPC_CPUFREQ && GENERIC_ARCH_TOPOLOGY
+ default y
+ help
+ This extends frequency invariance support in the CPPC cpufreq driver,
+ by using CPPC delivered and reference performance counters.
+
+ If in doubt, say N.
+
config ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM
tristate "Allwinner nvmem based SUN50I CPUFreq driver"
depends on ARCH_SUNXI
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 490175d65082..db550fc92931 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -10,14 +10,18 @@

#define pr_fmt(fmt) "CPPC Cpufreq:" fmt

+#include <linux/arch_topology.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/delay.h>
#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/dmi.h>
+#include <linux/irq_work.h>
+#include <linux/kthread.h>
#include <linux/time.h>
#include <linux/vmalloc.h>
+#include <uapi/linux/sched/types.h>

#include <asm/unaligned.h>

@@ -57,6 +61,210 @@ static struct cppc_workaround_oem_info wa_info[] = {
}
};

+#ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
+
+/* Frequency invariance support */
+struct cppc_freq_invariance {
+ int cpu;
+ struct irq_work irq_work;
+ struct kthread_work work;
+ struct cppc_perf_fb_ctrs prev_perf_fb_ctrs;
+ struct cppc_cpudata *cpu_data;
+};
+
+static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
+static struct kthread_worker *kworker_fie;
+
+static struct cpufreq_driver cppc_cpufreq_driver;
+static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpu);
+static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
+ struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+ struct cppc_perf_fb_ctrs *fb_ctrs_t1);
+
+/**
+ * cppc_scale_freq_workfn - CPPC arch_freq_scale updater for frequency invariance
+ * @work: The work item.
+ *
+ * The CPPC driver register itself with the topology core to provide its own
+ * implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which
+ * gets called by the scheduler on every tick.
+ *
+ * Note that the arch specific counters have higher priority than CPPC counters,
+ * if available, though the CPPC driver doesn't need to have any special
+ * handling for that.
+ *
+ * On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we
+ * reach here from hard-irq context), which then schedules a normal work item
+ * and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable
+ * based on the counter updates since the last tick.
+ */
+static void cppc_scale_freq_workfn(struct kthread_work *work)
+{
+ struct cppc_freq_invariance *cppc_fi;
+ struct cppc_perf_fb_ctrs fb_ctrs = {0};
+ struct cppc_cpudata *cpu_data;
+ unsigned long local_freq_scale;
+ u64 perf;
+
+ cppc_fi = container_of(work, struct cppc_freq_invariance, work);
+ cpu_data = cppc_fi->cpu_data;
+
+ if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
+ pr_warn("%s: failed to read perf counters\n", __func__);
+ return;
+ }
+
+ perf = cppc_perf_from_fbctrs(cpu_data, &cppc_fi->prev_perf_fb_ctrs,
+ &fb_ctrs);
+ cppc_fi->prev_perf_fb_ctrs = fb_ctrs;
+
+ perf <<= SCHED_CAPACITY_SHIFT;
+ local_freq_scale = div64_u64(perf, cpu_data->perf_caps.highest_perf);
+
+ /* This can happen due to counter's overflow */
+ if (unlikely(local_freq_scale > 1024))
+ local_freq_scale = 1024;
+
+ per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale;
+}
+
+static void cppc_irq_work(struct irq_work *irq_work)
+{
+ struct cppc_freq_invariance *cppc_fi;
+
+ cppc_fi = container_of(irq_work, struct cppc_freq_invariance, irq_work);
+ kthread_queue_work(kworker_fie, &cppc_fi->work);
+}
+
+static void cppc_scale_freq_tick(void)
+{
+ struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id());
+
+ /*
+ * cppc_get_perf_ctrs() can potentially sleep, call that from the right
+ * context.
+ */
+ irq_work_queue(&cppc_fi->irq_work);
+}
+
+static struct scale_freq_data cppc_sftd = {
+ .source = SCALE_FREQ_SOURCE_CPPC,
+ .set_freq_scale = cppc_scale_freq_tick,
+};
+
+static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
+{
+ struct cppc_freq_invariance *cppc_fi;
+ int cpu, ret;
+
+ if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+ return;
+
+ for_each_cpu(cpu, policy->cpus) {
+ cppc_fi = &per_cpu(cppc_freq_inv, cpu);
+ cppc_fi->cpu = cpu;
+ cppc_fi->cpu_data = policy->driver_data;
+ kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
+ init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
+
+ ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
+ if (ret) {
+ pr_warn("%s: failed to read perf counters for cpu:%d: %d\n",
+ __func__, cpu, ret);
+ return;
+ }
+ }
+
+ /* Register for freq-invariance */
+ topology_set_scale_freq_source(&cppc_sftd, policy->cpus);
+}
+
+/*
+ * We free all the resources on policy's removal and not on CPU removal as the
+ * irq-work are per-cpu and the hotplug core takes care of flushing the pending
+ * irq-works (hint: smpcfd_dying_cpu()) on CPU hotplug. Even if the kthread-work
+ * fires on another CPU after the concerned CPU is removed, it won't harm.
+ *
+ * We just need to make sure to remove them all on policy->exit().
+ */
+static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
+{
+ struct cppc_freq_invariance *cppc_fi;
+ int cpu;
+
+ if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+ return;
+
+ /* policy->cpus will be empty here, use related_cpus instead */
+ topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
+
+ for_each_cpu(cpu, policy->related_cpus) {
+ cppc_fi = &per_cpu(cppc_freq_inv, cpu);
+ irq_work_sync(&cppc_fi->irq_work);
+ kthread_cancel_work_sync(&cppc_fi->work);
+ }
+}
+
+static void __init cppc_freq_invariance_init(void)
+{
+ struct sched_attr attr = {
+ .size = sizeof(struct sched_attr),
+ .sched_policy = SCHED_DEADLINE,
+ .sched_nice = 0,
+ .sched_priority = 0,
+ /*
+ * Fake (unused) bandwidth; workaround to "fix"
+ * priority inheritance.
+ */
+ .sched_runtime = 1000000,
+ .sched_deadline = 10000000,
+ .sched_period = 10000000,
+ };
+ int ret;
+
+ if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+ return;
+
+ kworker_fie = kthread_create_worker(0, "cppc_fie");
+ if (IS_ERR(kworker_fie))
+ return;
+
+ ret = sched_setattr_nocheck(kworker_fie->task, &attr);
+ if (ret) {
+ pr_warn("%s: failed to set SCHED_DEADLINE: %d\n", __func__,
+ ret);
+ kthread_destroy_worker(kworker_fie);
+ return;
+ }
+}
+
+static void cppc_freq_invariance_exit(void)
+{
+ if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+ return;
+
+ kthread_destroy_worker(kworker_fie);
+ kworker_fie = NULL;
+}
+
+#else
+static inline void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
+{
+}
+
+static inline void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
+{
+}
+
+static inline void cppc_freq_invariance_init(void)
+{
+}
+
+static inline void cppc_freq_invariance_exit(void)
+{
+}
+#endif /* CONFIG_ACPI_CPPC_CPUFREQ_FIE */
+
/* Callback function used to retrieve the max frequency from DMI */
static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
{
@@ -335,8 +543,10 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
cpu_data->perf_ctrls.desired_perf = caps->highest_perf;

ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
- if (!ret)
+ if (!ret) {
+ cppc_cpufreq_cpu_fie_init(policy);
return 0;
+ }

pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
caps->highest_perf, cpu, ret);
@@ -353,6 +563,8 @@ static int cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
unsigned int cpu = policy->cpu;
int ret;

+ cppc_cpufreq_cpu_fie_exit(policy);
+
cpu_data->perf_ctrls.desired_perf = caps->lowest_perf;

ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
@@ -372,12 +584,12 @@ static inline u64 get_delta(u64 t1, u64 t0)
return (u32)t1 - (u32)t0;
}

-static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
- struct cppc_perf_fb_ctrs *fb_ctrs_t0,
- struct cppc_perf_fb_ctrs *fb_ctrs_t1)
+static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
+ struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+ struct cppc_perf_fb_ctrs *fb_ctrs_t1)
{
u64 delta_reference, delta_delivered;
- u64 reference_perf, delivered_perf;
+ u64 reference_perf;

reference_perf = fb_ctrs_t0->reference_perf;

@@ -386,14 +598,11 @@ static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
delta_delivered = get_delta(fb_ctrs_t1->delivered,
fb_ctrs_t0->delivered);

- /* Check to avoid divide-by zero */
- if (delta_reference || delta_delivered)
- delivered_perf = (reference_perf * delta_delivered) /
- delta_reference;
- else
- delivered_perf = cpu_data->perf_ctrls.desired_perf;
+ /* Check to avoid divide-by zero and invalid delivered_perf */
+ if (!delta_reference || !delta_delivered)
+ return cpu_data->perf_ctrls.desired_perf;

- return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
+ return (reference_perf * delta_delivered) / delta_reference;
}

static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
@@ -401,6 +610,7 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
struct cppc_cpudata *cpu_data = policy->driver_data;
+ u64 delivered_perf;
int ret;

cpufreq_cpu_put(policy);
@@ -415,7 +625,10 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
if (ret)
return ret;

- return cppc_get_rate_from_fbctrs(cpu_data, &fb_ctrs_t0, &fb_ctrs_t1);
+ delivered_perf = cppc_perf_from_fbctrs(cpu_data, &fb_ctrs_t0,
+ &fb_ctrs_t1);
+
+ return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
}

static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
@@ -516,14 +729,21 @@ static void cppc_check_hisi_workaround(void)

static int __init cppc_cpufreq_init(void)
{
+ int ret;
+
if ((acpi_disabled) || !acpi_cpc_valid())
return -ENODEV;

INIT_LIST_HEAD(&cpu_data_list);

cppc_check_hisi_workaround();
+ cppc_freq_invariance_init();

- return cpufreq_register_driver(&cppc_cpufreq_driver);
+ ret = cpufreq_register_driver(&cppc_cpufreq_driver);
+ if (ret)
+ cppc_freq_invariance_exit();
+
+ return ret;
}

static inline void free_cpu_data(void)
@@ -541,6 +761,7 @@ static inline void free_cpu_data(void)
static void __exit cppc_cpufreq_exit(void)
{
cpufreq_unregister_driver(&cppc_cpufreq_driver);
+ cppc_freq_invariance_exit();

free_cpu_data();
}
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 11e555cfaecb..f180240dc95f 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -37,6 +37,7 @@ bool topology_scale_freq_invariant(void);
enum scale_freq_source {
SCALE_FREQ_SOURCE_CPUFREQ = 0,
SCALE_FREQ_SOURCE_ARCH,
+ SCALE_FREQ_SOURCE_CPPC,
};

struct scale_freq_data {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4ca80df205ce..5226cc26a095 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6389,6 +6389,7 @@ int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
{
return __sched_setscheduler(p, attr, false, true);
}
+EXPORT_SYMBOL_GPL(sched_setattr_nocheck);

/**
* sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
--
2.31.1.272.g89b43f80a514

2021-06-21 09:22:58

by Viresh Kumar

[permalink] [raw]

Subject: [PATCH V3 3/4] arch_topology: Avoid use-after-free for scale_freq_data

Currently topology_scale_freq_tick() (which gets called from
scheduler_tick()) may end up using a pointer to "struct
scale_freq_data", which was previously cleared by
topology_clear_scale_freq_source(), as there is no protection in place
here. The users of topology_clear_scale_freq_source() though needs a
guarantee that the previously cleared scale_freq_data isn't used
anymore, so they can free the related resources.

Since topology_scale_freq_tick() is called from scheduler tick, we don't
want to add locking in there. Use the RCU update mechanism instead
(which is already used by the scheduler's utilization update path) to
guarantee race free updates here.

synchronize_rcu() makes sure that all RCU critical sections that started
before it is called, will finish before it returns. And so the callers
of topology_clear_scale_freq_source() don't need to worry about their
callback getting called anymore.

Cc: Paul E. McKenney <[email protected]>
Fixes: 01e055c120a4 ("arch_topology: Allow multiple entities to provide sched_freq_tick() callback")
Tested-by: Vincent Guittot <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/base/arch_topology.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index c1179edc0f3b..921312a8d957 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -18,10 +18,11 @@
#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/percpu.h>
+#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/smp.h>

-static DEFINE_PER_CPU(struct scale_freq_data *, sft_data);
+static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
static struct cpumask scale_freq_counters_mask;
static bool scale_freq_invariant;

@@ -66,16 +67,20 @@ void topology_set_scale_freq_source(struct scale_freq_data *data,
if (cpumask_empty(&scale_freq_counters_mask))
scale_freq_invariant = topology_scale_freq_invariant();

+ rcu_read_lock();
+
for_each_cpu(cpu, cpus) {
- sfd = per_cpu(sft_data, cpu);
+ sfd = rcu_dereference(*per_cpu_ptr(&sft_data, cpu));

/* Use ARCH provided counters whenever possible */
if (!sfd || sfd->source != SCALE_FREQ_SOURCE_ARCH) {
- per_cpu(sft_data, cpu) = data;
+ rcu_assign_pointer(per_cpu(sft_data, cpu), data);
cpumask_set_cpu(cpu, &scale_freq_counters_mask);
}
}

+ rcu_read_unlock();
+
update_scale_freq_invariant(true);
}
EXPORT_SYMBOL_GPL(topology_set_scale_freq_source);
@@ -86,22 +91,32 @@ void topology_clear_scale_freq_source(enum scale_freq_source source,
struct scale_freq_data *sfd;
int cpu;

+ rcu_read_lock();
+
for_each_cpu(cpu, cpus) {
- sfd = per_cpu(sft_data, cpu);
+ sfd = rcu_dereference(*per_cpu_ptr(&sft_data, cpu));

if (sfd && sfd->source == source) {
- per_cpu(sft_data, cpu) = NULL;
+ rcu_assign_pointer(per_cpu(sft_data, cpu), NULL);
cpumask_clear_cpu(cpu, &scale_freq_counters_mask);
}
}

+ rcu_read_unlock();
+
+ /*
+ * Make sure all references to previous sft_data are dropped to avoid
+ * use-after-free races.
+ */
+ synchronize_rcu();
+
update_scale_freq_invariant(false);
}
EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source);

void topology_scale_freq_tick(void)
{
- struct scale_freq_data *sfd = *this_cpu_ptr(&sft_data);
+ struct scale_freq_data *sfd = rcu_dereference_sched(*this_cpu_ptr(&sft_data));

if (sfd)
sfd->set_freq_scale();
--
2.31.1.272.g89b43f80a514

2021-06-21 20:52:03

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 6/21/2021 5:19 AM, Viresh Kumar wrote:
> CPPC cpufreq driver is used for ARM servers and this patch series tries to
> provide counter-based frequency invariance support for them in the absence for
> architecture specific counters (like AMUs).

Viresh, this series works fine on my quick tests so far. BTW, I noticed some strange things even with the series applied mentioned below when reading acpi_cppc vs cpufreq sysfs. Do you happen to know are those just hardware/firmware issues because Linux just faithfully exported the register values?

== Arm64 server Foo ==
CPU max MHz: 3000.0000
CPU min MHz: 1000.0000

/sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
300
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq
1000
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf
200
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
100
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- should be 3000?
2800
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf <--- should be 300?
280
/sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
100

== Arm64 server Bar ==
CPU max MHz: 3000.0000
CPU min MHz: 375.0000

/sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf <--- should be 3000? There is no cpufreq boost.
3300
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq <--- don't understand why 0.
0
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf
375
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
375
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- ditto
0
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf
3000
/sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
100

2021-06-22 06:56:33

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 21-06-21, 16:48, Qian Cai wrote:
> Viresh, this series works fine on my quick tests so far.

Thanks for testing.

> BTW, I
> noticed some strange things even with the series applied mentioned
> below when reading acpi_cppc vs cpufreq sysfs. Do you happen to know
> are those just hardware/firmware issues because Linux just
> faithfully exported the register values?

The values are exported by drivers/acpi/cppc_acpi.c I believe and they
look to be based on simple register reads.

> == Arm64 server Foo ==
> CPU max MHz: 3000.0000
> CPU min MHz: 1000.0000
>
> /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
> 300
> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq
> 1000
> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf
> 200
> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
> 100
> /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- should be 3000?
> 2800
> /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf <--- should be 300?
> 280

nominal-perf is max perf, and highest-perf is boost-perf. Same goes
with nominal-freq (i.e. policy->max).

So 280 and 2800 look to be the correct values, 300 and 3000 come with
boost enabled. Look at the first entry, highest_perf.

> /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
> 100
>
> == Arm64 server Bar ==
> CPU max MHz: 3000.0000
> CPU min MHz: 375.0000
>
> /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf <--- should be 3000? There is no cpufreq boost.
> 3300

This isn't exported by cpufreq driver but acpi, and it just exports
hardware values of highest_perf (with boost i.e.). cpufreq may or
may not use this to support boost.

> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq <--- don't understand why 0.
> 0

Because corresponding hardware registers aren't implemented for your
platform, this is the function that reads these registers:

int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
{
...

/* Read optional lowest and nominal frequencies if present */
if (CPC_SUPPORTED(low_freq_reg))
cpc_read(cpunum, low_freq_reg, &low_f);

if (CPC_SUPPORTED(nom_freq_reg))
cpc_read(cpunum, nom_freq_reg, &nom_f);

perf_caps->lowest_freq = low_f;
perf_caps->nominal_freq = nom_f;
}

> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf
> 375
> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
> 375
> /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- ditto
> 0
> /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf
> 3000
> /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
> 100

--
viresh

2021-06-23 04:18:28

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 21-06-21, 16:48, Qian Cai wrote:
>
>
> On 6/21/2021 5:19 AM, Viresh Kumar wrote:
> > CPPC cpufreq driver is used for ARM servers and this patch series tries to
> > provide counter-based frequency invariance support for them in the absence for
> > architecture specific counters (like AMUs).
>
> Viresh, this series works fine on my quick tests so far.

Do you want me to add your Tested-by for the series ?

--
viresh

2021-06-23 12:59:46

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 6/23/2021 12:16 AM, Viresh Kumar wrote:
> On 21-06-21, 16:48, Qian Cai wrote:
>>
>>
>> On 6/21/2021 5:19 AM, Viresh Kumar wrote:
>>> CPPC cpufreq driver is used for ARM servers and this patch series tries to
>>> provide counter-based frequency invariance support for them in the absence for
>>> architecture specific counters (like AMUs).
>>
>> Viresh, this series works fine on my quick tests so far.
>
> Do you want me to add your Tested-by for the series ?

Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in development, and will provide an update once ready. Also, I noticed the delivered perf is even smaller than lowest_perf (100).

# cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
ref:103377547901 del:54540736873
# cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
ref:103379170101 del:54541599117

100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53

My understanding is that the delivered perf should fail into the range between lowest_perf and highest_perf. Is that assumption correct? This happens on 5.4-based kernel, so I am in process running your series on that system to see if there is any differences. In any case, if it is a bug it is pre-existing, but I'd like to understand a bit better in that front first.

2021-06-23 13:48:15

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 1/4] cpufreq: cppc: Fix potential memleak in cppc_cpufreq_cpu_init

Hi,

On Monday 21 Jun 2021 at 14:49:34 (+0530), Viresh Kumar wrote:
> It's a classic example of memleak, we allocate something, we fail and
> never free the resources.
>
> Make sure we free all resources on policy ->init() failures.
>
> Fixes: a28b2bfc099c ("cppc_cpufreq: replace per-cpu data array with a list")

This is on me, thanks for the fix!

Might be better for this to be separate from the series, but I suppose
all will be going in 5.14 anyway.

> Tested-by: Vincent Guittot <[email protected]>
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> drivers/cpufreq/cppc_cpufreq.c | 30 ++++++++++++++++++++----------
> 1 file changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index be4f62e2c5f1..35b8ae66d1fb 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -256,6 +256,16 @@ static struct cppc_cpudata *cppc_cpufreq_get_cpu_data(unsigned int cpu)
> return NULL;
> }
>
> +static void cppc_cpufreq_put_cpu_data(struct cpufreq_policy *policy)
> +{
> + struct cppc_cpudata *cpu_data = policy->driver_data;
> +
> + list_del(&cpu_data->node);
> + free_cpumask_var(cpu_data->shared_cpu_map);
> + kfree(cpu_data);
> + policy->driver_data = NULL;
> +}
> +
> static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> {
> unsigned int cpu = policy->cpu;
> @@ -309,7 +319,8 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> default:
> pr_debug("Unsupported CPU co-ord type: %d\n",
> policy->shared_type);
> - return -EFAULT;
> + ret = -EFAULT;
> + goto out;
> }
>
> /*
> @@ -324,10 +335,14 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
>
> ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
> - if (ret)
> - pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
> - caps->highest_perf, cpu, ret);
> + if (!ret)
> + return 0;
>
> + pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
> + caps->highest_perf, cpu, ret);
> +

Nit: I would have preferred the more traditional:

if (ret) {
pr_debug();
goto out;
}

return 0;

It's always easier to read.

Thanks,
Ionela.

> +out:
> + cppc_cpufreq_put_cpu_data(policy);
> return ret;
> }
>
> @@ -345,12 +360,7 @@ static int cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
> caps->lowest_perf, cpu, ret);
>
> - /* Remove CPU node from list and free driver data for policy */
> - free_cpumask_var(cpu_data->shared_cpu_map);
> - list_del(&cpu_data->node);
> - kfree(policy->driver_data);
> - policy->driver_data = NULL;
> -
> + cppc_cpufreq_put_cpu_data(policy);
> return 0;
> }
>
> --
> 2.31.1.272.g89b43f80a514
>

2021-06-23 13:52:36

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 3/4] arch_topology: Avoid use-after-free for scale_freq_data

Hey,

On Monday 21 Jun 2021 at 14:49:36 (+0530), Viresh Kumar wrote:
> Currently topology_scale_freq_tick() (which gets called from
> scheduler_tick()) may end up using a pointer to "struct
> scale_freq_data", which was previously cleared by
> topology_clear_scale_freq_source(), as there is no protection in place
> here. The users of topology_clear_scale_freq_source() though needs a
> guarantee that the previously cleared scale_freq_data isn't used
> anymore, so they can free the related resources.
>
> Since topology_scale_freq_tick() is called from scheduler tick, we don't
> want to add locking in there. Use the RCU update mechanism instead
> (which is already used by the scheduler's utilization update path) to
> guarantee race free updates here.
>
> synchronize_rcu() makes sure that all RCU critical sections that started
> before it is called, will finish before it returns. And so the callers
> of topology_clear_scale_freq_source() don't need to worry about their
> callback getting called anymore.
>
> Cc: Paul E. McKenney <[email protected]>
> Fixes: 01e055c120a4 ("arch_topology: Allow multiple entities to provide sched_freq_tick() callback")
> Tested-by: Vincent Guittot <[email protected]>
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> drivers/base/arch_topology.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index c1179edc0f3b..921312a8d957 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -18,10 +18,11 @@
> #include <linux/cpumask.h>
> #include <linux/init.h>
> #include <linux/percpu.h>
> +#include <linux/rcupdate.h>
> #include <linux/sched.h>
> #include <linux/smp.h>
>
> -static DEFINE_PER_CPU(struct scale_freq_data *, sft_data);
> +static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
> static struct cpumask scale_freq_counters_mask;
> static bool scale_freq_invariant;
>
> @@ -66,16 +67,20 @@ void topology_set_scale_freq_source(struct scale_freq_data *data,
> if (cpumask_empty(&scale_freq_counters_mask))
> scale_freq_invariant = topology_scale_freq_invariant();
>
> + rcu_read_lock();
> +
> for_each_cpu(cpu, cpus) {
> - sfd = per_cpu(sft_data, cpu);
> + sfd = rcu_dereference(*per_cpu_ptr(&sft_data, cpu));
>
> /* Use ARCH provided counters whenever possible */
> if (!sfd || sfd->source != SCALE_FREQ_SOURCE_ARCH) {
> - per_cpu(sft_data, cpu) = data;
> + rcu_assign_pointer(per_cpu(sft_data, cpu), data);
> cpumask_set_cpu(cpu, &scale_freq_counters_mask);
> }
> }
>
> + rcu_read_unlock();
> +
> update_scale_freq_invariant(true);
> }
> EXPORT_SYMBOL_GPL(topology_set_scale_freq_source);
> @@ -86,22 +91,32 @@ void topology_clear_scale_freq_source(enum scale_freq_source source,
> struct scale_freq_data *sfd;
> int cpu;
>
> + rcu_read_lock();
> +
> for_each_cpu(cpu, cpus) {
> - sfd = per_cpu(sft_data, cpu);
> + sfd = rcu_dereference(*per_cpu_ptr(&sft_data, cpu));
>
> if (sfd && sfd->source == source) {
> - per_cpu(sft_data, cpu) = NULL;
> + rcu_assign_pointer(per_cpu(sft_data, cpu), NULL);
> cpumask_clear_cpu(cpu, &scale_freq_counters_mask);
> }
> }
>
> + rcu_read_unlock();
> +
> + /*
> + * Make sure all references to previous sft_data are dropped to avoid
> + * use-after-free races.
> + */
> + synchronize_rcu();
> +
> update_scale_freq_invariant(false);
> }
> EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source);
>
> void topology_scale_freq_tick(void)
> {
> - struct scale_freq_data *sfd = *this_cpu_ptr(&sft_data);
> + struct scale_freq_data *sfd = rcu_dereference_sched(*this_cpu_ptr(&sft_data));
>
> if (sfd)
> sfd->set_freq_scale();
> --
> 2.31.1.272.g89b43f80a514
>

Reviewed-by: Ionela Voinescu <[email protected]>

2021-06-24 02:12:16

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 1/4] cpufreq: cppc: Fix potential memleak in cppc_cpufreq_cpu_init

On 23-06-21, 14:44, Ionela Voinescu wrote:
> Might be better for this to be separate from the series, but I suppose
> all will be going in 5.14 anyway.

Right, it is easier to keep this all together for reviews.

> Nit: I would have preferred the more traditional:
>
> if (ret) {
> pr_debug();
> goto out;
> }
>
> return 0;
>
> It's always easier to read.

Sure.

--
viresh

2021-06-24 02:14:02

by Viresh Kumar

[permalink] [raw]

Subject: [PATCH V3.1 1/4] cpufreq: cppc: Fix potential memleak in cppc_cpufreq_cpu_init

It's a classic example of memleak, we allocate something, we fail and
never free the resources.

Make sure we free all resources on policy ->init() failures.

Fixes: a28b2bfc099c ("cppc_cpufreq: replace per-cpu data array with a list")
Tested-by: Vincent Guittot <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
---
V3->V3.1:
- Updated "if (!ret)" to "if (ret)", the more commonly used format.

drivers/cpufreq/cppc_cpufreq.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index be4f62e2c5f1..945ab4942c1c 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -256,6 +256,16 @@ static struct cppc_cpudata *cppc_cpufreq_get_cpu_data(unsigned int cpu)
return NULL;
}

+static void cppc_cpufreq_put_cpu_data(struct cpufreq_policy *policy)
+{
+ struct cppc_cpudata *cpu_data = policy->driver_data;
+
+ list_del(&cpu_data->node);
+ free_cpumask_var(cpu_data->shared_cpu_map);
+ kfree(cpu_data);
+ policy->driver_data = NULL;
+}
+
static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
{
unsigned int cpu = policy->cpu;
@@ -309,7 +319,8 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
default:
pr_debug("Unsupported CPU co-ord type: %d\n",
policy->shared_type);
- return -EFAULT;
+ ret = -EFAULT;
+ goto out;
}

/*
@@ -324,10 +335,16 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
cpu_data->perf_ctrls.desired_perf = caps->highest_perf;

ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
- if (ret)
+ if (ret) {
pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
caps->highest_perf, cpu, ret);
+ goto out;
+ }
+
+ return 0;

+out:
+ cppc_cpufreq_put_cpu_data(policy);
return ret;
}

@@ -345,12 +362,7 @@ static int cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
caps->lowest_perf, cpu, ret);

- /* Remove CPU node from list and free driver data for policy */
- free_cpumask_var(cpu_data->shared_cpu_map);
- list_del(&cpu_data->node);
- kfree(policy->driver_data);
- policy->driver_data = NULL;
-
+ cppc_cpufreq_put_cpu_data(policy);
return 0;
}

--
2.31.1.272.g89b43f80a514

2021-06-24 02:24:18

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 2/4] cpufreq: cppc: Pass structure instance by reference

On 23-06-21, 14:45, Ionela Voinescu wrote:
> On Monday 21 Jun 2021 at 14:49:35 (+0530), Viresh Kumar wrote:
> > Don't pass structure instance by value, pass it by reference instead.
> >
>
> Might be best to justify the change a bit :)

I had it and removed later as I thought it would be obvious :)

> For me this is a judgement call, and I don't really see the reasons for
> changing it: we don't care if the structure is modified or not, as we're
> not reusing the data after the call to cppc_get_rate_from_fbctrs().
> More so, in this scenario we might not even want for the called function
> to modify the counter values. Also there is no further call to a function
> in cppc_get_rate_from_fbctrs(), that might require references to the
> fb_ctrs.
>
> So what is the reason behind this change?

How about this commit log then:

Theoretically speaking, call by reference is cheaper/faster than call by value
for structures as the later requires the compiler to make a new copy of the
whole structure (which has four u64 values here), to be used by the called
function, while with call by reference we just need to pass a single pointer
(u64 on 64-bit architectures) to the existing structure.

Yes, on modern architectures, the compilers will likely end up using the
processor registers for passing this structure as it isn't doesn't have lot of
fields and it shouldn't be bad eventually, but nevertheless the code should do
the right thing without assuming about the compiler's or architecture's
optimizations.

Don't pass structure instance by value, pass it by reference instead.

--
viresh

2021-06-24 02:55:41

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 23-06-21, 08:57, Qian Cai wrote:
> Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in
> development, and will provide an update once ready.

Oh sure, np.

> Also, I noticed the delivered perf is even smaller than lowest_perf (100).

> # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> ref:103377547901 del:54540736873
> # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> ref:103379170101 del:54541599117
>
> 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53
>
> My understanding is that the delivered perf should fail into the range between
> lowest_perf and highest_perf. Is that assumption correct? This happens on
> 5.4-based kernel, so I am in process running your series on that system to see
> if there is any differences. In any case, if it is a bug it is pre-existing,
> but I'd like to understand a bit better in that front first.

Vincent:

Can that happen because of CPU idle ?

--
viresh

2021-06-24 09:49:50

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance

Hey,

On Monday 21 Jun 2021 at 14:49:37 (+0530), Viresh Kumar wrote:
> The Frequency Invariance Engine (FIE) is providing a frequency scaling
> correction factor that helps achieve more accurate load-tracking.
[..]
> +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> +{
> + struct cppc_freq_invariance *cppc_fi;
> + int cpu;
> +
> + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> + return;
> +
> + /* policy->cpus will be empty here, use related_cpus instead */
> + topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> +
> + for_each_cpu(cpu, policy->related_cpus) {
> + cppc_fi = &per_cpu(cppc_freq_inv, cpu);

Do you think it might be worth having here something like:

if (!cppc_fi->cpu_data)
continue;

This would be to protect against cases where the platform does not boot
with all CPUs or the module is loaded after some have already been
offlined. Unlikely, but..

> + irq_work_sync(&cppc_fi->irq_work);
> + kthread_cancel_work_sync(&cppc_fi->work);
> + }
> +}

The rest of the code is almost the same as the original, so that is all
from me :).

Thanks,
Ionela.

2021-06-24 09:51:06

by Vincent Guittot

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <[email protected]> wrote:
>
> On 23-06-21, 08:57, Qian Cai wrote:
> > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in
> > development, and will provide an update once ready.
>
> Oh sure, np.
>
> > Also, I noticed the delivered perf is even smaller than lowest_perf (100).
>
> > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> > ref:103377547901 del:54540736873
> > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> > ref:103379170101 del:54541599117
> >
> > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53

I'm not sure that I understand your point. The formula above says that
cpu8 run @ 53% of nominal performance

> >
> > My understanding is that the delivered perf should fail into the range between
> > lowest_perf and highest_perf. Is that assumption correct? This happens on
> > 5.4-based kernel, so I am in process running your series on that system to see
> > if there is any differences. In any case, if it is a bug it is pre-existing,
> > but I'd like to understand a bit better in that front first.
>
> Vincent:
>
> Can that happen because of CPU idle ?
>
> --
> viresh

2021-06-24 10:49:55

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

Hi guys,

On Thursday 24 Jun 2021 at 11:49:53 (+0200), Vincent Guittot wrote:
> On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <[email protected]> wrote:
> >
> > On 23-06-21, 08:57, Qian Cai wrote:
> > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in
> > > development, and will provide an update once ready.
> >
> > Oh sure, np.
> >
> > > Also, I noticed the delivered perf is even smaller than lowest_perf (100).
> >
> > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> > > ref:103377547901 del:54540736873
> > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> > > ref:103379170101 del:54541599117
> > >
> > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53
>
> I'm not sure that I understand your point. The formula above says that
> cpu8 run @ 53% of nominal performance
>

I think this is based on a previous example Qian had where:

/sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
300
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq
1000
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
100
/sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
100

..so the 100 is not from obtaining percentage, is the reference
performance.

The logic of the formula is to obtain the delivered performance when
knowing the number of ticks for each counter, so:

So if one gets (103379170101 - 103377547901) ticks for the counter at
running at 1GHz(perf 100), what is the frequency of the core, if its
counter ticked (54541599117 - 54540736873) times in the same interval
of time?

The answer is 530MHz(perf 53), which is lower than the lowest frequency
at 1GHz(perf 100).

> > >
> > > My understanding is that the delivered perf should fail into the range between
> > > lowest_perf and highest_perf. Is that assumption correct? This happens on
> > > 5.4-based kernel, so I am in process running your series on that system to see
> > > if there is any differences. In any case, if it is a bug it is pre-existing,
> > > but I'd like to understand a bit better in that front first.
> >
> > Vincent:
> >
> > Can that happen because of CPU idle ?
> >

Not if the counters are implemented properly. The kernel considers that
both reference and delivered performance counters should stop or reset
during idle. The kernel would not account for idle itself.

If the reference performance counter does not stop during idle, while
the core performance counter (delivered) does stop, the behavior above
should be seen very often.

Qian, do you see these small delivered performance values often or
seldom?

Thanks,
Ionela.

> > --
> > viresh

2021-06-24 11:16:22

by Vincent Guittot

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On Thu, 24 Jun 2021 at 12:48, Ionela Voinescu <[email protected]> wrote:
>
> Hi guys,
>
> On Thursday 24 Jun 2021 at 11:49:53 (+0200), Vincent Guittot wrote:
> > On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <[email protected]> wrote:
> > >
> > > On 23-06-21, 08:57, Qian Cai wrote:
> > > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in
> > > > development, and will provide an update once ready.
> > >
> > > Oh sure, np.
> > >
> > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100).
> > >
> > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> > > > ref:103377547901 del:54540736873
> > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> > > > ref:103379170101 del:54541599117
> > > >
> > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53
> >
> > I'm not sure that I understand your point. The formula above says that
> > cpu8 run @ 53% of nominal performance
> >
>
> I think this is based on a previous example Qian had where:
>
> /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
> 300
> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq
> 1000
> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
> 100
> /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
> 100
>
> ..so the 100 is not from obtaining percentage, is the reference
> performance.
>
> The logic of the formula is to obtain the delivered performance when
> knowing the number of ticks for each counter, so:
>
> So if one gets (103379170101 - 103377547901) ticks for the counter at
> running at 1GHz(perf 100), what is the frequency of the core, if its
> counter ticked (54541599117 - 54540736873) times in the same interval
> of time?
>
> The answer is 530MHz(perf 53), which is lower than the lowest frequency
> at 1GHz(perf 100).

But the nominal_perf is 280 and not 100 if i'm not wrong so the perf
value is 148 > lowest_perf in this case

>
>
> > > >
> > > > My understanding is that the delivered perf should fail into the range between
> > > > lowest_perf and highest_perf. Is that assumption correct? This happens on
> > > > 5.4-based kernel, so I am in process running your series on that system to see
> > > > if there is any differences. In any case, if it is a bug it is pre-existing,
> > > > but I'd like to understand a bit better in that front first.
> > >
> > > Vincent:
> > >
> > > Can that happen because of CPU idle ?
> > >
>
> Not if the counters are implemented properly. The kernel considers that
> both reference and delivered performance counters should stop or reset
> during idle. The kernel would not account for idle itself.
>
> If the reference performance counter does not stop during idle, while
> the core performance counter (delivered) does stop, the behavior above
> should be seen very often.
>
> Qian, do you see these small delivered performance values often or
> seldom?
>
> Thanks,
> Ionela.
>
> > > --
> > > viresh

2021-06-24 11:25:22

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On Thursday 24 Jun 2021 at 13:15:04 (+0200), Vincent Guittot wrote:
> On Thu, 24 Jun 2021 at 12:48, Ionela Voinescu <[email protected]> wrote:
> >
> > Hi guys,
> >
> > On Thursday 24 Jun 2021 at 11:49:53 (+0200), Vincent Guittot wrote:
> > > On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <[email protected]> wrote:
> > > >
> > > > On 23-06-21, 08:57, Qian Cai wrote:
> > > > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in
> > > > > development, and will provide an update once ready.
> > > >
> > > > Oh sure, np.
> > > >
> > > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100).
> > > >
> > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> > > > > ref:103377547901 del:54540736873
> > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> > > > > ref:103379170101 del:54541599117
> > > > >
> > > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53
> > >
> > > I'm not sure that I understand your point. The formula above says that
> > > cpu8 run @ 53% of nominal performance
> > >
> >
> > I think this is based on a previous example Qian had where:
> >
> > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
> > 300
> > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq
> > 1000
> > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
> > 100
> > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
> > 100
> >
> > ..so the 100 is not from obtaining percentage, is the reference
> > performance.
> >
> > The logic of the formula is to obtain the delivered performance when
> > knowing the number of ticks for each counter, so:
> >
> > So if one gets (103379170101 - 103377547901) ticks for the counter at
> > running at 1GHz(perf 100), what is the frequency of the core, if its
> > counter ticked (54541599117 - 54540736873) times in the same interval
> > of time?
> >
> > The answer is 530MHz(perf 53), which is lower than the lowest frequency
> > at 1GHz(perf 100).
>
> But the nominal_perf is 280 and not 100 if i'm not wrong so the perf
> value is 148 > lowest_perf in this case
>

Nominal performance has no meaning here. The reference counter ticks
with the frequency equivalent to reference performance.

Nominal performance is the maximum performance when !boost. Highest
performance is the maximum performance available including boost
frequencies. So nominal performance has no impact in these translations
from counter values to delivered performance.

Hope it helps,
Ionela.

>
> >
> >
> > > > >
> > > > > My understanding is that the delivered perf should fail into the range between
> > > > > lowest_perf and highest_perf. Is that assumption correct? This happens on
> > > > > 5.4-based kernel, so I am in process running your series on that system to see
> > > > > if there is any differences. In any case, if it is a bug it is pre-existing,
> > > > > but I'd like to understand a bit better in that front first.
> > > >
> > > > Vincent:
> > > >
> > > > Can that happen because of CPU idle ?
> > > >
> >
> > Not if the counters are implemented properly. The kernel considers that
> > both reference and delivered performance counters should stop or reset
> > during idle. The kernel would not account for idle itself.
> >
> > If the reference performance counter does not stop during idle, while
> > the core performance counter (delivered) does stop, the behavior above
> > should be seen very often.
> >
> > Qian, do you see these small delivered performance values often or
> > seldom?
> >
> > Thanks,
> > Ionela.
> >
> > > > --
> > > > viresh

2021-06-24 12:01:05

by Vincent Guittot

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On Thu, 24 Jun 2021 at 13:23, Ionela Voinescu <[email protected]> wrote:
>
> On Thursday 24 Jun 2021 at 13:15:04 (+0200), Vincent Guittot wrote:
> > On Thu, 24 Jun 2021 at 12:48, Ionela Voinescu <[email protected]> wrote:
> > >
> > > Hi guys,
> > >
> > > On Thursday 24 Jun 2021 at 11:49:53 (+0200), Vincent Guittot wrote:
> > > > On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <[email protected]> wrote:
> > > > >
> > > > > On 23-06-21, 08:57, Qian Cai wrote:
> > > > > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in
> > > > > > development, and will provide an update once ready.
> > > > >
> > > > > Oh sure, np.
> > > > >
> > > > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100).
> > > > >
> > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> > > > > > ref:103377547901 del:54540736873
> > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs
> > > > > > ref:103379170101 del:54541599117
> > > > > >
> > > > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53
> > > >
> > > > I'm not sure that I understand your point. The formula above says that
> > > > cpu8 run @ 53% of nominal performance
> > > >
> > >
> > > I think this is based on a previous example Qian had where:
> > >
> > > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
> > > 300
> > > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq
> > > 1000
> > > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
> > > 100
> > > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
> > > 100
> > >
> > > ..so the 100 is not from obtaining percentage, is the reference
> > > performance.
> > >
> > > The logic of the formula is to obtain the delivered performance when
> > > knowing the number of ticks for each counter, so:
> > >
> > > So if one gets (103379170101 - 103377547901) ticks for the counter at
> > > running at 1GHz(perf 100), what is the frequency of the core, if its
> > > counter ticked (54541599117 - 54540736873) times in the same interval
> > > of time?
> > >
> > > The answer is 530MHz(perf 53), which is lower than the lowest frequency
> > > at 1GHz(perf 100).
> >
> > But the nominal_perf is 280 and not 100 if i'm not wrong so the perf
> > value is 148 > lowest_perf in this case
> >
>
> Nominal performance has no meaning here. The reference counter ticks
> with the frequency equivalent to reference performance.
>
> Nominal performance is the maximum performance when !boost. Highest
> performance is the maximum performance available including boost
> frequencies. So nominal performance has no impact in these translations
> from counter values to delivered performance.

my bad, nominal_perf == reference_perf on the systems that I have locally

>
> Hope it helps,
> Ionela.
>
> >
> > >
> > >
> > > > > >
> > > > > > My understanding is that the delivered perf should fail into the range between
> > > > > > lowest_perf and highest_perf. Is that assumption correct? This happens on
> > > > > > 5.4-based kernel, so I am in process running your series on that system to see
> > > > > > if there is any differences. In any case, if it is a bug it is pre-existing,
> > > > > > but I'd like to understand a bit better in that front first.
> > > > >
> > > > > Vincent:
> > > > >
> > > > > Can that happen because of CPU idle ?
> > > > >
> > >
> > > Not if the counters are implemented properly. The kernel considers that
> > > both reference and delivered performance counters should stop or reset
> > > during idle. The kernel would not account for idle itself.
> > >
> > > If the reference performance counter does not stop during idle, while
> > > the core performance counter (delivered) does stop, the behavior above
> > > should be seen very often.
> > >
> > > Qian, do you see these small delivered performance values often or
> > > seldom?
> > >
> > > Thanks,
> > > Ionela.
> > >
> > > > > --
> > > > > viresh

2021-06-24 13:05:29

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance

On 24-06-21, 10:48, Ionela Voinescu wrote:
> On Monday 21 Jun 2021 at 14:49:37 (+0530), Viresh Kumar wrote:
> > The Frequency Invariance Engine (FIE) is providing a frequency scaling
> > correction factor that helps achieve more accurate load-tracking.
> [..]
> > +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> > +{
> > + struct cppc_freq_invariance *cppc_fi;
> > + int cpu;
> > +
> > + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > + return;
> > +
> > + /* policy->cpus will be empty here, use related_cpus instead */
> > + topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> > +
> > + for_each_cpu(cpu, policy->related_cpus) {
> > + cppc_fi = &per_cpu(cppc_freq_inv, cpu);
>
> Do you think it might be worth having here something like:
>
> if (!cppc_fi->cpu_data)
> continue;
>
> This would be to protect against cases where the platform does not boot
> with all CPUs or the module is loaded after some have already been
> offlined. Unlikely, but..

Even in that case policy->cpus will contain all offline+online CPUs (at ->init()
time), isn't it ?

> > + irq_work_sync(&cppc_fi->irq_work);
> > + kthread_cancel_work_sync(&cppc_fi->work);
> > + }
> > +}
>
> The rest of the code is almost the same as the original, so that is all
> from me :).
>
> Thanks,
> Ionela.

--
viresh

2021-06-24 15:19:35

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 6/24/2021 6:48 AM, Ionela Voinescu wrote:
> Not if the counters are implemented properly. The kernel considers that
> both reference and delivered performance counters should stop or reset
> during idle. The kernel would not account for idle itself.
>
> If the reference performance counter does not stop during idle, while
> the core performance counter (delivered) does stop, the behavior above
> should be seen very often.
>
> Qian, do you see these small delivered performance values often or
> seldom?

Ionela, so I managed to upgrade the kernel on the system to today's linux-next which suppose to include this series. The delivered perf is now 280. However, scaling_min_freq (200 MHz) is not equal to lowest_perf (100).

scaling_driver: acpi_cppc
scaling_governor: schedutil

Is that normal because lowest_nonlinear_perf is 200?

Also, on this pretty idle system, 158 of 160 CPUs are always running in max freq (280 MHz). The other 2 are running in 243 and 213 MHz according to scaling_cur_freq. Apparently, "schedutil" does not work proper on this system. I am going to try other governors to narrow down the issue a bit.

FYI, here is the acpi_cppc registers reading:

/sys/devices/system/cpu/cpu0/acpi_cppc/feedback_ctrs
ref:160705801 del:449594095
/sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
300
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq
1000
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf
200
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
100
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq
2800
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf
280
/sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
100
/sys/devices/system/cpu/cpu0/acpi_cppc/wraparound_time
18446744073709551615

2021-06-24 20:46:38

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 6/24/2021 6:48 AM, Ionela Voinescu wrote:
> Not if the counters are implemented properly. The kernel considers that
> both reference and delivered performance counters should stop or reset
> during idle. The kernel would not account for idle itself.
>
> If the reference performance counter does not stop during idle, while
> the core performance counter (delivered) does stop, the behavior above
> should be seen very often.
>
> Qian, do you see these small delivered performance values often or
> seldom?

FYI, the latest data point it that on the new kernel, the delivered performance does match the cpufreq_cur_freq. IOW, feedback_ctrs works fine. Also, "powersave" governor could bring down the scaling_cur_freq to scaling_min_freq. Right now, looks like the puzzles on this particular system as mentioned in the previous post are,

1) lowest_freq/lowest_perf != scaling_min_freq
2) CPPC + schedutil is not able to scale down CPUs.

2021-06-25 08:57:03

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance

Hey,

On Thursday 24 Jun 2021 at 18:34:18 (+0530), Viresh Kumar wrote:
> On 24-06-21, 10:48, Ionela Voinescu wrote:
> > On Monday 21 Jun 2021 at 14:49:37 (+0530), Viresh Kumar wrote:
> > > The Frequency Invariance Engine (FIE) is providing a frequency scaling
> > > correction factor that helps achieve more accurate load-tracking.
> > [..]
> > > +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> > > +{
> > > + struct cppc_freq_invariance *cppc_fi;
> > > + int cpu;
> > > +
> > > + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > > + return;
> > > +
> > > + /* policy->cpus will be empty here, use related_cpus instead */
> > > + topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> > > +
> > > + for_each_cpu(cpu, policy->related_cpus) {
> > > + cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> >
> > Do you think it might be worth having here something like:
> >
> > if (!cppc_fi->cpu_data)
> > continue;
> >
> > This would be to protect against cases where the platform does not boot
> > with all CPUs or the module is loaded after some have already been
> > offlined. Unlikely, but..
>
> Even in that case policy->cpus will contain all offline+online CPUs (at ->init()
> time), isn't it ?
>

Right, my bad. I missed cpumask_and(policy->cpus, policy->cpus,
cpu_online_mask) being done after init(). It logically seems a bit
wrong, but drivers are in control of setting policy->cpus and acting on
it, and in this case the driver does the right thing.

Thanks,
Ionela.

> > > + irq_work_sync(&cppc_fi->irq_work);
> > > + kthread_cancel_work_sync(&cppc_fi->work);
> > > + }
> > > +}
> >
> > The rest of the code is almost the same as the original, so that is all
> > from me :).
> >
> > Thanks,
> > Ionela.
>
> --
> viresh

2021-06-25 10:22:01

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

Hi Qian,

On Thursday 24 Jun 2021 at 11:17:55 (-0400), Qian Cai wrote:
>
>
> On 6/24/2021 6:48 AM, Ionela Voinescu wrote:
> > Not if the counters are implemented properly. The kernel considers that
> > both reference and delivered performance counters should stop or reset
> > during idle. The kernel would not account for idle itself.
> >
> > If the reference performance counter does not stop during idle, while
> > the core performance counter (delivered) does stop, the behavior above
> > should be seen very often.
> >
> > Qian, do you see these small delivered performance values often or
> > seldom?
>
> Ionela, so I managed to upgrade the kernel on the system to today's
> linux-next which suppose to include this series. The delivered perf
> is now 280. However, scaling_min_freq (200 MHz) is not equal to
> lowest_perf (100).
>
> scaling_driver: acpi_cppc
^^^^^^^^^
I suppose you mean "cppc-cpufreq"?

"acpi_cppc" is not a scaling driver option.

> scaling_governor: schedutil
>
> Is that normal because lowest_nonlinear_perf is 200?
>

Yes, that is right : [1]

> Also, on this pretty idle system, 158 of 160 CPUs are always running
> in max freq (280 MHz). The other 2 are running in 243 and 213 MHz
> according to scaling_cur_freq. Apparently, "schedutil" does not work
> proper on this system. I am going to try other governors to narrow
> down the issue a bit.

So your CPUs run at frequencies between 200MHz and 280MHz?
Based on your acpi_cppc information below I would have assumed 2GHz as
lowest nonlinear and 2.8GHz as nominal. The reason for this is that
according to the ACPI spec the frequency values in the _CPC objects are
supposed to be in MHz, so 2800 MHz for nominal frequency would be
2.8GHz.

When you try more governors, make sure to check out the difference
between scaling_cur_freq and cpuinfo_cur_freq at [2]. The first gives
you the frequency that the governor (schedutil) is asking for, while the
second is giving you the current frequency obtained from the counters.

So to check the actual frequency the cores are running at, please check
cpuinfo_cur_freq.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/cpufreq/cppc_cpufreq.c?h=v5.13-rc7#n296
[2] https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt

Thanks,
Ionela.

>
> FYI, here is the acpi_cppc registers reading:
>
> /sys/devices/system/cpu/cpu0/acpi_cppc/feedback_ctrs
> ref:160705801 del:449594095
> /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
> 300
> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq
> 1000
> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf
> 200
> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
> 100
> /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq
> 2800
> /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf
> 280
> /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
> 100
> /sys/devices/system/cpu/cpu0/acpi_cppc/wraparound_time
> 18446744073709551615

2021-06-25 10:33:43

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 2/4] cpufreq: cppc: Pass structure instance by reference

Hey,

On Thursday 24 Jun 2021 at 07:52:52 (+0530), Viresh Kumar wrote:
> On 23-06-21, 14:45, Ionela Voinescu wrote:
> > On Monday 21 Jun 2021 at 14:49:35 (+0530), Viresh Kumar wrote:
> > > Don't pass structure instance by value, pass it by reference instead.
> > >
> >
> > Might be best to justify the change a bit :)
>
> I had it and removed later as I thought it would be obvious :)
>
> > For me this is a judgement call, and I don't really see the reasons for
> > changing it: we don't care if the structure is modified or not, as we're
> > not reusing the data after the call to cppc_get_rate_from_fbctrs().
> > More so, in this scenario we might not even want for the called function
> > to modify the counter values. Also there is no further call to a function
> > in cppc_get_rate_from_fbctrs(), that might require references to the
> > fb_ctrs.
> >
> > So what is the reason behind this change?
>
> How about this commit log then:
>
> Theoretically speaking, call by reference is cheaper/faster than call by value
> for structures as the later requires the compiler to make a new copy of the
> whole structure (which has four u64 values here), to be used by the called
> function, while with call by reference we just need to pass a single pointer
> (u64 on 64-bit architectures) to the existing structure.
>
> Yes, on modern architectures, the compilers will likely end up using the
> processor registers for passing this structure as it isn't doesn't have lot of
> fields and it shouldn't be bad eventually, but nevertheless the code should do
> the right thing without assuming about the compiler's or architecture's
> optimizations.
>

Yes, that's why "judgement call", which I'll let you make. The code is
sane and I like the longer commit message.

Reviewed-by: Ionela Voinescu <[email protected]>

> Don't pass structure instance by value, pass it by reference instead.
>
> --
> viresh

2021-06-25 10:34:25

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3.1 1/4] cpufreq: cppc: Fix potential memleak in cppc_cpufreq_cpu_init

On Thursday 24 Jun 2021 at 07:40:45 (+0530), Viresh Kumar wrote:
> It's a classic example of memleak, we allocate something, we fail and
> never free the resources.
>
> Make sure we free all resources on policy ->init() failures.
>
> Fixes: a28b2bfc099c ("cppc_cpufreq: replace per-cpu data array with a list")
> Tested-by: Vincent Guittot <[email protected]>
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> V3->V3.1:
> - Updated "if (!ret)" to "if (ret)", the more commonly used format.
>
> drivers/cpufreq/cppc_cpufreq.c | 28 ++++++++++++++++++++--------
> 1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index be4f62e2c5f1..945ab4942c1c 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -256,6 +256,16 @@ static struct cppc_cpudata *cppc_cpufreq_get_cpu_data(unsigned int cpu)
> return NULL;
> }
>
> +static void cppc_cpufreq_put_cpu_data(struct cpufreq_policy *policy)
> +{
> + struct cppc_cpudata *cpu_data = policy->driver_data;
> +
> + list_del(&cpu_data->node);
> + free_cpumask_var(cpu_data->shared_cpu_map);
> + kfree(cpu_data);
> + policy->driver_data = NULL;
> +}
> +
> static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> {
> unsigned int cpu = policy->cpu;
> @@ -309,7 +319,8 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> default:
> pr_debug("Unsupported CPU co-ord type: %d\n",
> policy->shared_type);
> - return -EFAULT;
> + ret = -EFAULT;
> + goto out;
> }
>
> /*
> @@ -324,10 +335,16 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
>
> ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
> - if (ret)
> + if (ret) {
> pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
> caps->highest_perf, cpu, ret);
> + goto out;
> + }
> +
> + return 0;
>
> +out:
> + cppc_cpufreq_put_cpu_data(policy);
> return ret;
> }
>
> @@ -345,12 +362,7 @@ static int cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
> caps->lowest_perf, cpu, ret);
>
> - /* Remove CPU node from list and free driver data for policy */
> - free_cpumask_var(cpu_data->shared_cpu_map);
> - list_del(&cpu_data->node);
> - kfree(policy->driver_data);
> - policy->driver_data = NULL;
> -
> + cppc_cpufreq_put_cpu_data(policy);
> return 0;
> }
>
> --
> 2.31.1.272.g89b43f80a514
>

Reviewed-by: Ionela Voinescu <[email protected]>

Many thanks for the changes,
Ionela.

2021-06-25 13:33:28

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 6/25/2021 6:21 AM, Ionela Voinescu wrote:
>> scaling_driver: acpi_cppc
> ^^^^^^^^^
> I suppose you mean "cppc-cpufreq"?
>
> "acpi_cppc" is not a scaling driver option.

Ionela, yes. Sorry about that.

> So your CPUs run at frequencies between 200MHz and 280MHz?

2000 to 2800 MHz.

> Based on your acpi_cppc information below I would have assumed 2GHz as
> lowest nonlinear and 2.8GHz as nominal. The reason for this is that
> according to the ACPI spec the frequency values in the _CPC objects are
> supposed to be in MHz, so 2800 MHz for nominal frequency would be
> 2.8GHz.
>
> When you try more governors, make sure to check out the difference
> between scaling_cur_freq and cpuinfo_cur_freq at [2]. The first gives
> you the frequency that the governor (schedutil) is asking for, while the
> second is giving you the current frequency obtained from the counters.
>
> So to check the actual frequency the cores are running at, please check
> cpuinfo_cur_freq.

The problem is that all CPUs are never scaling down. "cpuinfo_cur_freq" and "scaling_cur_freq" are always the 2800 MHz on all CPUs on this idle system. This looks like a regression somewhere as in 5.4-based kernel, I can see "cpuinfo_cur_freq" can go down to 2000 MHz in the same scenario. I'll bisect a bit unless you have better ideas?

2021-06-25 14:40:00

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

Hey,

On Friday 25 Jun 2021 at 09:31:58 (-0400), Qian Cai wrote:
>
>
> On 6/25/2021 6:21 AM, Ionela Voinescu wrote:
> >> scaling_driver: acpi_cppc
> > ^^^^^^^^^
> > I suppose you mean "cppc-cpufreq"?
> >
> > "acpi_cppc" is not a scaling driver option.
>
> Ionela, yes. Sorry about that.
>
> > So your CPUs run at frequencies between 200MHz and 280MHz?
>
> 2000 to 2800 MHz.
>

Thank you for the clarification.

> > Based on your acpi_cppc information below I would have assumed 2GHz as
> > lowest nonlinear and 2.8GHz as nominal. The reason for this is that
> > according to the ACPI spec the frequency values in the _CPC objects are
> > supposed to be in MHz, so 2800 MHz for nominal frequency would be
> > 2.8GHz.
> >
> > When you try more governors, make sure to check out the difference
> > between scaling_cur_freq and cpuinfo_cur_freq at [2]. The first gives
> > you the frequency that the governor (schedutil) is asking for, while the
> > second is giving you the current frequency obtained from the counters.
> >
> > So to check the actual frequency the cores are running at, please check
> > cpuinfo_cur_freq.
>
> The problem is that all CPUs are never scaling down. "cpuinfo_cur_freq"
> and "scaling_cur_freq" are always the 2800 MHz on all CPUs on this idle
> system. This looks like a regression somewhere as in 5.4-based kernel,
> I can see "cpuinfo_cur_freq" can go down to 2000 MHz in the same
> scenario. I'll bisect a bit unless you have better ideas?

Quick questions for you:

1. When you say you tried a 5.4 kernel, did you try it with these
patches backported? They also have some dependencies with the recent
changes in the arch topology driver and cpufreq so they would not be
straight forward to backport.

If the 5.4 kernel you tried did not have these patches, it might be best
to try next/master that has these patches, but with
CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that
an incorrect frequency scale factor here would affect utilization that
would then affect the schedutil frequency selection. I would not expect
this behavior even if the scale factor was wrong, but it would be good
to rule out.

2. Is your platform booting with all CPUs? Are any hotplug operations
done in your scenario?

Thanks,
Ionela.

2021-06-25 16:55:52

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance

On 25-06-21, 09:54, Ionela Voinescu wrote:
> Hey,
>
> On Thursday 24 Jun 2021 at 18:34:18 (+0530), Viresh Kumar wrote:
> > On 24-06-21, 10:48, Ionela Voinescu wrote:
> > > On Monday 21 Jun 2021 at 14:49:37 (+0530), Viresh Kumar wrote:
> > > > The Frequency Invariance Engine (FIE) is providing a frequency scaling
> > > > correction factor that helps achieve more accurate load-tracking.
> > > [..]
> > > > +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> > > > +{
> > > > + struct cppc_freq_invariance *cppc_fi;
> > > > + int cpu;
> > > > +
> > > > + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > > > + return;
> > > > +
> > > > + /* policy->cpus will be empty here, use related_cpus instead */
> > > > + topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> > > > +
> > > > + for_each_cpu(cpu, policy->related_cpus) {
> > > > + cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> > >
> > > Do you think it might be worth having here something like:
> > >
> > > if (!cppc_fi->cpu_data)
> > > continue;
> > >
> > > This would be to protect against cases where the platform does not boot
> > > with all CPUs or the module is loaded after some have already been
> > > offlined. Unlikely, but..
> >
> > Even in that case policy->cpus will contain all offline+online CPUs (at ->init()
> > time), isn't it ?
> >
>
> Right, my bad. I missed cpumask_and(policy->cpus, policy->cpus,
> cpu_online_mask) being done after init(). It logically seems a bit
> wrong, but drivers are in control of setting policy->cpus and acting on
> it, and in this case the driver does the right thing.

Do you want me to re-add your Reviewed-by here ?

--
viresh

2021-06-25 16:58:12

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 6/25/2021 10:37 AM, Ionela Voinescu wrote:
> Quick questions for you:
>
> 1. When you say you tried a 5.4 kernel, did you try it with these
> patches backported? They also have some dependencies with the recent
> changes in the arch topology driver and cpufreq so they would not be
> straight forward to backport.

No. It turned out that this 5.4-based kernel has "ondemand" governor by default which works fine which could even scale down to the lowest_perf (1000 MHz). Once switched the governor to "schedutil", it could keep the freq to the lowest. While on the latest kernel, it also works fine by using "ondemand" first and then switch to "schedutil". Even though it can only scale down to lowest_nonlinear_perf (2000 MHz). It is more of that using "schedutil" by default would not work. Also, on the latest kernel, even "userspace" governor only allows to scale down to 2000 MHz.

> If the 5.4 kernel you tried did not have these patches, it might be best
> to try next/master that has these patches, but with
> CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that
> an incorrect frequency scale factor here would affect utilization that
> would then affect the schedutil frequency selection. I would not expect
> this behavior even if the scale factor was wrong, but it would be good
> to rule out.

I'll try that at least see if CONFIG_ACPI_CPPC_CPUFREQ_FIE=n would make the latest kernel to be able to scale down to 1000 MHz.

> 2. Is your platform booting with all CPUs? Are any hotplug operations
> done in your scenario?

Yes, booting with all CPUs. No additional hotplug there.

2021-06-26 02:33:59

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 6/25/2021 10:37 AM, Ionela Voinescu wrote:
> Quick questions for you:
>
> 1. When you say you tried a 5.4 kernel, did you try it with these
> patches backported? They also have some dependencies with the recent
> changes in the arch topology driver and cpufreq so they would not be
> straight forward to backport.
>
> If the 5.4 kernel you tried did not have these patches, it might be best
> to try next/master that has these patches, but with
> CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that
> an incorrect frequency scale factor here would affect utilization that
> would then affect the schedutil frequency selection. I would not expect
> this behavior even if the scale factor was wrong, but it would be good
> to rule out.
>
> 2. Is your platform booting with all CPUs? Are any hotplug operations
> done in your scenario?

Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m will fix the previous mentioned issues here (any explanations of that?) even though the scaling down is not perfect. Now, we have the following on this idle system:

# cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
79 1000000
1 1160000
73 1400000
1 2000000
4 2010000
1 2800000
1 860000

Even if I rerun a few times, there could still have a few CPUs running lower than lowest_perf (1GHz). Also, even though I set all CPUs to use "userspace" governor and set freq to the lowest. A few CPUs keep changing at will.

# cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
156 1000000
3 2000000
1 760000

2021-06-26 13:50:16

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 6/25/2021 10:29 PM, Qian Cai wrote:
> Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m will fix the previous mentioned issues here (any explanations of that?) even though the scaling down is not perfect. Now, we have the following on this idle system:
>
> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 79 1000000
> 1 1160000
> 73 1400000
> 1 2000000
> 4 2010000
> 1 2800000
> 1 860000
>
> Even if I rerun a few times, there could still have a few CPUs running lower than lowest_perf (1GHz). Also, even though I set all CPUs to use "userspace" governor and set freq to the lowest. A few CPUs keep changing at will.
>
> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 156 1000000
> 3 2000000
> 1 760000

Another date point is that set ACPI_CPPC_CPUFREQ_FIE=n fixed the issue that any CPU could run below the lowest freq.

schedutil:
# cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
80 1000000
78 1400000
1 2010000
1 2800000

userspace:
# cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
158 1000000
2 2000000

2021-06-28 10:51:17

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance

On Friday 25 Jun 2021 at 22:24:18 (+0530), Viresh Kumar wrote:
> On 25-06-21, 09:54, Ionela Voinescu wrote:
> > Hey,
> >
> > On Thursday 24 Jun 2021 at 18:34:18 (+0530), Viresh Kumar wrote:
> > > On 24-06-21, 10:48, Ionela Voinescu wrote:
> > > > On Monday 21 Jun 2021 at 14:49:37 (+0530), Viresh Kumar wrote:
> > > > > The Frequency Invariance Engine (FIE) is providing a frequency scaling
> > > > > correction factor that helps achieve more accurate load-tracking.
> > > > [..]
> > > > > +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> > > > > +{
> > > > > + struct cppc_freq_invariance *cppc_fi;
> > > > > + int cpu;
> > > > > +
> > > > > + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > > > > + return;
> > > > > +
> > > > > + /* policy->cpus will be empty here, use related_cpus instead */
> > > > > + topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> > > > > +
> > > > > + for_each_cpu(cpu, policy->related_cpus) {
> > > > > + cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> > > >
> > > > Do you think it might be worth having here something like:
> > > >
> > > > if (!cppc_fi->cpu_data)
> > > > continue;
> > > >
> > > > This would be to protect against cases where the platform does not boot
> > > > with all CPUs or the module is loaded after some have already been
> > > > offlined. Unlikely, but..
> > >
> > > Even in that case policy->cpus will contain all offline+online CPUs (at ->init()
> > > time), isn't it ?
> > >
> >
> > Right, my bad. I missed cpumask_and(policy->cpus, policy->cpus,
> > cpu_online_mask) being done after init(). It logically seems a bit
> > wrong, but drivers are in control of setting policy->cpus and acting on
> > it, and in this case the driver does the right thing.
>
> Do you want me to re-add your Reviewed-by here ?
>

To be honest I would like to have more time on this before you merge the
set, to better understand Qian's results and some observations I have
for Thunder X2 (I will share in a bit).

For the code, I think it's fine. I have a single observation regarding
the following code:

> +static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
> +{
> + struct cppc_freq_invariance *cppc_fi;
> + int cpu, ret;
> +
> + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> + return;
> +
> + for_each_cpu(cpu, policy->cpus) {
> + cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> + cppc_fi->cpu = cpu;
> + cppc_fi->cpu_data = policy->driver_data;
> + kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> + init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> +
> + ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
> + if (ret) {
> + pr_warn("%s: failed to read perf counters for cpu:%d: %d\n",
> + __func__, cpu, ret);
> + return;
> + }

For this condition above, think about a scenario where reading counters
for offline CPUs returns an error. I'm not sure if that can happen, to
be honest. That would mean here that you will never initialise the freq
source unless all CPUs in the policy are online at policy creation.

My recommendation is to warn about the failed read of perf counters but
only return from this function if the target CPU is online as well when
reading counters fails.

This is probably a nit, so I'll let you decide if you want to do something
about this.

Thanks,
Ionela.

> + }
> +
> + /* Register for freq-invariance */
> + topology_set_scale_freq_source(&cppc_sftd, policy->cpus);
> +}

> --
> viresh

2021-06-28 11:56:00

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

Hi guys,

On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote:
> Hello,
>
> Changes since V2:
>
> - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it
> work using policy ->init() and exit() alone.
>
> - Two new cleanup patches 1/4 and 2/4.
>
> - Improved commit log of 3/4.
>
> - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's
> overlap (seen with Vincent's setup).
>

If you happen to have the data around, I would like to know more about
your observations on ThunderX2.

I tried ThunderX2 as well, with the following observations:

Booting with userspace governor and all CPUs online, the CPPC frequency
scale factor was all over the place (even much larger than 1024).

My initial assumptions:
- Counters do not behave properly in light of SMT
- Firmware does not do a good job to keep the reference and core
counters monotonic: save and restore at core off.

So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of
a single core (part of policy0). With this all works very well:

root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed
root@target:/sys/devices/system/cpu/cpufreq/policy0#
[ 1863.095370] CPU96: cppc scale: 697.
[ 1863.175370] CPU0: cppc scale: 492.
[ 1863.215367] CPU64: cppc scale: 492.
[ 1863.235366] CPU96: cppc scale: 492.
[ 1863.485368] CPU32: cppc scale: 492.

root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed
root@target:/sys/devices/system/cpu/cpufreq/policy0#
[ 1891.395363] CPU96: cppc scale: 558.
[ 1891.415362] CPU0: cppc scale: 595.
[ 1891.435362] CPU32: cppc scale: 615.
[ 1891.465363] CPU96: cppc scale: 635.
[ 1891.495361] CPU0: cppc scale: 673.
[ 1891.515360] CPU32: cppc scale: 703.
[ 1891.545360] CPU96: cppc scale: 738.
[ 1891.575360] CPU0: cppc scale: 779.
[ 1891.605360] CPU96: cppc scale: 829.
[ 1891.635360] CPU0: cppc scale: 879.

root@target:/sys/devices/system/cpu/cpufreq/policy0#
root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed
root@target:/sys/devices/system/cpu/cpufreq/policy0#
[ 1896.585363] CPU32: cppc scale: 1004.
[ 1896.675359] CPU64: cppc scale: 973.
[ 1896.715359] CPU0: cppc scale: 1024.

I'm doing a rate limited printk only for increase/decrease values over
64 in the scale factor value.

This showed me that SMT is handled properly.

Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor
stops being even close to correct, for example:

[238394.770328] CPU96: cppc scale: 22328.
[238395.628846] CPU96: cppc scale: 245.
[238516.087115] CPU96: cppc scale: 930.
[238523.385009] CPU96: cppc scale: 245.
[238538.767473] CPU96: cppc scale: 936.
[238538.867546] CPU96: cppc scale: 245.
[238599.367932] CPU97: cppc scale: 2728.
[238599.859865] CPU97: cppc scale: 452.
[238647.786284] CPU96: cppc scale: 1438.
[238669.604684] CPU96: cppc scale: 27306.
[238676.805049] CPU96: cppc scale: 245.
[238737.642902] CPU97: cppc scale: 2035.
[238737.664995] CPU97: cppc scale: 452.
[238788.066193] CPU96: cppc scale: 2749.
[238788.110192] CPU96: cppc scale: 245.
[238817.231659] CPU96: cppc scale: 2698.
[238818.083687] CPU96: cppc scale: 245.
[238845.466850] CPU97: cppc scale: 2990.
[238847.477805] CPU97: cppc scale: 452.
[238936.984107] CPU97: cppc scale: 1590.
[238937.029079] CPU97: cppc scale: 452.
[238979.052464] CPU97: cppc scale: 911.
[238980.900668] CPU97: cppc scale: 452.
[239149.587889] CPU96: cppc scale: 803.
[239151.085516] CPU96: cppc scale: 245.
[239303.871373] CPU64: cppc scale: 956.
[239303.906837] CPU64: cppc scale: 245.
[239308.666786] CPU96: cppc scale: 821.
[239319.440634] CPU96: cppc scale: 245.
[239389.978395] CPU97: cppc scale: 4229.
[239391.969562] CPU97: cppc scale: 452.
[239415.894738] CPU96: cppc scale: 630.
[239417.875326] CPU96: cppc scale: 245.

The counter values shown by feedback_ctrs do not seem monotonic even
when only core 0 threads are online.

ref:2812420736 del:166051103
ref:3683620736 del:641578595
ref:1049653440 del:1548202980
ref:2099053440 del:2120997459
ref:3185853440 del:2714205997
ref:712486144 del:3708490753
ref:3658438336 del:3401357212
ref:1570998080 del:2279728438

For now I was just wondering if you have seen the same and whether you
have an opinion on this.

> This is tested on my Hikey platform (without the actual read/write to
> performance counters), with this script for over an hour:
>
> while true; do
> for i in `seq 1 7`;
> do
> echo 0 > /sys/devices/system/cpu/cpu$i/online;
> done;
>
> for i in `seq 1 7`;
> do
> echo 1 > /sys/devices/system/cpu/cpu$i/online;
> done;
> done
>
>
> The same is done by Vincent on ThunderX2 and no issues were seen.

Hotplug worked fine for me as well on both platforms I tested (Juno R2
and ThunderX2).

Thanks,
Ionela.

2021-06-28 12:17:25

by Vincent Guittot

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <[email protected]> wrote:
>
> Hi guys,
>
> On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote:
> > Hello,
> >
> > Changes since V2:
> >
> > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it
> > work using policy ->init() and exit() alone.
> >
> > - Two new cleanup patches 1/4 and 2/4.
> >
> > - Improved commit log of 3/4.
> >
> > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's
> > overlap (seen with Vincent's setup).
> >
>
> If you happen to have the data around, I would like to know more about
> your observations on ThunderX2.
>
>
> I tried ThunderX2 as well, with the following observations:
>
> Booting with userspace governor and all CPUs online, the CPPC frequency
> scale factor was all over the place (even much larger than 1024).
>
> My initial assumptions:
> - Counters do not behave properly in light of SMT
> - Firmware does not do a good job to keep the reference and core
> counters monotonic: save and restore at core off.
>
> So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of
> a single core (part of policy0). With this all works very well:
>
> root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed
> root@target:/sys/devices/system/cpu/cpufreq/policy0#
> [ 1863.095370] CPU96: cppc scale: 697.
> [ 1863.175370] CPU0: cppc scale: 492.
> [ 1863.215367] CPU64: cppc scale: 492.
> [ 1863.235366] CPU96: cppc scale: 492.
> [ 1863.485368] CPU32: cppc scale: 492.
>
> root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed
> root@target:/sys/devices/system/cpu/cpufreq/policy0#
> [ 1891.395363] CPU96: cppc scale: 558.
> [ 1891.415362] CPU0: cppc scale: 595.
> [ 1891.435362] CPU32: cppc scale: 615.
> [ 1891.465363] CPU96: cppc scale: 635.
> [ 1891.495361] CPU0: cppc scale: 673.
> [ 1891.515360] CPU32: cppc scale: 703.
> [ 1891.545360] CPU96: cppc scale: 738.
> [ 1891.575360] CPU0: cppc scale: 779.
> [ 1891.605360] CPU96: cppc scale: 829.
> [ 1891.635360] CPU0: cppc scale: 879.
>
> root@target:/sys/devices/system/cpu/cpufreq/policy0#
> root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed
> root@target:/sys/devices/system/cpu/cpufreq/policy0#
> [ 1896.585363] CPU32: cppc scale: 1004.
> [ 1896.675359] CPU64: cppc scale: 973.
> [ 1896.715359] CPU0: cppc scale: 1024.
>
> I'm doing a rate limited printk only for increase/decrease values over
> 64 in the scale factor value.
>
> This showed me that SMT is handled properly.
>
> Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor
> stops being even close to correct, for example:
>
> [238394.770328] CPU96: cppc scale: 22328.
> [238395.628846] CPU96: cppc scale: 245.
> [238516.087115] CPU96: cppc scale: 930.
> [238523.385009] CPU96: cppc scale: 245.
> [238538.767473] CPU96: cppc scale: 936.
> [238538.867546] CPU96: cppc scale: 245.
> [238599.367932] CPU97: cppc scale: 2728.
> [238599.859865] CPU97: cppc scale: 452.
> [238647.786284] CPU96: cppc scale: 1438.
> [238669.604684] CPU96: cppc scale: 27306.
> [238676.805049] CPU96: cppc scale: 245.
> [238737.642902] CPU97: cppc scale: 2035.
> [238737.664995] CPU97: cppc scale: 452.
> [238788.066193] CPU96: cppc scale: 2749.
> [238788.110192] CPU96: cppc scale: 245.
> [238817.231659] CPU96: cppc scale: 2698.
> [238818.083687] CPU96: cppc scale: 245.
> [238845.466850] CPU97: cppc scale: 2990.
> [238847.477805] CPU97: cppc scale: 452.
> [238936.984107] CPU97: cppc scale: 1590.
> [238937.029079] CPU97: cppc scale: 452.
> [238979.052464] CPU97: cppc scale: 911.
> [238980.900668] CPU97: cppc scale: 452.
> [239149.587889] CPU96: cppc scale: 803.
> [239151.085516] CPU96: cppc scale: 245.
> [239303.871373] CPU64: cppc scale: 956.
> [239303.906837] CPU64: cppc scale: 245.
> [239308.666786] CPU96: cppc scale: 821.
> [239319.440634] CPU96: cppc scale: 245.
> [239389.978395] CPU97: cppc scale: 4229.
> [239391.969562] CPU97: cppc scale: 452.
> [239415.894738] CPU96: cppc scale: 630.
> [239417.875326] CPU96: cppc scale: 245.
>

With the counter being 32bits and the freq scaling being update at
tick, you can easily get a overflow on it in idle system. I can easily
imagine that when you unplug CPUs there is enough activity on the CPU
to update it regularly whereas with all CPUs, the idle time is longer
that the counter overflow

> The counter values shown by feedback_ctrs do not seem monotonic even
> when only core 0 threads are online.
>
> ref:2812420736 del:166051103
> ref:3683620736 del:641578595
> ref:1049653440 del:1548202980
> ref:2099053440 del:2120997459
> ref:3185853440 del:2714205997
> ref:712486144 del:3708490753
> ref:3658438336 del:3401357212
> ref:1570998080 del:2279728438
>
> For now I was just wondering if you have seen the same and whether you
> have an opinion on this.
>
> > This is tested on my Hikey platform (without the actual read/write to
> > performance counters), with this script for over an hour:
> >
> > while true; do
> > for i in `seq 1 7`;
> > do
> > echo 0 > /sys/devices/system/cpu/cpu$i/online;
> > done;
> >
> > for i in `seq 1 7`;
> > do
> > echo 1 > /sys/devices/system/cpu/cpu$i/online;
> > done;
> > done
> >
> >
> > The same is done by Vincent on ThunderX2 and no issues were seen.
>
> Hotplug worked fine for me as well on both platforms I tested (Juno R2
> and ThunderX2).
>
> Thanks,
> Ionela.

2021-06-28 12:20:16

by Vincent Guittot

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On Mon, 28 Jun 2021 at 14:14, Vincent Guittot
<[email protected]> wrote:
>
> On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <[email protected]> wrote:
> >
> > Hi guys,
> >
> > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote:
> > > Hello,
> > >
> > > Changes since V2:
> > >
> > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it
> > > work using policy ->init() and exit() alone.
> > >
> > > - Two new cleanup patches 1/4 and 2/4.
> > >
> > > - Improved commit log of 3/4.
> > >
> > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's
> > > overlap (seen with Vincent's setup).
> > >
> >
> > If you happen to have the data around, I would like to know more about
> > your observations on ThunderX2.
> >
> >
> > I tried ThunderX2 as well, with the following observations:
> >
> > Booting with userspace governor and all CPUs online, the CPPC frequency
> > scale factor was all over the place (even much larger than 1024).
> >
> > My initial assumptions:
> > - Counters do not behave properly in light of SMT
> > - Firmware does not do a good job to keep the reference and core
> > counters monotonic: save and restore at core off.
> >
> > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of
> > a single core (part of policy0). With this all works very well:
> >
> > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > [ 1863.095370] CPU96: cppc scale: 697.
> > [ 1863.175370] CPU0: cppc scale: 492.
> > [ 1863.215367] CPU64: cppc scale: 492.
> > [ 1863.235366] CPU96: cppc scale: 492.
> > [ 1863.485368] CPU32: cppc scale: 492.
> >
> > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > [ 1891.395363] CPU96: cppc scale: 558.
> > [ 1891.415362] CPU0: cppc scale: 595.
> > [ 1891.435362] CPU32: cppc scale: 615.
> > [ 1891.465363] CPU96: cppc scale: 635.
> > [ 1891.495361] CPU0: cppc scale: 673.
> > [ 1891.515360] CPU32: cppc scale: 703.
> > [ 1891.545360] CPU96: cppc scale: 738.
> > [ 1891.575360] CPU0: cppc scale: 779.
> > [ 1891.605360] CPU96: cppc scale: 829.
> > [ 1891.635360] CPU0: cppc scale: 879.
> >
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > [ 1896.585363] CPU32: cppc scale: 1004.
> > [ 1896.675359] CPU64: cppc scale: 973.
> > [ 1896.715359] CPU0: cppc scale: 1024.
> >
> > I'm doing a rate limited printk only for increase/decrease values over
> > 64 in the scale factor value.
> >
> > This showed me that SMT is handled properly.
> >
> > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor
> > stops being even close to correct, for example:
> >
> > [238394.770328] CPU96: cppc scale: 22328.
> > [238395.628846] CPU96: cppc scale: 245.
> > [238516.087115] CPU96: cppc scale: 930.
> > [238523.385009] CPU96: cppc scale: 245.
> > [238538.767473] CPU96: cppc scale: 936.
> > [238538.867546] CPU96: cppc scale: 245.
> > [238599.367932] CPU97: cppc scale: 2728.
> > [238599.859865] CPU97: cppc scale: 452.
> > [238647.786284] CPU96: cppc scale: 1438.
> > [238669.604684] CPU96: cppc scale: 27306.
> > [238676.805049] CPU96: cppc scale: 245.
> > [238737.642902] CPU97: cppc scale: 2035.
> > [238737.664995] CPU97: cppc scale: 452.
> > [238788.066193] CPU96: cppc scale: 2749.
> > [238788.110192] CPU96: cppc scale: 245.
> > [238817.231659] CPU96: cppc scale: 2698.
> > [238818.083687] CPU96: cppc scale: 245.
> > [238845.466850] CPU97: cppc scale: 2990.
> > [238847.477805] CPU97: cppc scale: 452.
> > [238936.984107] CPU97: cppc scale: 1590.
> > [238937.029079] CPU97: cppc scale: 452.
> > [238979.052464] CPU97: cppc scale: 911.
> > [238980.900668] CPU97: cppc scale: 452.
> > [239149.587889] CPU96: cppc scale: 803.
> > [239151.085516] CPU96: cppc scale: 245.
> > [239303.871373] CPU64: cppc scale: 956.
> > [239303.906837] CPU64: cppc scale: 245.
> > [239308.666786] CPU96: cppc scale: 821.
> > [239319.440634] CPU96: cppc scale: 245.
> > [239389.978395] CPU97: cppc scale: 4229.
> > [239391.969562] CPU97: cppc scale: 452.
> > [239415.894738] CPU96: cppc scale: 630.
> > [239417.875326] CPU96: cppc scale: 245.
> >
>
> With the counter being 32bits and the freq scaling being update at
> tick, you can easily get a overflow on it in idle system. I can easily
> imagine that when you unplug CPUs there is enough activity on the CPU
> to update it regularly whereas with all CPUs, the idle time is longer
> that the counter overflow
>
> > The counter values shown by feedback_ctrs do not seem monotonic even
> > when only core 0 threads are online.
> >
> > ref:2812420736 del:166051103
> > ref:3683620736 del:641578595
> > ref:1049653440 del:1548202980
> > ref:2099053440 del:2120997459
> > ref:3185853440 del:2714205997
> > ref:712486144 del:3708490753
> > ref:3658438336 del:3401357212
> > ref:1570998080 del:2279728438

There are 32bits and the overflow need to be handled by cppc_cpufreq driver

> >
> > For now I was just wondering if you have seen the same and whether you
> > have an opinion on this.
> >
> > > This is tested on my Hikey platform (without the actual read/write to
> > > performance counters), with this script for over an hour:
> > >
> > > while true; do
> > > for i in `seq 1 7`;
> > > do
> > > echo 0 > /sys/devices/system/cpu/cpu$i/online;
> > > done;
> > >
> > > for i in `seq 1 7`;
> > > do
> > > echo 1 > /sys/devices/system/cpu/cpu$i/online;
> > > done;
> > > done
> > >
> > >
> > > The same is done by Vincent on ThunderX2 and no issues were seen.
> >
> > Hotplug worked fine for me as well on both platforms I tested (Juno R2
> > and ThunderX2).
> >
> > Thanks,
> > Ionela.

2021-06-28 13:09:59

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On Monday 28 Jun 2021 at 14:14:14 (+0200), Vincent Guittot wrote:
> On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <[email protected]> wrote:
> >
> > Hi guys,
> >
> > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote:
> > > Hello,
> > >
> > > Changes since V2:
> > >
> > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it
> > > work using policy ->init() and exit() alone.
> > >
> > > - Two new cleanup patches 1/4 and 2/4.
> > >
> > > - Improved commit log of 3/4.
> > >
> > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's
> > > overlap (seen with Vincent's setup).
> > >
> >
> > If you happen to have the data around, I would like to know more about
> > your observations on ThunderX2.
> >
> >
> > I tried ThunderX2 as well, with the following observations:
> >
> > Booting with userspace governor and all CPUs online, the CPPC frequency
> > scale factor was all over the place (even much larger than 1024).
> >
> > My initial assumptions:
> > - Counters do not behave properly in light of SMT
> > - Firmware does not do a good job to keep the reference and core
> > counters monotonic: save and restore at core off.
> >
> > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of
> > a single core (part of policy0). With this all works very well:
> >
> > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > [ 1863.095370] CPU96: cppc scale: 697.
> > [ 1863.175370] CPU0: cppc scale: 492.
> > [ 1863.215367] CPU64: cppc scale: 492.
> > [ 1863.235366] CPU96: cppc scale: 492.
> > [ 1863.485368] CPU32: cppc scale: 492.
> >
> > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > [ 1891.395363] CPU96: cppc scale: 558.
> > [ 1891.415362] CPU0: cppc scale: 595.
> > [ 1891.435362] CPU32: cppc scale: 615.
> > [ 1891.465363] CPU96: cppc scale: 635.
> > [ 1891.495361] CPU0: cppc scale: 673.
> > [ 1891.515360] CPU32: cppc scale: 703.
> > [ 1891.545360] CPU96: cppc scale: 738.
> > [ 1891.575360] CPU0: cppc scale: 779.
> > [ 1891.605360] CPU96: cppc scale: 829.
> > [ 1891.635360] CPU0: cppc scale: 879.
> >
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > [ 1896.585363] CPU32: cppc scale: 1004.
> > [ 1896.675359] CPU64: cppc scale: 973.
> > [ 1896.715359] CPU0: cppc scale: 1024.
> >
> > I'm doing a rate limited printk only for increase/decrease values over
> > 64 in the scale factor value.
> >
> > This showed me that SMT is handled properly.
> >
> > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor
> > stops being even close to correct, for example:
> >
> > [238394.770328] CPU96: cppc scale: 22328.
> > [238395.628846] CPU96: cppc scale: 245.
> > [238516.087115] CPU96: cppc scale: 930.
> > [238523.385009] CPU96: cppc scale: 245.
> > [238538.767473] CPU96: cppc scale: 936.
> > [238538.867546] CPU96: cppc scale: 245.
> > [238599.367932] CPU97: cppc scale: 2728.
> > [238599.859865] CPU97: cppc scale: 452.
> > [238647.786284] CPU96: cppc scale: 1438.
> > [238669.604684] CPU96: cppc scale: 27306.
> > [238676.805049] CPU96: cppc scale: 245.
> > [238737.642902] CPU97: cppc scale: 2035.
> > [238737.664995] CPU97: cppc scale: 452.
> > [238788.066193] CPU96: cppc scale: 2749.
> > [238788.110192] CPU96: cppc scale: 245.
> > [238817.231659] CPU96: cppc scale: 2698.
> > [238818.083687] CPU96: cppc scale: 245.
> > [238845.466850] CPU97: cppc scale: 2990.
> > [238847.477805] CPU97: cppc scale: 452.
> > [238936.984107] CPU97: cppc scale: 1590.
> > [238937.029079] CPU97: cppc scale: 452.
> > [238979.052464] CPU97: cppc scale: 911.
> > [238980.900668] CPU97: cppc scale: 452.
> > [239149.587889] CPU96: cppc scale: 803.
> > [239151.085516] CPU96: cppc scale: 245.
> > [239303.871373] CPU64: cppc scale: 956.
> > [239303.906837] CPU64: cppc scale: 245.
> > [239308.666786] CPU96: cppc scale: 821.
> > [239319.440634] CPU96: cppc scale: 245.
> > [239389.978395] CPU97: cppc scale: 4229.
> > [239391.969562] CPU97: cppc scale: 452.
> > [239415.894738] CPU96: cppc scale: 630.
> > [239417.875326] CPU96: cppc scale: 245.
> >
>
> With the counter being 32bits and the freq scaling being update at
> tick, you can easily get a overflow on it in idle system. I can easily
> imagine that when you unplug CPUs there is enough activity on the CPU
> to update it regularly whereas with all CPUs, the idle time is longer
> that the counter overflow
>

Thanks! Yes, given the high wraparound time I thought they were 64 bit.
All variables in software are 64 bit, but looking at bit width in the
_CPC entries, the platform counters are 32 bit counters.

> There are 32bits and the overflow need to be handled by cppc_cpufreq
> driver

I'm wondering if this would be best handled in the function that reads
the counters or in the cppc_cpufreq driver that uses them. Probably the
latter, as you say, as the read function should only return the raw
values, but it does complicate things.

Thanks,
Ionela.

> > The counter values shown by feedback_ctrs do not seem monotonic even
> > when only core 0 threads are online.
> >
> > ref:2812420736 del:166051103
> > ref:3683620736 del:641578595
> > ref:1049653440 del:1548202980
> > ref:2099053440 del:2120997459
> > ref:3185853440 del:2714205997
> > ref:712486144 del:3708490753
> > ref:3658438336 del:3401357212
> > ref:1570998080 del:2279728438
> >
> > For now I was just wondering if you have seen the same and whether you
> > have an opinion on this.
> >
> > > This is tested on my Hikey platform (without the actual read/write to
> > > performance counters), with this script for over an hour:
> > >
> > > while true; do
> > > for i in `seq 1 7`;
> > > do
> > > echo 0 > /sys/devices/system/cpu/cpu$i/online;
> > > done;
> > >
> > > for i in `seq 1 7`;
> > > do
> > > echo 1 > /sys/devices/system/cpu/cpu$i/online;
> > > done;
> > > done
> > >
> > >
> > > The same is done by Vincent on ThunderX2 and no issues were seen.
> >
> > Hotplug worked fine for me as well on both platforms I tested (Juno R2
> > and ThunderX2).
> >
> > Thanks,
> > Ionela.

2021-06-29 00:35:00

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On Monday 28 Jun 2021 at 14:08:13 (+0100), Ionela Voinescu wrote:
> On Monday 28 Jun 2021 at 14:14:14 (+0200), Vincent Guittot wrote:
> > On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <[email protected]> wrote:
> > >
> > > Hi guys,
> > >
> > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote:
> > > > Hello,
> > > >
> > > > Changes since V2:
> > > >
> > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it
> > > > work using policy ->init() and exit() alone.
> > > >
> > > > - Two new cleanup patches 1/4 and 2/4.
> > > >
> > > > - Improved commit log of 3/4.
> > > >
> > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's
> > > > overlap (seen with Vincent's setup).
> > > >
> > >
> > > If you happen to have the data around, I would like to know more about
> > > your observations on ThunderX2.
> > >
> > >
> > > I tried ThunderX2 as well, with the following observations:
> > >
> > > Booting with userspace governor and all CPUs online, the CPPC frequency
> > > scale factor was all over the place (even much larger than 1024).
> > >
> > > My initial assumptions:
> > > - Counters do not behave properly in light of SMT
> > > - Firmware does not do a good job to keep the reference and core
> > > counters monotonic: save and restore at core off.
> > >
> > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of
> > > a single core (part of policy0). With this all works very well:
> > >
> > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed
> > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > [ 1863.095370] CPU96: cppc scale: 697.
> > > [ 1863.175370] CPU0: cppc scale: 492.
> > > [ 1863.215367] CPU64: cppc scale: 492.
> > > [ 1863.235366] CPU96: cppc scale: 492.
> > > [ 1863.485368] CPU32: cppc scale: 492.
> > >
> > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed
> > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > [ 1891.395363] CPU96: cppc scale: 558.
> > > [ 1891.415362] CPU0: cppc scale: 595.
> > > [ 1891.435362] CPU32: cppc scale: 615.
> > > [ 1891.465363] CPU96: cppc scale: 635.
> > > [ 1891.495361] CPU0: cppc scale: 673.
> > > [ 1891.515360] CPU32: cppc scale: 703.
> > > [ 1891.545360] CPU96: cppc scale: 738.
> > > [ 1891.575360] CPU0: cppc scale: 779.
> > > [ 1891.605360] CPU96: cppc scale: 829.
> > > [ 1891.635360] CPU0: cppc scale: 879.
> > >
> > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed
> > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > [ 1896.585363] CPU32: cppc scale: 1004.
> > > [ 1896.675359] CPU64: cppc scale: 973.
> > > [ 1896.715359] CPU0: cppc scale: 1024.
> > >
> > > I'm doing a rate limited printk only for increase/decrease values over
> > > 64 in the scale factor value.
> > >
> > > This showed me that SMT is handled properly.
> > >
> > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor
> > > stops being even close to correct, for example:
> > >
> > > [238394.770328] CPU96: cppc scale: 22328.
> > > [238395.628846] CPU96: cppc scale: 245.
> > > [238516.087115] CPU96: cppc scale: 930.
> > > [238523.385009] CPU96: cppc scale: 245.
> > > [238538.767473] CPU96: cppc scale: 936.
> > > [238538.867546] CPU96: cppc scale: 245.
> > > [238599.367932] CPU97: cppc scale: 2728.
> > > [238599.859865] CPU97: cppc scale: 452.
> > > [238647.786284] CPU96: cppc scale: 1438.
> > > [238669.604684] CPU96: cppc scale: 27306.
> > > [238676.805049] CPU96: cppc scale: 245.
> > > [238737.642902] CPU97: cppc scale: 2035.
> > > [238737.664995] CPU97: cppc scale: 452.
> > > [238788.066193] CPU96: cppc scale: 2749.
> > > [238788.110192] CPU96: cppc scale: 245.
> > > [238817.231659] CPU96: cppc scale: 2698.
> > > [238818.083687] CPU96: cppc scale: 245.
> > > [238845.466850] CPU97: cppc scale: 2990.
> > > [238847.477805] CPU97: cppc scale: 452.
> > > [238936.984107] CPU97: cppc scale: 1590.
> > > [238937.029079] CPU97: cppc scale: 452.
> > > [238979.052464] CPU97: cppc scale: 911.
> > > [238980.900668] CPU97: cppc scale: 452.
> > > [239149.587889] CPU96: cppc scale: 803.
> > > [239151.085516] CPU96: cppc scale: 245.
> > > [239303.871373] CPU64: cppc scale: 956.
> > > [239303.906837] CPU64: cppc scale: 245.
> > > [239308.666786] CPU96: cppc scale: 821.
> > > [239319.440634] CPU96: cppc scale: 245.
> > > [239389.978395] CPU97: cppc scale: 4229.
> > > [239391.969562] CPU97: cppc scale: 452.
> > > [239415.894738] CPU96: cppc scale: 630.
> > > [239417.875326] CPU96: cppc scale: 245.
> > >
> >
> > With the counter being 32bits and the freq scaling being update at
> > tick, you can easily get a overflow on it in idle system. I can easily
> > imagine that when you unplug CPUs there is enough activity on the CPU
> > to update it regularly whereas with all CPUs, the idle time is longer
> > that the counter overflow

For sane counters, how long the CPU is idle should not matter (please
see below my definition of sane counters).

> >
>
> Thanks! Yes, given the high wraparound time I thought they were 64 bit.
> All variables in software are 64 bit, but looking at bit width in the
> _CPC entries, the platform counters are 32 bit counters.
>

I've looked a bit more over the code, and considering this particular
system (32 bit counters, maximum frequency of CPUs = 2.2GHz), I believe
the wraparound is considered, and this should not cause these strange
values in the scale factor.

I consider the counters sane if both stop during idle - either they stop
when CPU is clock gated, or some firmware does save/restore at core off.
Therefore, in all idle cases they seem to have stopped, from the point of
view of the OS. The ACPI spec mentions that both count "any time the
processor is active".

After the cores return from idle, the counters will wraparound at a
minimum of 1.95s. So with a tick every 4ms at most 1 wraparound would
have happened which allows the getDelta() function in cppc_cpufreq
driver to get the proper difference in values.

Let me know what you think.

Thanks,
Ionela.

> > There are 32bits and the overflow need to be handled by cppc_cpufreq
> > driver
>
> I'm wondering if this would be best handled in the function that reads
> the counters or in the cppc_cpufreq driver that uses them. Probably the
> latter, as you say, as the read function should only return the raw
> values, but it does complicate things.
>
> Thanks,
> Ionela.
>
>
>
> > > The counter values shown by feedback_ctrs do not seem monotonic even
> > > when only core 0 threads are online.
> > >
> > > ref:2812420736 del:166051103
> > > ref:3683620736 del:641578595
> > > ref:1049653440 del:1548202980
> > > ref:2099053440 del:2120997459
> > > ref:3185853440 del:2714205997
> > > ref:712486144 del:3708490753
> > > ref:3658438336 del:3401357212
> > > ref:1570998080 del:2279728438
> > >
> > > For now I was just wondering if you have seen the same and whether you
> > > have an opinion on this.
> > >
> > > > This is tested on my Hikey platform (without the actual read/write to
> > > > performance counters), with this script for over an hour:
> > > >
> > > > while true; do
> > > > for i in `seq 1 7`;
> > > > do
> > > > echo 0 > /sys/devices/system/cpu/cpu$i/online;
> > > > done;
> > > >
> > > > for i in `seq 1 7`;
> > > > do
> > > > echo 1 > /sys/devices/system/cpu/cpu$i/online;
> > > > done;
> > > > done
> > > >
> > > >
> > > > The same is done by Vincent on ThunderX2 and no issues were seen.
> > >
> > > Hotplug worked fine for me as well on both platforms I tested (Juno R2
> > > and ThunderX2).
> > >
> > > Thanks,
> > > Ionela.

2021-06-29 04:35:18

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance

On 28-06-21, 11:49, Ionela Voinescu wrote:
> To be honest I would like to have more time on this before you merge the
> set, to better understand Qian's results and some observations I have
> for Thunder X2 (I will share in a bit).

Ideally, this code was already merged in 5.13 and would have required
us to fix any problems as we encounter them. I did revert it because
it caused a kernel crash and I wasn't sure if there was a sane/easy
way of fixing that so late in the release cycle. That was the right
thing to do then.

All those issues are gone now, we may have an issue around rounding of
counters or some hardware specific issues, it isn't clear yet.

But the stuff works fine otherwise, doesn't make the kernel crash and
it is controlled with a CONFIG_ option, so those who don't want to use
it can still disable it.

The merge window is here now, if we don't merge it now, it gets
delayed by a full cycle (roughly two months) and if we merge it now
and are able to narrow down the rounding issues, if there are any, we
will have full two months to make a fix for that and still push it in
5.14 itself.

And so I would like to get it merged in this merge window itself, it
also makes sure more people would get to test it, like Qian was able
to figure out a problem here for us.

> For the code, I think it's fine. I have a single observation regarding
> the following code:
>
> > +static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
> > +{
> > + struct cppc_freq_invariance *cppc_fi;
> > + int cpu, ret;
> > +
> > + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > + return;
> > +
> > + for_each_cpu(cpu, policy->cpus) {
> > + cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> > + cppc_fi->cpu = cpu;
> > + cppc_fi->cpu_data = policy->driver_data;
> > + kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> > + init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> > +
> > + ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
> > + if (ret) {
> > + pr_warn("%s: failed to read perf counters for cpu:%d: %d\n",
> > + __func__, cpu, ret);
> > + return;
> > + }
>
> For this condition above, think about a scenario where reading counters
> for offline CPUs returns an error. I'm not sure if that can happen, to
> be honest. That would mean here that you will never initialise the freq
> source unless all CPUs in the policy are online at policy creation.
>
> My recommendation is to warn about the failed read of perf counters but
> only return from this function if the target CPU is online as well when
> reading counters fails.
>
> This is probably a nit, so I'll let you decide if you want to do something
> about this.

That is a very good observation actually. Thanks for that. This is how
I fixed it.

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index d688877e8fbe..f6540068d0fe 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -171,7 +171,13 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
if (ret) {
pr_warn("%s: failed to read perf counters for cpu:%d: %d\n",
__func__, cpu, ret);
- return;
+
+ /*
+ * Don't abort if the CPU was offline while the driver
+ * was getting registered.
+ */
+ if (cpu_online(cpu))
+ return;
}
}

--
viresh

2021-06-29 04:47:13

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 25-06-21, 09:31, Qian Cai wrote:
> The problem is that all CPUs are never scaling down.
> "cpuinfo_cur_freq" and "scaling_cur_freq" are always the 2800 MHz on
> all CPUs on this idle system. This looks like a regression somewhere
> as in 5.4-based kernel, I can see "cpuinfo_cur_freq" can go down to
> 2000 MHz in the same scenario. I'll bisect a bit unless you have
> better ideas?

Few things which may let us understand the readings properly.

- cpuinfo_cur_freq: eventually makes a call to cppc_cpufreq_get_rate()
and returns the *actual* frequency hardware is running at (based on
counter diff around 2 us delay).

- scaling_cur_freq: is the frequency the cpufreq core thinks the
hardware is running at, it would more in sync with what schedutil
(or other governors) wants the CPU to run at. This can be different
from what the hardware is running at, i.e. given by
cpuinfo_cur_freq.

--
viresh

2021-06-29 04:53:58

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 25-06-21, 22:29, Qian Cai wrote:
> Ionela, I found that set ACPI_PROCESSOR=y instead of
> ACPI_PROCESSOR=m will fix the previous mentioned issues here (any
> explanations of that?) even though the scaling down is not perfect.

Not sure how this affects it.

> Now, we have the following on this idle system:
>
> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 79 1000000
> 1 1160000
> 73 1400000
> 1 2000000
> 4 2010000
> 1 2800000
> 1 860000
>
> Even if I rerun a few times, there could still have a few CPUs
> running lower than lowest_perf (1GHz).

(Please wrap your lines at 80 columns, it makes it harder to read
otherwise).

I think only the counters stopping on idle can get us that.

> Also, even though I set all CPUs to use "userspace" governor and set
> freq to the lowest. A few CPUs keep changing at will.
>
> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 156 1000000
> 3 2000000
> 1 760000

I think this is expected since the hardware is in control of frequency
here. The software can only request it to run at X frequency, the
hardware may choose to do something else nevertheless.

--
viresh

2021-06-29 04:59:08

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 26-06-21, 09:41, Qian Cai wrote:
> Another date point is that set ACPI_CPPC_CPUFREQ_FIE=n fixed the issue that any CPU could run below the lowest freq.
>
> schedutil:
> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 80 1000000
> 78 1400000
> 1 2010000
> 1 2800000
>
> userspace:
> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 158 1000000
> 2 2000000

ACPI_CPPC_CPUFREQ_FIE can play a role with schedutil, but not with
userspace governor. Userspace doesn't use the values being updated by
ACPI_CPPC_CPUFREQ_FIE. So I think the CPUs may not have been idle,
just for some reason.

Also, now that you are able run on latest kernel (linux-next), it
would be better if we can talk in terms of that only going forward.
5.4 adds more to the already unstable results :)

--
viresh

2021-06-29 05:22:35

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 28-06-21, 12:54, Ionela Voinescu wrote:
> If you happen to have the data around, I would like to know more about
> your observations on ThunderX2.
>
>
> I tried ThunderX2 as well, with the following observations:
>
> Booting with userspace governor and all CPUs online, the CPPC frequency
> scale factor was all over the place (even much larger than 1024).
>
> My initial assumptions:
> - Counters do not behave properly in light of SMT
> - Firmware does not do a good job to keep the reference and core
> counters monotonic: save and restore at core off.
>
> So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of
> a single core (part of policy0). With this all works very well:

Interesting.

> root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed
> root@target:/sys/devices/system/cpu/cpufreq/policy0#
> [ 1863.095370] CPU96: cppc scale: 697.
> [ 1863.175370] CPU0: cppc scale: 492.
> [ 1863.215367] CPU64: cppc scale: 492.
> [ 1863.235366] CPU96: cppc scale: 492.
> [ 1863.485368] CPU32: cppc scale: 492.
>
> root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed
> root@target:/sys/devices/system/cpu/cpufreq/policy0#
> [ 1891.395363] CPU96: cppc scale: 558.
> [ 1891.415362] CPU0: cppc scale: 595.
> [ 1891.435362] CPU32: cppc scale: 615.
> [ 1891.465363] CPU96: cppc scale: 635.
> [ 1891.495361] CPU0: cppc scale: 673.
> [ 1891.515360] CPU32: cppc scale: 703.
> [ 1891.545360] CPU96: cppc scale: 738.
> [ 1891.575360] CPU0: cppc scale: 779.
> [ 1891.605360] CPU96: cppc scale: 829.
> [ 1891.635360] CPU0: cppc scale: 879.
>
> root@target:/sys/devices/system/cpu/cpufreq/policy0#
> root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed
> root@target:/sys/devices/system/cpu/cpufreq/policy0#
> [ 1896.585363] CPU32: cppc scale: 1004.
> [ 1896.675359] CPU64: cppc scale: 973.
> [ 1896.715359] CPU0: cppc scale: 1024.
>
> I'm doing a rate limited printk only for increase/decrease values over
> 64 in the scale factor value.
>
> This showed me that SMT is handled properly.
>
> Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor
> stops being even close to correct, for example:
>
> [238394.770328] CPU96: cppc scale: 22328.
> [238395.628846] CPU96: cppc scale: 245.
> [238516.087115] CPU96: cppc scale: 930.
> [238523.385009] CPU96: cppc scale: 245.
> [238538.767473] CPU96: cppc scale: 936.
> [238538.867546] CPU96: cppc scale: 245.
> [238599.367932] CPU97: cppc scale: 2728.
> [238599.859865] CPU97: cppc scale: 452.
> [238647.786284] CPU96: cppc scale: 1438.
> [238669.604684] CPU96: cppc scale: 27306.
> [238676.805049] CPU96: cppc scale: 245.
> [238737.642902] CPU97: cppc scale: 2035.
> [238737.664995] CPU97: cppc scale: 452.
> [238788.066193] CPU96: cppc scale: 2749.
> [238788.110192] CPU96: cppc scale: 245.
> [238817.231659] CPU96: cppc scale: 2698.
> [238818.083687] CPU96: cppc scale: 245.
> [238845.466850] CPU97: cppc scale: 2990.
> [238847.477805] CPU97: cppc scale: 452.
> [238936.984107] CPU97: cppc scale: 1590.
> [238937.029079] CPU97: cppc scale: 452.
> [238979.052464] CPU97: cppc scale: 911.
> [238980.900668] CPU97: cppc scale: 452.
> [239149.587889] CPU96: cppc scale: 803.
> [239151.085516] CPU96: cppc scale: 245.
> [239303.871373] CPU64: cppc scale: 956.
> [239303.906837] CPU64: cppc scale: 245.
> [239308.666786] CPU96: cppc scale: 821.
> [239319.440634] CPU96: cppc scale: 245.
> [239389.978395] CPU97: cppc scale: 4229.
> [239391.969562] CPU97: cppc scale: 452.
> [239415.894738] CPU96: cppc scale: 630.
> [239417.875326] CPU96: cppc scale: 245.
>
> The counter values shown by feedback_ctrs do not seem monotonic even
> when only core 0 threads are online.
>
> ref:2812420736 del:166051103
> ref:3683620736 del:641578595
> ref:1049653440 del:1548202980
> ref:2099053440 del:2120997459
> ref:3185853440 del:2714205997
> ref:712486144 del:3708490753
> ref:3658438336 del:3401357212
> ref:1570998080 del:2279728438
>
> For now I was just wondering if you have seen the same and whether you
> have an opinion on this.

I think we also saw numbers like this, which didn't explain a lot on
ThunderX2. We thought they may be due to rounding issues, but the
offlining stuff adds an interesting factor to that.

--
viresh

2021-06-29 08:46:14

by Vincent Guittot

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On Mon, 28 Jun 2021 at 23:37, Ionela Voinescu <[email protected]> wrote:
>
> On Monday 28 Jun 2021 at 14:08:13 (+0100), Ionela Voinescu wrote:
> > On Monday 28 Jun 2021 at 14:14:14 (+0200), Vincent Guittot wrote:
> > > On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <[email protected]> wrote:
> > > >
> > > > Hi guys,
> > > >
> > > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote:
> > > > > Hello,
> > > > >
> > > > > Changes since V2:
> > > > >
> > > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it
> > > > > work using policy ->init() and exit() alone.
> > > > >
> > > > > - Two new cleanup patches 1/4 and 2/4.
> > > > >
> > > > > - Improved commit log of 3/4.
> > > > >
> > > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's
> > > > > overlap (seen with Vincent's setup).
> > > > >
> > > >
> > > > If you happen to have the data around, I would like to know more about
> > > > your observations on ThunderX2.
> > > >
> > > >
> > > > I tried ThunderX2 as well, with the following observations:
> > > >
> > > > Booting with userspace governor and all CPUs online, the CPPC frequency
> > > > scale factor was all over the place (even much larger than 1024).
> > > >
> > > > My initial assumptions:
> > > > - Counters do not behave properly in light of SMT
> > > > - Firmware does not do a good job to keep the reference and core
> > > > counters monotonic: save and restore at core off.
> > > >
> > > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of
> > > > a single core (part of policy0). With this all works very well:
> > > >
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > > [ 1863.095370] CPU96: cppc scale: 697.
> > > > [ 1863.175370] CPU0: cppc scale: 492.
> > > > [ 1863.215367] CPU64: cppc scale: 492.
> > > > [ 1863.235366] CPU96: cppc scale: 492.
> > > > [ 1863.485368] CPU32: cppc scale: 492.
> > > >
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > > [ 1891.395363] CPU96: cppc scale: 558.
> > > > [ 1891.415362] CPU0: cppc scale: 595.
> > > > [ 1891.435362] CPU32: cppc scale: 615.
> > > > [ 1891.465363] CPU96: cppc scale: 635.
> > > > [ 1891.495361] CPU0: cppc scale: 673.
> > > > [ 1891.515360] CPU32: cppc scale: 703.
> > > > [ 1891.545360] CPU96: cppc scale: 738.
> > > > [ 1891.575360] CPU0: cppc scale: 779.
> > > > [ 1891.605360] CPU96: cppc scale: 829.
> > > > [ 1891.635360] CPU0: cppc scale: 879.
> > > >
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > > [ 1896.585363] CPU32: cppc scale: 1004.
> > > > [ 1896.675359] CPU64: cppc scale: 973.
> > > > [ 1896.715359] CPU0: cppc scale: 1024.
> > > >
> > > > I'm doing a rate limited printk only for increase/decrease values over
> > > > 64 in the scale factor value.
> > > >
> > > > This showed me that SMT is handled properly.
> > > >
> > > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor
> > > > stops being even close to correct, for example:
> > > >
> > > > [238394.770328] CPU96: cppc scale: 22328.
> > > > [238395.628846] CPU96: cppc scale: 245.
> > > > [238516.087115] CPU96: cppc scale: 930.
> > > > [238523.385009] CPU96: cppc scale: 245.
> > > > [238538.767473] CPU96: cppc scale: 936.
> > > > [238538.867546] CPU96: cppc scale: 245.
> > > > [238599.367932] CPU97: cppc scale: 2728.
> > > > [238599.859865] CPU97: cppc scale: 452.
> > > > [238647.786284] CPU96: cppc scale: 1438.
> > > > [238669.604684] CPU96: cppc scale: 27306.
> > > > [238676.805049] CPU96: cppc scale: 245.
> > > > [238737.642902] CPU97: cppc scale: 2035.
> > > > [238737.664995] CPU97: cppc scale: 452.
> > > > [238788.066193] CPU96: cppc scale: 2749.
> > > > [238788.110192] CPU96: cppc scale: 245.
> > > > [238817.231659] CPU96: cppc scale: 2698.
> > > > [238818.083687] CPU96: cppc scale: 245.
> > > > [238845.466850] CPU97: cppc scale: 2990.
> > > > [238847.477805] CPU97: cppc scale: 452.
> > > > [238936.984107] CPU97: cppc scale: 1590.
> > > > [238937.029079] CPU97: cppc scale: 452.
> > > > [238979.052464] CPU97: cppc scale: 911.
> > > > [238980.900668] CPU97: cppc scale: 452.
> > > > [239149.587889] CPU96: cppc scale: 803.
> > > > [239151.085516] CPU96: cppc scale: 245.
> > > > [239303.871373] CPU64: cppc scale: 956.
> > > > [239303.906837] CPU64: cppc scale: 245.
> > > > [239308.666786] CPU96: cppc scale: 821.
> > > > [239319.440634] CPU96: cppc scale: 245.
> > > > [239389.978395] CPU97: cppc scale: 4229.
> > > > [239391.969562] CPU97: cppc scale: 452.
> > > > [239415.894738] CPU96: cppc scale: 630.
> > > > [239417.875326] CPU96: cppc scale: 245.
> > > >
> > >
> > > With the counter being 32bits and the freq scaling being update at
> > > tick, you can easily get a overflow on it in idle system. I can easily
> > > imagine that when you unplug CPUs there is enough activity on the CPU
> > > to update it regularly whereas with all CPUs, the idle time is longer
> > > that the counter overflow
>
> For sane counters, how long the CPU is idle should not matter (please
> see below my definition of sane counters).
>
> > >
> >
> > Thanks! Yes, given the high wraparound time I thought they were 64 bit.
> > All variables in software are 64 bit, but looking at bit width in the
> > _CPC entries, the platform counters are 32 bit counters.
> >
>
> I've looked a bit more over the code, and considering this particular
> system (32 bit counters, maximum frequency of CPUs = 2.2GHz), I believe
> the wraparound is considered, and this should not cause these strange
> values in the scale factor.
>
> I consider the counters sane if both stop during idle - either they stop
> when CPU is clock gated, or some firmware does save/restore at core off.
> Therefore, in all idle cases they seem to have stopped, from the point of
> view of the OS. The ACPI spec mentions that both count "any time the
> processor is active".
>
> After the cores return from idle, the counters will wraparound at a
> minimum of 1.95s. So with a tick every 4ms at most 1 wraparound would
> have happened which allows the getDelta() function in cppc_cpufreq
> driver to get the proper difference in values.
>
> Let me know what you think.

This is not what I can see on my THX2. Counter are always increasing
at the same pace even if the system is idle (nothing running except
background activity)

As long as you read counter often enough, the results is coherent with
the freq that has been set by the governor

>
> Thanks,
> Ionela.
>
> > > There are 32bits and the overflow need to be handled by cppc_cpufreq
> > > driver
> >
> > I'm wondering if this would be best handled in the function that reads
> > the counters or in the cppc_cpufreq driver that uses them. Probably the
> > latter, as you say, as the read function should only return the raw
> > values, but it does complicate things.
> >
> > Thanks,
> > Ionela.
> >
> >
> >
> > > > The counter values shown by feedback_ctrs do not seem monotonic even
> > > > when only core 0 threads are online.
> > > >
> > > > ref:2812420736 del:166051103
> > > > ref:3683620736 del:641578595
> > > > ref:1049653440 del:1548202980
> > > > ref:2099053440 del:2120997459
> > > > ref:3185853440 del:2714205997
> > > > ref:712486144 del:3708490753
> > > > ref:3658438336 del:3401357212
> > > > ref:1570998080 del:2279728438
> > > >
> > > > For now I was just wondering if you have seen the same and whether you
> > > > have an opinion on this.
> > > >
> > > > > This is tested on my Hikey platform (without the actual read/write to
> > > > > performance counters), with this script for over an hour:
> > > > >
> > > > > while true; do
> > > > > for i in `seq 1 7`;
> > > > > do
> > > > > echo 0 > /sys/devices/system/cpu/cpu$i/online;
> > > > > done;
> > > > >
> > > > > for i in `seq 1 7`;
> > > > > do
> > > > > echo 1 > /sys/devices/system/cpu/cpu$i/online;
> > > > > done;
> > > > > done
> > > > >
> > > > >
> > > > > The same is done by Vincent on ThunderX2 and no issues were seen.
> > > >
> > > > Hotplug worked fine for me as well on both platforms I tested (Juno R2
> > > > and ThunderX2).
> > > >
> > > > Thanks,
> > > > Ionela.

2021-06-29 08:48:45

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

Hi,

On Tuesday 29 Jun 2021 at 10:50:28 (+0530), Viresh Kumar wrote:
> On 28-06-21, 12:54, Ionela Voinescu wrote:
> > If you happen to have the data around, I would like to know more about
> > your observations on ThunderX2.
> >
> >
> > I tried ThunderX2 as well, with the following observations:
> >
> > Booting with userspace governor and all CPUs online, the CPPC frequency
> > scale factor was all over the place (even much larger than 1024).
> >
> > My initial assumptions:
> > - Counters do not behave properly in light of SMT
> > - Firmware does not do a good job to keep the reference and core
> > counters monotonic: save and restore at core off.
> >
> > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of
> > a single core (part of policy0). With this all works very well:
>
> Interesting.
>
> > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > [ 1863.095370] CPU96: cppc scale: 697.
> > [ 1863.175370] CPU0: cppc scale: 492.
> > [ 1863.215367] CPU64: cppc scale: 492.
> > [ 1863.235366] CPU96: cppc scale: 492.
> > [ 1863.485368] CPU32: cppc scale: 492.
> >
> > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > [ 1891.395363] CPU96: cppc scale: 558.
> > [ 1891.415362] CPU0: cppc scale: 595.
> > [ 1891.435362] CPU32: cppc scale: 615.
> > [ 1891.465363] CPU96: cppc scale: 635.
> > [ 1891.495361] CPU0: cppc scale: 673.
> > [ 1891.515360] CPU32: cppc scale: 703.
> > [ 1891.545360] CPU96: cppc scale: 738.
> > [ 1891.575360] CPU0: cppc scale: 779.
> > [ 1891.605360] CPU96: cppc scale: 829.
> > [ 1891.635360] CPU0: cppc scale: 879.
> >
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed
> > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > [ 1896.585363] CPU32: cppc scale: 1004.
> > [ 1896.675359] CPU64: cppc scale: 973.
> > [ 1896.715359] CPU0: cppc scale: 1024.
> >
> > I'm doing a rate limited printk only for increase/decrease values over
> > 64 in the scale factor value.
> >
> > This showed me that SMT is handled properly.
> >
> > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor
> > stops being even close to correct, for example:
> >
> > [238394.770328] CPU96: cppc scale: 22328.
> > [238395.628846] CPU96: cppc scale: 245.
> > [238516.087115] CPU96: cppc scale: 930.
> > [238523.385009] CPU96: cppc scale: 245.
> > [238538.767473] CPU96: cppc scale: 936.
> > [238538.867546] CPU96: cppc scale: 245.
> > [238599.367932] CPU97: cppc scale: 2728.
> > [238599.859865] CPU97: cppc scale: 452.
> > [238647.786284] CPU96: cppc scale: 1438.
> > [238669.604684] CPU96: cppc scale: 27306.
> > [238676.805049] CPU96: cppc scale: 245.
> > [238737.642902] CPU97: cppc scale: 2035.
> > [238737.664995] CPU97: cppc scale: 452.
> > [238788.066193] CPU96: cppc scale: 2749.
> > [238788.110192] CPU96: cppc scale: 245.
> > [238817.231659] CPU96: cppc scale: 2698.
> > [238818.083687] CPU96: cppc scale: 245.
> > [238845.466850] CPU97: cppc scale: 2990.
> > [238847.477805] CPU97: cppc scale: 452.
> > [238936.984107] CPU97: cppc scale: 1590.
> > [238937.029079] CPU97: cppc scale: 452.
> > [238979.052464] CPU97: cppc scale: 911.
> > [238980.900668] CPU97: cppc scale: 452.
> > [239149.587889] CPU96: cppc scale: 803.
> > [239151.085516] CPU96: cppc scale: 245.
> > [239303.871373] CPU64: cppc scale: 956.
> > [239303.906837] CPU64: cppc scale: 245.
> > [239308.666786] CPU96: cppc scale: 821.
> > [239319.440634] CPU96: cppc scale: 245.
> > [239389.978395] CPU97: cppc scale: 4229.
> > [239391.969562] CPU97: cppc scale: 452.
> > [239415.894738] CPU96: cppc scale: 630.
> > [239417.875326] CPU96: cppc scale: 245.
> >
> > The counter values shown by feedback_ctrs do not seem monotonic even
> > when only core 0 threads are online.
> >
> > ref:2812420736 del:166051103
> > ref:3683620736 del:641578595
> > ref:1049653440 del:1548202980
> > ref:2099053440 del:2120997459
> > ref:3185853440 del:2714205997
> > ref:712486144 del:3708490753
> > ref:3658438336 del:3401357212
> > ref:1570998080 del:2279728438
> >
> > For now I was just wondering if you have seen the same and whether you
> > have an opinion on this.
>
> I think we also saw numbers like this, which didn't explain a lot on
> ThunderX2. We thought they may be due to rounding issues, but the
> offlining stuff adds an interesting factor to that.
>

More testing last night showed that having 1 core (with all 4 threads)
online per socket works fine, while having 2 cores online per socket
gives these bad values. My assumption is that the counters reset to 0
at core off which introduces the behavior. When there is a single core
per socket, this single core cannot be turned off.

I tried to boot with less CPUs and cpuidle.off=1 but it did not make a
difference. I expect much of the idle control to be in
hardware/firmware, so possibly cores were still turned off.

I'll do more research on idle state management for SMT and debugging
to see if it explains more, but it will take longer as I've ignored a
few other responsibilities in the meantime.

Thanks,
Ionela.

> --
> viresh

2021-06-29 08:49:20

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance

Hey,

On Tuesday 29 Jun 2021 at 10:02:44 (+0530), Viresh Kumar wrote:
> On 28-06-21, 11:49, Ionela Voinescu wrote:
> > To be honest I would like to have more time on this before you merge the
> > set, to better understand Qian's results and some observations I have
> > for Thunder X2 (I will share in a bit).
>
> Ideally, this code was already merged in 5.13 and would have required
> us to fix any problems as we encounter them. I did revert it because
> it caused a kernel crash and I wasn't sure if there was a sane/easy
> way of fixing that so late in the release cycle. That was the right
> thing to do then.
>
> All those issues are gone now, we may have an issue around rounding of
> counters or some hardware specific issues, it isn't clear yet.
>
> But the stuff works fine otherwise, doesn't make the kernel crash and
> it is controlled with a CONFIG_ option, so those who don't want to use
> it can still disable it.
>
> The merge window is here now, if we don't merge it now, it gets
> delayed by a full cycle (roughly two months) and if we merge it now
> and are able to narrow down the rounding issues, if there are any, we
> will have full two months to make a fix for that and still push it in
> 5.14 itself.
>
> And so I would like to get it merged in this merge window itself, it
> also makes sure more people would get to test it, like Qian was able
> to figure out a problem here for us.
>

Okay, makes sense. I have not seen this code actually do anything wrong
so far, and the issues I see on ThunderX2 point more to misbehaving
counters for this purpose. This being said, I would have probably
preferred for this feature to be disabled by default, until we've tested
more, but that won't give the chance to anyone else to test.

> > For the code, I think it's fine. I have a single observation regarding
> > the following code:
> >
> > > +static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
> > > +{
> > > + struct cppc_freq_invariance *cppc_fi;
> > > + int cpu, ret;
> > > +
> > > + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > > + return;
> > > +
> > > + for_each_cpu(cpu, policy->cpus) {
> > > + cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> > > + cppc_fi->cpu = cpu;
> > > + cppc_fi->cpu_data = policy->driver_data;
> > > + kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> > > + init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> > > +
> > > + ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
> > > + if (ret) {
> > > + pr_warn("%s: failed to read perf counters for cpu:%d: %d\n",
> > > + __func__, cpu, ret);
> > > + return;
> > > + }
> >
> > For this condition above, think about a scenario where reading counters
> > for offline CPUs returns an error. I'm not sure if that can happen, to
> > be honest. That would mean here that you will never initialise the freq
> > source unless all CPUs in the policy are online at policy creation.
> >
> > My recommendation is to warn about the failed read of perf counters but
> > only return from this function if the target CPU is online as well when
> > reading counters fails.
> >
> > This is probably a nit, so I'll let you decide if you want to do something
> > about this.
>
> That is a very good observation actually. Thanks for that. This is how
> I fixed it.
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index d688877e8fbe..f6540068d0fe 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -171,7 +171,13 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
> if (ret) {
> pr_warn("%s: failed to read perf counters for cpu:%d: %d\n",
> __func__, cpu, ret);
> - return;
> +
> + /*
> + * Don't abort if the CPU was offline while the driver
> + * was getting registered.
> + */
> + if (cpu_online(cpu))
> + return;
> }
> }
>
> --

Thanks!

Reviewed-by: Ionela Voinescu <[email protected]>

Ionela.

> viresh

2021-06-29 08:55:56

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance

On 29-06-21, 09:47, Ionela Voinescu wrote:
> Okay, makes sense. I have not seen this code actually do anything wrong
> so far, and the issues I see on ThunderX2 point more to misbehaving
> counters for this purpose. This being said, I would have probably
> preferred for this feature to be disabled by default, until we've tested
> more, but that won't give the chance to anyone else to test.
>
> Thanks!
>
> Reviewed-by: Ionela Voinescu <[email protected]>

Thanks for understanding Ionela.

--
viresh

2021-06-29 09:08:04

by Ionela Voinescu

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

Hi Qian,

Sorry for the delay. I was trying to run tests on ThunderX2 as well, as
it can give me more control over testing, to get more insight on this.

On Friday 25 Jun 2021 at 22:29:26 (-0400), Qian Cai wrote:
>
>
> On 6/25/2021 10:37 AM, Ionela Voinescu wrote:
> > Quick questions for you:
> >
> > 1. When you say you tried a 5.4 kernel, did you try it with these
> > patches backported? They also have some dependencies with the recent
> > changes in the arch topology driver and cpufreq so they would not be
> > straight forward to backport.
> >
> > If the 5.4 kernel you tried did not have these patches, it might be best
> > to try next/master that has these patches, but with
> > CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that
> > an incorrect frequency scale factor here would affect utilization that
> > would then affect the schedutil frequency selection. I would not expect
> > this behavior even if the scale factor was wrong, but it would be good
> > to rule out.
> >
> > 2. Is your platform booting with all CPUs? Are any hotplug operations
> > done in your scenario?
>
> Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m
> will fix the previous mentioned issues here (any explanations of that?)
> even though the scaling down is not perfect. Now, we have the following
> on this idle system:
>

I don't see how this would have played a role. The cppc cpufreq driver
depends on functionality gated by CONFIG_ACPI_CPPC_LIB which in turn
needs CONFIG_ACPI_PROCESSOR to trigger the parsing of the _CPC objects.
If CONFIG_ACPI_PROCESSOR functionality is not built in or loaded, the
cppc cpufreq driver could not be used at all.

> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 79 1000000
> 1 1160000
> 73 1400000
> 1 2000000
> 4 2010000
> 1 2800000
> 1 860000
>
> Even if I rerun a few times, there could still have a few CPUs running
> lower than lowest_perf (1GHz). Also, even though I set all CPUs to use
> "userspace" governor and set freq to the lowest. A few CPUs keep changing
> at will.
>

I do not believe getting values lower than lowest is worrying as long as
they are not much much lower and they don't happen very often. First of
all firmware has control over CPU frequencies and it can decide to
select a lower frequency if it wishes.

Looking at the fact that it only happens rarely in your tests, it's also
possible that this is introduced by the fact that the CPU only spends
only a few cycles in active state. That would reduce the resolution of
the computed current performance level, which results in a lower value
when converted to frequency.

Hope it helps,
Ionela.

> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 156 1000000
> 3 2000000
> 1 760000

2021-06-29 15:39:51

[permalink] [raw]

Subject: Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance

On 6/29/2021 5:06 AM, Ionela Voinescu wrote:
>> Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m
>> will fix the previous mentioned issues here (any explanations of that?)
>> even though the scaling down is not perfect. Now, we have the following
>> on this idle system:
>>
>
> I don't see how this would have played a role. The cppc cpufreq driver
> depends on functionality gated by CONFIG_ACPI_CPPC_LIB which in turn
> needs CONFIG_ACPI_PROCESSOR to trigger the parsing of the _CPC objects.
> If CONFIG_ACPI_PROCESSOR functionality is not built in or loaded, the
> cppc cpufreq driver could not be used at all.

Ionela, that is fine. I can live with setting ACPI_PROCESSOR=y to workaround. I
was more of just curious about why manually loading processor.ko would set the
lowest to 2GHz instead of 1GHz.

Anyway, if running below the lowest is not the a concern, then I have concluded
my testing here. Feel free to include:

Tested-by: Qian Cai <[email protected]>