2024-04-17 09:39:57

by Beata Michalska

Subject: [PATCH v5 0/5] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu

Introducing an arm64-specific version of arch_freq_get_on_cpu, capitalizing
on the existing implementation for FIE and AMUv1 support: the frequency
scale factor, updated on each sched tick, serves as a base for retrieving
the frequency for a given CPU. That value represents an average frequency
between ticks, so its accuracy is limited.
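
As a rough illustration (assuming SCHED_CAPACITY_SHIFT == 10, i.e. a scale
of 1024 at the reference frequency): a CPU that ran at half speed over the
last tick period ends up with a scale factor of ~512, and with a 3.2 GHz
reference this reads back as 512 * 3200000 >> 10 = 1600000 kHz.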

The changes have been only lightly tested (due to some limitations), on an
FVP model. Note that some small discrepancies have been observed while
testing (on the model); these are currently being investigated, though they
should not have any significant impact on the overall results.

Relevant discussions:
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/7eozim2xnepacnnkzxlbx34hib4otycnbn4dqymfziqou5lw5u@5xzpv3t7sxo3/
[3] https://lore.kernel.org/all/[email protected]/
[4] https://lore.kernel.org/lkml/[email protected]/T/#m4e74cb5a0aaa353c60fedc6cfb95ab7a6e381e3c

v5:
- Fix invalid access to cpumask
- Reworked finding reference cpu when getting the freq

v4:
- dropping seqcount
- fixing identifying active cpu within given policy
- skipping full dynticks cpus when retrieving the freq
- bringing back plugging in arch_freq_get_on_cpu into cpuinfo_cur_freq

v3:
- dropping changes to cpufreq_verify_current_freq
- pulling in changes from Ionela initializing capacity_freq_ref to 0
(thanks for that!) and applying suggestions made by her during last review:
- switching to arch_scale_freq_capacity and arch_scale_freq_ref when
reversing freq scale factor computation
- swapping shift with multiplication
- adding time limit for considering last scale update as valid
- updating frequency scale factor upon entering idle

v2:
- Splitting the patches
- Adding comment for full dyntick mode
- Plugging arch_freq_get_on_cpu into cpufreq_verify_current_freq instead
of in show_cpuinfo_cur_freq to allow the framework to stay more in sync
with potential freq changes


Beata Michalska (4):
arm64: amu: Rule out potential use after free
arm64: Provide an AMU-based version of arch_freq_get_on_cpu
arm64: Update AMU-based frequency scale factor on entering idle
cpufreq: Use arch specific feedback for cpuinfo_cur_freq

Ionela Voinescu (1):
arch_topology: init capacity_freq_ref to 0

arch/arm64/kernel/topology.c | 129 ++++++++++++++++++++++++++++++++---
drivers/base/arch_topology.c | 8 ++-
drivers/cpufreq/cpufreq.c | 4 +-
3 files changed, 126 insertions(+), 15 deletions(-)

--
2.25.1



2024-04-17 09:40:10

by Beata Michalska

Subject: [PATCH v5 1/5] arch_topology: init capacity_freq_ref to 0

From: Ionela Voinescu <[email protected]>

It's useful to have capacity_freq_ref initialized to 0 for users of
arch_scale_freq_ref() to detect when capacity_freq_ref was not
yet set.

The only scenario affected by this change in the init value is when a
cpufreq driver is never loaded. As a result, the only setter of a
cpu scale factor remains the call of topology_normalize_cpu_scale()
from parse_dt_topology(). There we cannot use the value 0 of
capacity_freq_ref so we have to compensate for its uninitialized state.
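
For reference, the '?:' fallback used in the hunks below behaves as in this
minimal userspace sketch (illustrative only):

#include <stdio.h>

int main(void)
{
	unsigned long freq_ref = 0;        /* never set by a cpufreq driver */

	/* GNU C: x ?: y evaluates to x when x is non-zero, else to y */
	printf("%lu\n", freq_ref ?: 1UL);  /* -> 1, the neutral factor */

	freq_ref = 1000000;                /* a driver-provided value (kHz) */
	printf("%lu\n", freq_ref ?: 1UL);  /* -> 1000000 */
	return 0;
}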

Signed-off-by: Ionela Voinescu <[email protected]>
Signed-off-by: Beata Michalska <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
drivers/base/arch_topology.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 024b78a0cfc1..7d4c92cd2bad 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -27,7 +27,7 @@
static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
static struct cpumask scale_freq_counters_mask;
static bool scale_freq_invariant;
-DEFINE_PER_CPU(unsigned long, capacity_freq_ref) = 1;
+DEFINE_PER_CPU(unsigned long, capacity_freq_ref) = 0;
EXPORT_PER_CPU_SYMBOL_GPL(capacity_freq_ref);

static bool supports_scale_freq_counters(const struct cpumask *cpus)
@@ -292,13 +292,15 @@ void topology_normalize_cpu_scale(void)

capacity_scale = 1;
for_each_possible_cpu(cpu) {
- capacity = raw_capacity[cpu] * per_cpu(capacity_freq_ref, cpu);
+ capacity = raw_capacity[cpu] *
+ (per_cpu(capacity_freq_ref, cpu) ?: 1);
capacity_scale = max(capacity, capacity_scale);
}

pr_debug("cpu_capacity: capacity_scale=%llu\n", capacity_scale);
for_each_possible_cpu(cpu) {
- capacity = raw_capacity[cpu] * per_cpu(capacity_freq_ref, cpu);
+ capacity = raw_capacity[cpu] *
+ (per_cpu(capacity_freq_ref, cpu) ?: 1);
capacity = div64_u64(capacity << SCHED_CAPACITY_SHIFT,
capacity_scale);
topology_set_cpu_scale(cpu, capacity);
--
2.25.1


2024-04-17 09:40:15

by Beata Michalska

Subject: [PATCH v5 2/5] arm64: amu: Rule out potential use after free

For the time being, the amu_fie_cpus cpumask is being exclusively used
by the AMU-related internals of FIE support and is guaranteed to be
valid on every access currently made. Still the mask is not being
invalidated on one of the error handling code paths, which leaves
a soft spot with potential risk of uaf for CPUMASK_OFFSTACK cases.
To make things sound, set the cpumask pointer explicitly to NULL upon
failing to register the cpufreq notifier.
Note that, due to the quirks of CPUMASK_OFFSTACK, this change needs to
be wrapped with grim ifdeffery (it would be better served by
incorporating this into free_cpumask_var ...)
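
For context, the relevant definitions are roughly (from
include/linux/cpumask.h):

#ifdef CONFIG_CPUMASK_OFFSTACK
/* mask lives on the heap; free_cpumask_var() kfree()s it but leaves the
 * pointer dangling unless the caller clears it */
typedef struct cpumask *cpumask_var_t;
#else
/* mask is plain storage; freeing is a no-op, so no use after free */
typedef struct cpumask cpumask_var_t[1];
#endif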

Signed-off-by: Beata Michalska <[email protected]>
---
arch/arm64/kernel/topology.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1a2c72f3e7f8..3c814a278534 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -244,8 +244,12 @@ static int __init init_amu_fie(void)

ret = cpufreq_register_notifier(&init_amu_fie_notifier,
CPUFREQ_POLICY_NOTIFIER);
- if (ret)
+ if (ret) {
free_cpumask_var(amu_fie_cpus);
+#ifdef CONFIG_CPUMASK_OFFSTACK
+ amu_fie_cpus = NULL;
+#endif
+ }

return ret;
}
--
2.25.1


2024-04-17 09:40:30

by Beata Michalska

Subject: [PATCH v5 3/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu

With the Frequency Invariance Engine (FIE) being already wired up with
sched tick and making use of relevant (core counter and constant
counter) AMU counters, getting the current frequency for a given CPU
can be achieved by utilizing the frequency scale factor which reflects
an average CPU frequency for the last tick period length.

The solution is partially based on the APERF/MPERF implementation of
arch_freq_get_on_cpu.
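
Schematically (a sketch of the fixed-point math, not the exact code):

/*
 * tick:   arch_freq_scale ~= (f_cur / f_ref) << SCHED_CAPACITY_SHIFT
 *         (derived from delta(core_cnt) / delta(const_cnt))
 * query:  freq ~= (arch_freq_scale * arch_scale_freq_ref(cpu))
 *                  >> SCHED_CAPACITY_SHIFT
 */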

Suggested-by: Ionela Voinescu <[email protected]>
Signed-off-by: Beata Michalska <[email protected]>
---
arch/arm64/kernel/topology.c | 110 +++++++++++++++++++++++++++++++----
1 file changed, 100 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 3c814a278534..475fdbf3032a 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -17,6 +17,7 @@
#include <linux/cpufreq.h>
#include <linux/init.h>
#include <linux/percpu.h>
+#include <linux/sched/isolation.h>

#include <asm/cpu.h>
#include <asm/cputype.h>
@@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
* initialized.
*/
static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
-static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
-static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
static cpumask_var_t amu_fie_cpus;

+struct amu_cntr_sample {
+ u64 arch_const_cycles_prev;
+ u64 arch_core_cycles_prev;
+ unsigned long last_update;
+};
+
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
+
void update_freq_counters_refs(void)
{
- this_cpu_write(arch_core_cycles_prev, read_corecnt());
- this_cpu_write(arch_const_cycles_prev, read_constcnt());
+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
+
+ amu_sample->arch_core_cycles_prev = read_corecnt();
+ amu_sample->arch_const_cycles_prev = read_constcnt();
}

static inline bool freq_counters_valid(int cpu)
{
+ struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
+
if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
return false;

@@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
return false;
}

- if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
- !per_cpu(arch_core_cycles_prev, cpu))) {
+ if (unlikely(!amu_sample->arch_const_cycles_prev ||
+ !amu_sample->arch_core_cycles_prev)) {
pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
return false;
}
@@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)

static void amu_scale_freq_tick(void)
{
+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
u64 prev_core_cnt, prev_const_cnt;
u64 core_cnt, const_cnt, scale;

- prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
- prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
+ prev_const_cnt = amu_sample->arch_const_cycles_prev;
+ prev_core_cnt = amu_sample->arch_core_cycles_prev;

update_freq_counters_refs();

- const_cnt = this_cpu_read(arch_const_cycles_prev);
- core_cnt = this_cpu_read(arch_core_cycles_prev);
+ const_cnt = amu_sample->arch_const_cycles_prev;
+ core_cnt = amu_sample->arch_core_cycles_prev;

+ /*
+ * This should not happen unless the AMUs have been reset and the
+ * counter values have not been restored - unlikely
+ */
if (unlikely(core_cnt <= prev_core_cnt ||
const_cnt <= prev_const_cnt))
return;
@@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)

scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
this_cpu_write(arch_freq_scale, (unsigned long)scale);
+
+ amu_sample->last_update = jiffies;
}

static struct scale_freq_data amu_sfd = {
@@ -189,6 +207,78 @@ static struct scale_freq_data amu_sfd = {
.set_freq_scale = amu_scale_freq_tick,
};

+static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
+{
+ return cpumask_available(amu_fie_cpus) &&
+ cpumask_test_cpu(cpu, amu_fie_cpus);
+}
+
+#define AMU_SAMPLE_EXP_MS 20
+
+unsigned int arch_freq_get_on_cpu(int cpu)
+{
+ struct amu_cntr_sample *amu_sample;
+ unsigned int start_cpu = cpu;
+ unsigned long last_update;
+ unsigned int freq = 0;
+ u64 scale;
+
+ if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
+ return 0;
+
+retry:
+ amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
+
+ last_update = amu_sample->last_update;
+
+ /*
+ * For those CPUs that are in full dynticks mode,
+ * and those that have not seen tick for a while
+ * try an alternative source for the counters (and thus freq scale),
+ * if available, for given policy:
+ * this boils down to identifying an active cpu within the same freq
+ * domain, if any.
+ */
+ if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
+ time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+ int ref_cpu = cpu;
+
+ if (!policy)
+ goto leave;
+
+ if (!policy_is_shared(policy)) {
+ cpufreq_cpu_put(policy);
+ goto leave;
+ }
+
+ do {
+ ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
+ start_cpu, false);
+
+ } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
+
+ cpufreq_cpu_put(policy);
+
+ if (ref_cpu >= nr_cpu_ids)
+ /* No alternative to pull info from */
+ goto leave;
+
+ cpu = ref_cpu;
+ goto retry;
+ }
+ /*
+ * Reversed computation to the one used to determine
+ * the arch_freq_scale value
+ * (see amu_scale_freq_tick for details)
+ */
+ scale = arch_scale_freq_capacity(cpu);
+ freq = scale * arch_scale_freq_ref(cpu);
+ freq >>= SCHED_CAPACITY_SHIFT;
+leave:
+ return freq;
+}
+
static void amu_fie_setup(const struct cpumask *cpus)
{
int cpu;
--
2.25.1


2024-04-17 09:41:02

by Beata Michalska

Subject: [PATCH v5 5/5] cpufreq: Use arch specific feedback for cpuinfo_cur_freq

Some architectures provide a way to determine an average frequency over
a certain period of time based on available performance monitors (AMU on
ARM or APERF/MPERF on x86). With those at hand, enroll arch_freq_get_on_cpu
into the cpuinfo_cur_freq policy sysfs attribute handler, which is expected
to represent the current frequency of a given CPU, as obtained by the
hardware. This is the type of feedback that counters do provide.
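
The resulting lookup order in show_cpuinfo_cur_freq() then becomes:

/*
 * 1) arch_freq_get_on_cpu()  - counter-based average, when non-zero
 * 2) __cpufreq_get()         - the cpufreq driver's own feedback
 * 3) "<unknown>"             - when neither source reports a value
 */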

Signed-off-by: Beata Michalska <[email protected]>
---
drivers/cpufreq/cpufreq.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index f6f8d7f450e7..89118406ec68 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -793,8 +793,10 @@ store_one(scaling_max_freq, max);
static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
char *buf)
{
- unsigned int cur_freq = __cpufreq_get(policy);
+ unsigned int cur_freq = arch_freq_get_on_cpu(policy->cpu);

+ if (!cur_freq)
+ cur_freq = __cpufreq_get(policy);
if (cur_freq)
return sprintf(buf, "%u\n", cur_freq);

--
2.25.1


2024-04-17 09:41:09

by Beata Michalska

Subject: [PATCH v5 4/5] arm64: Update AMU-based frequency scale factor on entering idle

Now that the frequency scale factor can be used for retrieving the current
frequency on a given CPU, trigger its update upon entering idle. This will,
to an extent, allow querying the last known frequency in a non-invasive way.
It will also improve the frequency scale factor accuracy when a CPU entering
idle has not received a tick for a while. As a consequence, for idle cores,
the reported frequency will be the last one observed before entering the
idle state.
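
For reference, the jiffies helper used in the condition below reads as
"timestamp a already lies in the past" (from include/linux/jiffies.h):

#define time_is_before_jiffies(a) time_after(jiffies, a)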

Suggested-by: Vanshidhar Konda <[email protected]>
Signed-off-by: Beata Michalska <[email protected]>
---
arch/arm64/kernel/topology.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 475fdbf3032a..3110863ee18c 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -213,6 +213,19 @@ static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
cpumask_test_cpu(cpu, amu_fie_cpus);
}

+void arch_cpu_idle_enter(void)
+{
+ unsigned int cpu = smp_processor_id();
+
+ if (!amu_fie_cpu_supported(cpu))
+ return;
+
+ /* Kick in AMU update but only if one has not happened already */
+ if (housekeeping_cpu(cpu, HK_TYPE_TICK) &&
+ time_is_before_jiffies(per_cpu(cpu_amu_samples.last_update, cpu)))
+ amu_scale_freq_tick();
+}
+
#define AMU_SAMPLE_EXP_MS 20

unsigned int arch_freq_get_on_cpu(int cpu)
@@ -239,8 +252,8 @@ unsigned int arch_freq_get_on_cpu(int cpu)
* this boils down to identifying an active cpu within the same freq
* domain, if any.
*/
- if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
- time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
+ if (!housekeeping_cpu(cpu, HK_TYPE_TICK) || (!idle_cpu(cpu) &&
+ time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS)))) {
struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
int ref_cpu = cpu;

--
2.25.1


2024-04-18 01:39:19

by Vanshidhar Konda

Subject: Re: [PATCH v5 3/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu

On Wed, Apr 17, 2024 at 10:38:46AM +0100, Beata Michalska wrote:
>With the Frequency Invariance Engine (FIE) being already wired up with
>sched tick and making use of relevant (core counter and constant
>counter) AMU counters, getting the current frequency for a given CPU
>can be achieved by utilizing the frequency scale factor which reflects
>an average CPU frequency for the last tick period length.
>
>The solution is partially based on the APERF/MPERF implementation of
>arch_freq_get_on_cpu.
>
>Suggested-by: Ionela Voinescu <[email protected]>
>Signed-off-by: Beata Michalska <[email protected]>
>---
> arch/arm64/kernel/topology.c | 110 +++++++++++++++++++++++++++++++----
> 1 file changed, 100 insertions(+), 10 deletions(-)
>
>diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>index 3c814a278534..475fdbf3032a 100644
>--- a/arch/arm64/kernel/topology.c
>+++ b/arch/arm64/kernel/topology.c
>@@ -17,6 +17,7 @@
> #include <linux/cpufreq.h>
> #include <linux/init.h>
> #include <linux/percpu.h>
>+#include <linux/sched/isolation.h>
>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
>@@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
> * initialized.
> */
> static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
>-static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
>-static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> static cpumask_var_t amu_fie_cpus;
>
>+struct amu_cntr_sample {
>+ u64 arch_const_cycles_prev;
>+ u64 arch_core_cycles_prev;
>+ unsigned long last_update;
>+};
>+
>+static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
>+
> void update_freq_counters_refs(void)
> {
>- this_cpu_write(arch_core_cycles_prev, read_corecnt());
>- this_cpu_write(arch_const_cycles_prev, read_constcnt());
>+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
>+
>+ amu_sample->arch_core_cycles_prev = read_corecnt();
>+ amu_sample->arch_const_cycles_prev = read_constcnt();
> }
>
> static inline bool freq_counters_valid(int cpu)
> {
>+ struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
>+
> if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
> return false;
>
>@@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
> return false;
> }
>
>- if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
>- !per_cpu(arch_core_cycles_prev, cpu))) {
>+ if (unlikely(!amu_sample->arch_const_cycles_prev ||
>+ !amu_sample->arch_core_cycles_prev)) {
> pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> return false;
> }
>@@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
>
> static void amu_scale_freq_tick(void)
> {
>+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> u64 prev_core_cnt, prev_const_cnt;
> u64 core_cnt, const_cnt, scale;
>
>- prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
>- prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
>+ prev_const_cnt = amu_sample->arch_const_cycles_prev;
>+ prev_core_cnt = amu_sample->arch_core_cycles_prev;
>
> update_freq_counters_refs();
>
>- const_cnt = this_cpu_read(arch_const_cycles_prev);
>- core_cnt = this_cpu_read(arch_core_cycles_prev);
>+ const_cnt = amu_sample->arch_const_cycles_prev;
>+ core_cnt = amu_sample->arch_core_cycles_prev;
>
>+ /*
>+ * This should not happen unless the AMUs have been reset and the
>+ * counter values have not been restored - unlikely
>+ */
> if (unlikely(core_cnt <= prev_core_cnt ||
> const_cnt <= prev_const_cnt))
> return;
>@@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
>
> scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> this_cpu_write(arch_freq_scale, (unsigned long)scale);
>+
>+ amu_sample->last_update = jiffies;
> }
>
> static struct scale_freq_data amu_sfd = {
>@@ -189,6 +207,78 @@ static struct scale_freq_data amu_sfd = {
> .set_freq_scale = amu_scale_freq_tick,
> };
>
>+static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
>+{
>+ return cpumask_available(amu_fie_cpus) &&
>+ cpumask_test_cpu(cpu, amu_fie_cpus);
>+}
>+
>+#define AMU_SAMPLE_EXP_MS 20
>+
>+unsigned int arch_freq_get_on_cpu(int cpu)
>+{
>+ struct amu_cntr_sample *amu_sample;
>+ unsigned int start_cpu = cpu;
>+ unsigned long last_update;
>+ unsigned int freq = 0;
>+ u64 scale;
>+
>+ if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
>+ return 0;
>+
>+retry:
>+ amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
>+
>+ last_update = amu_sample->last_update;
>+
>+ /*
>+ * For those CPUs that are in full dynticks mode,
>+ * and those that have not seen tick for a while
>+ * try an alternative source for the counters (and thus freq scale),

While testing this on an AmpereOne system I found that the scaling_cur_freq
and cpufreq_cur_freq are inconsistent for nohz_full CPUs that are being
throttled (OS requested freq != HW provided freq).

For the test I ran a stress-ng workload on 9 cores. All the other cores
are idle. I then forced the hardware to throttle the active cores - a core
won't run at maximum frequency despite a request from the OS. Each core
has an independent cpufreq policy.

For the nohz_full CPUs, arch_freq_get_on_cpu bails out. In
show_scaling_cur_freq() the next check is to see if
cpufreq_driver->setpolicy method is implemented. cppc_cpufreq does not
implement this method and we just end up returning the policy->cur
value. As discussed in other threads, it looks like we want the behavior
to be identical to x86 systems. In that case it seems like returning 0
from arch_freq_get_on_cpu is not going to be valid behavior.
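
For reference, the fallback chain in show_scaling_cur_freq() is roughly:

	freq = arch_freq_get_on_cpu(policy->cpu);
	if (freq)
		ret = sprintf(buf, "%u\n", freq);
	else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
		ret = sprintf(buf, "%u\n", cpufreq_driver->get(policy->cpu));
	else
		ret = sprintf(buf, "%u\n", policy->cur);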

Core    scaling_cur_freq    cpufreq_cur_freq
[0]:    2700000             2700000
[1]:    2750000             2750000

nohz_full=2-7
[2]:    3200000             2691000
[3]:    3200000             2645000
[4]:    3200000             2731000
[5]:    3200000             2714000
[6]:    3200000             2466000
[7]:    3200000             2708000

isolcpus=8-11 (no workload applied to core 10-11)
[8]:    2700000             2700000
[9]:    2550000             2550000
[10]:   1046875             1046875
[11]:   1096875             1096875

Thanks,
Vanshi

>+ * if available, for given policy:
>+ * this boils down to identifying an active cpu within the same freq
>+ * domain, if any.
>+ */
>+ if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
>+ time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
>+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
>+ int ref_cpu = cpu;
>+
>+ if (!policy)
>+ goto leave;
>+
>+ if (!policy_is_shared(policy)) {
>+ cpufreq_cpu_put(policy);
>+ goto leave;
>+ }
>+
>+ do {
>+ ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
>+ start_cpu, false);
>+
>+ } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
>+
>+ cpufreq_cpu_put(policy);
>+
>+ if (ref_cpu >= nr_cpu_ids)
>+ /* No alternative to pull info from */
>+ goto leave;
>+
>+ cpu = ref_cpu;
>+ goto retry;
>+ }
>+ /*
>+ * Reversed computation to the one used to determine
>+ * the arch_freq_scale value
>+ * (see amu_scale_freq_tick for details)
>+ */
>+ scale = arch_scale_freq_capacity(cpu);
>+ freq = scale * arch_scale_freq_ref(cpu);
>+ freq >>= SCHED_CAPACITY_SHIFT;
>+leave:
>+ return freq;
>+}
>+
> static void amu_fie_setup(const struct cpumask *cpus)
> {
> int cpu;
>--
>2.25.1
>

2024-04-18 10:51:16

by Sudeep Holla

Subject: Re: [PATCH v5 1/5] arch_topology: init capacity_freq_ref to 0

On Wed, Apr 17, 2024 at 10:38:44AM +0100, Beata Michalska wrote:
> From: Ionela Voinescu <[email protected]>
>
> It's useful to have capacity_freq_ref initialized to 0 for users of
> arch_scale_freq_ref() to detect when capacity_freq_ref was not
> yet set.
>
> The only scenario affected by this change in the init value is when a
> cpufreq driver is never loaded. As a result, the only setter of a
> cpu scale factor remains the call of topology_normalize_cpu_scale()
> from parse_dt_topology(). There we cannot use the value 0 of
> capacity_freq_ref so we have to compensate for its uninitialized state.
>

Reviewed-by: Sudeep Holla <[email protected]>

--
Regards,
Sudeep

2024-04-18 10:52:45

by Sudeep Holla

Subject: Re: [PATCH v5 2/5] arm64: amu: Rule out potential use after free

On Wed, Apr 17, 2024 at 10:38:45AM +0100, Beata Michalska wrote:
> For the time being, the amu_fie_cpus cpumask is being exclusively used
> by the AMU-related internals of FIE support and is guaranteed to be
> valid on every access currently made. Still the mask is not being
> invalidated on one of the error handling code paths, which leaves
> a soft spot with potential risk of uaf for CPUMASK_OFFSTACK cases.
> To make things sound, set the cpumask pointer explicitly to NULL upon
> failing to register the cpufreq notifier.
> Note that, due to the quirks of CPUMASK_OFFSTACK, this change needs to
> be wrapped with grim ifdeffery (it would be better served by
> incorporating this into free_cpumask_var ...)
>

Yes it doesn't look neat.

> Signed-off-by: Beata Michalska <[email protected]>
> ---
> arch/arm64/kernel/topology.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 1a2c72f3e7f8..3c814a278534 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -244,8 +244,12 @@ static int __init init_amu_fie(void)
>
> ret = cpufreq_register_notifier(&init_amu_fie_notifier,
> CPUFREQ_POLICY_NOTIFIER);
> - if (ret)
> + if (ret) {
> free_cpumask_var(amu_fie_cpus);
> +#ifdef CONFIG_CPUMASK_OFFSTACK
> + amu_fie_cpus = NULL;
> +#endif
> + }

Instead of this #ifdeffery, I was wondering if we can actually do the
allocation in init_amu_fie_callback() the first time it gets called
checking if amu_fie_cpus is NULL. init_amu_fie_callback() must get called
only if the cpufreq_register_notifier() succeeds right ?

Also I don't see anyone calling amu_fie_setup(), so where do you think
the possible use after free could occur for amu_fie_cpus. Just thinking
out loud to check if I missed anything.

--
Regards,
Sudeep

2024-04-18 16:08:40

by Beata Michalska

Subject: Re: [PATCH v5 2/5] arm64: amu: Rule out potential use after free

On Thu, Apr 18, 2024 at 11:50:52AM +0100, Sudeep Holla wrote:
> On Wed, Apr 17, 2024 at 10:38:45AM +0100, Beata Michalska wrote:
> > For the time being, the amu_fie_cpus cpumask is being exclusively used
> > by the AMU-related internals of FIE support and is guaranteed to be
> > valid on every access currently made. Still the mask is not being
> > invalidated on one of the error handling code paths, which leaves
> > a soft spot with potential risk of uaf for CPUMASK_OFFSTACK cases.
> > To make things sound, set the cpumask pointer explicitly to NULL upon
> > failing to register the cpufreq notifier.
> > Note that, due to the quirks of CPUMASK_OFFSTACK, this change needs to
> > be wrapped with grim ifdeffery (it would be better served by
> > incorporating this into free_cpumask_var ...)
> >
>
> Yes it doesn't look neat.
>
> > Signed-off-by: Beata Michalska <[email protected]>
> > ---
> > arch/arm64/kernel/topology.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index 1a2c72f3e7f8..3c814a278534 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -244,8 +244,12 @@ static int __init init_amu_fie(void)
> >
> > ret = cpufreq_register_notifier(&init_amu_fie_notifier,
> > CPUFREQ_POLICY_NOTIFIER);
> > - if (ret)
> > + if (ret) {
> > free_cpumask_var(amu_fie_cpus);
> > +#ifdef CONFIG_CPUMASK_OFFSTACK
> > + amu_fie_cpus = NULL;
> > +#endif
> > + }
>
> Instead of this #ifdeffery, I was wondering if we can actually do the
> allocation in init_amu_fie_callback() the first time it gets called
> checking if amu_fie_cpus is NULL. init_amu_fie_callback() must get called
> only if the cpufreq_register_notifier() succeeds right ?
>
Delayed allocation ... I guess this will do the trick.
> Also I don't see anyone calling amu_fie_setup(), so where do you think
> the possible use after free could occur for amu_fie_cpus. Just thinking
> out loud to check if I missed anything.
>
You haven't missed anything. Currently the uaf is purely theoretical as the code
that relies on that mask will only be executed if we have succeeded in
registering the amu fie support: so far so good.
This change is required for the following patches, where that mask is used to
determine if a given CPU has been set up with AMU counters, and as such there
needs to be a safe way to validate it (at any time) by arch_freq_get_on_cpu
and arch_cpu_idle_enter (patches 3/5 & 4/5): both will be called from outside
the FIE. Without this change, if cpufreq_register_notifier fails in
init_amu_fie, we will have amu_fie_cpus pointing to released memory.
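
In other words, with CONFIG_CPUMASK_OFFSTACK the check in
amu_fie_cpu_supported() would then operate on a stale pointer:

	/* amu_fie_cpus was kfree()d but is still non-NULL */
	if (cpumask_available(amu_fie_cpus) &&   /* stale pointer passes */
	    cpumask_test_cpu(cpu, amu_fie_cpus)) /* read after free */
		/* ... proceeds as if AMU FIE was set up */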

Hope that makes things more clear.

---
BR
Beata
> --
> Regards,
> Sudeep

2024-04-24 10:25:41

by Sudeep Holla

Subject: Re: [PATCH v5 2/5] arm64: amu: Rule out potential use after free

On Thu, Apr 18, 2024 at 05:55:43PM +0200, Beata Michalska wrote:
> On Thu, Apr 18, 2024 at 11:50:52AM +0100, Sudeep Holla wrote:
> > On Wed, Apr 17, 2024 at 10:38:45AM +0100, Beata Michalska wrote:
> > > For the time being, the amu_fie_cpus cpumask is being exclusively used
> > > by the AMU-related internals of FIE support and is guaranteed to be
> > > valid on every access currently made. Still the mask is not being
> > > invalidated on one of the error handling code paths, which leaves
> > > a soft spot with potential risk of uaf for CPUMASK_OFFSTACK cases.
> > > To make things sound, set the cpumask pointer explicitly to NULL upon
> > > failing to register the cpufreq notifier.
> > > Note that, due to the quirks of CPUMASK_OFFSTACK, this change needs to
> > > be wrapped with grim ifdeffery (it would be better served by
> > > incorporating this into free_cpumask_var ...)
> > >
> >
> > Yes it doesn't look neat.
> >
> > > Signed-off-by: Beata Michalska <[email protected]>
> > > ---
> > > arch/arm64/kernel/topology.c | 6 +++++-
> > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > > index 1a2c72f3e7f8..3c814a278534 100644
> > > --- a/arch/arm64/kernel/topology.c
> > > +++ b/arch/arm64/kernel/topology.c
> > > @@ -244,8 +244,12 @@ static int __init init_amu_fie(void)
> > >
> > > ret = cpufreq_register_notifier(&init_amu_fie_notifier,
> > > CPUFREQ_POLICY_NOTIFIER);
> > > - if (ret)
> > > + if (ret) {
> > > free_cpumask_var(amu_fie_cpus);
> > > +#ifdef CONFIG_CPUMASK_OFFSTACK
> > > + amu_fie_cpus = NULL;
> > > +#endif
> > > + }
> >
> > Instead of this #ifdeffery, I was wondering if we can actually do the
> > allocation in init_amu_fie_callback() the first time it gets called
> > checking if amu_fie_cpus is NULL. init_amu_fie_callback() must get called
> > only if the cpufreq_register_notifier() succeeds right ?
> >

> Delayed allocation ... I guess this will do the trick.

I prefer that if we can't find any other alternative. Do you see any issues
with that ? That said I am fine if Will/Catalin is happy with this.

> > Also I don't see anyone calling amu_fie_setup(), so where do you think
> > the possible use after free could occur for amu_fie_cpus. Just thinking
> > out loud to check if I missed anything.
> >
> You haven't missed anything. Currently the uaf is purely theoretical as the code
> that relies on that mask will only be executed if we have succeeded to register
> the amu fie support: so far so good.

Yes it is better to handle it even if it is theoretical.

I assume you get some compiler error if you assign unconditionally and
if(IS_ENABLED()) also doesn't work in this case as it would still give
error ?

--
Regards,
Sudeep

2024-04-25 14:53:17

by Beata Michalska

Subject: Re: [PATCH v5 2/5] arm64: amu: Rule out potential use after free

On Wed, Apr 24, 2024 at 11:25:27AM +0100, Sudeep Holla wrote:
> On Thu, Apr 18, 2024 at 05:55:43PM +0200, Beata Michalska wrote:
> > On Thu, Apr 18, 2024 at 11:50:52AM +0100, Sudeep Holla wrote:
> > > On Wed, Apr 17, 2024 at 10:38:45AM +0100, Beata Michalska wrote:
> > > > For the time being, the amu_fie_cpus cpumask is being exclusively used
> > > > by the AMU-related internals of FIE support and is guaranteed to be
> > > > valid on every access currently made. Still the mask is not being
> > > > invalidated on one of the error handling code paths, which leaves
> > > > a soft spot with potential risk of uaf for CPUMASK_OFFSTACK cases.
> > > > To make things sound, set the cpumask pointer explicitly to NULL upon
> > > > failing to register the cpufreq notifier.
> > > > Note that, due to the quirks of CPUMASK_OFFSTACK, this change needs to
> > > > be wrapped with grim ifdeffery (it would be better served by
> > > > incorporating this into free_cpumask_var ...)
> > > >
> > >
> > > Yes it doesn't look neat.
> > >
> > > > Signed-off-by: Beata Michalska <[email protected]>
> > > > ---
> > > > arch/arm64/kernel/topology.c | 6 +++++-
> > > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > > > index 1a2c72f3e7f8..3c814a278534 100644
> > > > --- a/arch/arm64/kernel/topology.c
> > > > +++ b/arch/arm64/kernel/topology.c
> > > > @@ -244,8 +244,12 @@ static int __init init_amu_fie(void)
> > > >
> > > > ret = cpufreq_register_notifier(&init_amu_fie_notifier,
> > > > CPUFREQ_POLICY_NOTIFIER);
> > > > - if (ret)
> > > > + if (ret) {
> > > > free_cpumask_var(amu_fie_cpus);
> > > > +#ifdef CONFIG_CPUMASK_OFFSTACK
> > > > + amu_fie_cpus = NULL;
> > > > +#endif
> > > > + }
> > >
> > > Instead of this #ifdeffery, I was wondering if we can actually do the
> > > allocation in init_amu_fie_callback() the first time it gets called
> > > checking if amu_fie_cpus is NULL. init_amu_fie_callback() must get called
> > > only if the cpufreq_register_notifier() succeeds right ?
> > >
>
> > Delayed allocation ... I guess this will do the trick.
>
> I prefer that if we can't find any other alternative. Do you see any issues
> with that ? That said I am fine if Will/Catalin is happy with this.
>
We could actually move it up further to amu_fie_setup and potentially save on
memory if none of the present CPUs have valid AMU counters. This is unlikely but
still. So it could look like:

--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@@ -297,7 -194,7 +297,8 @@@ static void amu_fie_setup(const struct
int cpu;

/* We are already set since the last insmod of cpufreq driver */
++ if (cpumask_available(amu_fie_cpus) &&
-- if (unlikely(cpumask_subset(cpus, amu_fie_cpus)))
++ unlikely(cpumask_subset(cpus, amu_fie_cpus)))
return;

for_each_cpu(cpu, cpus) {
@@@ -305,6 -202,6 +306,10 @@@
return;
}

++ if (!cpumask_available(amu_fie_cpus) &&
++ !zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
++ return;
++

In both cases we risk not setting up AMUs for FIE for all or some CPUs
if we fail to allocate the memory but I guess we are already there.
@Ionela: What do you think?

> > > Also I don't see anyone calling amu_fie_setup(), so where do you think
> > > the possible use after free could occur for amu_fie_cpus. Just thinking
> > > out loud to check if I missed anything.
> > >
> > You haven't missed anything. Currently the uaf is purely theoretical as the code
> > that relies on that mask will only be executed if we have succeeded in
> > registering the amu fie support: so far so good.
>
> Yes it is better to handle it even if it is theoretical.
>
> I assume you get some compiler error if you assign unconditionally and
> if(IS_ENABLED()) also doesn't work in this case as it would still give
> error ?
Yes, the #if is needed to exclude it from compilation if !CPUMASK_OFFSTACK.

---
BR
Beata
>
> --
> Regards,
> Sudeep

2024-04-26 10:52:54

by Beata Michalska

Subject: Re: [PATCH v5 3/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu

On Wed, Apr 17, 2024 at 06:39:00PM -0700, Vanshidhar Konda wrote:
> On Wed, Apr 17, 2024 at 10:38:46AM +0100, Beata Michalska wrote:
> > With the Frequency Invariance Engine (FIE) being already wired up with
> > sched tick and making use of relevant (core counter and constant
> > counter) AMU counters, getting the current frequency for a given CPU
> > can be achieved by utilizing the frequency scale factor which reflects
> > an average CPU frequency for the last tick period length.
> >
> > The solution is partially based on the APERF/MPERF implementation of
> > arch_freq_get_on_cpu.
> >
> > Suggested-by: Ionela Voinescu <[email protected]>
> > Signed-off-by: Beata Michalska <[email protected]>
> > ---
> > arch/arm64/kernel/topology.c | 110 +++++++++++++++++++++++++++++++----
> > 1 file changed, 100 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index 3c814a278534..475fdbf3032a 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -17,6 +17,7 @@
> > #include <linux/cpufreq.h>
> > #include <linux/init.h>
> > #include <linux/percpu.h>
> > +#include <linux/sched/isolation.h>
> >
> > #include <asm/cpu.h>
> > #include <asm/cputype.h>
> > @@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
> > * initialized.
> > */
> > static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
> > -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> > -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> > static cpumask_var_t amu_fie_cpus;
> >
> > +struct amu_cntr_sample {
> > + u64 arch_const_cycles_prev;
> > + u64 arch_core_cycles_prev;
> > + unsigned long last_update;
> > +};
> > +
> > +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
> > +
> > void update_freq_counters_refs(void)
> > {
> > - this_cpu_write(arch_core_cycles_prev, read_corecnt());
> > - this_cpu_write(arch_const_cycles_prev, read_constcnt());
> > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> > +
> > + amu_sample->arch_core_cycles_prev = read_corecnt();
> > + amu_sample->arch_const_cycles_prev = read_constcnt();
> > }
> >
> > static inline bool freq_counters_valid(int cpu)
> > {
> > + struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > +
> > if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
> > return false;
> >
> > @@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
> > return false;
> > }
> >
> > - if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> > - !per_cpu(arch_core_cycles_prev, cpu))) {
> > + if (unlikely(!amu_sample->arch_const_cycles_prev ||
> > + !amu_sample->arch_core_cycles_prev)) {
> > pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> > return false;
> > }
> > @@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
> >
> > static void amu_scale_freq_tick(void)
> > {
> > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> > u64 prev_core_cnt, prev_const_cnt;
> > u64 core_cnt, const_cnt, scale;
> >
> > - prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> > - prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> > + prev_const_cnt = amu_sample->arch_const_cycles_prev;
> > + prev_core_cnt = amu_sample->arch_core_cycles_prev;
> >
> > update_freq_counters_refs();
> >
> > - const_cnt = this_cpu_read(arch_const_cycles_prev);
> > - core_cnt = this_cpu_read(arch_core_cycles_prev);
> > + const_cnt = amu_sample->arch_const_cycles_prev;
> > + core_cnt = amu_sample->arch_core_cycles_prev;
> >
> > + /*
> > + * This should not happen unless the AMUs have been reset and the
> > + * counter values have not been restored - unlikely
> > + */
> > if (unlikely(core_cnt <= prev_core_cnt ||
> > const_cnt <= prev_const_cnt))
> > return;
> > @@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
> >
> > scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> > this_cpu_write(arch_freq_scale, (unsigned long)scale);
> > +
> > + amu_sample->last_update = jiffies;
> > }
> >
> > static struct scale_freq_data amu_sfd = {
> > @@ -189,6 +207,78 @@ static struct scale_freq_data amu_sfd = {
> > .set_freq_scale = amu_scale_freq_tick,
> > };
> >
> > +static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
> > +{
> > + return cpumask_available(amu_fie_cpus) &&
> > + cpumask_test_cpu(cpu, amu_fie_cpus);
> > +}
> > +
> > +#define AMU_SAMPLE_EXP_MS 20
> > +
> > +unsigned int arch_freq_get_on_cpu(int cpu)
> > +{
> > + struct amu_cntr_sample *amu_sample;
> > + unsigned int start_cpu = cpu;
> > + unsigned long last_update;
> > + unsigned int freq = 0;
> > + u64 scale;
> > +
> > + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
> > + return 0;
> > +
> > +retry:
> > + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > +
> > + last_update = amu_sample->last_update;
> > +
> > + /*
> > + * For those CPUs that are in full dynticks mode,
> > + * and those that have not seen tick for a while
> > + * try an alternative source for the counters (and thus freq scale),
>
> While testing this on an AmpereOne system I found that the scaling_cur_freq
> and cpufreq_cur_freq are inconsistent for nohz_full CPUs that are being
> throttled (OS requested freq != HW provided freq).
>
> For the test I ran a stress-ng workload on 9 cores. All the other cores
> are idle. I then forced the hardware to throttle the active cores - a core
> won't run at maximum frequency despite a request from the OS. Each core
> has an independent cpufreq policy.
>
> For the nohz_full CPUs, arch_freq_get_on_cpu bails out. In
> show_scaling_cur_freq() the next check is to see if
> cpufreq_driver->setpolicy method is implemented. cppc_cpufreq does not
> implement this method and we just end up returning the policy->cur
> value. As discussed in other threads, it looks like we want the behavior
> to be identical to x86 systems. In that case it seems like returning 0
> from arch_freq_get_on_cpu is not going to be valid behavior.
>
So the tricky bit is that, up until now, arch_freq_get_on_cpu has been solely
used by show_scaling_cur_freq. As a result, the APERF/MPERF-based
implementation could make certain assumptions and, based on those, use an
alternative source of information regarding the current frequency, wrapping
it all within the arch_freq_get_on_cpu implementation.
This approach is no longer valid if both cpuinfo_cur_freq and scaling_cur_freq
rely on arch-specific feedback to determine the current frequency. As
mentioned earlier, the ideal way would be to not use that feedback for
scaling_cur_freq and to plug it in only for cpuinfo_cur_freq, thus aligning
with the expectations (docs) as to what type of information each attribute is
to provide. But that does not seem to be an option and we have to deal with
the aftermath. I do get your point though, and I will try to revive the
discussion we have had on this one in another thread [1]

---
[1] https://lore.kernel.org/all/[email protected]/
---
BR
Beata
> Core    scaling_cur_freq    cpufreq_cur_freq
> [0]:    2700000             2700000
> [1]:    2750000             2750000
>
> nohz_full=2-7
> [2]:    3200000             2691000
> [3]:    3200000             2645000
> [4]:    3200000             2731000
> [5]:    3200000             2714000
> [6]:    3200000             2466000
> [7]:    3200000             2708000
>
> isolcpus=8-11 (no workload applied to core 10-11)
> [8]:    2700000             2700000
> [9]:    2550000             2550000
> [10]:   1046875             1046875
> [11]:   1096875             1096875
>
> Thanks,
> Vanshi
>
> > + * if available, for given policy:
> > + * this boils down to identifying an active cpu within the same freq
> > + * domain, if any.
> > + */
> > + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
> > + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
> > + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > + int ref_cpu = cpu;
> > +
> > + if (!policy)
> > + goto leave;
> > +
> > + if (!policy_is_shared(policy)) {
> > + cpufreq_cpu_put(policy);
> > + goto leave;
> > + }
> > +
> > + do {
> > + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
> > + start_cpu, false);
> > +
> > + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
> > +
> > + cpufreq_cpu_put(policy);
> > +
> > + if (ref_cpu >= nr_cpu_ids)
> > + /* No alternative to pull info from */
> > + goto leave;
> > +
> > + cpu = ref_cpu;
> > + goto retry;
> > + }
> > + /*
> > + * Reversed computation to the one used to determine
> > + * the arch_freq_scale value
> > + * (see amu_scale_freq_tick for details)
> > + */
> > + scale = arch_scale_freq_capacity(cpu);
> > + freq = scale * arch_scale_freq_ref(cpu);
> > + freq >>= SCHED_CAPACITY_SHIFT;
> > +leave:
> > + return freq;
> > +}
> > +
> > static void amu_fie_setup(const struct cpumask *cpus)
> > {
> > int cpu;
> > --
> > 2.25.1
> >

2024-05-07 07:24:23

by Ionela Voinescu

Subject: Re: [PATCH v5 2/5] arm64: amu: Rule out potential use after free

Hi Beata,

On Thursday 25 Apr 2024 at 16:27:37 (+0200), Beata Michalska wrote:
> On Wed, Apr 24, 2024 at 11:25:27AM +0100, Sudeep Holla wrote:
> > On Thu, Apr 18, 2024 at 05:55:43PM +0200, Beata Michalska wrote:
> > > On Thu, Apr 18, 2024 at 11:50:52AM +0100, Sudeep Holla wrote:
> > > > On Wed, Apr 17, 2024 at 10:38:45AM +0100, Beata Michalska wrote:
> > > > > For the time being, the amu_fie_cpus cpumask is being exclusively used
> > > > > by the AMU-related internals of FIE support and is guaranteed to be
> > > > > valid on every access currently made. Still the mask is not being
> > > > > invalidated on one of the error handling code paths, which leaves
> > > > > a soft spot with potential risk of uaf for CPUMASK_OFFSTACK cases.
> > > > > To make things sound, set the cpumask pointer explicitly to NULL upon
> > > > > failing to register the cpufreq notifier.
> > > > > Note that, due to the quirks of CPUMASK_OFFSTACK, this change needs to
> > > > > be wrapped with grim ifdeffery (it would be better served by
> > > > > incorporating this into free_cpumask_var ...)
> > > > >
> > > >
> > > > Yes it doesn't look neat.
> > > >
> > > > > Signed-off-by: Beata Michalska <[email protected]>
> > > > > ---
> > > > > arch/arm64/kernel/topology.c | 6 +++++-
> > > > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > > > > index 1a2c72f3e7f8..3c814a278534 100644
> > > > > --- a/arch/arm64/kernel/topology.c
> > > > > +++ b/arch/arm64/kernel/topology.c
> > > > > @@ -244,8 +244,12 @@ static int __init init_amu_fie(void)
> > > > >
> > > > > ret = cpufreq_register_notifier(&init_amu_fie_notifier,
> > > > > CPUFREQ_POLICY_NOTIFIER);
> > > > > - if (ret)
> > > > > + if (ret) {
> > > > > free_cpumask_var(amu_fie_cpus);
> > > > > +#ifdef CONFIG_CPUMASK_OFFSTACK
> > > > > + amu_fie_cpus = NULL;
> > > > > +#endif
> > > > > + }
> > > >
> > > > Instead of this #ifdeffery, I was wondering if we can actually do the
> > > > allocation in init_amu_fie_callback() the first time it gets called
> > > > checking if amu_fie_cpus is NULL. init_amu_fie_callback() must get called
> > > > only if the cpufreq_register_notifier() succeeds right ?
> > > >
> >
> > > Delayed allocation ... I guess this will do the trick.
> >
> > I prefer that if we can't find any other alternative. Do you see any issues
> > with that ? That said I am fine if Will/Catalin is happy with this.
> >
> We could actually move it up further to amu_fie_setup and potentially save on
> memory if none of the present CPUs have valid AMU counters. This is unlikely but
> still. So it could look like:
>
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@@ -297,7 -194,7 +297,8 @@@ static void amu_fie_setup(const struct
> int cpu;
>
> /* We are already set since the last insmod of cpufreq driver */
> ++ if (cpumask_available(amu_fie_cpus) &&
> -- if (unlikely(cpumask_subset(cpus, amu_fie_cpus)))
> ++ unlikely(cpumask_subset(cpus, amu_fie_cpus)))
> return;
>
> for_each_cpu(cpu, cpus) {
> @@@ -305,6 -202,6 +306,10 @@@
> return;
> }
>
> ++ if (!cpumask_available(amu_fie_cpus) &&
> ++ !zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
> ++ return;
> ++
>
> In both cases we risk not setting up AMUs for FIE for all or some CPUs
> if we fail to allocate the memory but I guess we are already there.
> @Ionela: What do you think?

It looks good to me. Many thanks for the fix.

Ionela.

>
> > > > Also I don't see anyone calling amu_fie_setup(), so where do you think
> > > > the possible use after free could occur for amu_fie_cpus. Just thinking
> > > > out loud to check if I missed anything.
> > > >
> > > You haven't missed anything. Currently the uaf is purely theoretical as the code
> > > that relies on that mask will only be executed if we have succeeded in
> > > registering the amu fie support: so far so good.
> >
> > Yes it is better to handle it even if it is theoretical.
> >
> > I assume you get some compiler error if you assign unconditionally and
> > if(IS_ENABLED()) also doesn't work in this case as it would still give
> > error ?
> Yes, the #if is needed to exclude it from compilation if !CPUMASK_OFFSTACK.
>
> ---
> BR
> Beata
> >
> > --
> > Regards,
> > Sudeep