Hi all,
This patch set v2 aims to refactor the thermal pressure update
code. There are already two clients which do similar thing:
convert the capped frequency value into the capacity of
affected CPU and call the 'set' function to store the
reduced capacity into the per-cpu variable.
There might be more than two of these users. In near future
it will be scmi-cpufreq driver, which receives notification
from FW about reduced frequency due to thermal. Other vendors
might follow. Let's avoid code duplication and potential
conversion bugs. Move the conversion code into the arch_topology.c
where the capacity calculation setup code and thermal pressure sit.
Apart from that $subject patches, there is one patch (3/5) which fixes
issue in qcom-cpufreq-hw.c when the thermal pressure is not
updated for offline CPUs. It's similar fix that has been merged
recently for cpufreq_cooling.c:
2ad8ccc17d1e4270cf65a3f2
Changes:
v2:
- added Reviewed-by from Thara for patch 3/5
- changed the doxygen comment and used mult_frac()
according to Thara's suggestion in patch 1/5
v1 -> [1]
Regards,
Lukasz Luba
[1] https://lore.kernel.org/linux-pm/[email protected]/
Lukasz Luba (5):
arch_topology: Introduce thermal pressure update function
thermal: cpufreq_cooling: Use new thermal pressure update function
cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function
arch_topology: Remove unused topology_set_thermal_pressure() and
related
arch/arm/include/asm/topology.h | 2 +-
arch/arm64/include/asm/topology.h | 2 +-
drivers/base/arch_topology.c | 35 +++++++++++++++++++++++++++----
drivers/cpufreq/qcom-cpufreq-hw.c | 14 +++++--------
drivers/thermal/cpufreq_cooling.c | 6 +-----
include/linux/arch_topology.h | 4 ++--
include/linux/sched/topology.h | 6 +++---
init/Kconfig | 2 +-
8 files changed, 45 insertions(+), 26 deletions(-)
--
2.17.1
The thermal pressure is a mechanism which is used for providing
information about reduced CPU performance to the scheduler. Usually code
has to convert the value from frequency units into capacity units,
which are understandable by the scheduler. Create a common conversion code
which can be just used via a handy API.
Signed-off-by: Lukasz Luba <[email protected]>
---
arch/arm/include/asm/topology.h | 1 +
arch/arm64/include/asm/topology.h | 1 +
drivers/base/arch_topology.c | 36 ++++++++++++++++++++++++++++++-
include/linux/arch_topology.h | 3 +++
include/linux/sched/topology.h | 7 ++++++
5 files changed, 47 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index 470299ee2fba..aee6c456c085 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -24,6 +24,7 @@
/* Replace task scheduler's default thermal pressure API */
#define arch_scale_thermal_pressure topology_get_thermal_pressure
#define arch_set_thermal_pressure topology_set_thermal_pressure
+#define arch_thermal_pressure_update topology_thermal_pressure_update
#else
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index ec2db3419c41..c997015402bc 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -33,6 +33,7 @@ void update_freq_counters_refs(void);
/* Replace task scheduler's default thermal pressure API */
#define arch_scale_thermal_pressure topology_get_thermal_pressure
#define arch_set_thermal_pressure topology_set_thermal_pressure
+#define arch_thermal_pressure_update topology_thermal_pressure_update
#include <asm-generic/topology.h>
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 43407665918f..1fa28b5afdb2 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -25,6 +25,7 @@
static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
static struct cpumask scale_freq_counters_mask;
static bool scale_freq_invariant;
+static DEFINE_PER_CPU(u32, freq_factor) = 1;
static bool supports_scale_freq_counters(const struct cpumask *cpus)
{
@@ -168,6 +169,40 @@ void topology_set_thermal_pressure(const struct cpumask *cpus,
}
EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
+/**
+ * topology_thermal_pressure_update() - Update thermal pressure for CPUs
+ * @cpus : The related CPUs for which capacity has been reduced
+ * @capped_freq : The maximum allowed frequency that CPUs can run at
+ *
+ * Update the value of thermal pressure for all @cpus in the mask. The
+ * cpumask should include all (online+offline) affected CPUs, to avoid
+ * operating on stale data when hot-plug is used for some CPUs. The
+ * @capped_freq must be less or equal to the max possible frequency and
+ * reflects the currently allowed max CPUs frequency due to thermal capping.
+ * The @capped_freq must be provided in kHz.
+ */
+void topology_thermal_pressure_update(const struct cpumask *cpus,
+ unsigned long capped_freq)
+{
+ unsigned long max_capacity, capacity;
+ int cpu;
+
+ if (!cpus)
+ return;
+
+ cpu = cpumask_first(cpus);
+ max_capacity = arch_scale_cpu_capacity(cpu);
+
+ /* Convert to MHz scale which is used in 'freq_factor' */
+ capped_freq /= 1000;
+
+ capacity = mult_frac(capped_freq, max_capacity,
+ per_cpu(freq_factor, cpu));
+
+ arch_set_thermal_pressure(cpus, max_capacity - capacity);
+}
+EXPORT_SYMBOL_GPL(topology_thermal_pressure_update);
+
static ssize_t cpu_capacity_show(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -220,7 +255,6 @@ static void update_topology_flags_workfn(struct work_struct *work)
update_topology = 0;
}
-static DEFINE_PER_CPU(u32, freq_factor) = 1;
static u32 *raw_capacity;
static int free_raw_capacity(void)
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index f180240dc95f..9e183621a59b 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -59,6 +59,9 @@ static inline unsigned long topology_get_thermal_pressure(int cpu)
void topology_set_thermal_pressure(const struct cpumask *cpus,
unsigned long th_pressure);
+void topology_thermal_pressure_update(const struct cpumask *cpus,
+ unsigned long capped_freq);
+
struct cpu_topology {
int thread_id;
int core_id;
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 8f0f778b7c91..990d14814427 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -266,6 +266,13 @@ void arch_set_thermal_pressure(const struct cpumask *cpus,
{ }
#endif
+#ifndef arch_thermal_pressure_update
+static __always_inline
+void arch_thermal_pressure_update(const struct cpumask *cpus,
+ unsigned long capped_frequency)
+{ }
+#endif
+
static inline int task_node(const struct task_struct *p)
{
return cpu_to_node(task_cpu(p));
--
2.17.1
Thermal pressure provides a new API, which allows to use CPU frequency
as an argument. That removes the need of local conversion to capacity.
Use this new API and remove old local conversion code.
Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 0138b2ec406d..bf7871c2a4c9 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -275,10 +275,10 @@ static unsigned int qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
{
- unsigned long max_capacity, capacity, freq_hz, throttled_freq;
struct cpufreq_policy *policy = data->policy;
int cpu = cpumask_first(policy->cpus);
struct device *dev = get_cpu_device(cpu);
+ unsigned long freq_hz, throttled_freq;
struct dev_pm_opp *opp;
unsigned int freq;
@@ -295,17 +295,12 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
throttled_freq = freq_hz / HZ_PER_KHZ;
- /* Update thermal pressure */
-
- max_capacity = arch_scale_cpu_capacity(cpu);
- capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq);
-
/* Don't pass boost capacity to scheduler */
- if (capacity > max_capacity)
- capacity = max_capacity;
+ if (throttled_freq > policy->cpuinfo.max_freq)
+ throttled_freq = policy->cpuinfo.max_freq;
- arch_set_thermal_pressure(policy->related_cpus,
- max_capacity - capacity);
+ /* Update thermal pressure */
+ arch_thermal_pressure_update(policy->related_cpus, throttled_freq);
/*
* In the unlikely case policy is unregistered do not enable
--
2.17.1
The thermal pressure signal gives information to the scheduler about
reduced CPU capacity due to thermal. It is based on a value stored in
a per-cpu 'thermal_pressure' variable. The online CPUs will get the
new value there, while the offline won't. Unfortunately, when the CPU
is back online, the value read from per-cpu variable might be wrong
(stale data). This might affect the scheduler decisions, since it
sees the CPU capacity differently than what is actually available.
Fix it by making sure that all online+offline CPUs would get the
proper value in their per-cpu variable when there is throttling
or throttling is removed.
Fixes: 275157b367f479 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Reviewed-by: Thara Gopinath <[email protected]>
Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index a2be0df7e174..0138b2ec406d 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -304,7 +304,8 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
if (capacity > max_capacity)
capacity = max_capacity;
- arch_set_thermal_pressure(policy->cpus, max_capacity - capacity);
+ arch_set_thermal_pressure(policy->related_cpus,
+ max_capacity - capacity);
/*
* In the unlikely case policy is unregistered do not enable
--
2.17.1
Thermal pressure provides a new API, which allows to use CPU frequency
as an argument. That removes the need of local conversion to capacity.
Use this new function and remove old conversion code.
Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/cpufreq_cooling.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index 43b1ae8a7789..835c091ce818 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -462,7 +462,6 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
struct cpumask *cpus;
unsigned int frequency;
- unsigned long max_capacity, capacity;
int ret;
/* Request state should be less than max_level */
@@ -479,10 +478,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
if (ret >= 0) {
cpufreq_cdev->cpufreq_state = state;
cpus = cpufreq_cdev->policy->related_cpus;
- max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
- capacity = frequency * max_capacity;
- capacity /= cpufreq_cdev->policy->cpuinfo.max_freq;
- arch_set_thermal_pressure(cpus, max_capacity - capacity);
+ arch_thermal_pressure_update(cpus, frequency);
ret = 0;
}
--
2.17.1
There is no need of this function (and related) since code has been
converted to use the new arch_thermal_pressure_update() API. The old
code can be removed.
Signed-off-by: Lukasz Luba <[email protected]>
---
arch/arm/include/asm/topology.h | 1 -
arch/arm64/include/asm/topology.h | 1 -
drivers/base/arch_topology.c | 17 +++++------------
include/linux/arch_topology.h | 3 ---
include/linux/sched/topology.h | 7 -------
init/Kconfig | 2 +-
6 files changed, 6 insertions(+), 25 deletions(-)
diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index aee6c456c085..5e51fdcfcbd4 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -23,7 +23,6 @@
/* Replace task scheduler's default thermal pressure API */
#define arch_scale_thermal_pressure topology_get_thermal_pressure
-#define arch_set_thermal_pressure topology_set_thermal_pressure
#define arch_thermal_pressure_update topology_thermal_pressure_update
#else
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index c997015402bc..92cd1288906f 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -32,7 +32,6 @@ void update_freq_counters_refs(void);
/* Replace task scheduler's default thermal pressure API */
#define arch_scale_thermal_pressure topology_get_thermal_pressure
-#define arch_set_thermal_pressure topology_set_thermal_pressure
#define arch_thermal_pressure_update topology_thermal_pressure_update
#include <asm-generic/topology.h>
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 1fa28b5afdb2..fa5ba3c7416c 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -159,16 +159,6 @@ void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
DEFINE_PER_CPU(unsigned long, thermal_pressure);
-void topology_set_thermal_pressure(const struct cpumask *cpus,
- unsigned long th_pressure)
-{
- int cpu;
-
- for_each_cpu(cpu, cpus)
- WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
-}
-EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
-
/**
* topology_thermal_pressure_update() - Update thermal pressure for CPUs
* @cpus : The related CPUs for which capacity has been reduced
@@ -184,7 +174,7 @@ EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
void topology_thermal_pressure_update(const struct cpumask *cpus,
unsigned long capped_freq)
{
- unsigned long max_capacity, capacity;
+ unsigned long max_capacity, capacity, th_pressure;
int cpu;
if (!cpus)
@@ -199,7 +189,10 @@ void topology_thermal_pressure_update(const struct cpumask *cpus,
capacity = mult_frac(capped_freq, max_capacity,
per_cpu(freq_factor, cpu));
- arch_set_thermal_pressure(cpus, max_capacity - capacity);
+ th_pressure = max_capacity - capacity;
+
+ for_each_cpu(cpu, cpus)
+ WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
}
EXPORT_SYMBOL_GPL(topology_thermal_pressure_update);
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 9e183621a59b..9b95e5b29ee9 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -56,9 +56,6 @@ static inline unsigned long topology_get_thermal_pressure(int cpu)
return per_cpu(thermal_pressure, cpu);
}
-void topology_set_thermal_pressure(const struct cpumask *cpus,
- unsigned long th_pressure);
-
void topology_thermal_pressure_update(const struct cpumask *cpus,
unsigned long capped_freq);
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 990d14814427..f31da5454baa 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -259,13 +259,6 @@ unsigned long arch_scale_thermal_pressure(int cpu)
}
#endif
-#ifndef arch_set_thermal_pressure
-static __always_inline
-void arch_set_thermal_pressure(const struct cpumask *cpus,
- unsigned long th_pressure)
-{ }
-#endif
-
#ifndef arch_thermal_pressure_update
static __always_inline
void arch_thermal_pressure_update(const struct cpumask *cpus,
diff --git a/init/Kconfig b/init/Kconfig
index f494e405c156..334c302e588f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -559,7 +559,7 @@ config SCHED_THERMAL_PRESSURE
i.e. put less load on throttled CPUs than on non/less throttled ones.
This requires the architecture to implement
- arch_set_thermal_pressure() and arch_scale_thermal_pressure().
+ arch_thermal_pressure_update() and arch_scale_thermal_pressure().
config BSD_PROCESS_ACCT
bool "BSD Process Accounting"
--
2.17.1
On 10/15/21 3:45 PM, Lukasz Luba wrote:
> Hi all,
>
> This patch set v2 aims to refactor the thermal pressure update
> code. There are already two clients which do similar thing:
> convert the capped frequency value into the capacity of
> affected CPU and call the 'set' function to store the
> reduced capacity into the per-cpu variable.
> There might be more than two of these users. In near future
> it will be scmi-cpufreq driver, which receives notification
> from FW about reduced frequency due to thermal. Other vendors
> might follow. Let's avoid code duplication and potential
> conversion bugs. Move the conversion code into the arch_topology.c
> where the capacity calculation setup code and thermal pressure sit.
>
> Apart from that $subject patches, there is one patch (3/5) which fixes
> issue in qcom-cpufreq-hw.c when the thermal pressure is not
> updated for offline CPUs. It's similar fix that has been merged
> recently for cpufreq_cooling.c:
> 2ad8ccc17d1e4270cf65a3f2
>
> Changes:
> v2:
> - added Reviewed-by from Thara for patch 3/5
> - changed the doxygen comment and used mult_frac()
> according to Thara's suggestion in patch 1/5
> v1 -> [1]
>
Gentle ping.
Viresh, Daniel, Rafael could you have a look at this, please?
Regards,
Lukasz
On 15/10/2021 16:45, Lukasz Luba wrote:
> The thermal pressure is a mechanism which is used for providing
> information about reduced CPU performance to the scheduler. Usually code
> has to convert the value from frequency units into capacity units,
> which are understandable by the scheduler. Create a common conversion code
> which can be just used via a handy API.
>
> Signed-off-by: Lukasz Luba <[email protected]>
> ---
> arch/arm/include/asm/topology.h | 1 +
> arch/arm64/include/asm/topology.h | 1 +
> drivers/base/arch_topology.c | 36 ++++++++++++++++++++++++++++++-
> include/linux/arch_topology.h | 3 +++
> include/linux/sched/topology.h | 7 ++++++
> 5 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
> index 470299ee2fba..aee6c456c085 100644
> --- a/arch/arm/include/asm/topology.h
> +++ b/arch/arm/include/asm/topology.h
> @@ -24,6 +24,7 @@
> /* Replace task scheduler's default thermal pressure API */
> #define arch_scale_thermal_pressure topology_get_thermal_pressure
> #define arch_set_thermal_pressure topology_set_thermal_pressure
> +#define arch_thermal_pressure_update topology_thermal_pressure_update
>
> #else
>
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> index ec2db3419c41..c997015402bc 100644
> --- a/arch/arm64/include/asm/topology.h
> +++ b/arch/arm64/include/asm/topology.h
> @@ -33,6 +33,7 @@ void update_freq_counters_refs(void);
> /* Replace task scheduler's default thermal pressure API */
> #define arch_scale_thermal_pressure topology_get_thermal_pressure
> #define arch_set_thermal_pressure topology_set_thermal_pressure
> +#define arch_thermal_pressure_update topology_thermal_pressure_update
s/thermal_pressure_update/update_thermal_pressure ?
The scheme seems to be {arch|topology}_*foo*_thermal_pressure
But ...
>
> #include <asm-generic/topology.h>
>
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 43407665918f..1fa28b5afdb2 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -25,6 +25,7 @@
> static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
> static struct cpumask scale_freq_counters_mask;
> static bool scale_freq_invariant;
> +static DEFINE_PER_CPU(u32, freq_factor) = 1;
>
> static bool supports_scale_freq_counters(const struct cpumask *cpus)
> {
> @@ -168,6 +169,40 @@ void topology_set_thermal_pressure(const struct cpumask *cpus,
> }
> EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
>
> +/**
> + * topology_thermal_pressure_update() - Update thermal pressure for CPUs
> + * @cpus : The related CPUs for which capacity has been reduced
> + * @capped_freq : The maximum allowed frequency that CPUs can run at
> + *
> + * Update the value of thermal pressure for all @cpus in the mask. The
> + * cpumask should include all (online+offline) affected CPUs, to avoid
> + * operating on stale data when hot-plug is used for some CPUs. The
> + * @capped_freq must be less or equal to the max possible frequency and
> + * reflects the currently allowed max CPUs frequency due to thermal capping.
> + * The @capped_freq must be provided in kHz.
> + */
> +void topology_thermal_pressure_update(const struct cpumask *cpus,
> + unsigned long capped_freq)
> +{
... why not just s/unsigned long th_pressure/unsigned long capped_freq
in existing topology_set_thermal_pressure() and move code the
frequency/capacity conversion in there? The patch set will become
considerably smaller.
void topology_set_thermal_pressure(const struct cpumask *cpus,
- unsigned long th_pressure)
+ unsigned long capped_freq)
{
+ unsigned long max_capacity, capacity;
int cpu;
- for_each_cpu(cpu, cpus)
- WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
+ if (!cpus)
+ return;
+
+ cpu = cpumask_first(cpus);
+ max_capacity = arch_scale_cpu_capacity(cpu);
+
+ /* Convert to MHz scale which is used in 'freq_factor' */
+ capped_freq /= 1000;
+
+ capacity = mult_frac(capped_freq, max_capacity,
+ per_cpu(freq_factor, cpu));
+
+ for_each_cpu(cpu, cpus) {
+ WRITE_ONCE(per_cpu(thermal_pressure, cpu),
+ max_capacity - capacity);
+ }
}
EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
And a user like [drivers/thermal/cpufreq_cooling.c] can call
arch_set_thermal_pressure(cpus, frequency).
[...]
On 15/10/2021 16:45, Lukasz Luba wrote:
[...]
> @@ -479,10 +478,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
> if (ret >= 0) {
> cpufreq_cdev->cpufreq_state = state;
> cpus = cpufreq_cdev->policy->related_cpus;
> - max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
> - capacity = frequency * max_capacity;
> - capacity /= cpufreq_cdev->policy->cpuinfo.max_freq;
Took me a while to realize that `cpufreq_cdev->policy->cpuinfo.max_freq`
is 1000 * per_cpu(freq_factor, cpu), the latter being used now in
arch_thermal_pressure_update(). Maybe worth mentioning in the patch header?
On 10/26/21 5:51 PM, Dietmar Eggemann wrote:
> On 15/10/2021 16:45, Lukasz Luba wrote:
>
> [...]
>
>> @@ -479,10 +478,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
>> if (ret >= 0) {
>> cpufreq_cdev->cpufreq_state = state;
>> cpus = cpufreq_cdev->policy->related_cpus;
>> - max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
>> - capacity = frequency * max_capacity;
>> - capacity /= cpufreq_cdev->policy->cpuinfo.max_freq;
>
> Took me a while to realize that `cpufreq_cdev->policy->cpuinfo.max_freq`
> is 1000 * per_cpu(freq_factor, cpu), the latter being used now in
> arch_thermal_pressure_update(). Maybe worth mentioning in the patch header?
>
OK, I will put that information into the patch description in the next
version.
On 27/10/2021 10:56, Lukasz Luba wrote:
> Hi Dietmar,
>
> Thank you for having a look at this.
>
> On 10/26/21 5:51 PM, Dietmar Eggemann wrote:
>> On 15/10/2021 16:45, Lukasz Luba wrote:
[...]
>>> +void topology_thermal_pressure_update(const struct cpumask *cpus,
>>> + unsigned long capped_freq)
>>> +{
>>
>> ... why not just s/unsigned long th_pressure/unsigned long capped_freq
>> in existing topology_set_thermal_pressure() and move code the
>> frequency/capacity conversion in there? The patch set will become
>> considerably smaller.
>
> I've been trying to avoid confusion when changing actually behavior
> of the API function. Thus, introducing new would IMO opinion
> make sure the old 'set' function was expecting proper pressure
> value, while the new 'update' expects frequency.
>
> I agree that the patch set would be smaller in that case, but I'm
> not sure if that would not hide some issues. This one would
> definitely break compilation of some vendor modules (or drivers
> queuing or under review), not silently passing them through (with wrong
> argument).
I see, since the parameter type list would stay the same, this could
potentially happen.
[...]
On Fri 15 Oct 07:45 PDT 2021, Lukasz Luba wrote:
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
[..]
> +/**
> + * topology_thermal_pressure_update() - Update thermal pressure for CPUs
> + * @cpus : The related CPUs for which capacity has been reduced
> + * @capped_freq : The maximum allowed frequency that CPUs can run at
I know this matches what I see in e.g. the Qualcomm cpufreq hw driver,
but in what cases will @capped_freq differ from
cpufreq_get_hw_max_freq(cpumask_first(cpus))?
Regards,
Bjorn
> + *
> + * Update the value of thermal pressure for all @cpus in the mask. The
> + * cpumask should include all (online+offline) affected CPUs, to avoid
> + * operating on stale data when hot-plug is used for some CPUs. The
> + * @capped_freq must be less or equal to the max possible frequency and
> + * reflects the currently allowed max CPUs frequency due to thermal capping.
> + * The @capped_freq must be provided in kHz.
> + */
> +void topology_thermal_pressure_update(const struct cpumask *cpus,
> + unsigned long capped_freq)
> +{
> + unsigned long max_capacity, capacity;
> + int cpu;
> +
> + if (!cpus)
> + return;
> +
> + cpu = cpumask_first(cpus);
> + max_capacity = arch_scale_cpu_capacity(cpu);
> +
> + /* Convert to MHz scale which is used in 'freq_factor' */
> + capped_freq /= 1000;
> +
> + capacity = mult_frac(capped_freq, max_capacity,
> + per_cpu(freq_factor, cpu));
> +
> + arch_set_thermal_pressure(cpus, max_capacity - capacity);
> +}
> +EXPORT_SYMBOL_GPL(topology_thermal_pressure_update);
> +
> static ssize_t cpu_capacity_show(struct device *dev,
> struct device_attribute *attr,
> char *buf)
> @@ -220,7 +255,6 @@ static void update_topology_flags_workfn(struct work_struct *work)
> update_topology = 0;
> }
>
> -static DEFINE_PER_CPU(u32, freq_factor) = 1;
> static u32 *raw_capacity;
>
> static int free_raw_capacity(void)
Hi Dietmar,
Thank you for having a look at this.
On 10/26/21 5:51 PM, Dietmar Eggemann wrote:
> On 15/10/2021 16:45, Lukasz Luba wrote:
[snip]
>> +#define arch_thermal_pressure_update topology_thermal_pressure_update
>
> s/thermal_pressure_update/update_thermal_pressure ?
I can reorder that naming.
>
> The scheme seems to be {arch|topology}_*foo*_thermal_pressure
>
> But ...
>
>>
[snip]
>> +void topology_thermal_pressure_update(const struct cpumask *cpus,
>> + unsigned long capped_freq)
>> +{
>
> ... why not just s/unsigned long th_pressure/unsigned long capped_freq
> in existing topology_set_thermal_pressure() and move code the
> frequency/capacity conversion in there? The patch set will become
> considerably smaller.
I've been trying to avoid confusion when changing actually behavior
of the API function. Thus, introducing new would IMO opinion
make sure the old 'set' function was expecting proper pressure
value, while the new 'update' expects frequency.
I agree that the patch set would be smaller in that case, but I'm
not sure if that would not hide some issues. This one would
definitely break compilation of some vendor modules (or drivers
queuing or under review), not silently passing them through (with wrong
argument).
>
> void topology_set_thermal_pressure(const struct cpumask *cpus,
> - unsigned long th_pressure)
> + unsigned long capped_freq)
[snip]
> EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
>
> And a user like [drivers/thermal/cpufreq_cooling.c] can call
> arch_set_thermal_pressure(cpus, frequency).
>
> [...]
>
I'm not sure if that is a safe way.
Regards,
Lukasz
On 15-10-21, 15:45, Lukasz Luba wrote:
> +/**
> + * topology_thermal_pressure_update() - Update thermal pressure for CPUs
> + * @cpus : The related CPUs for which capacity has been reduced
> + * @capped_freq : The maximum allowed frequency that CPUs can run at
Maybe replace tabs with spaces here ?
> + *
> + * Update the value of thermal pressure for all @cpus in the mask. The
> + * cpumask should include all (online+offline) affected CPUs, to avoid
> + * operating on stale data when hot-plug is used for some CPUs. The
> + * @capped_freq must be less or equal to the max possible frequency and
> + * reflects the currently allowed max CPUs frequency due to thermal capping.
> + * The @capped_freq must be provided in kHz.
> + */
> +void topology_thermal_pressure_update(const struct cpumask *cpus,
> + unsigned long capped_freq)
> +{
> + unsigned long max_capacity, capacity;
> + int cpu;
> +
> + if (!cpus)
I will drop this and let the kernel crash :)
> + return;
> +
> + cpu = cpumask_first(cpus);
> + max_capacity = arch_scale_cpu_capacity(cpu);
> +
> + /* Convert to MHz scale which is used in 'freq_factor' */
> + capped_freq /= 1000;
We should make sure capped_freq > freq_factor and WARN if not. This will also
get rid of similar checks at the users.
> +
> + capacity = mult_frac(capped_freq, max_capacity,
> + per_cpu(freq_factor, cpu));
> +
> + arch_set_thermal_pressure(cpus, max_capacity - capacity);
> +}
> +EXPORT_SYMBOL_GPL(topology_thermal_pressure_update);
--
viresh
On 10/27/21 7:43 PM, Bjorn Andersson wrote:
> On Fri 15 Oct 07:45 PDT 2021, Lukasz Luba wrote:
>> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> [..]
>> +/**
>> + * topology_thermal_pressure_update() - Update thermal pressure for CPUs
>> + * @cpus : The related CPUs for which capacity has been reduced
>> + * @capped_freq : The maximum allowed frequency that CPUs can run at
>
> I know this matches what I see in e.g. the Qualcomm cpufreq hw driver,
> but in what cases will @capped_freq differ from
> cpufreq_get_hw_max_freq(cpumask_first(cpus))?
The @capped_freq is the maximum allowed frequency value due to
thermal reasons, which will always be lower or equal to the value
returned by cpufreq_get_hw_max_freq()
(effectively: 'policy->cpuinfo.max_freq').
We limit the frequency (and voltage) of CPU to reduce power (and heat)
in the passive cooling system. That information is important to us,
because scheduler needs to know how fast the CPU can go. It cannot
assume that the speed is always 'policy->cpuinfo.max_freq'. Often
it's less then that at heavy load or GPU heavy load (the same SoC).
Regards,
Lukasz
On 10/28/21 6:44 AM, Viresh Kumar wrote:
> On 15-10-21, 15:45, Lukasz Luba wrote:
>> +/**
>> + * topology_thermal_pressure_update() - Update thermal pressure for CPUs
>> + * @cpus : The related CPUs for which capacity has been reduced
>> + * @capped_freq : The maximum allowed frequency that CPUs can run at
>
> Maybe replace tabs with spaces here ?
Sure
>
>> + *
>> + * Update the value of thermal pressure for all @cpus in the mask. The
>> + * cpumask should include all (online+offline) affected CPUs, to avoid
>> + * operating on stale data when hot-plug is used for some CPUs. The
>> + * @capped_freq must be less or equal to the max possible frequency and
>> + * reflects the currently allowed max CPUs frequency due to thermal capping.
>> + * The @capped_freq must be provided in kHz.
>> + */
>> +void topology_thermal_pressure_update(const struct cpumask *cpus,
>> + unsigned long capped_freq)
>> +{
>> + unsigned long max_capacity, capacity;
>> + int cpu;
>> +
>> + if (!cpus)
>
> I will drop this and let the kernel crash :)
OK :)
>
>> + return;
>> +
>> + cpu = cpumask_first(cpus);
>> + max_capacity = arch_scale_cpu_capacity(cpu);
>> +
>> + /* Convert to MHz scale which is used in 'freq_factor' */
>> + capped_freq /= 1000;
>
> We should make sure capped_freq > freq_factor and WARN if not. This will also
> get rid of similar checks at the users.
OK, I'll change that.
Thank you for the review.
Regards,
Lukasz
On Thu 28 Oct 00:16 PDT 2021, Lukasz Luba wrote:
>
>
> On 10/27/21 7:43 PM, Bjorn Andersson wrote:
> > On Fri 15 Oct 07:45 PDT 2021, Lukasz Luba wrote:
> > > diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> > [..]
> > > +/**
> > > + * topology_thermal_pressure_update() - Update thermal pressure for CPUs
> > > + * @cpus : The related CPUs for which capacity has been reduced
> > > + * @capped_freq : The maximum allowed frequency that CPUs can run at
> >
> > I know this matches what I see in e.g. the Qualcomm cpufreq hw driver,
> > but in what cases will @capped_freq differ from
> > cpufreq_get_hw_max_freq(cpumask_first(cpus))?
>
> The @capped_freq is the maximum allowed frequency value due to
> thermal reasons, which will always be lower or equal to the value
> returned by cpufreq_get_hw_max_freq()
> (effectively: 'policy->cpuinfo.max_freq').
>
Read patch 3 and 4 again and now this makes sense to me.
Thanks,
Bjorn
> We limit the frequency (and voltage) of CPU to reduce power (and heat)
> in the passive cooling system. That information is important to us,
> because scheduler needs to know how fast the CPU can go. It cannot
> assume that the speed is always 'policy->cpuinfo.max_freq'. Often
> it's less then that at heavy load or GPU heavy load (the same SoC).
>
> Regards,
> Lukasz