2021-11-03 16:12:18

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication

Hi all,

This patch set v3 aims to refactor the thermal pressure update
code. There are already two clients which do similar thing:
convert the capped frequency value into the capacity of
affected CPU and call the 'set' function to store the
reduced capacity into the per-cpu variable.
There might be more than two of these users. In near future
it will be scmi-cpufreq driver, which receives notification
from FW about reduced frequency due to thermal. Other vendors
might follow. Let's avoid code duplication and potential
conversion bugs. Move the conversion code into the arch_topology.c
where the capacity calculation setup code and thermal pressure sit.

Apart from that $subject patches, there is one patch (3/5) which fixes
issue in qcom-cpufreq-hw.c when the thermal pressure is not
updated for offline CPUs. It's similar fix that has been merged
recently for cpufreq_cooling.c:
2ad8ccc17d1e4270cf65a3f2

Changes:
v3:
- added warning and check if provided capped frequency is lower than
max (Viresh)
- removed check for empty cpu mask (Viresh)
- replaced tabs with spaces in the doxygen comment (Viresh)
- renamed {arch|topology}_thermal_pressure_update() to
{arch|topology}_update_thermal_pressure() so it's align with scheme (Dietmar)
- added info about MHz in freq_factor into patch description (Dietmar)
v2 [2]:
- added Reviewed-by from Thara for patch 3/5
- changed the doxygen comment and used mult_frac()
according to Thara's suggestion in patch 1/5
v1 -> [1]

Regards,
Lukasz Luba

[1] https://lore.kernel.org/linux-pm/[email protected]/
[2] https://lore.kernel.org/linux-pm/[email protected]/

Lukasz Luba (5):
arch_topology: Introduce thermal pressure update function
thermal: cpufreq_cooling: Use new thermal pressure update function
cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function
arch_topology: Remove unused topology_set_thermal_pressure() and
related

arch/arm/include/asm/topology.h | 2 +-
arch/arm64/include/asm/topology.h | 2 +-
drivers/base/arch_topology.c | 36 +++++++++++++++++++++++++++----
drivers/cpufreq/qcom-cpufreq-hw.c | 14 +++++-------
drivers/thermal/cpufreq_cooling.c | 6 +-----
include/linux/arch_topology.h | 4 ++--
include/linux/sched/topology.h | 6 +++---
init/Kconfig | 2 +-
8 files changed, 46 insertions(+), 26 deletions(-)

--
2.17.1


2021-11-03 16:12:49

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 3/5] cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure

The thermal pressure signal gives information to the scheduler about
reduced CPU capacity due to thermal. It is based on a value stored in
a per-cpu 'thermal_pressure' variable. The online CPUs will get the
new value there, while the offline won't. Unfortunately, when the CPU
is back online, the value read from per-cpu variable might be wrong
(stale data). This might affect the scheduler decisions, since it
sees the CPU capacity differently than what is actually available.

Fix it by making sure that all online+offline CPUs would get the
proper value in their per-cpu variable when there is throttling
or throttling is removed.

Fixes: 275157b367f479 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Reviewed-by: Thara Gopinath <[email protected]>
Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index a2be0df7e174..0138b2ec406d 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -304,7 +304,8 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
if (capacity > max_capacity)
capacity = max_capacity;

- arch_set_thermal_pressure(policy->cpus, max_capacity - capacity);
+ arch_set_thermal_pressure(policy->related_cpus,
+ max_capacity - capacity);

/*
* In the unlikely case policy is unregistered do not enable
--
2.17.1

2021-11-03 16:13:37

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 2/5] thermal: cpufreq_cooling: Use new thermal pressure update function

Thermal pressure provides a new API, which allows to use CPU frequency
as an argument. That removes the need of local conversion to capacity.
Use this new function and remove old conversion code.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/cpufreq_cooling.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index 43b1ae8a7789..0bfb8eebd126 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -462,7 +462,6 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
struct cpumask *cpus;
unsigned int frequency;
- unsigned long max_capacity, capacity;
int ret;

/* Request state should be less than max_level */
@@ -479,10 +478,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
if (ret >= 0) {
cpufreq_cdev->cpufreq_state = state;
cpus = cpufreq_cdev->policy->related_cpus;
- max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
- capacity = frequency * max_capacity;
- capacity /= cpufreq_cdev->policy->cpuinfo.max_freq;
- arch_set_thermal_pressure(cpus, max_capacity - capacity);
+ arch_update_thermal_pressure(cpus, frequency);
ret = 0;
}

--
2.17.1

2021-11-03 16:13:54

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 4/5] cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function

Thermal pressure provides a new API, which allows to use CPU frequency
as an argument. That removes the need of local conversion to capacity.
Use this new API and remove old local conversion code.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 0138b2ec406d..425f351450ad 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -275,10 +275,10 @@ static unsigned int qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)

static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
{
- unsigned long max_capacity, capacity, freq_hz, throttled_freq;
struct cpufreq_policy *policy = data->policy;
int cpu = cpumask_first(policy->cpus);
struct device *dev = get_cpu_device(cpu);
+ unsigned long freq_hz, throttled_freq;
struct dev_pm_opp *opp;
unsigned int freq;

@@ -295,17 +295,12 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)

throttled_freq = freq_hz / HZ_PER_KHZ;

- /* Update thermal pressure */
-
- max_capacity = arch_scale_cpu_capacity(cpu);
- capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq);
-
/* Don't pass boost capacity to scheduler */
- if (capacity > max_capacity)
- capacity = max_capacity;
+ if (throttled_freq > policy->cpuinfo.max_freq)
+ throttled_freq = policy->cpuinfo.max_freq;

- arch_set_thermal_pressure(policy->related_cpus,
- max_capacity - capacity);
+ /* Update thermal pressure */
+ arch_update_thermal_pressure(policy->related_cpus, throttled_freq);

/*
* In the unlikely case policy is unregistered do not enable
--
2.17.1

2021-11-03 16:14:02

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 5/5] arch_topology: Remove unused topology_set_thermal_pressure() and related

There is no need of this function (and related) since code has been
converted to use the new arch_thermal_pressure_update() API. The old
code can be removed.

Signed-off-by: Lukasz Luba <[email protected]>
---
arch/arm/include/asm/topology.h | 1 -
arch/arm64/include/asm/topology.h | 1 -
drivers/base/arch_topology.c | 17 +++++------------
include/linux/arch_topology.h | 3 ---
include/linux/sched/topology.h | 7 -------
init/Kconfig | 2 +-
6 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index f1eafacc9a30..c7d2510e5a78 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -23,7 +23,6 @@

/* Replace task scheduler's default thermal pressure API */
#define arch_scale_thermal_pressure topology_get_thermal_pressure
-#define arch_set_thermal_pressure topology_set_thermal_pressure
#define arch_update_thermal_pressure topology_update_thermal_pressure

#else
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 7a421cbc99ed..f386b90a79c8 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -32,7 +32,6 @@ void update_freq_counters_refs(void);

/* Replace task scheduler's default thermal pressure API */
#define arch_scale_thermal_pressure topology_get_thermal_pressure
-#define arch_set_thermal_pressure topology_set_thermal_pressure
#define arch_update_thermal_pressure topology_update_thermal_pressure

#include <asm-generic/topology.h>
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index d66aa46b7e1a..db18d79065fe 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -159,16 +159,6 @@ void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)

DEFINE_PER_CPU(unsigned long, thermal_pressure);

-void topology_set_thermal_pressure(const struct cpumask *cpus,
- unsigned long th_pressure)
-{
- int cpu;
-
- for_each_cpu(cpu, cpus)
- WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
-}
-EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
-
/**
* topology_update_thermal_pressure() - Update thermal pressure for CPUs
* @cpus : The related CPUs for which capacity has been reduced
@@ -184,7 +174,7 @@ EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
void topology_update_thermal_pressure(const struct cpumask *cpus,
unsigned long capped_freq)
{
- unsigned long max_capacity, capacity;
+ unsigned long max_capacity, capacity, th_pressure;
u32 max_freq;
int cpu;

@@ -200,7 +190,10 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,

capacity = mult_frac(capped_freq, max_capacity, max_freq);

- arch_set_thermal_pressure(cpus, max_capacity - capacity);
+ th_pressure = max_capacity - capacity;
+
+ for_each_cpu(cpu, cpus)
+ WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
}
EXPORT_SYMBOL_GPL(topology_update_thermal_pressure);

diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index ace1e5dcf773..cce6136b300a 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -56,9 +56,6 @@ static inline unsigned long topology_get_thermal_pressure(int cpu)
return per_cpu(thermal_pressure, cpu);
}

-void topology_set_thermal_pressure(const struct cpumask *cpus,
- unsigned long th_pressure);
-
void topology_update_thermal_pressure(const struct cpumask *cpus,
unsigned long capped_freq);

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 6e89a8e43aa7..8054641c0a7b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -266,13 +266,6 @@ unsigned long arch_scale_thermal_pressure(int cpu)
}
#endif

-#ifndef arch_set_thermal_pressure
-static __always_inline
-void arch_set_thermal_pressure(const struct cpumask *cpus,
- unsigned long th_pressure)
-{ }
-#endif
-
#ifndef arch_update_thermal_pressure
static __always_inline
void arch_update_thermal_pressure(const struct cpumask *cpus,
diff --git a/init/Kconfig b/init/Kconfig
index 11f8a845f259..b5d6eeb86275 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -550,7 +550,7 @@ config SCHED_THERMAL_PRESSURE
i.e. put less load on throttled CPUs than on non/less throttled ones.

This requires the architecture to implement
- arch_set_thermal_pressure() and arch_scale_thermal_pressure().
+ arch_update_thermal_pressure() and arch_scale_thermal_pressure().

config BSD_PROCESS_ACCT
bool "BSD Process Accounting"
--
2.17.1

2021-11-05 16:12:24

by Steev Klimaszewski

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication

Hi Lukasz,

On 11/3/21 11:10 AM, Lukasz Luba wrote:
> Hi all,
>
> This patch set v3 aims to refactor the thermal pressure update
> code. There are already two clients which do similar thing:
> convert the capped frequency value into the capacity of
> affected CPU and call the 'set' function to store the
> reduced capacity into the per-cpu variable.
> There might be more than two of these users. In near future
> it will be scmi-cpufreq driver, which receives notification
> from FW about reduced frequency due to thermal. Other vendors
> might follow. Let's avoid code duplication and potential
> conversion bugs. Move the conversion code into the arch_topology.c
> where the capacity calculation setup code and thermal pressure sit.
>
> Apart from that $subject patches, there is one patch (3/5) which fixes
> issue in qcom-cpufreq-hw.c when the thermal pressure is not
> updated for offline CPUs. It's similar fix that has been merged
> recently for cpufreq_cooling.c:
> 2ad8ccc17d1e4270cf65a3f2
>
> Changes:
> v3:
> - added warning and check if provided capped frequency is lower than
> max (Viresh)
> - removed check for empty cpu mask (Viresh)
> - replaced tabs with spaces in the doxygen comment (Viresh)
> - renamed {arch|topology}_thermal_pressure_update() to
> {arch|topology}_update_thermal_pressure() so it's align with scheme (Dietmar)
> - added info about MHz in freq_factor into patch description (Dietmar)
> v2 [2]:
> - added Reviewed-by from Thara for patch 3/5
> - changed the doxygen comment and used mult_frac()
> according to Thara's suggestion in patch 1/5
> v1 -> [1]
>
> Regards,
> Lukasz Luba
>
> [1] https://lore.kernel.org/linux-pm/[email protected]/
> [2] https://lore.kernel.org/linux-pm/[email protected]/
>
> Lukasz Luba (5):
> arch_topology: Introduce thermal pressure update function
> thermal: cpufreq_cooling: Use new thermal pressure update function
> cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
> cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function
> arch_topology: Remove unused topology_set_thermal_pressure() and
> related
>
> arch/arm/include/asm/topology.h | 2 +-
> arch/arm64/include/asm/topology.h | 2 +-
> drivers/base/arch_topology.c | 36 +++++++++++++++++++++++++++----
> drivers/cpufreq/qcom-cpufreq-hw.c | 14 +++++-------
> drivers/thermal/cpufreq_cooling.c | 6 +-----
> include/linux/arch_topology.h | 4 ++--
> include/linux/sched/topology.h | 6 +++---
> init/Kconfig | 2 +-
> 8 files changed, 46 insertions(+), 26 deletions(-)
>
I've been testing this patchset on the Lenovo Yoga C630, and today while
compiling alacritty and running an apt-get full-upgrade, I found the
following in dmesg output:

[  194.343903] ------------[ cut here ]------------
[  194.343912] WARNING: CPU: 4 PID: 192 at
drivers/base/arch_topology.c:188 topology_update_thermal_pressure+0xe4/0x100
[  194.343928] Modules linked in: aes_ce_ccm snd_seq_dummy snd_hrtimer
snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep
cpufreq_ondemand cpufreq_conservative cpufreq_powersave
cpufreq_userspace lz4 lz4_compress zram zsmalloc q6asm_dai q6routing
q6afe_dai q6adm q6asm q6afe q6dsp_common snd_soc_wsa881x q6core
regmap_sdw snd_soc_wcd934x gpio_wcd934x soundwire_qcom snd_soc_wcd_mbhc
wcd934x regmap_slimbus uvcvideo videobuf2_vmalloc venus_enc venus_dec
videobuf2_dma_contig videobuf2_memops qrtr_smd fastrpc apr binfmt_misc
nls_ascii nls_cp437 vfat fat snd_soc_sdm845 snd_soc_rt5663
snd_soc_qcom_common pm8941_pwrkey joydev snd_soc_rl6231 aes_ce_blk
qcom_spmi_adc5 soundwire_bus qcom_vadc_common crypto_simd
qcom_spmi_temp_alarm snd_soc_core cryptd hci_uart snd_compress btqca
industrialio aes_ce_cipher snd_pcm_dmaengine btrtl crct10dif_ce btbcm
ghash_ce snd_pcm btintel gf128mul sha2_ce snd_timer venus_core bluetooth
snd v4l2_mem2mem sha256_arm64 videobuf2_v4l2 videobuf2_common soundcore
[  194.344007]  sha1_ce videodev ecdh_generic ecc mc ath10k_snoc
ath10k_core hid_multitouch ath mac80211 qcom_rng libarc4 qcom_q6v5_mss
cfg80211 sg rfkill qcom_q6v5_pas qcom_pil_info slim_qcom_ngd_ctrl
qcom_wdt pdr_interface qcom_q6v5 evdev rmtfs_mem slimbus qcom_sysmon
fuse configfs qrtr ip_tables x_tables autofs4 ext4 mbcache jbd2
panel_simple rtc_pm8xxx msm llcc_qcom ocmem gpu_sched ti_sn65dsi86
drm_dp_aux_bus drm_kms_helper drm camcc_sdm845 ipa qcom_common
qmi_helpers mdt_loader gpio_keys pwm_bl
[  194.344056] CPU: 4 PID: 192 Comm: kworker/4:1H Not tainted 5.15.0 #9
[  194.344060] Hardware name: LENOVO 81JL/LNVNB161216, BIOS
9UCN33WW(V2.06) 06/ 4/2019
[  194.344062] Workqueue: events_highpri qcom_lmh_dcvs_poll
[  194.344068] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[  194.344070] pc : topology_update_thermal_pressure+0xe4/0x100
[  194.344073] lr : topology_update_thermal_pressure+0x30/0x100
[  194.344075] sp : ffff800014043d10
[  194.344076] x29: ffff800014043d10 x28: 0000000000000000 x27:
0000000000000000
[  194.344080] x26: ffff759c40b66974 x25: ffff759db37a2405 x24:
ffff759c40e83358
[  194.344084] x23: 0000000000000000 x22: ffffb2699eafe1d8 x21:
00000000002d1e00
[  194.344087] x20: ffff759c49f5bc20 x19: 000000000000b8cc x18:
0000000000000000
[  194.344090] x17: 2f756e672d78756e x16: ffffb2699d7d83d0 x15:
0000000000000000
[  194.344093] x14: 0000000000000000 x13: 0000000000000030 x12:
0101010101010101
[  194.344096] x11: 7f7f7f7f7f7f7f7f x10: feff68716f676668 x9 :
ffffb2699dd66a58
[  194.344099] x8 : fefefefefefefeff x7 : 000000000000000f x6 :
0000000000000002
[  194.344102] x5 : ffffc33415124000 x4 : 0000000000000400 x3 :
0000000000000b19
[  194.344105] x2 : 0000000000000b8c x1 : ffffb2699e678f40 x0 :
ffffb2699e678f48
[  194.344108] Call trace:
[  194.344110]  topology_update_thermal_pressure+0xe4/0x100
[  194.344113]  qcom_lmh_dcvs_notify+0xc8/0x160
[  194.344115]  qcom_lmh_dcvs_poll+0x20/0x2c
[  194.344116]  process_one_work+0x1f4/0x490
[  194.344120]  worker_thread+0x188/0x504
[  194.344121]  kthread+0x12c/0x140
[  194.344125]  ret_from_fork+0x10/0x20
[  194.344128] ---[ end trace bd0039c4fb892d5b ]---
[  194.344170] ------------[ cut here ]------------
[  194.344171] WARNING: CPU: 4 PID: 161 at
drivers/base/arch_topology.c:188 topology_update_thermal_pressure+0xe4/0x100
[  194.344175] Modules linked in: aes_ce_ccm snd_seq_dummy snd_hrtimer
snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep
cpufreq_ondemand cpufreq_conservative cpufreq_powersave
cpufreq_userspace lz4 lz4_compress zram zsmalloc q6asm_dai q6routing
q6afe_dai q6adm q6asm q6afe q6dsp_common snd_soc_wsa881x q6core
regmap_sdw snd_soc_wcd934x gpio_wcd934x soundwire_qcom snd_soc_wcd_mbhc
wcd934x regmap_slimbus uvcvideo videobuf2_vmalloc venus_enc venus_dec
videobuf2_dma_contig videobuf2_memops qrtr_smd fastrpc apr binfmt_misc
nls_ascii nls_cp437 vfat fat snd_soc_sdm845 snd_soc_rt5663
snd_soc_qcom_common pm8941_pwrkey joydev snd_soc_rl6231 aes_ce_blk
qcom_spmi_adc5 soundwire_bus qcom_vadc_common crypto_simd
qcom_spmi_temp_alarm snd_soc_core cryptd hci_uart snd_compress btqca
industrialio aes_ce_cipher snd_pcm_dmaengine btrtl crct10dif_ce btbcm
ghash_ce snd_pcm btintel gf128mul sha2_ce snd_timer venus_core bluetooth
snd v4l2_mem2mem sha256_arm64 videobuf2_v4l2 videobuf2_common soundcore
[  194.344225]  sha1_ce videodev ecdh_generic ecc mc ath10k_snoc
ath10k_core hid_multitouch ath mac80211 qcom_rng libarc4 qcom_q6v5_mss
cfg80211 sg rfkill qcom_q6v5_pas qcom_pil_info slim_qcom_ngd_ctrl
qcom_wdt pdr_interface qcom_q6v5 evdev rmtfs_mem slimbus qcom_sysmon
fuse configfs qrtr ip_tables x_tables autofs4 ext4 mbcache jbd2
panel_simple rtc_pm8xxx msm llcc_qcom ocmem gpu_sched ti_sn65dsi86
drm_dp_aux_bus drm_kms_helper drm camcc_sdm845 ipa qcom_common
qmi_helpers mdt_loader gpio_keys pwm_bl
[  194.344256] CPU: 4 PID: 161 Comm: irq/159-dcvsh-i Tainted: G       
W         5.15.0 #9
[  194.344259] Hardware name: LENOVO 81JL/LNVNB161216, BIOS
9UCN33WW(V2.06) 06/ 4/2019
[  194.344260] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[  194.344262] pc : topology_update_thermal_pressure+0xe4/0x100
[  194.344264] lr : topology_update_thermal_pressure+0x30/0x100
[  194.344266] sp : ffff800011e93cf0
[  194.344267] x29: ffff800011e93cf0 x28: 0000000000000000 x27:
0000000000000000
[  194.344270] x26: ffffb2699d4d0a80 x25: ffffb2699d4d0b60 x24:
ffff759c40caec00
[  194.344273] x23: ffff759c49f02c40 x22: ffffb2699eafe1d8 x21:
00000000002d1e00
[  194.344276] x20: ffff759c49f5bc20 x19: 000000000000b8cc x18:
000000000000000e
[  194.344280] x17: 0000000000000001 x16: 0000000000000019 x15:
0000000000000033
[  194.344283] x14: 000000000000004c x13: 0000000000000068 x12:
0000000000000040
[  194.344286] x11: ffff759c4049f478 x10: ffff759c4049f47a x9 :
ffffb2699dd66a58
[  194.344288] x8 : ffff759c40401dd0 x7 : 0000000000000000 x6 :
0000000000000002
[  194.344291] x5 : ffffc33415124000 x4 : 0000000000000400 x3 :
0000000000000b19
[  194.344294] x2 : 0000000000000b8c x1 : ffffb2699e678f40 x0 :
ffffb2699e678f48
[  194.344297] Call trace:
[  194.344298]  topology_update_thermal_pressure+0xe4/0x100
[  194.344300]  qcom_lmh_dcvs_notify+0xc8/0x160
[  194.344302]  qcom_lmh_dcvs_handle_irq+0x30/0x44
[  194.344304]  irq_thread_fn+0x38/0xb0
[  194.344307]  irq_thread+0x194/0x3b0
[  194.344308]  kthread+0x12c/0x140
[  194.344311]  ret_from_fork+0x10/0x20
[  194.344313] ---[ end trace bd0039c4fb892d5c ]---
[  194.405913] ------------[ cut here ]------------
[  194.405923] WARNING: CPU: 4 PID: 192 at
drivers/base/arch_topology.c:188 topology_update_thermal_pressure+0xe4/0x100
[  194.405939] Modules linked in: aes_ce_ccm snd_seq_dummy snd_hrtimer
snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep
cpufreq_ondemand cpufreq_conservative cpufreq_powersave
cpufreq_userspace lz4 lz4_compress zram zsmalloc q6asm_dai q6routing
q6afe_dai q6adm q6asm q6afe q6dsp_common snd_soc_wsa881x q6core
regmap_sdw snd_soc_wcd934x gpio_wcd934x soundwire_qcom snd_soc_wcd_mbhc
wcd934x regmap_slimbus uvcvideo videobuf2_vmalloc venus_enc venus_dec
videobuf2_dma_contig videobuf2_memops qrtr_smd fastrpc apr binfmt_misc
nls_ascii nls_cp437 vfat fat snd_soc_sdm845 snd_soc_rt5663
snd_soc_qcom_common pm8941_pwrkey joydev snd_soc_rl6231 aes_ce_blk
qcom_spmi_adc5 soundwire_bus qcom_vadc_common crypto_simd
qcom_spmi_temp_alarm snd_soc_core cryptd hci_uart snd_compress btqca
industrialio aes_ce_cipher snd_pcm_dmaengine btrtl crct10dif_ce btbcm
ghash_ce snd_pcm btintel gf128mul sha2_ce snd_timer venus_core bluetooth
snd v4l2_mem2mem sha256_arm64 videobuf2_v4l2 videobuf2_common soundcore
[  194.406038]  sha1_ce videodev ecdh_generic ecc mc ath10k_snoc
ath10k_core hid_multitouch ath mac80211 qcom_rng libarc4 qcom_q6v5_mss
cfg80211 sg rfkill qcom_q6v5_pas qcom_pil_info slim_qcom_ngd_ctrl
qcom_wdt pdr_interface qcom_q6v5 evdev rmtfs_mem slimbus qcom_sysmon
fuse configfs qrtr ip_tables x_tables autofs4 ext4 mbcache jbd2
panel_simple rtc_pm8xxx msm llcc_qcom ocmem gpu_sched ti_sn65dsi86
drm_dp_aux_bus drm_kms_helper drm camcc_sdm845 ipa qcom_common
qmi_helpers mdt_loader gpio_keys pwm_bl
[  194.406089] CPU: 4 PID: 192 Comm: kworker/4:1H Tainted: G       
W         5.15.0 #9
[  194.406092] Hardware name: LENOVO 81JL/LNVNB161216, BIOS
9UCN33WW(V2.06) 06/ 4/2019
[  194.406094] Workqueue: events_highpri qcom_lmh_dcvs_poll
[  194.406101] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[  194.406104] pc : topology_update_thermal_pressure+0xe4/0x100
[  194.406107] lr : topology_update_thermal_pressure+0x30/0x100
[  194.406109] sp : ffff800014043d10
[  194.406111] x29: ffff800014043d10 x28: 0000000000000000 x27:
0000000000000000
[  194.406114] x26: ffff759c40b66974 x25: ffff759db37a2405 x24:
ffff759c40e83358
[  194.406117] x23: 0000000000000000 x22: ffffb2699eafe1d8 x21:
00000000002d1e00
[  194.406120] x20: ffff759c49f5bc20 x19: 000000000000b8cc x18:
0000000000000000
[  194.406123] x17: ffffc33415124000 x16: ffff800010024000 x15:
0000000000004000
[  194.406126] x14: 0000000000000000 x13: 0000000000000030 x12:
0101010101010101
[  194.406129] x11: 7f7f7f7f7f7f7f7f x10: feff68716f676668 x9 :
ffffb2699dd66a58
[  194.406132] x8 : fefefefefefefeff x7 : 000000000000000f x6 :
0000000000000002
[  194.406135] x5 : ffffc33415124000 x4 : 0000000000000400 x3 :
0000000000000b19
[  194.406138] x2 : 0000000000000b8c x1 : ffffb2699e678f40 x0 :
ffffb2699e678f48
[  194.406141] Call trace:
[  194.406143]  topology_update_thermal_pressure+0xe4/0x100
[  194.406146]  qcom_lmh_dcvs_notify+0xc8/0x160
[  194.406148]  qcom_lmh_dcvs_poll+0x20/0x2c
[  194.406149]  process_one_work+0x1f4/0x490
[  194.406153]  worker_thread+0x188/0x504
[  194.406154]  kthread+0x12c/0x140
[  194.406158]  ret_from_fork+0x10/0x20
[  194.406162] ---[ end trace bd0039c4fb892d5d ]---

2021-11-05 18:46:42

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication

Hi Steev,

On 11/5/21 3:39 PM, Steev Klimaszewski wrote:
> Hi Lukasz,
>

[snip]

> I've been testing this patchset on the Lenovo Yoga C630, and today while
> compiling alacritty and running an apt-get full-upgrade, I found the
> following in dmesg output:

Thank you for testing and sending feedback!

Are you using a mainline kernel or you applied on some vendor production
kernel this patch set? I need to exclude a different code base
from the equation, especially to the arch_topology.c init code.

>
> [  194.343903] ------------[ cut here ]------------
> [  194.343912] WARNING: CPU: 4 PID: 192 at
> drivers/base/arch_topology.c:188
> topology_update_thermal_pressure+0xe4/0x100
> [  194.343928] Modules linked in: aes_ce_ccm snd_seq_dummy snd_hrtimer
> snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep
> cpufreq_ondemand cpufreq_conservative cpufreq_powersave
> cpufreq_userspace lz4 lz4_compress zram zsmalloc q6asm_dai q6routing
> q6afe_dai q6adm q6asm q6afe q6dsp_common snd_soc_wsa881x q6core
> regmap_sdw snd_soc_wcd934x gpio_wcd934x soundwire_qcom snd_soc_wcd_mbhc
> wcd934x regmap_slimbus uvcvideo videobuf2_vmalloc venus_enc venus_dec
> videobuf2_dma_contig videobuf2_memops qrtr_smd fastrpc apr binfmt_misc
> nls_ascii nls_cp437 vfat fat snd_soc_sdm845 snd_soc_rt5663
> snd_soc_qcom_common pm8941_pwrkey joydev snd_soc_rl6231 aes_ce_blk
> qcom_spmi_adc5 soundwire_bus qcom_vadc_common crypto_simd
> qcom_spmi_temp_alarm snd_soc_core cryptd hci_uart snd_compress btqca
> industrialio aes_ce_cipher snd_pcm_dmaengine btrtl crct10dif_ce btbcm
> ghash_ce snd_pcm btintel gf128mul sha2_ce snd_timer venus_core bluetooth
> snd v4l2_mem2mem sha256_arm64 videobuf2_v4l2 videobuf2_common soundcore
> [  194.344007]  sha1_ce videodev ecdh_generic ecc mc ath10k_snoc
> ath10k_core hid_multitouch ath mac80211 qcom_rng libarc4 qcom_q6v5_mss
> cfg80211 sg rfkill qcom_q6v5_pas qcom_pil_info slim_qcom_ngd_ctrl
> qcom_wdt pdr_interface qcom_q6v5 evdev rmtfs_mem slimbus qcom_sysmon
> fuse configfs qrtr ip_tables x_tables autofs4 ext4 mbcache jbd2
> panel_simple rtc_pm8xxx msm llcc_qcom ocmem gpu_sched ti_sn65dsi86
> drm_dp_aux_bus drm_kms_helper drm camcc_sdm845 ipa qcom_common
> qmi_helpers mdt_loader gpio_keys pwm_bl
> [  194.344056] CPU: 4 PID: 192 Comm: kworker/4:1H Not tainted 5.15.0 #9
> [  194.344060] Hardware name: LENOVO 81JL/LNVNB161216, BIOS
> 9UCN33WW(V2.06) 06/ 4/2019
> [  194.344062] Workqueue: events_highpri qcom_lmh_dcvs_poll
> [  194.344068] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS
> BTYPE=--)
> [  194.344070] pc : topology_update_thermal_pressure+0xe4/0x100
> [  194.344073] lr : topology_update_thermal_pressure+0x30/0x100
> [  194.344075] sp : ffff800014043d10
> [  194.344076] x29: ffff800014043d10 x28: 0000000000000000 x27:
> 0000000000000000
> [  194.344080] x26: ffff759c40b66974 x25: ffff759db37a2405 x24:
> ffff759c40e83358
> [  194.344084] x23: 0000000000000000 x22: ffffb2699eafe1d8 x21:
> 00000000002d1e00
> [  194.344087] x20: ffff759c49f5bc20 x19: 000000000000b8cc x18:
> 0000000000000000
> [  194.344090] x17: 2f756e672d78756e x16: ffffb2699d7d83d0 x15:
> 0000000000000000
> [  194.344093] x14: 0000000000000000 x13: 0000000000000030 x12:
> 0101010101010101
> [  194.344096] x11: 7f7f7f7f7f7f7f7f x10: feff68716f676668 x9 :
> ffffb2699dd66a58
> [  194.344099] x8 : fefefefefefefeff x7 : 000000000000000f x6 :
> 0000000000000002
> [  194.344102] x5 : ffffc33415124000 x4 : 0000000000000400 x3 :
> 0000000000000b19
> [  194.344105] x2 : 0000000000000b8c x1 : ffffb2699e678f40 x0 :
> ffffb2699e678f48
> [  194.344108] Call trace:
> [  194.344110]  topology_update_thermal_pressure+0xe4/0x100
> [  194.344113]  qcom_lmh_dcvs_notify+0xc8/0x160
> [  194.344115]  qcom_lmh_dcvs_poll+0x20/0x2c
> [  194.344116]  process_one_work+0x1f4/0x490
> [  194.344120]  worker_thread+0x188/0x504
> [  194.344121]  kthread+0x12c/0x140
> [  194.344125]  ret_from_fork+0x10/0x20
> [  194.344128] ---[ end trace bd0039c4fb892d5b ]---

[snip]

That's interesting why we hit this. I should have added info about
those two values, which are compared.

Could you make this change and try it again, please?
We would know the problematic values, which triggered this.
---------------------8<-----------------------------------
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index db18d79065fe..0d8db0927041 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -185,8 +185,11 @@ void topology_update_thermal_pressure(const struct
cpumask *cpus,
/* Convert to MHz scale which is used in 'freq_factor' */
capped_freq /= 1000;

- if (WARN_ON(max_freq < capped_freq))
+ if (max_freq < capped_freq) {
+ pr_warn("THERMAL_PRESSURE: max_freq (%lu) < capped_freq
(%lu) for CPUs [%*pbl]\n",
+ max_freq, capped_freq, cpumask_pr_args(cpus));
return;
+ }

capacity = mult_frac(capped_freq, max_capacity, max_freq);

------------------------------>8---------------------------

Could you also dump for me the cpufreq and capacity sysfs content?
$ grep . /sys/devices/system/cpu/cpu*/cpufreq/*
$ grep . /sys/devices/system/cpu/cpu*/cpu_capacity

Regards,
Lukasz

2021-11-05 20:12:15

by Thara Gopinath

[permalink] [raw]
Subject: Re: [PATCH v3 4/5] cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function

Hi Lukasz,


On 11/3/21 12:10 PM, Lukasz Luba wrote:
> Thermal pressure provides a new API, which allows to use CPU frequency
> as an argument. That removes the need of local conversion to capacity.
> Use this new API and remove old local conversion code.
>
> Signed-off-by: Lukasz Luba <[email protected]>
> ---
> drivers/cpufreq/qcom-cpufreq-hw.c | 15 +++++----------
> 1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 0138b2ec406d..425f351450ad 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -275,10 +275,10 @@ static unsigned int qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
>
> static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
> {
> - unsigned long max_capacity, capacity, freq_hz, throttled_freq;
> struct cpufreq_policy *policy = data->policy;
> int cpu = cpumask_first(policy->cpus);
> struct device *dev = get_cpu_device(cpu);
> + unsigned long freq_hz, throttled_freq;
> struct dev_pm_opp *opp;
> unsigned int freq;
>
> @@ -295,17 +295,12 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>
> throttled_freq = freq_hz / HZ_PER_KHZ;
>
> - /* Update thermal pressure */
> -
> - max_capacity = arch_scale_cpu_capacity(cpu);
> - capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq);
> -
> /* Don't pass boost capacity to scheduler */
> - if (capacity > max_capacity)
> - capacity = max_capacity;

So, I think this should go into the common
topology_update_thermal_pressure in lieu of

+ if (WARN_ON(max_freq < capped_freq))
+ return;

This will fix the issue Steev Klimaszewski has been reporting
https://lore.kernel.org/linux-arm-kernel/[email protected]/


--
Warm Regards
Thara (She/Her/Hers)


> + if (throttled_freq > policy->cpuinfo.max_freq)
> + throttled_freq = policy->cpuinfo.max_freq;
>
> - arch_set_thermal_pressure(policy->related_cpus,
> - max_capacity - capacity);
> + /* Update thermal pressure */
> + arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
>
> /*
> * In the unlikely case policy is unregistered do not enable
>


2021-11-05 20:16:50

by Thara Gopinath

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication



On 11/5/21 1:33 PM, Steev Klimaszewski wrote:
> Hi,
>
> On 11/5/21 11:26 AM, Lukasz Luba wrote:
>> Hi Steev,
>>
>> On 11/5/21 3:39 PM, Steev Klimaszewski wrote:
>>> Hi Lukasz,
>>>
>>
>> [snip]
>>
>>> I've been testing this patchset on the Lenovo Yoga C630, and today
>>> while compiling alacritty and running an apt-get full-upgrade, I
>>> found the following in dmesg output:
>>
>> Thank you for testing and sending feedback!
>>
>> Are you using a mainline kernel or you applied on some vendor production
>> kernel this patch set? I need to exclude a different code base
>> from the equation, especially to the arch_topology.c init code.
>>
> This is stabe 5.15.0 tree with ~72 (including your 6 patches on top
> (including the below as a patch).  You can find it at
> https://github.com/steev/linux/commits/linux-5.15.y - the vast majority
> are just various fixups for sdm845/sdm850 for the Lenovo Yoga (or crypto
> since I'd like to see the crypto unit working).
>
> I did grep through my logs and it appears that this started after I
> moved from v2 to v3 - I'd tested v2 and it didn't show this.
>
> [snip]
>
>> [snip]
>>
>> That's interesting why we hit this. I should have added info about
>> those two values, which are compared.
>>
>> Could you make this change and try it again, please?
>> We would know the problematic values, which triggered this.
>> ---------------------8<-----------------------------------
>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>> index db18d79065fe..0d8db0927041 100644
>> --- a/drivers/base/arch_topology.c
>> +++ b/drivers/base/arch_topology.c
>> @@ -185,8 +185,11 @@ void topology_update_thermal_pressure(const
>> struct cpumask *cpus,
>>         /* Convert to MHz scale which is used in 'freq_factor' */
>>         capped_freq /= 1000;
>>
>> -       if (WARN_ON(max_freq < capped_freq))
>> +       if (max_freq < capped_freq) {
>> +               pr_warn("THERMAL_PRESSURE: max_freq (%lu) <
>> capped_freq (%lu) for CPUs [%*pbl]\n",
>> +                       max_freq, capped_freq, cpumask_pr_args(cpus));
>>                 return;
>> +       }
>>
>>         capacity = mult_frac(capped_freq, max_capacity, max_freq);
>>
>> ------------------------------>8---------------------------
>
> Applying this to the above kernel.. will test...
>
>
>>
>> Could you also dump for me the cpufreq and capacity sysfs content?
>> $ grep . /sys/devices/system/cpu/cpu*/cpufreq/*
>
>
> /sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:0 1 2 3
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:300000
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:1766400
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:300000
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:0
> /sys/devices/system/cpu/cpu0/cpufreq/related_cpus:0 1 2 3
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:300000 403200
> 480000 576000 652800 748800 825600 902400 979200 1056000 1132800 1228800
> 1324800 1420800 1516800 1612800 1689600 1766400
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:ondemand conservative
> powersave userspace performance schedutil
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:300000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:qcom-cpufreq-hw
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:schedutil
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:1766400
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:300000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:<unsupported>
> /sys/devices/system/cpu/cpu1/cpufreq/affected_cpus:0 1 2 3
> /sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_cur_freq:300000
> /sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq:1766400
> /sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_min_freq:300000
> /sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_transition_latency:0
> /sys/devices/system/cpu/cpu1/cpufreq/related_cpus:0 1 2 3
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_frequencies:300000 403200
> 480000 576000 652800 748800 825600 902400 979200 1056000 1132800 1228800
> 1324800 1420800 1516800 1612800 1689600 1766400
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors:ondemand conservative
> powersave userspace performance schedutil
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:300000
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:qcom-cpufreq-hw
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:schedutil
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:1766400
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq:300000
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed:<unsupported>
> /sys/devices/system/cpu/cpu2/cpufreq/affected_cpus:0 1 2 3
> /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_cur_freq:300000
> /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_max_freq:1766400
> /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_min_freq:300000
> /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_transition_latency:0
> /sys/devices/system/cpu/cpu2/cpufreq/related_cpus:0 1 2 3
> /sys/devices/system/cpu/cpu2/cpufreq/scaling_available_frequencies:300000 403200
> 480000 576000 652800 748800 825600 902400 979200 1056000 1132800 1228800
> 1324800 1420800 1516800 1612800 1689600 1766400
> /sys/devices/system/cpu/cpu2/cpufreq/scaling_available_governors:ondemand conservative
> powersave userspace performance schedutil
> /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:300000
> /sys/devices/system/cpu/cpu2/cpufreq/scaling_driver:qcom-cpufreq-hw
> /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:schedutil
> /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq:1766400
> /sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq:300000
> /sys/devices/system/cpu/cpu2/cpufreq/scaling_setspeed:<unsupported>
> /sys/devices/system/cpu/cpu3/cpufreq/affected_cpus:0 1 2 3
> /sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_cur_freq:300000
> /sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_max_freq:1766400
> /sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_min_freq:300000
> /sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_transition_latency:0
> /sys/devices/system/cpu/cpu3/cpufreq/related_cpus:0 1 2 3
> /sys/devices/system/cpu/cpu3/cpufreq/scaling_available_frequencies:300000 403200
> 480000 576000 652800 748800 825600 902400 979200 1056000 1132800 1228800
> 1324800 1420800 1516800 1612800 1689600 1766400
> /sys/devices/system/cpu/cpu3/cpufreq/scaling_available_governors:ondemand conservative
> powersave userspace performance schedutil
> /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:300000
> /sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:qcom-cpufreq-hw
> /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:schedutil
> /sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:1766400
> /sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq:300000
> /sys/devices/system/cpu/cpu3/cpufreq/scaling_setspeed:<unsupported>
> /sys/devices/system/cpu/cpu4/cpufreq/affected_cpus:4 5 6 7
> /sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_cur_freq:1920000
> /sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_max_freq:2956800
> /sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_min_freq:825600
> /sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_transition_latency:0
> /sys/devices/system/cpu/cpu4/cpufreq/related_cpus:4 5 6 7
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_available_frequencies:825600 902400
> 979200 1056000 1209600 1286400 1363200 1459200 1536000 1612800 1689600
> 1766400 1843200 1920000 1996800 2092800 2169600 2246400 2323200 2400000
> 2476800 2553600 2649600 2745600 2841600
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_available_governors:ondemand conservative
> powersave userspace performance schedutil
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_boost_frequencies:2956800
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:1920000
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_driver:qcom-cpufreq-hw
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:schedutil
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq:2841600
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq:825600
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_setspeed:<unsupported>
> /sys/devices/system/cpu/cpu5/cpufreq/affected_cpus:4 5 6 7
> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_cur_freq:1996800
> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:2956800
> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_min_freq:825600
> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_transition_latency:0
> /sys/devices/system/cpu/cpu5/cpufreq/related_cpus:4 5 6 7
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_available_frequencies:825600 902400
> 979200 1056000 1209600 1286400 1363200 1459200 1536000 1612800 1689600
> 1766400 1843200 1920000 1996800 2092800 2169600 2246400 2323200 2400000
> 2476800 2553600 2649600 2745600 2841600
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_available_governors:ondemand conservative
> powersave userspace performance schedutil
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_boost_frequencies:2956800
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:1996800
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:qcom-cpufreq-hw
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:schedutil
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:2841600
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq:825600
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_setspeed:<unsupported>
> /sys/devices/system/cpu/cpu6/cpufreq/affected_cpus:4 5 6 7
> /sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_cur_freq:1996800
> /sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_max_freq:2956800
> /sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_min_freq:825600
> /sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_transition_latency:0
> /sys/devices/system/cpu/cpu6/cpufreq/related_cpus:4 5 6 7
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_available_frequencies:825600 902400
> 979200 1056000 1209600 1286400 1363200 1459200 1536000 1612800 1689600
> 1766400 1843200 1920000 1996800 2092800 2169600 2246400 2323200 2400000
> 2476800 2553600 2649600 2745600 2841600
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_available_governors:ondemand conservative
> powersave userspace performance schedutil
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_boost_frequencies:2956800
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_cur_freq:1996800
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_driver:qcom-cpufreq-hw
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:schedutil
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_max_freq:2841600
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_min_freq:825600
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_setspeed:<unsupported>
> /sys/devices/system/cpu/cpu7/cpufreq/affected_cpus:4 5 6 7
> /sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_cur_freq:1996800
> /sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_max_freq:2956800
> /sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_min_freq:825600
> /sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_transition_latency:0
> /sys/devices/system/cpu/cpu7/cpufreq/related_cpus:4 5 6 7
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_available_frequencies:825600 902400
> 979200 1056000 1209600 1286400 1363200 1459200 1536000 1612800 1689600
> 1766400 1843200 1920000 1996800 2092800 2169600 2246400 2323200 2400000
> 2476800 2553600 2649600 2745600 2841600
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_available_governors:ondemand conservative
> powersave userspace performance schedutil
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_boost_frequencies:2956800
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_cur_freq:1996800
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_driver:qcom-cpufreq-hw
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:schedutil
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_max_freq:2841600
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_min_freq:825600
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_setspeed:<unsupported>
>
>
>> $ grep . /sys/devices/system/cpu/cpu*/cpu_capacity
>
> /sys/devices/system/cpu/cpu0/cpu_capacity:377
> /sys/devices/system/cpu/cpu1/cpu_capacity:377
> /sys/devices/system/cpu/cpu2/cpu_capacity:377
> /sys/devices/system/cpu/cpu3/cpu_capacity:377
> /sys/devices/system/cpu/cpu4/cpu_capacity:1024
> /sys/devices/system/cpu/cpu5/cpu_capacity:1024
> /sys/devices/system/cpu/cpu6/cpu_capacity:1024
> /sys/devices/system/cpu/cpu7/cpu_capacity:1024
>
>
> In taking a look at cpufreq-info, one thing I noticed is that even
> though I have 1 in /sys/devices/system/cpu/cpufreq/boost, I am *never*
> hitting the 2.96GHz now
>
> cpufreq stats: 826 MHz:59.14%, 902 MHz:0.15%, 979 MHz:0.18%, 1.06
> GHz:0.11%, 1.21 GHz:0.49%, 1.29 GHz:0.26%, 1.36 GHz:0.12%, 1.46
> GHz:0.23%, 1.54 GHz:0.10%, 1.61 GHz:0.14%, 1.69 GHz:0.09%, 1.77
> GHz:0.28%, 1.84 GHz:0.64%, 1.92 GHz:0.23%, 2.00 GHz:0.05%, 2.09
> GHz:0.05%, 2.17 GHz:0.03%, 2.25 GHz:0.03%, 2.32 GHz:0.03%, 2.40
> GHz:0.03%, 2.48 GHz:0.02%, 2.55 GHz:0.02%, 2.65 GHz:0.03%, 2.75
> GHz:0.03%, 2.84 GHz:37.53%, 2.96 GHz:0.00%  (20854)
>
> Aaaand it looks like that is part of the deal - with your patch from
> above applied, we get:
>
> [   22.487268] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for
> CPUs [4-7]
> [   22.487313] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for
> CPUs [4-7]
> [   22.508642] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for
> CPUs [4-7]
> [   22.552273] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for
> CPUs [4-7]
>
> So, we're not able to hit boost frequencies with this applied?

Hi Steve,

Does your system have enough load to hit the boost frequencies ? I don't
think this patch should affect hitting boost frequencies as there is no
error being returned from topology_update_thermal_pressure.

The warning you are getting is because you have boost frequency enabled
and IIUC lmh enabled and thermal pressure framework bails out due to
boost_frequency being greater than what is available in per_cpu
freq_factor. This is because we do not recalculate freq_factor every
time boost is enabled / disabled. IIRC there were some discussions
around rebuilding scheduler domains and capacity with user space changes
to max frequency but it has never proceeded much. Till that point, I
think the right way, is to check whether the new capcity exceeds the
max_capacity of the cpu and if yes use max_capacity in lieu of
new_capacity to calculate thermal pressure.

>
> Thank you for the fast response!
>
> -- steev
>

--
Warm Regards
Thara (She/Her/Hers)

2021-11-05 20:30:22

by Steev Klimaszewski

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication


On 11/5/21 2:18 PM, Thara Gopinath wrote:
>
>
> On 11/5/21 1:33 PM, Steev Klimaszewski wrote:
>> Hi,
>>
>> On 11/5/21 11:26 AM, Lukasz Luba wrote:
>>> Hi Steev,
>>>
>>> On 11/5/21 3:39 PM, Steev Klimaszewski wrote:
>>>> Hi Lukasz,
>>>>
>>> [snip]
> Hi Steve,
>
> Does your system have enough load to hit the boost frequencies ? I
> don't think this patch should affect hitting boost frequencies as
> there is no error being returned from topology_update_thermal_pressure.
>
> The warning you are getting is because you have boost frequency
> enabled and IIUC lmh enabled and thermal pressure framework bails out
> due to boost_frequency being greater than what is available in per_cpu
> freq_factor. This is because we do not recalculate freq_factor every
> time boost is enabled / disabled. IIRC there were some discussions
> around rebuilding scheduler domains and capacity with user space
> changes to max frequency but it has never proceeded much. Till that
> point, I think the right way, is to check whether the new capcity
> exceeds the max_capacity of the cpu and if yes use max_capacity in
> lieu of new_capacity to calculate thermal pressure.
>
Hi Thara,

I should definitely be able to push it to 2.96GHz, however I'm simply
not getting it at all with these patches applied.

So, I'm currently compiling multiple applications - alacritty
(https://github.com/alacritty/alacritty), and zellij
(https://github.com/zellij-org/zellij), as well as running pixz on a
5.1GB file to compress it, and throwing in cpuburn-a53
(https://github.com/ssvb/cpuburn-arm) and I'm simply not getting 2.96GHz
at all.  Ever.  I don't normally try to push it that high, but I wanted
to see if we could ever hit it (the system was also never going above 86C)

analyzing CPU 4:
  driver: qcom-cpufreq-hw
  CPUs which run at the same hardware frequency: 4 5 6 7
  CPUs which need to have their frequency coordinated by software: 4 5 6 7
  maximum transition latency: 4294.55 ms.
  hardware limits: 826 MHz - 2.96 GHz
  available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
  available cpufreq governors: ondemand, conservative, powersave,
userspace, performance, schedutil
  current policy: frequency should be within 826 MHz and 2.84 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency is 2.84 GHz.
  cpufreq stats: 826 MHz:54.84%, 902 MHz:0.02%, 979 MHz:0.02%, 1.06
GHz:0.02%, 1.21 GHz:0.08%, 1.29 GHz:0.07%, 1.36 GHz:0.09%, 1.46
GHz:0.04%, 1.54 GHz:0.02%, 1.61 GHz:0.02%, 1.69 GHz:0.02%, 1.77
GHz:0.13%, 1.84 GHz:0.04%, 1.92 GHz:0.04%, 2.00 GHz:0.02%, 2.09
GHz:0.03%, 2.17 GHz:0.02%, 2.25 GHz:0.02%, 2.32 GHz:0.01%, 2.40
GHz:0.02%, 2.48 GHz:0.02%, 2.55 GHz:0.02%, 2.65 GHz:0.02%, 2.75
GHz:0.02%, 2.84 GHz:44.38%, 2.96 GHz:0.00%  (8066)
analyzing CPU 5:
  driver: qcom-cpufreq-hw
  CPUs which run at the same hardware frequency: 4 5 6 7
  CPUs which need to have their frequency coordinated by software: 4 5 6 7
  maximum transition latency: 4294.55 ms.
  hardware limits: 826 MHz - 2.96 GHz
  available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
  available cpufreq governors: ondemand, conservative, powersave,
userspace, performance, schedutil
  current policy: frequency should be within 826 MHz and 2.84 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency is 2.84 GHz.
  cpufreq stats: 826 MHz:54.84%, 902 MHz:0.02%, 979 MHz:0.02%, 1.06
GHz:0.02%, 1.21 GHz:0.08%, 1.29 GHz:0.07%, 1.36 GHz:0.09%, 1.46
GHz:0.04%, 1.54 GHz:0.02%, 1.61 GHz:0.02%, 1.69 GHz:0.02%, 1.77
GHz:0.13%, 1.84 GHz:0.04%, 1.92 GHz:0.04%, 2.00 GHz:0.02%, 2.09
GHz:0.03%, 2.17 GHz:0.02%, 2.25 GHz:0.02%, 2.32 GHz:0.01%, 2.40
GHz:0.02%, 2.48 GHz:0.02%, 2.55 GHz:0.02%, 2.65 GHz:0.02%, 2.75
GHz:0.02%, 2.84 GHz:44.38%, 2.96 GHz:0.00%  (8066)
analyzing CPU 6:
  driver: qcom-cpufreq-hw
  CPUs which run at the same hardware frequency: 4 5 6 7
  CPUs which need to have their frequency coordinated by software: 4 5 6 7
  maximum transition latency: 4294.55 ms.
  hardware limits: 826 MHz - 2.96 GHz
  available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
  available cpufreq governors: ondemand, conservative, powersave,
userspace, performance, schedutil
  current policy: frequency should be within 826 MHz and 2.84 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency is 2.84 GHz.
  cpufreq stats: 826 MHz:54.84%, 902 MHz:0.02%, 979 MHz:0.02%, 1.06
GHz:0.02%, 1.21 GHz:0.08%, 1.29 GHz:0.07%, 1.36 GHz:0.09%, 1.46
GHz:0.04%, 1.54 GHz:0.02%, 1.61 GHz:0.02%, 1.69 GHz:0.02%, 1.77
GHz:0.13%, 1.84 GHz:0.04%, 1.92 GHz:0.04%, 2.00 GHz:0.02%, 2.09
GHz:0.03%, 2.17 GHz:0.02%, 2.25 GHz:0.02%, 2.32 GHz:0.01%, 2.40
GHz:0.02%, 2.48 GHz:0.02%, 2.55 GHz:0.02%, 2.65 GHz:0.02%, 2.75
GHz:0.02%, 2.84 GHz:44.38%, 2.96 GHz:0.00%  (8066)
analyzing CPU 7:
  driver: qcom-cpufreq-hw
  CPUs which run at the same hardware frequency: 4 5 6 7
  CPUs which need to have their frequency coordinated by software: 4 5 6 7
  maximum transition latency: 4294.55 ms.
  hardware limits: 826 MHz - 2.96 GHz
  available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
  available cpufreq governors: ondemand, conservative, powersave,
userspace, performance, schedutil
  current policy: frequency should be within 826 MHz and 2.84 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency is 2.84 GHz.
  cpufreq stats: 826 MHz:54.84%, 902 MHz:0.02%, 979 MHz:0.02%, 1.06
GHz:0.02%, 1.21 GHz:0.08%, 1.29 GHz:0.07%, 1.36 GHz:0.09%, 1.46
GHz:0.04%, 1.54 GHz:0.02%, 1.61 GHz:0.02%, 1.69 GHz:0.02%, 1.77
GHz:0.13%, 1.84 GHz:0.04%, 1.92 GHz:0.04%, 2.00 GHz:0.02%, 2.09
GHz:0.03%, 2.17 GHz:0.02%, 2.25 GHz:0.02%, 2.32 GHz:0.01%, 2.40
GHz:0.02%, 2.48 GHz:0.02%, 2.55 GHz:0.02%, 2.65 GHz:0.02%, 2.75
GHz:0.02%, 2.84 GHz:44.38%, 2.96 GHz:0.00%  (8066)



After removing this patchset, and rebooting and just compiling zellij:

analyzing CPU 4:
  driver: qcom-cpufreq-hw
  CPUs which run at the same hardware frequency: 4 5 6 7
  CPUs which need to have their frequency coordinated by software: 4 5 6 7
  maximum transition latency: 4294.55 ms.
  hardware limits: 826 MHz - 2.96 GHz
  available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
  available cpufreq governors: ondemand, conservative, powersave,
userspace, performance, schedutil
  current policy: frequency should be within 826 MHz and 2.84 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency is 2.84 GHz.
  cpufreq stats: 826 MHz:16.01%, 902 MHz:0.08%, 979 MHz:0.05%, 1.06
GHz:0.06%, 1.21 GHz:0.37%, 1.29 GHz:0.17%, 1.36 GHz:0.15%, 1.46
GHz:0.20%, 1.54 GHz:0.18%, 1.61 GHz:0.21%, 1.69 GHz:0.17%, 1.77
GHz:0.22%, 1.84 GHz:0.32%, 1.92 GHz:0.37%, 2.00 GHz:0.22%, 2.09
GHz:0.20%, 2.17 GHz:0.20%, 2.25 GHz:0.19%, 2.32 GHz:0.19%, 2.40
GHz:0.21%, 2.48 GHz:0.18%, 2.55 GHz:0.18%, 2.65 GHz:0.21%, 2.75
GHz:0.16%, 2.84 GHz:79.49%, 2.96 GHz:0.03%  (5315)
analyzing CPU 5:
  driver: qcom-cpufreq-hw
  CPUs which run at the same hardware frequency: 4 5 6 7
  CPUs which need to have their frequency coordinated by software: 4 5 6 7
  maximum transition latency: 4294.55 ms.
  hardware limits: 826 MHz - 2.96 GHz
  available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
  available cpufreq governors: ondemand, conservative, powersave,
userspace, performance, schedutil
  current policy: frequency should be within 826 MHz and 2.84 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency is 2.84 GHz.
  cpufreq stats: 826 MHz:16.01%, 902 MHz:0.08%, 979 MHz:0.05%, 1.06
GHz:0.06%, 1.21 GHz:0.37%, 1.29 GHz:0.17%, 1.36 GHz:0.15%, 1.46
GHz:0.20%, 1.54 GHz:0.18%, 1.61 GHz:0.21%, 1.69 GHz:0.17%, 1.77
GHz:0.22%, 1.84 GHz:0.32%, 1.92 GHz:0.37%, 2.00 GHz:0.22%, 2.09
GHz:0.20%, 2.17 GHz:0.20%, 2.25 GHz:0.19%, 2.32 GHz:0.19%, 2.40
GHz:0.21%, 2.48 GHz:0.18%, 2.55 GHz:0.18%, 2.65 GHz:0.21%, 2.75
GHz:0.16%, 2.84 GHz:79.49%, 2.96 GHz:0.03%  (5315)
analyzing CPU 6:
  driver: qcom-cpufreq-hw
  CPUs which run at the same hardware frequency: 4 5 6 7
  CPUs which need to have their frequency coordinated by software: 4 5 6 7
  maximum transition latency: 4294.55 ms.
  hardware limits: 826 MHz - 2.96 GHz
  available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
  available cpufreq governors: ondemand, conservative, powersave,
userspace, performance, schedutil
  current policy: frequency should be within 826 MHz and 2.84 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency is 2.84 GHz.
  cpufreq stats: 826 MHz:16.01%, 902 MHz:0.08%, 979 MHz:0.05%, 1.06
GHz:0.06%, 1.21 GHz:0.37%, 1.29 GHz:0.17%, 1.36 GHz:0.15%, 1.46
GHz:0.20%, 1.54 GHz:0.18%, 1.61 GHz:0.21%, 1.69 GHz:0.17%, 1.77
GHz:0.22%, 1.84 GHz:0.32%, 1.92 GHz:0.37%, 2.00 GHz:0.22%, 2.09
GHz:0.20%, 2.17 GHz:0.20%, 2.25 GHz:0.19%, 2.32 GHz:0.19%, 2.40
GHz:0.21%, 2.48 GHz:0.18%, 2.55 GHz:0.18%, 2.65 GHz:0.21%, 2.75
GHz:0.16%, 2.84 GHz:79.49%, 2.96 GHz:0.03%  (5315)
analyzing CPU 7:
  driver: qcom-cpufreq-hw
  CPUs which run at the same hardware frequency: 4 5 6 7
  CPUs which need to have their frequency coordinated by software: 4 5 6 7
  maximum transition latency: 4294.55 ms.
  hardware limits: 826 MHz - 2.96 GHz
  available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
  available cpufreq governors: ondemand, conservative, powersave,
userspace, performance, schedutil
  current policy: frequency should be within 826 MHz and 2.84 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency is 2.84 GHz.
  cpufreq stats: 826 MHz:16.01%, 902 MHz:0.08%, 979 MHz:0.05%, 1.06
GHz:0.06%, 1.21 GHz:0.37%, 1.29 GHz:0.17%, 1.36 GHz:0.15%, 1.46
GHz:0.20%, 1.54 GHz:0.18%, 1.61 GHz:0.21%, 1.69 GHz:0.17%, 1.77
GHz:0.22%, 1.84 GHz:0.32%, 1.92 GHz:0.37%, 2.00 GHz:0.22%, 2.09
GHz:0.20%, 2.17 GHz:0.20%, 2.25 GHz:0.19%, 2.32 GHz:0.19%, 2.40
GHz:0.21%, 2.48 GHz:0.18%, 2.55 GHz:0.18%, 2.65 GHz:0.21%, 2.75
GHz:0.16%, 2.84 GHz:79.49%, 2.96 GHz:0.03%  (5315)


>>
>> Thank you for the fast response!
>>
>> -- steev
>>
>

2021-11-05 21:21:28

by Steev Klimaszewski

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication

Hi,

On 11/5/21 11:26 AM, Lukasz Luba wrote:
> Hi Steev,
>
> On 11/5/21 3:39 PM, Steev Klimaszewski wrote:
>> Hi Lukasz,
>>
>
> [snip]
>
>> I've been testing this patchset on the Lenovo Yoga C630, and today
>> while compiling alacritty and running an apt-get full-upgrade, I
>> found the following in dmesg output:
>
> Thank you for testing and sending feedback!
>
> Are you using a mainline kernel or you applied on some vendor production
> kernel this patch set? I need to exclude a different code base
> from the equation, especially to the arch_topology.c init code.
>
This is stabe 5.15.0 tree with ~72 (including your 6 patches on top
(including the below as a patch).  You can find it at
https://github.com/steev/linux/commits/linux-5.15.y - the vast majority
are just various fixups for sdm845/sdm850 for the Lenovo Yoga (or crypto
since I'd like to see the crypto unit working).

I did grep through my logs and it appears that this started after I
moved from v2 to v3 - I'd tested v2 and it didn't show this.

[snip]

> [snip]
>
> That's interesting why we hit this. I should have added info about
> those two values, which are compared.
>
> Could you make this change and try it again, please?
> We would know the problematic values, which triggered this.
> ---------------------8<-----------------------------------
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index db18d79065fe..0d8db0927041 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -185,8 +185,11 @@ void topology_update_thermal_pressure(const
> struct cpumask *cpus,
>         /* Convert to MHz scale which is used in 'freq_factor' */
>         capped_freq /= 1000;
>
> -       if (WARN_ON(max_freq < capped_freq))
> +       if (max_freq < capped_freq) {
> +               pr_warn("THERMAL_PRESSURE: max_freq (%lu) <
> capped_freq (%lu) for CPUs [%*pbl]\n",
> +                       max_freq, capped_freq, cpumask_pr_args(cpus));
>                 return;
> +       }
>
>         capacity = mult_frac(capped_freq, max_capacity, max_freq);
>
> ------------------------------>8---------------------------

Applying this to the above kernel.. will test...


>
> Could you also dump for me the cpufreq and capacity sysfs content?
> $ grep . /sys/devices/system/cpu/cpu*/cpufreq/*


/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:0 1 2 3
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:300000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:1766400
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:300000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu0/cpufreq/related_cpus:0 1 2 3
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:300000
403200 480000 576000 652800 748800 825600 902400 979200 1056000 1132800
1228800 1324800 1420800 1516800 1612800 1689600 1766400
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:ondemand
conservative powersave userspace performance schedutil
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:300000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:qcom-cpufreq-hw
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:schedutil
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:1766400
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:300000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:<unsupported>
/sys/devices/system/cpu/cpu1/cpufreq/affected_cpus:0 1 2 3
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_cur_freq:300000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq:1766400
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_min_freq:300000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu1/cpufreq/related_cpus:0 1 2 3
/sys/devices/system/cpu/cpu1/cpufreq/scaling_available_frequencies:300000
403200 480000 576000 652800 748800 825600 902400 979200 1056000 1132800
1228800 1324800 1420800 1516800 1612800 1689600 1766400
/sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors:ondemand
conservative powersave userspace performance schedutil
/sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:300000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:qcom-cpufreq-hw
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:schedutil
/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:1766400
/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq:300000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed:<unsupported>
/sys/devices/system/cpu/cpu2/cpufreq/affected_cpus:0 1 2 3
/sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_cur_freq:300000
/sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_max_freq:1766400
/sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_min_freq:300000
/sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu2/cpufreq/related_cpus:0 1 2 3
/sys/devices/system/cpu/cpu2/cpufreq/scaling_available_frequencies:300000
403200 480000 576000 652800 748800 825600 902400 979200 1056000 1132800
1228800 1324800 1420800 1516800 1612800 1689600 1766400
/sys/devices/system/cpu/cpu2/cpufreq/scaling_available_governors:ondemand
conservative powersave userspace performance schedutil
/sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:300000
/sys/devices/system/cpu/cpu2/cpufreq/scaling_driver:qcom-cpufreq-hw
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:schedutil
/sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq:1766400
/sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq:300000
/sys/devices/system/cpu/cpu2/cpufreq/scaling_setspeed:<unsupported>
/sys/devices/system/cpu/cpu3/cpufreq/affected_cpus:0 1 2 3
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_cur_freq:300000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_max_freq:1766400
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_min_freq:300000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu3/cpufreq/related_cpus:0 1 2 3
/sys/devices/system/cpu/cpu3/cpufreq/scaling_available_frequencies:300000
403200 480000 576000 652800 748800 825600 902400 979200 1056000 1132800
1228800 1324800 1420800 1516800 1612800 1689600 1766400
/sys/devices/system/cpu/cpu3/cpufreq/scaling_available_governors:ondemand
conservative powersave userspace performance schedutil
/sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:300000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:qcom-cpufreq-hw
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:schedutil
/sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:1766400
/sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq:300000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_setspeed:<unsupported>
/sys/devices/system/cpu/cpu4/cpufreq/affected_cpus:4 5 6 7
/sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_cur_freq:1920000
/sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_max_freq:2956800
/sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_min_freq:825600
/sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu4/cpufreq/related_cpus:4 5 6 7
/sys/devices/system/cpu/cpu4/cpufreq/scaling_available_frequencies:825600
902400 979200 1056000 1209600 1286400 1363200 1459200 1536000 1612800
1689600 1766400 1843200 1920000 1996800 2092800 2169600 2246400 2323200
2400000 2476800 2553600 2649600 2745600 2841600
/sys/devices/system/cpu/cpu4/cpufreq/scaling_available_governors:ondemand
conservative powersave userspace performance schedutil
/sys/devices/system/cpu/cpu4/cpufreq/scaling_boost_frequencies:2956800
/sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:1920000
/sys/devices/system/cpu/cpu4/cpufreq/scaling_driver:qcom-cpufreq-hw
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:schedutil
/sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq:2841600
/sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq:825600
/sys/devices/system/cpu/cpu4/cpufreq/scaling_setspeed:<unsupported>
/sys/devices/system/cpu/cpu5/cpufreq/affected_cpus:4 5 6 7
/sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_cur_freq:1996800
/sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:2956800
/sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_min_freq:825600
/sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu5/cpufreq/related_cpus:4 5 6 7
/sys/devices/system/cpu/cpu5/cpufreq/scaling_available_frequencies:825600
902400 979200 1056000 1209600 1286400 1363200 1459200 1536000 1612800
1689600 1766400 1843200 1920000 1996800 2092800 2169600 2246400 2323200
2400000 2476800 2553600 2649600 2745600 2841600
/sys/devices/system/cpu/cpu5/cpufreq/scaling_available_governors:ondemand
conservative powersave userspace performance schedutil
/sys/devices/system/cpu/cpu5/cpufreq/scaling_boost_frequencies:2956800
/sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:1996800
/sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:qcom-cpufreq-hw
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:schedutil
/sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:2841600
/sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq:825600
/sys/devices/system/cpu/cpu5/cpufreq/scaling_setspeed:<unsupported>
/sys/devices/system/cpu/cpu6/cpufreq/affected_cpus:4 5 6 7
/sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_cur_freq:1996800
/sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_max_freq:2956800
/sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_min_freq:825600
/sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu6/cpufreq/related_cpus:4 5 6 7
/sys/devices/system/cpu/cpu6/cpufreq/scaling_available_frequencies:825600
902400 979200 1056000 1209600 1286400 1363200 1459200 1536000 1612800
1689600 1766400 1843200 1920000 1996800 2092800 2169600 2246400 2323200
2400000 2476800 2553600 2649600 2745600 2841600
/sys/devices/system/cpu/cpu6/cpufreq/scaling_available_governors:ondemand
conservative powersave userspace performance schedutil
/sys/devices/system/cpu/cpu6/cpufreq/scaling_boost_frequencies:2956800
/sys/devices/system/cpu/cpu6/cpufreq/scaling_cur_freq:1996800
/sys/devices/system/cpu/cpu6/cpufreq/scaling_driver:qcom-cpufreq-hw
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:schedutil
/sys/devices/system/cpu/cpu6/cpufreq/scaling_max_freq:2841600
/sys/devices/system/cpu/cpu6/cpufreq/scaling_min_freq:825600
/sys/devices/system/cpu/cpu6/cpufreq/scaling_setspeed:<unsupported>
/sys/devices/system/cpu/cpu7/cpufreq/affected_cpus:4 5 6 7
/sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_cur_freq:1996800
/sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_max_freq:2956800
/sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_min_freq:825600
/sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu7/cpufreq/related_cpus:4 5 6 7
/sys/devices/system/cpu/cpu7/cpufreq/scaling_available_frequencies:825600
902400 979200 1056000 1209600 1286400 1363200 1459200 1536000 1612800
1689600 1766400 1843200 1920000 1996800 2092800 2169600 2246400 2323200
2400000 2476800 2553600 2649600 2745600 2841600
/sys/devices/system/cpu/cpu7/cpufreq/scaling_available_governors:ondemand
conservative powersave userspace performance schedutil
/sys/devices/system/cpu/cpu7/cpufreq/scaling_boost_frequencies:2956800
/sys/devices/system/cpu/cpu7/cpufreq/scaling_cur_freq:1996800
/sys/devices/system/cpu/cpu7/cpufreq/scaling_driver:qcom-cpufreq-hw
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:schedutil
/sys/devices/system/cpu/cpu7/cpufreq/scaling_max_freq:2841600
/sys/devices/system/cpu/cpu7/cpufreq/scaling_min_freq:825600
/sys/devices/system/cpu/cpu7/cpufreq/scaling_setspeed:<unsupported>


> $ grep . /sys/devices/system/cpu/cpu*/cpu_capacity

/sys/devices/system/cpu/cpu0/cpu_capacity:377
/sys/devices/system/cpu/cpu1/cpu_capacity:377
/sys/devices/system/cpu/cpu2/cpu_capacity:377
/sys/devices/system/cpu/cpu3/cpu_capacity:377
/sys/devices/system/cpu/cpu4/cpu_capacity:1024
/sys/devices/system/cpu/cpu5/cpu_capacity:1024
/sys/devices/system/cpu/cpu6/cpu_capacity:1024
/sys/devices/system/cpu/cpu7/cpu_capacity:1024


In taking a look at cpufreq-info, one thing I noticed is that even
though I have 1 in /sys/devices/system/cpu/cpufreq/boost, I am *never*
hitting the 2.96GHz now

cpufreq stats: 826 MHz:59.14%, 902 MHz:0.15%, 979 MHz:0.18%, 1.06
GHz:0.11%, 1.21 GHz:0.49%, 1.29 GHz:0.26%, 1.36 GHz:0.12%, 1.46
GHz:0.23%, 1.54 GHz:0.10%, 1.61 GHz:0.14%, 1.69 GHz:0.09%, 1.77
GHz:0.28%, 1.84 GHz:0.64%, 1.92 GHz:0.23%, 2.00 GHz:0.05%, 2.09
GHz:0.05%, 2.17 GHz:0.03%, 2.25 GHz:0.03%, 2.32 GHz:0.03%, 2.40
GHz:0.03%, 2.48 GHz:0.02%, 2.55 GHz:0.02%, 2.65 GHz:0.03%, 2.75
GHz:0.03%, 2.84 GHz:37.53%, 2.96 GHz:0.00%  (20854)

Aaaand it looks like that is part of the deal - with your patch from
above applied, we get:

[   22.487268] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for
CPUs [4-7]
[   22.487313] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for
CPUs [4-7]
[   22.508642] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for
CPUs [4-7]
[   22.552273] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for
CPUs [4-7]

So, we're not able to hit boost frequencies with this applied?

Thank you for the fast response!

-- steev

2021-11-06 00:00:06

by Thara Gopinath

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication



On 11/5/21 3:51 PM, Steev Klimaszewski wrote:
>
> On 11/5/21 2:18 PM, Thara Gopinath wrote:
>>
>>
>> On 11/5/21 1:33 PM, Steev Klimaszewski wrote:
>>> Hi,
>>>
>>> On 11/5/21 11:26 AM, Lukasz Luba wrote:
>>>> Hi Steev,
>>>>
>>>> On 11/5/21 3:39 PM, Steev Klimaszewski wrote:
>>>>> Hi Lukasz,
>>>>>
>>>> [snip]
>> Hi Steve,
>>
>> Does your system have enough load to hit the boost frequencies ? I
>> don't think this patch should affect hitting boost frequencies as
>> there is no error being returned from topology_update_thermal_pressure.
>>
>> The warning you are getting is because you have boost frequency
>> enabled and IIUC lmh enabled and thermal pressure framework bails out
>> due to boost_frequency being greater than what is available in per_cpu
>> freq_factor. This is because we do not recalculate freq_factor every
>> time boost is enabled / disabled. IIRC there were some discussions
>> around rebuilding scheduler domains and capacity with user space
>> changes to max frequency but it has never proceeded much. Till that
>> point, I think the right way, is to check whether the new capcity
>> exceeds the max_capacity of the cpu and if yes use max_capacity in
>> lieu of new_capacity to calculate thermal pressure.
>>
> Hi Thara,
>
> I should definitely be able to push it to 2.96GHz, however I'm simply
> not getting it at all with these patches applied. >
> So, I'm currently compiling multiple applications - alacritty
> (https://github.com/alacritty/alacritty), and zellij
> (https://github.com/zellij-org/zellij), as well as running pixz on a
> 5.1GB file to compress it, and throwing in cpuburn-a53
> (https://github.com/ssvb/cpuburn-arm) and I'm simply not getting 2.96GHz
> at all.  Ever.  I don't normally try to push it that high, but I wanted
> to see if we could ever hit it (the system was also never going above 86C)

Hi,

So IIUC the below logs correctly, you are never hitting boost frequency
(with or without this patch series). Is that correct ?

w.r.t temperature , how are you measuring it? Do you have LMh enabled or
are you using tsens to mitigate cpu temperature ?

--
Warm Regards
Thara (She/Her/Hers)
>
> analyzing CPU 4:
>   driver: qcom-cpufreq-hw
>   CPUs which run at the same hardware frequency: 4 5 6 7
>   CPUs which need to have their frequency coordinated by software: 4 5 6 7
>   maximum transition latency: 4294.55 ms.
>   hardware limits: 826 MHz - 2.96 GHz
>   available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
> GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
> GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
> GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
>   available cpufreq governors: ondemand, conservative, powersave,
> userspace, performance, schedutil
>   current policy: frequency should be within 826 MHz and 2.84 GHz.
>                   The governor "schedutil" may decide which speed to use
>                   within this range.
>   current CPU frequency is 2.84 GHz.
>   cpufreq stats: 826 MHz:54.84%, 902 MHz:0.02%, 979 MHz:0.02%, 1.06
> GHz:0.02%, 1.21 GHz:0.08%, 1.29 GHz:0.07%, 1.36 GHz:0.09%, 1.46
> GHz:0.04%, 1.54 GHz:0.02%, 1.61 GHz:0.02%, 1.69 GHz:0.02%, 1.77
> GHz:0.13%, 1.84 GHz:0.04%, 1.92 GHz:0.04%, 2.00 GHz:0.02%, 2.09
> GHz:0.03%, 2.17 GHz:0.02%, 2.25 GHz:0.02%, 2.32 GHz:0.01%, 2.40
> GHz:0.02%, 2.48 GHz:0.02%, 2.55 GHz:0.02%, 2.65 GHz:0.02%, 2.75
> GHz:0.02%, 2.84 GHz:44.38%, 2.96 GHz:0.00%  (8066)
> analyzing CPU 5:
>   driver: qcom-cpufreq-hw
>   CPUs which run at the same hardware frequency: 4 5 6 7
>   CPUs which need to have their frequency coordinated by software: 4 5 6 7
>   maximum transition latency: 4294.55 ms.
>   hardware limits: 826 MHz - 2.96 GHz
>   available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
> GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
> GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
> GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
>   available cpufreq governors: ondemand, conservative, powersave,
> userspace, performance, schedutil
>   current policy: frequency should be within 826 MHz and 2.84 GHz.
>                   The governor "schedutil" may decide which speed to use
>                   within this range.
>   current CPU frequency is 2.84 GHz.
>   cpufreq stats: 826 MHz:54.84%, 902 MHz:0.02%, 979 MHz:0.02%, 1.06
> GHz:0.02%, 1.21 GHz:0.08%, 1.29 GHz:0.07%, 1.36 GHz:0.09%, 1.46
> GHz:0.04%, 1.54 GHz:0.02%, 1.61 GHz:0.02%, 1.69 GHz:0.02%, 1.77
> GHz:0.13%, 1.84 GHz:0.04%, 1.92 GHz:0.04%, 2.00 GHz:0.02%, 2.09
> GHz:0.03%, 2.17 GHz:0.02%, 2.25 GHz:0.02%, 2.32 GHz:0.01%, 2.40
> GHz:0.02%, 2.48 GHz:0.02%, 2.55 GHz:0.02%, 2.65 GHz:0.02%, 2.75
> GHz:0.02%, 2.84 GHz:44.38%, 2.96 GHz:0.00%  (8066)
> analyzing CPU 6:
>   driver: qcom-cpufreq-hw
>   CPUs which run at the same hardware frequency: 4 5 6 7
>   CPUs which need to have their frequency coordinated by software: 4 5 6 7
>   maximum transition latency: 4294.55 ms.
>   hardware limits: 826 MHz - 2.96 GHz
>   available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
> GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
> GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
> GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
>   available cpufreq governors: ondemand, conservative, powersave,
> userspace, performance, schedutil
>   current policy: frequency should be within 826 MHz and 2.84 GHz.
>                   The governor "schedutil" may decide which speed to use
>                   within this range.
>   current CPU frequency is 2.84 GHz.
>   cpufreq stats: 826 MHz:54.84%, 902 MHz:0.02%, 979 MHz:0.02%, 1.06
> GHz:0.02%, 1.21 GHz:0.08%, 1.29 GHz:0.07%, 1.36 GHz:0.09%, 1.46
> GHz:0.04%, 1.54 GHz:0.02%, 1.61 GHz:0.02%, 1.69 GHz:0.02%, 1.77
> GHz:0.13%, 1.84 GHz:0.04%, 1.92 GHz:0.04%, 2.00 GHz:0.02%, 2.09
> GHz:0.03%, 2.17 GHz:0.02%, 2.25 GHz:0.02%, 2.32 GHz:0.01%, 2.40
> GHz:0.02%, 2.48 GHz:0.02%, 2.55 GHz:0.02%, 2.65 GHz:0.02%, 2.75
> GHz:0.02%, 2.84 GHz:44.38%, 2.96 GHz:0.00%  (8066)
> analyzing CPU 7:
>   driver: qcom-cpufreq-hw
>   CPUs which run at the same hardware frequency: 4 5 6 7
>   CPUs which need to have their frequency coordinated by software: 4 5 6 7
>   maximum transition latency: 4294.55 ms.
>   hardware limits: 826 MHz - 2.96 GHz
>   available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
> GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
> GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
> GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
>   available cpufreq governors: ondemand, conservative, powersave,
> userspace, performance, schedutil
>   current policy: frequency should be within 826 MHz and 2.84 GHz.
>                   The governor "schedutil" may decide which speed to use
>                   within this range.
>   current CPU frequency is 2.84 GHz.
>   cpufreq stats: 826 MHz:54.84%, 902 MHz:0.02%, 979 MHz:0.02%, 1.06
> GHz:0.02%, 1.21 GHz:0.08%, 1.29 GHz:0.07%, 1.36 GHz:0.09%, 1.46
> GHz:0.04%, 1.54 GHz:0.02%, 1.61 GHz:0.02%, 1.69 GHz:0.02%, 1.77
> GHz:0.13%, 1.84 GHz:0.04%, 1.92 GHz:0.04%, 2.00 GHz:0.02%, 2.09
> GHz:0.03%, 2.17 GHz:0.02%, 2.25 GHz:0.02%, 2.32 GHz:0.01%, 2.40
> GHz:0.02%, 2.48 GHz:0.02%, 2.55 GHz:0.02%, 2.65 GHz:0.02%, 2.75
> GHz:0.02%, 2.84 GHz:44.38%, 2.96 GHz:0.00%  (8066)
>
>
>
> After removing this patchset, and rebooting and just compiling zellij:
>
> analyzing CPU 4:
>   driver: qcom-cpufreq-hw
>   CPUs which run at the same hardware frequency: 4 5 6 7
>   CPUs which need to have their frequency coordinated by software: 4 5 6 7
>   maximum transition latency: 4294.55 ms.
>   hardware limits: 826 MHz - 2.96 GHz
>   available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
> GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
> GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
> GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
>   available cpufreq governors: ondemand, conservative, powersave,
> userspace, performance, schedutil
>   current policy: frequency should be within 826 MHz and 2.84 GHz.
>                   The governor "schedutil" may decide which speed to use
>                   within this range.
>   current CPU frequency is 2.84 GHz.
>   cpufreq stats: 826 MHz:16.01%, 902 MHz:0.08%, 979 MHz:0.05%, 1.06
> GHz:0.06%, 1.21 GHz:0.37%, 1.29 GHz:0.17%, 1.36 GHz:0.15%, 1.46
> GHz:0.20%, 1.54 GHz:0.18%, 1.61 GHz:0.21%, 1.69 GHz:0.17%, 1.77
> GHz:0.22%, 1.84 GHz:0.32%, 1.92 GHz:0.37%, 2.00 GHz:0.22%, 2.09
> GHz:0.20%, 2.17 GHz:0.20%, 2.25 GHz:0.19%, 2.32 GHz:0.19%, 2.40
> GHz:0.21%, 2.48 GHz:0.18%, 2.55 GHz:0.18%, 2.65 GHz:0.21%, 2.75
> GHz:0.16%, 2.84 GHz:79.49%, 2.96 GHz:0.03%  (5315)
> analyzing CPU 5:
>   driver: qcom-cpufreq-hw
>   CPUs which run at the same hardware frequency: 4 5 6 7
>   CPUs which need to have their frequency coordinated by software: 4 5 6 7
>   maximum transition latency: 4294.55 ms.
>   hardware limits: 826 MHz - 2.96 GHz
>   available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
> GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
> GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
> GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
>   available cpufreq governors: ondemand, conservative, powersave,
> userspace, performance, schedutil
>   current policy: frequency should be within 826 MHz and 2.84 GHz.
>                   The governor "schedutil" may decide which speed to use
>                   within this range.
>   current CPU frequency is 2.84 GHz.
>   cpufreq stats: 826 MHz:16.01%, 902 MHz:0.08%, 979 MHz:0.05%, 1.06
> GHz:0.06%, 1.21 GHz:0.37%, 1.29 GHz:0.17%, 1.36 GHz:0.15%, 1.46
> GHz:0.20%, 1.54 GHz:0.18%, 1.61 GHz:0.21%, 1.69 GHz:0.17%, 1.77
> GHz:0.22%, 1.84 GHz:0.32%, 1.92 GHz:0.37%, 2.00 GHz:0.22%, 2.09
> GHz:0.20%, 2.17 GHz:0.20%, 2.25 GHz:0.19%, 2.32 GHz:0.19%, 2.40
> GHz:0.21%, 2.48 GHz:0.18%, 2.55 GHz:0.18%, 2.65 GHz:0.21%, 2.75
> GHz:0.16%, 2.84 GHz:79.49%, 2.96 GHz:0.03%  (5315)
> analyzing CPU 6:
>   driver: qcom-cpufreq-hw
>   CPUs which run at the same hardware frequency: 4 5 6 7
>   CPUs which need to have their frequency coordinated by software: 4 5 6 7
>   maximum transition latency: 4294.55 ms.
>   hardware limits: 826 MHz - 2.96 GHz
>   available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
> GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
> GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
> GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
>   available cpufreq governors: ondemand, conservative, powersave,
> userspace, performance, schedutil
>   current policy: frequency should be within 826 MHz and 2.84 GHz.
>                   The governor "schedutil" may decide which speed to use
>                   within this range.
>   current CPU frequency is 2.84 GHz.
>   cpufreq stats: 826 MHz:16.01%, 902 MHz:0.08%, 979 MHz:0.05%, 1.06
> GHz:0.06%, 1.21 GHz:0.37%, 1.29 GHz:0.17%, 1.36 GHz:0.15%, 1.46
> GHz:0.20%, 1.54 GHz:0.18%, 1.61 GHz:0.21%, 1.69 GHz:0.17%, 1.77
> GHz:0.22%, 1.84 GHz:0.32%, 1.92 GHz:0.37%, 2.00 GHz:0.22%, 2.09
> GHz:0.20%, 2.17 GHz:0.20%, 2.25 GHz:0.19%, 2.32 GHz:0.19%, 2.40
> GHz:0.21%, 2.48 GHz:0.18%, 2.55 GHz:0.18%, 2.65 GHz:0.21%, 2.75
> GHz:0.16%, 2.84 GHz:79.49%, 2.96 GHz:0.03%  (5315)
> analyzing CPU 7:
>   driver: qcom-cpufreq-hw
>   CPUs which run at the same hardware frequency: 4 5 6 7
>   CPUs which need to have their frequency coordinated by software: 4 5 6 7
>   maximum transition latency: 4294.55 ms.
>   hardware limits: 826 MHz - 2.96 GHz
>   available frequency steps: 826 MHz, 902 MHz, 979 MHz, 1.06 GHz, 1.21
> GHz, 1.29 GHz, 1.36 GHz, 1.46 GHz, 1.54 GHz, 1.61 GHz, 1.69 GHz, 1.77
> GHz, 1.84 GHz, 1.92 GHz, 2.00 GHz, 2.09 GHz, 2.17 GHz, 2.25 GHz, 2.32
> GHz, 2.40 GHz, 2.48 GHz, 2.55 GHz, 2.65 GHz, 2.75 GHz, 2.84 GHz
>   available cpufreq governors: ondemand, conservative, powersave,
> userspace, performance, schedutil
>   current policy: frequency should be within 826 MHz and 2.84 GHz.
>                   The governor "schedutil" may decide which speed to use
>                   within this range.
>   current CPU frequency is 2.84 GHz.
>   cpufreq stats: 826 MHz:16.01%, 902 MHz:0.08%, 979 MHz:0.05%, 1.06
> GHz:0.06%, 1.21 GHz:0.37%, 1.29 GHz:0.17%, 1.36 GHz:0.15%, 1.46
> GHz:0.20%, 1.54 GHz:0.18%, 1.61 GHz:0.21%, 1.69 GHz:0.17%, 1.77
> GHz:0.22%, 1.84 GHz:0.32%, 1.92 GHz:0.37%, 2.00 GHz:0.22%, 2.09
> GHz:0.20%, 2.17 GHz:0.20%, 2.25 GHz:0.19%, 2.32 GHz:0.19%, 2.40
> GHz:0.21%, 2.48 GHz:0.18%, 2.55 GHz:0.18%, 2.65 GHz:0.21%, 2.75
> GHz:0.16%, 2.84 GHz:79.49%, 2.96 GHz:0.03%  (5315)
>
>
>>>
>>> Thank you for the fast response!
>>>
>>> -- steev
>>>
>>

2021-11-06 01:33:19

by Steev Klimaszewski

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication


> [snip]
> Hi,
>
> So IIUC the below logs correctly, you are never hitting boost
> frequency (with or without this patch series). Is that correct ?
>
> w.r.t temperature , how are you measuring it? Do you have LMh enabled
> or are you using tsens to mitigate cpu temperature ?


Hi,

I was wrong - it does indeed go boost with the patchset applied, it's
just that it doesn't boost up to 2.96GHz very often at all. As noted by
the 0.03% when i ran it while compiling zellij; I reapplied the patches
(and the 6th patch from Lukasz's email) and after boot, 2.96GHz was
showing at 0.39%.

Most tools that read the cpu frequency don't really seem to be well
suited for big.LITTLE, and seem to throw an average of the speed, so
cpufreq-info was the best I have.  We're apparently supposed to be using
cpupower these days, but it doesn't seem to know anything about arm64
devices.

Temperature wise, I'm just getting from the sensors, and I am using LMh.

Now, I have to admit, while I've thrown a patch here or there, I'm not
exactly a kernel developer, just enough knowledge to be somewhat
dangerous and know how to backport things.  In my mind, and my line of
thinking, I would expect with boost enabled, that the cpu would boost up
to that as often as possible, not require a specific workload to
actually hit it.  But then again, I would expect multiple compilation
jobs to be one of the workloads that would?

So I think, the part about never hitting 2.96GHz can be dismissed, and
was simply my lack of knowledge about the cpufreq-info tool's averages. 
It does seem however to rarely ever hit 2.96GHz and I would actually
expect it to hit it far more often.

2021-11-08 15:34:33

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication



On 11/5/21 10:46 PM, Steev Klimaszewski wrote:
>
>> [snip]
>> Hi,
>>
>> So IIUC the below logs correctly, you are never hitting boost
>> frequency (with or without this patch series). Is that correct ?
>>
>> w.r.t temperature , how are you measuring it? Do you have LMh enabled
>> or are you using tsens to mitigate cpu temperature ?
>
>
> Hi,
>
> I was wrong - it does indeed go boost with the patchset applied, it's
> just that it doesn't boost up to 2.96GHz very often at all. As noted by
> the 0.03% when i ran it while compiling zellij; I reapplied the patches
> (and the 6th patch from Lukasz's email) and after boot, 2.96GHz was
> showing at 0.39%.
>
> Most tools that read the cpu frequency don't really seem to be well
> suited for big.LITTLE, and seem to throw an average of the speed, so
> cpufreq-info was the best I have.  We're apparently supposed to be using
> cpupower these days, but it doesn't seem to know anything about arm64
> devices.
>
> Temperature wise, I'm just getting from the sensors, and I am using LMh.
>
> Now, I have to admit, while I've thrown a patch here or there, I'm not
> exactly a kernel developer, just enough knowledge to be somewhat
> dangerous and know how to backport things.  In my mind, and my line of
> thinking, I would expect with boost enabled, that the cpu would boost up
> to that as often as possible, not require a specific workload to
> actually hit it.  But then again, I would expect multiple compilation
> jobs to be one of the workloads that would?
>
> So I think, the part about never hitting 2.96GHz can be dismissed, and
> was simply my lack of knowledge about the cpufreq-info tool's averages.
> It does seem however to rarely ever hit 2.96GHz and I would actually
> expect it to hit it far more often.
>

Thank you for the logs and re-testing it with the debug changes.
That's valuable information. I'll work on it today.

Regarding the 'boost' frequency, as you observed, it's hard to
measure it properly. Normally when you use the frequency governor
'schedutil' (which is your case), you need a high utilization workload
to request highest frequency. The task scheduler does this calculation
and then asks for the frequency.
I'll spend more time trying to understand how this Qcom driver and HW
handles it. The patch set itself should not block it.

I'll get back to you soon.

Regards,
Lukasz

2021-11-08 16:50:05

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v3 4/5] cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function

Hi Thara,

+CC Steev, who discovered this issue with boost
frequency

On 11/5/21 7:12 PM, Thara Gopinath wrote:
> Hi Lukasz,
>
>
> On 11/3/21 12:10 PM, Lukasz Luba wrote:
>> Thermal pressure provides a new API, which allows to use CPU frequency
>> as an argument. That removes the need of local conversion to capacity.
>> Use this new API and remove old local conversion code.
>>
>> Signed-off-by: Lukasz Luba <[email protected]>
>> ---
>>   drivers/cpufreq/qcom-cpufreq-hw.c | 15 +++++----------
>>   1 file changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c
>> b/drivers/cpufreq/qcom-cpufreq-hw.c
>> index 0138b2ec406d..425f351450ad 100644
>> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
>> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
>> @@ -275,10 +275,10 @@ static unsigned int
>> qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
>>   static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>>   {
>> -    unsigned long max_capacity, capacity, freq_hz, throttled_freq;
>>       struct cpufreq_policy *policy = data->policy;
>>       int cpu = cpumask_first(policy->cpus);
>>       struct device *dev = get_cpu_device(cpu);
>> +    unsigned long freq_hz, throttled_freq;
>>       struct dev_pm_opp *opp;
>>       unsigned int freq;
>> @@ -295,17 +295,12 @@ static void qcom_lmh_dcvs_notify(struct
>> qcom_cpufreq_data *data)
>>       throttled_freq = freq_hz / HZ_PER_KHZ;
>> -    /* Update thermal pressure */
>> -
>> -    max_capacity = arch_scale_cpu_capacity(cpu);
>> -    capacity = mult_frac(max_capacity, throttled_freq,
>> policy->cpuinfo.max_freq);
>> -
>>       /* Don't pass boost capacity to scheduler */
>> -    if (capacity > max_capacity)
>> -        capacity = max_capacity;
>
> So, I think this should go into the common
> topology_update_thermal_pressure in lieu of
>
> +    if (WARN_ON(max_freq < capped_freq))
> +        return;
>
> This will fix the issue Steev Klimaszewski has been reporting
> https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
>
>

Well, I think the issue is broader. Look at the code which
calculate this 'capacity'. It's just a multiplication & division:

max_capacity = arch_scale_cpu_capacity(cpu); // =1024 in our case
capacity = mult_frac(max_capacity, throttled_freq,
policy->cpuinfo.max_freq);

In the reported by Steev output from sysfs cpufreq we know
that the value of 'policy->cpuinfo.max_freq' is:
/sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:2956800

so when we put the values to the equation we get:
capacity = 1024 * 2956800 / 2956800; // =1024
The 'capacity' will be always <= 1024 and this check won't
be triggered:

/* Don't pass boost capacity to scheduler */
if (capacity > max_capacity)
capacity = max_capacity;


IIUC you original code, you don't want to have this boost
frequency to be treated as 1024 capacity. The reason is because
the whole capacity machinery in arch_topology.c is calculated based
on max freq value = 2841600,
so the max capacity 1024 would be pinned to that frequency
(according to Steeve's log:
[ 22.552273] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for
CPUs [4-7] )


Having all this in mind, the multiplication and division in your
original code should be done:

capacity = 1024 * 2956800 / 2841600; // = 1065

then clamped to 1024 value.

My change just unveiled this division issue.

With that in mind, I tend to agree that I should have not
rely on passed boost freq value and try to apply your suggestion check.
Let me experiment with that...

Regards,
Lukasz

2021-11-08 17:42:29

by Steev Klimaszewski

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication


> Hi Steev,
>
> So this depends on the cpufreq governor you are using. By-default arm
> systems have sched-util governor enabled. This means you will scale up
> to boost depending on cpu load and not always. If you want to ensure
> you are always hitting boost frequency, you should enable performance
> governor for cpufreq and try.
>
> Also since the defconfig has by default CPU_FREQ_STAT enabled, you
> should be able to get statistics out of cpufreq to see the time spent
> by a cpu in each frequency. I think cpufreq-info -s should give you
> this info. If not, you can explicitly get it for each cpu from
>
> cat /sys/devices/system/cpu/cpu<X>/cpufreq/stats/time_in_state
>
> Regarding temperature, if you have applied all the patches in the
> sdm845 LMh series and have LMh enabled, cpu throttling starts around
> 95 degree C.
>
Hi Thara,

Indeed, I ended up finding the time_in_state when I was doing more
digging after my last mail.  I do have the sdm845 LMh series and LMh
enabled, however I don't think I've ever seen my system go above 90C here.

So a quick look, and... we are simply almost never getting the 2.95GHz
at all, regardless of workload.  I saw Lukasz response as well about the
math possibly being wrong, but I haven't had a chance.

Regarding the time in state - I went with policy4 instead of per cpu
(for brevity sake) and it's here:

c630:~$ cat /sys/devices/system/cpu/cpufreq/policy4/stats/time_in_state
825600 225037
902400 92
979200 205
1056000 96
1209600 902
1286400 386
1363200 396
1459200 217
1536000 101
1612800 75
1689600 95
1766400 130
1843200 255
1920000 318
1996800 92
2092800 87
2169600 66
2246400 60
2323200 58
2400000 54
2476800 47
2553600 50
2649600 69
2745600 58
2841600 54619
2956800 5

So we spend *very* little time in 2.96GHz and this is after almost 14
hours of uptime on the C630.  By comparison, on a Pinebook Pro where
I've added in 2GHz as a boost frequency :

pinebook-pro:~$ cat
/sys/devices/system/cpu/cpufreq/policy4/stats/time_in_state
408000 16084466
600000 27212
816000 32487
1008000 11331
1200000 13268
1416000 75078
1608000 18392
1800000 207266
2016000 648612

With the Pinebook Pro, which doesn't even come close to getting to 95C,
we spend a lot more time in 2GHz.

-- steev

2021-11-08 19:54:47

by Thara Gopinath

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication



On 11/5/21 6:46 PM, Steev Klimaszewski wrote:
>
>> [snip]
>> Hi,
>>
>> So IIUC the below logs correctly, you are never hitting boost
>> frequency (with or without this patch series). Is that correct ?
>>
>> w.r.t temperature , how are you measuring it? Do you have LMh enabled
>> or are you using tsens to mitigate cpu temperature ?
>
>
> Hi,
>
> I was wrong - it does indeed go boost with the patchset applied, it's
> just that it doesn't boost up to 2.96GHz very often at all. As noted by
> the 0.03% when i ran it while compiling zellij; I reapplied the patches
> (and the 6th patch from Lukasz's email) and after boot, 2.96GHz was
> showing at 0.39%.
>
> Most tools that read the cpu frequency don't really seem to be well
> suited for big.LITTLE, and seem to throw an average of the speed, so
> cpufreq-info was the best I have.  We're apparently supposed to be using
> cpupower these days, but it doesn't seem to know anything about arm64
> devices.
>
> Temperature wise, I'm just getting from the sensors, and I am using LMh.
>
> Now, I have to admit, while I've thrown a patch here or there, I'm not
> exactly a kernel developer, just enough knowledge to be somewhat
> dangerous and know how to backport things.  In my mind, and my line of
> thinking, I would expect with boost enabled, that the cpu would boost up
> to that as often as possible, not require a specific workload to
> actually hit it.  But then again, I would expect multiple compilation
> jobs to be one of the workloads that would?

Hi Steev,

So this depends on the cpufreq governor you are using. By-default arm
systems have sched-util governor enabled. This means you will scale up
to boost depending on cpu load and not always. If you want to ensure you
are always hitting boost frequency, you should enable performance
governor for cpufreq and try.

Also since the defconfig has by default CPU_FREQ_STAT enabled, you
should be able to get statistics out of cpufreq to see the time spent by
a cpu in each frequency. I think cpufreq-info -s should give you this
info. If not, you can explicitly get it for each cpu from

cat /sys/devices/system/cpu/cpu<X>/cpufreq/stats/time_in_state

Regarding temperature, if you have applied all the patches in the sdm845
LMh series and have LMh enabled, cpu throttling starts around 95 degree C.

>
> So I think, the part about never hitting 2.96GHz can be dismissed, and
> was simply my lack of knowledge about the cpufreq-info tool's averages.
> It does seem however to rarely ever hit 2.96GHz and I would actually
> expect it to hit it far more often.
>

--
Warm Regards
Thara (She/Her/Hers)

2021-11-09 01:04:48

by Thara Gopinath

[permalink] [raw]
Subject: Re: [PATCH v3 4/5] cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function



On 11/8/21 9:12 AM, Lukasz Luba wrote:
...snip

>>
>>
>
> Well, I think the issue is broader. Look at the code which
> calculate this 'capacity'. It's just a multiplication & division:
>
> max_capacity = arch_scale_cpu_capacity(cpu); // =1024 in our case
> capacity = mult_frac(max_capacity, throttled_freq,
>         policy->cpuinfo.max_freq);
>
> In the reported by Steev output from sysfs cpufreq we know
> that the value of 'policy->cpuinfo.max_freq' is:
> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:2956800
>
> so when we put the values to the equation we get:
> capacity = 1024 * 2956800 / 2956800; // =1024
> The 'capacity' will be always <= 1024 and this check won't
> be triggered:
>
> /* Don't pass boost capacity to scheduler */
> if (capacity > max_capacity)
>     capacity = max_capacity;
>
>
> IIUC you original code, you don't want to have this boost
> frequency to be treated as 1024 capacity. The reason is because
> the whole capacity machinery in arch_topology.c is calculated based
> on max freq value = 2841600,
> so the max capacity 1024 would be pinned to that frequency
> (according to Steeve's log:
> [   22.552273] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for
> CPUs [4-7] )

Hi Lukasz,

Yes you are right in that I was using policy->cpuinfo.max_freq where as
I should have used freq_factor. So now that you are using freq_factor,
it makes sense to cap the capacity at the max capacity calulated by the
scheduler.

I agree that the problem is complex because at some point we should look
at rebuilding the topology based on changes to policy->cpuinfo.max_freq.

>
>
> Having all this in mind, the multiplication and division in your
> original code should be done:
>
> capacity = 1024 * 2956800 / 2841600; // = 1065
>
> then clamped to 1024 value.
>
> My change just unveiled this division issue.
>
> With that in mind, I tend to agree that I should have not
> rely on passed boost freq value and try to apply your suggestion check.
> Let me experiment with that...
>
> Regards,
> Lukasz

--
Warm Regards
Thara (She/Her/Hers)

2021-11-09 01:08:47

by Steev Klimaszewski

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication

Hi Thara,
> Hi Steev,
>
> IIUC, PineBook Pro has Rockchip RK3399 which has 2 Cortex A-72 and 4
> Cortex A-52 where as C630 has Qualcomm sdm845 which has 4 Cortex A-75
> and 4 Cortex A-55. Task placements and subsequently cpu load will be
> different for both the platforms. With the same workload, I will
> expect Rockchip to system to be more loaded than sdm845. Having said
> that, what cpu-freq governor are you using on both the systems.
>
I'm using sched-util on both of the systems.

I've tried a number of different ways of forcing builds only on the A-75
cores, and I simply cannot get the load to be "enough" to kick in the
boost frequency.

An example being

git clone https://github.com/zellij-org/zellij.git

cd zellij

taskset --cpu-list 4-7 cargo build --release

git clean -fdx

taskset --cpu-list 6-7 cargo build --release


On my C630, it never goes higher than 85C with the 4 cores being used,
and with 2, it never goes about 65C and I do not get any 2.96GHz.  It's
currently sitting at "6" in the time_in_state for 2965800.


--steev

2021-11-09 05:14:37

by Thara Gopinath

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication



On 11/8/21 10:22 AM, Steev Klimaszewski wrote:
>
>> Hi Steev,
>>
>> So this depends on the cpufreq governor you are using. By-default arm
>> systems have sched-util governor enabled. This means you will scale up
>> to boost depending on cpu load and not always. If you want to ensure
>> you are always hitting boost frequency, you should enable performance
>> governor for cpufreq and try.
>>
>> Also since the defconfig has by default CPU_FREQ_STAT enabled, you
>> should be able to get statistics out of cpufreq to see the time spent
>> by a cpu in each frequency. I think cpufreq-info -s should give you
>> this info. If not, you can explicitly get it for each cpu from
>>
>> cat /sys/devices/system/cpu/cpu<X>/cpufreq/stats/time_in_state
>>
>> Regarding temperature, if you have applied all the patches in the
>> sdm845 LMh series and have LMh enabled, cpu throttling starts around
>> 95 degree C.
>>
> Hi Thara,
>
> Indeed, I ended up finding the time_in_state when I was doing more
> digging after my last mail.  I do have the sdm845 LMh series and LMh
> enabled, however I don't think I've ever seen my system go above 90C here.
>
> So a quick look, and... we are simply almost never getting the 2.95GHz
> at all, regardless of workload.  I saw Lukasz response as well about the
> math possibly being wrong, but I haven't had a chance.
>
> Regarding the time in state - I went with policy4 instead of per cpu
> (for brevity sake) and it's here:
>
> c630:~$ cat /sys/devices/system/cpu/cpufreq/policy4/stats/time_in_state
> 825600 225037
> 902400 92
> 979200 205
> 1056000 96
> 1209600 902
> 1286400 386
> 1363200 396
> 1459200 217
> 1536000 101
> 1612800 75
> 1689600 95
> 1766400 130
> 1843200 255
> 1920000 318
> 1996800 92
> 2092800 87
> 2169600 66
> 2246400 60
> 2323200 58
> 2400000 54
> 2476800 47
> 2553600 50
> 2649600 69
> 2745600 58
> 2841600 54619
> 2956800 5
>
> So we spend *very* little time in 2.96GHz and this is after almost 14
> hours of uptime on the C630.  By comparison, on a Pinebook Pro where
> I've added in 2GHz as a boost frequency :

Hi Steev,

IIUC, PineBook Pro has Rockchip RK3399 which has 2 Cortex A-72 and 4
Cortex A-52 where as C630 has Qualcomm sdm845 which has 4 Cortex A-75
and 4 Cortex A-55. Task placements and subsequently cpu load will be
different for both the platforms. With the same workload, I will expect
Rockchip to system to be more loaded than sdm845. Having said that, what
cpu-freq governor are you using on both the systems.


>
> pinebook-pro:~$ cat
> /sys/devices/system/cpu/cpufreq/policy4/stats/time_in_state
> 408000 16084466
> 600000 27212
> 816000 32487
> 1008000 11331
> 1200000 13268
> 1416000 75078
> 1608000 18392
> 1800000 207266
> 2016000 648612
>
> With the Pinebook Pro, which doesn't even come close to getting to 95C,
> we spend a lot more time in 2GHz.
>
> -- steev
>

--
Warm Regards
Thara (She/Her/Hers)

2021-11-09 17:07:48

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication

Hi Steev,

That's interesting what you've done with Rockchip RK3399.
I would like to reproduce your experiment on my RockPI 4B v1.3.
Could you tell me how you to add this boost frequency that you have
mentioned in some previous emails?

I want to have similar setup to yours and I'll check all the subsystems
involved in the decision making process for triggering this boost freq.

On 11/8/21 11:21 PM, Steev Klimaszewski wrote:
> Hi Thara,
>> Hi Steev,
>>
>> IIUC, PineBook Pro has Rockchip RK3399 which has 2 Cortex A-72 and 4
>> Cortex A-52 where as C630 has Qualcomm sdm845 which has 4 Cortex A-75
>> and 4 Cortex A-55. Task placements and subsequently cpu load will be
>> different for both the platforms. With the same workload, I will
>> expect Rockchip to system to be more loaded than sdm845. Having said
>> that, what cpu-freq governor are you using on both the systems.
>>
> I'm using sched-util on both of the systems.
>
> I've tried a number of different ways of forcing builds only on the A-75
> cores, and I simply cannot get the load to be "enough" to kick in the
> boost frequency.
>
> An example being
>
> git clone https://github.com/zellij-org/zellij.git
>
> cd zellij
>
> taskset --cpu-list 4-7 cargo build --release
>
> git clean -fdx
>
> taskset --cpu-list 6-7 cargo build --release

Thanks for the pointers, I'll give it a try when I sort out this
Rockchip boost setup.

>
>
> On my C630, it never goes higher than 85C with the 4 cores being used,
> and with 2, it never goes about 65C and I do not get any 2.96GHz.  It's
> currently sitting at "6" in the time_in_state for 2965800.
>
>
> --steev
>

Thank you for your support.

Regards,
Lukasz

2021-11-09 17:47:58

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v3 4/5] cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function



On 11/8/21 9:23 PM, Thara Gopinath wrote:
>
>
> On 11/8/21 9:12 AM, Lukasz Luba wrote:
> ...snip
>
>>>
>>>
>>
>> Well, I think the issue is broader. Look at the code which
>> calculate this 'capacity'. It's just a multiplication & division:
>>
>> max_capacity = arch_scale_cpu_capacity(cpu); // =1024 in our case
>> capacity = mult_frac(max_capacity, throttled_freq,
>>          policy->cpuinfo.max_freq);
>>
>> In the reported by Steev output from sysfs cpufreq we know
>> that the value of 'policy->cpuinfo.max_freq' is:
>> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:2956800
>>
>> so when we put the values to the equation we get:
>> capacity = 1024 * 2956800 / 2956800; // =1024
>> The 'capacity' will be always <= 1024 and this check won't
>> be triggered:
>>
>> /* Don't pass boost capacity to scheduler */
>> if (capacity > max_capacity)
>>      capacity = max_capacity;
>>
>>
>> IIUC you original code, you don't want to have this boost
>> frequency to be treated as 1024 capacity. The reason is because
>> the whole capacity machinery in arch_topology.c is calculated based
>> on max freq value = 2841600,
>> so the max capacity 1024 would be pinned to that frequency
>> (according to Steeve's log:
>> [   22.552273] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956)
>> for CPUs [4-7] )
>
> Hi Lukasz,
>
> Yes you are right in that I was using policy->cpuinfo.max_freq where as
> I should have used freq_factor. So now that you are using freq_factor,
> it makes sense to cap the capacity at the max capacity calulated by the
> scheduler.
>
> I agree that the problem is complex because at some point we should look
> at rebuilding the topology based on changes to policy->cpuinfo.max_freq.
>

I probably cannot fix your driver easily right now. What I can do and is
actually required for this new API arch_update_thermal_pressure() is to
accept boost frequencies (values which are higher that 'freq_factor')
without triggering a warning and just setting the thermal pressure to 0
(since we are told that the frequency capping is completely removed even
for boost values).

The next step would be to perform longer investigation how the boost
frequencies are accepted then triggered/used by scheduler and other
involved machinery.

I've asked Steev for help with setting up this Rockchip RK3399 new boost
frequency which actually is used. I want to understand why that platform
is able to use the boost freq and this Qcom SoC is not able to use it.

I agree with you that at some point we might need to try rebuilding the
topology information based on these policy->cpuinfo.max_freq changes.

I hope it would take only a few steps to fix these issues completely,
without destroying a lot of existing code...

Regards,
Lukasz

2021-11-10 00:03:02

by Steev Klimaszewski

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication


On 11/9/21 2:29 AM, Lukasz Luba wrote:
> Hi Steev,
>
> That's interesting what you've done with Rockchip RK3399.
> I would like to reproduce your experiment on my RockPI 4B v1.3.
> Could you tell me how you to add this boost frequency that you have
> mentioned in some previous emails?
>
> I want to have similar setup to yours and I'll check all the subsystems
> involved in the decision making process for triggering this boost freq.
>
> Thank you for your support.
>
> Regards,
> Lukasz


Hi Lukasz,

It was actually something that Armbian had been doing as an overlay for
their setup, and I thought, why does it need to be an overlay, when we
could simply hide it behind turbo-mode so that if users want to
overclock, they simply echo 1 and if it's unstable or cooling/power
isn't enough, they can echo 0 or leave it off (boost defaults to off) -
so that being said:

I apply this patch
https://gitlab.com/kalilinux/build-scripts/kali-arm/-/blob/master/patches/pinebook-pro/pbp-5.14/rk3399-opp-overclock-2GHz-turbo-mode.patch
which adds the 1.5GHz for little cores and 2GHz for the big to the
rk3399 dtsi

To enable at boot time, I simply have "echo 1 >
/sys/devices/system/cpu/cpufreq/boost" in my /etc/rc.local  And to
disable, simply echo 0 in there (it defaults to 0 so it's off and most
users won't know it exists.)

I'm pretty sure this is "abusing" turbo-mode, but it works well enough...

Hope that helps,

-- steev

2021-11-10 00:05:40

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication



On 11/9/21 3:46 PM, Steev Klimaszewski wrote:
>
> On 11/9/21 2:29 AM, Lukasz Luba wrote:
>> Hi Steev,
>>
>> That's interesting what you've done with Rockchip RK3399.
>> I would like to reproduce your experiment on my RockPI 4B v1.3.
>> Could you tell me how you to add this boost frequency that you have
>> mentioned in some previous emails?
>>
>> I want to have similar setup to yours and I'll check all the subsystems
>> involved in the decision making process for triggering this boost freq.
>>
>> Thank you for your support.
>>
>> Regards,
>> Lukasz
>
>
> Hi Lukasz,
>
> It was actually something that Armbian had been doing as an overlay for
> their setup, and I thought, why does it need to be an overlay, when we
> could simply hide it behind turbo-mode so that if users want to
> overclock, they simply echo 1 and if it's unstable or cooling/power
> isn't enough, they can echo 0 or leave it off (boost defaults to off) -
> so that being said:
>
> I apply this patch
> https://gitlab.com/kalilinux/build-scripts/kali-arm/-/blob/master/patches/pinebook-pro/pbp-5.14/rk3399-opp-overclock-2GHz-turbo-mode.patch
> which adds the 1.5GHz for little cores and 2GHz for the big to the
> rk3399 dtsi
>
> To enable at boot time, I simply have "echo 1 >
> /sys/devices/system/cpu/cpufreq/boost" in my /etc/rc.local  And to
> disable, simply echo 0 in there (it defaults to 0 so it's off and most
> users won't know it exists.)
>
> I'm pretty sure this is "abusing" turbo-mode, but it works well enough...
>
> Hope that helps,
>

Yes, that help. Thank you for the info.
I'll play a bit with this boosting and try to figure out
the mechanisms.

For the $subject patch set, I'm going to send v4, since
it's not affecting the boost usage. The newly introduced
interface must handle these boost frequency values and not
simply ignore them with also printing a warning.
They are valid frequencies and we should just put 0 to
the thermal pressure in such cases.

Regards,
Lukasz

2021-11-10 00:12:43

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication



On 11/9/21 4:22 PM, Lukasz Luba wrote:
>
>
> On 11/9/21 3:46 PM, Steev Klimaszewski wrote:
>>
>> On 11/9/21 2:29 AM, Lukasz Luba wrote:
>>> Hi Steev,
>>>
>>> That's interesting what you've done with Rockchip RK3399.
>>> I would like to reproduce your experiment on my RockPI 4B v1.3.
>>> Could you tell me how you to add this boost frequency that you have
>>> mentioned in some previous emails?
>>>
>>> I want to have similar setup to yours and I'll check all the subsystems
>>> involved in the decision making process for triggering this boost freq.
>>>
>>> Thank you for your support.
>>>
>>> Regards,
>>> Lukasz
>>
>>
>> Hi Lukasz,
>>
>> It was actually something that Armbian had been doing as an overlay
>> for their setup, and I thought, why does it need to be an overlay,
>> when we could simply hide it behind turbo-mode so that if users want
>> to overclock, they simply echo 1 and if it's unstable or cooling/power
>> isn't enough, they can echo 0 or leave it off (boost defaults to off)
>> - so that being said:
>>
>> I apply this patch
>> https://gitlab.com/kalilinux/build-scripts/kali-arm/-/blob/master/patches/pinebook-pro/pbp-5.14/rk3399-opp-overclock-2GHz-turbo-mode.patch
>> which adds the 1.5GHz for little cores and 2GHz for the big to the
>> rk3399 dtsi
>>
>> To enable at boot time, I simply have "echo 1 >
>> /sys/devices/system/cpu/cpufreq/boost" in my /etc/rc.local  And to
>> disable, simply echo 0 in there (it defaults to 0 so it's off and most
>> users won't know it exists.)
>>
>> I'm pretty sure this is "abusing" turbo-mode, but it works well enough...
>>
>> Hope that helps,
>>
>
> Yes, that help. Thank you for the info.
> I'll play a bit with this boosting and try to figure out
> the mechanisms.
>
> For the $subject patch set, I'm going to send v4, since
> it's not affecting the boost usage. The newly introduced
> interface must handle these boost frequency values and not
> simply ignore them with also printing a warning.
> They are valid frequencies and we should just put 0 to
> the thermal pressure in such cases.
>

I think I have figure out what is going on with the issue that
you've reported. On this rockchip platform you are probably using
step-wise thermal governor, which tries to decrease/increase
max allowed frequency step-by-step walking through the sorted
frequencies. So it would always set the thermal pressure to 0
when the thermal throttling is gone.
On the Qcom platform there is a different policy in HW/FW which
controls thermal and it can simple remove clamping 'instantly'
and allow all frequencies also the boost one. The highest possible
frequency is passed then to the this thermal pressure machinery.
So we see the warning that the boost frequency value is trying to
be passed to this arch_update_thermal_pressure(), but we ignore
such big frequency value and unfortunately do not clean the previously
set thermal pressure. Then the scheduler still sees the reduced
capacity on that CPU and cannot request higher frequencies.

The v4 patch would allow to pass the boost frequencies values, so
the issue would be solved.

Regards,
Lukasz

2021-11-10 00:13:49

by Steev Klimaszewski

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication

Hi Lukasz,
>>
>
> I think I have figure out what is going on with the issue that
> you've reported. On this rockchip platform you are probably using
> step-wise thermal governor, which tries to decrease/increase
> max allowed frequency step-by-step walking through the sorted
> frequencies. So it would always set the thermal pressure to 0
> when the thermal throttling is gone.
> On the Qcom platform there is a different policy in HW/FW which
> controls thermal and it can simple remove clamping 'instantly'
> and allow all frequencies also the boost one. The highest possible
> frequency is passed then to the this thermal pressure machinery.
> So we see the warning that the boost frequency value is trying to
> be passed to this arch_update_thermal_pressure(), but we ignore
> such big frequency value and unfortunately do not clean the previously
> set thermal pressure. Then the scheduler still sees the reduced
> capacity on that CPU and cannot request higher frequencies.
>
> The v4 patch would allow to pass the boost frequencies values, so
> the issue would be solved.
>
> Regards,
> Lukasz

Sounds good, I look forward to testing v4 :)

-- steev