Add support for thermal throttling on SDM845.
We introduce a generic .ready callback to be used by cpufreq drivers to
register as a thermal cooling device. If this approach is acceptable I can
send a series converting other cpufreq drivers to use this callback.
Amit Kucheria (7):
drivers: thermal: of-thermal: Print name of device node with error
drivers: cpufreq: Add thermal_cooling_device pointer to struct
cpufreq_policy
cpu_cooling: Add generic driver ready callback
cpufreq: qcom-hw: Move to device_initcall
cpufreq: qcom-hw: Register as a cpufreq cooling device
arm64: dts: sdm845: Increase alert trip point to 95 degrees
arm64: dts: sdm845: wireup the thermal trip points to cpufreq
arch/arm64/boot/dts/qcom/sdm845.dtsi | 161 +++++++++++++++++++++++++--
drivers/cpufreq/qcom-cpufreq-hw.c | 7 +-
drivers/thermal/cpu_cooling.c | 18 +++
drivers/thermal/of-thermal.c | 4 +-
include/linux/cpu_cooling.h | 9 ++
include/linux/cpufreq.h | 2 +
6 files changed, 190 insertions(+), 11 deletions(-)
--
2.17.1
All cpufreq drivers do similar things to register as a cooling device.
Provide a generic call back so we can get rid of duplicated code in the
drivers.
Signed-off-by: Amit Kucheria <[email protected]>
Suggested-by: Stephen Boyd <[email protected]>
---
drivers/thermal/cpu_cooling.c | 18 ++++++++++++++++++
include/linux/cpu_cooling.h | 9 +++++++++
2 files changed, 27 insertions(+)
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index dfd23245f778..5154ffc12332 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -694,6 +694,7 @@ __cpufreq_cooling_register(struct device_node *np,
cpufreq_cdev->clipped_freq = cpufreq_cdev->freq_table[0].frequency;
cpufreq_cdev->cdev = cdev;
+ policy->cooldev = cdev;
mutex_lock(&cooling_list_lock);
/* Register the notifier for first cpufreq cooling device */
@@ -785,6 +786,23 @@ of_cpufreq_cooling_register(struct cpufreq_policy *policy)
}
EXPORT_SYMBOL_GPL(of_cpufreq_cooling_register);
+/**
+ * generic_cpufreq_ready - generic driver callback to register a thermal_cooling_device
+ * @policy: cpufreq policy
+ */
+void generic_cpufreq_ready(struct cpufreq_policy *policy)
+{
+ struct thermal_cooling_device **cdev = &policy->cooldev;
+
+ *cdev = of_cpufreq_cooling_register(policy);
+ if (IS_ERR(*cdev)) {
+ pr_err("Failed to register cpufreq cooling device for CPU%d: %ld\n",
+ policy->cpu, PTR_ERR(cdev));
+ cdev = NULL;
+ }
+}
+EXPORT_SYMBOL_GPL(generic_cpufreq_ready);
+
/**
* cpufreq_cooling_unregister - function to remove cpufreq cooling device.
* @cdev: thermal cooling device pointer.
diff --git a/include/linux/cpu_cooling.h b/include/linux/cpu_cooling.h
index de0dafb9399d..d7815bb2967a 100644
--- a/include/linux/cpu_cooling.h
+++ b/include/linux/cpu_cooling.h
@@ -65,12 +65,21 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
*/
struct thermal_cooling_device *
of_cpufreq_cooling_register(struct cpufreq_policy *policy);
+
+/**
+ * generic_cpufreq_ready - generic driver callback to register a thermal_cooling_device
+ * @policy: cpufreq policy
+ */
+void generic_cpufreq_ready(struct cpufreq_policy *policy);
#else
static inline struct thermal_cooling_device *
of_cpufreq_cooling_register(struct cpufreq_policy *policy)
{
return NULL;
}
+
+void generic_cpufreq_ready(struct cpufreq_policy *policy) {}
+
#endif /* defined(CONFIG_THERMAL_OF) && defined(CONFIG_CPU_THERMAL) */
#endif /* __CPU_COOLING_H__ */
--
2.17.1
subsys_initcall causes problems registering the driver as a thermal
cooling device.
If "faster boot" is the main reason for doing subsys_initcall, this
should be handled in the bootloader or another boot constraint
framework.
Signed-off-by: Amit Kucheria <[email protected]>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index d83939a1b3d4..649dddd72749 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -296,7 +296,7 @@ static int __init qcom_cpufreq_hw_init(void)
{
return platform_driver_register(&qcom_cpufreq_hw_driver);
}
-subsys_initcall(qcom_cpufreq_hw_init);
+device_initcall(qcom_cpufreq_hw_init);
static void __exit qcom_cpufreq_hw_exit(void)
{
--
2.17.1
Signed-off-by: Amit Kucheria <[email protected]>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 649dddd72749..1c01311e5927 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -5,6 +5,7 @@
#include <linux/bitfield.h>
#include <linux/cpufreq.h>
+#include <linux/cpu_cooling.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
@@ -216,7 +217,10 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
static int qcom_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy)
{
void __iomem *base = policy->driver_data - REG_PERF_STATE;
+ struct thermal_cooling_device *cdev = policy->cooldev;
+ if (cdev)
+ cpufreq_cooling_unregister(cdev);
kfree(policy->freq_table);
devm_iounmap(&global_pdev->dev, base);
@@ -238,6 +242,7 @@ static struct cpufreq_driver cpufreq_qcom_hw_driver = {
.init = qcom_cpufreq_hw_cpu_init,
.exit = qcom_cpufreq_hw_cpu_exit,
.fast_switch = qcom_cpufreq_hw_fast_switch,
+ .ready = generic_cpufreq_ready,
.name = "qcom-cpufreq-hw",
.attr = qcom_cpufreq_hw_attr,
};
--
2.17.1
Since the big and little cpus are in the same frequency domain, use all
of them for mitigation in the cooling-map. At the lower trip points we
restrict ourselves to throttling only a few OPPs. At higher trip
temperatures, allow ourselves to be throttled to any extent.
Signed-off-by: Amit Kucheria <[email protected]>
---
arch/arm64/boot/dts/qcom/sdm845.dtsi | 145 +++++++++++++++++++++++++++
1 file changed, 145 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 29e823b0caf4..cd6402a9aa64 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -13,6 +13,7 @@
#include <dt-bindings/reset/qcom,sdm845-aoss.h>
#include <dt-bindings/soc/qcom,rpmh-rsc.h>
#include <dt-bindings/clock/qcom,gcc-sdm845.h>
+#include <dt-bindings/thermal/thermal.h>
/ {
interrupt-parent = <&intc>;
@@ -99,6 +100,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x0>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_0>;
L2_0: l2-cache {
compatible = "cache";
@@ -114,6 +116,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x100>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_100>;
L2_100: l2-cache {
compatible = "cache";
@@ -126,6 +129,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x200>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_200>;
L2_200: l2-cache {
compatible = "cache";
@@ -138,6 +142,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x300>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_300>;
L2_300: l2-cache {
compatible = "cache";
@@ -150,6 +155,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x400>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_400>;
L2_400: l2-cache {
compatible = "cache";
@@ -162,6 +168,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x500>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_500>;
L2_500: l2-cache {
compatible = "cache";
@@ -174,6 +181,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x600>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_600>;
L2_600: l2-cache {
compatible = "cache";
@@ -186,6 +194,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x700>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_700>;
L2_700: l2-cache {
compatible = "cache";
@@ -1703,6 +1712,23 @@
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu_alert0>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
+ <&CPU1 THERMAL_NO_LIMIT 4>,
+ <&CPU2 THERMAL_NO_LIMIT 4>,
+ <&CPU3 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu_crit0>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu1-thermal {
@@ -1724,6 +1750,23 @@
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu_alert1>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
+ <&CPU1 THERMAL_NO_LIMIT 4>,
+ <&CPU2 THERMAL_NO_LIMIT 4>,
+ <&CPU3 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu_crit1>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu2-thermal {
@@ -1745,6 +1788,23 @@
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu_alert2>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
+ <&CPU1 THERMAL_NO_LIMIT 4>,
+ <&CPU2 THERMAL_NO_LIMIT 4>,
+ <&CPU3 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu_crit2>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu3-thermal {
@@ -1766,6 +1826,23 @@
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu_alert3>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
+ <&CPU1 THERMAL_NO_LIMIT 4>,
+ <&CPU2 THERMAL_NO_LIMIT 4>,
+ <&CPU3 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu_crit3>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu4-thermal {
@@ -1787,6 +1864,23 @@
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu_alert4>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT 4>,
+ <&CPU5 THERMAL_NO_LIMIT 4>,
+ <&CPU6 THERMAL_NO_LIMIT 4>,
+ <&CPU7 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu_crit4>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu5-thermal {
@@ -1808,6 +1902,23 @@
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu_alert5>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT 4>,
+ <&CPU5 THERMAL_NO_LIMIT 4>,
+ <&CPU6 THERMAL_NO_LIMIT 4>,
+ <&CPU7 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu_crit5>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu6-thermal {
@@ -1829,6 +1940,23 @@
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu_alert6>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT 4>,
+ <&CPU5 THERMAL_NO_LIMIT 4>,
+ <&CPU6 THERMAL_NO_LIMIT 4>,
+ <&CPU7 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu_crit6>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu7-thermal {
@@ -1850,6 +1978,23 @@
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu_alert7>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT 4>,
+ <&CPU5 THERMAL_NO_LIMIT 4>,
+ <&CPU6 THERMAL_NO_LIMIT 4>,
+ <&CPU7 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu_crit7>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
};
};
--
2.17.1
75 degrees is too aggressive for throttling the CPU. After speaking to
Qualcomm engineers, increase it to 95 degrees.
Signed-off-by: Amit Kucheria <[email protected]>
---
arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index c27cbd3bcb0a..29e823b0caf4 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -1692,7 +1692,7 @@
trips {
cpu_alert0: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1713,7 +1713,7 @@
trips {
cpu_alert1: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1734,7 +1734,7 @@
trips {
cpu_alert2: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1755,7 +1755,7 @@
trips {
cpu_alert3: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1776,7 +1776,7 @@
trips {
cpu_alert4: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1797,7 +1797,7 @@
trips {
cpu_alert5: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1818,7 +1818,7 @@
trips {
cpu_alert6: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1839,7 +1839,7 @@
trips {
cpu_alert7: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
--
2.17.1
Signed-off-by: Amit Kucheria <[email protected]>
---
drivers/thermal/of-thermal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
index 4bfdb4a1e47d..5ca2a6b370ea 100644
--- a/drivers/thermal/of-thermal.c
+++ b/drivers/thermal/of-thermal.c
@@ -867,14 +867,14 @@ __init *thermal_of_build_thermal_zone(struct device_node *np)
ret = of_property_read_u32(np, "polling-delay-passive", &prop);
if (ret < 0) {
- pr_err("missing polling-delay-passive property\n");
+ pr_err("%s: missing polling-delay-passive property\n", np->name);
goto free_tz;
}
tz->passive_delay = prop;
ret = of_property_read_u32(np, "polling-delay", &prop);
if (ret < 0) {
- pr_err("missing polling-delay property\n");
+ pr_err("%s: missing polling-delay property\n", np->name);
goto free_tz;
}
tz->polling_delay = prop;
--
2.17.1
Several cpufreq drivers register themselves as thermal cooling devices.
Adding a pointer to struct cpufreq_policy removes the need for them to
store this pointer in a private data structure.
Signed-off-by: Amit Kucheria <[email protected]>
---
include/linux/cpufreq.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index c86d6d8bdfed..2496549d7573 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -95,6 +95,8 @@ struct cpufreq_policy {
struct cpufreq_frequency_table *freq_table;
enum cpufreq_table_sorting freq_table_sorted;
+ struct thermal_cooling_device *cooldev; /* Pointer to the cooling
+ * device if used for thermal mitigation */
struct list_head policy_list;
struct kobject kobj;
struct completion kobj_unregister;
--
2.17.1
Quoting Amit Kucheria (2019-01-09 16:00:50)
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> drivers/thermal/of-thermal.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index 4bfdb4a1e47d..5ca2a6b370ea 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -867,14 +867,14 @@ __init *thermal_of_build_thermal_zone(struct device_node *np)
>
> ret = of_property_read_u32(np, "polling-delay-passive", &prop);
> if (ret < 0) {
> - pr_err("missing polling-delay-passive property\n");
> + pr_err("%s: missing polling-delay-passive property\n", np->name);
You should use %pOFn to print node names.
Quoting Amit Kucheria (2019-01-09 16:00:56)
> Since the big and little cpus are in the same frequency domain, use all
Oh? I thought the big and little cpus were in different frequency
domains and voltage domains. Maybe that's what you're saying here but
I'm misunderstanding. So change the wording a bit to be more clear?
> of them for mitigation in the cooling-map. At the lower trip points we
> restrict ourselves to throttling only a few OPPs. At higher trip
> temperatures, allow ourselves to be throttled to any extent.
>
> Signed-off-by: Amit Kucheria <[email protected]>
Quoting Amit Kucheria (2019-01-09 16:00:55)
> 75 degrees is too aggressive for throttling the CPU. After speaking to
> Qualcomm engineers, increase it to 95 degrees.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
Is the plan that these are some defaults that would be adjusted by board
variants? Just curious why we have anything in here and don't punt it
all to each board dts file.
On Thu, Jan 10, 2019 at 05:30:51AM +0530, Amit Kucheria wrote:
> Several cpufreq drivers register themselves as thermal cooling devices.
> Adding a pointer to struct cpufreq_policy removes the need for them to
> store this pointer in a private data structure.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> include/linux/cpufreq.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index c86d6d8bdfed..2496549d7573 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -95,6 +95,8 @@ struct cpufreq_policy {
> struct cpufreq_frequency_table *freq_table;
> enum cpufreq_table_sorting freq_table_sorted;
>
> + struct thermal_cooling_device *cooldev; /* Pointer to the cooling
> + * device if used for thermal mitigation */
> struct list_head policy_list;
> struct kobject kobj;
> struct completion kobj_unregister;
I've mixed feelings about this. It's definitely desirable to avoid
code duplication and tying the cooling device to the cpufreq_policy is
a convenient way to achieve that. However semantically it seems a bit
odd that a CPU cooling device is part of the cpufreq policy.
Anyway, unless there are better ideas we probably want to be pragmatic
here, so if Viresh is fine with it who am I to complain ;-)
Cheers
Matthias
Hi Amit,
On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> 75 degrees is too aggressive for throttling the CPU. After speaking to
> Qualcomm engineers, increase it to 95 degrees.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index c27cbd3bcb0a..29e823b0caf4 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -1692,7 +1692,7 @@
>
> trips {
> cpu_alert0: trip0 {
> - temperature = <75000>;
> + temperature = <95000>;
> hysteresis = <2000>;
> type = "passive";
> };
> @@ -1713,7 +1713,7 @@
>
> trips {
> cpu_alert1: trip0 {
> - temperature = <75000>;
> + temperature = <95000>;
> hysteresis = <2000>;
> type = "passive";
> };
> @@ -1734,7 +1734,7 @@
>
> trips {
> cpu_alert2: trip0 {
> - temperature = <75000>;
> + temperature = <95000>;
> hysteresis = <2000>;
> type = "passive";
> };
> @@ -1755,7 +1755,7 @@
>
> trips {
> cpu_alert3: trip0 {
> - temperature = <75000>;
> + temperature = <95000>;
> hysteresis = <2000>;
> type = "passive";
> };
> @@ -1776,7 +1776,7 @@
>
> trips {
> cpu_alert4: trip0 {
> - temperature = <75000>;
> + temperature = <95000>;
> hysteresis = <2000>;
> type = "passive";
> };
> @@ -1797,7 +1797,7 @@
>
> trips {
> cpu_alert5: trip0 {
> - temperature = <75000>;
> + temperature = <95000>;
> hysteresis = <2000>;
> type = "passive";
> };
> @@ -1818,7 +1818,7 @@
>
> trips {
> cpu_alert6: trip0 {
> - temperature = <75000>;
> + temperature = <95000>;
> hysteresis = <2000>;
> type = "passive";
> };
> @@ -1839,7 +1839,7 @@
>
> trips {
> cpu_alert7: trip0 {
> - temperature = <75000>;
> + temperature = <95000>;
> hysteresis = <2000>;
> type = "passive";
> };
The change itself looks good to me, however I wonder if it would be
worth to eliminate redundancy and merge the current 8 thermal zones
into 2, one for the Silver and one for the Gold cluster (as done by
http://crrev.com/c/1381752). There is a single cooling device for
each cluster, so it's not clear to me if there is any gain from having
a separate thermal zone for each CPU. If it is important to monitor
the temperatures of the individual cores this can still be done by
configuring the thermal zone of the cluster with multiple thermal
sensors.
Cheers
Matthias
On Wed, Jan 09, 2019 at 05:15:33PM -0800, Matthias Kaehlcke wrote:
> Hi Amit,
>
> On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > Qualcomm engineers, increase it to 95 degrees.
> >
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> > 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > index c27cbd3bcb0a..29e823b0caf4 100644
> > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > @@ -1692,7 +1692,7 @@
> >
> > trips {
> > cpu_alert0: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1713,7 +1713,7 @@
> >
> > trips {
> > cpu_alert1: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1734,7 +1734,7 @@
> >
> > trips {
> > cpu_alert2: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1755,7 +1755,7 @@
> >
> > trips {
> > cpu_alert3: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1776,7 +1776,7 @@
> >
> > trips {
> > cpu_alert4: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1797,7 +1797,7 @@
> >
> > trips {
> > cpu_alert5: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1818,7 +1818,7 @@
> >
> > trips {
> > cpu_alert6: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1839,7 +1839,7 @@
> >
> > trips {
> > cpu_alert7: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
>
> The change itself looks good to me, however I wonder if it would be
> worth to eliminate redundancy and merge the current 8 thermal zones
> into 2, one for the Silver and one for the Gold cluster (as done by
> http://crrev.com/c/1381752). There is a single cooling device for
> each cluster, so it's not clear to me if there is any gain from having
> a separate thermal zone for each CPU. If it is important to monitor
> the temperatures of the individual cores this can still be done by
> configuring the thermal zone of the cluster with multiple thermal
> sensors.
I see your idea is to have a cooling device per CPU ("arm64: dts:
sdm845: wireup the thermal trip points to cpufreq" /
https://lore.kernel.org/patchwork/patch/1030742/), however that
doesn't work as intended. Only two cpufreq 'devices' are created,
one for CPU0 and one for CPU4. In consequence cpufreq->ready() only
runs for these cores and only two cooling devices are
registered. Since the cores of a cluster all run at the same
frequency I also doubt if having multiple cooling devices would
bring any benefits.
Cheers
Matthias
Hi Amit,
On Thu, Jan 10, 2019 at 05:30:56AM +0530, Amit Kucheria wrote:
> Since the big and little cpus are in the same frequency domain, use all
> of them for mitigation in the cooling-map. At the lower trip points we
> restrict ourselves to throttling only a few OPPs. At higher trip
> temperatures, allow ourselves to be throttled to any extent.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> arch/arm64/boot/dts/qcom/sdm845.dtsi | 145 +++++++++++++++++++++++++++
> 1 file changed, 145 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index 29e823b0caf4..cd6402a9aa64 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -13,6 +13,7 @@
> #include <dt-bindings/reset/qcom,sdm845-aoss.h>
> #include <dt-bindings/soc/qcom,rpmh-rsc.h>
> #include <dt-bindings/clock/qcom,gcc-sdm845.h>
> +#include <dt-bindings/thermal/thermal.h>
>
> / {
> interrupt-parent = <&intc>;
> @@ -99,6 +100,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x0>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_0>;
> L2_0: l2-cache {
> compatible = "cache";
> @@ -114,6 +116,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x100>;
> enable-method = "psci";
> + #cooling-cells = <2>;
This is not needed (also applies to other for other non-policy
cores). A single cpufreq device is created per frequency domain /
cluster, hence a single cooling device is registered per cluster,
which IMO makes sense given that the CPUs of a cluster can't change
their frequencies independently.
> next-level-cache = <&L2_100>;
> L2_100: l2-cache {
> compatible = "cache";
> @@ -126,6 +129,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x200>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_200>;
> L2_200: l2-cache {
> compatible = "cache";
> @@ -138,6 +142,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x300>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_300>;
> L2_300: l2-cache {
> compatible = "cache";
> @@ -150,6 +155,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x400>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_400>;
> L2_400: l2-cache {
> compatible = "cache";
> @@ -162,6 +168,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x500>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_500>;
> L2_500: l2-cache {
> compatible = "cache";
> @@ -174,6 +181,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x600>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_600>;
> L2_600: l2-cache {
> compatible = "cache";
> @@ -186,6 +194,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x700>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_700>;
> L2_700: l2-cache {
> compatible = "cache";
> @@ -1703,6 +1712,23 @@
> type = "critical";
> };
> };
> +
> + cooling-maps {
> + map0 {
> + trip = <&cpu_alert0>;
> + cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> + <&CPU1 THERMAL_NO_LIMIT 4>,
As per above, there are no cooling devices for CPU1-3 and CPU5-7.
Cheers
Matthias
On 10-01-19, 05:30, Amit Kucheria wrote:
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> drivers/cpufreq/qcom-cpufreq-hw.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 649dddd72749..1c01311e5927 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -5,6 +5,7 @@
>
> #include <linux/bitfield.h>
> #include <linux/cpufreq.h>
> +#include <linux/cpu_cooling.h>
> #include <linux/init.h>
> #include <linux/kernel.h>
> #include <linux/module.h>
> @@ -216,7 +217,10 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
> static int qcom_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy)
> {
> void __iomem *base = policy->driver_data - REG_PERF_STATE;
> + struct thermal_cooling_device *cdev = policy->cooldev;
>
> + if (cdev)
> + cpufreq_cooling_unregister(cdev);
> kfree(policy->freq_table);
> devm_iounmap(&global_pdev->dev, base);
>
> @@ -238,6 +242,7 @@ static struct cpufreq_driver cpufreq_qcom_hw_driver = {
> .init = qcom_cpufreq_hw_cpu_init,
> .exit = qcom_cpufreq_hw_cpu_exit,
> .fast_switch = qcom_cpufreq_hw_fast_switch,
> + .ready = generic_cpufreq_ready,
> .name = "qcom-cpufreq-hw",
> .attr = qcom_cpufreq_hw_attr,
> };
I liked the idea of reducing code duplication, but not much the
implementation. All we were able to get rid of was a call to
of_cpufreq_cooling_register() and nothing else. Is it worth it ?
Maybe we can add another flag in cpufreq.h:
#define CPUFREQ_AUTO_REGISTER_COOLING_DEV (1 << 7)
and let the core do it all automatically by itself, that will get rid
of code duplication actually.
@Rafael: What do you say ?
--
viresh
On 10-01-19, 05:30, Amit Kucheria wrote:
> All cpufreq drivers do similar things to register as a cooling device.
> Provide a generic call back so we can get rid of duplicated code in the
> drivers.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> Suggested-by: Stephen Boyd <[email protected]>
> ---
> drivers/thermal/cpu_cooling.c | 18 ++++++++++++++++++
> include/linux/cpu_cooling.h | 9 +++++++++
> 2 files changed, 27 insertions(+)
We may be doing this differently, but I found some other issues with
the patch.
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index dfd23245f778..5154ffc12332 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -694,6 +694,7 @@ __cpufreq_cooling_register(struct device_node *np,
>
> cpufreq_cdev->clipped_freq = cpufreq_cdev->freq_table[0].frequency;
> cpufreq_cdev->cdev = cdev;
> + policy->cooldev = cdev;
Why was this required ? The below routine was already doing it...
>
> mutex_lock(&cooling_list_lock);
> /* Register the notifier for first cpufreq cooling device */
> @@ -785,6 +786,23 @@ of_cpufreq_cooling_register(struct cpufreq_policy *policy)
> }
> EXPORT_SYMBOL_GPL(of_cpufreq_cooling_register);
>
> +/**
> + * generic_cpufreq_ready - generic driver callback to register a thermal_cooling_device
> + * @policy: cpufreq policy
> + */
> +void generic_cpufreq_ready(struct cpufreq_policy *policy)
> +{
> + struct thermal_cooling_device **cdev = &policy->cooldev;
> +
> + *cdev = of_cpufreq_cooling_register(policy);
... here
> + if (IS_ERR(*cdev)) {
We never get an error here, only NULL.
> + pr_err("Failed to register cpufreq cooling device for CPU%d: %ld\n",
> + policy->cpu, PTR_ERR(cdev));
The of_cpufreq_cooling_register() routine already prints enough error
messages on failures.
> + cdev = NULL;
This is a meaningless statement, perhaps you wanted to do *cdev = NULL
:)
--
viresh
On 09-01-19, 18:22, Matthias Kaehlcke wrote:
> Hi Amit,
>
> On Thu, Jan 10, 2019 at 05:30:56AM +0530, Amit Kucheria wrote:
> > Since the big and little cpus are in the same frequency domain, use all
> > of them for mitigation in the cooling-map. At the lower trip points we
> > restrict ourselves to throttling only a few OPPs. At higher trip
> > temperatures, allow ourselves to be throttled to any extent.
> >
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > arch/arm64/boot/dts/qcom/sdm845.dtsi | 145 +++++++++++++++++++++++++++
> > 1 file changed, 145 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > index 29e823b0caf4..cd6402a9aa64 100644
> > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > @@ -13,6 +13,7 @@
> > #include <dt-bindings/reset/qcom,sdm845-aoss.h>
> > #include <dt-bindings/soc/qcom,rpmh-rsc.h>
> > #include <dt-bindings/clock/qcom,gcc-sdm845.h>
> > +#include <dt-bindings/thermal/thermal.h>
> >
> > / {
> > interrupt-parent = <&intc>;
> > @@ -99,6 +100,7 @@
> > compatible = "qcom,kryo385";
> > reg = <0x0 0x0>;
> > enable-method = "psci";
> > + #cooling-cells = <2>;
> > next-level-cache = <&L2_0>;
> > L2_0: l2-cache {
> > compatible = "cache";
> > @@ -114,6 +116,7 @@
> > compatible = "qcom,kryo385";
> > reg = <0x0 0x100>;
> > enable-method = "psci";
> > + #cooling-cells = <2>;
>
> This is not needed (also applies to other for other non-policy
> cores). A single cpufreq device is created per frequency domain /
> cluster, hence a single cooling device is registered per cluster,
> which IMO makes sense given that the CPUs of a cluster can't change
> their frequencies independently.
> As per above, there are no cooling devices for CPU1-3 and CPU5-7.
lore.kernel.org/lkml/[email protected]
lore.kernel.org/lkml/b687bb6035fbb010383f4511a206abb4006679fa.1527244201.git.viresh.kumar@linaro.org
--
viresh
On 10-01-19, 05:30, Amit Kucheria wrote:
> subsys_initcall causes problems registering the driver as a thermal
> cooling device.
>
> If "faster boot" is the main reason for doing subsys_initcall, this
> should be handled in the bootloader or another boot constraint
> framework.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> drivers/cpufreq/qcom-cpufreq-hw.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index d83939a1b3d4..649dddd72749 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -296,7 +296,7 @@ static int __init qcom_cpufreq_hw_init(void)
> {
> return platform_driver_register(&qcom_cpufreq_hw_driver);
> }
> -subsys_initcall(qcom_cpufreq_hw_init);
> +device_initcall(qcom_cpufreq_hw_init);
>
> static void __exit qcom_cpufreq_hw_exit(void)
> {
Applied. Thanks.
--
viresh
On Thu, Jan 10, 2019 at 11:42 AM Viresh Kumar <[email protected]> wrote:
>
> On 10-01-19, 05:30, Amit Kucheria wrote:
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > drivers/cpufreq/qcom-cpufreq-hw.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> > index 649dddd72749..1c01311e5927 100644
> > --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> > @@ -5,6 +5,7 @@
> >
> > #include <linux/bitfield.h>
> > #include <linux/cpufreq.h>
> > +#include <linux/cpu_cooling.h>
> > #include <linux/init.h>
> > #include <linux/kernel.h>
> > #include <linux/module.h>
> > @@ -216,7 +217,10 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
> > static int qcom_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy)
> > {
> > void __iomem *base = policy->driver_data - REG_PERF_STATE;
> > + struct thermal_cooling_device *cdev = policy->cooldev;
> >
> > + if (cdev)
> > + cpufreq_cooling_unregister(cdev);
> > kfree(policy->freq_table);
> > devm_iounmap(&global_pdev->dev, base);
> >
> > @@ -238,6 +242,7 @@ static struct cpufreq_driver cpufreq_qcom_hw_driver = {
> > .init = qcom_cpufreq_hw_cpu_init,
> > .exit = qcom_cpufreq_hw_cpu_exit,
> > .fast_switch = qcom_cpufreq_hw_fast_switch,
> > + .ready = generic_cpufreq_ready,
> > .name = "qcom-cpufreq-hw",
> > .attr = qcom_cpufreq_hw_attr,
> > };
>
> I liked the idea of reducing code duplication, but not much the
> implementation. All we were able to get rid of was a call to
> of_cpufreq_cooling_register() and nothing else. Is it worth it ?
>
> Maybe we can add another flag in cpufreq.h:
>
> #define CPUFREQ_AUTO_REGISTER_COOLING_DEV (1 << 7)
>
> and let the core do it all automatically by itself, that will get rid
> of code duplication actually.
I like the idea of a flag. I'll spin something implementing it in the next rev.
> @Rafael: What do you say ?
On Thu, Jan 10, 2019 at 7:12 AM Viresh Kumar <[email protected]> wrote:
>
> On 10-01-19, 05:30, Amit Kucheria wrote:
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > drivers/cpufreq/qcom-cpufreq-hw.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> > index 649dddd72749..1c01311e5927 100644
> > --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> > @@ -5,6 +5,7 @@
> >
> > #include <linux/bitfield.h>
> > #include <linux/cpufreq.h>
> > +#include <linux/cpu_cooling.h>
> > #include <linux/init.h>
> > #include <linux/kernel.h>
> > #include <linux/module.h>
> > @@ -216,7 +217,10 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
> > static int qcom_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy)
> > {
> > void __iomem *base = policy->driver_data - REG_PERF_STATE;
> > + struct thermal_cooling_device *cdev = policy->cooldev;
> >
> > + if (cdev)
> > + cpufreq_cooling_unregister(cdev);
> > kfree(policy->freq_table);
> > devm_iounmap(&global_pdev->dev, base);
> >
> > @@ -238,6 +242,7 @@ static struct cpufreq_driver cpufreq_qcom_hw_driver = {
> > .init = qcom_cpufreq_hw_cpu_init,
> > .exit = qcom_cpufreq_hw_cpu_exit,
> > .fast_switch = qcom_cpufreq_hw_fast_switch,
> > + .ready = generic_cpufreq_ready,
> > .name = "qcom-cpufreq-hw",
> > .attr = qcom_cpufreq_hw_attr,
> > };
>
> I liked the idea of reducing code duplication, but not much the
> implementation. All we were able to get rid of was a call to
> of_cpufreq_cooling_register() and nothing else. Is it worth it ?
>
> Maybe we can add another flag in cpufreq.h:
>
> #define CPUFREQ_AUTO_REGISTER_COOLING_DEV (1 << 7)
>
> and let the core do it all automatically by itself, that will get rid
> of code duplication actually.
>
> @Rafael: What do you say ?
Getting rid of code duplication is good, let's do that.
On Thu, Jan 10, 2019 at 5:58 AM Stephen Boyd <[email protected]> wrote:
>
> Quoting Amit Kucheria (2019-01-09 16:00:56)
> > Since the big and little cpus are in the same frequency domain, use all
>
> Oh? I thought the big and little cpus were in different frequency
> domains and voltage domains. Maybe that's what you're saying here but
> I'm misunderstanding. So change the wording a bit to be more clear?
Yeah, forgive my English - it is my second language. ;-)
That should've been "since all the cpus in the big and little clusters
are in the same frequency domain". Will fix.
> > of them for mitigation in the cooling-map. At the lower trip points we
> > restrict ourselves to throttling only a few OPPs. At higher trip
> > temperatures, allow ourselves to be throttled to any extent.
> >
> > Signed-off-by: Amit Kucheria <[email protected]>
>
On Thu, Jan 10, 2019 at 7:45 AM Matthias Kaehlcke <[email protected]> wrote:
>
> On Wed, Jan 09, 2019 at 05:15:33PM -0800, Matthias Kaehlcke wrote:
> > Hi Amit,
> >
> > On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> > > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > > Qualcomm engineers, increase it to 95 degrees.
> > >
> > > Signed-off-by: Amit Kucheria <[email protected]>
> > > ---
> > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> > > 1 file changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > index c27cbd3bcb0a..29e823b0caf4 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > @@ -1692,7 +1692,7 @@
> > >
> > > trips {
> > > cpu_alert0: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1713,7 +1713,7 @@
> > >
> > > trips {
> > > cpu_alert1: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1734,7 +1734,7 @@
> > >
> > > trips {
> > > cpu_alert2: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1755,7 +1755,7 @@
> > >
> > > trips {
> > > cpu_alert3: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1776,7 +1776,7 @@
> > >
> > > trips {
> > > cpu_alert4: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1797,7 +1797,7 @@
> > >
> > > trips {
> > > cpu_alert5: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1818,7 +1818,7 @@
> > >
> > > trips {
> > > cpu_alert6: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1839,7 +1839,7 @@
> > >
> > > trips {
> > > cpu_alert7: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> >
> > The change itself looks good to me, however I wonder if it would be
> > worth to eliminate redundancy and merge the current 8 thermal zones
> > into 2, one for the Silver and one for the Gold cluster (as done by
> > http://crrev.com/c/1381752). There is a single cooling device for
> > each cluster, so it's not clear to me if there is any gain from having
> > a separate thermal zone for each CPU. If it is important to monitor
> > the temperatures of the individual cores this can still be done by
> > configuring the thermal zone of the cluster with multiple thermal
> > sensors.
>
> I see your idea is to have a cooling device per CPU ("arm64: dts:
> sdm845: wireup the thermal trip points to cpufreq" /
> https://lore.kernel.org/patchwork/patch/1030742/), however that
> doesn't work as intended. Only two cpufreq 'devices' are created,
> one for CPU0 and one for CPU4. In consequence cpufreq->ready() only
> runs for these cores and only two cooling devices are
> registered. Since the cores of a cluster all run at the same
> frequency I also doubt if having multiple cooling devices would
> bring any benefits.
I actually only intended for two cooling devices - one for each
frequency domain. I'll clarify it better in the patch description.
Hi,
On Wed, Jan 9, 2019 at 4:29 PM Stephen Boyd <[email protected]> wrote:
>
> Quoting Amit Kucheria (2019-01-09 16:00:55)
> > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > Qualcomm engineers, increase it to 95 degrees.
> >
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
>
> Is the plan that these are some defaults that would be adjusted by board
> variants? Just curious why we have anything in here and don't punt it
> all to each board dts file.
My preference would be that the SoC device tree file should contain
thermal numbers that are important to pay attention to for the safety
/ proper operation of the SoC. ...then individual boards could (if
they needed to) override with lower values to control, for instance,
skin temperature.
From experience with previous boards, if you've got enough an off-SoC
thermistors then those are the ones you'd want to monitor to control
skin temperature. It's OK if the SoC spikes up quite high as long as
that heat has somewhere to go (like a heat pipe). The sensors that
are part of Amit's patch are on-chip.
...if you've got a board without external thermistors and are using
the SoC's on-chip sensors as a proxy for the heat in the overall
system then you might want to lower your values in the board device
tree file. You won't be able to have as many short term spikes, but
that's what you gotta do without the extra sensors.
Sound sane?
-Doug
On Thu, Jan 10, 2019 at 11:53:59AM +0530, Viresh Kumar wrote:
> On 09-01-19, 18:22, Matthias Kaehlcke wrote:
> > Hi Amit,
> >
> > On Thu, Jan 10, 2019 at 05:30:56AM +0530, Amit Kucheria wrote:
> > > Since the big and little cpus are in the same frequency domain, use all
> > > of them for mitigation in the cooling-map. At the lower trip points we
> > > restrict ourselves to throttling only a few OPPs. At higher trip
> > > temperatures, allow ourselves to be throttled to any extent.
> > >
> > > Signed-off-by: Amit Kucheria <[email protected]>
> > > ---
> > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 145 +++++++++++++++++++++++++++
> > > 1 file changed, 145 insertions(+)
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > index 29e823b0caf4..cd6402a9aa64 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > @@ -13,6 +13,7 @@
> > > #include <dt-bindings/reset/qcom,sdm845-aoss.h>
> > > #include <dt-bindings/soc/qcom,rpmh-rsc.h>
> > > #include <dt-bindings/clock/qcom,gcc-sdm845.h>
> > > +#include <dt-bindings/thermal/thermal.h>
> > >
> > > / {
> > > interrupt-parent = <&intc>;
> > > @@ -99,6 +100,7 @@
> > > compatible = "qcom,kryo385";
> > > reg = <0x0 0x0>;
> > > enable-method = "psci";
> > > + #cooling-cells = <2>;
> > > next-level-cache = <&L2_0>;
> > > L2_0: l2-cache {
> > > compatible = "cache";
> > > @@ -114,6 +116,7 @@
> > > compatible = "qcom,kryo385";
> > > reg = <0x0 0x100>;
> > > enable-method = "psci";
> > > + #cooling-cells = <2>;
> >
> > This is not needed (also applies to other for other non-policy
> > cores). A single cpufreq device is created per frequency domain /
> > cluster, hence a single cooling device is registered per cluster,
> > which IMO makes sense given that the CPUs of a cluster can't change
> > their frequencies independently.
>
> > As per above, there are no cooling devices for CPU1-3 and CPU5-7.
>
> lore.kernel.org/lkml/[email protected]
> lore.kernel.org/lkml/b687bb6035fbb010383f4511a206abb4006679fa.1527244201.git.viresh.kumar@linaro.org
Thanks for the pointer, there's always something new to learn!
Ok, so the policy CPU and hence the CPU registered as cooling
device may vary. I understand that this requires to list all possible
cooling devices, even though only one will be active at any given
time. However I wonder if we could change this:
From 103703a46495ff210a521b5b6fbf32632053c64f Mon Sep 17 00:00:00 2001
From: Matthias Kaehlcke <[email protected]>
Date: Thu, 10 Jan 2019 09:48:38 -0800
Subject: [PATCH] thermal: cpu_cooling: always use first CPU of a freq domain
as cooling device
For all CPUs of a frequency domain a single cooling device is
registered, since the CPUs can't switch their frequencies
independently from each other. The cpufreq policy CPU is used to
represent cooling device of the frequency domain. Which CPU is the
policy CPU may vary based on the order of initialization or CPU
hotplug.
For device tree based platform the above implies that cooling maps
must include a list of all possible cooling devices of a frequency
domain, even though only one of them will exist at any given time.
For example:
cooling-maps {
map0 {
trip = <&cpu_alert0>;
cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
<&CPU1 THERMAL_NO_LIMIT 4>,
<&CPU2 THERMAL_NO_LIMIT 4>,
<&CPU3 THERMAL_NO_LIMIT 4>;
};
map1 {
trip = <&cpu_crit0>;
cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
<&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
<&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
<&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
};
};
This can be avoided by using always the first CPU of a frequency
domain as cooling device. It may happen that the first CPU is offline
when the cooling device is registered (e.g. CPU2 is initialized
first in the above example), hence the nominal cooling device might
be offline. This may seem odd, however it is not really different from
the current behavior: when the policy CPU is taking offline the cooling
device corresponding to it remains active, unless it is unregistered
because all other CPUs of the frequency domain are offline too.
A single cooling device associated with a specific CPU of the frequency
domain reduces redundant device tree clutter in CPU nodes and cooling
maps.
Signed-off-by: Matthias Kaehlcke <[email protected]>
---
drivers/thermal/cpu_cooling.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index dfd23245f778a..bb5ea06f893a2 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -758,13 +758,14 @@ EXPORT_SYMBOL_GPL(cpufreq_cooling_register);
struct thermal_cooling_device *
of_cpufreq_cooling_register(struct cpufreq_policy *policy)
{
- struct device_node *np = of_get_cpu_node(policy->cpu, NULL);
+ unsigned int first_cpu = cpumask_first(policy->related_cpus);
+ struct device_node *np = of_get_cpu_node(first_cpu, NULL);
struct thermal_cooling_device *cdev = NULL;
u32 capacitance = 0;
if (!np) {
pr_err("cpu_cooling: OF node not available for cpu%d\n",
- policy->cpu);
+ first_cpu);
return NULL;
}
@@ -775,7 +776,7 @@ of_cpufreq_cooling_register(struct cpufreq_policy *policy)
cdev = __cpufreq_cooling_register(np, policy, capacitance);
if (IS_ERR(cdev)) {
pr_err("cpu_cooling: cpu%d is not running as cooling device: %ld\n",
- policy->cpu, PTR_ERR(cdev));
+ first_cpu, PTR_ERR(cdev));
cdev = NULL;
}
}
Would that make sense or is there something I'm overlooking?
Cheers
Matthias
On Thu, Jan 10, 2019 at 5:59 AM Stephen Boyd <[email protected]> wrote:
>
> Quoting Amit Kucheria (2019-01-09 16:00:55)
> > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > Qualcomm engineers, increase it to 95 degrees.
> >
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
>
> Is the plan that these are some defaults that would be adjusted by board
> variants? Just curious why we have anything in here and don't punt it
> all to each board dts file.
These values are meant to be safe values for silicon operation as
characterised[1] by the HW team. So I'd consider them a more than just
default values. It also gives future boards a safe starting point.
IMHO it would be better to have the characterized values merged
upstream and if really required, those values can be tweaked in the
board-specific DTS.
[1] Caveat: The characterisation probably focused on mobile devices
and might change depending on form-factor, active cooling and heat
dissipation design but those differences should only impact the skin
temperature, not the operation of the SoC itself.
On Fri, Jan 11, 2019 at 01:15:09AM +0530, Amit Kucheria wrote:
> On Thu, Jan 10, 2019 at 7:45 AM Matthias Kaehlcke <[email protected]> wrote:
> >
> > On Wed, Jan 09, 2019 at 05:15:33PM -0800, Matthias Kaehlcke wrote:
> > > Hi Amit,
> > >
> > > On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> > > > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > > > Qualcomm engineers, increase it to 95 degrees.
> > > >
> > > > Signed-off-by: Amit Kucheria <[email protected]>
> > > > ---
> > > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> > > > 1 file changed, 8 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > > index c27cbd3bcb0a..29e823b0caf4 100644
> > > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > > @@ -1692,7 +1692,7 @@
> > > >
> > > > trips {
> > > > cpu_alert0: trip0 {
> > > > - temperature = <75000>;
> > > > + temperature = <95000>;
> > > > hysteresis = <2000>;
> > > > type = "passive";
> > > > };
> > > > @@ -1713,7 +1713,7 @@
> > > >
> > > > trips {
> > > > cpu_alert1: trip0 {
> > > > - temperature = <75000>;
> > > > + temperature = <95000>;
> > > > hysteresis = <2000>;
> > > > type = "passive";
> > > > };
> > > > @@ -1734,7 +1734,7 @@
> > > >
> > > > trips {
> > > > cpu_alert2: trip0 {
> > > > - temperature = <75000>;
> > > > + temperature = <95000>;
> > > > hysteresis = <2000>;
> > > > type = "passive";
> > > > };
> > > > @@ -1755,7 +1755,7 @@
> > > >
> > > > trips {
> > > > cpu_alert3: trip0 {
> > > > - temperature = <75000>;
> > > > + temperature = <95000>;
> > > > hysteresis = <2000>;
> > > > type = "passive";
> > > > };
> > > > @@ -1776,7 +1776,7 @@
> > > >
> > > > trips {
> > > > cpu_alert4: trip0 {
> > > > - temperature = <75000>;
> > > > + temperature = <95000>;
> > > > hysteresis = <2000>;
> > > > type = "passive";
> > > > };
> > > > @@ -1797,7 +1797,7 @@
> > > >
> > > > trips {
> > > > cpu_alert5: trip0 {
> > > > - temperature = <75000>;
> > > > + temperature = <95000>;
> > > > hysteresis = <2000>;
> > > > type = "passive";
> > > > };
> > > > @@ -1818,7 +1818,7 @@
> > > >
> > > > trips {
> > > > cpu_alert6: trip0 {
> > > > - temperature = <75000>;
> > > > + temperature = <95000>;
> > > > hysteresis = <2000>;
> > > > type = "passive";
> > > > };
> > > > @@ -1839,7 +1839,7 @@
> > > >
> > > > trips {
> > > > cpu_alert7: trip0 {
> > > > - temperature = <75000>;
> > > > + temperature = <95000>;
> > > > hysteresis = <2000>;
> > > > type = "passive";
> > > > };
> > >
> > > The change itself looks good to me, however I wonder if it would be
> > > worth to eliminate redundancy and merge the current 8 thermal zones
> > > into 2, one for the Silver and one for the Gold cluster (as done by
> > > http://crrev.com/c/1381752). There is a single cooling device for
> > > each cluster, so it's not clear to me if there is any gain from having
> > > a separate thermal zone for each CPU. If it is important to monitor
> > > the temperatures of the individual cores this can still be done by
> > > configuring the thermal zone of the cluster with multiple thermal
> > > sensors.
> >
> > I see your idea is to have a cooling device per CPU ("arm64: dts:
> > sdm845: wireup the thermal trip points to cpufreq" /
> > https://lore.kernel.org/patchwork/patch/1030742/), however that
> > doesn't work as intended. Only two cpufreq 'devices' are created,
> > one for CPU0 and one for CPU4. In consequence cpufreq->ready() only
> > runs for these cores and only two cooling devices are
> > registered. Since the cores of a cluster all run at the same
> > frequency I also doubt if having multiple cooling devices would
> > bring any benefits.
>
> I actually only intended for two cooling devices - one for each
> frequency domain. I'll clarify it better in the patch description.
Viresh helped me understand that we currently need to add cooling
device entries for all CPUs to the DT, even though at most one will be
active per freq domain at any time (I wonder if this could be changed
though).
Independently of the cooling devices, is there any value in having a
thermal zone for each CPU instead of having only one per freq domain?
Thanks
Matthias
On Thu, Jan 10, 2019 at 05:30:56AM +0530, Amit Kucheria wrote:
> Since the big and little cpus are in the same frequency domain, use all
> of them for mitigation in the cooling-map. At the lower trip points we
> restrict ourselves to throttling only a few OPPs. At higher trip
> temperatures, allow ourselves to be throttled to any extent.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> arch/arm64/boot/dts/qcom/sdm845.dtsi | 145 +++++++++++++++++++++++++++
> 1 file changed, 145 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index 29e823b0caf4..cd6402a9aa64 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -13,6 +13,7 @@
> #include <dt-bindings/reset/qcom,sdm845-aoss.h>
> #include <dt-bindings/soc/qcom,rpmh-rsc.h>
> #include <dt-bindings/clock/qcom,gcc-sdm845.h>
> +#include <dt-bindings/thermal/thermal.h>
>
> / {
> interrupt-parent = <&intc>;
> @@ -99,6 +100,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x0>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_0>;
> L2_0: l2-cache {
> compatible = "cache";
> @@ -114,6 +116,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x100>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_100>;
> L2_100: l2-cache {
> compatible = "cache";
> @@ -126,6 +129,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x200>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_200>;
> L2_200: l2-cache {
> compatible = "cache";
> @@ -138,6 +142,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x300>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_300>;
> L2_300: l2-cache {
> compatible = "cache";
> @@ -150,6 +155,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x400>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_400>;
> L2_400: l2-cache {
> compatible = "cache";
> @@ -162,6 +168,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x500>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_500>;
> L2_500: l2-cache {
> compatible = "cache";
> @@ -174,6 +181,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x600>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_600>;
> L2_600: l2-cache {
> compatible = "cache";
> @@ -186,6 +194,7 @@
> compatible = "qcom,kryo385";
> reg = <0x0 0x700>;
> enable-method = "psci";
> + #cooling-cells = <2>;
> next-level-cache = <&L2_700>;
> L2_700: l2-cache {
> compatible = "cache";
> @@ -1703,6 +1712,23 @@
> type = "critical";
> };
> };
> +
> + cooling-maps {
> + map0 {
> + trip = <&cpu_alert0>;
> + cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> + <&CPU1 THERMAL_NO_LIMIT 4>,
> + <&CPU2 THERMAL_NO_LIMIT 4>,
> + <&CPU3 THERMAL_NO_LIMIT 4>;
> + };
> + map1 {
> + trip = <&cpu_crit0>;
> + cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
Slightly off-topic, buy maybe not so much since we are just starting
to use the trip points:
Currently we use the naming scheme 'cpu_<type>N' for trip points. I
anticipate that we're going to add more passive trip points soon, to
keep the 'power_allocator' thermal governor happy, which expects a
'switch_on' and a 'desired_temperature' trip point. With the current
naming scheme this could become a bit messy. I suggest to change it to
'cpuN_<type>[X]', which would allow for something like 'cpuN_alert0'
and 'cpuN_alert1'.
If you think the change makes sense you can consider to do it within
this series, I can also send a separate patch once it has landed.
You could also consider to add the additional trip point in this
series if you agree that it will be needed.
This is not necessarily a call for action, just thinking loudly about
a closely related topic ;-)
Cheers
Matthias
On 10-01-19, 12:00, Matthias Kaehlcke wrote:
> Viresh helped me understand that we currently need to add cooling
> device entries for all CPUs to the DT, even though at most one will be
> active per freq domain at any time (I wonder if this could be changed
> though).
Actually we were only adding cooling-cells in CPU0 until now and I
fixed that, so that is going to stay :)
The idea is that the hardware should be described properly and not
partially. Even if all the CPUs are part of the same freq-domain, they
are all capable of being a cooling device here and the DT should
describe that. Kernel will ofcourse create a single cooling device.
--
viresh
On 10-01-19, 10:42, Matthias Kaehlcke wrote:
> Thanks for the pointer, there's always something new to learn!
>
> Ok, so the policy CPU and hence the CPU registered as cooling
> device may vary. I understand that this requires to list all possible
> cooling devices,
I won't say that I changed DT because of a design issue with kernel,
rather the DT shall be complete by itself and that's why that change
was made.
And then we can have more things going on. For example with cpuidle
cooling, we can individually control each CPU (and force idle on that)
even if all CPUs are part of the same freq-domain. Each CPU shall
expose its capabilities.
> even though only one will be active at any given
> time. However I wonder if we could change this:
I won't say it that way. I see it as all the CPUs are active during a
cooling state, i.e. they are all participating.
> >From 103703a46495ff210a521b5b6fbf32632053c64f Mon Sep 17 00:00:00 2001
> From: Matthias Kaehlcke <[email protected]>
> Date: Thu, 10 Jan 2019 09:48:38 -0800
> Subject: [PATCH] thermal: cpu_cooling: always use first CPU of a freq domain
> as cooling device
>
> For all CPUs of a frequency domain a single cooling device is
> registered, since the CPUs can't switch their frequencies
> independently from each other. The cpufreq policy CPU is used to
> represent cooling device of the frequency domain. Which CPU is the
> policy CPU may vary based on the order of initialization or CPU
> hotplug.
>
> For device tree based platform the above implies that cooling maps
> must include a list of all possible cooling devices of a frequency
> domain, even though only one of them will exist at any given time.
>
> For example:
>
> cooling-maps {
> map0 {
> trip = <&cpu_alert0>;
> cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> <&CPU1 THERMAL_NO_LIMIT 4>,
> <&CPU2 THERMAL_NO_LIMIT 4>,
> <&CPU3 THERMAL_NO_LIMIT 4>;
> };
> map1 {
> trip = <&cpu_crit0>;
> cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
This is the right thing to do hardware description wise, no matter
what the kernel does.
> };
> };
>
> This can be avoided by using always the first CPU of a frequency
> domain as cooling device. It may happen that the first CPU is offline
> when the cooling device is registered (e.g. CPU2 is initialized
> first in the above example), hence the nominal cooling device might
> be offline. This may seem odd, however it is not really different from
> the current behavior: when the policy CPU is taking offline the cooling
> device corresponding to it remains active, unless it is unregistered
> because all other CPUs of the frequency domain are offline too.
>
> A single cooling device associated with a specific CPU of the frequency
> domain reduces redundant device tree clutter in CPU nodes and cooling
> maps.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> ---
> drivers/thermal/cpu_cooling.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index dfd23245f778a..bb5ea06f893a2 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -758,13 +758,14 @@ EXPORT_SYMBOL_GPL(cpufreq_cooling_register);
> struct thermal_cooling_device *
> of_cpufreq_cooling_register(struct cpufreq_policy *policy)
> {
> - struct device_node *np = of_get_cpu_node(policy->cpu, NULL);
> + unsigned int first_cpu = cpumask_first(policy->related_cpus);
> + struct device_node *np = of_get_cpu_node(first_cpu, NULL);
> struct thermal_cooling_device *cdev = NULL;
> u32 capacitance = 0;
>
> if (!np) {
> pr_err("cpu_cooling: OF node not available for cpu%d\n",
> - policy->cpu);
> + first_cpu);
> return NULL;
> }
>
> @@ -775,7 +776,7 @@ of_cpufreq_cooling_register(struct cpufreq_policy *policy)
> cdev = __cpufreq_cooling_register(np, policy, capacitance);
> if (IS_ERR(cdev)) {
> pr_err("cpu_cooling: cpu%d is not running as cooling device: %ld\n",
> - policy->cpu, PTR_ERR(cdev));
> + first_cpu, PTR_ERR(cdev));
> cdev = NULL;
> }
> }
>
>
> Would that make sense or is there something I'm overlooking?
I don't see any benefits of this to be honest. Even if we make this
change, the DT should remain in its current form.
--
viresh
On Thu, Jan 10, 2019 at 6:45 AM Matthias Kaehlcke <[email protected]> wrote:
>
> Hi Amit,
>
> On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > Qualcomm engineers, increase it to 95 degrees.
> >
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> > 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > index c27cbd3bcb0a..29e823b0caf4 100644
> > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > @@ -1692,7 +1692,7 @@
> >
> > trips {
> > cpu_alert0: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1713,7 +1713,7 @@
> >
> > trips {
> > cpu_alert1: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1734,7 +1734,7 @@
> >
> > trips {
> > cpu_alert2: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1755,7 +1755,7 @@
> >
> > trips {
> > cpu_alert3: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1776,7 +1776,7 @@
> >
> > trips {
> > cpu_alert4: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1797,7 +1797,7 @@
> >
> > trips {
> > cpu_alert5: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1818,7 +1818,7 @@
> >
> > trips {
> > cpu_alert6: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> > @@ -1839,7 +1839,7 @@
> >
> > trips {
> > cpu_alert7: trip0 {
> > - temperature = <75000>;
> > + temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
>
> The change itself looks good to me, however I wonder if it would be
> worth to eliminate redundancy and merge the current 8 thermal zones
> into 2, one for the Silver and one for the Gold cluster (as done by
> http://crrev.com/c/1381752). There is a single cooling device for
> each cluster, so it's not clear to me if there is any gain from having
> a separate thermal zone for each CPU. If it is important to monitor
> the temperatures of the individual cores this can still be done by
> configuring the thermal zone of the cluster with multiple thermal
> sensors.
Reducing the number of thermal zones to 2 (by grouping 4 sensors per
zone) is not possible due a limitation of the thermal framework[1]. It
is something that we want to address. Previous attempts to fix this
were rejected for various reasons. Eduardo was going to share a way to
have more flexible mapping between sensors and zones after discussions
at LPC.
<nag> Eduardo, do you have anything we can review? </nag> :-)
Having said that, we'll need some aggregation functions when we add
multiple sensors to a zone (e.g. max, mean) to reflect the zone. This
will lose information about hotspots and prevent things like idle
injection on a particular CPU that is causing most of the heat in the
aggregated zone. So IMHO, it might be useful to have information about
the hotspots (i.e TZ per sensor) and aggregated values (ambient
temperature) that can be fed to the thermal policy.
[1] https://elixir.bootlin.com/linux/v5.0-rc1/source/drivers/thermal/of-thermal.c#L502
On Fri, Jan 11, 2019 at 6:00 AM Matthias Kaehlcke <[email protected]> wrote:
>
> On Thu, Jan 10, 2019 at 05:30:56AM +0530, Amit Kucheria wrote:
> > Since the big and little cpus are in the same frequency domain, use all
> > of them for mitigation in the cooling-map. At the lower trip points we
> > restrict ourselves to throttling only a few OPPs. At higher trip
> > temperatures, allow ourselves to be throttled to any extent.
> >
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > arch/arm64/boot/dts/qcom/sdm845.dtsi | 145 +++++++++++++++++++++++++++
> > 1 file changed, 145 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > index 29e823b0caf4..cd6402a9aa64 100644
> > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > @@ -13,6 +13,7 @@
> > #include <dt-bindings/reset/qcom,sdm845-aoss.h>
> > #include <dt-bindings/soc/qcom,rpmh-rsc.h>
> > #include <dt-bindings/clock/qcom,gcc-sdm845.h>
> > +#include <dt-bindings/thermal/thermal.h>
> >
> > / {
> > interrupt-parent = <&intc>;
> > @@ -99,6 +100,7 @@
> > compatible = "qcom,kryo385";
> > reg = <0x0 0x0>;
> > enable-method = "psci";
> > + #cooling-cells = <2>;
> > next-level-cache = <&L2_0>;
> > L2_0: l2-cache {
> > compatible = "cache";
> > @@ -114,6 +116,7 @@
> > compatible = "qcom,kryo385";
> > reg = <0x0 0x100>;
> > enable-method = "psci";
> > + #cooling-cells = <2>;
> > next-level-cache = <&L2_100>;
> > L2_100: l2-cache {
> > compatible = "cache";
> > @@ -126,6 +129,7 @@
> > compatible = "qcom,kryo385";
> > reg = <0x0 0x200>;
> > enable-method = "psci";
> > + #cooling-cells = <2>;
> > next-level-cache = <&L2_200>;
> > L2_200: l2-cache {
> > compatible = "cache";
> > @@ -138,6 +142,7 @@
> > compatible = "qcom,kryo385";
> > reg = <0x0 0x300>;
> > enable-method = "psci";
> > + #cooling-cells = <2>;
> > next-level-cache = <&L2_300>;
> > L2_300: l2-cache {
> > compatible = "cache";
> > @@ -150,6 +155,7 @@
> > compatible = "qcom,kryo385";
> > reg = <0x0 0x400>;
> > enable-method = "psci";
> > + #cooling-cells = <2>;
> > next-level-cache = <&L2_400>;
> > L2_400: l2-cache {
> > compatible = "cache";
> > @@ -162,6 +168,7 @@
> > compatible = "qcom,kryo385";
> > reg = <0x0 0x500>;
> > enable-method = "psci";
> > + #cooling-cells = <2>;
> > next-level-cache = <&L2_500>;
> > L2_500: l2-cache {
> > compatible = "cache";
> > @@ -174,6 +181,7 @@
> > compatible = "qcom,kryo385";
> > reg = <0x0 0x600>;
> > enable-method = "psci";
> > + #cooling-cells = <2>;
> > next-level-cache = <&L2_600>;
> > L2_600: l2-cache {
> > compatible = "cache";
> > @@ -186,6 +194,7 @@
> > compatible = "qcom,kryo385";
> > reg = <0x0 0x700>;
> > enable-method = "psci";
> > + #cooling-cells = <2>;
> > next-level-cache = <&L2_700>;
> > L2_700: l2-cache {
> > compatible = "cache";
> > @@ -1703,6 +1712,23 @@
> > type = "critical";
> > };
> > };
> > +
> > + cooling-maps {
> > + map0 {
> > + trip = <&cpu_alert0>;
> > + cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> > + <&CPU1 THERMAL_NO_LIMIT 4>,
> > + <&CPU2 THERMAL_NO_LIMIT 4>,
> > + <&CPU3 THERMAL_NO_LIMIT 4>;
> > + };
> > + map1 {
> > + trip = <&cpu_crit0>;
> > + cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > + };
> > + };
>
> Slightly off-topic, buy maybe not so much since we are just starting
> to use the trip points:
>
> Currently we use the naming scheme 'cpu_<type>N' for trip points. I
> anticipate that we're going to add more passive trip points soon, to
> keep the 'power_allocator' thermal governor happy, which expects a
> 'switch_on' and a 'desired_temperature' trip point. With the current
> naming scheme this could become a bit messy. I suggest to change it to
> 'cpuN_<type>[X]', which would allow for something like 'cpuN_alert0'
> and 'cpuN_alert1'.
>
> If you think the change makes sense you can consider to do it within
> this series, I can also send a separate patch once it has landed.
Sure, I can change them to cpuN_alertX format.
> You could also consider to add the additional trip point in this
> series if you agree that it will be needed.
I expect that we'll end up with at least 2 passive trip points but I
don't know what temperature to set the next one at. So let's just go
with 1 passive and 1 critical trip point in this series and you can
send a patch adding more once we've characterised IPA.
> This is not necessarily a call for action, just thinking loudly about
> a closely related topic ;-)
Thanks for the reviews.
Regards,
Amit
On Fri, Jan 11, 2019 at 03:54:23PM +0530, Amit Kucheria wrote:
> On Thu, Jan 10, 2019 at 6:45 AM Matthias Kaehlcke <[email protected]> wrote:
> >
> > Hi Amit,
> >
> > On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:
> > > 75 degrees is too aggressive for throttling the CPU. After speaking to
> > > Qualcomm engineers, increase it to 95 degrees.
> > >
> > > Signed-off-by: Amit Kucheria <[email protected]>
> > > ---
> > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
> > > 1 file changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > index c27cbd3bcb0a..29e823b0caf4 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > @@ -1692,7 +1692,7 @@
> > >
> > > trips {
> > > cpu_alert0: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1713,7 +1713,7 @@
> > >
> > > trips {
> > > cpu_alert1: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1734,7 +1734,7 @@
> > >
> > > trips {
> > > cpu_alert2: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1755,7 +1755,7 @@
> > >
> > > trips {
> > > cpu_alert3: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1776,7 +1776,7 @@
> > >
> > > trips {
> > > cpu_alert4: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1797,7 +1797,7 @@
> > >
> > > trips {
> > > cpu_alert5: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1818,7 +1818,7 @@
> > >
> > > trips {
> > > cpu_alert6: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > > @@ -1839,7 +1839,7 @@
> > >
> > > trips {
> > > cpu_alert7: trip0 {
> > > - temperature = <75000>;
> > > + temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> >
> > The change itself looks good to me, however I wonder if it would be
> > worth to eliminate redundancy and merge the current 8 thermal zones
> > into 2, one for the Silver and one for the Gold cluster (as done by
> > http://crrev.com/c/1381752). There is a single cooling device for
> > each cluster, so it's not clear to me if there is any gain from having
> > a separate thermal zone for each CPU. If it is important to monitor
> > the temperatures of the individual cores this can still be done by
> > configuring the thermal zone of the cluster with multiple thermal
> > sensors.
>
> Reducing the number of thermal zones to 2 (by grouping 4 sensors per
> zone) is not possible due a limitation of the thermal framework[1]. It
> is something that we want to address. Previous attempts to fix this
> were rejected for various reasons. Eduardo was going to share a way to
> have more flexible mapping between sensors and zones after discussions
> at LPC.
I wasn't aware of this limitation, thanks for the clarification! With
this I understand that for now we indeed need the 8 thermal zones with
all the redundant information :(
> <nag> Eduardo, do you have anything we can review? </nag> :-)
>
> Having said that, we'll need some aggregation functions when we add
> multiple sensors to a zone (e.g. max, mean) to reflect the zone. This
> will lose information about hotspots and prevent things like idle
> injection on a particular CPU that is causing most of the heat in the
> aggregated zone. So IMHO, it might be useful to have information about
> the hotspots (i.e TZ per sensor) and aggregated values (ambient
> temperature) that can be fed to the thermal policy.
Ok, it seems for now we need the 8 thermal zones in any case, when
support for multiple sensors becomes available we can evaluate whether
it's worth to change that or not.
Cheers
Matthias
On Fri, Jan 11, 2019 at 09:16:53AM +0530, Viresh Kumar wrote:
> On 10-01-19, 10:42, Matthias Kaehlcke wrote:
> > Thanks for the pointer, there's always something new to learn!
> >
> > Ok, so the policy CPU and hence the CPU registered as cooling
> > device may vary. I understand that this requires to list all possible
> > cooling devices,
>
> I won't say that I changed DT because of a design issue with kernel,
> rather the DT shall be complete by itself and that's why that change
> was made.
fair enough
> And then we can have more things going on. For example with cpuidle
> cooling, we can individually control each CPU (and force idle on that)
> even if all CPUs are part of the same freq-domain. Each CPU shall
> expose its capabilities.
Just to gain a better understanding: is cpuidle cooling already
available for arm64 (or is there a patch set)? I came across the
relatively new idle injecting framework but it seems currently the
only user is the Intel powerclamp driver.
> > even though only one will be active at any given
> > time. However I wonder if we could change this:
>
> I won't say it that way. I see it as all the CPUs are active during a
> cooling state, i.e. they are all participating.
agreed, I was referring to the CPU cooling device, which (without
cpuidle injection) could be considered a single device per freq domain.
> > For device tree based platform the above implies that cooling maps
> > must include a list of all possible cooling devices of a frequency
> > domain, even though only one of them will exist at any given time.
> >
> > For example:
> >
> > cooling-maps {
> > map0 {
> > trip = <&cpu_alert0>;
> > cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> > <&CPU1 THERMAL_NO_LIMIT 4>,
> > <&CPU2 THERMAL_NO_LIMIT 4>,
> > <&CPU3 THERMAL_NO_LIMIT 4>;
> > };
> > map1 {
> > trip = <&cpu_crit0>;
> > cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>
> This is the right thing to do hardware description wise, no matter
> what the kernel does.
Not sure I would call it a hardware description. I'd say we pretend
the thermal configuration is a hardware description so the DT folks
don't yell at us ;-) IMO a CPU cooling device is an abstraction, I
think there is no such IP block on most systems.
It seems with cpuidle injection CPUs can perform cooling actions
individually, with that I agree that representing them as individual
cooling devices in the DT makes sense. Without that a cooling device
per freq domain would seem a resonable abstraction.
One of the reasons I dislike the above list of cooling devices is that
it is repeated for different thermal-zone/cooling-maps, but I guess
we have to live with that, would be nice if the DT would allow to do
something like this:
thermal-zones {
cooling_maps_fd0 : cooling-maps {
map0 {
trip = <&cpu_alert0>;
cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
<&CPU1 THERMAL_NO_LIMIT 4>,
<&CPU2 THERMAL_NO_LIMIT 4>,
<&CPU3 THERMAL_NO_LIMIT 4>;
};
map1 {
trip = <&cpu_crit0>;
cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
<&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
<&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
<&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
};
cpu0-thermal {
...
cooling-maps = @cooling_maps_fd0;
...
};
cpu1-thermal {
...
cooling-maps = @cooling_maps_fd0;
...
};
...
};
Cheers
Matthias
On Fri, Jan 11, 2019 at 04:47:15PM +0530, Amit Kucheria wrote:
> On Fri, Jan 11, 2019 at 6:00 AM Matthias Kaehlcke <[email protected]> wrote:
> >
> > On Thu, Jan 10, 2019 at 05:30:56AM +0530, Amit Kucheria wrote:
> > > Since the big and little cpus are in the same frequency domain, use all
> > > of them for mitigation in the cooling-map. At the lower trip points we
> > > restrict ourselves to throttling only a few OPPs. At higher trip
> > > temperatures, allow ourselves to be throttled to any extent.
> > >
> > > Signed-off-by: Amit Kucheria <[email protected]>
> > > ---
> > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 145 +++++++++++++++++++++++++++
> > > 1 file changed, 145 insertions(+)
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > index 29e823b0caf4..cd6402a9aa64 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > @@ -13,6 +13,7 @@
> > > #include <dt-bindings/reset/qcom,sdm845-aoss.h>
> > > #include <dt-bindings/soc/qcom,rpmh-rsc.h>
> > > #include <dt-bindings/clock/qcom,gcc-sdm845.h>
> > > +#include <dt-bindings/thermal/thermal.h>
> > >
> > > / {
> > > interrupt-parent = <&intc>;
> > > @@ -99,6 +100,7 @@
> > > compatible = "qcom,kryo385";
> > > reg = <0x0 0x0>;
> > > enable-method = "psci";
> > > + #cooling-cells = <2>;
> > > next-level-cache = <&L2_0>;
> > > L2_0: l2-cache {
> > > compatible = "cache";
> > > @@ -114,6 +116,7 @@
> > > compatible = "qcom,kryo385";
> > > reg = <0x0 0x100>;
> > > enable-method = "psci";
> > > + #cooling-cells = <2>;
> > > next-level-cache = <&L2_100>;
> > > L2_100: l2-cache {
> > > compatible = "cache";
> > > @@ -126,6 +129,7 @@
> > > compatible = "qcom,kryo385";
> > > reg = <0x0 0x200>;
> > > enable-method = "psci";
> > > + #cooling-cells = <2>;
> > > next-level-cache = <&L2_200>;
> > > L2_200: l2-cache {
> > > compatible = "cache";
> > > @@ -138,6 +142,7 @@
> > > compatible = "qcom,kryo385";
> > > reg = <0x0 0x300>;
> > > enable-method = "psci";
> > > + #cooling-cells = <2>;
> > > next-level-cache = <&L2_300>;
> > > L2_300: l2-cache {
> > > compatible = "cache";
> > > @@ -150,6 +155,7 @@
> > > compatible = "qcom,kryo385";
> > > reg = <0x0 0x400>;
> > > enable-method = "psci";
> > > + #cooling-cells = <2>;
> > > next-level-cache = <&L2_400>;
> > > L2_400: l2-cache {
> > > compatible = "cache";
> > > @@ -162,6 +168,7 @@
> > > compatible = "qcom,kryo385";
> > > reg = <0x0 0x500>;
> > > enable-method = "psci";
> > > + #cooling-cells = <2>;
> > > next-level-cache = <&L2_500>;
> > > L2_500: l2-cache {
> > > compatible = "cache";
> > > @@ -174,6 +181,7 @@
> > > compatible = "qcom,kryo385";
> > > reg = <0x0 0x600>;
> > > enable-method = "psci";
> > > + #cooling-cells = <2>;
> > > next-level-cache = <&L2_600>;
> > > L2_600: l2-cache {
> > > compatible = "cache";
> > > @@ -186,6 +194,7 @@
> > > compatible = "qcom,kryo385";
> > > reg = <0x0 0x700>;
> > > enable-method = "psci";
> > > + #cooling-cells = <2>;
> > > next-level-cache = <&L2_700>;
> > > L2_700: l2-cache {
> > > compatible = "cache";
> > > @@ -1703,6 +1712,23 @@
> > > type = "critical";
> > > };
> > > };
> > > +
> > > + cooling-maps {
> > > + map0 {
> > > + trip = <&cpu_alert0>;
> > > + cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> > > + <&CPU1 THERMAL_NO_LIMIT 4>,
> > > + <&CPU2 THERMAL_NO_LIMIT 4>,
> > > + <&CPU3 THERMAL_NO_LIMIT 4>;
> > > + };
> > > + map1 {
> > > + trip = <&cpu_crit0>;
> > > + cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > > + <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > > + <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > > + <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > > + };
> > > + };
> >
> > Slightly off-topic, buy maybe not so much since we are just starting
> > to use the trip points:
> >
> > Currently we use the naming scheme 'cpu_<type>N' for trip points. I
> > anticipate that we're going to add more passive trip points soon, to
> > keep the 'power_allocator' thermal governor happy, which expects a
> > 'switch_on' and a 'desired_temperature' trip point. With the current
> > naming scheme this could become a bit messy. I suggest to change it to
> > 'cpuN_<type>[X]', which would allow for something like 'cpuN_alert0'
> > and 'cpuN_alert1'.
> >
> > If you think the change makes sense you can consider to do it within
> > this series, I can also send a separate patch once it has landed.
>
> Sure, I can change them to cpuN_alertX format.
Great, thanks!
Another concern about adding trip points later could be the node
name. We currently have:
trips {
cpu0_alert0: trip0 {
...
};
cpu0_crit: trip1 {
...
};
};
If we keep increasing enumeration with the node name this would become:
trips {
cpu0_alert0: trip0 {
...
};
cpu0_alert1: trip1 {
...
};
cpu0_crit: trip2 {
...
};
};
i.e. the node name of the critical trip-point changes, which might be
a concern for dtsi's that override a value, though they should
probably use the phandle &cpu0_crit anyway. If this is a concern we
could change the node names to 'alert0' and 'crit'.
I looked around a bit and actually I kinda like the naming scheme used
by hisilicon/hi6220.dtsi, mediatek/mt8173.dtsi and rockchip/rk3328.dtsi
(with minor variations):
trips {
threshold: trip-point@0 {
temperature = <68000>;
hysteresis = <2000>;
type = "passive";
};
target: trip-point@1 {
temperature = <85000>;
hysteresis = <2000>;
type = "passive";
};
cpu_crit: cpu_crit@0 {
temperature = <115000>;
hysteresis = <2000>;
type = "critical";
};
};
If we were to use this we'd have to adapt it slightly since we have
multiple thermal zones. In line with the other scheme this could be
cpuN_threshold, cpuN_target and cpuN_crit.
Up to you, just providing some options ;-)
> > You could also consider to add the additional trip point in this
> > series if you agree that it will be needed.
>
> I expect that we'll end up with at least 2 passive trip points but I
> don't know what temperature to set the next one at. So let's just go
> with 1 passive and 1 critical trip point in this series and you can
> send a patch adding more once we've characterised IPA.
Sounds good
Thanks!
Matthias
On 11-01-19, 11:58, Matthias Kaehlcke wrote:
> On Fri, Jan 11, 2019 at 09:16:53AM +0530, Viresh Kumar wrote:
> Just to gain a better understanding: is cpuidle cooling already
> available for arm64 (or is there a patch set)? I came across the
> relatively new idle injecting framework but it seems currently the
> only user is the Intel powerclamp driver.
Daniel was trying to upstream it earlier:
lore.kernel.org/lkml/[email protected]
> > > even though only one will be active at any given
> > > time. However I wonder if we could change this:
> >
> > I won't say it that way. I see it as all the CPUs are active during a
> > cooling state, i.e. they are all participating.
>
> agreed, I was referring to the CPU cooling device, which (without
> cpuidle injection) could be considered a single device per freq domain.
Even without cpuidle injection all CPUs actually take part in cooling.
> > > For device tree based platform the above implies that cooling maps
> > > must include a list of all possible cooling devices of a frequency
> > > domain, even though only one of them will exist at any given time.
> > >
> > > For example:
> > >
> > > cooling-maps {
> > > map0 {
> > > trip = <&cpu_alert0>;
> > > cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> > > <&CPU1 THERMAL_NO_LIMIT 4>,
> > > <&CPU2 THERMAL_NO_LIMIT 4>,
> > > <&CPU3 THERMAL_NO_LIMIT 4>;
> > > };
> > > map1 {
> > > trip = <&cpu_crit0>;
> > > cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > > <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > > <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > > <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> >
> > This is the right thing to do hardware description wise, no matter
> > what the kernel does.
>
> Not sure I would call it a hardware description. I'd say we pretend
> the thermal configuration is a hardware description so the DT folks
> don't yell at us ;-) IMO a CPU cooling device is an abstraction, I
> think there is no such IP block on most systems.
Right.
> It seems with cpuidle injection CPUs can perform cooling actions
> individually, with that I agree that representing them as individual
> cooling devices in the DT makes sense. Without that a cooling device
> per freq domain would seem a resonable abstraction.
But we actually have 4 different cooling devices no matter what. The only thing
is that they switch their cooling state together. And that shouldn't bother DT
is what I thought :)
> One of the reasons I dislike the above list of cooling devices is that
> it is repeated for different thermal-zone/cooling-maps, but I guess
> we have to live with that, would be nice if the DT would allow to do
> something like this:
>
> thermal-zones {
> cooling_maps_fd0 : cooling-maps {
> map0 {
> trip = <&cpu_alert0>;
> cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> <&CPU1 THERMAL_NO_LIMIT 4>,
> <&CPU2 THERMAL_NO_LIMIT 4>,
> <&CPU3 THERMAL_NO_LIMIT 4>;
> };
> map1 {
> trip = <&cpu_crit0>;
> cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> };
>
> cpu0-thermal {
> ...
> cooling-maps = @cooling_maps_fd0;
> ...
> };
>
> cpu1-thermal {
> ...
> cooling-maps = @cooling_maps_fd0;
> ...
> };
>
> ...
> };
Yeah, maybe. There aren't lot of examples of such duplication though if I
remember correctly.
--
viresh
On Sat, Jan 12, 2019 at 2:06 AM Matthias Kaehlcke <[email protected]> wrote:
>
> Another concern about adding trip points later could be the node
> name. We currently have:
>
>
> trips {
> cpu0_alert0: trip0 {
> ...
> };
>
> cpu0_crit: trip1 {
> ...
> };
> };
>
> If we keep increasing enumeration with the node name this would become:
>
> trips {
> cpu0_alert0: trip0 {
> ...
> };
>
> cpu0_alert1: trip1 {
> ...
> };
>
> cpu0_crit: trip2 {
> ...
> };
> };
>
> i.e. the node name of the critical trip-point changes, which might be
> a concern for dtsi's that override a value, though they should
> probably use the phandle &cpu0_crit anyway. If this is a concern we
> could change the node names to 'alert0' and 'crit'.
>
> I looked around a bit and actually I kinda like the naming scheme used
> by hisilicon/hi6220.dtsi, mediatek/mt8173.dtsi and rockchip/rk3328.dtsi
> (with minor variations):
>
> trips {
> threshold: trip-point@0 {
> temperature = <68000>;
> hysteresis = <2000>;
> type = "passive";
> };
>
> target: trip-point@1 {
> temperature = <85000>;
> hysteresis = <2000>;
> type = "passive";
> };
>
> cpu_crit: cpu_crit@0 {
> temperature = <115000>;
> hysteresis = <2000>;
> type = "critical";
> };
> };
>
> If we were to use this we'd have to adapt it slightly since we have
> multiple thermal zones. In line with the other scheme this could be
> cpuN_threshold, cpuN_target and cpuN_crit.
>
I like this scheme enough that I adopted it for v2.