Add support for thermal throttling on SDM845.
We introduce a generic flag to be used by cpufreq drivers to tell the
cpufreq core to auto-register a thermal cooling device.
There are a few miscellaneous fixes to keep checkpatch happy.
If this approach is acceptable I can send a series converting other cpufreq
drivers to use this flag and get rid of driver code.
Amit Kucheria (9):
[ALREADY QUEUED] cpufreq: qcom-hw: Move to device_initcall
drivers: thermal: of-thermal: Print name of device node with error
drivers: cpufreq: Add thermal_cooling_device pointer to struct
cpufreq_policy
cpufreq: Add a flag to auto-register a cooling device
cpufreq: Replace open-coded << with BIT()
cpufreq: qcom-hw: Register as a cpufreq cooling device
arm64: dts: sdm845: Increase alert trip point to 95 degrees
arm64: dts: sdm845: wireup the thermal trip points to cpufreq
thermal: cpu_cooling: Clarify error message
arch/arm64/boot/dts/qcom/sdm845.dtsi | 193 +++++++++++++++++++++++----
drivers/cpufreq/cpufreq.c | 13 ++
drivers/cpufreq/qcom-cpufreq-hw.c | 5 +-
drivers/thermal/cpu_cooling.c | 2 +-
drivers/thermal/of-thermal.c | 4 +-
include/linux/cpufreq.h | 34 +++--
6 files changed, 210 insertions(+), 41 deletions(-)
--
2.17.1
Make it easier to debug devicetree definition in case of errors.
Signed-off-by: Amit Kucheria <[email protected]>
---
drivers/thermal/of-thermal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
index 4bfdb4a1e47d..08c3ccbdacf8 100644
--- a/drivers/thermal/of-thermal.c
+++ b/drivers/thermal/of-thermal.c
@@ -867,14 +867,14 @@ __init *thermal_of_build_thermal_zone(struct device_node *np)
ret = of_property_read_u32(np, "polling-delay-passive", &prop);
if (ret < 0) {
- pr_err("missing polling-delay-passive property\n");
+ pr_err("%pOFn: missing polling-delay-passive property\n", np->name);
goto free_tz;
}
tz->passive_delay = prop;
ret = of_property_read_u32(np, "polling-delay", &prop);
if (ret < 0) {
- pr_err("missing polling-delay property\n");
+ pr_err("%pOFn: missing polling-delay property\n", np->name);
goto free_tz;
}
tz->polling_delay = prop;
--
2.17.1
Several cpufreq drivers register themselves as thermal cooling devices.
Adding a pointer to struct cpufreq_policy removes the need for them to
store this pointer in a private data structure.
Signed-off-by: Amit Kucheria <[email protected]>
---
include/linux/cpufreq.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index c86d6d8bdfed..9fbc1d996967 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -95,6 +95,9 @@ struct cpufreq_policy {
struct cpufreq_frequency_table *freq_table;
enum cpufreq_table_sorting freq_table_sorted;
+ /* Pointer to the cooling device if used for thermal mitigation */
+ struct thermal_cooling_device *cooldev;
+
struct list_head policy_list;
struct kobject kobj;
struct completion kobj_unregister;
--
2.17.1
All cpufreq drivers do similar things to register as a cooling device.
Provide a cpufreq driver flag so drivers can just ask the cpufreq core
to register the cooling device on their behalf. This allows us to get
rid of duplicated code in the drivers.
Suggested-by: Stephen Boyd <[email protected]>
Suggested-by: Viresh Kumar <[email protected]>
Signed-off-by: Amit Kucheria <[email protected]>
---
drivers/cpufreq/cpufreq.c | 13 +++++++++++++
include/linux/cpufreq.h | 6 ++++++
2 files changed, 19 insertions(+)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6f23ebb395f1..7faebfc61e60 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -30,6 +30,7 @@
#include <linux/syscore_ops.h>
#include <linux/tick.h>
#include <trace/events/power.h>
+#include <linux/cpu_cooling.h>
static LIST_HEAD(cpufreq_policy_list);
@@ -1318,6 +1319,12 @@ static int cpufreq_online(unsigned int cpu)
if (cpufreq_driver->ready)
cpufreq_driver->ready(policy);
+ if (cpufreq_driver->flags & CPUFREQ_AUTO_REGISTER_COOLING_DEV) {
+ struct thermal_cooling_device **cdev = &policy->cooldev;
+
+ *cdev = of_cpufreq_cooling_register(policy);
+ }
+
pr_debug("initialization complete\n");
return 0;
@@ -1411,6 +1418,12 @@ static int cpufreq_offline(unsigned int cpu)
if (has_target())
cpufreq_exit_governor(policy);
+ if (cpufreq_driver->flags & CPUFREQ_AUTO_REGISTER_COOLING_DEV) {
+ struct thermal_cooling_device **cdev = &policy->cooldev;
+
+ cpufreq_cooling_unregister(*cdev);
+ }
+
/*
* Perform the ->exit() even during light-weight tear-down,
* since this is a core component, and is essential for the
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 9fbc1d996967..0c51d5ce7350 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -388,6 +388,12 @@ struct cpufreq_driver {
*/
#define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING (1 << 6)
+/*
+ * Set by drivers that want the core to automatically register the cpufreq
+ * driver as a thermal cooling device
+ */
+#define CPUFREQ_AUTO_REGISTER_COOLING_DEV (1 << 7)
+
int cpufreq_register_driver(struct cpufreq_driver *driver_data);
int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
--
2.17.1
Add the CPUFREQ_AUTO_REGISTER_COOLING_DEV flag to allow the cpufreq core
to auto-register the driver as a cooling device.
Signed-off-by: Amit Kucheria <[email protected]>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 649dddd72749..47c7cea22da1 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -231,7 +231,8 @@ static struct freq_attr *qcom_cpufreq_hw_attr[] = {
static struct cpufreq_driver cpufreq_qcom_hw_driver = {
.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
- CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
+ CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
+ CPUFREQ_AUTO_REGISTER_COOLING_DEV,
.verify = cpufreq_generic_frequency_table_verify,
.target_index = qcom_cpufreq_hw_target_index,
.get = qcom_cpufreq_hw_get,
--
2.17.1
Make it clear that it is a failure if the cpufreq driver was unable to
register as a cooling device. Makes it easier to find in logs and
grepping for words like fail, err, warn.
Signed-off-by: Amit Kucheria <[email protected]>
---
drivers/thermal/cpu_cooling.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index dfd23245f778..6fff16113628 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -774,7 +774,7 @@ of_cpufreq_cooling_register(struct cpufreq_policy *policy)
cdev = __cpufreq_cooling_register(np, policy, capacitance);
if (IS_ERR(cdev)) {
- pr_err("cpu_cooling: cpu%d is not running as cooling device: %ld\n",
+ pr_err("cpu_cooling: cpu%d failed to register as cooling device: %ld\n",
policy->cpu, PTR_ERR(cdev));
cdev = NULL;
}
--
2.17.1
(Resent to allow people testing the series to test it easily)
subsys_initcall causes problems registering the driver as a thermal
cooling device.
If "faster boot" is the main reason for doing subsys_initcall, this
should be handled in the bootloader or another boot constraint
framework.
Signed-off-by: Amit Kucheria <[email protected]>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index d83939a1b3d4..649dddd72749 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -296,7 +296,7 @@ static int __init qcom_cpufreq_hw_init(void)
{
return platform_driver_register(&qcom_cpufreq_hw_driver);
}
-subsys_initcall(qcom_cpufreq_hw_init);
+device_initcall(qcom_cpufreq_hw_init);
static void __exit qcom_cpufreq_hw_exit(void)
{
--
2.17.1
Since all cpus in the big and little clusters, respectively, are in the
same frequency domain, use all of them for mitigation in the
cooling-map. We end up with two cooling devices - one each for the big
and little clusters.
At the lower trip points we restrict ourselves to throttling only a few
OPPs. At higher trip temperatures, allow ourselves to be throttled to
any extent.
Signed-off-by: Amit Kucheria <[email protected]>
---
arch/arm64/boot/dts/qcom/sdm845.dtsi | 177 ++++++++++++++++++++++++---
1 file changed, 161 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index fb7da678b116..7973e88bdf94 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -13,6 +13,7 @@
#include <dt-bindings/reset/qcom,sdm845-aoss.h>
#include <dt-bindings/soc/qcom,rpmh-rsc.h>
#include <dt-bindings/clock/qcom,gcc-sdm845.h>
+#include <dt-bindings/thermal/thermal.h>
/ {
interrupt-parent = <&intc>;
@@ -99,6 +100,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x0>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_0>;
qcom,freq-domain = <&cpufreq_hw 0>;
@@ -116,6 +118,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x100>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_100>;
qcom,freq-domain = <&cpufreq_hw 0>;
@@ -130,6 +133,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x200>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_200>;
qcom,freq-domain = <&cpufreq_hw 0>;
@@ -144,6 +148,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x300>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_300>;
qcom,freq-domain = <&cpufreq_hw 0>;
@@ -158,6 +163,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x400>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_400>;
qcom,freq-domain = <&cpufreq_hw 1>;
@@ -172,6 +178,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x500>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_500>;
qcom,freq-domain = <&cpufreq_hw 1>;
@@ -186,6 +193,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x600>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_600>;
qcom,freq-domain = <&cpufreq_hw 1>;
@@ -200,6 +208,7 @@
compatible = "qcom,kryo385";
reg = <0x0 0x700>;
enable-method = "psci";
+ #cooling-cells = <2>;
next-level-cache = <&L2_700>;
qcom,freq-domain = <&cpufreq_hw 1>;
@@ -1719,18 +1728,35 @@
thermal-sensors = <&tsens0 1>;
trips {
- cpu_alert0: trip0 {
+ cpu0_alert0: trip-point@0 {
temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
- cpu_crit0: trip1 {
+ cpu0_crit: cpu_crit@0 {
temperature = <110000>;
hysteresis = <1000>;
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu0_alert0>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
+ <&CPU1 THERMAL_NO_LIMIT 4>,
+ <&CPU2 THERMAL_NO_LIMIT 4>,
+ <&CPU3 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu0_crit>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu1-thermal {
@@ -1740,18 +1766,35 @@
thermal-sensors = <&tsens0 2>;
trips {
- cpu_alert1: trip0 {
+ cpu1_alert0: trip-point@0 {
temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
- cpu_crit1: trip1 {
+ cpu1_crit: cpu_crit@0 {
temperature = <110000>;
hysteresis = <1000>;
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu1_alert0>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
+ <&CPU1 THERMAL_NO_LIMIT 4>,
+ <&CPU2 THERMAL_NO_LIMIT 4>,
+ <&CPU3 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu1_crit>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu2-thermal {
@@ -1761,18 +1804,35 @@
thermal-sensors = <&tsens0 3>;
trips {
- cpu_alert2: trip0 {
+ cpu2_alert0: trip-point@0 {
temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
- cpu_crit2: trip1 {
+ cpu2_crit: cpu_crit@0 {
temperature = <110000>;
hysteresis = <1000>;
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu2_alert0>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
+ <&CPU1 THERMAL_NO_LIMIT 4>,
+ <&CPU2 THERMAL_NO_LIMIT 4>,
+ <&CPU3 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu2_crit>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu3-thermal {
@@ -1782,18 +1842,35 @@
thermal-sensors = <&tsens0 4>;
trips {
- cpu_alert3: trip0 {
+ cpu3_alert0: trip-point@0 {
temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
- cpu_crit3: trip1 {
+ cpu3_crit: cpu_crit@0 {
temperature = <110000>;
hysteresis = <1000>;
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu3_alert0>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
+ <&CPU1 THERMAL_NO_LIMIT 4>,
+ <&CPU2 THERMAL_NO_LIMIT 4>,
+ <&CPU3 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu3_crit>;
+ cooling-device = <&CPU0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu4-thermal {
@@ -1803,18 +1880,35 @@
thermal-sensors = <&tsens0 7>;
trips {
- cpu_alert4: trip0 {
+ cpu4_alert0: trip-point@0 {
temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
- cpu_crit4: trip1 {
+ cpu4_crit: cpu_crit@0 {
temperature = <110000>;
hysteresis = <1000>;
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu4_alert0>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT 4>,
+ <&CPU5 THERMAL_NO_LIMIT 4>,
+ <&CPU6 THERMAL_NO_LIMIT 4>,
+ <&CPU7 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu4_crit>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu5-thermal {
@@ -1824,18 +1918,35 @@
thermal-sensors = <&tsens0 8>;
trips {
- cpu_alert5: trip0 {
+ cpu5_alert0: trip-point@0 {
temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
- cpu_crit5: trip1 {
+ cpu5_crit: cpu_crit@0 {
temperature = <110000>;
hysteresis = <1000>;
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu5_alert0>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT 4>,
+ <&CPU5 THERMAL_NO_LIMIT 4>,
+ <&CPU6 THERMAL_NO_LIMIT 4>,
+ <&CPU7 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu5_crit>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu6-thermal {
@@ -1845,18 +1956,35 @@
thermal-sensors = <&tsens0 9>;
trips {
- cpu_alert6: trip0 {
+ cpu6_alert0: trip-point@0 {
temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
- cpu_crit6: trip1 {
+ cpu6_crit: cpu_crit@0 {
temperature = <110000>;
hysteresis = <1000>;
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu6_alert0>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT 4>,
+ <&CPU5 THERMAL_NO_LIMIT 4>,
+ <&CPU6 THERMAL_NO_LIMIT 4>,
+ <&CPU7 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu6_crit>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
cpu7-thermal {
@@ -1866,18 +1994,35 @@
thermal-sensors = <&tsens0 10>;
trips {
- cpu_alert7: trip0 {
+ cpu7_alert0: trip-point@0 {
temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
- cpu_crit7: trip1 {
+ cpu7_crit: cpu_crit@0 {
temperature = <110000>;
hysteresis = <1000>;
type = "critical";
};
};
+
+ cooling-maps {
+ map0 {
+ trip = <&cpu7_alert0>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT 4>,
+ <&CPU5 THERMAL_NO_LIMIT 4>,
+ <&CPU6 THERMAL_NO_LIMIT 4>,
+ <&CPU7 THERMAL_NO_LIMIT 4>;
+ };
+ map1 {
+ trip = <&cpu7_crit>;
+ cooling-device = <&CPU4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU5 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&CPU7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
};
};
};
--
2.17.1
Minor clean-up to use BIT() and keep checkpatch happy. Clean-up the
comment formatting to while we're at it to make it easier to read.
Signed-off-by: Amit Kucheria <[email protected]>
---
include/linux/cpufreq.h | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 0c51d5ce7350..c87989c97834 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -349,14 +349,15 @@ struct cpufreq_driver {
};
/* flags */
-#define CPUFREQ_STICKY (1 << 0) /* driver isn't removed even if
- all ->init() calls failed */
-#define CPUFREQ_CONST_LOOPS (1 << 1) /* loops_per_jiffy or other
- kernel "constants" aren't
- affected by frequency
- transitions */
-#define CPUFREQ_PM_NO_WARN (1 << 2) /* don't warn on suspend/resume
- speed mismatches */
+
+/* driver isn't removed even if all ->init() calls failed */
+#define CPUFREQ_STICKY BIT(0)
+
+/* loops_per_jiffy or other kernel "constants" aren't affected by frequency transitions */
+#define CPUFREQ_CONST_LOOPS BIT(1)
+
+/* don't warn on suspend/resume speed mismatches */
+#define CPUFREQ_PM_NO_WARN BIT(2)
/*
* This should be set by platforms having multiple clock-domains, i.e.
@@ -364,14 +365,14 @@ struct cpufreq_driver {
* be created in cpu/cpu<num>/cpufreq/ directory and so they can use the same
* governor with different tunables for different clusters.
*/
-#define CPUFREQ_HAVE_GOVERNOR_PER_POLICY (1 << 3)
+#define CPUFREQ_HAVE_GOVERNOR_PER_POLICY BIT(3)
/*
* Driver will do POSTCHANGE notifications from outside of their ->target()
* routine and so must set cpufreq_driver->flags with this flag, so that core
* can handle them specially.
*/
-#define CPUFREQ_ASYNC_NOTIFICATION (1 << 4)
+#define CPUFREQ_ASYNC_NOTIFICATION BIT(4)
/*
* Set by drivers which want cpufreq core to check if CPU is running at a
@@ -380,19 +381,19 @@ struct cpufreq_driver {
* from the table. And if that fails, we will stop further boot process by
* issuing a BUG_ON().
*/
-#define CPUFREQ_NEED_INITIAL_FREQ_CHECK (1 << 5)
+#define CPUFREQ_NEED_INITIAL_FREQ_CHECK BIT(5)
/*
* Set by drivers to disallow use of governors with "dynamic_switching" flag
* set.
*/
-#define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING (1 << 6)
+#define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING BIT(6)
/*
* Set by drivers that want the core to automatically register the cpufreq
* driver as a thermal cooling device
*/
-#define CPUFREQ_AUTO_REGISTER_COOLING_DEV (1 << 7)
+#define CPUFREQ_AUTO_REGISTER_COOLING_DEV BIT(7)
int cpufreq_register_driver(struct cpufreq_driver *driver_data);
int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
--
2.17.1
75 degrees is too aggressive for throttling the CPU. After speaking to
Qualcomm engineers, increase it to 95 degrees.
Signed-off-by: Amit Kucheria <[email protected]>
---
arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 0e659b0524ba..fb7da678b116 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -1720,7 +1720,7 @@
trips {
cpu_alert0: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1741,7 +1741,7 @@
trips {
cpu_alert1: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1762,7 +1762,7 @@
trips {
cpu_alert2: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1783,7 +1783,7 @@
trips {
cpu_alert3: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1804,7 +1804,7 @@
trips {
cpu_alert4: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1825,7 +1825,7 @@
trips {
cpu_alert5: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1846,7 +1846,7 @@
trips {
cpu_alert6: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
@@ -1867,7 +1867,7 @@
trips {
cpu_alert7: trip0 {
- temperature = <75000>;
+ temperature = <95000>;
hysteresis = <2000>;
type = "passive";
};
--
2.17.1
On Mon, Jan 14, 2019 at 11:22 AM Amit Kucheria <[email protected]> wrote:
>
> Add support for thermal throttling on SDM845.
>
> We introduce a generic flag to be used by cpufreq drivers to tell the
> cpufreq core to auto-register a thermal cooling device.
>
> There are a few miscellaneous fixes to keep checkpatch happy.
>
> If this approach is acceptable I can send a series converting other cpufreq
> drivers to use this flag and get rid of driver code.
>
> Amit Kucheria (9):
> [ALREADY QUEUED] cpufreq: qcom-hw: Move to device_initcall
> drivers: thermal: of-thermal: Print name of device node with error
> drivers: cpufreq: Add thermal_cooling_device pointer to struct
> cpufreq_policy
> cpufreq: Add a flag to auto-register a cooling device
> cpufreq: Replace open-coded << with BIT()
> cpufreq: qcom-hw: Register as a cpufreq cooling device
> arm64: dts: sdm845: Increase alert trip point to 95 degrees
> arm64: dts: sdm845: wireup the thermal trip points to cpufreq
> thermal: cpu_cooling: Clarify error message
>
> arch/arm64/boot/dts/qcom/sdm845.dtsi | 193 +++++++++++++++++++++++----
> drivers/cpufreq/cpufreq.c | 13 ++
> drivers/cpufreq/qcom-cpufreq-hw.c | 5 +-
> drivers/thermal/cpu_cooling.c | 2 +-
> drivers/thermal/of-thermal.c | 4 +-
> include/linux/cpufreq.h | 34 +++--
> 6 files changed, 210 insertions(+), 41 deletions(-)
Would it be possible to split this series so as to put the cpufreq
patches separately?
On Mon, Jan 14, 2019 at 3:57 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Mon, Jan 14, 2019 at 11:22 AM Amit Kucheria <[email protected]> wrote:
> >
> > Add support for thermal throttling on SDM845.
> >
> > We introduce a generic flag to be used by cpufreq drivers to tell the
> > cpufreq core to auto-register a thermal cooling device.
> >
> > There are a few miscellaneous fixes to keep checkpatch happy.
> >
> > If this approach is acceptable I can send a series converting other cpufreq
> > drivers to use this flag and get rid of driver code.
> >
> > Amit Kucheria (9):
> > [ALREADY QUEUED] cpufreq: qcom-hw: Move to device_initcall
> > drivers: thermal: of-thermal: Print name of device node with error
> > drivers: cpufreq: Add thermal_cooling_device pointer to struct
> > cpufreq_policy
> > cpufreq: Add a flag to auto-register a cooling device
> > cpufreq: Replace open-coded << with BIT()
> > cpufreq: qcom-hw: Register as a cpufreq cooling device
> > arm64: dts: sdm845: Increase alert trip point to 95 degrees
> > arm64: dts: sdm845: wireup the thermal trip points to cpufreq
> > thermal: cpu_cooling: Clarify error message
> >
> > arch/arm64/boot/dts/qcom/sdm845.dtsi | 193 +++++++++++++++++++++++----
> > drivers/cpufreq/cpufreq.c | 13 ++
> > drivers/cpufreq/qcom-cpufreq-hw.c | 5 +-
> > drivers/thermal/cpu_cooling.c | 2 +-
> > drivers/thermal/of-thermal.c | 4 +-
> > include/linux/cpufreq.h | 34 +++--
> > 6 files changed, 210 insertions(+), 41 deletions(-)
>
> Would it be possible to split this series so as to put the cpufreq
> patches separately?
Sure, will send out another version.
On Mon, Jan 14, 2019 at 4:04 PM Amit Kucheria <[email protected]> wrote:
>
> On Mon, Jan 14, 2019 at 3:57 PM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Mon, Jan 14, 2019 at 11:22 AM Amit Kucheria <[email protected]> wrote:
> > >
> > > Add support for thermal throttling on SDM845.
> > >
> > > We introduce a generic flag to be used by cpufreq drivers to tell the
> > > cpufreq core to auto-register a thermal cooling device.
> > >
> > > There are a few miscellaneous fixes to keep checkpatch happy.
> > >
> > > If this approach is acceptable I can send a series converting other cpufreq
> > > drivers to use this flag and get rid of driver code.
> > >
> > > Amit Kucheria (9):
> > > [ALREADY QUEUED] cpufreq: qcom-hw: Move to device_initcall
> > > drivers: thermal: of-thermal: Print name of device node with error
> > > drivers: cpufreq: Add thermal_cooling_device pointer to struct
> > > cpufreq_policy
> > > cpufreq: Add a flag to auto-register a cooling device
> > > cpufreq: Replace open-coded << with BIT()
> > > cpufreq: qcom-hw: Register as a cpufreq cooling device
> > > arm64: dts: sdm845: Increase alert trip point to 95 degrees
> > > arm64: dts: sdm845: wireup the thermal trip points to cpufreq
> > > thermal: cpu_cooling: Clarify error message
> > >
> > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 193 +++++++++++++++++++++++----
> > > drivers/cpufreq/cpufreq.c | 13 ++
> > > drivers/cpufreq/qcom-cpufreq-hw.c | 5 +-
> > > drivers/thermal/cpu_cooling.c | 2 +-
> > > drivers/thermal/of-thermal.c | 4 +-
> > > include/linux/cpufreq.h | 34 +++--
> > > 6 files changed, 210 insertions(+), 41 deletions(-)
> >
> > Would it be possible to split this series so as to put the cpufreq
> > patches separately?
>
> Sure, will send out another version.
I've now sent out a separate series just tackling the
auto-registration feature of a cooling device[1]. So the only bits in
this series needing separate review are patches 2, 7,8 and 9.
I'll wait couple of days to get some review on the other series before
sending out another version of the remaining patches listed above.
Feel free to comment on them in this thread, in the meanwhile.
Thanks.
[1] https://lore.kernel.org/patchwork/cover/1031909/
On Mon, Jan 14, 2019 at 03:51:10PM +0530, Amit Kucheria wrote:
> Since all cpus in the big and little clusters, respectively, are in the
> same frequency domain, use all of them for mitigation in the
> cooling-map. We end up with two cooling devices - one each for the big
> and little clusters.
>
> At the lower trip points we restrict ourselves to throttling only a few
> OPPs. At higher trip temperatures, allow ourselves to be throttled to
> any extent.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> arch/arm64/boot/dts/qcom/sdm845.dtsi | 177 ++++++++++++++++++++++++---
> 1 file changed, 161 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index fb7da678b116..7973e88bdf94 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>
> ...
>
> @@ -1719,18 +1728,35 @@
> thermal-sensors = <&tsens0 1>;
>
> trips {
> - cpu_alert0: trip0 {
> + cpu0_alert0: trip-point@0 {
Thanks for adapting the trip point names and labels in anticipation of
further additions!
Seems you aren't overly convinced about the 'target/threshold'
terminology used by some other arm64 platforms ;-)
> temperature = <95000>;
> hysteresis = <2000>;
> type = "passive";
> };
I realized that we still have the potential problem of a name change
in the trip point node name if a 'threshold' node for IPA is added,
since this node will have a lower temperature than 95°. If this is
something to be concerned about it might be worth to add that extra
trip point already to avoid headaches or funky trip point enumeration,
even if we know that the value might not be the final one.
(I'm aware that we are also changing the node names and labels right
now, it seems less problematic at this point since the SDM845 thermal
zones are a fairly recent addition)
> - cpu_crit0: trip1 {
> + cpu0_crit: cpu_crit@0 {
nit: does the @0 add any value here? IIUC there can be only one
critical trip point, hence there will never be a cpu_crit@1 or
higher.
> temperature = <110000>;
> hysteresis = <1000>;
> type = "critical";
> };
> };
> +
> + cooling-maps {
> + map0 {
> + trip = <&cpu0_alert0>;
> + cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> + <&CPU1 THERMAL_NO_LIMIT 4>,
> + <&CPU2 THERMAL_NO_LIMIT 4>,
> + <&CPU3 THERMAL_NO_LIMIT 4>;
> + };
Out of curiosity: how did you determing the max cooling state of 4?
Cheers
Matthias
Quoting Amit Kucheria (2019-01-14 02:21:04)
> Make it easier to debug devicetree definition in case of errors.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> drivers/thermal/of-thermal.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index 4bfdb4a1e47d..08c3ccbdacf8 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -867,14 +867,14 @@ __init *thermal_of_build_thermal_zone(struct device_node *np)
>
> ret = of_property_read_u32(np, "polling-delay-passive", &prop);
> if (ret < 0) {
> - pr_err("missing polling-delay-passive property\n");
> + pr_err("%pOFn: missing polling-delay-passive property\n", np->name);
Not sure you've triggered this code path because it's an error, but I'd
expect it to just be np, not np->name here. It's too bad the compiler
can't catch this.
Quoting Amit Kucheria (2019-01-14 02:21:05)
> Several cpufreq drivers register themselves as thermal cooling devices.
> Adding a pointer to struct cpufreq_policy removes the need for them to
> store this pointer in a private data structure.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
Reviewed-by: Stephen Boyd <[email protected]>
Quoting Amit Kucheria (2019-01-14 02:21:06)
> All cpufreq drivers do similar things to register as a cooling device.
> Provide a cpufreq driver flag so drivers can just ask the cpufreq core
> to register the cooling device on their behalf. This allows us to get
> rid of duplicated code in the drivers.
>
> Suggested-by: Stephen Boyd <[email protected]>
> Suggested-by: Viresh Kumar <[email protected]>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> drivers/cpufreq/cpufreq.c | 13 +++++++++++++
> include/linux/cpufreq.h | 6 ++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 6f23ebb395f1..7faebfc61e60 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -30,6 +30,7 @@
> #include <linux/syscore_ops.h>
> #include <linux/tick.h>
> #include <trace/events/power.h>
> +#include <linux/cpu_cooling.h>
Maybe this is supposed to be ordered alphabetically? If so, this should
be much higher.
>
> static LIST_HEAD(cpufreq_policy_list);
>
> @@ -1318,6 +1319,12 @@ static int cpufreq_online(unsigned int cpu)
> if (cpufreq_driver->ready)
> cpufreq_driver->ready(policy);
>
> + if (cpufreq_driver->flags & CPUFREQ_AUTO_REGISTER_COOLING_DEV) {
> + struct thermal_cooling_device **cdev = &policy->cooldev;
> +
> + *cdev = of_cpufreq_cooling_register(policy);
> + }
This seems to be some complicated way of writing:
policy->cooldev = of_cpufreq_cooling_register(policy);
?
> +
> pr_debug("initialization complete\n");
>
> return 0;
> @@ -1411,6 +1418,12 @@ static int cpufreq_offline(unsigned int cpu)
> if (has_target())
> cpufreq_exit_governor(policy);
>
> + if (cpufreq_driver->flags & CPUFREQ_AUTO_REGISTER_COOLING_DEV) {
> + struct thermal_cooling_device **cdev = &policy->cooldev;
> +
> + cpufreq_cooling_unregister(*cdev);
Similar? I'm confused!
Quoting Amit Kucheria (2019-01-14 02:21:08)
> Add the CPUFREQ_AUTO_REGISTER_COOLING_DEV flag to allow the cpufreq core
> to auto-register the driver as a cooling device.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
Reviewed-by: Stephen Boyd <[email protected]>
On Thu, Jan 17, 2019 at 6:57 AM Viresh Kumar <[email protected]> wrote:
>
> On 14-01-19, 15:51, Amit Kucheria wrote:
> > Make it clear that it is a failure if the cpufreq driver was unable to
> > register as a cooling device. Makes it easier to find in logs and
> > grepping for words like fail, err, warn.
> >
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > drivers/thermal/cpu_cooling.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> > index dfd23245f778..6fff16113628 100644
> > --- a/drivers/thermal/cpu_cooling.c
> > +++ b/drivers/thermal/cpu_cooling.c
> > @@ -774,7 +774,7 @@ of_cpufreq_cooling_register(struct cpufreq_policy *policy)
> >
> > cdev = __cpufreq_cooling_register(np, policy, capacitance);
> > if (IS_ERR(cdev)) {
> > - pr_err("cpu_cooling: cpu%d is not running as cooling device: %ld\n",
> > + pr_err("cpu_cooling: cpu%d failed to register as cooling device: %ld\n",
> > policy->cpu, PTR_ERR(cdev));
> > cdev = NULL;
> > }
>
> Always keep such cleanup patches at the top, so the maintainers can
> pick them up easily even if the entire series doesn't get in.
Actually, if nothing in the series depends on this change and this
change doesn't depend on anything in the series, why is it part of the
series at all?
On 14-01-19, 15:51, Amit Kucheria wrote:
> Make it clear that it is a failure if the cpufreq driver was unable to
> register as a cooling device. Makes it easier to find in logs and
> grepping for words like fail, err, warn.
>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> drivers/thermal/cpu_cooling.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index dfd23245f778..6fff16113628 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -774,7 +774,7 @@ of_cpufreq_cooling_register(struct cpufreq_policy *policy)
>
> cdev = __cpufreq_cooling_register(np, policy, capacitance);
> if (IS_ERR(cdev)) {
> - pr_err("cpu_cooling: cpu%d is not running as cooling device: %ld\n",
> + pr_err("cpu_cooling: cpu%d failed to register as cooling device: %ld\n",
> policy->cpu, PTR_ERR(cdev));
> cdev = NULL;
> }
Always keep such cleanup patches at the top, so the maintainers can
pick them up easily even if the entire series doesn't get in.
Acked-by: Viresh Kumar <[email protected]>
--
viresh
On 14-01-19, 15:51, Amit Kucheria wrote:
> (Resent to allow people testing the series to test it easily)
Maybe just share your branch then, instead of sending this again ?
--
viresh
On Tue, Jan 15, 2019 at 5:24 AM Stephen Boyd <[email protected]> wrote:
>
> Quoting Amit Kucheria (2019-01-14 02:21:04)
> > Make it easier to debug devicetree definition in case of errors.
> >
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > drivers/thermal/of-thermal.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> > index 4bfdb4a1e47d..08c3ccbdacf8 100644
> > --- a/drivers/thermal/of-thermal.c
> > +++ b/drivers/thermal/of-thermal.c
> > @@ -867,14 +867,14 @@ __init *thermal_of_build_thermal_zone(struct device_node *np)
> >
> > ret = of_property_read_u32(np, "polling-delay-passive", &prop);
> > if (ret < 0) {
> > - pr_err("missing polling-delay-passive property\n");
> > + pr_err("%pOFn: missing polling-delay-passive property\n", np->name);
>
> Not sure you've triggered this code path because it's an error, but I'd
> expect it to just be np, not np->name here. It's too bad the compiler
> can't catch this.
Not sure what happened here in the refactor before sending but good
catch. Fixed now.
Thanks,
Amit
On Thu, Jan 17, 2019 at 4:04 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Jan 17, 2019 at 6:57 AM Viresh Kumar <[email protected]> wrote:
> >
> > On 14-01-19, 15:51, Amit Kucheria wrote:
> > > Make it clear that it is a failure if the cpufreq driver was unable to
> > > register as a cooling device. Makes it easier to find in logs and
> > > grepping for words like fail, err, warn.
> > >
> > > Signed-off-by: Amit Kucheria <[email protected]>
> > > ---
> > > drivers/thermal/cpu_cooling.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> > > index dfd23245f778..6fff16113628 100644
> > > --- a/drivers/thermal/cpu_cooling.c
> > > +++ b/drivers/thermal/cpu_cooling.c
> > > @@ -774,7 +774,7 @@ of_cpufreq_cooling_register(struct cpufreq_policy *policy)
> > >
> > > cdev = __cpufreq_cooling_register(np, policy, capacitance);
> > > if (IS_ERR(cdev)) {
> > > - pr_err("cpu_cooling: cpu%d is not running as cooling device: %ld\n",
> > > + pr_err("cpu_cooling: cpu%d failed to register as cooling device: %ld\n",
> > > policy->cpu, PTR_ERR(cdev));
> > > cdev = NULL;
> > > }
> >
> > Always keep such cleanup patches at the top, so the maintainers can
> > pick them up easily even if the entire series doesn't get in.
>
> Actually, if nothing in the series depends on this change and this
> change doesn't depend on anything in the series, why is it part of the
> series at all?
Indeed, I should've separated this from the series. Done now.
On Tue, Jan 15, 2019 at 5:27 AM Stephen Boyd <[email protected]> wrote:
>
> Quoting Amit Kucheria (2019-01-14 02:21:06)
> > All cpufreq drivers do similar things to register as a cooling device.
> > Provide a cpufreq driver flag so drivers can just ask the cpufreq core
> > to register the cooling device on their behalf. This allows us to get
> > rid of duplicated code in the drivers.
> >
> > Suggested-by: Stephen Boyd <[email protected]>
> > Suggested-by: Viresh Kumar <[email protected]>
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > drivers/cpufreq/cpufreq.c | 13 +++++++++++++
> > include/linux/cpufreq.h | 6 ++++++
> > 2 files changed, 19 insertions(+)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 6f23ebb395f1..7faebfc61e60 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -30,6 +30,7 @@
> > #include <linux/syscore_ops.h>
> > #include <linux/tick.h>
> > #include <trace/events/power.h>
> > +#include <linux/cpu_cooling.h>
>
> Maybe this is supposed to be ordered alphabetically? If so, this should
> be much higher.
Fixed
> >
> > static LIST_HEAD(cpufreq_policy_list);
> >
> > @@ -1318,6 +1319,12 @@ static int cpufreq_online(unsigned int cpu)
> > if (cpufreq_driver->ready)
> > cpufreq_driver->ready(policy);
> >
> > + if (cpufreq_driver->flags & CPUFREQ_AUTO_REGISTER_COOLING_DEV) {
> > + struct thermal_cooling_device **cdev = &policy->cooldev;
> > +
> > + *cdev = of_cpufreq_cooling_register(policy);
> > + }
>
> This seems to be some complicated way of writing:
>
> policy->cooldev = of_cpufreq_cooling_register(policy);
>
> ?
Indeed. Fixed.
> > +
> > pr_debug("initialization complete\n");
> >
> > return 0;
> > @@ -1411,6 +1418,12 @@ static int cpufreq_offline(unsigned int cpu)
> > if (has_target())
> > cpufreq_exit_governor(policy);
> >
> > + if (cpufreq_driver->flags & CPUFREQ_AUTO_REGISTER_COOLING_DEV) {
> > + struct thermal_cooling_device **cdev = &policy->cooldev;
> > +
> > + cpufreq_cooling_unregister(*cdev);
>
> Similar? I'm confused!
>
_un_register as opposed to register above. :-)
Thanks for the review.
On Tue, Jan 15, 2019 at 3:31 AM Matthias Kaehlcke <[email protected]> wrote:
>
> On Mon, Jan 14, 2019 at 03:51:10PM +0530, Amit Kucheria wrote:
> > Since all cpus in the big and little clusters, respectively, are in the
> > same frequency domain, use all of them for mitigation in the
> > cooling-map. We end up with two cooling devices - one each for the big
> > and little clusters.
> >
> > At the lower trip points we restrict ourselves to throttling only a few
> > OPPs. At higher trip temperatures, allow ourselves to be throttled to
> > any extent.
> >
> > Signed-off-by: Amit Kucheria <[email protected]>
> > ---
> > arch/arm64/boot/dts/qcom/sdm845.dtsi | 177 ++++++++++++++++++++++++---
> > 1 file changed, 161 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > index fb7da678b116..7973e88bdf94 100644
> > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> >
> > ...
> >
> > @@ -1719,18 +1728,35 @@
> > thermal-sensors = <&tsens0 1>;
> >
> > trips {
> > - cpu_alert0: trip0 {
> > + cpu0_alert0: trip-point@0 {
>
> Thanks for adapting the trip point names and labels in anticipation of
> further additions!
>
> Seems you aren't overly convinced about the 'target/threshold'
> terminology used by some other arm64 platforms ;-)
target and threshold have an air of finality to them and doesn't lend
itself to having a few trip points on the way to the critical trip,
IMO.
Let me know if you feel otherwise.
> > temperature = <95000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
>
> I realized that we still have the potential problem of a name change
> in the trip point node name if a 'threshold' node for IPA is added,
> since this node will have a lower temperature than 95°. If this is
> something to be concerned about it might be worth to add that extra
> trip point already to avoid headaches or funky trip point enumeration,
> even if we know that the value might not be the final one.
I will squash both the DT changes in to a single change introducing 2
passive trips and 1 critical trip to avoid the churn. See if you like
it better.
> (I'm aware that we are also changing the node names and labels right
> now, it seems less problematic at this point since the SDM845 thermal
> zones are a fairly recent addition)
>
> > - cpu_crit0: trip1 {
> > + cpu0_crit: cpu_crit@0 {
>
> nit: does the @0 add any value here? IIUC there can be only one
> critical trip point, hence there will never be a cpu_crit@1 or
> higher.
Agreed. Will remove.
> > temperature = <110000>;
> > hysteresis = <1000>;
> > type = "critical";
> > };
> > };
> > +
> > + cooling-maps {
> > + map0 {
> > + trip = <&cpu0_alert0>;
> > + cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> > + <&CPU1 THERMAL_NO_LIMIT 4>,
> > + <&CPU2 THERMAL_NO_LIMIT 4>,
> > + <&CPU3 THERMAL_NO_LIMIT 4>;
> > + };
>
> Out of curiosity: how did you determing the max cooling state of 4?
Just some basic testing by pinning a dhrystone benchmark to each of
the cores along with some stress-ng threads. Lopping off the top 4
OPPs seemed to mitigate anything I could throw at the board.
I'm unable to do the "device in a closed car on a hot summer day" type
of tests on the dev board. Nevertheless, I've changed the patch now to
only remove the boost frequency at 75 degrees and then full throttling
at 95 degrees.
I'd appreciate more "real world" testing to validate these.
Thanks for the review.
Regards,
Amit
On Mon, Jan 21, 2019 at 11:40:45PM +0530, Amit Kucheria wrote:
> On Tue, Jan 15, 2019 at 3:31 AM Matthias Kaehlcke <[email protected]> wrote:
> >
> > On Mon, Jan 14, 2019 at 03:51:10PM +0530, Amit Kucheria wrote:
> > > Since all cpus in the big and little clusters, respectively, are in the
> > > same frequency domain, use all of them for mitigation in the
> > > cooling-map. We end up with two cooling devices - one each for the big
> > > and little clusters.
> > >
> > > At the lower trip points we restrict ourselves to throttling only a few
> > > OPPs. At higher trip temperatures, allow ourselves to be throttled to
> > > any extent.
> > >
> > > Signed-off-by: Amit Kucheria <[email protected]>
> > > ---
> > > arch/arm64/boot/dts/qcom/sdm845.dtsi | 177 ++++++++++++++++++++++++---
> > > 1 file changed, 161 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > index fb7da678b116..7973e88bdf94 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > >
> > > ...
> > >
> > > @@ -1719,18 +1728,35 @@
> > > thermal-sensors = <&tsens0 1>;
> > >
> > > trips {
> > > - cpu_alert0: trip0 {
> > > + cpu0_alert0: trip-point@0 {
> >
> > Thanks for adapting the trip point names and labels in anticipation of
> > further additions!
> >
> > Seems you aren't overly convinced about the 'target/threshold'
> > terminology used by some other arm64 platforms ;-)
>
> target and threshold have an air of finality to them and doesn't lend
> itself to having a few trip points on the way to the critical trip,
> IMO.
>
> Let me know if you feel otherwise.
I can see your point, and it's also true that target/threshold seem to
imply the use of power_allocator, which may not always be the case.
> > > temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> >
> > I realized that we still have the potential problem of a name change
> > in the trip point node name if a 'threshold' node for IPA is added,
> > since this node will have a lower temperature than 95°. If this is
> > something to be concerned about it might be worth to add that extra
> > trip point already to avoid headaches or funky trip point enumeration,
> > even if we know that the value might not be the final one.
>
> I will squash both the DT changes in to a single change introducing 2
> passive trips and 1 critical trip to avoid the churn.
Sounds good, thanks!
> See if you like it better.
I didn't really dislike it, was just wondering if renaming nodes could
break existing users. I imagine it's not a huge problem after all,
since users with an older kernel version won't see the DT change and
probably should use the phandle anyway.
> > (I'm aware that we are also changing the node names and labels right
> > now, it seems less problematic at this point since the SDM845 thermal
> > zones are a fairly recent addition)
> >
> > > - cpu_crit0: trip1 {
> > > + cpu0_crit: cpu_crit@0 {
> >
> > nit: does the @0 add any value here? IIUC there can be only one
> > critical trip point, hence there will never be a cpu_crit@1 or
> > higher.
>
> Agreed. Will remove.
>
> > > temperature = <110000>;
> > > hysteresis = <1000>;
> > > type = "critical";
> > > };
> > > };
> > > +
> > > + cooling-maps {
> > > + map0 {
> > > + trip = <&cpu0_alert0>;
> > > + cooling-device = <&CPU0 THERMAL_NO_LIMIT 4>,
> > > + <&CPU1 THERMAL_NO_LIMIT 4>,
> > > + <&CPU2 THERMAL_NO_LIMIT 4>,
> > > + <&CPU3 THERMAL_NO_LIMIT 4>;
> > > + };
> >
> > Out of curiosity: how did you determing the max cooling state of 4?
>
> Just some basic testing by pinning a dhrystone benchmark to each of
> the cores along with some stress-ng threads. Lopping off the top 4
> OPPs seemed to mitigate anything I could throw at the board.
Thanks for sharing your approach!
> I'm unable to do the "device in a closed car on a hot summer day" type
> of tests on the dev board. Nevertheless, I've changed the patch now to
> only remove the boost frequency at 75 degrees and then full throttling
> at 95 degrees.
>
> I'd appreciate more "real world" testing to validate these.
Sure, we'll run some tests with the new configuration on our site.
Thanks
Matthias