2024-02-29 19:27:37

by Alexey Charkov

Subject: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
active cooling on Radxa Rock 5B via the provided PWM fan.

Some RK3588 boards use separate regulators to supply CPUs and their
respective memory interfaces, so this is handled by coupling those
regulators in affected boards' device trees to ensure that their
voltage is adjusted in step.

In this revision of the series I chose to enable TSADC for all boards
at .dtsi level, because:
- The defaults already in .dtsi should work for all users, given that
  the CRU based resets don't need any out-of-chip components, and
  the CRU vs. PMIC reset is pretty much the only thing a board might
  have to configure / override there
- The boards that have TSADC_SHUT signal wired to the PMIC reset line
  can still choose to override the reset logic in their .dts. Or stay
  with CRU based resets, as downstream kernels do anyway
- The on-by-default approach helps ensure thermal protections are in
  place (emergency reset and throttling) for any board even with a
  rudimentary .dts, and thus lets us introduce CPU DVFS with better
  peace of mind

Fan control on Rock 5B has been split into two intervals: the fan spins
at the minimum cooling state between 55C and 65C, and then accelerates
if the system crosses the 65C mark (thanks to Dragan for the suggestion).
This lets some cooling setups with beefier heatsinks and/or larger
fan fins stay in the quietest non-zero fan state while still
benefiting from the airflow it generates, possibly avoiding noisy
speeds altogether for some workloads.
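
As a rough illustration of how the two intervals map onto the fan's
cooling states, here is an abridged and annotated sketch built from the
values in patch 2/5. The duty-cycle percentages in the comment are
simply cooling-levels / 255, rounded, and are not measured fan speeds;
the comments on the maps describe the intended behaviour rather than a
guarantee for every thermal governor:

```dts
fan: pwm-fan {
	compatible = "pwm-fan";
	/*
	 * cooling state:  0    1    2    3    4    5    6
	 * approx. duty:   0%  47%  59%  71%  82%  94% 100%
	 */
	cooling-levels = <0 120 150 180 210 240 255>;
	#cooling-cells = <2>;
	/* other properties omitted for brevity */
};

&package_thermal {
	cooling-maps {
		/* 55C trip: the fan may go up to state 1, but no further */
		map1 {
			trip = <&package_fan0>;
			cooling-device = <&fan THERMAL_NO_LIMIT 1>;
		};
		/* 65C trip: the fan runs at state 2 or above */
		map2 {
			trip = <&package_fan1>;
			cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
		};
	};
};
```

The two cells after &fan are the minimum and maximum cooling state
allowed for that trip, with THERMAL_NO_LIMIT leaving the respective
bound open.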

OPPs let the kernel actually scale CPU frequencies up and down for both
cooling and performance; this was tested on Rock 5B under varied loads.
I've split the patch into two parts: the first contains those OPPs that
seemed to be no-regret by general consensus during v1 review [2], while
the second contains OPPs that reduce frequency without an accompanying
decrease in CPU voltage. There seems to be a slight performance gain in
some workload scenarios when using these, but previous discussion was
inconclusive as to whether they should be included. Having them as
separate patches enables easier comparison and partial reversion if
people want to test them under their own workloads, and also lets the
first 'no-regret' part be merged to -next while the jury is still out
on the second one.

[1] https://lore.kernel.org/linux-rockchip/1824717.EqSB1tO5pr@bagend/T/#ma2ab949da2235a8e759eab22155fb2bc397d8aea
[2] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#m49d2b94e773f5b532a0bb5d3d7664799ff28cc2c

Signed-off-by: Alexey Charkov <[email protected]>
---
Changes in v3:
- Added regulator coupling for EVB1 and QuartzPro64
- Enabled the TSADC for all boards in .dtsi, not just Rock 5B (thanks ChenYu)
- Added comments regarding two passive cooling trips in each zone (thanks Dragan)
- Fixed active cooling map numbering for Radxa Rock 5B (thanks Dragan)
- Dropped Daniel's Acked-by tag from the Rock 5B fan patch, as there's been quite some
  churn there since the version he acknowledged
- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- Dropped the rfkill patch which Heiko has already applied
- Set higher 'polling-delay-passive' (100 instead of 20)
- Name all cooling maps starting from map0 in each respective zone
- Drop 'contribution' properties from passive cooling maps
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Alexey Charkov (5):
arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
arm64: dts: rockchip: enable automatic active cooling on Rock 5B
arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs

arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 +
.../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 +
arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +-
arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++-
4 files changed, 437 insertions(+), 2 deletions(-)
---
base-commit: cf1182944c7cc9f1c21a8a44e0d29abe12527412
change-id: 20240124-rk-dts-additions-a6d7b52787b9

Best regards,
--
Alexey Charkov <[email protected]>



2024-02-29 19:27:55

by Alexey Charkov

Subject: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

Include thermal zones information in device tree for RK3588 variants.

This also enables the TSADC controller unconditionally on all boards
to ensure that thermal protections are in place via throttling and
emergency reset, once OPPs are added to enable CPU DVFS.

The default settings (using CRU as the emergency reset mechanism)
should work on all boards regardless of their wiring, as CRU resets
do not depend on any external components. Boards that have the TSHUT
signal wired to the reset line of the PMIC may opt to switch to GPIO
tshut mode instead (rockchip,hw-tshut-mode = <1>;).
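
For illustration, such a board-level override might look roughly like
the sketch below. It is not part of this patch, and the polarity value
is an assumption made purely for the sake of the example; it would have
to be checked against the schematic of the board in question:

```dts
/* Hypothetical board .dts fragment, not part of this patch */
&tsadc {
	/* 0 = CRU based reset (the .dtsi default), 1 = GPIO based tshut */
	rockchip,hw-tshut-mode = <1>;
	/* 0 = active low, 1 = active high; assumed here, verify per board */
	rockchip,hw-tshut-polarity = <1>;
};
```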

It seems, though, that downstream kernels don't use that, even for
those boards where the wiring allows for GPIO based tshut, such as
Radxa Rock 5B [1], [2], [3].

[1] https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
[2] https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
[3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf page 11 (TSADC_SHUT_H)

Signed-off-by: Alexey Charkov <[email protected]>
---
arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176 +++++++++++++++++++++++++++++-
1 file changed, 175 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
index 36b1b7acfe6a..9bf197358642 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
@@ -10,6 +10,7 @@
#include <dt-bindings/reset/rockchip,rk3588-cru.h>
#include <dt-bindings/phy/phy.h>
#include <dt-bindings/ata/ahci.h>
+#include <dt-bindings/thermal/thermal.h>

/ {
compatible = "rockchip,rk3588";
@@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
pinctrl-1 = <&tsadc_shut>;
pinctrl-names = "gpio", "otpout";
#thermal-sensor-cells = <1>;
- status = "disabled";
+ status = "okay";
+ };
+
+ thermal_zones: thermal-zones {
+ /* sensor near the center of the SoC */
+ package_thermal: package-thermal {
+ polling-delay-passive = <0>;
+ polling-delay = <0>;
+ thermal-sensors = <&tsadc 0>;
+
+ trips {
+ package_crit: package-crit {
+ temperature = <115000>;
+ hysteresis = <0>;
+ type = "critical";
+ };
+ };
+ };
+
+ /* sensor between A76 cores 0 and 1 */
+ bigcore0_thermal: bigcore0-thermal {
+ polling-delay-passive = <100>;
+ polling-delay = <0>;
+ thermal-sensors = <&tsadc 1>;
+
+ trips {
+ /* threshold to start collecting temperature
+ * statistics e.g. with the IPA governor
+ */
+ bigcore0_alert0: bigcore0-alert0 {
+ temperature = <75000>;
+ hysteresis = <2000>;
+ type = "passive";
+ };
+ /* actual control temperature */
+ bigcore0_alert1: bigcore0-alert1 {
+ temperature = <85000>;
+ hysteresis = <2000>;
+ type = "passive";
+ };
+ bigcore0_crit: bigcore0-crit {
+ temperature = <115000>;
+ hysteresis = <0>;
+ type = "critical";
+ };
+ };
+ cooling-maps {
+ map0 {
+ trip = <&bigcore0_alert1>;
+ cooling-device =
+ <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
+ };
+
+ /* sensor between A76 cores 2 and 3 */
+ bigcore2_thermal: bigcore2-thermal {
+ polling-delay-passive = <100>;
+ polling-delay = <0>;
+ thermal-sensors = <&tsadc 2>;
+
+ trips {
+ /* threshold to start collecting temperature
+ * statistics e.g. with the IPA governor
+ */
+ bigcore2_alert0: bigcore2-alert0 {
+ temperature = <75000>;
+ hysteresis = <2000>;
+ type = "passive";
+ };
+ /* actual control temperature */
+ bigcore2_alert1: bigcore2-alert1 {
+ temperature = <85000>;
+ hysteresis = <2000>;
+ type = "passive";
+ };
+ bigcore2_crit: bigcore2-crit {
+ temperature = <115000>;
+ hysteresis = <0>;
+ type = "critical";
+ };
+ };
+ cooling-maps {
+ map0 {
+ trip = <&bigcore2_alert1>;
+ cooling-device =
+ <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
+ };
+
+ /* sensor between the four A55 cores */
+ little_core_thermal: littlecore-thermal {
+ polling-delay-passive = <100>;
+ polling-delay = <0>;
+ thermal-sensors = <&tsadc 3>;
+
+ trips {
+ /* threshold to start collecting temperature
+ * statistics e.g. with the IPA governor
+ */
+ littlecore_alert0: littlecore-alert0 {
+ temperature = <75000>;
+ hysteresis = <2000>;
+ type = "passive";
+ };
+ /* actual control temperature */
+ littlecore_alert1: littlecore-alert1 {
+ temperature = <85000>;
+ hysteresis = <2000>;
+ type = "passive";
+ };
+ littlecore_crit: littlecore-crit {
+ temperature = <115000>;
+ hysteresis = <0>;
+ type = "critical";
+ };
+ };
+ cooling-maps {
+ map0 {
+ trip = <&littlecore_alert1>;
+ cooling-device =
+ <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+ <&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+ };
+ };
+ };
+
+ /* sensor near the PD_CENTER power domain */
+ center_thermal: center-thermal {
+ polling-delay-passive = <0>;
+ polling-delay = <0>;
+ thermal-sensors = <&tsadc 4>;
+
+ trips {
+ center_crit: center-crit {
+ temperature = <115000>;
+ hysteresis = <0>;
+ type = "critical";
+ };
+ };
+ };
+
+ gpu_thermal: gpu-thermal {
+ polling-delay-passive = <0>;
+ polling-delay = <0>;
+ thermal-sensors = <&tsadc 5>;
+
+ trips {
+ gpu_crit: gpu-crit {
+ temperature = <115000>;
+ hysteresis = <0>;
+ type = "critical";
+ };
+ };
+ };
+
+ npu_thermal: npu-thermal {
+ polling-delay-passive = <0>;
+ polling-delay = <0>;
+ thermal-sensors = <&tsadc 6>;
+
+ trips {
+ npu_crit: npu-crit {
+ temperature = <115000>;
+ hysteresis = <0>;
+ type = "critical";
+ };
+ };
+ };
};

saradc: adc@fec10000 {

--
2.44.0


2024-02-29 19:28:01

by Alexey Charkov

Subject: [PATCH v3 2/5] arm64: dts: rockchip: enable automatic active cooling on Rock 5B

This links the PWM fan on Radxa Rock 5B as an active cooling device
managed automatically by the thermal subsystem, with a target SoC
temperature of 65C and a minimum-spin interval from 55C to 65C to
ensure airflow when the system gets warm.

Signed-off-by: Alexey Charkov <[email protected]>
---
arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 ++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
index a0e303c3a1dc..3f7fb055c4dc 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
@@ -52,7 +52,7 @@ led_rgb_b {

fan: pwm-fan {
compatible = "pwm-fan";
- cooling-levels = <0 95 145 195 255>;
+ cooling-levels = <0 120 150 180 210 240 255>;
fan-supply = <&vcc5v0_sys>;
pwms = <&pwm1 0 50000 0>;
#cooling-cells = <2>;
@@ -173,6 +173,34 @@ &cpu_l3 {
cpu-supply = <&vdd_cpu_lit_s0>;
};

+&package_thermal {
+ polling-delay = <1000>;
+
+ trips {
+ package_fan0: package-fan0 {
+ temperature = <55000>;
+ hysteresis = <2000>;
+ type = "active";
+ };
+ package_fan1: package-fan1 {
+ temperature = <65000>;
+ hysteresis = <2000>;
+ type = "active";
+ };
+ };
+
+ cooling-maps {
+ map1 {
+ trip = <&package_fan0>;
+ cooling-device = <&fan THERMAL_NO_LIMIT 1>;
+ };
+ map2 {
+ trip = <&package_fan1>;
+ cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
+ };
+ };
+};
+
&i2c0 {
pinctrl-names = "default";
pinctrl-0 = <&i2c0m2_xfer>;

--
2.44.0


2024-02-29 19:28:25

by Alexey Charkov

Subject: [PATCH v3 3/5] arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588

RK3588 chips allow for their CPU cores to be powered by a different
supply vs. their corresponding memory interfaces, and two of the
boards currently upstream do that (EVB1 and QuartzPro64).

The voltage of the memory interface, though, has to match that of the
CPU cores that use it, which downstream kernels achieve by means of a
custom cpufreq driver that adjusts both at the same time.

It seems that regulator coupling is a more appropriate generic
interface for it, so this patch introduces coupling to affected
device trees to ensure that memory interface voltage is also updated
whenever cpufreq switches between CPU OPPs.
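
To make the mechanism concrete, here is one coupled pair abridged from
the EVB1 hunks in this patch, with comments added. The comments reflect
a reading of the generic regulator binding and are a sketch, not a
statement about how a particular PMIC driver behaves:

```dts
vdd_cpu_lit_s0: dcdc-reg3 {
	/* CPU rail: keep within max-spread of the memory interface rail */
	regulator-coupled-with = <&vdd_cpu_lit_mem_s0>;
	/* maximum allowed difference between the two rails, in uV (10 mV) */
	regulator-coupled-max-spread = <10000>;
	regulator-min-microvolt = <550000>;
	regulator-max-microvolt = <950000>;
	/* other properties omitted for brevity */
};

vdd_cpu_lit_mem_s0: dcdc-reg8 {
	/* memory interface rail: the coupling is declared on both sides */
	regulator-coupled-with = <&vdd_cpu_lit_s0>;
	regulator-coupled-max-spread = <10000>;
	regulator-min-microvolt = <675000>;
	regulator-max-microvolt = <950000>;
	/* other properties omitted for brevity */
};
```

Note that the memory interface rail has a higher minimum (675 mV) than
the CPU rail (550 mV), so with a 10 mV spread the CPU rail presumably
cannot settle much below 665 mV while coupled. The lowest OPP voltage
in patch 4/5 is 675 mV, so this should not get in the way.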

Note that other boards, such as Radxa Rock 5B, define both the CPU
and memory interface regulators as aliases to the same DT node, so
this doesn't apply there.

Signed-off-by: Alexey Charkov <[email protected]>
---
arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 ++++++++++++
arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 ++++++++++++
2 files changed, 24 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts b/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
index de30c2632b8e..dfae67f1e9c7 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
@@ -788,6 +788,8 @@ regulators {
vdd_cpu_big1_s0: dcdc-reg1 {
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_big1_mem_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <550000>;
regulator-max-microvolt = <1050000>;
regulator-ramp-delay = <12500>;
@@ -800,6 +802,8 @@ regulator-state-mem {
vdd_cpu_big0_s0: dcdc-reg2 {
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_big0_mem_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <550000>;
regulator-max-microvolt = <1050000>;
regulator-ramp-delay = <12500>;
@@ -812,6 +816,8 @@ regulator-state-mem {
vdd_cpu_lit_s0: dcdc-reg3 {
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_lit_mem_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <550000>;
regulator-max-microvolt = <950000>;
regulator-ramp-delay = <12500>;
@@ -836,6 +842,8 @@ regulator-state-mem {
vdd_cpu_big1_mem_s0: dcdc-reg5 {
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_big1_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <675000>;
regulator-max-microvolt = <1050000>;
regulator-ramp-delay = <12500>;
@@ -849,6 +857,8 @@ regulator-state-mem {
vdd_cpu_big0_mem_s0: dcdc-reg6 {
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_big0_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <675000>;
regulator-max-microvolt = <1050000>;
regulator-ramp-delay = <12500>;
@@ -873,6 +883,8 @@ regulator-state-mem {
vdd_cpu_lit_mem_s0: dcdc-reg8 {
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_lit_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <675000>;
regulator-max-microvolt = <950000>;
regulator-ramp-delay = <12500>;
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
index 87a0abf95f7d..9c038450cd7c 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
@@ -818,6 +818,8 @@ vdd_cpu_big1_s0: dcdc-reg1 {
regulator-name = "vdd_cpu_big1_s0";
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_big1_mem_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <550000>;
regulator-max-microvolt = <1050000>;
regulator-ramp-delay = <12500>;
@@ -831,6 +833,8 @@ vdd_cpu_big0_s0: dcdc-reg2 {
regulator-name = "vdd_cpu_big0_s0";
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_big0_mem_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <550000>;
regulator-max-microvolt = <1050000>;
regulator-ramp-delay = <12500>;
@@ -844,6 +848,8 @@ vdd_cpu_lit_s0: dcdc-reg3 {
regulator-name = "vdd_cpu_lit_s0";
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_lit_mem_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <550000>;
regulator-max-microvolt = <950000>;
regulator-ramp-delay = <12500>;
@@ -870,6 +876,8 @@ vdd_cpu_big1_mem_s0: dcdc-reg5 {
regulator-name = "vdd_cpu_big1_mem_s0";
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_big1_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <675000>;
regulator-max-microvolt = <1050000>;
regulator-ramp-delay = <12500>;
@@ -884,6 +892,8 @@ vdd_cpu_big0_mem_s0: dcdc-reg6 {
regulator-name = "vdd_cpu_big0_mem_s0";
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_big0_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <675000>;
regulator-max-microvolt = <1050000>;
regulator-ramp-delay = <12500>;
@@ -910,6 +920,8 @@ vdd_cpu_lit_mem_s0: dcdc-reg8 {
regulator-name = "vdd_cpu_lit_mem_s0";
regulator-always-on;
regulator-boot-on;
+ regulator-coupled-with = <&vdd_cpu_lit_s0>;
+ regulator-coupled-max-spread = <10000>;
regulator-min-microvolt = <675000>;
regulator-max-microvolt = <950000>;
regulator-ramp-delay = <12500>;

--
2.44.0


2024-02-29 19:28:34

by Alexey Charkov

Subject: [PATCH v3 4/5] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588

By default the CPUs on RK3588 start up in a conservative performance
mode. Add frequency and voltage mappings to the device tree to enable
dynamic scaling via cpufreq.

OPP values are adapted from Radxa's downstream kernel for Rock 5B [1],
stripping them down to the minimum frequency and voltage combinations
as expected by the generic upstream cpufreq-dt driver, and also dropping
those OPPs that don't differ in voltage but only in frequency (keeping
the top frequency OPP in each case).
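
For readers less familiar with the format, here is one node from the
cluster0 table in this patch with comments added. The comments follow
the generic operating-points-v2 binding as understood here and should
be read as a sketch rather than an authoritative description:

```dts
opp-1416000000 {
	/* frequency in Hz, stored as a 64-bit value */
	opp-hz = /bits/ 64 <1416000000>;
	/* <target min max> supply voltage in microvolts */
	opp-microvolt = <762500 762500 950000>;
	/* worst-case transition latency in nanoseconds */
	clock-latency-ns = <40000>;
	/* OPP to switch to before the system suspends */
	opp-suspend;
};
```

With target equal to min, the regulator is asked for at least the
listed voltage and may go as high as the cluster maximum, but never
below the value taken from the downstream table.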

Note that this patch ignores voltage scaling for the CPU memory
interface which the downstream kernel does through a custom cpufreq
driver, and which is why the downstream version has two sets of voltage
values for each OPP (the second one being meant for the memory
interface supply regulator). This is done instead via regulator
coupling between CPU and memory interface supplies on affected boards.

This has been tested on Rock 5B with U-Boot 2023.11 compiled from
Collabora's integration tree [2] with binary bl31, and appears to be
stable under both active cooling and passive cooling (with throttling).

[1] https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
[2] https://gitlab.collabora.com/hardware-enablement/rockchip-3588/u-boot

Signed-off-by: Alexey Charkov <[email protected]>
---
arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 122 ++++++++++++++++++++++++++++++
1 file changed, 122 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
index 9bf197358642..bd39c5c47bfb 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
@@ -97,6 +97,7 @@ cpu_l0: cpu@0 {
clocks = <&scmi_clk SCMI_CLK_CPUL>;
assigned-clocks = <&scmi_clk SCMI_CLK_CPUL>;
assigned-clock-rates = <816000000>;
+ operating-points-v2 = <&cluster0_opp_table>;
cpu-idle-states = <&CPU_SLEEP>;
i-cache-size = <32768>;
i-cache-line-size = <64>;
@@ -116,6 +117,7 @@ cpu_l1: cpu@100 {
enable-method = "psci";
capacity-dmips-mhz = <530>;
clocks = <&scmi_clk SCMI_CLK_CPUL>;
+ operating-points-v2 = <&cluster0_opp_table>;
cpu-idle-states = <&CPU_SLEEP>;
i-cache-size = <32768>;
i-cache-line-size = <64>;
@@ -135,6 +137,7 @@ cpu_l2: cpu@200 {
enable-method = "psci";
capacity-dmips-mhz = <530>;
clocks = <&scmi_clk SCMI_CLK_CPUL>;
+ operating-points-v2 = <&cluster0_opp_table>;
cpu-idle-states = <&CPU_SLEEP>;
i-cache-size = <32768>;
i-cache-line-size = <64>;
@@ -154,6 +157,7 @@ cpu_l3: cpu@300 {
enable-method = "psci";
capacity-dmips-mhz = <530>;
clocks = <&scmi_clk SCMI_CLK_CPUL>;
+ operating-points-v2 = <&cluster0_opp_table>;
cpu-idle-states = <&CPU_SLEEP>;
i-cache-size = <32768>;
i-cache-line-size = <64>;
@@ -175,6 +179,7 @@ cpu_b0: cpu@400 {
clocks = <&scmi_clk SCMI_CLK_CPUB01>;
assigned-clocks = <&scmi_clk SCMI_CLK_CPUB01>;
assigned-clock-rates = <816000000>;
+ operating-points-v2 = <&cluster1_opp_table>;
cpu-idle-states = <&CPU_SLEEP>;
i-cache-size = <65536>;
i-cache-line-size = <64>;
@@ -194,6 +199,7 @@ cpu_b1: cpu@500 {
enable-method = "psci";
capacity-dmips-mhz = <1024>;
clocks = <&scmi_clk SCMI_CLK_CPUB01>;
+ operating-points-v2 = <&cluster1_opp_table>;
cpu-idle-states = <&CPU_SLEEP>;
i-cache-size = <65536>;
i-cache-line-size = <64>;
@@ -215,6 +221,7 @@ cpu_b2: cpu@600 {
clocks = <&scmi_clk SCMI_CLK_CPUB23>;
assigned-clocks = <&scmi_clk SCMI_CLK_CPUB23>;
assigned-clock-rates = <816000000>;
+ operating-points-v2 = <&cluster2_opp_table>;
cpu-idle-states = <&CPU_SLEEP>;
i-cache-size = <65536>;
i-cache-line-size = <64>;
@@ -234,6 +241,7 @@ cpu_b3: cpu@700 {
enable-method = "psci";
capacity-dmips-mhz = <1024>;
clocks = <&scmi_clk SCMI_CLK_CPUB23>;
+ operating-points-v2 = <&cluster2_opp_table>;
cpu-idle-states = <&CPU_SLEEP>;
i-cache-size = <65536>;
i-cache-line-size = <64>;
@@ -348,6 +356,120 @@ l3_cache: l3-cache {
};
};

+ cluster0_opp_table: opp-table-cluster0 {
+ compatible = "operating-points-v2";
+ opp-shared;
+
+ opp-1008000000 {
+ opp-hz = /bits/ 64 <1008000000>;
+ opp-microvolt = <675000 675000 950000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1200000000 {
+ opp-hz = /bits/ 64 <1200000000>;
+ opp-microvolt = <712500 712500 950000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1416000000 {
+ opp-hz = /bits/ 64 <1416000000>;
+ opp-microvolt = <762500 762500 950000>;
+ clock-latency-ns = <40000>;
+ opp-suspend;
+ };
+ opp-1608000000 {
+ opp-hz = /bits/ 64 <1608000000>;
+ opp-microvolt = <850000 850000 950000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1800000000 {
+ opp-hz = /bits/ 64 <1800000000>;
+ opp-microvolt = <950000 950000 950000>;
+ clock-latency-ns = <40000>;
+ };
+ };
+
+ cluster1_opp_table: opp-table-cluster1 {
+ compatible = "operating-points-v2";
+ opp-shared;
+
+ opp-1200000000 {
+ opp-hz = /bits/ 64 <1200000000>;
+ opp-microvolt = <675000 675000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1416000000 {
+ opp-hz = /bits/ 64 <1416000000>;
+ opp-microvolt = <725000 725000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1608000000 {
+ opp-hz = /bits/ 64 <1608000000>;
+ opp-microvolt = <762500 762500 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1800000000 {
+ opp-hz = /bits/ 64 <1800000000>;
+ opp-microvolt = <850000 850000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-2016000000 {
+ opp-hz = /bits/ 64 <2016000000>;
+ opp-microvolt = <925000 925000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-2208000000 {
+ opp-hz = /bits/ 64 <2208000000>;
+ opp-microvolt = <987500 987500 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-2400000000 {
+ opp-hz = /bits/ 64 <2400000000>;
+ opp-microvolt = <1000000 1000000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ };
+
+ cluster2_opp_table: opp-table-cluster2 {
+ compatible = "operating-points-v2";
+ opp-shared;
+
+ opp-1200000000 {
+ opp-hz = /bits/ 64 <1200000000>;
+ opp-microvolt = <675000 675000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1416000000 {
+ opp-hz = /bits/ 64 <1416000000>;
+ opp-microvolt = <725000 725000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1608000000 {
+ opp-hz = /bits/ 64 <1608000000>;
+ opp-microvolt = <762500 762500 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1800000000 {
+ opp-hz = /bits/ 64 <1800000000>;
+ opp-microvolt = <850000 850000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-2016000000 {
+ opp-hz = /bits/ 64 <2016000000>;
+ opp-microvolt = <925000 925000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-2208000000 {
+ opp-hz = /bits/ 64 <2208000000>;
+ opp-microvolt = <987500 987500 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-2400000000 {
+ opp-hz = /bits/ 64 <2400000000>;
+ opp-microvolt = <1000000 1000000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ };
+
firmware {
optee: optee {
compatible = "linaro,optee-tz";

--
2.44.0


2024-02-29 20:21:46

by Dragan Simic

Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

Hello Alexey,

On 2024-02-29 20:26, Alexey Charkov wrote:
> Include thermal zones information in device tree for RK3588 variants.
>
> This also enables the TSADC controller unconditionally on all boards
> to ensure that thermal protections are in place via throttling and
> emergency reset, once OPPs are added to enable CPU DVFS.
>
> The default settings (using CRU as the emergency reset mechanism)
> should work on all boards regardless of their wiring, as CRU resets
> do not depend on any external components. Boards that have the TSHUT
> signal wired to the reset line of the PMIC may opt to switch to GPIO
> tshut mode instead (rockchip,hw-tshut-mode = <1>;)

Quite frankly, I'm still not sure that enabling this on the SoC level
is the way to go. As I already described in detail, [4] according to
the RK3588 Hardware Design Guide v1.0 and the Rock 5B schematic, we
should actually use GPIO-based handling for the thermal runaways on
the Rock 5B. Other boards should also be investigated individually,
and the TSADC should be enabled on a board-to-board basis.
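
As a sketch of what the board-to-board approach might look like in
practice (an illustration only, with the SoC-level node abridged and
its nesting omitted):

```dts
/* rk3588s.dtsi (SoC level): the node would stay as it is today */
tsadc: tsadc@fec00000 {
	/* ... */
	status = "disabled";
};

/* board .dts: opt in once the board's TSHUT wiring has been reviewed */
&tsadc {
	status = "okay";
};
```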

[4]
https://lore.kernel.org/linux-rockchip/[email protected]/

> It seems though that downstream kernels don't use that, even for
> those boards where the wiring allows for GPIO based tshut, such as
> Radxa Rock 5B [1], [2], [3]
>
> [1]
> https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
> [2]
> https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
> [3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf
> page 11 (TSADC_SHUT_H)
>
> Signed-off-by: Alexey Charkov <[email protected]>
> ---
> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176
> +++++++++++++++++++++++++++++-
> 1 file changed, 175 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> index 36b1b7acfe6a..9bf197358642 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> @@ -10,6 +10,7 @@
> #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> #include <dt-bindings/phy/phy.h>
> #include <dt-bindings/ata/ahci.h>
> +#include <dt-bindings/thermal/thermal.h>
>
> / {
> compatible = "rockchip,rk3588";
> @@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
> pinctrl-1 = <&tsadc_shut>;
> pinctrl-names = "gpio", "otpout";
> #thermal-sensor-cells = <1>;
> - status = "disabled";
> + status = "okay";
> + };
> +
> + thermal_zones: thermal-zones {
> + /* sensor near the center of the SoC */
> + package_thermal: package-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 0>;
> +
> + trips {
> + package_crit: package-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + /* sensor between A76 cores 0 and 1 */
> + bigcore0_thermal: bigcore0-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 1>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */
> + bigcore0_alert0: bigcore0-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */
> + bigcore0_alert1: bigcore0-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + bigcore0_crit: bigcore0-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&bigcore0_alert1>;
> + cooling-device =
> + <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor between A76 cores 2 and 3 */
> + bigcore2_thermal: bigcore2-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 2>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */
> + bigcore2_alert0: bigcore2-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */
> + bigcore2_alert1: bigcore2-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + bigcore2_crit: bigcore2-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&bigcore2_alert1>;
> + cooling-device =
> + <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor between the four A55 cores */
> + little_core_thermal: littlecore-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 3>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */
> + littlecore_alert0: littlecore-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */
> + littlecore_alert1: littlecore-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + littlecore_crit: littlecore-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&littlecore_alert1>;
> + cooling-device =
> + <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor near the PD_CENTER power domain */
> + center_thermal: center-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 4>;
> +
> + trips {
> + center_crit: center-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + gpu_thermal: gpu-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 5>;
> +
> + trips {
> + gpu_crit: gpu-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + npu_thermal: npu-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 6>;
> +
> + trips {
> + npu_crit: npu-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> };
>
> saradc: adc@fec10000 {

2024-02-29 21:14:17

by Dragan Simic

Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

Hello Alexey,

Please see also some nitpicks below, which I forgot to mention in
my earlier response. I'm sorry for that.

On 2024-02-29 20:26, Alexey Charkov wrote:
> Include thermal zones information in device tree for RK3588 variants.
>
> This also enables the TSADC controller unconditionally on all boards
> to ensure that thermal protections are in place via throttling and
> emergency reset, once OPPs are added to enable CPU DVFS.
>
> The default settings (using CRU as the emergency reset mechanism)
> should work on all boards regardless of their wiring, as CRU resets
> do not depend on any external components. Boards that have the TSHUT
> signal wired to the reset line of the PMIC may opt to switch to GPIO
> tshut mode instead (rockchip,hw-tshut-mode = <1>;)
>
> It seems though that downstream kernels don't use that, even for
> those boards where the wiring allows for GPIO based tshut, such as
> Radxa Rock 5B [1], [2], [3]
>
> [1]
> https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
> [2]
> https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
> [3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf
> page 11 (TSADC_SHUT_H)
>
> Signed-off-by: Alexey Charkov <[email protected]>
> ---
> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176
> +++++++++++++++++++++++++++++-
> 1 file changed, 175 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> index 36b1b7acfe6a..9bf197358642 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> @@ -10,6 +10,7 @@
> #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> #include <dt-bindings/phy/phy.h>
> #include <dt-bindings/ata/ahci.h>
> +#include <dt-bindings/thermal/thermal.h>
>
> / {
> compatible = "rockchip,rk3588";
> @@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
> pinctrl-1 = <&tsadc_shut>;
> pinctrl-names = "gpio", "otpout";
> #thermal-sensor-cells = <1>;
> - status = "disabled";
> + status = "okay";
> + };
> +
> + thermal_zones: thermal-zones {
> + /* sensor near the center of the SoC */
> + package_thermal: package-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 0>;
> +
> + trips {
> + package_crit: package-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + /* sensor between A76 cores 0 and 1 */
> + bigcore0_thermal: bigcore0-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 1>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */

See, I'm not a native English speaker, but I've spent a lot of time
and effort improving my English skills. Thus, these comments may seem
like unnecessary nitpicking, depending on how much someone pays
attention to writing style in general, but I'll risk being annoying
and state them anyway. :)

The comment above could be written in a much more condensed form like
this, which would also be a bit more accurate:


/* IPA threshold, when IPA governor is used */

IOW, we're writing all this for someone to read later, but we can
perfectly reasonably expect some existing background knowledge from
the readers, so we should be as concise as possible.

> + bigcore0_alert0: bigcore0-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */

Similarly to the above, I'd suggest this:

/* IPA target, when IPA governor is used */

Having such brief comments should make it all perfectly understandable
to anyone who's already familiar with the way the IPA governor works.
Everyone else should be welcome to read up a bit on IPA first.

> + bigcore0_alert1: bigcore0-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + bigcore0_crit: bigcore0-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&bigcore0_alert1>;
> + cooling-device =
> + <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor between A76 cores 2 and 3 */
> + bigcore2_thermal: bigcore2-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 2>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */

The same as above.

> + bigcore2_alert0: bigcore2-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */

The same as above.

> + bigcore2_alert1: bigcore2-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + bigcore2_crit: bigcore2-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&bigcore2_alert1>;
> + cooling-device =
> + <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor between the four A55 cores */
> + little_core_thermal: littlecore-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 3>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */

The same as above.

> + littlecore_alert0: littlecore-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */

The same as above.

> + littlecore_alert1: littlecore-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + littlecore_crit: littlecore-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&littlecore_alert1>;
> + cooling-device =
> + <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor near the PD_CENTER power domain */
> + center_thermal: center-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 4>;
> +
> + trips {
> + center_crit: center-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + gpu_thermal: gpu-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 5>;
> +
> + trips {
> + gpu_crit: gpu-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + npu_thermal: npu-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 6>;
> +
> + trips {
> + npu_crit: npu-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> };
>
> saradc: adc@fec10000 {

2024-02-29 21:25:32

by Dragan Simic

Subject: Re: [PATCH v3 2/5] arm64: dts: rockchip: enable automatic active cooling on Rock 5B

Hello Alexey,

On 2024-02-29 20:26, Alexey Charkov wrote:
> This links the PWM fan on Radxa Rock 5B as an active cooling device
> managed automatically by the thermal subsystem, with a target SoC
> temperature of 65C and a minimum-spin interval from 55C to 65C to
> ensure airflow when the system gets warm

I'd suggest that you replace "automatic active cooling" with "active
cooling" in the patch subject. I know, it may seem like more of the
unnecessary nitpicking, :) but I hope you'll agree that "automatic"
is actually redundant there. It would also make the patch subject
a bit shorter.

Another option would be to replace "automatic active cooling" with
"automatic fan control", which may actually be a better choice.
I'd be happy with whichever one you prefer. :)

Otherwise, please feel free to add:

Reviewed-by: Dragan Simic <[email protected]>

> Signed-off-by: Alexey Charkov <[email protected]>
> ---
> arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30
> ++++++++++++++++++++++++-
> 1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> index a0e303c3a1dc..3f7fb055c4dc 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> @@ -52,7 +52,7 @@ led_rgb_b {
>
> fan: pwm-fan {
> compatible = "pwm-fan";
> - cooling-levels = <0 95 145 195 255>;
> + cooling-levels = <0 120 150 180 210 240 255>;
> fan-supply = <&vcc5v0_sys>;
> pwms = <&pwm1 0 50000 0>;
> #cooling-cells = <2>;
> @@ -173,6 +173,34 @@ &cpu_l3 {
> cpu-supply = <&vdd_cpu_lit_s0>;
> };
>
> +&package_thermal {
> + polling-delay = <1000>;
> +
> + trips {
> + package_fan0: package-fan0 {
> + temperature = <55000>;
> + hysteresis = <2000>;
> + type = "active";
> + };
> + package_fan1: package-fan1 {
> + temperature = <65000>;
> + hysteresis = <2000>;
> + type = "active";
> + };
> + };
> +
> + cooling-maps {
> + map1 {
> + trip = <&package_fan0>;
> + cooling-device = <&fan THERMAL_NO_LIMIT 1>;
> + };
> + map2 {
> + trip = <&package_fan1>;
> + cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
> + };
> + };
> +};
> +
> &i2c0 {
> pinctrl-names = "default";
> pinctrl-0 = <&i2c0m2_xfer>;

2024-02-29 19:28:47

by Alexey Charkov

Subject: [PATCH v3 5/5] arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs

This introduces additional OPPs that share the same voltage as
another OPP already present in the .dtsi but with lower frequency.

The idea is to try and limit system throughput more gradually upon
reaching the throttling condition for workloads that are close to
sustainable power already, thus avoiding needless performance loss.

My limited synthetic benchmarking [1] showed around a 3.8% performance
benefit when these are in place, all else being equal (not meant to
be comprehensive). However, dmesg complains about these OPPs being
'inefficient':

[ 9.009561] cpu cpu0: EM: OPP:816000 is inefficient
[ 9.009580] cpu cpu0: EM: OPP:600000 is inefficient
[ 9.009591] cpu cpu0: EM: OPP:408000 is inefficient
[ 9.011370] cpu cpu4: EM: OPP:2352000 is inefficient
[ 9.011379] cpu cpu4: EM: OPP:2304000 is inefficient
[ 9.011384] cpu cpu4: EM: OPP:2256000 is inefficient
[ 9.011389] cpu cpu4: EM: OPP:600000 is inefficient
[ 9.011393] cpu cpu4: EM: OPP:408000 is inefficient
[ 9.012978] cpu cpu6: EM: OPP:2352000 is inefficient
[ 9.012987] cpu cpu6: EM: OPP:2304000 is inefficient
[ 9.012992] cpu cpu6: EM: OPP:2256000 is inefficient
[ 9.012996] cpu cpu6: EM: OPP:600000 is inefficient
[ 9.013000] cpu cpu6: EM: OPP:408000 is inefficient

[1] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#me92aa0ee25e6eeb1d1501ce85f5af4e58b3b13c5

Signed-off-by: Alexey Charkov <[email protected]>
---
arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 87 +++++++++++++++++++++++++++++++
1 file changed, 87 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
index bd39c5c47bfb..6b4ecc7ab37d 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
@@ -360,6 +360,21 @@ cluster0_opp_table: opp-table-cluster0 {
compatible = "operating-points-v2";
opp-shared;

+ opp-408000000 {
+ opp-hz = /bits/ 64 <408000000>;
+ opp-microvolt = <675000 675000 950000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-600000000 {
+ opp-hz = /bits/ 64 <600000000>;
+ opp-microvolt = <675000 675000 950000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-816000000 {
+ opp-hz = /bits/ 64 <816000000>;
+ opp-microvolt = <675000 675000 950000>;
+ clock-latency-ns = <40000>;
+ };
opp-1008000000 {
opp-hz = /bits/ 64 <1008000000>;
opp-microvolt = <675000 675000 950000>;
@@ -392,6 +407,27 @@ cluster1_opp_table: opp-table-cluster1 {
compatible = "operating-points-v2";
opp-shared;

+ opp-408000000 {
+ opp-hz = /bits/ 64 <408000000>;
+ opp-microvolt = <675000 675000 1000000>;
+ clock-latency-ns = <40000>;
+ opp-suspend;
+ };
+ opp-600000000 {
+ opp-hz = /bits/ 64 <600000000>;
+ opp-microvolt = <675000 675000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-816000000 {
+ opp-hz = /bits/ 64 <816000000>;
+ opp-microvolt = <675000 675000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1008000000 {
+ opp-hz = /bits/ 64 <1008000000>;
+ opp-microvolt = <675000 675000 1000000>;
+ clock-latency-ns = <40000>;
+ };
opp-1200000000 {
opp-hz = /bits/ 64 <1200000000>;
opp-microvolt = <675000 675000 1000000>;
@@ -422,6 +458,21 @@ opp-2208000000 {
opp-microvolt = <987500 987500 1000000>;
clock-latency-ns = <40000>;
};
+ opp-2256000000 {
+ opp-hz = /bits/ 64 <2256000000>;
+ opp-microvolt = <1000000 1000000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-2304000000 {
+ opp-hz = /bits/ 64 <2304000000>;
+ opp-microvolt = <1000000 1000000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-2352000000 {
+ opp-hz = /bits/ 64 <2352000000>;
+ opp-microvolt = <1000000 1000000 1000000>;
+ clock-latency-ns = <40000>;
+ };
opp-2400000000 {
opp-hz = /bits/ 64 <2400000000>;
opp-microvolt = <1000000 1000000 1000000>;
@@ -433,6 +484,27 @@ cluster2_opp_table: opp-table-cluster2 {
compatible = "operating-points-v2";
opp-shared;

+ opp-408000000 {
+ opp-hz = /bits/ 64 <408000000>;
+ opp-microvolt = <675000 675000 1000000>;
+ clock-latency-ns = <40000>;
+ opp-suspend;
+ };
+ opp-600000000 {
+ opp-hz = /bits/ 64 <600000000>;
+ opp-microvolt = <675000 675000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-816000000 {
+ opp-hz = /bits/ 64 <816000000>;
+ opp-microvolt = <675000 675000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-1008000000 {
+ opp-hz = /bits/ 64 <1008000000>;
+ opp-microvolt = <675000 675000 1000000>;
+ clock-latency-ns = <40000>;
+ };
opp-1200000000 {
opp-hz = /bits/ 64 <1200000000>;
opp-microvolt = <675000 675000 1000000>;
@@ -463,6 +535,21 @@ opp-2208000000 {
opp-microvolt = <987500 987500 1000000>;
clock-latency-ns = <40000>;
};
+ opp-2256000000 {
+ opp-hz = /bits/ 64 <2256000000>;
+ opp-microvolt = <1000000 1000000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-2304000000 {
+ opp-hz = /bits/ 64 <2304000000>;
+ opp-microvolt = <1000000 1000000 1000000>;
+ clock-latency-ns = <40000>;
+ };
+ opp-2352000000 {
+ opp-hz = /bits/ 64 <2352000000>;
+ opp-microvolt = <1000000 1000000 1000000>;
+ clock-latency-ns = <40000>;
+ };
opp-2400000000 {
opp-hz = /bits/ 64 <2400000000>;
opp-microvolt = <1000000 1000000 1000000>;

--
2.44.0


2024-03-01 05:13:13

by Alexey Charkov

Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

Hi Dragan,

On Fri, Mar 1, 2024 at 12:21 AM Dragan Simic <[email protected]> wrote:
>
> Hello Alexey,
>
> On 2024-02-29 20:26, Alexey Charkov wrote:
> > Include thermal zones information in device tree for RK3588 variants.
> >
> > This also enables the TSADC controller unconditionally on all boards
> > to ensure that thermal protections are in place via throttling and
> > emergency reset, once OPPs are added to enable CPU DVFS.
> >
> > The default settings (using CRU as the emergency reset mechanism)
> > should work on all boards regardless of their wiring, as CRU resets
> > do not depend on any external components. Boards that have the TSHUT
> > signal wired to the reset line of the PMIC may opt to switch to GPIO
> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
>
> Quite frankly, I'm still not sure that enabling this on the SoC level
> is the way to go. As I already described in detail, [4] according to
> the RK3588 Hardware Design Guide v1.0 and the Rock 5B schematic, we
> should actually use GPIO-based handling for the thermal runaways on
> the Rock 5B. Other boards should also be investigated individually,
> and the TSADC should be enabled on a board-to-board basis.

With all due respect, I disagree. Here is why:
- Neither the schematic nor the hardware design guide, on which the
schematic seems to be based, prescribes a particular way to handle
thermal runaways. They only provide the possibility of GPIO based
resets, along with the CRU based one
- My strong belief is that defaults (regardless of context) should be
safe and reasonable, and should also minimize the need to override
them
- In the context of dts/dtsi, as far as I understand the general logic
behind the split, the SoC .dtsi should contain all the things that are
fully contained within the SoC and do not depend on the wiring of a
particular board or its target use case. Boards then
add/remove/override settings to match their wiring and use case more
closely

In light of the last two points, I believe that enabling TSADC by
default is the safer and more reasonable choice, because it provides
crucial thermal protection logic for the SoC, and it can do so in a
board-agnostic way (if the CRU based reset is selected, which is the
current default).

Furthermore, TSADC and CRU are fully contained within the SoC, and I
cannot think of a use case where a board might be somehow
disadvantaged by TSADC being enabled, and thus need to disable it
altogether (maybe I'm missing something). The only thing that the
board might be adjusting is the thermal reset handling, and even then
it's rather a matter of choice/preference to switch away from CRU to
GPIO resets where the wiring permits it, rather than an existential
need. I presume that a PMIC-assisted reset causes deeper power cycling
of the SoC and might therefore help in some rare cases where the CRU
reset alone is not enough, but that would be niche.

All summed up, I believe that the default of "fry my board if I have
no heatsink and forget to include &tsadc { status = "okay"; }; in my
dts" is substantially inferior to the default of "my board could do a
deep power-cycle in this weird corner-case thermal-runaway situation
that somehow didn't get handled by active cooling, then by passive
cooling, then by a CRU reset, but I didn't include
rockchip,hw-tshut-mode = <1>; so tough luck for me".
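
For illustration only, a board that does want the PMIC-assisted reset
could override the .dtsi default along these lines. This is a sketch,
not part of the series: the polarity value is an assumption and would
have to be checked against the board schematic.

```dts
/* Hypothetical board .dts override, assuming TSHUT is wired to the
 * reset line of the PMIC on this board
 */
&tsadc {
	/* 0 = CRU based reset (the .dtsi default), 1 = GPIO (TSHUT pin) */
	rockchip,hw-tshut-mode = <1>;
	/* 0 = active low, 1 = active high; assumed value, verify it */
	rockchip,hw-tshut-polarity = <1>;
};
```

Boards that leave this out simply stay on the CRU based reset.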

Would be great to hear other perspectives from people on the list.

Best regards,
Alexey

> [4]
> https://lore.kernel.org/linux-rockchip/[email protected]/
>
> > It seems though that downstream kernels don't use that, even for
> > those boards where the wiring allows for GPIO based tshut, such as
> > Radxa Rock 5B [1], [2], [3]
> >
> > [1]
> > https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
> > [2]
> > https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
> > [3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf
> > page 11 (TSADC_SHUT_H)
> >
> > Signed-off-by: Alexey Charkov <[email protected]>
> > ---
> > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176
> > +++++++++++++++++++++++++++++-
> > 1 file changed, 175 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > index 36b1b7acfe6a..9bf197358642 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > @@ -10,6 +10,7 @@
> > #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> > #include <dt-bindings/phy/phy.h>
> > #include <dt-bindings/ata/ahci.h>
> > +#include <dt-bindings/thermal/thermal.h>
> >
> > / {
> > compatible = "rockchip,rk3588";
> > @@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
> > pinctrl-1 = <&tsadc_shut>;
> > pinctrl-names = "gpio", "otpout";
> > #thermal-sensor-cells = <1>;
> > - status = "disabled";
> > + status = "okay";
> > + };
> > +
> > + thermal_zones: thermal-zones {
> > + /* sensor near the center of the SoC */
> > + package_thermal: package-thermal {
> > + polling-delay-passive = <0>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 0>;
> > +
> > + trips {
> > + package_crit: package-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + };
> > +
> > + /* sensor between A76 cores 0 and 1 */
> > + bigcore0_thermal: bigcore0-thermal {
> > + polling-delay-passive = <100>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 1>;
> > +
> > + trips {
> > + /* threshold to start collecting temperature
> > + * statistics e.g. with the IPA governor
> > + */
> > + bigcore0_alert0: bigcore0-alert0 {
> > + temperature = <75000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + /* actual control temperature */
> > + bigcore0_alert1: bigcore0-alert1 {
> > + temperature = <85000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + bigcore0_crit: bigcore0-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + cooling-maps {
> > + map0 {
> > + trip = <&bigcore0_alert1>;
> > + cooling-device =
> > + <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > + };
> > + };
> > + };
> > +
> > + /* sensor between A76 cores 2 and 3 */
> > + bigcore2_thermal: bigcore2-thermal {
> > + polling-delay-passive = <100>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 2>;
> > +
> > + trips {
> > + /* threshold to start collecting temperature
> > + * statistics e.g. with the IPA governor
> > + */
> > + bigcore2_alert0: bigcore2-alert0 {
> > + temperature = <75000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + /* actual control temperature */
> > + bigcore2_alert1: bigcore2-alert1 {
> > + temperature = <85000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + bigcore2_crit: bigcore2-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + cooling-maps {
> > + map0 {
> > + trip = <&bigcore2_alert1>;
> > + cooling-device =
> > + <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > + };
> > + };
> > + };
> > +
> > + /* sensor between the four A55 cores */
> > + little_core_thermal: littlecore-thermal {
> > + polling-delay-passive = <100>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 3>;
> > +
> > + trips {
> > + /* threshold to start collecting temperature
> > + * statistics e.g. with the IPA governor
> > + */
> > + littlecore_alert0: littlecore-alert0 {
> > + temperature = <75000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + /* actual control temperature */
> > + littlecore_alert1: littlecore-alert1 {
> > + temperature = <85000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + littlecore_crit: littlecore-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + cooling-maps {
> > + map0 {
> > + trip = <&littlecore_alert1>;
> > + cooling-device =
> > + <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > + };
> > + };
> > + };
> > +
> > + /* sensor near the PD_CENTER power domain */
> > + center_thermal: center-thermal {
> > + polling-delay-passive = <0>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 4>;
> > +
> > + trips {
> > + center_crit: center-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + };
> > +
> > + gpu_thermal: gpu-thermal {
> > + polling-delay-passive = <0>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 5>;
> > +
> > + trips {
> > + gpu_crit: gpu-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + };
> > +
> > + npu_thermal: npu-thermal {
> > + polling-delay-passive = <0>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 6>;
> > +
> > + trips {
> > + npu_crit: npu-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + };
> > };
> >
> > saradc: adc@fec10000 {

2024-03-01 05:22:37

by Alexey Charkov

Subject: Re: [PATCH v3 2/5] arm64: dts: rockchip: enable automatic active cooling on Rock 5B

On Fri, Mar 1, 2024 at 1:25 AM Dragan Simic <[email protected]> wrote:
>
> Hello Alexey,
>
> On 2024-02-29 20:26, Alexey Charkov wrote:
> > This links the PWM fan on Radxa Rock 5B as an active cooling device
> > managed automatically by the thermal subsystem, with a target SoC
> > temperature of 65C and a minimum-spin interval from 55C to 65C to
> > ensure airflow when the system gets warm
>
> I'd suggest that you replace "automatic active cooling" with "active
> cooling" in the patch subject. I know, it may seem like more of the
> unnecessary nitpicking, :) but I hope you'll agree that "automatic"
> is actually redundant there. It would also make the patch subject
> a bit shorter.
>
> Another option would be to replace "automatic active cooling" with
> "automatic fan control", which may actually be a better choice.
> I'd be happy with whichever one you prefer. :)

Sounds good to me, thanks!

> Otherwise, please feel free to add:
>
> Reviewed-by: Dragan Simic <[email protected]>

Thank you Dragan, much appreciated!

Best regards,
Alexey

2024-03-01 05:23:10

by Alexey Charkov

Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On Fri, Mar 1, 2024 at 1:11 AM Dragan Simic <[email protected]> wrote:
>
> Hello Alexey,
>
> Please see also some nitpicks below, which I forgot to mention in
> my earlier response. I'm sorry for that.
>
> On 2024-02-29 20:26, Alexey Charkov wrote:
> > Include thermal zones information in device tree for RK3588 variants.
> >
> > This also enables the TSADC controller unconditionally on all boards
> > to ensure that thermal protections are in place via throttling and
> > emergency reset, once OPPs are added to enable CPU DVFS.
> >
> > The default settings (using CRU as the emergency reset mechanism)
> > should work on all boards regardless of their wiring, as CRU resets
> > do not depend on any external components. Boards that have the TSHUT
> > signal wired to the reset line of the PMIC may opt to switch to GPIO
> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
> >
> > It seems though that downstream kernels don't use that, even for
> > those boards where the wiring allows for GPIO based tshut, such as
> > Radxa Rock 5B [1], [2], [3]
> >
> > [1]
> > https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
> > [2]
> > https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
> > [3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf
> > page 11 (TSADC_SHUT_H)
> >
> > Signed-off-by: Alexey Charkov <[email protected]>
> > ---
> > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176
> > +++++++++++++++++++++++++++++-
> > 1 file changed, 175 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > index 36b1b7acfe6a..9bf197358642 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > @@ -10,6 +10,7 @@
> > #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> > #include <dt-bindings/phy/phy.h>
> > #include <dt-bindings/ata/ahci.h>
> > +#include <dt-bindings/thermal/thermal.h>
> >
> > / {
> > compatible = "rockchip,rk3588";
> > @@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
> > pinctrl-1 = <&tsadc_shut>;
> > pinctrl-names = "gpio", "otpout";
> > #thermal-sensor-cells = <1>;
> > - status = "disabled";
> > + status = "okay";
> > + };
> > +
> > + thermal_zones: thermal-zones {
> > + /* sensor near the center of the SoC */
> > + package_thermal: package-thermal {
> > + polling-delay-passive = <0>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 0>;
> > +
> > + trips {
> > + package_crit: package-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + };
> > +
> > + /* sensor between A76 cores 0 and 1 */
> > + bigcore0_thermal: bigcore0-thermal {
> > + polling-delay-passive = <100>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 1>;
> > +
> > + trips {
> > + /* threshold to start collecting temperature
> > + * statistics e.g. with the IPA governor
> > + */
>
> See, I'm not a native English speaker, but I've spent a lot of time
> and effort improving my English skills. Thus, these comments may seem
> like unnecessary nitpicking, depending on how much someone pays
> attention to writing style in general, but I'll risk being annoying
> and state them anyway. :)
>
> The comment above could be written in a much more condensed form like
> this, which would also be a bit more accurate:
>
>
> /* IPA threshold, when IPA governor is used */
>
> IOW, we're writing all this for someone to read later, but we can
> perfectly reasonably expect some existing background knowledge from
> the readers, so we should be as concise as possible.

In fact, the power allocation governor code itself doesn't call those
trips threshold or target as your suggested wording would imply.
Instead, it calls them "switch on temperature" and "maximum desired
temperature" [1]. Maybe we can call them that in the comments (and
also avoid calling the governor IPA, because upstream code only calls
it a "power allocator").
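
As a sketch of how that wording could look on the big core trips
(values copied from the patch; only the comment text is changed, and
the exact phrasing is just one possibility):

```dts
trips {
	/* power allocator: switch on temperature */
	bigcore0_alert0: bigcore0-alert0 {
		temperature = <75000>;
		hysteresis = <2000>;
		type = "passive";
	};
	/* power allocator: maximum desired temperature */
	bigcore0_alert1: bigcore0-alert1 {
		temperature = <85000>;
		hysteresis = <2000>;
		type = "passive";
	};
};
```

The same two comments would then repeat for the bigcore2 and
littlecore zones.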

Best regards,
Alexey

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/thermal/gov_power_allocator.c#n483

> > + bigcore0_alert0: bigcore0-alert0 {
> > + temperature = <75000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + /* actual control temperature */
>
> Similarly to the above, I'd suggest this:
>
> /* IPA target, when IPA governor is used */
>
> Having such brief comments should make it all perfectly understandable
> to anyone who's already familiar with the way the IPA governor works.
> Everyone else should be welcome to read up a bit on IPA first.
>
> > + bigcore0_alert1: bigcore0-alert1 {
> > + temperature = <85000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + bigcore0_crit: bigcore0-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + cooling-maps {
> > + map0 {
> > + trip = <&bigcore0_alert1>;
> > + cooling-device =
> > + <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > + };
> > + };
> > + };
> > +
> > + /* sensor between A76 cores 2 and 3 */
> > + bigcore2_thermal: bigcore2-thermal {
> > + polling-delay-passive = <100>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 2>;
> > +
> > + trips {
> > + /* threshold to start collecting temperature
> > + * statistics e.g. with the IPA governor
> > + */
>
> The same as above.
>
> > + bigcore2_alert0: bigcore2-alert0 {
> > + temperature = <75000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + /* actual control temperature */
>
> The same as above.
>
> > + bigcore2_alert1: bigcore2-alert1 {
> > + temperature = <85000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + bigcore2_crit: bigcore2-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + cooling-maps {
> > + map0 {
> > + trip = <&bigcore2_alert1>;
> > + cooling-device =
> > + <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > + };
> > + };
> > + };
> > +
> > + /* sensor between the four A55 cores */
> > + little_core_thermal: littlecore-thermal {
> > + polling-delay-passive = <100>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 3>;
> > +
> > + trips {
> > + /* threshold to start collecting temperature
> > + * statistics e.g. with the IPA governor
> > + */
>
> The same as above.
>
> > + littlecore_alert0: littlecore-alert0 {
> > + temperature = <75000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + /* actual control temperature */
>
> The same as above.
>
> > + littlecore_alert1: littlecore-alert1 {
> > + temperature = <85000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + littlecore_crit: littlecore-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + cooling-maps {
> > + map0 {
> > + trip = <&littlecore_alert1>;
> > + cooling-device =
> > + <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > + };
> > + };
> > + };
> > +
> > + /* sensor near the PD_CENTER power domain */
> > + center_thermal: center-thermal {
> > + polling-delay-passive = <0>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 4>;
> > +
> > + trips {
> > + center_crit: center-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + };
> > +
> > + gpu_thermal: gpu-thermal {
> > + polling-delay-passive = <0>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 5>;
> > +
> > + trips {
> > + gpu_crit: gpu-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + };
> > +
> > + npu_thermal: npu-thermal {
> > + polling-delay-passive = <0>;
> > + polling-delay = <0>;
> > + thermal-sensors = <&tsadc 6>;
> > +
> > + trips {
> > + npu_crit: npu-crit {
> > + temperature = <115000>;
> > + hysteresis = <0>;
> > + type = "critical";
> > + };
> > + };
> > + };
> > };
> >
> > saradc: adc@fec10000 {

2024-03-01 05:57:21

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On 2024-03-01 06:12, Alexey Charkov wrote:
> On Fri, Mar 1, 2024 at 12:21 AM Dragan Simic <[email protected]>
> wrote:
>> On 2024-02-29 20:26, Alexey Charkov wrote:
>> > Include thermal zones information in device tree for RK3588 variants.
>> >
>> > This also enables the TSADC controller unconditionally on all boards
>> > to ensure that thermal protections are in place via throttling and
>> > emergency reset, once OPPs are added to enable CPU DVFS.
>> >
>> > The default settings (using CRU as the emergency reset mechanism)
>> > should work on all boards regardless of their wiring, as CRU resets
>> > do not depend on any external components. Boards that have the TSHUT
>> > signal wired to the reset line of the PMIC may opt to switch to GPIO
>> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
>>
>> Quite frankly, I'm still not sure that enabling this on the SoC level
>> is the way to go. As I already described in detail, [4] according to
>> the RK3588 Hardware Design Guide v1.0 and the Rock 5B schematic, we
>> should actually use GPIO-based handling for the thermal runaways on
>> the Rock 5B. Other boards should also be investigated individually,
>> and the TSADC should be enabled on a board-to-board basis.
>
> With all due respect, I disagree, here is why:
> - Neither the schematic nor the hardware design guide, on which the
> schematic seems to be based, prescribes a particular way to handle
> thermal runaways. They only provide the possibility of GPIO based
> resets, along with the CRU based one

Please note that other documents from Rockchip also exist. Below is
a link to a screenshot from the Thermal developer guide, version 1.0,
which describes the whole thing further. I believe it's obvious that
the thermal runaway is to be treated as a board-level feature.

- https://i.imgur.com/IJ6dSAc.png

To be fair, that version of the Thermal developer guide dates back to
2019, meaning that it technically applies to the RK3399, for example,
but the TSADC and reset circuitry design has basically remained the
same for the RK3588.

> - My strong belief is that defaults (regardless of context) should be
> safe and reasonable, and should also minimize the need to override
> them

Please note that the TSADC is disabled in the RK3399 SoC dtsi, so having
it disabled in the RK3588(s) SoC dtsi would provide some consistency.
Though, the RK3399 still does it in a safe way, by moving the OPPs into
a separate dtsi file, named rk3399-opp.dtsi, which the board dts files
then include together with enabling the TSADC.

If you agree, let's employ the same approach for the RK3588(s), by
having its OPPs defined in a separate file, named rk3588s-opp.dtsi, etc.
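
To illustrate, a board dts could then look roughly like the sketch
below. The file name rk3588s-opp.dtsi is only the proposal from this
message and doesn't exist yet, and the include list is an assumption:

```dts
/* Hypothetical board dts fragment; rk3588s-opp.dtsi is only a proposal */
#include "rk3588.dtsi"
#include "rk3588s-opp.dtsi"	/* would carry the CPU OPP tables */

/* The TSADC would stay disabled in the SoC dtsi, so the board enables it */
&tsadc {
	status = "okay";
};
```

That way, a board can't end up with the OPPs enabled and the TSADC
left disabled, unless its dts includes one and forgets the other.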

> - In context of dts/dtsi, as far as I understand the general logic
> behind the split, the SoC .dtsi should contain all the things that are
> fully contained within the SoC and do not depend on the wiring of a
> particular board or its target use case. Boards then
> add/remove/override settings to match their wiring and use case more
> closely

Of course, but the thermal shutdown is obviously a board-level feature,
which I described further above.
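
For example, a board that has the TSHUT signal wired to the reset line
of the PMIC could describe that in its dts roughly like this. It's just
a sketch, and the polarity value is an assumption that would need to be
checked against the board schematic:

```dts
/* Sketch of a board-level override; values are not verified */
&tsadc {
	rockchip,hw-tshut-mode = <1>;		/* 0 = CRU, 1 = GPIO */
	rockchip,hw-tshut-polarity = <1>;	/* 0 = low, 1 = high (assumed) */
};
```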

> In the light of the last two points, I believe that enabling TSADC by
> default is the more safe and reasonable choice, because it provides
> crucial thermal protection logic for the SoC, and it can do so in a
> board-agnostic way (if the CRU based reset is selected, which is the
> current default).
>
> Furthermore, TSADC and CRU are fully contained within the SoC, and I
> cannot think of a use case where a board might be somehow
> disadvantaged by TSADC being enabled, and thus need to disable it
> altogether (maybe I'm missing something). The only thing that the
> board might be adjusting is the thermal reset handling, and even then
> it's rather a matter of choice/preference to switch away from CRU to
> GPIO resets where the wiring permits it, rather than an existential
> need. I presume that a PMIC-assisted reset causes deeper power cycling
> of the SoC and might therefore help in some rare cases where the CRU
> reset alone is not enough, but that would be niche.
>
> All summed up, I believe that the default of "fry my board if I have
> no heatsink and forget to include &tsadc { status = "okay"; }; in my
> .dts" is substantially inferior to the default of "my board could do a
> deep power-cycle in this weird corner-case thermal-runaway situation
> that somehow didn't get handled by active cooling, then by passive
> cooling, then by a CRU reset, but I didn't include
> rockchip,hw-tshut-mode = <1>; so poor luck for me".

Please see my comments above, regarding the separate dtsi for the OPPs.
I think it's a win-win, and I hope you'll agree.

>> [4]
>> https://lore.kernel.org/linux-rockchip/[email protected]/
>>
>> > It seems though that downstream kernels don't use that, even for
>> > those boards where the wiring allows for GPIO based tshut, such as
>> > Radxa Rock 5B [1], [2], [3]
>> >
>> > [1]
>> > https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
>> > [2]
>> > https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
>> > [3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf
>> > page 11 (TSADC_SHUT_H)
>> >
>> > Signed-off-by: Alexey Charkov <[email protected]>
>> > ---
>> > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176
>> > +++++++++++++++++++++++++++++-
>> > 1 file changed, 175 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > index 36b1b7acfe6a..9bf197358642 100644
>> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > @@ -10,6 +10,7 @@
>> > #include <dt-bindings/reset/rockchip,rk3588-cru.h>
>> > #include <dt-bindings/phy/phy.h>
>> > #include <dt-bindings/ata/ahci.h>
>> > +#include <dt-bindings/thermal/thermal.h>
>> >
>> > / {
>> > compatible = "rockchip,rk3588";
>> > @@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
>> > pinctrl-1 = <&tsadc_shut>;
>> > pinctrl-names = "gpio", "otpout";
>> > #thermal-sensor-cells = <1>;
>> > - status = "disabled";
>> > + status = "okay";
>> > + };
>> > +
>> > + thermal_zones: thermal-zones {
>> > + /* sensor near the center of the SoC */
>> > + package_thermal: package-thermal {
>> > + polling-delay-passive = <0>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 0>;
>> > +
>> > + trips {
>> > + package_crit: package-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor between A76 cores 0 and 1 */
>> > + bigcore0_thermal: bigcore0-thermal {
>> > + polling-delay-passive = <100>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 1>;
>> > +
>> > + trips {
>> > + /* threshold to start collecting temperature
>> > + * statistics e.g. with the IPA governor
>> > + */
>> > + bigcore0_alert0: bigcore0-alert0 {
>> > + temperature = <75000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + /* actual control temperature */
>> > + bigcore0_alert1: bigcore0-alert1 {
>> > + temperature = <85000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + bigcore0_crit: bigcore0-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + cooling-maps {
>> > + map0 {
>> > + trip = <&bigcore0_alert1>;
>> > + cooling-device =
>> > + <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor between A76 cores 2 and 3 */
>> > + bigcore2_thermal: bigcore2-thermal {
>> > + polling-delay-passive = <100>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 2>;
>> > +
>> > + trips {
>> > + /* threshold to start collecting temperature
>> > + * statistics e.g. with the IPA governor
>> > + */
>> > + bigcore2_alert0: bigcore2-alert0 {
>> > + temperature = <75000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + /* actual control temperature */
>> > + bigcore2_alert1: bigcore2-alert1 {
>> > + temperature = <85000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + bigcore2_crit: bigcore2-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + cooling-maps {
>> > + map0 {
>> > + trip = <&bigcore2_alert1>;
>> > + cooling-device =
>> > + <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor between the four A55 cores */
>> > + little_core_thermal: littlecore-thermal {
>> > + polling-delay-passive = <100>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 3>;
>> > +
>> > + trips {
>> > + /* threshold to start collecting temperature
>> > + * statistics e.g. with the IPA governor
>> > + */
>> > + littlecore_alert0: littlecore-alert0 {
>> > + temperature = <75000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + /* actual control temperature */
>> > + littlecore_alert1: littlecore-alert1 {
>> > + temperature = <85000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + littlecore_crit: littlecore-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + cooling-maps {
>> > + map0 {
>> > + trip = <&littlecore_alert1>;
>> > + cooling-device =
>> > + <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor near the PD_CENTER power domain */
>> > + center_thermal: center-thermal {
>> > + polling-delay-passive = <0>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 4>;
>> > +
>> > + trips {
>> > + center_crit: center-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + };
>> > +
>> > + gpu_thermal: gpu-thermal {
>> > + polling-delay-passive = <0>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 5>;
>> > +
>> > + trips {
>> > + gpu_crit: gpu-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + };
>> > +
>> > + npu_thermal: npu-thermal {
>> > + polling-delay-passive = <0>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 6>;
>> > +
>> > + trips {
>> > + npu_crit: npu-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + };
>> > };
>> >
>> > saradc: adc@fec10000 {

2024-03-01 06:14:18

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On 2024-03-01 06:20, Alexey Charkov wrote:
> On Fri, Mar 1, 2024 at 1:11 AM Dragan Simic <[email protected]> wrote:
>> Please see also some nitpicks below, which I forgot to mention in
>> my earlier response. I'm sorry for that.
>>
>> On 2024-02-29 20:26, Alexey Charkov wrote:
>> > Include thermal zones information in device tree for RK3588 variants.
>> >
>> > This also enables the TSADC controller unconditionally on all boards
>> > to ensure that thermal protections are in place via throttling and
>> > emergency reset, once OPPs are added to enable CPU DVFS.
>> >
>> > The default settings (using CRU as the emergency reset mechanism)
>> > should work on all boards regardless of their wiring, as CRU resets
>> > do not depend on any external components. Boards that have the TSHUT
>> > signal wired to the reset line of the PMIC may opt to switch to GPIO
>> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
>> >
>> > It seems though that downstream kernels don't use that, even for
>> > those boards where the wiring allows for GPIO based tshut, such as
>> > Radxa Rock 5B [1], [2], [3]
>> >
>> > [1]
>> > https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
>> > [2]
>> > https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
>> > [3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf
>> > page 11 (TSADC_SHUT_H)
>> >
>> > Signed-off-by: Alexey Charkov <[email protected]>
>> > ---
>> > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176
>> > +++++++++++++++++++++++++++++-
>> > 1 file changed, 175 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > index 36b1b7acfe6a..9bf197358642 100644
>> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> > @@ -10,6 +10,7 @@
>> > #include <dt-bindings/reset/rockchip,rk3588-cru.h>
>> > #include <dt-bindings/phy/phy.h>
>> > #include <dt-bindings/ata/ahci.h>
>> > +#include <dt-bindings/thermal/thermal.h>
>> >
>> > / {
>> > compatible = "rockchip,rk3588";
>> > @@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
>> > pinctrl-1 = <&tsadc_shut>;
>> > pinctrl-names = "gpio", "otpout";
>> > #thermal-sensor-cells = <1>;
>> > - status = "disabled";
>> > + status = "okay";
>> > + };
>> > +
>> > + thermal_zones: thermal-zones {
>> > + /* sensor near the center of the SoC */
>> > + package_thermal: package-thermal {
>> > + polling-delay-passive = <0>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 0>;
>> > +
>> > + trips {
>> > + package_crit: package-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor between A76 cores 0 and 1 */
>> > + bigcore0_thermal: bigcore0-thermal {
>> > + polling-delay-passive = <100>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 1>;
>> > +
>> > + trips {
>> > + /* threshold to start collecting temperature
>> > + * statistics e.g. with the IPA governor
>> > + */
>>
>> See, I'm not a native English speaker, but I've spent a lot of time
>> and effort improving my English skills. Thus, perhaps these comments
>> may or may not seem like unnecessary nitpicking, depending on how much
>> someone pays attention to writing style in general, but I'll risk
>> being annoying and state these comments anyway. :)
>>
>> The comment above could be written in a much more condensed form like
>> this, which would also be a bit more accurate:
>>
>>
>> /* IPA threshold, when IPA governor is used */
>>
>> IOW, we're writing all this for someone to read later, but we should
>> (and can) perfectly reasonably expect some already existing background
>> knowledge from the readers. In other words, we should be as concise
>> as possible.
>
> In fact, the power allocation governor code itself doesn't call those
> trips threshold or target as your suggested wording would imply.
> Instead, it calls them "switch on temperature" and "maximum desired
> temperature" [1]. Maybe we can call them that in the comments (and
> also avoid calling the governor IPA, because upstream code only calls
> it a "power allocator").

Hmm, but "IPA" is still mentioned in exactly three places in the files
under drivers/thermal. I think that warrants the use of "IPA", which
is also widely used pretty much everywhere.

Perhaps a win-win would be to have only the very first of the comments
like this, to introduce "IPA" as an acronym:

/* Power allocator (IPA) thermal governor */
/* switch-on point, when IPA governor is used */

Next, "the target temperature" is mentioned more than a few times in
drivers/thermal/gov_power_allocator.c, which I believe makes the use
of "IPA target" perfectly valid. Actually, let's use "IPA target
temperature", if you agree, to make it self descriptive.

Finally, the threshold... Based on drivers/thermal/gov_power_allocator.c,
I think "IPA switch-on point" would be a good choice, which I already
used above in the proposed opening comment.
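
Put together, the trips for one of the zones might then be commented as
in the sketch below. The values are taken from the patch, and the
comment wording is just my proposal from above, nothing final:

```dts
trips {
	/* IPA switch-on point */
	bigcore0_alert0: bigcore0-alert0 {
		temperature = <75000>;
		hysteresis = <2000>;
		type = "passive";
	};
	/* IPA target temperature */
	bigcore0_alert1: bigcore0-alert1 {
		temperature = <85000>;
		hysteresis = <2000>;
		type = "passive";
	};
};
```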

> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/thermal/gov_power_allocator.c#n483
>
>> > + bigcore0_alert0: bigcore0-alert0 {
>> > + temperature = <75000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + /* actual control temperature */
>>
>> Similarly to the above, I'd suggest this:
>>
>> /* IPA target, when IPA governor is used */
>>
>> Having such brief comments should make it all perfectly understandable
>> to anyone who's already familiar with the way IPA governor works.
>> Everyone else should be welcome to read up a bit on IPA first.
>>
>> > + bigcore0_alert1: bigcore0-alert1 {
>> > + temperature = <85000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + bigcore0_crit: bigcore0-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + cooling-maps {
>> > + map0 {
>> > + trip = <&bigcore0_alert1>;
>> > + cooling-device =
>> > + <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor between A76 cores 2 and 3 */
>> > + bigcore2_thermal: bigcore2-thermal {
>> > + polling-delay-passive = <100>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 2>;
>> > +
>> > + trips {
>> > + /* threshold to start collecting temperature
>> > + * statistics e.g. with the IPA governor
>> > + */
>>
>> The same as above.
>>
>> > + bigcore2_alert0: bigcore2-alert0 {
>> > + temperature = <75000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + /* actual control temperature */
>>
>> The same as above.
>>
>> > + bigcore2_alert1: bigcore2-alert1 {
>> > + temperature = <85000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + bigcore2_crit: bigcore2-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + cooling-maps {
>> > + map0 {
>> > + trip = <&bigcore2_alert1>;
>> > + cooling-device =
>> > + <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor between the four A55 cores */
>> > + little_core_thermal: littlecore-thermal {
>> > + polling-delay-passive = <100>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 3>;
>> > +
>> > + trips {
>> > + /* threshold to start collecting temperature
>> > + * statistics e.g. with the IPA governor
>> > + */
>>
>> The same as above.
>>
>> > + littlecore_alert0: littlecore-alert0 {
>> > + temperature = <75000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + /* actual control temperature */
>>
>> The same as above.
>>
>> > + littlecore_alert1: littlecore-alert1 {
>> > + temperature = <85000>;
>> > + hysteresis = <2000>;
>> > + type = "passive";
>> > + };
>> > + littlecore_crit: littlecore-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + cooling-maps {
>> > + map0 {
>> > + trip = <&littlecore_alert1>;
>> > + cooling-device =
>> > + <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>> > + <&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>> > + };
>> > + };
>> > + };
>> > +
>> > + /* sensor near the PD_CENTER power domain */
>> > + center_thermal: center-thermal {
>> > + polling-delay-passive = <0>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 4>;
>> > +
>> > + trips {
>> > + center_crit: center-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + };
>> > +
>> > + gpu_thermal: gpu-thermal {
>> > + polling-delay-passive = <0>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 5>;
>> > +
>> > + trips {
>> > + gpu_crit: gpu-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + };
>> > +
>> > + npu_thermal: npu-thermal {
>> > + polling-delay-passive = <0>;
>> > + polling-delay = <0>;
>> > + thermal-sensors = <&tsadc 6>;
>> > +
>> > + trips {
>> > + npu_crit: npu-crit {
>> > + temperature = <115000>;
>> > + hysteresis = <0>;
>> > + type = "critical";
>> > + };
>> > + };
>> > + };
>> > };
>> >
>> > saradc: adc@fec10000 {

2024-03-01 06:17:54

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v3 2/5] arm64: dts: rockchip: enable automatic active cooling on Rock 5B

On 2024-03-01 06:21, Alexey Charkov wrote:
> On Fri, Mar 1, 2024 at 1:25 AM Dragan Simic <[email protected]> wrote:
>> On 2024-02-29 20:26, Alexey Charkov wrote:
>> > This links the PWM fan on Radxa Rock 5B as an active cooling device
>> > managed automatically by the thermal subsystem, with a target SoC
>> > temperature of 65C and a minimum-spin interval from 55C to 65C to
>> > ensure airflow when the system gets warm
>>
>> I'd suggest that you replace "automatic active cooling" with "active
>> cooling" in the patch subject. I know, it may seem like more of the
>> unnecessary nitpicking, :) but I hope you'll agree that "automatic"
>> is actually redundant there. It would also make the patch subject
>> a bit shorter.
>>
>> Another option would be to replace "automatic active cooling" with
>> "automatic fan control", which may actually be a better choice.
>> I'd be happy with whichever one you prefer. :)
>
> Sounds good to me, thanks!

I'm glad that you like it. :)

>> Otherwise, please feel free to add:
>>
>> Reviewed-by: Dragan Simic <[email protected]>
>
> Thank you Dragan, much appreciated!

Thank you for putting up with my nitpicking. :)

2024-03-01 06:31:44

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v3 4/5] arm64: dts: rockchip: Add OPP data for CPU cores on RK3588

On 2024-02-29 20:26, Alexey Charkov wrote:
> By default the CPUs on RK3588 start up in a conservative performance
> mode. Add frequency and voltage mappings to the device tree to enable
> dynamic scaling via cpufreq.
>
> OPP values are adapted from Radxa's downstream kernel for Rock 5B [1],
> stripping them down to the minimum frequency and voltage combinations
> as expected by the generic upstream cpufreq-dt driver, and also
> dropping those OPPs that don't differ in voltage but only in frequency
> (keeping the top frequency OPP in each case).

Please, let's consider extracting the OPPs into a separate
rk3588s-opp.dtsi file, as I already explained in detail and proposed in
the message linked below. To be fair, it might also be seen as
redundant, since the RK3399 does that because it needs different OPPs
for different RK3399 SoC variants, but it would make leaving the TSADC
disabled 100% safe.

Though, the RK3328 SoC dtsi also leaves the TSADC disabled on the SoC
level and enables it for each RK3328-based board, so I'm no longer sure
whether we really need a separate rk3588s-opp.dtsi file to be on the
100% safe side with the TSADC disabled on the RK3588(s) SoC level.

- https://lore.kernel.org/linux-rockchip/[email protected]/
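
For reference, should we go that route, the separate file could be
shaped roughly like the sketch below, which only moves what this patch
already adds. The file name is just the proposal from above, and only
the first little-core OPP and CPU are shown, with the rest following
the same pattern:

```dts
/* Hypothetical rk3588s-opp.dtsi, abridged */
/ {
	cluster0_opp_table: opp-table-cluster0 {
		compatible = "operating-points-v2";
		opp-shared;

		opp-1008000000 {
			opp-hz = /bits/ 64 <1008000000>;
			opp-microvolt = <675000 675000 950000>;
			clock-latency-ns = <40000>;
		};
		/* remaining OPPs as in this patch */
	};
};

&cpu_l0 {
	operating-points-v2 = <&cluster0_opp_table>;
};
```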

> Note that this patch ignores voltage scaling for the CPU memory
> interface which the downstream kernel does through a custom cpufreq
> driver, and which is why the downstream version has two sets of voltage
> values for each OPP (the second one being meant for the memory
> interface supply regulator). This is done instead via regulator
> coupling between CPU and memory interface supplies on affected boards.

I'm still digging through various documents, to find a clearer
explanation of what exactly those *_MEM_* voltages are for. I'll reply
in more detail in the respective patch thread, of course.

> This has been tested on Rock 5B with u-boot 2023.11 compiled from
> Collabora's integration tree [2] with binary bl31 and appears to be
> stable both under active cooling and passive cooling (with throttling)
>
> [1]
> https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> [2]
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/u-boot
>
> Signed-off-by: Alexey Charkov <[email protected]>
> ---
> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 122
> ++++++++++++++++++++++++++++++
> 1 file changed, 122 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> index 9bf197358642..bd39c5c47bfb 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> @@ -97,6 +97,7 @@ cpu_l0: cpu@0 {
> clocks = <&scmi_clk SCMI_CLK_CPUL>;
> assigned-clocks = <&scmi_clk SCMI_CLK_CPUL>;
> assigned-clock-rates = <816000000>;
> + operating-points-v2 = <&cluster0_opp_table>;
> cpu-idle-states = <&CPU_SLEEP>;
> i-cache-size = <32768>;
> i-cache-line-size = <64>;
> @@ -116,6 +117,7 @@ cpu_l1: cpu@100 {
> enable-method = "psci";
> capacity-dmips-mhz = <530>;
> clocks = <&scmi_clk SCMI_CLK_CPUL>;
> + operating-points-v2 = <&cluster0_opp_table>;
> cpu-idle-states = <&CPU_SLEEP>;
> i-cache-size = <32768>;
> i-cache-line-size = <64>;
> @@ -135,6 +137,7 @@ cpu_l2: cpu@200 {
> enable-method = "psci";
> capacity-dmips-mhz = <530>;
> clocks = <&scmi_clk SCMI_CLK_CPUL>;
> + operating-points-v2 = <&cluster0_opp_table>;
> cpu-idle-states = <&CPU_SLEEP>;
> i-cache-size = <32768>;
> i-cache-line-size = <64>;
> @@ -154,6 +157,7 @@ cpu_l3: cpu@300 {
> enable-method = "psci";
> capacity-dmips-mhz = <530>;
> clocks = <&scmi_clk SCMI_CLK_CPUL>;
> + operating-points-v2 = <&cluster0_opp_table>;
> cpu-idle-states = <&CPU_SLEEP>;
> i-cache-size = <32768>;
> i-cache-line-size = <64>;
> @@ -175,6 +179,7 @@ cpu_b0: cpu@400 {
> clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> assigned-clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> assigned-clock-rates = <816000000>;
> + operating-points-v2 = <&cluster1_opp_table>;
> cpu-idle-states = <&CPU_SLEEP>;
> i-cache-size = <65536>;
> i-cache-line-size = <64>;
> @@ -194,6 +199,7 @@ cpu_b1: cpu@500 {
> enable-method = "psci";
> capacity-dmips-mhz = <1024>;
> clocks = <&scmi_clk SCMI_CLK_CPUB01>;
> + operating-points-v2 = <&cluster1_opp_table>;
> cpu-idle-states = <&CPU_SLEEP>;
> i-cache-size = <65536>;
> i-cache-line-size = <64>;
> @@ -215,6 +221,7 @@ cpu_b2: cpu@600 {
> clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> assigned-clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> assigned-clock-rates = <816000000>;
> + operating-points-v2 = <&cluster2_opp_table>;
> cpu-idle-states = <&CPU_SLEEP>;
> i-cache-size = <65536>;
> i-cache-line-size = <64>;
> @@ -234,6 +241,7 @@ cpu_b3: cpu@700 {
> enable-method = "psci";
> capacity-dmips-mhz = <1024>;
> clocks = <&scmi_clk SCMI_CLK_CPUB23>;
> + operating-points-v2 = <&cluster2_opp_table>;
> cpu-idle-states = <&CPU_SLEEP>;
> i-cache-size = <65536>;
> i-cache-line-size = <64>;
> @@ -348,6 +356,120 @@ l3_cache: l3-cache {
> };
> };
>
> + cluster0_opp_table: opp-table-cluster0 {
> + compatible = "operating-points-v2";
> + opp-shared;
> +
> + opp-1008000000 {
> + opp-hz = /bits/ 64 <1008000000>;
> + opp-microvolt = <675000 675000 950000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1200000000 {
> + opp-hz = /bits/ 64 <1200000000>;
> + opp-microvolt = <712500 712500 950000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1416000000 {
> + opp-hz = /bits/ 64 <1416000000>;
> + opp-microvolt = <762500 762500 950000>;
> + clock-latency-ns = <40000>;
> + opp-suspend;
> + };
> + opp-1608000000 {
> + opp-hz = /bits/ 64 <1608000000>;
> + opp-microvolt = <850000 850000 950000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1800000000 {
> + opp-hz = /bits/ 64 <1800000000>;
> + opp-microvolt = <950000 950000 950000>;
> + clock-latency-ns = <40000>;
> + };
> + };
> +
> + cluster1_opp_table: opp-table-cluster1 {
> + compatible = "operating-points-v2";
> + opp-shared;
> +
> + opp-1200000000 {
> + opp-hz = /bits/ 64 <1200000000>;
> + opp-microvolt = <675000 675000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1416000000 {
> + opp-hz = /bits/ 64 <1416000000>;
> + opp-microvolt = <725000 725000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1608000000 {
> + opp-hz = /bits/ 64 <1608000000>;
> + opp-microvolt = <762500 762500 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1800000000 {
> + opp-hz = /bits/ 64 <1800000000>;
> + opp-microvolt = <850000 850000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-2016000000 {
> + opp-hz = /bits/ 64 <2016000000>;
> + opp-microvolt = <925000 925000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-2208000000 {
> + opp-hz = /bits/ 64 <2208000000>;
> + opp-microvolt = <987500 987500 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-2400000000 {
> + opp-hz = /bits/ 64 <2400000000>;
> + opp-microvolt = <1000000 1000000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + };
> +
> + cluster2_opp_table: opp-table-cluster2 {
> + compatible = "operating-points-v2";
> + opp-shared;
> +
> + opp-1200000000 {
> + opp-hz = /bits/ 64 <1200000000>;
> + opp-microvolt = <675000 675000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1416000000 {
> + opp-hz = /bits/ 64 <1416000000>;
> + opp-microvolt = <725000 725000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1608000000 {
> + opp-hz = /bits/ 64 <1608000000>;
> + opp-microvolt = <762500 762500 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1800000000 {
> + opp-hz = /bits/ 64 <1800000000>;
> + opp-microvolt = <850000 850000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-2016000000 {
> + opp-hz = /bits/ 64 <2016000000>;
> + opp-microvolt = <925000 925000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-2208000000 {
> + opp-hz = /bits/ 64 <2208000000>;
> + opp-microvolt = <987500 987500 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-2400000000 {
> + opp-hz = /bits/ 64 <2400000000>;
> + opp-microvolt = <1000000 1000000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + };
> +
> firmware {
> optee: optee {
> compatible = "linaro,optee-tz";

2024-03-01 06:36:28

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v3 5/5] arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs

On 2024-02-29 20:26, Alexey Charkov wrote:
> This introduces additional OPPs that share the same voltage as
> another OPP already present in the .dtsi but with lower frequency.
>
> The idea is to try and limit system throughput more gradually upon
> reaching the throttling condition for workloads that are close to
> sustainable power already, thus avoiding needless performance loss.
>
> My limited synthetic benchmarking [1] showed around 3.8% performance
> benefit when these are in place, other things equal (not meant to
> be comprehensive). Though dmesg complains about these OPPs being
> 'inefficient':

As I already promised, I'll perform additional testing, in a reproducible
way, and come back with a detailed report.

> [ 9.009561] cpu cpu0: EM: OPP:816000 is inefficient
> [ 9.009580] cpu cpu0: EM: OPP:600000 is inefficient
> [ 9.009591] cpu cpu0: EM: OPP:408000 is inefficient
> [ 9.011370] cpu cpu4: EM: OPP:2352000 is inefficient
> [ 9.011379] cpu cpu4: EM: OPP:2304000 is inefficient
> [ 9.011384] cpu cpu4: EM: OPP:2256000 is inefficient
> [ 9.011389] cpu cpu4: EM: OPP:600000 is inefficient
> [ 9.011393] cpu cpu4: EM: OPP:408000 is inefficient
> [ 9.012978] cpu cpu6: EM: OPP:2352000 is inefficient
> [ 9.012987] cpu cpu6: EM: OPP:2304000 is inefficient
> [ 9.012992] cpu cpu6: EM: OPP:2256000 is inefficient
> [ 9.012996] cpu cpu6: EM: OPP:600000 is inefficient
> [ 9.013000] cpu cpu6: EM: OPP:408000 is inefficient
>
> [1]
> https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#me92aa0ee25e6eeb1d1501ce85f5af4e58b3b13c5
>
> Signed-off-by: Alexey Charkov <[email protected]>
> ---
> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 87
> +++++++++++++++++++++++++++++++
> 1 file changed, 87 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> index bd39c5c47bfb..6b4ecc7ab37d 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> @@ -360,6 +360,21 @@ cluster0_opp_table: opp-table-cluster0 {
> compatible = "operating-points-v2";
> opp-shared;
>
> + opp-408000000 {
> + opp-hz = /bits/ 64 <408000000>;
> + opp-microvolt = <675000 675000 950000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-600000000 {
> + opp-hz = /bits/ 64 <600000000>;
> + opp-microvolt = <675000 675000 950000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-816000000 {
> + opp-hz = /bits/ 64 <816000000>;
> + opp-microvolt = <675000 675000 950000>;
> + clock-latency-ns = <40000>;
> + };
> opp-1008000000 {
> opp-hz = /bits/ 64 <1008000000>;
> opp-microvolt = <675000 675000 950000>;
> @@ -392,6 +407,27 @@ cluster1_opp_table: opp-table-cluster1 {
> compatible = "operating-points-v2";
> opp-shared;
>
> + opp-408000000 {
> + opp-hz = /bits/ 64 <408000000>;
> + opp-microvolt = <675000 675000 1000000>;
> + clock-latency-ns = <40000>;
> + opp-suspend;
> + };
> + opp-600000000 {
> + opp-hz = /bits/ 64 <600000000>;
> + opp-microvolt = <675000 675000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-816000000 {
> + opp-hz = /bits/ 64 <816000000>;
> + opp-microvolt = <675000 675000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1008000000 {
> + opp-hz = /bits/ 64 <1008000000>;
> + opp-microvolt = <675000 675000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> opp-1200000000 {
> opp-hz = /bits/ 64 <1200000000>;
> opp-microvolt = <675000 675000 1000000>;
> @@ -422,6 +458,21 @@ opp-2208000000 {
> opp-microvolt = <987500 987500 1000000>;
> clock-latency-ns = <40000>;
> };
> + opp-2256000000 {
> + opp-hz = /bits/ 64 <2256000000>;
> + opp-microvolt = <1000000 1000000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-2304000000 {
> + opp-hz = /bits/ 64 <2304000000>;
> + opp-microvolt = <1000000 1000000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-2352000000 {
> + opp-hz = /bits/ 64 <2352000000>;
> + opp-microvolt = <1000000 1000000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> opp-2400000000 {
> opp-hz = /bits/ 64 <2400000000>;
> opp-microvolt = <1000000 1000000 1000000>;
> @@ -433,6 +484,27 @@ cluster2_opp_table: opp-table-cluster2 {
> compatible = "operating-points-v2";
> opp-shared;
>
> + opp-408000000 {
> + opp-hz = /bits/ 64 <408000000>;
> + opp-microvolt = <675000 675000 1000000>;
> + clock-latency-ns = <40000>;
> + opp-suspend;
> + };
> + opp-600000000 {
> + opp-hz = /bits/ 64 <600000000>;
> + opp-microvolt = <675000 675000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-816000000 {
> + opp-hz = /bits/ 64 <816000000>;
> + opp-microvolt = <675000 675000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-1008000000 {
> + opp-hz = /bits/ 64 <1008000000>;
> + opp-microvolt = <675000 675000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> opp-1200000000 {
> opp-hz = /bits/ 64 <1200000000>;
> opp-microvolt = <675000 675000 1000000>;
> @@ -463,6 +535,21 @@ opp-2208000000 {
> opp-microvolt = <987500 987500 1000000>;
> clock-latency-ns = <40000>;
> };
> + opp-2256000000 {
> + opp-hz = /bits/ 64 <2256000000>;
> + opp-microvolt = <1000000 1000000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-2304000000 {
> + opp-hz = /bits/ 64 <2304000000>;
> + opp-microvolt = <1000000 1000000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> + opp-2352000000 {
> + opp-hz = /bits/ 64 <2352000000>;
> + opp-microvolt = <1000000 1000000 1000000>;
> + clock-latency-ns = <40000>;
> + };
> opp-2400000000 {
> opp-hz = /bits/ 64 <2400000000>;
> opp-microvolt = <1000000 1000000 1000000>;

2024-03-01 07:54:08

by Alexey Charkov

Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On Fri, Mar 1, 2024 at 10:14 AM Dragan Simic <[email protected]> wrote:
>
> On 2024-03-01 06:20, Alexey Charkov wrote:
> > On Fri, Mar 1, 2024 at 1:11 AM Dragan Simic <[email protected]> wrote:
> >> Please see also some nitpicks below, which I forgot to mention in
> >> my earlier response. I'm sorry for that.
> >>
> >> On 2024-02-29 20:26, Alexey Charkov wrote:
> >> > Include thermal zones information in device tree for RK3588 variants.
> >> >
> >> > This also enables the TSADC controller unconditionally on all boards
> >> > to ensure that thermal protections are in place via throttling and
> >> > emergency reset, once OPPs are added to enable CPU DVFS.
> >> >
> >> > The default settings (using CRU as the emergency reset mechanism)
> >> > should work on all boards regardless of their wiring, as CRU resets
> >> > do not depend on any external components. Boards that have the TSHUT
> >> > signal wired to the reset line of the PMIC may opt to switch to GPIO
> >> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
> >> >
> >> > It seems though that downstream kernels don't use that, even for
> >> > those boards where the wiring allows for GPIO based tshut, such as
> >> > Radxa Rock 5B [1], [2], [3]
> >> >
> >> > [1]
> >> > https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
> >> > [2]
> >> > https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
> >> > [3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf
> >> > page 11 (TSADC_SHUT_H)
> >> >
> >> > Signed-off-by: Alexey Charkov <[email protected]>
> >> > ---
> >> > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176
> >> > +++++++++++++++++++++++++++++-
> >> > 1 file changed, 175 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> > b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> > index 36b1b7acfe6a..9bf197358642 100644
> >> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> >> > @@ -10,6 +10,7 @@
> >> > #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> >> > #include <dt-bindings/phy/phy.h>
> >> > #include <dt-bindings/ata/ahci.h>
> >> > +#include <dt-bindings/thermal/thermal.h>
> >> >
> >> > / {
> >> > compatible = "rockchip,rk3588";
> >> > @@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
> >> > pinctrl-1 = <&tsadc_shut>;
> >> > pinctrl-names = "gpio", "otpout";
> >> > #thermal-sensor-cells = <1>;
> >> > - status = "disabled";
> >> > + status = "okay";
> >> > + };
> >> > +
> >> > + thermal_zones: thermal-zones {
> >> > + /* sensor near the center of the SoC */
> >> > + package_thermal: package-thermal {
> >> > + polling-delay-passive = <0>;
> >> > + polling-delay = <0>;
> >> > + thermal-sensors = <&tsadc 0>;
> >> > +
> >> > + trips {
> >> > + package_crit: package-crit {
> >> > + temperature = <115000>;
> >> > + hysteresis = <0>;
> >> > + type = "critical";
> >> > + };
> >> > + };
> >> > + };
> >> > +
> >> > + /* sensor between A76 cores 0 and 1 */
> >> > + bigcore0_thermal: bigcore0-thermal {
> >> > + polling-delay-passive = <100>;
> >> > + polling-delay = <0>;
> >> > + thermal-sensors = <&tsadc 1>;
> >> > +
> >> > + trips {
> >> > + /* threshold to start collecting temperature
> >> > + * statistics e.g. with the IPA governor
> >> > + */
> >>
> >> See, I'm not a native English speaker, but I've spent a lot of time
> >> and effort improving my English skills. Thus, perhaps these comments
> >> may or may not seem like unnecessary nitpicking, depending on how much
> >> someone pays attention to writing style in general, but I'll risk
> >> being annoying and state these comments anyway. :)
> >>
> >> The comment above could be written in a much more condensed form like
> >> this, which would also be a bit more accurate:
> >>
> >>
> >> /* IPA threshold, when IPA governor is used */
> >>
> >> IOW, we're writing all this for someone to read later, but we should
> >> (and can) perfectly reasonably expect some already existing background
> >> knowledge from the readers. In other words, we should be as concise
> >> as possible.
> >
> > In fact, the power allocation governor code itself doesn't call those
> > trips threshold or target as your suggested wording would imply.
> > Instead, it calls them "switch on temperature" and "maximum desired
> > temperature" [1]. Maybe we can call them that in the comments (and
> > also avoid calling the governor IPA, because upstream code only calls
> > it a "power allocator").
>
> Hmm, but "IPA" is still mentioned in exactly three places in the files
> under drivers/thermal. I think that warrants the use of "IPA", which
> is also widely used pretty much everywhere.
>
> Perhaps a win-win would be to have only the very first of the comments
> like this, to introduce "IPA" as an acronym:
>
> /* Power allocator (IPA) thermal governor */
> /* switch-on point, when IPA governor is used */

Yes, good point, thanks!

> Next, "the target temperature" is mentioned more than a few times in
> drivers/thermal/gov_power_allocator.c, which I believe makes the use
> of "IPA target" perfectly valid. Actually, let's use "IPA target
> temperature", if you agree, to make it self descriptive.

Or perhaps simply "target temperature"? Stepwise governor will also
use this trip as its target, so it's not IPA specific, unlike the
switch-on point.

> Finally, the threshold... Based on
> drivers/thermal/gov_power_allocator.c,
> I think "IPA switch-on point" would be a good choice, which I already
> used above in the proposed opening comment.

Agreed, that sounds good to me, will reflect in the next iteration.
Thanks for bringing it up!

Best,
Alexey

2024-03-01 08:13:49

by Dragan Simic

Subject: Re: [PATCH v3 3/5] arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588

On 2024-02-29 20:26, Alexey Charkov wrote:
> RK3588 chips allow for their CPU cores to be powered by a different
> supply vs. their corresponding memory interfaces, and two of the
> boards currently upstream do that (EVB1 and QuartzPro64).

The only reasonable explanation, based on the Cortex-A55 and Cortex-A76
technical reference manuals (TRMs), and some other documents, including
the RK3588 hardware design guide (HDG), is that the VDD_CPU_BIG0_MEM_S0,
VDD_CPU_BIG1_MEM_S0 and VDD_CPU_LIT_MEM_S0 voltages are internally
used as the supplies for the SRAM used for the A76's and A55's L1 and
L2 caches, which are both per-core and private in the DynamIQ SoC layout
that the RK3588 is based on.

Sure, using "MEM" there is confusing, but actually, the Cortex-A55 and
Cortex-A76 TRMs refer to the L1 and L2 caches as "memory" in multiple
places. I'd say that's the reason for "MEM" (and "memory", in the RK3588
HDG) to be used in the board schematics (and in the RK3588 HDG).

The RK3588 HDG specifically allows what the Rock 5B does there, i.e. to
basically short the RK3588's individual *_MEM_S0 power inputs to the
respective CPU core power supplies, which avoids the need to use separate
voltage regulators for the RK3588's *_MEM_S0 power inputs.

However, I'd really, _really_ love to know why Rockchip opted to make
the power supply voltages separate for the RK3588's L1 and L2 caches,
which are, BTW, rated for up to 100 mA for each *_MEM_S0 input, meaning
that they present no large loads. All that under the assumption that
my analysis is correct, of course.

> The voltage of the memory interface though has to match that of the
> CPU cores that use it, which downstream kernels achieve by the means
> of a custom cpufreq driver which adjusts both at the same time.
>
> It seems that regulator coupling is a more appropriate generic
> interface for it, so this patch introduces coupling to affected
> device trees to ensure that memory interface voltage is also updated
> whenever cpufreq switches between CPU OPPs.

I'll verify this a bit later and provide a separate response.

> Note that other boards, such as Radxa Rock 5B, define both the CPU
> and memory interface regulators as aliases to the same DT node, so
> this doesn't apply there.

Yup, they're actually shorted on the Rock 5B, as I described above.
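
A minimal sketch of the shorted arrangement, assuming a single external
regulator feeds both the core rail and its *_MEM_S0 input. The bus, node
name, address, compatible string and voltage limits below are placeholders
for illustration, not copied from the Rock 5B device tree:

```dts
/*
 * Illustrative only: one physical regulator carrying two labels, so that
 * the CPU supply and its "memory" (cache SRAM) supply both resolve to the
 * same node and no regulator coupling is needed.
 */
&i2c0 {
	vdd_cpu_big0_s0: vdd_cpu_big0_mem_s0: regulator@42 {
		compatible = "rockchip,rk8602";	/* placeholder */
		reg = <0x42>;			/* placeholder */
		regulator-name = "vdd_cpu_big0_s0";
		regulator-always-on;
		regulator-boot-on;
		regulator-min-microvolt = <550000>;	/* placeholder */
		regulator-max-microvolt = <1050000>;	/* placeholder */
	};
};
```

With both labels on one node, a reference through either label points at
the same rail, which is why the coupling properties only matter on boards
with physically separate regulators.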

> Signed-off-by: Alexey Charkov <[email protected]>
> ---
> arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 ++++++++++++
> arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 ++++++++++++
> 2 files changed, 24 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
> b/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
> index de30c2632b8e..dfae67f1e9c7 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
> +++ b/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
> @@ -788,6 +788,8 @@ regulators {
> vdd_cpu_big1_s0: dcdc-reg1 {
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_big1_mem_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <550000>;
> regulator-max-microvolt = <1050000>;
> regulator-ramp-delay = <12500>;
> @@ -800,6 +802,8 @@ regulator-state-mem {
> vdd_cpu_big0_s0: dcdc-reg2 {
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_big0_mem_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <550000>;
> regulator-max-microvolt = <1050000>;
> regulator-ramp-delay = <12500>;
> @@ -812,6 +816,8 @@ regulator-state-mem {
> vdd_cpu_lit_s0: dcdc-reg3 {
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_lit_mem_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <550000>;
> regulator-max-microvolt = <950000>;
> regulator-ramp-delay = <12500>;
> @@ -836,6 +842,8 @@ regulator-state-mem {
> vdd_cpu_big1_mem_s0: dcdc-reg5 {
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_big1_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <675000>;
> regulator-max-microvolt = <1050000>;
> regulator-ramp-delay = <12500>;
> @@ -849,6 +857,8 @@ regulator-state-mem {
> vdd_cpu_big0_mem_s0: dcdc-reg6 {
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_big0_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <675000>;
> regulator-max-microvolt = <1050000>;
> regulator-ramp-delay = <12500>;
> @@ -873,6 +883,8 @@ regulator-state-mem {
> vdd_cpu_lit_mem_s0: dcdc-reg8 {
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_lit_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <675000>;
> regulator-max-microvolt = <950000>;
> regulator-ramp-delay = <12500>;
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
> b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
> index 87a0abf95f7d..9c038450cd7c 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
> +++ b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
> @@ -818,6 +818,8 @@ vdd_cpu_big1_s0: dcdc-reg1 {
> regulator-name = "vdd_cpu_big1_s0";
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_big1_mem_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <550000>;
> regulator-max-microvolt = <1050000>;
> regulator-ramp-delay = <12500>;
> @@ -831,6 +833,8 @@ vdd_cpu_big0_s0: dcdc-reg2 {
> regulator-name = "vdd_cpu_big0_s0";
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_big0_mem_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <550000>;
> regulator-max-microvolt = <1050000>;
> regulator-ramp-delay = <12500>;
> @@ -844,6 +848,8 @@ vdd_cpu_lit_s0: dcdc-reg3 {
> regulator-name = "vdd_cpu_lit_s0";
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_lit_mem_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <550000>;
> regulator-max-microvolt = <950000>;
> regulator-ramp-delay = <12500>;
> @@ -870,6 +876,8 @@ vdd_cpu_big1_mem_s0: dcdc-reg5 {
> regulator-name = "vdd_cpu_big1_mem_s0";
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_big1_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <675000>;
> regulator-max-microvolt = <1050000>;
> regulator-ramp-delay = <12500>;
> @@ -884,6 +892,8 @@ vdd_cpu_big0_mem_s0: dcdc-reg6 {
> regulator-name = "vdd_cpu_big0_mem_s0";
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_big0_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <675000>;
> regulator-max-microvolt = <1050000>;
> regulator-ramp-delay = <12500>;
> @@ -910,6 +920,8 @@ vdd_cpu_lit_mem_s0: dcdc-reg8 {
> regulator-name = "vdd_cpu_lit_mem_s0";
> regulator-always-on;
> regulator-boot-on;
> + regulator-coupled-with = <&vdd_cpu_lit_s0>;
> + regulator-coupled-max-spread = <10000>;
> regulator-min-microvolt = <675000>;
> regulator-max-microvolt = <950000>;
> regulator-ramp-delay = <12500>;

2024-03-01 08:22:00

by Dragan Simic

Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On 2024-03-01 08:51, Alexey Charkov wrote:
> On Fri, Mar 1, 2024 at 10:14 AM Dragan Simic <[email protected]>
> wrote:
>> On 2024-03-01 06:20, Alexey Charkov wrote:
>> > On Fri, Mar 1, 2024 at 1:11 AM Dragan Simic <[email protected]> wrote:
>> >> See, I'm not a native English speaker, but I've spent a lot of time
>> >> and effort improving my English skills. Thus, perhaps these comments
>> >> may or may not seem like unnecessary nitpicking, depending on how much
>> >> someone pays attention to writing style in general, but I'll risk
>> >> being annoying and state these comments anyway. :)
>> >>
>> >> The comment above could be written in a much more condensed form like
>> >> this, which would also be a bit more accurate:
>> >>
>> >>
>> >> /* IPA threshold, when IPA governor is used */
>> >>
>> >> IOW, we're writing all this for someone to read later, but we should
>> >> (and can) perfectly reasonably expect some already existing background
>> >> knowledge from the readers. In other words, we should be as concise
>> >> as possible.
>> >
>> > In fact, the power allocation governor code itself doesn't call those
>> > trips threshold or target as your suggested wording would imply.
>> > Instead, it calls them "switch on temperature" and "maximum desired
>> > temperature" [1]. Maybe we can call them that in the comments (and
>> > also avoid calling the governor IPA, because upstream code only calls
>> > it a "power allocator").
>>
>> Hmm, but "IPA" is still mentioned in exactly three places in the files
>> under drivers/thermal. I think that warrants the use of "IPA", which
>> is also widely used pretty much everywhere.
>>
>> Perhaps a win-win would be to have only the very first of the comments
>> like this, to introduce "IPA" as an acronym:
>>
>> /* Power allocator (IPA) thermal governor */
>> /* switch-on point, when IPA governor is used */
>
> Yes, good point, thanks!

I'm glad that you agree. :)

>> Next, "the target temperature" is mentioned more than a few times in
>> drivers/thermal/gov_power_allocator.c, which I believe makes the use
>> of "IPA target" perfectly valid. Actually, let's use "IPA target
>> temperature", if you agree, to make it self descriptive.
>
> Or perhaps simply "target temperature"? Stepwise governor will also
> use this trip as its target, so it's not IPA specific, unlike the
> switch-on point.

I also had similar thoughts about the shared nature. I agree, just
"/* target temperature */" would be fine.

>> Finally, the threshold... Based on
>> drivers/thermal/gov_power_allocator.c,
>> I think "IPA switch-on point" would be a good choice, which I already
>> used above in the proposed opening comment.
>
> Agreed, that sounds good to me, will reflect in the next iteration.
> Thanks for bringing it up!

Great, thanks!
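
For reference, a rough sketch of how the comment wording agreed above
might read in one thermal zone. The trip labels, the two passive
temperatures and the hysteresis values are placeholders chosen for
illustration and are not taken from the patch; only the zone header and
the 115000 critical value mirror what is quoted earlier in the thread:

```dts
/* sensor between A76 cores 0 and 1 */
bigcore0_thermal: bigcore0-thermal {
	polling-delay-passive = <100>;
	polling-delay = <0>;
	thermal-sensors = <&tsadc 1>;

	trips {
		/* Power allocator (IPA) thermal governor */
		/* switch-on point, when IPA governor is used */
		bigcore0_alert0: bigcore0-alert0 {
			temperature = <75000>;	/* placeholder */
			hysteresis = <2000>;	/* placeholder */
			type = "passive";
		};
		/* target temperature */
		bigcore0_alert1: bigcore0-alert1 {
			temperature = <85000>;	/* placeholder */
			hysteresis = <2000>;	/* placeholder */
			type = "passive";
		};
		bigcore0_crit: bigcore0-crit {
			temperature = <115000>;
			hysteresis = <0>;
			type = "critical";
		};
	};
};
```

Only the first comment spells out the acronym; the later one stays
governor-neutral, since the step-wise governor uses the same trip.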

2024-03-01 08:25:19

by Dragan Simic

Subject: Re: [PATCH v3 2/5] arm64: dts: rockchip: enable automatic active cooling on Rock 5B

On 2024-03-01 07:17, Dragan Simic wrote:
> On 2024-03-01 06:21, Alexey Charkov wrote:
>> On Fri, Mar 1, 2024 at 1:25 AM Dragan Simic <[email protected]>
>> wrote:
>>> On 2024-02-29 20:26, Alexey Charkov wrote:
>>> > This links the PWM fan on Radxa Rock 5B as an active cooling device
>>> > managed automatically by the thermal subsystem, with a target SoC
>>> > temperature of 65C and a minimum-spin interval from 55C to 65C to
>>> > ensure airflow when the system gets warm
>>>
>>> I'd suggest that you replace "automatic active cooling" with "active
>>> cooling" in the patch subject. I know, it may seem like more of the
>>> unnecessary nitpicking, :) but I hope you'll agree that "automatic"
>>> is actually redundant there. It would also make the patch subject
>>> a bit shorter.
>>>
>>> Another option would be to replace "automatic active cooling" with
>>> "automatic fan control", which may actually be a better choice.
>>> I'd be happy with whichever one you prefer. :)
>>
>> Sounds good to me, thanks!
>
> I'm glad that you like it. :)
>
>>> Otherwise, please feel free to add:
>>>
>>> Reviewed-by: Dragan Simic <[email protected]>
>>
>> Thank you Dragan, much appreciated!
>
> Thank you for putting up with my nitpicking. :)

Perhaps the following tag would also be deserved for this patch:

Helped-by: Dragan Simic <[email protected]>

I hope you agree. :)

2024-03-01 08:27:44

by Alexey Charkov

Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <[email protected]> wrote:
>
> On 2024-03-01 06:12, Alexey Charkov wrote:
> > On Fri, Mar 1, 2024 at 12:21 AM Dragan Simic <[email protected]>
> > wrote:
> >> On 2024-02-29 20:26, Alexey Charkov wrote:
> >> > Include thermal zones information in device tree for RK3588 variants.
> >> >
> >> > This also enables the TSADC controller unconditionally on all boards
> >> > to ensure that thermal protections are in place via throttling and
> >> > emergency reset, once OPPs are added to enable CPU DVFS.
> >> >
> >> > The default settings (using CRU as the emergency reset mechanism)
> >> > should work on all boards regardless of their wiring, as CRU resets
> >> > do not depend on any external components. Boards that have the TSHUT
> >> > signal wired to the reset line of the PMIC may opt to switch to GPIO
> >> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
> >>
> >> Quite frankly, I'm still not sure that enabling this on the SoC level
> >> is the way to go. As I already described in detail, [4] according to
> >> the RK3588 Hardware Design Guide v1.0 and the Rock 5B schematic, we
> >> should actually use GPIO-based handling for the thermal runaways on
> >> the Rock 5B. Other boards should also be investigated individually,
> >> and the TSADC should be enabled on a board-to-board basis.
> >
> > With all due respect, I disagree, here is why:
> > - Neither the schematic nor the hardware design guide, on which the
> > schematic seems to be based, prescribes a particular way to handle
> > thermal runaways. They only provide the possibility of GPIO based
> > resets, along with the CRU based one
>
> Please note that other documents from Rockchip also exist. Below is
> a link to a screenshot from the Thermal developer guide, version 1.0,
> which describes the whole thing further. I believe it's obvious that
> the thermal runaway is to be treated as a board-level feature.
>
> - https://i.imgur.com/IJ6dSAc.png

Frankly, that still doesn't make TSADC per se a board-level thing IMO.
The only thing that is board-level is the wiring of GPIO based resets,
which I fully agree should go to board .dts for boards that support
it, but that's not part of the current defaults and can be safely
added later.

TSADC is inside the SoC. CRU is inside the SoC. They work just fine
for a thermal reset, even if no dedicated reset logic is wired on the
board. I really don't see any downsides in having TSADC enabled by
default with CRU based resets:
- it's a safe default (i.e. I cannot think of any configuration or use
case where enabled-by-default TSADC does any harm)
- it's safer than accidentally forgetting to enable TSADC (as it adds
thermal protection which is otherwise missing)
- it will work on all boards (even if it doesn't utilize the full
hardware functionality by ignoring GPIO resets that some boards also
have in addition to the CRU)
- and it requires fewer overrides in board .dts files

Sounds like a no-regret move to me.

> To be fair, that version of the Thermal developer guide dates back to
> 2019, meaning that it technically applies to the RK3399, for example,
> but the TSADC and reset circuitry design has basically remained the
> same for the RK3588.
>
> > - My strong belief is that defaults (regardless of context) should be
> > safe and reasonable, and should also minimize the need to override
> > them
>
> Please note that the TSADC is disabled in the RK3399 SoC dtsi, so having
> it disabled in the RK3588(s) SoC dtsi would provide some consistency.

I'm happy to produce a patch to reverse the logic in RK3399 (and any
others for that matter) to also have TSADC enabled by default there,
thus saving several lines of code, if it's just about consistency.

> Though, the RK3399 still does it in a safe way, by moving the OPPs into
> a separate dtsi file, named rk3399-opp.dtsi, which the board dts files
> then include together with enabling the TSADC.
>
> If you agree, let's employ the same approach for the RK3588(s), by having
> its OPPs defined in a separate file, named rk3588s-opp.dtsi, etc.

Separate file for OPPs is a good no-regret move to declutter the SoC
level .dtsi (as the OPP table is long and boring) - happy to move it
regardless of the outcome of the above TSADC discussion. Thanks for
the pointer!
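
A minimal sketch of what the split could look like from a board file's
point of view, assuming the rk3588s-opp.dtsi name proposed above (the file
does not exist yet at this point in the thread, and the board shown is
hypothetical):

```dts
/* hypothetical board .dts */
/dts-v1/;

#include "rk3588s.dtsi"
/* OPP tables moved out of the SoC .dtsi; file name as proposed in this thread */
#include "rk3588s-opp.dtsi"

/ {
	model = "Example RK3588S board";	/* placeholder */
};
```

Whether the board file would additionally need to enable TSADC, as the
RK3399 boards do, depends on the outcome of the discussion above.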

> > - In context of dts/dtsi, as far as I understand the general logic
> > behind the split, the SoC .dtsi should contain all the things that are
> > fully contained within the SoC and do not depend on the wiring of a
> > particular board or its target use case. Boards then
> > add/remove/override settings to match their wiring and use case more
> > closely
>
> Of course, but the thermal shutdown is obviously a board-level feature,
> which I described further above.

Not so obvious to me :-) I don't mean to be stubborn or uncooperative
here, but I really can't find any technical merit in having it enabled
at board level instead of SoC level.

Switching to PMIC-assisted resets is one thing - it definitely should
go to board files, as it depends on the specific wiring of the
TSADC_SHUT signal. Enabling TSADC in a default configuration that can
and will work on all boards regardless of their wiring is another
thing. I'm just arguing for the latter.

To me it seems similar to the watchdog timer situation: we enable it
at the SoC level [1], as it is expected to work in its default
configuration regardless of the board wiring, and it provides
protection against system malfunctions. Doesn't matter if the board or
its userspace code ends up using the full functionality - it just sits
there waiting for its spotlight without hurting anybody.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#n1872

Best regards,
Alexey
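
To make the distinction concrete, here is a rough sketch of the
board-level override for a board that wires TSADC_SHUT to the PMIC reset
line, with the SoC-level defaults (TSADC enabled, CRU reset) left
untouched for everyone else. The polarity value is an assumption that has
to be checked against the board schematic:

```dts
/* hypothetical board .dts fragment */
&tsadc {
	/* 0 = CRU (SoC .dtsi default), 1 = GPIO, i.e. the TSHUT pin */
	rockchip,hw-tshut-mode = <1>;
	/* 0 = active low, 1 = active high; assumed here, verify per board */
	rockchip,hw-tshut-polarity = <1>;
};
```

Boards without that wiring need no override and simply inherit the
CRU-based reset.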

2024-03-01 08:30:45

by Alexey Charkov

Subject: Re: [PATCH v3 2/5] arm64: dts: rockchip: enable automatic active cooling on Rock 5B

On Fri, Mar 1, 2024 at 12:25 PM Dragan Simic <[email protected]> wrote:
>
> On 2024-03-01 07:17, Dragan Simic wrote:
> > On 2024-03-01 06:21, Alexey Charkov wrote:
> >> On Fri, Mar 1, 2024 at 1:25 AM Dragan Simic <[email protected]>
> >> wrote:
> >>> On 2024-02-29 20:26, Alexey Charkov wrote:
> >>> > This links the PWM fan on Radxa Rock 5B as an active cooling device
> >>> > managed automatically by the thermal subsystem, with a target SoC
> >>> > temperature of 65C and a minimum-spin interval from 55C to 65C to
> >>> > ensure airflow when the system gets warm
> >>>
> >>> I'd suggest that you replace "automatic active cooling" with "active
> >>> cooling" in the patch subject. I know, it may seem like more of the
> >>> unnecessary nitpicking, :) but I hope you'll agree that "automatic"
> >>> is actually redundant there. It would also make the patch subject
> >>> a bit shorter.
> >>>
> >>> Another option would be to replace "automatic active cooling" with
> >>> "automatic fan control", which may actually be a better choice.
> >>> I'd be happy with whichever one you prefer. :)
> >>
> >> Sounds good to me, thanks!
> >
> > I'm glad that you like it. :)
> >
> >>> Otherwise, please feel free to add:
> >>>
> >>> Reviewed-by: Dragan Simic <[email protected]>
> >>
> >> Thank you Dragan, much appreciated!
> >
> > Thank you for putting up with my nitpicking. :)
>
> Perhaps the following tag would also be deserved for this patch:
>
> Helped-by: Dragan Simic <[email protected]>
>
> I hope you agree. :)

Definitely! Thanks again for your feedback and contribution!

Best regards,
Alexey

2024-03-01 08:52:58

by Dragan Simic

Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On 2024-03-01 09:25, Alexey Charkov wrote:
> On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <[email protected]> wrote:
>> On 2024-03-01 06:12, Alexey Charkov wrote:
>> > On Fri, Mar 1, 2024 at 12:21 AM Dragan Simic <[email protected]>
>> > wrote:
>> >> On 2024-02-29 20:26, Alexey Charkov wrote:
>> >> > Include thermal zones information in device tree for RK3588 variants.
>> >> >
>> >> > This also enables the TSADC controller unconditionally on all boards
>> >> > to ensure that thermal protections are in place via throttling and
>> >> > emergency reset, once OPPs are added to enable CPU DVFS.
>> >> >
>> >> > The default settings (using CRU as the emergency reset mechanism)
>> >> > should work on all boards regardless of their wiring, as CRU resets
>> >> > do not depend on any external components. Boards that have the TSHUT
>> >> > signal wired to the reset line of the PMIC may opt to switch to GPIO
>> >> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
>> >>
>> >> Quite frankly, I'm still not sure that enabling this on the SoC level
>> >> is the way to go. As I already described in detail, [4] according to
>> >> the RK3588 Hardware Design Guide v1.0 and the Rock 5B schematic, we
>> >> should actually use GPIO-based handling for the thermal runaways on
>> >> the Rock 5B. Other boards should also be investigated individually,
>> >> and the TSADC should be enabled on a board-to-board basis.
>> >
>> > With all due respect, I disagree, here is why:
>> > - Neither the schematic nor the hardware design guide, on which the
>> > schematic seems to be based, prescribes a particular way to handle
>> > thermal runaways. They only provide the possibility of GPIO based
>> > resets, along with the CRU based one
>>
>> Please note that other documents from Rockchip also exist. Below is
>> a link to a screenshot from the Thermal developer guide, version 1.0,
>> which describes the whole thing further. I believe it's obvious that
>> the thermal runaway is to be treated as a board-level feature.
>>
>> - https://i.imgur.com/IJ6dSAc.png
>
> Frankly, that still doesn't make TSADC per se a board-level thing IMO.
> The only thing that is board-level is the wiring of GPIO based resets,
> which I fully agree should go to board .dts for boards that support
> it, but that's not part of the current defaults and can be safely
> added later.
>
> TSADC is inside the SoC. CRU is inside the SoC. They work just fine
> for a thermal reset, even if no dedicated reset logic is wired on the
> board. I really don't see any downsides in having TSADC enabled by
> default with CRU based resets:
> - it's a safe default (i.e. I cannot think of any configuration or use
> case where enabled-by-default TSADC does any harm)
> - it's safer than accidentally forgetting to enable TSADC (as it adds
> thermal protection which is otherwise missing)
> - it will work on all boards (even if it doesn't utilize the full
> hardware functionality by ignoring GPIO resets that some boards also
> have in addition to the CRU)
> - and it requires fewer overrides in board .dts files
>
> Sounds like a no-regret move to me.

Please see my comments below.

>> To be fair, that version of the Thermal developer guide dates back to
>> 2019, meaning that it technically applies to the RK3399, for example,
>> but the TSADC and reset circuitry design has basically remained the
>> same for the RK3588.
>>
>> > - My strong belief is that defaults (regardless of context) should be
>> > safe and reasonable, and should also minimize the need to override
>> > them
>>
>> Please note that the TSADC is disabled in the RK3399 SoC dtsi, so
>> having
>> it disabled in the RK3588(s) SoC dtsi would provide some consistency.
>
> I'm happy to produce a patch to reverse the logic in RK3399 (and any
> others for that matter) to also have TSADC enabled by default there,
> thus saving several lines of code, if it's just about consistency.

But why should we change something that has served us for years, on
multiple SoCs, with zero troubles and with (AFAIK) zero boards producing
puffs of bluish smoke?

>> Though, the RK3399 still does it in a safe way, by moving the OPPs
>> into
>> a separate dtsi file, named rk3399-opp.dtsi, which the board dts files
>> then include together with enabling the TSADC.
>>
>> If you agree, let's employ the same approach for the RK3588(s), by having
>> its OPPs defined in a separate file, named rk3588s-opp.dtsi, etc.
>
> Separate file for OPPs is a good no-regret move to declutter the SoC
> level .dtsi (as the OPP table is long and boring) - happy to move it
> regardless of the outcome of the above TSADC discussion. Thanks for
> the pointer!

Yeah, but I'm not sure that everyone would like that kind of separation.
In fact, such separation may be frowned upon unless it's necessary.

As I already described in another thread, the separation for the RK3399
is there only because a couple of different variants of the RK3399 SoC
require different OPPs.

>> > - In context of dts/dtsi, as far as I understand the general logic
>> > behind the split, the SoC .dtsi should contain all the things that are
>> > fully contained within the SoC and do not depend on the wiring of a
>> > particular board or its target use case. Boards then
>> > add/remove/override settings to match their wiring and use case more
>> > closely
>>
>> Of course, but the thermal shutdown is obviously a board-level
>> feature,
>> which I described further above.
>
> Not so obvious to me :-) I don't mean to be stubborn or uncooperative
> here, but I really can't find any technical merit in having it enabled
> at board level instead of SoC level.

Well, please also consider that the PMICs from Rockchip are kind of
weird little chips, specifically customized to serve particular SoCs.
For example, they ensure the right sequencing and ramping-up of different
power rails, which is in many cases essential.

Thus, who knows what might (or might not) go wrong if we don't reset the
PMIC at the same time when the CRU resets the SoC? Unfortunately, the
things aren't that straightforward.

On top of that, some boards, such as the Rock 5B, use a few additional
discrete voltage regulators instead of a master-slave PMIC configuration,
which may actually introduce some weird power-related issues, which also
may be intermittent. Actually, I've already overheard that the Rock 5B
experiences some issues of that nature, but I don't know the details.

> Switching to PMIC-assisted resets is one thing - it definitely should
> go to board files, as it depends on the specific wiring of the
> TSADC_SHUT signal. Enabling TSADC in a default configuration that can
> and will work on all boards regardless of their wiring is another
> thing. I'm just arguing for the latter.

CRU-based thermal runaway handling may in theory work on all boards, but
we simply can't be 100% sure without detailed insights into the board
designs and testing. Maybe even the downstream U-Boot does some magic
during such thermal runaway resets, which we don't know. It may be
similar to the SoC reset issues that the RK3399 suffers from.

See also my comment above.

> To me it seems similar to the watchdog timer situation: we enable it
> at the SoC level [1], as it is expected to work in its default
> configuration regardless of the board wiring, and it provides
> protection against system malfunctions. Doesn't matter if the board or
> its userspace code ends up using the full functionality - it just sits
> there waiting for its spotlight without hurting anybody.

Frankly, I don't know much about the watchdog functionality, so I'd need
to research it before I could say something about it.

> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#n1872

2024-03-01 09:32:55

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v3 2/5] arm64: dts: rockchip: enable automatic active cooling on Rock 5B

On 2024-03-01 09:30, Alexey Charkov wrote:
> On Fri, Mar 1, 2024 at 12:25 PM Dragan Simic <[email protected]>
> wrote:
>> On 2024-03-01 07:17, Dragan Simic wrote:
>> > On 2024-03-01 06:21, Alexey Charkov wrote:
>> >> On Fri, Mar 1, 2024 at 1:25 AM Dragan Simic <[email protected]>
>> >> wrote:
>> >>> On 2024-02-29 20:26, Alexey Charkov wrote:
>> >>> > This links the PWM fan on Radxa Rock 5B as an active cooling device
>> >>> > managed automatically by the thermal subsystem, with a target SoC
>> >>> > temperature of 65C and a minimum-spin interval from 55C to 65C to
>> >>> > ensure airflow when the system gets warm
>> >>>
>> >>> I'd suggest that you replace "automatic active cooling" with "active
>> >>> cooling" in the patch subject. I know, it may seem like more of the
>> >>> unnecessary nitpicking, :) but I hope you'll agree that "automatic"
>> >>> is actually redundant there. It would also make the patch subject
>> >>> a bit shorter.
>> >>>
>> >>> Another option would be to replace "automatic active cooling" with
>> >>> "automatic fan control", which may actually be a better choice.
>> >>> I'd be happy with whichever one you prefer. :)
>> >>
>> >> Sounds good to me, thanks!
>> >
>> > I'm glad that you like it. :)
>> >
>> >>> Otherwise, please feel free to add:
>> >>>
>> >>> Reviewed-by: Dragan Simic <[email protected]>
>> >>
>> >> Thank you Dragan, much appreciated!
>> >
>> > Thank you for putting up with my nitpicking. :)
>>
>> Perhaps the following tag would also be deserved for this patch:
>>
>> Helped-by: Dragan Simic <[email protected]>
>>
>> I hope you agree. :)
>
> Definitely! Thanks again for your feedback and contribution!

I'm glad to help. :)
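
For reference, the behavior described in the patch description (minimum
spin between 55C and 65C, faster above 65C) could be expressed along the
lines of the sketch below. It is not the actual patch: the zone label, the
trip labels, the fan label and the cooling-state ranges are all
assumptions made for illustration.

```dts
#include <dt-bindings/thermal/thermal.h>

/* Hypothetical sketch; every label and state range here is assumed. */
&package_thermal {
	polling-delay = <1000>;		/* milliseconds */

	trips {
		package_fan0: package-fan0 {
			temperature = <55000>;	/* millidegrees C */
			hysteresis = <2000>;
			type = "active";
		};

		package_fan1: package-fan1 {
			temperature = <65000>;
			hysteresis = <2000>;
			type = "active";
		};
	};

	cooling-maps {
		/* 55C to 65C: hold the fan at its lowest non-zero state */
		map0 {
			trip = <&package_fan0>;
			cooling-device = <&fan THERMAL_NO_LIMIT 1>;
		};

		/* above 65C: allow the remaining, faster states */
		map1 {
			trip = <&package_fan1>;
			cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
		};
	};
};
```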

2024-03-01 09:38:05

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On 2024-03-01 09:52, Dragan Simic wrote:
> On 2024-03-01 09:25, Alexey Charkov wrote:
>> On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <[email protected]>
>> wrote:
>>> On 2024-03-01 06:12, Alexey Charkov wrote:
>>> > With all due respect, I disagree, here is why:
>>> > - Neither the schematic nor the hardware design guide, on which the
>>> > schematic seems to be based, prescribes a particular way to handle
>>> > thermal runaways. They only provide the possibility of GPIO based
>>> > resets, along with the CRU based one
>>>
>>> Please note that other documents from Rockchip also exist. Below is
>>> a link to a screenshot from the Thermal developer guide, version 1.0,
>>> which describes the whole thing further. I believe it's obvious that
>>> the thermal runaway is to be treated as a board-level feature.
>>>
>>> - https://i.imgur.com/IJ6dSAc.png
>>
>> Frankly, that still doesn't make TSADC per se a board-level thing IMO.
>> The only thing that is board-level is the wiring of GPIO based resets,
>> which I fully agree should go to board .dts for boards that support
>> it, but that's not part of the current defaults and can be safely
>> added later.
>>
>> TSADC is inside the SoC. CRU is inside the SoC. They work just fine
>> for a thermal reset, even if no dedicated reset logic is wired on the
>> board. I really don't see any downsides in having TSADC enabled by
>> default with CRU based resets:
>> - it's a safe default (i.e. I cannot think of any configuration or use
>> case where enabled-by-default TSADC does any harm)
>> - it's safer than accidentally forgetting to enable TSADC (as it adds
>> thermal protection which is otherwise missing)
>> - it will work on all boards (even if it doesn't utilize the full
>> hardware functionality by ignoring GPIO resets that some boards also
>> have in addition to the CRU)
>> - and it requires fewer overrides in board .dts files
>>
>> Sounds like a no-regret move to me.
>
> Please see my comments below.
>
>>> To be fair, that version of the Thermal developer guide dates back to
>>> 2019, meaning that it technically applies to the RK3399, for example,
>>> but the TSADC and reset circuitry design has basically remained the
>>> same for the RK3588.
>>>
>>> > - My strong belief is that defaults (regardless of context) should be
>>> > safe and reasonable, and should also minimize the need to override
>>> > them
>>>
>>> Please note that the TSADC is disabled in the RK3399 SoC dtsi, so
>>> having
>>> it disabled in the RK3588(s) SoC dtsi would provide some consistency.
>>
>> I'm happy to produce a patch to reverse the logic in RK3399 (and any
>> others for that matter) to also have TSADC enabled by default there,
>> thus saving several lines of code, if it's just about consistency.
>
> But why should we change something that has served us for years, on
> multiple SoCs, with zero troubles and with (AFAIK) zero boards
> producing
> puffs of bluish smoke?
>
>>> Though, the RK3399 still does it in a safe way, by moving the OPPs
>>> into
>>> a separate dtsi file, named rk3399-opp.dtsi, which the board dts
>>> files
>>> then include together with enabling the TSADC.
>>>
>>> If you agree, let's employ the same approach for the RK3588(s), by having
>>> its OPPs defined in a separate file, named rk3588s-opp.dtsi, etc.
>>
>> Separate file for OPPs is a good no-regret move to declutter the SoC
>> level .dtsi (as the OPP table is long and boring) - happy to move it
>> regardless of the outcome of the above TSADC discussion. Thanks for
>> the pointer!
>
> Yeah, but I'm not sure that everyone would like that kind of
> separation.
> In fact, such separation may be frowned upon unless it's necessary.
>
> As I already described in another thread, the separation for the RK3399
> is there only because a couple of different variants of the RK3399 SoC
> require different OPPs.
>
>>> > - In context of dts/dtsi, as far as I understand the general logic
>>> > behind the split, the SoC .dtsi should contain all the things that are
>>> > fully contained within the SoC and do not depend on the wiring of a
>>> > particular board or its target use case. Boards then
>>> > add/remove/override settings to match their wiring and use case more
>>> > closely
>>>
>>> Of course, but the thermal shutdown is obviously a board-level
>>> feature,
>>> which I described further above.
>>
>> Not so obvious to me :-) I don't mean to be stubborn or uncooperative
>> here, but I really can't find any technical merit in having it enabled
>> at board level instead of SoC level.
>
> Well, please also consider that the PMICs from Rockchip are kind of
> weird little chips, specifically customized to serve particular SoCs.
> For example, they ensure the right sequencing and ramping-up of
> different
> power rails, which is in many cases essential.
>
> Thus, who knows what might (or might not) go wrong if we don't reset
> the
> PMIC at the same time when the CRU resets the SoC? Unfortunately, the
> things aren't that straightforward.
>
> On top of that, some boards, such as the Rock 5B, use a few additional
> discrete voltage regulators instead of a master-slave PMIC
> configuration,
> which may actually introduce some weird power-related issues, which
> also
> may be intermittent. Actually, I've already overheard that the Rock 5B
> experiences some issues of that nature, but I don't know the details.

As an example, did you know that LPDDR4 chips, according to the official
JEDEC documentation, require proper sequencing of the ramping-down of their
power rails when they're to be turned off as part of shutting the system
down? The documentation also specifies that the expected lifetime becomes
reduced when the powering-off isn't properly performed, and there's even an
official number of such unsafe power-offs that the LPDDR4 chips are actually
expected to survive.

Thus, just yanking a power cord from a device that uses LPDDR4 may actually
make it die prematurely. Such behavior is kind of expected when it comes to
flash-based storage, but DRAM? Things are weird these days. :)

2024-03-01 11:12:42

by Alexey Charkov

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On Fri, Mar 1, 2024 at 12:52 PM Dragan Simic <[email protected]> wrote:
>
> On 2024-03-01 09:25, Alexey Charkov wrote:
> > On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <[email protected]> wrote:
> >> On 2024-03-01 06:12, Alexey Charkov wrote:
> >> > On Fri, Mar 1, 2024 at 12:21 AM Dragan Simic <dsimic@manjaro.org>
> >> > wrote:
> >> >> On 2024-02-29 20:26, Alexey Charkov wrote:
> >> >> > Include thermal zones information in device tree for RK3588 variants.
> >> >> >
> >> >> > This also enables the TSADC controller unconditionally on all boards
> >> >> > to ensure that thermal protections are in place via throttling and
> >> >> > emergency reset, once OPPs are added to enable CPU DVFS.
> >> >> >
> >> >> > The default settings (using CRU as the emergency reset mechanism)
> >> >> > should work on all boards regardless of their wiring, as CRU resets
> >> >> > do not depend on any external components. Boards that have the TSHUT
> >> >> > signal wired to the reset line of the PMIC may opt to switch to GPIO
> >> >> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
> >> >>
> >> >> Quite frankly, I'm still not sure that enabling this on the SoC level
> >> >> is the way to go. As I already described in detail, [4] according to
> >> >> the RK3588 Hardware Design Guide v1.0 and the Rock 5B schematic, we
> >> >> should actually use GPIO-based handling for the thermal runaways on
> >> >> the Rock 5B. Other boards should also be investigated individually,
> >> >> and the TSADC should be enabled on a board-to-board basis.
> >> >
> >> > With all due respect, I disagree, here is why:
> >> > - Neither the schematic nor the hardware design guide, on which the
> >> > schematic seems to be based, prescribes a particular way to handle
> >> > thermal runaways. They only provide the possibility of GPIO based
> >> > resets, along with the CRU based one
> >>
> >> Please note that other documents from Rockchip also exist. Below is
> >> a link to a screenshot from the Thermal developer guide, version 1.0,
> >> which describes the whole thing further. I believe it's obvious that
> >> the thermal runaway is to be treated as a board-level feature.
> >>
> >> - https://i.imgur.com/IJ6dSAc.png
> >
> > Frankly, that still doesn't make TSADC per se a board-level thing IMO.
> > The only thing that is board-level is the wiring of GPIO based resets,
> > which I fully agree should go to board .dts for boards that support
> > it, but that's not part of the current defaults and can be safely
> > added later.
> >
> > TSADC is inside the SoC. CRU is inside the SoC. They work just fine
> > for a thermal reset, even if no dedicated reset logic is wired on the
> > board. I really don't see any downsides in having TSADC enabled by
> > default with CRU based resets:
> > - it's a safe default (i.e. I cannot think of any configuration or use
> > case where enabled-by-default TSADC does any harm)
> > - it's safer than accidentally forgetting to enable TSADC (as it adds
> > thermal protection which is otherwise missing)
> > - it will work on all boards (even if it doesn't utilize the full
> > hardware functionality by ignoring GPIO resets that some boards also
> > have in addition to the CRU)
> > - and it requires fewer overrides in board .dts files
> >
> > Sounds like a no-regret move to me.
>
> Please see my comments below.
>
> >> To be fair, that version of the Thermal developer guide dates back to
> >> 2019, meaning that it technically applies to the RK3399, for example,
> >> but the TSADC and reset circuitry design has basically remained the
> >> same for the RK3588.
> >>
> >> > - My strong belief is that defaults (regardless of context) should be
> >> > safe and reasonable, and should also minimize the need to override
> >> > them
> >>
> >> Please note that the TSADC is disabled in the RK3399 SoC dtsi, so
> >> having
> >> it disabled in the RK3588(s) SoC dtsi would provide some consistency.
> >
> > I'm happy to produce a patch to reverse the logic in RK3399 (and any
> > others for that matter) to also have TSADC enabled by default there,
> > thus saving several lines of code, if it's just about consistency.
>
> But why should we change something that has served us for years, on
> multiple SoCs, with zero troubles and with (AFAIK) zero boards producing
> puffs of bluish smoke?

That's just if we are concerned about consistency across different SoC
series. The point is that I'm happy to make whatever change we agree
upon in a consistent way across all related .dtsi/.dts files - thus no
need to worry about past decisions that have already been implemented
for other chips. Let's just agree on the technical merits of one or
the other approach, leaving "we've been doing it differently
elsewhere" aside for now.

> >> Though, the RK3399 still does it in a safe way, by moving the OPPs
> >> into
> >> a separate dtsi file, named rk3399-opp.dtsi, which the board dts files
> >> then include together with enabling the TSADC.
> >>
> >> If you agree, let's employ the same approach for the RK3588(s), by
> >> having its OPPs defined in a separate file, named rk3588s-opp.dtsi, etc.
> >
> > Separate file for OPPs is a good no-regret move to declutter the SoC
> > level .dtsi (as the OPP table is long and boring) - happy to move it
> > regardless of the outcome of the above TSADC discussion. Thanks for
> > the pointer!
>
> Yeah, but I'm not sure that everyone would like that kind of separation.
> In fact, such separation may be frowned upon unless it's necessary.
>
> As I already described in another thread, the separation for the RK3399
> is there only because a couple of different variants of the RK3399 SoC
> require different OPPs.
>
> >> > - In context of dts/dtsi, as far as I understand the general logic
> >> > behind the split, the SoC .dtsi should contain all the things that are
> >> > fully contained within the SoC and do not depend on the wiring of a
> >> > particular board or its target use case. Boards then
> >> > add/remove/override settings to match their wiring and use case more
> >> > closely
> >>
> >> Of course, but the thermal shutdown is obviously a board-level
> >> feature,
> >> which I described further above.
> >
> > Not so obvious to me :-) I don't mean to be stubborn or uncooperative
> > here, but I really can't find any technical merit in having it enabled
> > at board level instead of SoC level.
>
> Well, please also consider that the PMICs from Rockchip are kind of
> weird little chips, specifically customized to serve particular SoCs.
> For example, they ensure the right sequencing and ramping-up of
> different
> power rails, which is in many cases essential.

Sure. I'm not saying that switching to a PMIC-assisted reset shouldn't
be done where the board supports it - quite the opposite. All I'm
saying is that having at least passive cooling and CRU based resets
guaranteed for any board, regardless of how thought through its .dts
is, seems to be a better default than no thermal protection.

> Thus, who knows what might (or might not) go wrong if we don't reset the
> PMIC at the same time when the CRU resets the SoC? Unfortunately, the
> things aren't that straightforward.
>
> On top of that, some boards, such as the Rock 5B, use a few additional
> discrete voltage regulators instead of a master-slave PMIC
> configuration,
> which may actually introduce some weird power-related issues, which also
> may be intermittent. Actually, I've already overheard that the Rock 5B
> experiences some issues of that nature, but I don't know the details.

Those discrete regulators seem to be out of scope of this discussion.

I agree that a deeper power-cycle with proper power-up sequence to
follow it is better when it's available in the respective hardware.
I'm also happy to provide a follow-up patch to switch from CRU to PMIC
resets for the boards I found to support the latter.

The question we have at hand is solely about the default behavior for
a hypothetical new board with minimal .dts, or an existing board where
we can't determine the wiring of the TSHUT signal:
Option 1. Let them stay nice and warm at 120C+ under load, because
they should have known better and should have enabled the TSADC in
their device tree before putting the system under load
Option 2. Get them passively cooled at 85C under load even with no
heatsink, then force a CRU reset out of an abundance of caution at 120C
unless they defined a PMIC reset in their device tree

I'm advocating for the latter.
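
To make Option 2 concrete, here is a rough sketch of the split I have in
mind. It is an illustration under assumptions rather than the exact patch
contents: both fragments are written as label references for brevity
(the real SoC .dtsi defines the node in place), and the polarity in the
board override is a placeholder that depends on the actual wiring.

```dts
/* SoC-level default (assumed shape): on for every board, reset via CRU */
&tsadc {
	rockchip,hw-tshut-temp = <120000>;	/* millidegrees C */
	rockchip,hw-tshut-mode = <0>;		/* 0 = CRU, 1 = GPIO */
	status = "okay";
};

/* Board .dts override, only where TSHUT is wired to the PMIC reset line */
&tsadc {
	rockchip,hw-tshut-mode = <1>;		/* switch to GPIO tshut */
	rockchip,hw-tshut-polarity = <0>;	/* placeholder, board-specific */
};
```

A board that provides no override keeps the CRU-based reset, so nothing is
left unprotected by omission.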

> > Switching to PMIC-assisted resets is one thing - it definitely should
> > go to board files, as it depends on the specific wiring of the
> > TSADC_SHUT signal. Enabling TSADC in a default configuration that can
> > and will work on all boards regardless of their wiring is another
> > thing. I'm just arguing for the latter.
>
> CRU-based thermal runaway handling may in theory work on all boards, but
> we simply can't be 100% sure without detailed insights into the board
> designs and testing. Maybe even the downstream U-Boot does some magic
> during such thermal runaway resets, which we don't know. It may be
> similar to the SoC reset issues that the RK3399 suffers from.

That might be true, but we're talking about operation at 120C+ here.
I'd rather have my board reboot in any way it pleases under those
conditions, and have that behavior triggered by default even if it's
imperfect, then worry about the correct state of all regulators and
peripherals upon next boot. The latter is important of course, but I'd
rather let it cool down and reboot it manually anyway, because that
heat could have made more things go sideways than just the regulators.

> See also my comment above.
>
> > To me it seems similar to the watchdog timer situation: we enable it
> > at the SoC level [1], as it is expected to work in its default
> > configuration regardless of the board wiring, and it provides
> > protection against system malfunctions. Doesn't matter if the board or
> > its userspace code ends up using the full functionality - it just sits
> > there waiting for its spotlight without hurting anybody.
>
> Frankly, I don't know much about the watchdog functionality, so I'd need
> to research it before I could say something about it.

FWIW, watchdog resets are exclusively routed through the CRU (see
RK3588 TRM V1.0 part 1 page 31). So if we expect that one to work
somehow, probably we should expect thermal resets to work too.

Best regards,
Alexey

2024-03-01 12:02:39

by Chen-Yu Tsai

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On Fri, Mar 1, 2024 at 7:10 PM Alexey Charkov <[email protected]> wrote:
>
> On Fri, Mar 1, 2024 at 12:52 PM Dragan Simic <[email protected]> wrote:
> >
> > On 2024-03-01 09:25, Alexey Charkov wrote:
> > > On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <[email protected]> wrote:
> > >> On 2024-03-01 06:12, Alexey Charkov wrote:
> > >> > On Fri, Mar 1, 2024 at 12:21 AM Dragan Simic <[email protected]>
> > >> > wrote:
> > >> >> On 2024-02-29 20:26, Alexey Charkov wrote:
> > >> >> > Include thermal zones information in device tree for RK3588 variants.
> > >> >> >
> > >> >> > This also enables the TSADC controller unconditionally on all boards
> > >> >> > to ensure that thermal protections are in place via throttling and
> > >> >> > emergency reset, once OPPs are added to enable CPU DVFS.
> > >> >> >
> > >> >> > The default settings (using CRU as the emergency reset mechanism)
> > >> >> > should work on all boards regardless of their wiring, as CRU resets
> > >> >> > do not depend on any external components. Boards that have the TSHUT
> > >> >> > signal wired to the reset line of the PMIC may opt to switch to GPIO
> > >> >> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
> > >> >>
> > >> >> Quite frankly, I'm still not sure that enabling this on the SoC level
> > >> >> is the way to go. As I already described in detail, [4] according to
> > >> >> the RK3588 Hardware Design Guide v1.0 and the Rock 5B schematic, we
> > >> >> should actually use GPIO-based handling for the thermal runaways on
> > >> >> the Rock 5B. Other boards should also be investigated individually,
> > >> >> and the TSADC should be enabled on a board-to-board basis.
> > >> >
> > >> > With all due respect, I disagree, here is why:
> > >> > - Neither the schematic nor the hardware design guide, on which the
> > >> > schematic seems to be based, prescribes a particular way to handle
> > >> > thermal runaways. They only provide the possibility of GPIO based
> > >> > resets, along with the CRU based one
> > >>
> > >> Please note that other documents from Rockchip also exist. Below is
> > >> a link to a screenshot from the Thermal developer guide, version 1.0,
> > >> which describes the whole thing further. I believe it's obvious that
> > >> the thermal runaway is to be treated as a board-level feature.
> > >>
> > >> - https://i.imgur.com/IJ6dSAc.png
> > >
> > > Frankly, that still doesn't make TSADC per se a board-level thing IMO.
> > > The only thing that is board-level is the wiring of GPIO based resets,
> > > which I fully agree should go to board .dts for boards that support
> > > it, but that's not part of the current defaults and can be safely
> > > added later.
> > >
> > > TSADC is inside the SoC. CRU is inside the SoC. They work just fine
> > > for a thermal reset, even if no dedicated reset logic is wired on the
> > > board. I really don't see any downsides in having TSADC enabled by
> > > default with CRU based resets:
> > > - it's a safe default (i.e. I cannot think of any configuration or use
> > > case where enabled-by-default TSADC does any harm)
> > > - it's safer than accidentally forgetting to enable TSADC (as it adds
> > > thermal protection which is otherwise missing)
> > > - it will work on all boards (even if it doesn't utilize the full
> > > hardware functionality by ignoring GPIO resets that some boards also
> > > have in addition to the CRU)
> > > - and it requires fewer overrides in board .dts files
> > >
> > > Sounds like a no-regret move to me.
> >
> > Please see my comments below.
> >
> > >> To be fair, that version of the Thermal developer guide dates back to
> > >> 2019, meaning that it technically applies to the RK3399, for example,
> > >> but the TSADC and reset circuitry design has basically remained the
> > >> same for the RK3588.
> > >>
> > >> > - My strong belief is that defaults (regardless of context) should be
> > >> > safe and reasonable, and should also minimize the need to override
> > >> > them
> > >>
> > >> Please note that the TSADC is disabled in the RK3399 SoC dtsi, so
> > >> having
> > >> it disabled in the RK3588(s) SoC dtsi would provide some consistency.
> > >
> > > I'm happy to produce a patch to reverse the logic in RK3399 (and any
> > > others for that matter) to also have TSADC enabled by default there,
> > > thus saving several lines of code, if it's just about consistency.
> >
> > But why should we change something that has served us for years, on
> > multiple SoCs, with zero troubles and with (AFAIK) zero boards producing
> > puffs of bluish smoke?
>
> That's just if we are concerned about consistency across different SoC
> series. The point is that I'm happy to make whatever change we agree
> upon in a consistent way across all related .dtsi/.dts files - thus no
> need to worry about past decisions that have already been implemented
> for other chips. Let's just agree on the technical merits of one or
> the other approach, leaving "we've been doing it differently
> elsewhere" aside for now.
>
> > >> Though, the RK3399 still does it in a safe way, by moving the OPPs
> > >> into
> > >> a separate dtsi file, named rk3399-opp.dtsi, which the board dts files
> > >> then include together with enabling the TSADC.
> > >>
> > >> If you agree, let's employ the same approach for the RK3588(s), by
> > >> having its OPPs defined in a separate file, named rk3588s-opp.dtsi, etc.
> > >
> > > Separate file for OPPs is a good no-regret move to declutter the SoC
> > > level .dtsi (as the OPP table is long and boring) - happy to move it
> > > regardless of the outcome of the above TSADC discussion. Thanks for
> > > the pointer!
> >
> > Yeah, but I'm not sure that everyone would like that kind of separation.
> > In fact, such separation may be frowned upon unless it's necessary.
> >
> > As I already described in another thread, the separation for the RK3399
> > is there only because a couple of different variants of the RK3399 SoC
> > require different OPPs.
> >
> > >> > - In context of dts/dtsi, as far as I understand the general logic
> > >> > behind the split, the SoC .dtsi should contain all the things that are
> > >> > fully contained within the SoC and do not depend on the wiring of a
> > >> > particular board or its target use case. Boards then
> > >> > add/remove/override settings to match their wiring and use case more
> > >> > closely
> > >>
> > >> Of course, but the thermal shutdown is obviously a board-level
> > >> feature,
> > >> which I described further above.
> > >
> > > Not so obvious to me :-) I don't mean to be stubborn or uncooperative
> > > here, but I really can't find any technical merit in having it enabled
> > > at board level instead of SoC level.
> >
> > Well, please also consider that the PMICs from Rockchip are kind of
> > weird little chips, specifically customized to serve particular SoCs.
> > For example, they ensure the right sequencing and ramping-up of
> > different
> > power rails, which is in many cases essential.
>
> Sure. I'm not saying that switching to a PMIC-assisted reset shouldn't
> be done where the board supports it - quite the opposite. All I'm
> saying is that having at least passive cooling and CRU based resets
> guaranteed for any board, regardless of how thought through its .dts
> is, seems to be a better default than no thermal protection.
>
> > Thus, who knows what might (or might not) go wrong if we don't reset the
> > PMIC at the same time when the CRU resets the SoC? Unfortunately, the
> > things aren't that straightforward.
> >
> > On top of that, some boards, such as the Rock 5B, use a few additional
> > discrete voltage regulators instead of a master-slave PMIC
> > configuration,
> > which may actually introduce some weird power-related issues, which also
> > may be intermittent. Actually, I've already overheard that the Rock 5B
> > experiences some issues of that nature, but I don't know the details.
>
> Those discrete regulators seem to be out of scope of this discussion.
>
> I agree that a deeper power-cycle with proper power-up sequence to
> follow it is better when it's available in the respective hardware.
> I'm also happy to provide a follow-up patch to switch from CRU to PMIC
> resets for the boards I found to support the latter.
>
> The question we have at hand is solely about the default behavior for
> a hypothetical new board with minimal .dts, or an existing board where
> we can't determine the wiring of the TSHUT signal:
> Option 1. Let them stay nice and warm at 120C+ under load, because
> they should have known better and should have enabled the TSADC in
> their device tree before putting the system under load
> Option 2. Get them passively cooled at 85C under load even with no
> heatsink, then force a CRU reset out of abundance of caution at 120C
> unless they defined PMIC reset in their device tree
>
> I'm advocating for the latter.

FWIW, the CRU reset is what the kernel uses for rebooting the system,
either during a reboot or a kernel panic. So it is already used for both
normal and abnormal scenarios. And yes, it sometimes leaves regulators
or other parts of the system in some weird state that the BROM isn't
expecting.

Why should a hardware triggered reset be any different?

ChenYu

> > > Switching to PMIC-assisted resets is one thing - it definitely should
> > > go to board files, as it depends on the specific wiring of the
> > > TSADC_SHUT signal. Enabling TSADC in a default configuration that can
> > > and will work on all boards regardless of their wiring is another
> > > thing. I'm just arguing for the latter.
> >
> > CRU-based thermal runaway handling may in theory work on all boards, but
> > we simply can't be 100% sure without detailed insights into the board
> > designs and testing. Maybe even the downstream U-Boot does some magic
> > during such thermal runaway resets, which we don't know. It may be
> > similar to the SoC reset issues that the RK3399 suffers from.
>
> That might be true, but we're talking about operation at 120C+ here.
> I'd rather have my board reboot in any way it pleases under those
> conditions, and have that behavior triggered by default even if it's
> imperfect, than worry about the correct state of all regulators and
> peripherals upon next boot. The latter is important of course, but I'd
> rather let it cool down and reboot it manually anyway, because that
> heat could have made more things go sideways than just the regulators.
>
> > See also my comment above.
> >
> > > To me it seems similar to the watchdog timer situation: we enable it
> > > at the SoC level [1], as it is expected to work in its default
> > > configuration regardless of the board wiring, and it provides
> > > protection against system malfunctions. Doesn't matter if the board or
> > > its userspace code ends up using the full functionality - it just sits
> > > there waiting for its spotlight without hurting anybody.
> >
> > Frankly, I don't know much about the watchdog functionality, so I'd need
> > to research it before I could say something about it.
>
> FWIW, watchdog resets are exclusively routed through the CRU (see
> RK3588 TRM V1.0 part 1 page 31). So if we expect that one to work
> somehow, probably we should expect thermal resets to work too.
>
> Best regards,
> Alexey

2024-03-01 12:34:44

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

On 2024-03-01 12:10, Alexey Charkov wrote:
> On Fri, Mar 1, 2024 at 12:52 PM Dragan Simic <[email protected]>
> wrote:
>> On 2024-03-01 09:25, Alexey Charkov wrote:
>> > On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <[email protected]> wrote:
>> >> On 2024-03-01 06:12, Alexey Charkov wrote:
>> >> > On Fri, Mar 1, 2024 at 12:21 AM Dragan Simic <[email protected]>
>> >> > wrote:
>> >> >> On 2024-02-29 20:26, Alexey Charkov wrote:
>> >> >> > Include thermal zones information in device tree for RK3588 variants.
>> >> >> >
>> >> >> > This also enables the TSADC controller unconditionally on all boards
>> >> >> > to ensure that thermal protections are in place via throttling and
>> >> >> > emergency reset, once OPPs are added to enable CPU DVFS.
>> >> >> >
>> >> >> > The default settings (using CRU as the emergency reset mechanism)
>> >> >> > should work on all boards regardless of their wiring, as CRU resets
>> >> >> > do not depend on any external components. Boards that have the TSHUT
>> >> >> > signal wired to the reset line of the PMIC may opt to switch to GPIO
>> >> >> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
>> >> >>
>> >> >> Quite frankly, I'm still not sure that enabling this on the SoC level
>> >> >> is the way to go. As I already described in detail, [4] according to
>> >> >> the RK3588 Hardware Design Guide v1.0 and the Rock 5B schematic, we
>> >> >> should actually use GPIO-based handling for the thermal runaways on
>> >> >> the Rock 5B. Other boards should also be investigated individually,
>> >> >> and the TSADC should be enabled on a board-to-board basis.
>> >> >
>> >> > With all due respect, I disagree, here is why:
>> >> > - Neither the schematic nor the hardware design guide, on which the
>> >> > schematic seems to be based, prescribes a particular way to handle
>> >> > thermal runaways. They only provide the possibility of GPIO based
>> >> > resets, along with the CRU based one
>> >>
>> >> Please note that other documents from Rockchip also exist. Below is
>> >> a link to a screenshot from the Thermal developer guide, version 1.0,
>> >> which describes the whole thing further. I believe it's obvious that
>> >> the thermal runaway is to be treated as a board-level feature.
>> >>
>> >> - https://i.imgur.com/IJ6dSAc.png
>> >
>> > Frankly, that still doesn't make TSADC per se a board-level thing IMO.
>> > The only thing that is board-level is the wiring of GPIO based resets,
>> > which I fully agree should go to board .dts for boards that support
>> > it, but that's not part of the current defaults and can be safely
>> > added later.
>> >
>> > TSADC is inside the SoC. CRU is inside the SoC. They work just fine
>> > for a thermal reset, even if no dedicated reset logic is wired on the
>> > board. I really don't see any downsides in having TSADC enabled by
>> > default with CRU based resets:
>> > - it's a safe default (i.e. I cannot think of any configuration or use
>> > case where enabled-by-default TSADC does any harm)
>> > - it's safer than accidentally forgetting to enable TSADC (as it adds
>> > thermal protection which is otherwise missing)
>> > - it will work on all boards (even if it doesn't utilize the full
>> > hardware functionality by ignoring GPIO resets that some boards also
>> > have in addition to the CRU)
>> > - and it requires fewer overrides in board .dts files
>> >
>> > Sounds like a no-regret move to me.
>>
>> Please see my comments below.
>>
>> >> To be fair, that version of the Thermal developer guide dates back to
>> >> 2019, meaning that it technically applies to the RK3399, for example,
>> >> but the TSADC and reset circuitry design has basically remained the
>> >> same for the RK3588.
>> >>
>> >> > - My strong belief is that defaults (regardless of context) should be
>> >> > safe and reasonable, and should also minimize the need to override
>> >> > them
>> >>
>> >> Please note that the TSADC is disabled in the RK3399 SoC dtsi, so
>> >> having
>> >> it disabled in the RK3588(s) SoC dtsi would provide some consistency.
>> >
>> > I'm happy to produce a patch to reverse the logic in RK3399 (and any
>> > others for that matter) to also have TSADC enabled by default there,
>> > thus saving several lines of code, if it's just about consistency.
>>
>> But why should we change something that has served us for years, on
>> multiple SoCs, with zero troubles and with (AFAIK) zero boards
>> producing
>> puffs of bluish smoke?
>
> That's just if we are concerned about consistency across different SoC
> series. The point is that I'm happy to make whatever change we agree
> upon in a consistent way across all related .dtsi/.dts files - thus no
> need to worry about past decisions that have already been implemented
> for other chips. Let's just agree on the technical merits of one or
> the other approach, leaving "we've been doing it differently elsewhere"
> aside for now.

I see; I'd also be willing to implement such cleanup patches. Still,
let's keep in mind that some past decisions might have strong reasons
behind them, which might not be obvious. That's another reason why I'm
against enabling TSADC by default.

>> >> Though, the RK3399 still does it in a safe way, by moving the OPPs
>> >> into
>> >> a separate dtsi file, named rk3399-opp.dtsi, which the board dts files
>> >> then include together with enabling the TSADC.
>> >>
>> >> If you agree, let's employ the same approach for the RK3588(s), by
>> >> having
>> >> its OPPs defined in a separate file, named rk3588s-opp.dtsi, etc.
>> >
>> > Separate file for OPPs is a good no-regret move to declutter the SoC
>> > level .dtsi (as the OPP table is long and boring) - happy to move it
>> > regardless of the outcome of the above TSADC discussion. Thanks for
>> > the pointer!
>>
>> Yeah, but I'm not sure that everyone would like that kind of
>> separation.
>> In fact, such separation may be frowned upon unless it's necessary.
>>
>> As I already described in another thread, the separation for the
>> RK3399
>> is there only because a couple of different variants of the RK3399 SoC
>> require different OPPs.
>>
>> >> > - In context of dts/dtsi, as far as I understand the general logic
>> >> > behind the split, the SoC .dtsi should contain all the things that are
>> >> > fully contained within the SoC and do not depend on the wiring of a
>> >> > particular board or its target use case. Boards then
>> >> > add/remove/override settings to match their wiring and use case more
>> >> > closely
>> >>
>> >> Of course, but the thermal shutdown is obviously a board-level
>> >> feature,
>> >> which I described further above.
>> >
>> > Not so obvious to me :-) I don't mean to be stubborn or uncooperative
>> > here, but I really can't find any technical merit in having it enabled
>> > at board level instead of SoC level.
>>
>> Well, please also consider that the PMICs from Rockchip are kind of
>> weird little chips, specifically customized to serve particular SoCs.
>> For example, they ensure the right sequencing and ramping-up of
>> different power rails, which is in many cases essential.
>
> Sure. I'm not saying that switching to a PMIC-assisted reset shouldn't
> be done where the board supports it - quite the opposite. All I'm
> saying is that having at least passive cooling and CRU based resets
> guaranteed for any board, regardless of how thought through its .dts
> is, seems to be a better default than no thermal protection.

The way I see it, which I tried to describe in my previous response, is
that we can't be 100% sure that CRU-based resets would work as expected
and 100% reliably on all boards.

>> Thus, who knows what might (or might not) go wrong if we don't reset
>> the
>> PMIC at the same time when the CRU resets the SoC? Unfortunately, the
>> things aren't that straightforward.
>>
>> On top of that, some boards, such as the Rock 5B, use a few additional
>> discrete voltage regulators instead of a master-slave PMIC
>> configuration,
>> which may actually introduce some weird power-related issues, which
>> also
>> may be intermittent. Actually, I've already overheard that the Rock
>> 5B
>> experiences some issues of that nature, but I don't know the details.
>
> Those discrete regulators seem to be out of scope of this discussion.
>
> I agree that a deeper power-cycle with proper power-up sequence to
> follow it is better when it's available in the respective hardware.
> I'm also happy to provide a follow-up patch to switch from CRU to PMIC
> resets for the boards I found to support the latter.
>
> The question we have at hand is solely about the default behavior for
> a hypothetical new board with minimal .dts, or an existing board where
> we can't determine the wiring of the TSHUT signal:
> Option 1. Let them stay nice and warm at 120C+ under load, because
> they should have known better and should have enabled the TSADC in
> their device tree before putting the system under load
> Option 2. Get them passively cooled at 85C under load even with no
> heatsink, then force a CRU reset out of abundance of caution at 120C
> unless they defined PMIC reset in their device tree
>
> I'm advocating for the latter.

Just to clarify, with my proposal we'd end up with TSADC and CRU-based
resets enabled for the boards where we can't be sure that PMIC resets
are supported. The only difference is that TSADC wouldn't be enabled at
the SoC level, yet no boards would be left without thermal runaway
handling in place.
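
For illustration, the board-level approach described above could look
roughly like the sketch below. The board file names are hypothetical, and
the polarity value is an assumption derived from the TSADC_SHUT_H signal
name in the Rock 5B schematic, not something verified on hardware. The
fragments assume the board file includes rk3588s.dtsi, which provides the
tsadc label.

```dts
/* hypothetical-board-a.dts: TSHUT wiring unknown, keep the CRU-based default */
&tsadc {
	status = "okay";
};

/* hypothetical-board-b.dts: TSHUT wired to the PMIC reset line */
&tsadc {
	rockchip,hw-tshut-mode = <1>;		/* 0 = CRU, 1 = GPIO */
	rockchip,hw-tshut-polarity = <1>;	/* 0 = active low, 1 = active high (assumed) */
	status = "okay";
};
```

Either way the thermal sensor is enabled; only the reset path differs
between the two boards.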

>> > Switching to PMIC-assisted resets is one thing - it definitely should
>> > go to board files, as it depends on the specific wiring of the
>> > TSADC_SHUT signal. Enabling TSADC in a default configuration that can
>> > and will work on all boards regardless of their wiring is another
>> > thing. I'm just arguing for the latter.
>>
>> CRU-based thermal runaway handling may in theory work on all boards,
>> but
>> we simply can't be 100% sure without detailed insights into the board
>> designs and testing. Maybe even the downstream U-Boot does some magic
>> during such thermal runaway resets, which we don't know. It may be
>> similar to the SoC reset issues that the RK3399 suffers from.
>
> That might be true, but we're talking about operation at 120C+ here.
> I'd rather have my board reboot in any way it pleases under those
> conditions, and have that behavior triggered by default even if it's
> imperfect, than worry about the correct state of all regulators and
> peripherals upon next boot. The latter is important of course, but I'd
> rather let it cool down and reboot it manually anyway, because that
> heat could have made more things go sideways than just the regulators.

Makes sense, but as you can see above, with the approach I propose no
boards would be left to sizzle under load. That would be irresponsible
and bad on multiple levels.

>> See also my comment above.
>>
>> > To me it seems similar to the watchdog timer situation: we enable it
>> > at the SoC level [1], as it is expected to work in its default
>> > configuration regardless of the board wiring, and it provides
>> > protection against system malfunctions. Doesn't matter if the board or
>> > its userspace code ends up using the full functionality - it just sits
>> > there waiting for its spotlight without hurting anybody.
>>
>> Frankly, I don't know much about the watchdog functionality, so I'd
>> need
>> to research it before I could say something about it.
>
> FWIW, watchdog resets are exclusively routed through the CRU (see
> RK3588 TRM V1.0 part 1 page 31). So if we expect that one to work
> somehow, probably we should expect thermal resets to work too.

I'll research that a bit; maybe I'll find out something that could be
interesting or useful.

2024-03-01 13:11:44

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

Hello Chen-Yu,

On 2024-03-01 13:02, Chen-Yu Tsai wrote:
> On Fri, Mar 1, 2024 at 7:10 PM Alexey Charkov <[email protected]>
> wrote:
>> On Fri, Mar 1, 2024 at 12:52 PM Dragan Simic <[email protected]>
>> wrote:
>> > On 2024-03-01 09:25, Alexey Charkov wrote:
>> > > On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <[email protected]> wrote:
>> > Thus, who knows what might (or might not) go wrong if we don't reset the
>> > PMIC at the same time when the CRU resets the SoC? Unfortunately, the
>> > things aren't that straightforward.
>> >
>> > On top of that, some boards, such as the Rock 5B, use a few additional
>> > discrete voltage regulators instead of a master-slave PMIC
>> > configuration,
>> > which may actually introduce some weird power-related issues, which also
>> > may be intermittent. Actually, I've already overheard that the Rock 5B
>> > experiences some issues of that nature, but I don't know the details.
>>
>> Those discrete regulators seem to be out of scope of this discussion.
>>
>> I agree that a deeper power-cycle with proper power-up sequence to
>> follow it is better when it's available in the respective hardware.
>> I'm also happy to provide a follow-up patch to switch from CRU to PMIC
>> resets for the boards I found to support the latter.
>>
>> The question we have at hand is solely about the default behavior for
>> a hypothetical new board with minimal .dts, or an existing board where
>> we can't determine the wiring of the TSHUT signal:
>> Option 1. Let them stay nice and warm at 120C+ under load, because
>> they should have known better and should have enabled the TSADC in
>> their device tree before putting the system under load
>> Option 2. Get them passively cooled at 85C under load even with no
>> heatsink, then force a CRU reset out of abundance of caution at 120C
>> unless they defined PMIC reset in their device tree
>>
>> I'm advocating for the latter.
>
> FWIW, the CRU reset is what the kernel uses for rebooting the system,
> either during a reboot or a kernel panic. So it is already used for
> both
> normal and abnormal scenarios. And yes, it sometimes leaves regulators
> or other parts of the system in some weird state that the BROM isn't
> expecting.

According to drivers/mfd/rk8xx-core.c, some PMICs (RK809 and RK817, to
be precise) already support taking over the board resets when configured
with "rockchip,system-power-controller". Perhaps we should do the same
with the RK806, to avoid any possible issues with CRU-based board resets;
I'll investigate that further.

Not all Rockchip PMICs (RK808, for example) support software-initiated
resets, unfortunately. According to the RK806 datasheet, it seems capable
of that; see pages 27 and 28 of version 1.0 of the datasheet.

> Why should a hardware triggered reset be any different?

According to the RK806 datasheet, resetting through the PMIC(s) causes
the PMIC(s) to cut the power rails in a controlled way, i.e. with the
expected ramp-downs and sequencing, and the SoC then wakes up with the
regulators in exactly the same state as on a cold boot. Doing it that
way should be better.

The reset procedure _should_ be virtually the same for all Rockchip
PMICs, but please don't take my word for it. Resets are described quite
poorly in some PMIC datasheets.
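
As a rough sketch of the existing mechanism mentioned above, a board
using one of the PMICs that already support it (RK809 here) could mark it
as the system power controller along the following lines. The I2C bus,
the address and the omission of all other required properties (interrupts,
regulators and so on) are assumptions made to keep the fragment short;
this is not a complete PMIC node and says nothing about RK806 support.

```dts
/* Sketch only: bus and address are assumptions, other properties omitted */
&i2c0 {
	pmic@20 {
		compatible = "rockchip,rk809";
		reg = <0x20>;
		/* ask the PMIC driver to take over power-off/reset handling */
		rockchip,system-power-controller;
	};
};
```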

2024-03-02 11:25:58

by Heiko Stuebner

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

Am Donnerstag, 29. Februar 2024, 20:26:32 CET schrieb Alexey Charkov:
> Include thermal zones information in device tree for RK3588 variants.
>
> This also enables the TSADC controller unconditionally on all boards
> to ensure that thermal protections are in place via throttling and
> emergency reset, once OPPs are added to enable CPU DVFS.
>
> The default settings (using CRU as the emergency reset mechanism)
> should work on all boards regardless of their wiring, as CRU resets
> do not depend on any external components. Boards that have the TSHUT
> signal wired to the reset line of the PMIC may opt to switch to GPIO
> tshut mode instead (rockchip,hw-tshut-mode = <1>;)
>
> It seems though that downstream kernels don't use that, even for
> those boards where the wiring allows for GPIO based tshut, such as
> Radxa Rock 5B [1], [2], [3]
>
> [1] https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
> [2] https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
> [3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf page 11 (TSADC_SHUT_H)
>
> Signed-off-by: Alexey Charkov <[email protected]>
> ---
> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176 +++++++++++++++++++++++++++++-
> 1 file changed, 175 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> index 36b1b7acfe6a..9bf197358642 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> @@ -10,6 +10,7 @@
> #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> #include <dt-bindings/phy/phy.h>
> #include <dt-bindings/ata/ahci.h>
> +#include <dt-bindings/thermal/thermal.h>
>
> / {
> compatible = "rockchip,rk3588";
> @@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
> pinctrl-1 = <&tsadc_shut>;
> pinctrl-names = "gpio", "otpout";
> #thermal-sensor-cells = <1>;
> - status = "disabled";
> + status = "okay";
> + };

So I've skimmed over the general discussion, though I don't have a strong
opinion in either direction yet. Still, there is some low-hanging fruit:

- having the thermal-zones addition in a separate patch would allow
merging the obvious stuff while this discussion is still ongoing
- status=okay in a soc dtsi is wrong, because okay is the default status,
so if anything the status property should be removed

In general I'm not that much of a fan of things just working implicitly.
So somehow, when someone submits a board devicetree, I expect them to
have ensured stuff is enabled somewhat ok. So even seeing a simple

&tsadc {
status = "okay";
};

suggests that they have at least noticed the existence of thermal stuff.


At least that is where my thought-process is at the moment ;-)


Heiko

> + thermal_zones: thermal-zones {
> + /* sensor near the center of the SoC */
> + package_thermal: package-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 0>;
> +
> + trips {
> + package_crit: package-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + /* sensor between A76 cores 0 and 1 */
> + bigcore0_thermal: bigcore0-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 1>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */
> + bigcore0_alert0: bigcore0-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */
> + bigcore0_alert1: bigcore0-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + bigcore0_crit: bigcore0-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&bigcore0_alert1>;
> + cooling-device =
> + <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor between A76 cores 2 and 3 */
> + bigcore2_thermal: bigcore2-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 2>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */
> + bigcore2_alert0: bigcore2-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */
> + bigcore2_alert1: bigcore2-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + bigcore2_crit: bigcore2-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&bigcore2_alert1>;
> + cooling-device =
> + <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor between the four A55 cores */
> + little_core_thermal: littlecore-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 3>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */
> + littlecore_alert0: littlecore-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */
> + littlecore_alert1: littlecore-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + littlecore_crit: littlecore-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&littlecore_alert1>;
> + cooling-device =
> + <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor near the PD_CENTER power domain */
> + center_thermal: center-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 4>;
> +
> + trips {
> + center_crit: center-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + gpu_thermal: gpu-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 5>;
> +
> + trips {
> + gpu_crit: gpu-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + npu_thermal: npu-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 6>;
> +
> + trips {
> + npu_crit: npu-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> };
>
> saradc: adc@fec10000 {
>
>

2024-03-02 18:38:43

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

Hello Heiko,

On 2024-03-02 12:25, Heiko Stuebner wrote:
> Am Donnerstag, 29. Februar 2024, 20:26:32 CET schrieb Alexey Charkov:
>> Include thermal zones information in device tree for RK3588 variants.
>>
>> This also enables the TSADC controller unconditionally on all boards
>> to ensure that thermal protections are in place via throttling and
>> emergency reset, once OPPs are added to enable CPU DVFS.
>>
>> The default settings (using CRU as the emergency reset mechanism)
>> should work on all boards regardless of their wiring, as CRU resets
>> do not depend on any external components. Boards that have the TSHUT
>> signal wired to the reset line of the PMIC may opt to switch to GPIO
>> tshut mode instead (rockchip,hw-tshut-mode = <1>;)
>>
>> It seems though that downstream kernels don't use that, even for
>> those boards where the wiring allows for GPIO based tshut, such as
>> Radxa Rock 5B [1], [2], [3]
>>
>> [1]
>> https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
>> [2]
>> https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
>> [3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf
>> page 11 (TSADC_SHUT_H)
>>
>> Signed-off-by: Alexey Charkov <[email protected]>
>> ---
>> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176
>> +++++++++++++++++++++++++++++-
>> 1 file changed, 175 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> index 36b1b7acfe6a..9bf197358642 100644
>> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
>> @@ -10,6 +10,7 @@
>> #include <dt-bindings/reset/rockchip,rk3588-cru.h>
>> #include <dt-bindings/phy/phy.h>
>> #include <dt-bindings/ata/ahci.h>
>> +#include <dt-bindings/thermal/thermal.h>
>>
>> / {
>> compatible = "rockchip,rk3588";
>> @@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
>> pinctrl-1 = <&tsadc_shut>;
>> pinctrl-names = "gpio", "otpout";
>> #thermal-sensor-cells = <1>;
>> - status = "disabled";
>> + status = "okay";
>> + };
>
> so I've skimmed over the general discussion, though don't have a hard
> opinion in either direction yet. Still there are some low-hanging
> fruit:
>
> - having the thermal-zones addition in a separate patch would allow to
> merge the obvious stuff, while this discussion is still ongoing

Very good suggestion.

> - status=okay in a soc dtsi is wrong, because okay is the default
> status
> so if anything the status property should be removed
>
> In general I'm not that much of a fan of things just working
> implicitly.
> So somehow, when someone submits a board devicetree, I expect them to
> have ensured stuff is enabled somewhat ok. So even seeing a simple
>
> &tsadc {
> status = "okay";
> };
>
> suggests that they have at least noticed the existence of thermal
> stuff.

I agree that having such additional "signed-off markers", so to speak,
in a board dts is quite reassuring. I mean, someone implementing a new
dts file for a new board should simply know what needs to be done there,
and there should be no excuses for not checking the thermal throttling
stuff.
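
To illustrate the point about the default status quoted above, here is a
minimal sketch. The opt-out case is hypothetical and not part of the
patch under review, and the fragment assumes the board file includes
rk3588s.dtsi, which provides the tsadc label.

```dts
/*
 * If the SoC .dtsi carried no status property on the tsadc node, the
 * node would count as enabled, since "okay" is the default status.
 * A board would then need a stanza only to opt out (hypothetical case):
 */
&tsadc {
	status = "disabled";
};
```

With the node disabled in the SoC .dtsi instead, every board has to carry
the explicit status = "okay" stanza, which is the visible marker being
discussed here.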

2024-03-04 18:09:07

by Sebastian Reichel

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Hi,

On Thu, Feb 29, 2024 at 11:26:31PM +0400, Alexey Charkov wrote:
> This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
> active cooling on Radxa Rock 5B via the provided PWM fan.
>
> Some RK3588 boards use separate regulators to supply CPUs and their
> respective memory interfaces, so this is handled by coupling those
> regulators in affected boards' device trees to ensure that their
> voltage is adjusted in step.
>
> In this revision of the series I chose to enable TSADC for all boards
> at .dtsi level, because:
> - The defaults already in .dtsi should work for all users, given that
> the CRU based resets don't need any out-of-chip components, and
> the CRU vs. PMIC reset is pretty much the only thing a board might
> have to configure / override there
> - The boards that have TSADC_SHUT signal wired to the PMIC reset line
> can still choose to override the reset logic in their .dts. Or stay
> with CRU based resets, as downstream kernels do anyway
> - The on-by-default approach helps ensure thermal protections are in
> place (emergency reset and throttling) for any board even with a
> rudimentary .dts, and thus lets us introduce CPU DVFS with better
> peace of mind
>
> Fan control on Rock 5B has been split into two intervals: let it spin
> at the minimum cooling state between 55C and 65C, and then accelerate
> if the system crosses the 65C mark - thanks to Dragan for suggesting.
> This lets some cooling setups with beefier heatsinks and/or larger
> fan fins to stay in the quietest non-zero fan state while still
> gaining potential benefits from the airflow it generates, and
> possibly avoiding noisy speeds altogether for some workloads.
>
> OPPs help actually scale CPU frequencies up and down for both cooling
> and performance - tested on Rock 5B under varied loads. I've split
> the patch into two parts: the first containing those OPPs that seem
> to be no-regret with general consensus during v1 review [2], while
> the second contains OPPs that cause frequency reductions without
> accompanying decrease in CPU voltage. There seems to be a slight
> performance gain in some workload scenarios when using these, but
> previous discussion was inconclusive as to whether they should be
> included or not. Having them as separate patches enables easier
> comparison and partial reversion if people want to test it under
> their workloads, and also enables the first 'no-regret' part to be
> merged to -next while the jury is still out on the second one.
>
> [1] https://lore.kernel.org/linux-rockchip/1824717.EqSB1tO5pr@bagend/T/#ma2ab949da2235a8e759eab22155fb2bc397d8aea
> [2] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#m49d2b94e773f5b532a0bb5d3d7664799ff28cc2c
>
> Signed-off-by: Alexey Charkov <[email protected]>
> ---
> Changes in v3:
> - Added regulator coupling for EVB1 and QuartzPro64
> - Enabled the TSADC for all boards in .dtsi, not just Rock 5B (thanks ChenYu)
> - Added comments regarding two passive cooling trips in each zone (thanks Dragan)
> - Fixed active cooling map numbering for Radxa Rock 5B (thanks Dragan)
> - Dropped Daniel's Acked-by tag from the Rock 5B fan patch, as there's been quite some
> churn there since the version he acknowledged
> - Link to v2: https://lore.kernel.org/r/[email protected]
>
> Changes in v2:
> - Dropped the rfkill patch which Heiko has already applied
> - Set higher 'polling-delay-passive' (100 instead of 20)
> - Name all cooling maps starting from map0 in each respective zone
> - Drop 'contribution' properties from passive cooling maps
> - Link to v1: https://lore.kernel.org/r/[email protected]
>
> ---
> Alexey Charkov (5):
> arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
> arm64: dts: rockchip: enable automatic active cooling on Rock 5B
> arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
> arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
> arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs
>
> arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 +
> .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 +
> arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +-
> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++-
> 4 files changed, 437 insertions(+), 2 deletions(-)

I'm too busy to have a detailed review of this series right now, but
I pushed it to our CI and it results in a board reset at boot time:

https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950

I also pushed just the first three patches (i.e. without OPP /
cpufreq) and that boots fine:

https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953

Note that OPP / cpufreq works on the same boards in the CI when
using the ugly-and-not-for-upstream cpufreq driver:

https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd

My best guess right now is that this is related to the generic
driver obviously not updating the GRF read margin registers.

Greetings,

-- Sebastian


2024-03-05 08:07:46

by Alexey Charkov

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Hi Sebastian!

On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel
<[email protected]> wrote:
>
> Hi,
>
> On Thu, Feb 29, 2024 at 11:26:31PM +0400, Alexey Charkov wrote:
> > This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
> > active cooling on Radxa Rock 5B via the provided PWM fan.
> >
> > Some RK3588 boards use separate regulators to supply CPUs and their
> > respective memory interfaces, so this is handled by coupling those
> > regulators in affected boards' device trees to ensure that their
> > voltage is adjusted in step.
> >
> > In this revision of the series I chose to enable TSADC for all boards
> > at .dtsi level, because:
> > - The defaults already in .dtsi should work for all users, given that
> > the CRU based resets don't need any out-of-chip components, and
> > the CRU vs. PMIC reset is pretty much the only thing a board might
> > have to configure / override there
> > - The boards that have TSADC_SHUT signal wired to the PMIC reset line
> > can still choose to override the reset logic in their .dts. Or stay
> > with CRU based resets, as downstream kernels do anyway
> > - The on-by-default approach helps ensure thermal protections are in
> > place (emergency reset and throttling) for any board even with a
> > rudimentary .dts, and thus lets us introduce CPU DVFS with better
> > peace of mind
> >
> > Fan control on Rock 5B has been split into two intervals: let it spin
> > at the minimum cooling state between 55C and 65C, and then accelerate
> > if the system crosses the 65C mark - thanks to Dragan for suggesting.
> > This lets some cooling setups with beefier heatsinks and/or larger
> > fan fins to stay in the quietest non-zero fan state while still
> > gaining potential benefits from the airflow it generates, and
> > possibly avoiding noisy speeds altogether for some workloads.
> >
> > OPPs help actually scale CPU frequencies up and down for both cooling
> > and performance - tested on Rock 5B under varied loads. I've split
> > the patch into two parts: the first containing those OPPs that seem
> > to be no-regret with general consensus during v1 review [2], while
> > the second contains OPPs that cause frequency reductions without
> > accompanying decrease in CPU voltage. There seems to be a slight
> > performance gain in some workload scenarios when using these, but
> > previous discussion was inconclusive as to whether they should be
> > included or not. Having them as separate patches enables easier
> > comparison and partial reversion if people want to test it under
> > their workloads, and also enables the first 'no-regret' part to be
> > merged to -next while the jury is still out on the second one.
> >
> > [1] https://lore.kernel.org/linux-rockchip/1824717.EqSB1tO5pr@bagend/T/#ma2ab949da2235a8e759eab22155fb2bc397d8aea
> > [2] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#m49d2b94e773f5b532a0bb5d3d7664799ff28cc2c
> >
> > Signed-off-by: Alexey Charkov <[email protected]>
> > ---
> > Changes in v3:
> > - Added regulator coupling for EVB1 and QuartzPro64
> > - Enabled the TSADC for all boards in .dtsi, not just Rock 5B (thanks ChenYu)
> > - Added comments regarding two passive cooling trips in each zone (thanks Dragan)
> > - Fixed active cooling map numbering for Radxa Rock 5B (thanks Dragan)
> > - Dropped Daniel's Acked-by tag from the Rock 5B fan patch, as there's been quite some
> > churn there since the version he acknowledged
> > - Link to v2: https://lore.kernel.org/r/[email protected]
> >
> > Changes in v2:
> > - Dropped the rfkill patch which Heiko has already applied
> > - Set higher 'polling-delay-passive' (100 instead of 20)
> > - Name all cooling maps starting from map0 in each respective zone
> > - Drop 'contribution' properties from passive cooling maps
> > - Link to v1: https://lore.kernel.org/r/[email protected]
> >
> > ---
> > Alexey Charkov (5):
> > arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
> > arm64: dts: rockchip: enable automatic active cooling on Rock 5B
> > arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
> > arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
> > arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs
> >
> > arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 +
> > .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 +
> > arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +-
> > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++-
> > 4 files changed, 437 insertions(+), 2 deletions(-)
>
> I'm too busy to have a detailed review of this series right now, but
> I pushed it to our CI and it results in a board reset at boot time:
>
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950
>
> I also pushed just the first three patches (i.e. without OPP /
> cpufreq) and that boots fine:
>
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953

Thank you for testing these! I've noticed in the boot log that the CI
machine uses some u-boot 2023.07 - is that a downstream one? Any
chance to compare it to 2023.11 or 2024.01 from your (Collabora)
integration tree?

I use 2023.11 from your integration tree, with a binary bl31, and I'm
not getting those resets even under prolonged heavy load (I rebuild
Chromium with 8 concurrent compilation jobs as the stress test -
that's 14 hours of heavy CPU, memory and IO use). Would be interesting
to understand if it's just a 'lucky' SoC specimen on my side, or if
there is some dark magic happening differently on my machine vs. your
CI machine.

I'm thinking that if your CI machine uses a downstream u-boot, it
might be leaving some extra hardware running (PVTM?), which might do
weird stuff when TSADC/clocks/voltages get readjusted by the generic
cpufreq driver?

> Note, that OPP / cpufreq works on the same boards in the CI when
> using the ugly-and-not-for-upstream cpufreq driver:
>
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
>
> My best guess right now is, that this is related to the generic
> driver obviously not updating the GRF read margin registers.

If it were about memory read margins, I doubt I would have been
able to get my machine to work reliably under heavy load with the
default ones, but who knows...

Best regards,
Alexey

2024-03-07 12:38:48

by Alexey Charkov

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

On Tue, Mar 5, 2024 at 12:06 PM Alexey Charkov <[email protected]> wrote:
>
> Hi Sebastian!
>
> On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel
> <[email protected]> wrote:
> >
> > Hi,
> >
> > On Thu, Feb 29, 2024 at 11:26:31PM +0400, Alexey Charkov wrote:
> > > This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
> > > active cooling on Radxa Rock 5B via the provided PWM fan.
> > >
> > > Some RK3588 boards use separate regulators to supply CPUs and their
> > > respective memory interfaces, so this is handled by coupling those
> > > regulators in affected boards' device trees to ensure that their
> > > voltage is adjusted in step.
> > >
> > > In this revision of the series I chose to enable TSADC for all boards
> > > at .dtsi level, because:
> > > - The defaults already in .dtsi should work for all users, given that
> > > the CRU based resets don't need any out-of-chip components, and
> > > the CRU vs. PMIC reset is pretty much the only thing a board might
> > > have to configure / override there
> > > - The boards that have TSADC_SHUT signal wired to the PMIC reset line
> > > can still choose to override the reset logic in their .dts. Or stay
> > > with CRU based resets, as downstream kernels do anyway
> > > - The on-by-default approach helps ensure thermal protections are in
> > > place (emergency reset and throttling) for any board even with a
> > > rudimentary .dts, and thus lets us introduce CPU DVFS with better
> > > peace of mind
> > >
> > > Fan control on Rock 5B has been split into two intervals: let it spin
> > > at the minimum cooling state between 55C and 65C, and then accelerate
> > > if the system crosses the 65C mark - thanks to Dragan for suggesting.
> > > This lets some cooling setups with beefier heatsinks and/or larger
> > > fan fins to stay in the quietest non-zero fan state while still
> > > gaining potential benefits from the airflow it generates, and
> > > possibly avoiding noisy speeds altogether for some workloads.
> > >
> > > OPPs help actually scale CPU frequencies up and down for both cooling
> > > and performance - tested on Rock 5B under varied loads. I've split
> > > the patch into two parts: the first containing those OPPs that seem
> > > to be no-regret with general consensus during v1 review [2], while
> > > the second contains OPPs that cause frequency reductions without
> > > accompanying decrease in CPU voltage. There seems to be a slight
> > > performance gain in some workload scenarios when using these, but
> > > previous discussion was inconclusive as to whether they should be
> > > included or not. Having them as separate patches enables easier
> > > comparison and partial reversion if people want to test it under
> > > their workloads, and also enables the first 'no-regret' part to be
> > > merged to -next while the jury is still out on the second one.
> > >
> > > [1] https://lore.kernel.org/linux-rockchip/1824717.EqSB1tO5pr@bagend/T/#ma2ab949da2235a8e759eab22155fb2bc397d8aea
> > > [2] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#m49d2b94e773f5b532a0bb5d3d7664799ff28cc2c
> > >
> > > Signed-off-by: Alexey Charkov <[email protected]>
> > > ---
> > > Changes in v3:
> > > - Added regulator coupling for EVB1 and QuartzPro64
> > > - Enabled the TSADC for all boards in .dtsi, not just Rock 5B (thanks ChenYu)
> > > - Added comments regarding two passive cooling trips in each zone (thanks Dragan)
> > > - Fixed active cooling map numbering for Radxa Rock 5B (thanks Dragan)
> > > - Dropped Daniel's Acked-by tag from the Rock 5B fan patch, as there's been quite some
> > > churn there since the version he acknowledged
> > > - Link to v2: https://lore.kernel.org/r/[email protected]
> > >
> > > Changes in v2:
> > > - Dropped the rfkill patch which Heiko has already applied
> > > - Set higher 'polling-delay-passive' (100 instead of 20)
> > > - Name all cooling maps starting from map0 in each respective zone
> > > - Drop 'contribution' properties from passive cooling maps
> > > - Link to v1: https://lore.kernel.org/r/[email protected]
> > >
> > > ---
> > > Alexey Charkov (5):
> > > arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
> > > arm64: dts: rockchip: enable automatic active cooling on Rock 5B
> > > arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
> > > arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
> > > arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs
> > >
> > > arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 +
> > > .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 +
> > > arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +-
> > > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++-
> > > 4 files changed, 437 insertions(+), 2 deletions(-)
> >
> > I'm too busy to have a detailed review of this series right now, but
> > I pushed it to our CI and it results in a board reset at boot time:
> >
> > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950
> >
> > I also pushed just the first three patches (i.e. without OPP /
> > cpufreq) and that boots fine:
> >
> > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953
>
> Thank you for testing these! I've noticed in the boot log that the CI
> machine uses some u-boot 2023.07 - is that a downstream one? Any
> chance to compare it to 2023.11 or 2024.01 from your (Collabora)
> integration tree?
>
> I use 2023.11 from your integration tree, with a binary bl31, and I'm
> not getting those resets even under prolonged heavy load (I rebuild
> Chromium with 8 concurrent compilation jobs as the stress test -
> that's 14 hours of heavy CPU, memory and IO use). Would be interesting
> to understand if it's just a 'lucky' SoC specimen on my side, or if
> there is some dark magic happening differently on my machine vs. your
> CI machine.
>
> Thinking that maybe if your CI machine uses a downstream u-boot it
> might be leaving some extra hardware running (PVTM?) which might do
> weird stuff when TSADC/clocks/voltages get readjusted by the generic
> cpufreq driver?..
>
> > Note, that OPP / cpufreq works on the same boards in the CI when
> > using the ugly-and-not-for-upstream cpufreq driver:
> >
> > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
> >
> > My best guess right now is, that this is related to the generic
> > driver obviously not updating the GRF read margin registers.
>
> If it was about memory read margins I believe I would have been
> unlikely to get my machine to work reliably under heavy load with the
> default ones, but who knows...

Sebastian's report led me to investigate further how all those things
are organized in the downstream code and in hardware, and what could
be a pragmatic way forward with upstream enablement. It turned out to
be quite a rabbit hole frankly, with multiple layers of abstraction
and intertwined code in different places.

Here's a quick summary for future reference:
- CPU clocks on RK3588 are ultimately managed by the ATF firmware,
which provides an SCMI service to expose them to the kernel
- ATF itself doesn't directly set any clock frequencies. Instead, it
accepts a target frequency via SCMI and converts it into an oscillator
ring length setting for the PVPLL hardware block (via a fixed table
lookup). At least that's how it's done in the recently released TF-A
bl31 code [1] - perhaps the binary bl31 does something similar
- U-boot doesn't seem to mess with CPU clocks, PVTM or PVPLL
- PVPLL produces a reference clock to feed to the CPUs, which depends
on the configured oscillator ring length but also on the supply
voltage, silicon quality and perhaps temperature too. ATF doesn't know
anything about voltages or temperatures, so it doesn't guarantee that
the requested frequency is matched by the hardware
- PVPLL frequency generation is bypassed for lower-frequency OPPs, in
which case the target frequency is directly fed by the ATF to the CRU.
This happens for both big-core and little-core frequencies below 816
MHz
- Given that requesting a particular frequency via SCMI doesn't
guarantee that it will be what the CPUs end up running at, the vendor
kernel also does a runtime voltage calibration for the supply
regulators, by adjusting the supply voltage in minimum regulator steps
until the frequency reported by PVPLL gets close to the requested one
[2]. It then overwrites OPP provided voltage values with the
calibrated ones
- There's also some trickery with preselecting OPP voltage sets using
the "-Lx" suffix based on silicon quality, as measured by a "leakage"
value stored in an NVMEM cell and/or the PVTM frequency generated at a
reference "midpoint" OPP [3]. Better performing silicon gets to run at
lower default supply voltages, thus saving power
- Once the OPPs are selected and calibrated, the only remaining
trickery is the two supply regulators per CPU cluster (one for
the CPUs and the other for the memory interface)
- Another catch, as Sebastian points out, is that memory read margins
must be adjusted whenever the memory interface supply voltage crosses
certain thresholds [4]. This has little to do with CPUs or
frequencies, and is only tangentially related to them due to the
dependency chain between the target CPU frequency -> required CPU
supply voltage -> matching memory interface supply voltage -> required
read margins
- At reset the ATF switches all clocks to the lowest 408 MHz [5], so
setting it to anything in kernel code (as the downstream driver does)
seems redundant
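
To make the PVPLL bypass and the voltage calibration loop described above
a bit more tangible, here is a toy model in plain Python. It is only a
sketch under loud assumptions: the function names, the fake PVPLL response
and all the numbers except the 816 MHz threshold are made up, and the real
calibration may well step the voltage in both directions rather than only
upwards.

```python
# Toy model only: names and numbers are illustrative assumptions and are
# not taken from the TF-A or vendor kernel sources.

PVPLL_BYPASS_BELOW_HZ = 816_000_000  # threshold mentioned in the summary


def uses_pvpll(target_hz):
    """Frequencies below the threshold bypass the PVPLL and come from the CRU.

    Treating exactly 816 MHz as a PVPLL frequency is an assumption.
    """
    return target_hz >= PVPLL_BYPASS_BELOW_HZ


def calibrate_voltage(target_hz, start_uv, step_uv, max_uv,
                      measure_hz, tolerance_hz):
    """Raise the supply in regulator steps until the clock is close enough.

    measure_hz(uv) stands in for reading back the frequency that the PVPLL
    actually produces at a given supply voltage.
    """
    uv = start_uv
    while uv <= max_uv:
        if measure_hz(uv) >= target_hz - tolerance_hz:
            return uv
        uv += step_uv
    raise ValueError("target frequency not reachable within the voltage limit")


def fake_pvpll(uv):
    """Hypothetical silicon: 2 kHz of output clock per microvolt of supply."""
    return uv * 2_000
```

With these made-up numbers, calibrate_voltage(1_800_000_000, 675_000,
6_250, 1_000_000, fake_pvpll, 10_000_000) settles on 900000 uV, which
would then replace the voltage listed in the OPP.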

All in all, it does indeed sound like Collabora's CI machine boot-time
resets are most likely caused by the missing memory read margin
settings in my patch series. Voltage values in the OPPs I used are the
most conservative defaults of what the downstream DT has, and PVPLL
should be able to generate reasonable clock speeds with those (albeit
likely suboptimal, due to them not being tuned to the particular
silicon specimen). And frankly, there is little else that differs.

As for the way forward, it would be great to know the opinions from
the list. My thinking is as follows:
- I can introduce memory read margin updates as the first priority,
leaving voltage calibration and/or OPP preselection for later (as
those should not affect system stability at current default values,
perhaps only power efficiency to a certain extent)
- CPUfreq doesn't sound like the right place for those, given that
they have little to do with either CPU or freq :)
- I suggest a custom regulator config helper to plug into the OPP
layer, as is done for TI OMAP5 [6]. At first, it might be only used
for looking up and setting the correct memory read margin value
whenever the cluster supply voltage changes, and later the same code
can be extended to do voltage calibration. In fact, OMAP code is there
for a very similar purpose, but in their case optimized voltages are
pre-programmed in efuses and don't require runtime recalibration
- Given that all OPPs in the downstream kernel list identical
voltages for the memory supply as for the CPU supply, I don't think it
makes much sense to customize the cpufreq driver per se.
Single-regulator approach with the generic cpufreq-dt and regulator
coupling sounds much less invasive and thus lower-maintenance
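
As a rough illustration of what such a helper would do for read margins,
here is a small Python model (not kernel code). The thresholds, margin
values and the ClusterSupply class are hypothetical placeholders; the real
values would have to come from the downstream sources, and the list of
"writes" merely stands in for GRF register accesses. The sketch also says
nothing about whether the margin should be written before or after the
voltage change.

```python
import bisect

# Hypothetical thresholds (microvolts) and the margin value to use in each
# band; MARGINS has one more entry than THRESHOLDS_UV.
THRESHOLDS_UV = [725_000, 850_000]
MARGINS = [4, 3, 2]


def read_margin_for(uv):
    """Pick the margin for the voltage band that uv falls into."""
    return MARGINS[bisect.bisect_right(THRESHOLDS_UV, uv)]


class ClusterSupply:
    """Tracks one cluster supply and records margin updates on changes."""

    def __init__(self, uv):
        self.uv = uv
        self.margin = read_margin_for(uv)
        self.margin_writes = []  # stands in for GRF register writes

    def set_voltage(self, new_uv):
        new_margin = read_margin_for(new_uv)
        # Only touch the margin when a threshold has been crossed.
        if new_margin != self.margin:
            self.margin_writes.append(new_margin)
            self.margin = new_margin
        self.uv = new_uv
```

The point of hooking this on voltage changes rather than on frequency
changes is that several OPPs can share one voltage band, so most
transitions would not need a margin update at all.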

Best regards,
Alexey

[1] https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L303
[2] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L804
[3] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L1575
[4] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/cpufreq/rockchip-cpufreq.c#L405
[5] https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L2419
[6] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/opp/ti-opp-supply.c#n275

2024-03-07 14:21:48

by Dragan Simic

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Hello Alexey,

On 2024-03-07 13:38, Alexey Charkov wrote:
> On Tue, Mar 5, 2024 at 12:06 PM Alexey Charkov <[email protected]>
> wrote:
>>
>> Hi Sebastian!
>>
>> On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel
>> <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > On Thu, Feb 29, 2024 at 11:26:31PM +0400, Alexey Charkov wrote:
>> > > This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
>> > > active cooling on Radxa Rock 5B via the provided PWM fan.
>> > >
>> > > Some RK3588 boards use separate regulators to supply CPUs and their
>> > > respective memory interfaces, so this is handled by coupling those
>> > > regulators in affected boards' device trees to ensure that their
>> > > voltage is adjusted in step.
>> > >
>> > > In this revision of the series I chose to enable TSADC for all boards
>> > > at .dtsi level, because:
>> > > - The defaults already in .dtsi should work for all users, given that
>> > > the CRU based resets don't need any out-of-chip components, and
>> > > the CRU vs. PMIC reset is pretty much the only thing a board might
>> > > have to configure / override there
>> > > - The boards that have TSADC_SHUT signal wired to the PMIC reset line
>> > > can still choose to override the reset logic in their .dts. Or stay
>> > > with CRU based resets, as downstream kernels do anyway
>> > > - The on-by-default approach helps ensure thermal protections are in
>> > > place (emergency reset and throttling) for any board even with a
>> > > rudimentary .dts, and thus lets us introduce CPU DVFS with better
>> > > peace of mind
>> > >
>> > > Fan control on Rock 5B has been split into two intervals: let it spin
>> > > at the minimum cooling state between 55C and 65C, and then accelerate
>> > > if the system crosses the 65C mark - thanks to Dragan for suggesting.
>> > > This lets some cooling setups with beefier heatsinks and/or larger
>> > > fan fins to stay in the quietest non-zero fan state while still
>> > > gaining potential benefits from the airflow it generates, and
>> > > possibly avoiding noisy speeds altogether for some workloads.
>> > >
>> > > OPPs help actually scale CPU frequencies up and down for both cooling
>> > > and performance - tested on Rock 5B under varied loads. I've split
>> > > the patch into two parts: the first containing those OPPs that seem
>> > > to be no-regret with general consensus during v1 review [2], while
>> > > the second contains OPPs that cause frequency reductions without
>> > > accompanying decrease in CPU voltage. There seems to be a slight
>> > > performance gain in some workload scenarios when using these, but
>> > > previous discussion was inconclusive as to whether they should be
>> > > included or not. Having them as separate patches enables easier
>> > > comparison and partial reversion if people want to test it under
>> > > their workloads, and also enables the first 'no-regret' part to be
>> > > merged to -next while the jury is still out on the second one.
>> > >
>> > > [1] https://lore.kernel.org/linux-rockchip/1824717.EqSB1tO5pr@bagend/T/#ma2ab949da2235a8e759eab22155fb2bc397d8aea
>> > > [2] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#m49d2b94e773f5b532a0bb5d3d7664799ff28cc2c
>> > >
>> > > Signed-off-by: Alexey Charkov <[email protected]>
>> > > ---
>> > > Changes in v3:
>> > > - Added regulator coupling for EVB1 and QuartzPro64
>> > > - Enabled the TSADC for all boards in .dtsi, not just Rock 5B (thanks ChenYu)
>> > > - Added comments regarding two passive cooling trips in each zone (thanks Dragan)
>> > > - Fixed active cooling map numbering for Radxa Rock 5B (thanks Dragan)
>> > > - Dropped Daniel's Acked-by tag from the Rock 5B fan patch, as there's been quite some
>> > > churn there since the version he acknowledged
>> > > - Link to v2: https://lore.kernel.org/r/[email protected]
>> > >
>> > > Changes in v2:
>> > > - Dropped the rfkill patch which Heiko has already applied
>> > > - Set higher 'polling-delay-passive' (100 instead of 20)
>> > > - Name all cooling maps starting from map0 in each respective zone
>> > > - Drop 'contribution' properties from passive cooling maps
>> > > - Link to v1: https://lore.kernel.org/r/[email protected]
>> > >
>> > > ---
>> > > Alexey Charkov (5):
>> > > arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
>> > > arm64: dts: rockchip: enable automatic active cooling on Rock 5B
>> > > arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
>> > > arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
>> > > arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs
>> > >
>> > > arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 +
>> > > .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 +
>> > > arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +-
>> > > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++-
>> > > 4 files changed, 437 insertions(+), 2 deletions(-)
>> >
>> > I'm too busy to have a detailed review of this series right now, but
>> > I pushed it to our CI and it results in a board reset at boot time:
>> >
>> > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950
>> >
>> > I also pushed just the first three patches (i.e. without OPP /
>> > cpufreq) and that boots fine:
>> >
>> > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953
>>
>> Thank you for testing these! I've noticed in the boot log that the CI
>> machine uses some u-boot 2023.07 - is that a downstream one? Any
>> chance to compare it to 2023.11 or 2024.01 from your (Collabora)
>> integration tree?
>>
>> I use 2023.11 from your integration tree, with a binary bl31, and I'm
>> not getting those resets even under prolonged heavy load (I rebuild
>> Chromium with 8 concurrent compilation jobs as the stress test -
>> that's 14 hours of heavy CPU, memory and IO use). Would be interesting
>> to understand if it's just a 'lucky' SoC specimen on my side, or if
>> there is some dark magic happening differently on my machine vs. your
>> CI machine.
>>
>> Thinking that maybe if your CI machine uses a downstream u-boot it
>> might be leaving some extra hardware running (PVTM?) which might do
>> weird stuff when TSADC/clocks/voltages get readjusted by the generic
>> cpufreq driver?..
>>
>> > Note, that OPP / cpufreq works on the same boards in the CI when
>> > using the ugly-and-not-for-upstream cpufreq driver:
>> >
>> > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
>> >
>> > My best guess right now is, that this is related to the generic
>> > driver obviously not updating the GRF read margin registers.
>>
>> If it was about memory read margins I believe I would have been
>> unlikely to get my machine to work reliably under heavy load with the
>> default ones, but who knows...
>
> Sebastian's report led me to investigate further how all those things
> are organized in the downstream code and in hardware, and what could
> be a pragmatic way forward with upstream enablement. It turned out to
> be quite a rabbit hole frankly, with multiple layers of abstraction
> and intertwined code in different places.
>
> Here's a quick summary for future reference:
> - CPU clocks on RK3588 are ultimately managed by the ATF firmware,
> which provides an SCMI service to expose them to the kernel
> - ATF itself doesn't directly set any clock frequencies. Instead, it
> accepts a target frequency via SCMI and converts it into an oscillator
> ring length setting for the PVPLL hardware block (via a fixed table
> lookup). At least that's how it's done in the recently released TF-A
> bl31 code [1] - perhaps the binary bl31 does something similar
> - U-boot doesn't seem to mess with CPU clocks, PVTM or PVPLL
> - PVPLL produces a reference clock to feed to the CPUs, which depends
> on the configured oscillator ring length but also on the supply
> voltage, silicon quality and perhaps temperature too. ATF doesn't know
> anything about voltages or temperatures, so it doesn't guarantee that
> the requested frequency is matched by the hardware
> - PVPLL frequency generation is bypassed for lower-frequency OPPs, in
> which case the target frequency is directly fed by the ATF to the CRU.
> This happens for both big-core and little-core frequencies below 816
> MHz
> - Given that requesting a particular frequency via SCMI doesn't
> guarantee that it will be what the CPUs end up running at, the vendor
> kernel also does a runtime voltage calibration for the supply
> regulators, by adjusting the supply voltage in minimum regulator steps
> until the frequency reported by PVPLL gets close to the requested one
> [2]. It then overwrites OPP provided voltage values with the
> calibrated ones
> - There's also some trickery with preselecting OPP voltage sets using
> the "-Lx" suffix based on silicon quality, as measured by a "leakage"
> value stored in an NVMEM cell and/or the PVTM frequency generated at a
> reference "midpoint" OPP [3]. Better performing silicon gets to run at
> lower default supply voltages, thus saving power
> - Once the OPPs are selected and calibrated, the only remaining
> trickery is the two supply regulators per each CPU cluster (one for
> the CPUs and the other for the memory interface)
> - Another catch, as Sebastian points out, is that memory read margins
> must be adjusted whenever the memory interface supply voltage crosses
> certain thresholds [4]. This has little to do with CPUs or
> frequencies, and is only tangentially related to them due to the
> dependency chain between the target CPU frequency -> required CPU
> supply voltage -> matching memory interface supply voltage -> required
> read margins
> - At reset the ATF switches all clocks to the lowest 408 MHz [5], so
> setting it to anything in kernel code (as the downstream driver does)
> seems redundant
>
> All in all, it does indeed sound like Collabora's CI machine boot-time
> resets are most likely caused by the missing memory read margin
> settings in my patch series. Voltage values in the OPPs I used are the
> most conservative defaults of what the downstream DT has, and PVPLL
> should be able to generate reasonable clock speeds with those (albeit
> likely suboptimal, due to them not being tuned to the particular
> silicon specimen). And there is little else to differ frankly.
>
> As for the way forward, it would be great to know the opinions from
> the list. My thinking is as follows:
> - I can introduce memory read margin updates as the first priority,
> leaving voltage calibration and/or OPP preselection for later (as
> those should not affect system stability at current default values,
> perhaps only power efficiency to a certain extent)
> - CPUfreq doesn't sound like the right place for those, given that
> they have little to do with either CPU or freq :)
> - I suggest a custom regulator config helper to plug into the OPP
> layer, as is done for TI OMAP5 [6]. At first, it might be only used
> for looking up and setting the correct memory read margin value
> whenever the cluster supply voltage changes, and later the same code
> can be extended to do voltage calibration. In fact, OMAP code is there
> for a very similar purpose, but in their case optimized voltages are
> pre-programmed in efuses and don't require runtime recalibration
> - Given that all OPPs in the downstream kernel list identical
> voltages for the memory supply as for the CPU supply, I don't think it
> makes much sense to customize the cpufreq driver per se.
> Single-regulator approach with the generic cpufreq-dt and regulator
> coupling sounds much less invasive and thus lower-maintenance
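
As a rough illustration of the clock path selection summarized above, here
is a minimal C sketch. It is not the TF-A code: pick_clk_path(), the enum
names and the table contents (including the ring length values) are
hypothetical, and only the 816 MHz bypass threshold is taken from the
summary.

```c
#include <stddef.h>

/* Per the summary: rates below 816 MHz bypass the PVPLL and go to the CRU. */
#define PVPLL_BYPASS_BELOW_HZ 816000000UL

enum clk_path { CLK_PATH_CRU, CLK_PATH_PVPLL, CLK_PATH_UNSUPPORTED };

/* Hypothetical fixed table: requested rate -> oscillator ring length. */
struct pvpll_entry {
	unsigned long rate_hz;
	unsigned int ring_length;
};

static const struct pvpll_entry pvpll_table[] = {
	{ 1008000000UL, 40 },	/* made-up ring lengths, illustration only */
	{ 1416000000UL, 28 },
	{ 1800000000UL, 20 },
};

/*
 * Decide how a requested rate would be served. For the PVPLL path the
 * ring length is written to *ring_length; it is left untouched otherwise.
 */
static enum clk_path pick_clk_path(unsigned long rate_hz,
				   unsigned int *ring_length)
{
	size_t i;

	if (rate_hz < PVPLL_BYPASS_BELOW_HZ)
		return CLK_PATH_CRU;

	for (i = 0; i < sizeof(pvpll_table) / sizeof(pvpll_table[0]); i++) {
		if (pvpll_table[i].rate_hz == rate_hz) {
			*ring_length = pvpll_table[i].ring_length;
			return CLK_PATH_PVPLL;
		}
	}

	return CLK_PATH_UNSUPPORTED;
}
```

Note that on the PVPLL path the lookup only picks a ring length; as the
summary says, the frequency actually produced still depends on supply
voltage and silicon quality.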

Thank you very much for a detailed and highly useful summary!

I'll retrace your steps into and, hopefully, out of the rabbit hole. :)
After that, I'll come back with an update.

> [1]
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L303
> [2]
> https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L804
> [3]
> https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L1575
> [4]
> https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/cpufreq/rockchip-cpufreq.c#L405
> [5]
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L2419
> [6]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/opp/ti-opp-supply.c#n275

2024-03-07 22:17:03

by Sebastian Reichel

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Hi,

On Thu, Mar 07, 2024 at 04:38:24PM +0400, Alexey Charkov wrote:
> On Tue, Mar 5, 2024 at 12:06 PM Alexey Charkov <[email protected]> wrote:
> > On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel
> > <[email protected]> wrote:
> > > On Thu, Feb 29, 2024 at 11:26:31PM +0400, Alexey Charkov wrote:
> > > > This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
> > > > active cooling on Radxa Rock 5B via the provided PWM fan.
> > > >
> > > > Some RK3588 boards use separate regulators to supply CPUs and their
> > > > respective memory interfaces, so this is handled by coupling those
> > > > regulators in affected boards' device trees to ensure that their
> > > > voltage is adjusted in step.
> > > >
> > > > In this revision of the series I chose to enable TSADC for all boards
> > > > at .dtsi level, because:
> > > > - The defaults already in .dtsi should work for all users, given that
> > > > the CRU based resets don't need any out-of-chip components, and
> > > > the CRU vs. PMIC reset is pretty much the only thing a board might
> > > > have to configure / override there
> > > > - The boards that have TSADC_SHUT signal wired to the PMIC reset line
> > > > can still choose to override the reset logic in their .dts. Or stay
> > > > with CRU based resets, as downstream kernels do anyway
> > > > - The on-by-default approach helps ensure thermal protections are in
> > > > place (emergency reset and throttling) for any board even with a
> > > > rudimentary .dts, and thus lets us introduce CPU DVFS with better
> > > > peace of mind
> > > >
> > > > Fan control on Rock 5B has been split into two intervals: let it spin
> > > > at the minimum cooling state between 55C and 65C, and then accelerate
> > > > if the system crosses the 65C mark - thanks to Dragan for suggesting.
> > > > This lets some cooling setups with beefier heatsinks and/or larger
> > > > fan fins stay in the quietest non-zero fan state while still
> > > > gaining potential benefits from the airflow it generates, and
> > > > possibly avoiding noisy speeds altogether for some workloads.
> > > >
> > > > OPPs help actually scale CPU frequencies up and down for both cooling
> > > > and performance - tested on Rock 5B under varied loads. I've split
> > > > the patch into two parts: the first containing those OPPs that seem
> > > > to be no-regret with general consensus during v1 review [2], while
> > > > the second contains OPPs that cause frequency reductions without
> > > > accompanying decrease in CPU voltage. There seems to be a slight
> > > > performance gain in some workload scenarios when using these, but
> > > > previous discussion was inconclusive as to whether they should be
> > > > included or not. Having them as separate patches enables easier
> > > > comparison and partial reversion if people want to test it under
> > > > their workloads, and also enables the first 'no-regret' part to be
> > > > merged to -next while the jury is still out on the second one.
> > > >
> > > > [1] https://lore.kernel.org/linux-rockchip/1824717.EqSB1tO5pr@bagend/T/#ma2ab949da2235a8e759eab22155fb2bc397d8aea
> > > > [2] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#m49d2b94e773f5b532a0bb5d3d7664799ff28cc2c
> > > >
> > > > Signed-off-by: Alexey Charkov <[email protected]>
> > > > ---
> > > > Changes in v3:
> > > > - Added regulator coupling for EVB1 and QuartzPro64
> > > > - Enabled the TSADC for all boards in .dtsi, not just Rock 5B (thanks ChenYu)
> > > > - Added comments regarding two passive cooling trips in each zone (thanks Dragan)
> > > > - Fixed active cooling map numbering for Radxa Rock 5B (thanks Dragan)
> > > > - Dropped Daniel's Acked-by tag from the Rock 5B fan patch, as there's been quite some
> > > > churn there since the version he acknowledged
> > > > - Link to v2: https://lore.kernel.org/r/[email protected]
> > > >
> > > > Changes in v2:
> > > > - Dropped the rfkill patch which Heiko has already applied
> > > > - Set higher 'polling-delay-passive' (100 instead of 20)
> > > > - Name all cooling maps starting from map0 in each respective zone
> > > > - Drop 'contribution' properties from passive cooling maps
> > > > - Link to v1: https://lore.kernel.org/r/[email protected]
> > > >
> > > > ---
> > > > Alexey Charkov (5):
> > > > arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
> > > > arm64: dts: rockchip: enable automatic active cooling on Rock 5B
> > > > arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
> > > > arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
> > > > arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs
> > > >
> > > > arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 +
> > > > .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 +
> > > > arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +-
> > > > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++-
> > > > 4 files changed, 437 insertions(+), 2 deletions(-)
> > >
> > > I'm too busy to have a detailed review of this series right now, but
> > > I pushed it to our CI and it results in a board reset at boot time:
> > >
> > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950
> > >
> > > I also pushed just the first three patches (i.e. without OPP /
> > > cpufreq) and that boots fine:
> > >
> > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953
> >
> > Thank you for testing these! I've noticed in the boot log that the CI
> > machine uses some u-boot 2023.07 - is that a downstream one? Any
> > chance to compare it to 2023.11 or 2024.01 from your (Collabora)
> > integration tree?
> >
> > I use 2023.11 from your integration tree, with a binary bl31, and I'm
> > not getting those resets even under prolonged heavy load (I rebuild
> > Chromium with 8 concurrent compilation jobs as the stress test -
> > that's 14 hours of heavy CPU, memory and IO use). Would be interesting
> > to understand if it's just a 'lucky' SoC specimen on my side, or if
> > there is some dark magic happening differently on my machine vs. your
> > CI machine.
> >
> > Thinking that maybe if your CI machine uses a downstream u-boot it
> > might be leaving some extra hardware running (PVTM?) which might do
> > weird stuff when TSADC/clocks/voltages get readjusted by the generic
> > cpufreq driver?..
> >
> > > Note, that OPP / cpufreq works on the same boards in the CI when
> > > using the ugly-and-not-for-upstream cpufreq driver:
> > >
> > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
> > >
> > > My best guess right now is, that this is related to the generic
> > > driver obviously not updating the GRF read margin registers.
> >
> > If it was about memory read margins I believe I would have been
> > unlikely to get my machine to work reliably under heavy load with the
> > default ones, but who knows...
>
> Sebastian's report led me to investigate further how all those things
> are organized in the downstream code and in hardware, and what could
> be a pragmatic way forward with upstream enablement. It turned out to
> be quite a rabbit hole frankly, with multiple layers of abstraction
> and intertwined code in different places.
>
> Here's a quick summary for future reference:
> - CPU clocks on RK3588 are ultimately managed by the ATF firmware,
> which provides an SCMI service to expose them to the kernel
> - ATF itself doesn't directly set any clock frequencies. Instead, it
> accepts a target frequency via SCMI and converts it into an oscillator
> ring length setting for the PVPLL hardware block (via a fixed table
> lookup). At least that's how it's done in the recently released TF-A
> bl31 code [1] - perhaps the binary bl31 does something similar
> - U-boot doesn't seem to mess with CPU clocks, PVTM or PVPLL
> - PVPLL produces a reference clock to feed to the CPUs, which depends
> on the configured oscillator ring length but also on the supply
> voltage, silicon quality and perhaps temperature too. ATF doesn't know
> anything about voltages or temperatures, so it doesn't guarantee that
> the requested frequency is matched by the hardware
> - PVPLL frequency generation is bypassed for lower-frequency OPPs, in
> which case the target frequency is directly fed by the ATF to the CRU.
> This happens for both big-core and little-core frequencies below 816
> MHz
> - Given that requesting a particular frequency via SCMI doesn't
> guarantee that it will be what the CPUs end up running at, the vendor
> kernel also does a runtime voltage calibration for the supply
> regulators, by adjusting the supply voltage in minimum regulator steps
> until the frequency reported by PVPLL gets close to the requested one
> [2]. It then overwrites OPP provided voltage values with the
> calibrated ones
> - There's also some trickery with preselecting OPP voltage sets using
> the "-Lx" suffix based on silicon quality, as measured by a "leakage"
> value stored in an NVMEM cell and/or the PVTM frequency generated at a
> reference "midpoint" OPP [3]. Better performing silicon gets to run at
> lower default supply voltages, thus saving power
> - Once the OPPs are selected and calibrated, the only remaining
> trickery is the two supply regulators per each CPU cluster (one for
> the CPUs and the other for the memory interface)
> - Another catch, as Sebastian points out, is that memory read margins
> must be adjusted whenever the memory interface supply voltage crosses
> certain thresholds [4]. This has little to do with CPUs or
> frequencies, and is only tangentially related to them due to the
> dependency chain between the target CPU frequency -> required CPU
> supply voltage -> matching memory interface supply voltage -> required
> read margins
> - At reset the ATF switches all clocks to the lowest 408 MHz [5], so
> setting it to anything in kernel code (as the downstream driver does)
> seems redundant
>
> All in all, it does indeed sound like Collabora's CI machine boot-time
> resets are most likely caused by the missing memory read margin
> settings in my patch series. Voltage values in the OPPs I used are the
> most conservative defaults of what the downstream DT has, and PVPLL
> should be able to generate reasonable clock speeds with those (albeit
> likely suboptimal, due to them not being tuned to the particular
> silicon specimen). And there is little else to differ frankly.
>
> As for the way forward, it would be great to know the opinions from
> the list. My thinking is as follows:
> - I can introduce memory read margin updates as the first priority,
> leaving voltage calibration and/or OPP preselection for later (as
> those should not affect system stability at current default values,
> perhaps only power efficiency to a certain extent)
> - CPUfreq doesn't sound like the right place for those, given that
> they have little to do with either CPU or freq :)
> - I suggest a custom regulator config helper to plug into the OPP
> layer, as is done for TI OMAP5 [6]. At first, it might be only used
> for looking up and setting the correct memory read margin value
> whenever the cluster supply voltage changes, and later the same code
> can be extended to do voltage calibration. In fact, OMAP code is there
> for a very similar purpose, but in their case optimized voltages are
> pre-programmed in efuses and don't require runtime recalibration
> - Given that all OPPs in the downstream kernel list identical
> voltages for the memory supply as for the CPU supply, I don't think it
> makes much sense to customize the cpufreq driver per se.
> Single-regulator approach with the generic cpufreq-dt and regulator
> coupling sounds much less invasive and thus lower-maintenance
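
The read margin lookup proposed above could look roughly like the C sketch
below. This is an assumption-laden illustration, not the downstream code:
the threshold voltages, the margin values and the function names are all
hypothetical, and the actual GRF register write is left out.

```c
#include <stddef.h>

struct read_margin_level {
	int min_volt_uv;	/* lowest supply voltage this level covers */
	unsigned int margin;	/* value that would be programmed into the GRF */
};

/* Hypothetical thresholds, sorted from highest to lowest voltage. */
static const struct read_margin_level rm_table[] = {
	{ 850000, 1 },
	{ 765000, 2 },
	{ 675000, 3 },
	{      0, 4 },
};

/* Return the read margin matching a memory interface supply voltage. */
static unsigned int read_margin_for_voltage(int volt_uv)
{
	size_t i;
	size_t n = sizeof(rm_table) / sizeof(rm_table[0]);

	for (i = 0; i < n; i++) {
		if (volt_uv >= rm_table[i].min_volt_uv)
			return rm_table[i].margin;
	}

	/* Below every threshold: fall back to the last (lowest) level. */
	return rm_table[n - 1].margin;
}

/* Non-zero if a voltage change crosses a threshold and needs an update. */
static int read_margin_changes(int old_uv, int new_uv)
{
	return read_margin_for_voltage(old_uv) !=
	       read_margin_for_voltage(new_uv);
}
```

A helper hooked into the OPP layer would call something like
read_margin_changes() on every supply voltage transition and only touch
the hardware when a threshold is actually crossed.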

Sorry for my late response.

While doing some more tests, I noticed that the CI never built the
custom driver and thus never did any CPU frequency scaling at all.
I only used it for my own tests (on RK3588 EVB1). When enabling the
custom driver, the CI has the same issues as your series. So my
message was completely wrong, sorry about that.

Regarding U-Boot: The CI uses "U-Boot SPL 2023.07-rc4-g46349e27";
the last part is the git hash. This is the exact U-Boot source tree
being used:

https://gitlab.collabora.com/hardware-enablement/rockchip-3588/u-boot/-/commits/46349e27/

This was one of the first U-Boot trees with Rock 5B Ethernet support
and is currently flashed to the SPI flash memory of the CI boards.
The vendor U-Boot tree is a lot older. Also it is still using the
Rockchip binary BL31. We have plans to also CI boot test U-Boot,
but currently nobody has time to work on this. I don't think there should
be any relevant changes between upstream 2023.07 and 2023.11 that could
explain this. But it's the best lead now, so I will try to find some time
for doing further tests related to this in the next few days.

Regarding the voltage calibration - One option would be to do this
calibration at boot time (i.e. in U-Boot) and update the voltages
in DT accordingly.
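
For reference, the calibration described in the summary (stepping the
supply voltage in minimum regulator steps until the measured frequency
gets close to the requested one) can be sketched in C as below, wherever
it ends up running. Everything here is hypothetical: calibrate_voltage(),
the measure callback and the linear fake_pvpll() model only stand in for
the real PVPLL readback and regulator calls.

```c
/* Callback returning the frequency (kHz) measured at a given voltage. */
typedef unsigned long (*measure_fn)(int volt_uv, void *ctx);

/*
 * Step the voltage until the measured frequency is within tol_khz of
 * target_khz. Returns the calibrated voltage in uV, or -1 if the target
 * cannot be reached without exceeding max_uv.
 */
static int calibrate_voltage(unsigned long target_khz, unsigned long tol_khz,
			     int start_uv, int min_uv, int max_uv, int step_uv,
			     measure_fn measure, void *ctx)
{
	int volt_uv = start_uv;
	int dir = 0;	/* +1 after stepping up, -1 after stepping down */

	for (;;) {
		unsigned long khz = measure(volt_uv, ctx);

		if (khz + tol_khz < target_khz) {
			/* Too slow. If we just stepped down, go back up. */
			if (dir < 0)
				return volt_uv + step_uv;
			if (volt_uv + step_uv > max_uv)
				return -1;
			volt_uv += step_uv;
			dir = 1;
		} else if (khz > target_khz + tol_khz) {
			/* Faster than needed: lower the voltage if allowed. */
			if (dir > 0 || volt_uv - step_uv < min_uv)
				return volt_uv;
			volt_uv -= step_uv;
			dir = -1;
		} else {
			return volt_uv;
		}
	}
}

/* Toy model for testing: frequency grows linearly with voltage. */
static unsigned long fake_pvpll(int volt_uv, void *ctx)
{
	(void)ctx;
	return (unsigned long)volt_uv * 2;
}
```

The direction tracking keeps the loop from oscillating when a single
regulator step jumps across the tolerance band; in that case the higher
of the two voltages is kept.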

Greetings,

-- Sebastian

> Best regards,
> Alexey
>
> [1] https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L303
> [2] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L804
> [3] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L1575
> [4] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/cpufreq/rockchip-cpufreq.c#L405
> [5] https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L2419
> [6] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/opp/ti-opp-supply.c#n275



2024-03-11 07:08:38

by Dragan Simic

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Hello Alexey,

On 2024-03-07 15:21, Dragan Simic wrote:
> On 2024-03-07 13:38, Alexey Charkov wrote:
>> On Tue, Mar 5, 2024 at 12:06 PM Alexey Charkov <[email protected]>
>> wrote:
>>> On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel
>>> <[email protected]> wrote:
>>> > On Thu, Feb 29, 2024 at 11:26:31PM +0400, Alexey Charkov wrote:
>>> > > This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
>>> > > active cooling on Radxa Rock 5B via the provided PWM fan.
>>> > >
>>> > > Some RK3588 boards use separate regulators to supply CPUs and their
>>> > > respective memory interfaces, so this is handled by coupling those
>>> > > regulators in affected boards' device trees to ensure that their
>>> > > voltage is adjusted in step.
>>> > >
>>> > > In this revision of the series I chose to enable TSADC for all boards
>>> > > at .dtsi level, because:
>>> > > - The defaults already in .dtsi should work for all users, given that
>>> > > the CRU based resets don't need any out-of-chip components, and
>>> > > the CRU vs. PMIC reset is pretty much the only thing a board might
>>> > > have to configure / override there
>>> > > - The boards that have TSADC_SHUT signal wired to the PMIC reset line
>>> > > can still choose to override the reset logic in their .dts. Or stay
>>> > > with CRU based resets, as downstream kernels do anyway
>>> > > - The on-by-default approach helps ensure thermal protections are in
>>> > > place (emergency reset and throttling) for any board even with a
>>> > > rudimentary .dts, and thus lets us introduce CPU DVFS with better
>>> > > peace of mind
>>> > >
>>> > > Fan control on Rock 5B has been split into two intervals: let it spin
>>> > > at the minimum cooling state between 55C and 65C, and then accelerate
>>> > > if the system crosses the 65C mark - thanks to Dragan for suggesting.
>>> > > This lets some cooling setups with beefier heatsinks and/or larger
>>> > > fan fins stay in the quietest non-zero fan state while still
>>> > > gaining potential benefits from the airflow it generates, and
>>> > > possibly avoiding noisy speeds altogether for some workloads.
>>> > >
>>> > > OPPs help actually scale CPU frequencies up and down for both cooling
>>> > > and performance - tested on Rock 5B under varied loads. I've split
>>> > > the patch into two parts: the first containing those OPPs that seem
>>> > > to be no-regret with general consensus during v1 review [2], while
>>> > > the second contains OPPs that cause frequency reductions without
>>> > > accompanying decrease in CPU voltage. There seems to be a slight
>>> > > performance gain in some workload scenarios when using these, but
>>> > > previous discussion was inconclusive as to whether they should be
>>> > > included or not. Having them as separate patches enables easier
>>> > > comparison and partial reversion if people want to test it under
>>> > > their workloads, and also enables the first 'no-regret' part to be
>>> > > merged to -next while the jury is still out on the second one.
>>> > >
>>> > > [1] https://lore.kernel.org/linux-rockchip/1824717.EqSB1tO5pr@bagend/T/#ma2ab949da2235a8e759eab22155fb2bc397d8aea
>>> > > [2] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#m49d2b94e773f5b532a0bb5d3d7664799ff28cc2c
>>> > >
>>> > > Signed-off-by: Alexey Charkov <[email protected]>
>>> > > ---
>>> > > Changes in v3:
>>> > > - Added regulator coupling for EVB1 and QuartzPro64
>>> > > - Enabled the TSADC for all boards in .dtsi, not just Rock 5B (thanks ChenYu)
>>> > > - Added comments regarding two passive cooling trips in each zone (thanks Dragan)
>>> > > - Fixed active cooling map numbering for Radxa Rock 5B (thanks Dragan)
>>> > > - Dropped Daniel's Acked-by tag from the Rock 5B fan patch, as there's been quite some
>>> > > churn there since the version he acknowledged
>>> > > - Link to v2: https://lore.kernel.org/r/[email protected]
>>> > >
>>> > > Changes in v2:
>>> > > - Dropped the rfkill patch which Heiko has already applied
>>> > > - Set higher 'polling-delay-passive' (100 instead of 20)
>>> > > - Name all cooling maps starting from map0 in each respective zone
>>> > > - Drop 'contribution' properties from passive cooling maps
>>> > > - Link to v1: https://lore.kernel.org/r/[email protected]
>>> > >
>>> > > ---
>>> > > Alexey Charkov (5):
>>> > > arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
>>> > > arm64: dts: rockchip: enable automatic active cooling on Rock 5B
>>> > > arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
>>> > > arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
>>> > > arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs
>>> > >
>>> > > arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 +
>>> > > .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 +
>>> > > arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +-
>>> > > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++-
>>> > > 4 files changed, 437 insertions(+), 2 deletions(-)
>>> >
>>> > I'm too busy to have a detailed review of this series right now, but
>>> > I pushed it to our CI and it results in a board reset at boot time:
>>> >
>>> > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950
>>> >
>>> > I also pushed just the first three patches (i.e. without OPP /
>>> > cpufreq) and that boots fine:
>>> >
>>> > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953
>>>
>>> Thank you for testing these! I've noticed in the boot log that the CI
>>> machine uses some u-boot 2023.07 - is that a downstream one? Any
>>> chance to compare it to 2023.11 or 2024.01 from your (Collabora)
>>> integration tree?
>>>
>>> I use 2023.11 from your integration tree, with a binary bl31, and I'm
>>> not getting those resets even under prolonged heavy load (I rebuild
>>> Chromium with 8 concurrent compilation jobs as the stress test -
>>> that's 14 hours of heavy CPU, memory and IO use). Would be interesting
>>> to understand if it's just a 'lucky' SoC specimen on my side, or if
>>> there is some dark magic happening differently on my machine vs. your
>>> CI machine.
>>>
>>> Thinking that maybe if your CI machine uses a downstream u-boot it
>>> might be leaving some extra hardware running (PVTM?) which might do
>>> weird stuff when TSADC/clocks/voltages get readjusted by the generic
>>> cpufreq driver?..
>>>
>>> > Note, that OPP / cpufreq works on the same boards in the CI when
>>> > using the ugly-and-not-for-upstream cpufreq driver:
>>> >
>>> > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
>>> >
>>> > My best guess right now is, that this is related to the generic
>>> > driver obviously not updating the GRF read margin registers.
>>>
>>> If it was about memory read margins I believe I would have been
>>> unlikely to get my machine to work reliably under heavy load with the
>>> default ones, but who knows...
>>
>> Sebastian's report led me to investigate further how all those things
>> are organized in the downstream code and in hardware, and what could
>> be a pragmatic way forward with upstream enablement. It turned out to
>> be quite a rabbit hole frankly, with multiple layers of abstraction
>> and intertwined code in different places.
>>
>> Here's a quick summary for future reference:
>> - CPU clocks on RK3588 are ultimately managed by the ATF firmware,
>> which provides an SCMI service to expose them to the kernel
>> - ATF itself doesn't directly set any clock frequencies. Instead, it
>> accepts a target frequency via SCMI and converts it into an oscillator
>> ring length setting for the PVPLL hardware block (via a fixed table
>> lookup). At least that's how it's done in the recently released TF-A
>> bl31 code [1] - perhaps the binary bl31 does something similar
>> - U-boot doesn't seem to mess with CPU clocks, PVTM or PVPLL
>> - PVPLL produces a reference clock to feed to the CPUs, which depends
>> on the configured oscillator ring length but also on the supply
>> voltage, silicon quality and perhaps temperature too. ATF doesn't know
>> anything about voltages or temperatures, so it doesn't guarantee that
>> the requested frequency is matched by the hardware
>> - PVPLL frequency generation is bypassed for lower-frequency OPPs, in
>> which case the target frequency is directly fed by the ATF to the CRU.
>> This happens for both big-core and little-core frequencies below 816
>> MHz
>> - Given that requesting a particular frequency via SCMI doesn't
>> guarantee that it will be what the CPUs end up running at, the vendor
>> kernel also does a runtime voltage calibration for the supply
>> regulators, by adjusting the supply voltage in minimum regulator steps
>> until the frequency reported by PVPLL gets close to the requested one
>> [2]. It then overwrites OPP provided voltage values with the
>> calibrated ones
>> - There's also some trickery with preselecting OPP voltage sets using
>> the "-Lx" suffix based on silicon quality, as measured by a "leakage"
>> value stored in an NVMEM cell and/or the PVTM frequency generated at a
>> reference "midpoint" OPP [3]. Better performing silicon gets to run at
>> lower default supply voltages, thus saving power
>> - Once the OPPs are selected and calibrated, the only remaining
>> trickery is the two supply regulators per each CPU cluster (one for
>> the CPUs and the other for the memory interface)
>> - Another catch, as Sebastian points out, is that memory read margins
>> must be adjusted whenever the memory interface supply voltage crosses
>> certain thresholds [4]. This has little to do with CPUs or
>> frequencies, and is only tangentially related to them due to the
>> dependency chain between the target CPU frequency -> required CPU
>> supply voltage -> matching memory interface supply voltage -> required
>> read margins
>> - At reset the ATF switches all clocks to the lowest 408 MHz [5], so
>> setting it to anything in kernel code (as the downstream driver does)
>> seems redundant
>>
>> All in all, it does indeed sound like Collabora's CI machine boot-time
>> resets are most likely caused by the missing memory read margin
>> settings in my patch series. Voltage values in the OPPs I used are the
>> most conservative defaults of what the downstream DT has, and PVPLL
>> should be able to generate reasonable clock speeds with those (albeit
>> likely suboptimal, due to them not being tuned to the particular
>> silicon specimen). And there is little else to differ frankly.
>>
>> As for the way forward, it would be great to know the opinions from
>> the list. My thinking is as follows:
>> - I can introduce memory read margin updates as the first priority,
>> leaving voltage calibration and/or OPP preselection for later (as
>> those should not affect system stability at current default values,
>> perhaps only power efficiency to a certain extent)
>> - CPUfreq doesn't sound like the right place for those, given that
>> they have little to do with either CPU or freq :)
>> - I suggest a custom regulator config helper to plug into the OPP
>> layer, as is done for TI OMAP5 [6]. At first, it might be only used
>> for looking up and setting the correct memory read margin value
>> whenever the cluster supply voltage changes, and later the same code
>> can be extended to do voltage calibration. In fact, OMAP code is there
>> for a very similar purpose, but in their case optimized voltages are
>> pre-programmed in efuses and don't require runtime recalibration
>> - Given that all OPPs in the downstream kernel list identical
>> voltages for the memory supply as for the CPU supply, I don't think it
>> makes much sense to customize the cpufreq driver per se.
>> Single-regulator approach with the generic cpufreq-dt and regulator
>> coupling sounds much less invasive and thus lower-maintenance
>
> Thank you very much for a detailed and highly useful summary!
>
> I'll retrace your steps into and, hopefully, out of the rabbit hole. :)
> After that, I'll come back with an update.

Just a brief update... I went even a bit deeper into multiple rabbit
holes, :) and I'll come back with a detailed update a bit later, together
with a proposal for the plan to move forward. The final outcome should
be awesome. :)

>> [1]
>> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L303
>> [2]
>> https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L804
>> [3]
>> https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L1575
>> [4]
>> https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/cpufreq/rockchip-cpufreq.c#L405
>> [5]
>> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L2419
>> [6]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/opp/ti-opp-supply.c#n275

2024-03-11 10:24:57

by Dragan Simic

Subject: Re: [PATCH v3 3/5] arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588

Hello Kever,

Any chance you could have a look at my explanation below and possibly
provide some further insights? I'd really love to understand that
better.


On 2024-03-01 09:13, Dragan Simic wrote:
> On 2024-02-29 20:26, Alexey Charkov wrote:
>> RK3588 chips allow for their CPU cores to be powered by a different
>> supply vs. their corresponding memory interfaces, and two of the
>> boards currently upstream do that (EVB1 and QuartzPro64).
>
> The only reasonable explanation, based on the Cortex-A55 and Cortex-A76
> technical reference manuals (TRMs), and some other documents, including
> the RK3588 hardware design guide (HDG), is that the VDD_CPU_BIG0_MEM_S0,
> VDD_CPU_BIG1_MEM_S0 and VDD_CPU_LIT_MEM_S0 voltages are internally
> used as the supplies for the SRAM used for the A76's and A55's L1 and
> L2 caches, which are both per-core and private in the DynamIQ SoC layout
> that the RK3588 is based on.
>
> Sure, using "MEM" there is confusing, but actually, the Cortex-A55 and
> Cortex-A76 refer to the L1 and L2 caches as "memory" in multiple places.
> I'd say that's the reason for "MEM" (and "memory", in the RK3588 HDG) to
> be used in the board schematics (and in the RK3588 HDG).
>
> The RK3588 HDG specifically allows what the Rock 5B does there, i.e. to
> basically short the RK3588's individual *_MEM_S0 power inputs to the
> respective CPU core power supplies, which avoids the need to use separate
> voltage regulators for the RK3588's *_MEM_S0 power inputs.
>
> However, I'd really, _really_ love to know why did Rockchip opt to make
> the power supply voltages separate for the RK3588's L1 and L2 caches,
> which are, BTW, rated for up to 100 mA for each *_MEM_S0 input, meaning
> that they present no large loads? All that under the assumption that
> my analysis is correct, of course.
>
>> The voltage of the memory interface though has to match that of the
>> CPU cores that use it, which downstream kernels achieve by the means
>> of a custom cpufreq driver which adjusts both at the same time.
>>
>> It seems that regulator coupling is a more appropriate generic
>> interface for it, so this patch introduces coupling to affected
>> device trees to ensure that memory interface voltage is also updated
>> whenever cpufreq switches between CPU OPPs.
>
> I'll verify this a bit later and provide a separate response.
>
>> Note that other boards, such as Radxa Rock 5B, define both the CPU
>> and memory interface regulators as aliases to the same DT node, so
>> this doesn't apply there.
>
> Yup, they're actually shorted on the Rock 5B, as I described above.
>
>> Signed-off-by: Alexey Charkov <[email protected]>
>> ---
>> arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 ++++++++++++
>> arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 ++++++++++++
>> 2 files changed, 24 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts b/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
>> index de30c2632b8e..dfae67f1e9c7 100644
>> --- a/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
>> +++ b/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
>> @@ -788,6 +788,8 @@ regulators {
>> vdd_cpu_big1_s0: dcdc-reg1 {
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_big1_mem_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <550000>;
>> regulator-max-microvolt = <1050000>;
>> regulator-ramp-delay = <12500>;
>> @@ -800,6 +802,8 @@ regulator-state-mem {
>> vdd_cpu_big0_s0: dcdc-reg2 {
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_big0_mem_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <550000>;
>> regulator-max-microvolt = <1050000>;
>> regulator-ramp-delay = <12500>;
>> @@ -812,6 +816,8 @@ regulator-state-mem {
>> vdd_cpu_lit_s0: dcdc-reg3 {
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_lit_mem_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <550000>;
>> regulator-max-microvolt = <950000>;
>> regulator-ramp-delay = <12500>;
>> @@ -836,6 +842,8 @@ regulator-state-mem {
>> vdd_cpu_big1_mem_s0: dcdc-reg5 {
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_big1_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <675000>;
>> regulator-max-microvolt = <1050000>;
>> regulator-ramp-delay = <12500>;
>> @@ -849,6 +857,8 @@ regulator-state-mem {
>> vdd_cpu_big0_mem_s0: dcdc-reg6 {
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_big0_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <675000>;
>> regulator-max-microvolt = <1050000>;
>> regulator-ramp-delay = <12500>;
>> @@ -873,6 +883,8 @@ regulator-state-mem {
>> vdd_cpu_lit_mem_s0: dcdc-reg8 {
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_lit_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <675000>;
>> regulator-max-microvolt = <950000>;
>> regulator-ramp-delay = <12500>;
>> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
>> index 87a0abf95f7d..9c038450cd7c 100644
>> --- a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
>> +++ b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
>> @@ -818,6 +818,8 @@ vdd_cpu_big1_s0: dcdc-reg1 {
>> regulator-name = "vdd_cpu_big1_s0";
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_big1_mem_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <550000>;
>> regulator-max-microvolt = <1050000>;
>> regulator-ramp-delay = <12500>;
>> @@ -831,6 +833,8 @@ vdd_cpu_big0_s0: dcdc-reg2 {
>> regulator-name = "vdd_cpu_big0_s0";
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_big0_mem_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <550000>;
>> regulator-max-microvolt = <1050000>;
>> regulator-ramp-delay = <12500>;
>> @@ -844,6 +848,8 @@ vdd_cpu_lit_s0: dcdc-reg3 {
>> regulator-name = "vdd_cpu_lit_s0";
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_lit_mem_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <550000>;
>> regulator-max-microvolt = <950000>;
>> regulator-ramp-delay = <12500>;
>> @@ -870,6 +876,8 @@ vdd_cpu_big1_mem_s0: dcdc-reg5 {
>> regulator-name = "vdd_cpu_big1_mem_s0";
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_big1_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <675000>;
>> regulator-max-microvolt = <1050000>;
>> regulator-ramp-delay = <12500>;
>> @@ -884,6 +892,8 @@ vdd_cpu_big0_mem_s0: dcdc-reg6 {
>> regulator-name = "vdd_cpu_big0_mem_s0";
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_big0_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <675000>;
>> regulator-max-microvolt = <1050000>;
>> regulator-ramp-delay = <12500>;
>> @@ -910,6 +920,8 @@ vdd_cpu_lit_mem_s0: dcdc-reg8 {
>> regulator-name = "vdd_cpu_lit_mem_s0";
>> regulator-always-on;
>> regulator-boot-on;
>> + regulator-coupled-with = <&vdd_cpu_lit_s0>;
>> + regulator-coupled-max-spread = <10000>;
>> regulator-min-microvolt = <675000>;
>> regulator-max-microvolt = <950000>;
>> regulator-ramp-delay = <12500>;
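
For anyone following along, the effect of the two coupling properties in
the diff above can be sketched outside the kernel. This is a simplified
model in Python, not the regulator core's actual balancing algorithm, and
the function name coupled_window is made up for illustration. It takes the
min/max constraints and the 10000 uV max spread from the diff and computes
the range each rail can reach while staying within the spread of its
partner.

```python
# Simplified model of two coupled regulators (NOT the kernel's algorithm).
# All values are in microvolts and are taken from the diff above.

def coupled_window(own, partner, max_spread):
    """Return the (min, max) range `own` can reach while staying within
    `max_spread` of some voltage allowed for `partner`."""
    own_min, own_max = own
    partner_min, partner_max = partner
    low = max(own_min, partner_min - max_spread)
    high = min(own_max, partner_max + max_spread)
    if low > high:
        raise ValueError("constraints cannot be satisfied together")
    return low, high

MAX_SPREAD = 10000  # regulator-coupled-max-spread

vdd_cpu_big0 = (550000, 1050000)
vdd_cpu_big0_mem = (675000, 1050000)
vdd_cpu_lit = (550000, 950000)
vdd_cpu_lit_mem = (675000, 950000)

# Big-core rail as limited by its memory rail, and the little-core
# memory rail as limited by its CPU rail.
print(coupled_window(vdd_cpu_big0, vdd_cpu_big0_mem, MAX_SPREAD))
print(coupled_window(vdd_cpu_lit_mem, vdd_cpu_lit, MAX_SPREAD))
```

If the real coupling logic behaves like this model, the CPU rails could not
drop below 665 mV while the memory rails keep their 675 mV minimum, even
though the CPU regulators themselves allow 550 mV.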

2024-03-13 17:03:16

by Sebastian Reichel

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Hi,

On Thu, Mar 07, 2024 at 11:16:20PM +0100, Sebastian Reichel wrote:
> On Thu, Mar 07, 2024 at 04:38:24PM +0400, Alexey Charkov wrote:
> > On Tue, Mar 5, 2024 at 12:06 PM Alexey Charkov <[email protected]> wrote:
> > > On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel
> > > <[email protected]> wrote:
> > > > On Thu, Feb 29, 2024 at 11:26:31PM +0400, Alexey Charkov wrote:
> > > > > This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
> > > > > active cooling on Radxa Rock 5B via the provided PWM fan.
> > > > >
> > > > > Some RK3588 boards use separate regulators to supply CPUs and their
> > > > > respective memory interfaces, so this is handled by coupling those
> > > > > regulators in affected boards' device trees to ensure that their
> > > > > voltage is adjusted in step.
> > > > >
> > > > > In this revision of the series I chose to enable TSADC for all boards
> > > > > at .dtsi level, because:
> > > > > - The defaults already in .dtsi should work for all users, given that
> > > > > the CRU based resets don't need any out-of-chip components, and
> > > > > the CRU vs. PMIC reset is pretty much the only thing a board might
> > > > > have to configure / override there
> > > > > - The boards that have TSADC_SHUT signal wired to the PMIC reset line
> > > > > can still choose to override the reset logic in their .dts. Or stay
> > > > > with CRU based resets, as downstream kernels do anyway
> > > > > - The on-by-default approach helps ensure thermal protections are in
> > > > > place (emergency reset and throttling) for any board even with a
> > > > > rudimentary .dts, and thus lets us introduce CPU DVFS with better
> > > > > peace of mind
> > > > >
> > > > > Fan control on Rock 5B has been split into two intervals: let it spin
> > > > > at the minimum cooling state between 55C and 65C, and then accelerate
> > > > > if the system crosses the 65C mark - thanks to Dragan for suggesting.
> > > > > This lets some cooling setups with beefier heatsinks and/or larger
> > > > > fan fins to stay in the quietest non-zero fan state while still
> > > > > gaining potential benefits from the airflow it generates, and
> > > > > possibly avoiding noisy speeds altogether for some workloads.
> > > > >
> > > > > OPPs help actually scale CPU frequencies up and down for both cooling
> > > > > and performance - tested on Rock 5B under varied loads. I've split
> > > > > the patch into two parts: the first containing those OPPs that seem
> > > > > to be no-regret with general consensus during v1 review [2], while
> > > > > the second contains OPPs that cause frequency reductions without
> > > > > accompanying decrease in CPU voltage. There seems to be a slight
> > > > > performance gain in some workload scenarios when using these, but
> > > > > previous discussion was inconclusive as to whether they should be
> > > > > included or not. Having them as separate patches enables easier
> > > > > comparison and partial reversion if people want to test it under
> > > > > their workloads, and also enables the first 'no-regret' part to be
> > > > > merged to -next while the jury is still out on the second one.
> > > > >
> > > > > [1] https://lore.kernel.org/linux-rockchip/1824717.EqSB1tO5pr@bagend/T/#ma2ab949da2235a8e759eab22155fb2bc397d8aea
> > > > > [2] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#m49d2b94e773f5b532a0bb5d3d7664799ff28cc2c
> > > > >
> > > > > Signed-off-by: Alexey Charkov <[email protected]>
> > > > > ---
> > > > > Changes in v3:
> > > > > - Added regulator coupling for EVB1 and QuartzPro64
> > > > > - Enabled the TSADC for all boards in .dtsi, not just Rock 5B (thanks ChenYu)
> > > > > - Added comments regarding two passive cooling trips in each zone (thanks Dragan)
> > > > > - Fixed active cooling map numbering for Radxa Rock 5B (thanks Dragan)
> > > > > - Dropped Daniel's Acked-by tag from the Rock 5B fan patch, as there's been quite some
> > > > > churn there since the version he acknowledged
> > > > > - Link to v2: https://lore.kernel.org/r/[email protected]
> > > > >
> > > > > Changes in v2:
> > > > > - Dropped the rfkill patch which Heiko has already applied
> > > > > - Set higher 'polling-delay-passive' (100 instead of 20)
> > > > > - Name all cooling maps starting from map0 in each respective zone
> > > > > - Drop 'contribution' properties from passive cooling maps
> > > > > - Link to v1: https://lore.kernel.org/r/[email protected]
> > > > >
> > > > > ---
> > > > > Alexey Charkov (5):
> > > > > arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
> > > > > arm64: dts: rockchip: enable automatic active cooling on Rock 5B
> > > > > arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
> > > > > arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
> > > > > arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs
> > > > >
> > > > > arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 +
> > > > > .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 +
> > > > > arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +-
> > > > > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++-
> > > > > 4 files changed, 437 insertions(+), 2 deletions(-)
> > > >
> > > > I'm too busy to have a detailed review of this series right now, but
> > > > I pushed it to our CI and it results in a board reset at boot time:
> > > >
> > > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950
> > > >
> > > > I also pushed just the first three patches (i.e. without OPP /
> > > > cpufreq) and that boots fine:
> > > >
> > > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953
> > >
> > > Thank you for testing these! I've noticed in the boot log that the CI
> > > machine uses some u-boot 2023.07 - is that a downstream one? Any
> > > chance to compare it to 2023.11 or 2024.01 from your (Collabora)
> > > integration tree?
> > >
> > > I use 2023.11 from your integration tree, with a binary bl31, and I'm
> > > not getting those resets even under prolonged heavy load (I rebuild
> > > Chromium with 8 concurrent compilation jobs as the stress test -
> > > that's 14 hours of heavy CPU, memory and IO use). Would be interesting
> > > to understand if it's just a 'lucky' SoC specimen on my side, or if
> > > there is some dark magic happening differently on my machine vs. your
> > > CI machine.
> > >
> > > Thinking that maybe if your CI machine uses a downstream u-boot it
> > > might be leaving some extra hardware running (PVTM?) which might do
> > > weird stuff when TSADC/clocks/voltages get readjusted by the generic
> > > cpufreq driver?..
> > >
> > > > Note, that OPP / cpufreq works on the same boards in the CI when
> > > > using the ugly-and-not-for-upstream cpufreq driver:
> > > >
> > > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
> > > >
> > > > My best guess right now is, that this is related to the generic
> > > > driver obviously not updating the GRF read margin registers.
> > >
> > > If it was about memory read margins I believe I would have been
> > > unlikely to get my machine to work reliably under heavy load with the
> > > default ones, but who knows...
> >
> > Sebastian's report led me to investigate further how all those things
> > are organized in the downstream code and in hardware, and what could
> > be a pragmatic way forward with upstream enablement. It turned out to
> > be quite a rabbit hole frankly, with multiple layers of abstraction
> > and intertwined code in different places.
> >
> > Here's a quick summary for future reference:
> > - CPU clocks on RK3588 are ultimately managed by the ATF firmware,
> > which provides an SCMI service to expose them to the kernel
> > - ATF itself doesn't directly set any clock frequencies. Instead, it
> > accepts a target frequency via SCMI and converts it into an oscillator
> > ring length setting for the PVPLL hardware block (via a fixed table
> > lookup). At least that's how it's done in the recently released TF-A
> > bl31 code [1] - perhaps the binary bl31 does something similar
> > - U-boot doesn't seem to mess with CPU clocks, PVTM or PVPLL
> > - PVPLL produces a reference clock to feed to the CPUs, which depends
> > on the configured oscillator ring length but also on the supply
> > voltage, silicon quality and perhaps temperature too. ATF doesn't know
> > anything about voltages or temperatures, so it doesn't guarantee that
> > the requested frequency is matched by the hardware
> > - PVPLL frequency generation is bypassed for lower-frequency OPPs, in
> > which case the target frequency is directly fed by the ATF to the CRU.
> > This happens for both big-core and little-core frequencies below 816
> > MHz
> > - Given that requesting a particular frequency via SCMI doesn't
> > guarantee that it will be what the CPUs end up running at, the vendor
> > kernel also does a runtime voltage calibration for the supply
> > regulators, by adjusting the supply voltage in minimum regulator steps
> > until the frequency reported by PVPLL gets close to the requested one
> > [2]. It then overwrites OPP provided voltage values with the
> > calibrated ones
> > - There's also some trickery with preselecting OPP voltage sets using
> > the "-Lx" suffix based on silicon quality, as measured by a "leakage"
> > value stored in an NVMEM cell and/or the PVTM frequency generated at a
> > reference "midpoint" OPP [3]. Better performing silicon gets to run at
> > lower default supply voltages, thus saving power
> > - Once the OPPs are selected and calibrated, the only remaining
> > trickery is the two supply regulators per each CPU cluster (one for
> > the CPUs and the other for the memory interface)
> > - Another catch, as Sebastian points out, is that memory read margins
> > must be adjusted whenever the memory interface supply voltage crosses
> > certain thresholds [4]. This has little to do with CPUs or
> > frequencies, and is only tangentially related to them due to the
> > dependency chain between the target CPU frequency -> required CPU
> > supply voltage -> matching memory interface supply voltage -> required
> > read margins
> > - At reset the ATF switches all clocks to the lowest 408 MHz [5], so
> > setting it to anything in kernel code (as the downstream driver does)
> > seems redundant
> >
> > All in all, it does indeed sound like Collabora's CI machine boot-time
> > resets are most likely caused by the missing memory read margin
> > settings in my patch series. Voltage values in the OPPs I used are the
> > most conservative defaults of what the downstream DT has, and PVPLL
> > should be able to generate reasonable clock speeds with those (albeit
> > likely suboptimal, due to them not being tuned to the particular
> > silicon specimen). And there is little else to differ frankly.
> >
> > As for the way forward, it would be great to know the opinions from
> > the list. My thinking is as follows:
> > - I can introduce memory read margin updates as the first priority,
> > leaving voltage calibration and/or OPP preselection for later (as
> > those should not affect system stability at current default values,
> > perhaps only power efficiency to a certain extent)
> > - CPUfreq doesn't sound like the right place for those, given that
> > they have little to do with either CPU or freq :)
> > - I suggest a custom regulator config helper to plug into the OPP
> > layer, as is done for TI OMAP5 [6]. At first, it might be only used
> > for looking up and setting the correct memory read margin value
> > whenever the cluster supply voltage changes, and later the same code
> > can be extended to do voltage calibration. In fact, OMAP code is there
> > for a very similar purpose, but in their case optimized voltages are
> > pre-programmed in efuses and don't require runtime recalibration
> > - Given that all OPPs in the downstream kernel list identical
> > voltages for the memory supply as for the CPU supply, I don't think it
> > makes much sense to customize the cpufreq driver per se.
> > Single-regulator approach with the generic cpufreq-dt and regulator
> > coupling sounds much less invasive and thus lower-maintenance
>
> Sorry for my late response.
>
> When doing some more tests I noticed, that the CI never build the
> custom driver and thus never did any CPU frequency scaling at all.
> I only used it for my own tests (on RK3588 EVB1). When enabling the
> custom driver, the CI has the same issues as your series. So my
> message was completely wrong, sorry about that.
>
> Regarding U-Boot: The CI uses "U-Boot SPL 2023.07-rc4-g46349e27";
> the last part is the git hash. This is the exact U-Boot source tree
> being used:
>
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/u-boot/-/commits/46349e27/
>
> This was one of the first U-Boot trees with Rock 5B Ethernet support
> and is currently flashed to the SPI flash memory of the CI boards.
> The vendor U-Boot tree is a lot older. Also it is still using the
> Rockchip binary BL31. We have plans to also CI boot test U-Boot,
> but currently nobody has time to work on this. I don't think there should
> be any relevant changes between upstream 2023.07 and 2023.11 that could
> explain this. But it's the best lead now, so I will try to find some time
> for doing further tests related to this in the next days.
>
> Regarding the voltage calibration - One option would be to do this
> calibration at boot time (i.e. in U-Boot) and update the voltages
> in DT accordingly.

After some more debugging I finally found the root cause. The CI
boards were powered from a USB hub using a USB-A to USB-C cable, so
that the team could access maskrom mode. Since I was not involved in
setting them up, I was not aware of that. It effectively limits the
power draw to 500 or 900 mA (depending on USB port implementation),
which is not enough to power the board with the higher frequencies.
The KernelCI Rock 5B boards are now switched to proper power
supplies and the issues are gone.
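
To put rough numbers on that limit: assuming the nominal 5 V USB bus
voltage (an assumption on top of the currents mentioned above), the
available power works out as in this small sketch. The function name is
illustrative only.

```python
# Rough power budget for a board fed from a plain USB-A port.
# The 5 V nominal VBUS is an assumption; the current limits are the
# 500 mA / 900 mA figures mentioned above.

VBUS_VOLTS = 5.0

def usb_power_watts(current_ma, volts=VBUS_VOLTS):
    """Power in watts available at the given current limit."""
    return volts * current_ma / 1000.0

for limit_ma in (500, 900):
    print(f"{limit_ma} mA -> {usb_power_watts(limit_ma):.1f} W")
```

So the boards had at most about 2.5 W or 4.5 W to work with, before any
cable losses.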

Sorry for the false alarm,

-- Sebastian



2024-03-13 17:40:29

by Dragan Simic

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Hello Sebastian,

On 2024-03-13 17:39, Sebastian Reichel wrote:
> On Thu, Mar 07, 2024 at 11:16:20PM +0100, Sebastian Reichel wrote:
>> On Thu, Mar 07, 2024 at 04:38:24PM +0400, Alexey Charkov wrote:
>> > On Tue, Mar 5, 2024 at 12:06 PM Alexey Charkov <[email protected]> wrote:
>> > > On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel
>> > > <[email protected]> wrote:
>> > > > I'm too busy to have a detailed review of this series right now, but
>> > > > I pushed it to our CI and it results in a board reset at boot time:
>> > > >
>> > > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950
>> > > >
>> > > > I also pushed just the first three patches (i.e. without OPP /
>> > > > cpufreq) and that boots fine:
>> > > >
>> > > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953
>> > >
>> > > Thank you for testing these! I've noticed in the boot log that the CI
>> > > machine uses some u-boot 2023.07 - is that a downstream one? Any
>> > > chance to compare it to 2023.11 or 2024.01 from your (Collabora)
>> > > integration tree?
>> > >
>> > > I use 2023.11 from your integration tree, with a binary bl31, and I'm
>> > > not getting those resets even under prolonged heavy load (I rebuild
>> > > Chromium with 8 concurrent compilation jobs as the stress test -
>> > > that's 14 hours of heavy CPU, memory and IO use). Would be interesting
>> > > to understand if it's just a 'lucky' SoC specimen on my side, or if
>> > > there is some dark magic happening differently on my machine vs. your
>> > > CI machine.
>> > >
>> > > Thinking that maybe if your CI machine uses a downstream u-boot it
>> > > might be leaving some extra hardware running (PVTM?) which might do
>> > > weird stuff when TSADC/clocks/voltages get readjusted by the generic
>> > > cpufreq driver?..
>> > >
>> > > > Note, that OPP / cpufreq works on the same boards in the CI when
>> > > > using the ugly-and-not-for-upstream cpufreq driver:
>> > > >
>> > > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
>> > > >
>> > > > My best guess right now is, that this is related to the generic
>> > > > driver obviously not updating the GRF read margin registers.
>> > >
>> > > If it was about memory read margins I believe I would have been
>> > > unlikely to get my machine to work reliably under heavy load with the
>> > > default ones, but who knows...
>> >
>> > Sebastian's report led me to investigate further how all those things
>> > are organized in the downstream code and in hardware, and what could
>> > be a pragmatic way forward with upstream enablement. It turned out to
>> > be quite a rabbit hole frankly, with multiple layers of abstraction
>> > and intertwined code in different places.
>> >
>> > Here's a quick summary for future reference:
>> > - CPU clocks on RK3588 are ultimately managed by the ATF firmware,
>> > which provides an SCMI service to expose them to the kernel
>> > - ATF itself doesn't directly set any clock frequencies. Instead, it
>> > accepts a target frequency via SCMI and converts it into an oscillator
>> > ring length setting for the PVPLL hardware block (via a fixed table
>> > lookup). At least that's how it's done in the recently released TF-A
>> > bl31 code [1] - perhaps the binary bl31 does something similar
>> > - U-boot doesn't seem to mess with CPU clocks, PVTM or PVPLL
>> > - PVPLL produces a reference clock to feed to the CPUs, which depends
>> > on the configured oscillator ring length but also on the supply
>> > voltage, silicon quality and perhaps temperature too. ATF doesn't know
>> > anything about voltages or temperatures, so it doesn't guarantee that
>> > the requested frequency is matched by the hardware
>> > - PVPLL frequency generation is bypassed for lower-frequency OPPs, in
>> > which case the target frequency is directly fed by the ATF to the CRU.
>> > This happens for both big-core and little-core frequencies below 816
>> > MHz
>> > - Given that requesting a particular frequency via SCMI doesn't
>> > guarantee that it will be what the CPUs end up running at, the vendor
>> > kernel also does a runtime voltage calibration for the supply
>> > regulators, by adjusting the supply voltage in minimum regulator steps
>> > until the frequency reported by PVPLL gets close to the requested one
>> > [2]. It then overwrites OPP provided voltage values with the
>> > calibrated ones
>> > - There's also some trickery with preselecting OPP voltage sets using
>> > the "-Lx" suffix based on silicon quality, as measured by a "leakage"
>> > value stored in an NVMEM cell and/or the PVTM frequency generated at a
>> > reference "midpoint" OPP [3]. Better performing silicon gets to run at
>> > lower default supply voltages, thus saving power
>> > - Once the OPPs are selected and calibrated, the only remaining
>> > trickery is the two supply regulators per each CPU cluster (one for
>> > the CPUs and the other for the memory interface)
>> > - Another catch, as Sebastian points out, is that memory read margins
>> > must be adjusted whenever the memory interface supply voltage crosses
>> > certain thresholds [4]. This has little to do with CPUs or
>> > frequencies, and is only tangentially related to them due to the
>> > dependency chain between the target CPU frequency -> required CPU
>> > supply voltage -> matching memory interface supply voltage -> required
>> > read margins
>> > - At reset the ATF switches all clocks to the lowest 408 MHz [5], so
>> > setting it to anything in kernel code (as the downstream driver does)
>> > seems redundant
>> >
>> > All in all, it does indeed sound like Collabora's CI machine boot-time
>> > resets are most likely caused by the missing memory read margin
>> > settings in my patch series. Voltage values in the OPPs I used are the
>> > most conservative defaults of what the downstream DT has, and PVPLL
>> > should be able to generate reasonable clock speeds with those (albeit
>> > likely suboptimal, due to them not being tuned to the particular
>> > silicon specimen). And there is little else to differ frankly.
>> >
>> > As for the way forward, it would be great to know the opinions from
>> > the list. My thinking is as follows:
>> > - I can introduce memory read margin updates as the first priority,
>> > leaving voltage calibration and/or OPP preselection for later (as
>> > those should not affect system stability at current default values,
>> > perhaps only power efficiency to a certain extent)
>> > - CPUfreq doesn't sound like the right place for those, given that
>> > they have little to do with either CPU or freq :)
>> > - I suggest a custom regulator config helper to plug into the OPP
>> > layer, as is done for TI OMAP5 [6]. At first, it might be only used
>> > for looking up and setting the correct memory read margin value
>> > whenever the cluster supply voltage changes, and later the same code
>> > can be extended to do voltage calibration. In fact, OMAP code is there
>> > for a very similar purpose, but in their case optimized voltages are
>> > pre-programmed in efuses and don't require runtime recalibration
>> > - Given that all OPPs in the downstream kernel list identical
>> > voltages for the memory supply as for the CPU supply, I don't think it
>> > makes much sense to customize the cpufreq driver per se.
>> > Single-regulator approach with the generic cpufreq-dt and regulator
>> > coupling sounds much less invasive and thus lower-maintenance
>>
>> Sorry for my late response.
>>
>> When doing some more tests I noticed, that the CI never build the
>> custom driver and thus never did any CPU frequency scaling at all.
>> I only used it for my own tests (on RK3588 EVB1). When enabling the
>> custom driver, the CI has the same issues as your series. So my
>> message was completely wrong, sorry about that.
>>
>> Regarding U-Boot: The CI uses "U-Boot SPL 2023.07-rc4-g46349e27";
>> the last part is the git hash. This is the exact U-Boot source tree
>> being used:
>>
>> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/u-boot/-/commits/46349e27/
>>
>> This was one of the first U-Boot trees with Rock 5B Ethernet support
>> and is currently flashed to the SPI flash memory of the CI boards.
>> The vendor U-Boot tree is a lot older. Also it is still using the
>> Rockchip binary BL31. We have plans to also CI boot test U-Boot,
>> but currently nobody has time to work on this. I don't think there should
>> be any relevant changes between upstream 2023.07 and 2023.11 that could
>> explain this. But it's the best lead now, so I will try to find some time
>> for doing further tests related to this in the next days.
>>
>> Regarding the voltage calibration - One option would be to do this
>> calibration at boot time (i.e. in U-Boot) and update the voltages
>> in DT accordingly.
>
> After some more debugging I finally found the root cause. The CI
> boards were powered from a USB hub using a USB-A to USB-C cable, so
> that the team could access maskrom mode. Since I was not involved in
> setting them up, I was not aware of that. It effectively limits the
> power draw to 500 or 900 mA (depending on USB port implementation),
> which is not enough to power the board with the higher frequencies.
> The KernelCI Rock 5B boards are now switched to proper power
> supplies and the issues are gone.
>
> Sorry for the false alarm,

Great to know, thanks for the clarification.

2024-04-10 09:29:53

by Diederik de Haas

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

On Thursday, 29 February 2024 20:26:31 CEST Alexey Charkov wrote:
> This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
> active cooling on Radxa Rock 5B via the provided PWM fan.
>
> Some RK3588 boards use separate regulators to supply CPUs and their
> respective memory interfaces, so this is handled by coupling those
> regulators in affected boards' device trees to ensure that their
> voltage is adjusted in step.
>
>
> Signed-off-by: Alexey Charkov <[email protected]>
> ---
> Alexey Charkov (5):
> arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
> arm64: dts: rockchip: enable automatic active cooling on Rock 5B
> arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
> arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
> arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs
>
> arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 +
> .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 +
> arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +-
> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++-
> 4 files changed, 437 insertions(+), 2 deletions(-)
> ---
> base-commit: cf1182944c7cc9f1c21a8a44e0d29abe12527412
> change-id: 20240124-rk-dts-additions-a6d7b52787b9

Can you rebase this patch set on Heiko's for-next branch [1]?
And then also fix the ordering of the nodes and the elements within
those nodes so that they match the current conventions?

Cheers,
Diederik

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/log/?h=for-next



2024-04-10 09:33:56

by Dragan Simic

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Hello Diederik,

On 2024-04-10 11:19, Diederik de Haas wrote:
> On Thursday, 29 February 2024 20:26:31 CEST Alexey Charkov wrote:
>> This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
>> active cooling on Radxa Rock 5B via the provided PWM fan.
>>
>> Some RK3588 boards use separate regulators to supply CPUs and their
>> respective memory interfaces, so this is handled by coupling those
>> regulators in affected boards' device trees to ensure that their
>> voltage is adjusted in step.
>>
>>
>> Signed-off-by: Alexey Charkov <[email protected]>
>> ---
>> Alexey Charkov (5):
>> arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
>> arm64: dts: rockchip: enable automatic active cooling on Rock 5B
>> arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
>> arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
>> arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs
>>
>> arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 +
>> .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 +
>> arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +-
>> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++-
>> 4 files changed, 437 insertions(+), 2 deletions(-)
>> ---
>> base-commit: cf1182944c7cc9f1c21a8a44e0d29abe12527412
>> change-id: 20240124-rk-dts-additions-a6d7b52787b9
>
> Can you rebase this patch set on Heiko's for-next branch [1]?
> And then also fix the ordering of the nodes and the elements within
> those nodes so that they match the current conventions?

Ah, thanks, this is a good reminder about the proposal for the plan
for moving forward, which I promised to send a while ago. :)

> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/log/?h=for-next

2024-04-20 17:54:23

by Diederik de Haas

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Hi Dragan and Alexey,

On Wednesday, 10 April 2024 11:28:09 CEST Dragan Simic wrote:
> > Can you rebase this patch set on Heiko's for-next branch [1]?
> > And then also fix the ordering of the nodes and the elements within
> > those nodes so that they match the current conventions?
>
> Ah, thanks, this is a good reminder about the proposal for the plan
> for moving forward, which I promised to send a while ago. :)
>
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/log/?h=for-next

I built a (Debian) kernel based on 6.9-rc4 plus a whole bunch of patches,
including this patch series. I got someone on #debian-arm to try it out on a
Rock 5B, and the dmesg output showed a number of items related to thermal
and OPP.

Some items that I filtered out from that dmesg output:

[ 3.211716] hwmon hwmon0: temp1_input not attached to any thermal zone
[ 3.908339] panthor fb000000.gpu: EM: OPP:900000 is inefficient
[ 10.473061] cpu cpu0: EM: OPP:600000 is inefficient
[ 10.473233] energy_model: Accessing cpu4 policy failed
[ 10.585236] rockchip-thermal fec00000.tsadc: Missing rockchip,grf property

Attached is the full list of items I collected from that dmesg output which
seem worth investigating.

Maybe useful to investigate when moving forward?
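
For anyone who wants to repeat this kind of filtering on their own logs,
here is a minimal sketch. The pattern list is my own guess, chosen only to
match the messages quoted above; `filter_dmesg` is a hypothetical helper
and not part of any existing tool.

```python
import re

# Hypothetical pattern list, picked to match the messages quoted above.
PATTERNS = [
    r"not attached to any thermal zone",
    r"EM: OPP:\d+ is inefficient",
    r"energy_model:",
    r"rockchip-thermal",
]
MATCHER = re.compile("|".join(PATTERNS))


def filter_dmesg(lines):
    """Return only the lines that match one of the patterns above."""
    return [line for line in lines if MATCHER.search(line)]


sample = [
    "[ 3.211716] hwmon hwmon0: temp1_input not attached to any thermal zone",
    "[ 3.908339] panthor fb000000.gpu: EM: OPP:900000 is inefficient",
    "[ 5.000000] usb 1-1: new high-speed USB device",  # made-up unrelated line
    "[ 10.473233] energy_model: Accessing cpu4 policy failed",
]

for line in filter_dmesg(sample):
    print(line)
```

The function accepts any iterable of lines, so it can also be fed the
lines of a saved dmesg dump.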

Cheers,
Diederik


Attachments:
rock-5b-dmesg-issues.txt (5.72 kB)

2024-04-21 16:08:42

by Dragan Simic

Subject: Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Hello Diederik,

On 2024-04-20 19:53, Diederik de Haas wrote:
> On Wednesday, 10 April 2024 11:28:09 CEST Dragan Simic wrote:
>> > Can you rebase this patch set on Heiko's for-next branch [1]?
>> > And then also fix the ordering of the nodes and the elements within
>> > those nodes so that they match the current conventions?
>>
>> Ah, thanks, this is a good reminder about the proposal for the plan
>> for moving forward, which I promised to send a while ago. :)
>>
>>> [1]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/log/?h=for-next
>
> I built a (Debian) kernel based on 6.9-rc4 plus a whole bunch of
> patches, including this patch series. I got someone on #debian-arm to
> try it out on a Rock 5B, and the dmesg output showed a number of items
> related to thermal and OPP.
>
> Some items that I filtered out from that dmesg output:
>
> [ 3.211716] hwmon hwmon0: temp1_input not attached to any thermal zone
> [ 3.908339] panthor fb000000.gpu: EM: OPP:900000 is inefficient
> [ 10.473061] cpu cpu0: EM: OPP:600000 is inefficient
> [ 10.473233] energy_model: Accessing cpu4 policy failed
> [ 10.585236] rockchip-thermal fec00000.tsadc: Missing rockchip,grf property
>
> Attached is the full list of items I collected from that dmesg output
> which seem worth investigating.
>
> Maybe useful to investigate when moving forward?

This is a nice report, thanks!

I'm not sure what's going on with the mmc2 issues. Regarding the
hym8563, hwmon, energy_model and rockchip-spi issues, I'll have a look
into them and come back with an update.

Regarding the multiple "OPP:<frequency> is inefficient" warnings, it's
already on my TO-DO list to perform detailed (and repeatable) testing.
My suspicion is that declaring the OPPs as inefficient actually isn't
warranted, but we'll see what the test results show.
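
To illustrate how such a warning can come about, below is a simplified
sketch of my understanding of the energy model's check: walking from the
highest frequency downwards, an OPP is flagged when its cost (power scaled
by fmax / freq) is not lower than the lowest cost seen so far. This is not
the kernel's code, and the power figures are entirely hypothetical, picked
only so that one OPP gets flagged.

```python
def find_inefficient_opps(opps):
    """Return the frequencies flagged as inefficient.

    opps: list of (freq_khz, power_mw) tuples sorted by ascending frequency.
    An OPP is flagged when its cost is not lower than the lowest cost seen
    among the higher-frequency OPPs already visited.
    """
    fmax = opps[-1][0]
    prev_cost = float("inf")
    inefficient = []
    for freq, power in reversed(opps):
        cost = fmax * power / freq  # power normalised to the top frequency
        if cost >= prev_cost:
            inefficient.append(freq)
        else:
            prev_cost = cost
    return sorted(inefficient)


# Hypothetical (freq_khz, power_mw) table; NOT measured RK3588 data.
table = [(408000, 90), (600000, 160), (816000, 200), (1008000, 260)]

print(find_inefficient_opps(table))
```

With these made-up numbers only the 600000 kHz entry is flagged, because
running at 816000 kHz would cost less per unit of work. Whether the real
power figures behind the RK3588 warnings justify the flag is exactly what
the testing mentioned above would need to show.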