2022-02-09 11:28:46

by Nicolas Cavallari

Subject: [PATCH] imx_thermal: Fix temperature retrieval after overheat

Once the CPU temperature has exceeded the passive trip point, reading the
temperature fails forever with EAGAIN. Fortunately, the thermal core then
keeps assuming that the system is overheating, so it keeps all passive
cooling devices at their maximum state. Unfortunately, it does this
forever, even after the temperature returns to normal.

This is easy to reproduce: set a very low trip point and exceed it by
running while(1) busy loops.
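
A minimal user-space sketch for watching the symptom while reproducing.
It reads one sample from a thermal zone's sysfs "temp" attribute and
reports the errno, so an EAGAIN from the driver shows up as -EAGAIN. The
path /sys/class/thermal/thermal_zone0/temp and the helper names are
assumptions; the i.MX zone may have a different index on a given board.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse a sysfs temperature string (millidegrees Celsius). */
int parse_millicelsius(const char *buf, long *out)
{
	char *end;

	errno = 0;
	long v = strtol(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	*out = v;
	return 0;
}

/*
 * Read one sample from a thermal zone "temp" attribute (hypothetical
 * helper). Returns 0 on success or a negative errno. A failing
 * get_temp() callback makes read() fail, so the driver's EAGAIN
 * surfaces here as -EAGAIN.
 */
int read_zone_temp(const char *path, long *out)
{
	char buf[32];
	ssize_t n;
	int fd;

	assert(path != NULL && out != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	close(fd);
	buf[n] = '\0';
	return parse_millicelsius(buf, out);
}
```

Called once per second (for example with sleep(1)) while the busy loops
run and after they are stopped, a persistent -EAGAIN after cooling
matches the behaviour described above.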

The cause is commit d92ed2c9d3ff ("thermal: imx: Use driver's local data
to decide whether to run a measurement"), which replaced a check of
thermal_zone_device_is_enabled() with a check of irq_enabled, the flag
that tracks whether the passive trip interrupt is enabled.

Normally, when the thermal zone is enabled, the temperature sensors are
always running and the interrupt is used to detect overheating. When the
interrupt fires, it must be disabled, and while it is disabled the
commit makes imx_get_temp() take measurements manually (power up the
sensor, measure, power the sensor down). Once the thermal core has
cooled the system below the trip point (which typically happens
quickly), the irq is enabled again but the sensor is left powered down,
so every later read fails with EAGAIN.
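
To make this sequence concrete, here is a hedged user-space model of the
pre-fix decision in imx_get_temp(). It is not driver code: the struct,
the function names and the single sensor_on flag (standing for "powered
up and measuring") are assumptions, and re-arming the alarm once a
reading drops below the trip point follows the description above rather
than reproducing the driver line by line.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the driver state (not driver code). */
struct old_model {
	bool sensor_on;		/* sensor powered with measurements running */
	bool irq_enabled;	/* passive trip alarm IRQ armed */
	int hw_temp;		/* value the sensor would report */
	int trip;		/* passive trip point */
};

/* Pre-fix logic: take a manual one-shot sample iff the alarm IRQ is off. */
int old_get_temp(struct old_model *m, int *temp)
{
	bool run_measurement = !m->irq_enabled;
	bool valid;

	if (run_measurement)
		m->sensor_on = true;	/* power up for one sample */
	valid = m->sensor_on;		/* no valid data unless the sensor runs */
	if (run_measurement)
		m->sensor_on = false;	/* power down again */
	if (!valid)
		return -EAGAIN;

	*temp = m->hw_temp;
	if (!m->irq_enabled && *temp < m->trip)
		m->irq_enabled = true;	/* re-arm alarm after cooling */
	return 0;
}

/* Zone enabled, alarm fires, system cools down, then one more read. */
int old_scenario(void)
{
	struct old_model m = { .sensor_on = true, .irq_enabled = true,
			       .hw_temp = 50, .trip = 40 };
	int t = 0, r;

	m.irq_enabled = false;		/* alarm fired, IRQ disabled */
	r = old_get_temp(&m, &t);	/* hot: manual read, sensor left off */
	assert(r == 0 && t == 50);
	m.hw_temp = 30;
	r = old_get_temp(&m, &t);	/* cooled: IRQ re-armed, sensor off */
	assert(r == 0 && t == 30 && m.irq_enabled);
	return old_get_temp(&m, &t);	/* -EAGAIN, and on every later call */
}
```

In this model, once irq_enabled is true again nothing powers the sensor
back up, so the last call and every later one return -EAGAIN.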

To fix this without using thermal_zone_device_is_enabled(), use a
separate variable, tz_enabled, set in imx_change_mode(), to record
whether the thermal zone is enabled.
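
A hedged user-space model of the fixed logic: the manual one-shot path
is selected from tz_enabled, which only the mode-change path writes, so
re-arming the alarm no longer changes whether the sensor is expected to
be running. Apart from tz_enabled, the names here are hypothetical, and
sensor_on again stands for "powered up and measuring".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the driver state (not driver code). */
struct new_model {
	bool tz_enabled;	/* written only by new_change_mode() */
	bool sensor_on;		/* sensor powered with measurements running */
	bool irq_enabled;	/* passive trip alarm IRQ armed */
	int hw_temp;		/* value the sensor would report */
	int trip;		/* passive trip point */
};

/* Enable/disable path: the sensor runs continuously iff the zone is enabled. */
void new_change_mode(struct new_model *m, bool enable)
{
	m->tz_enabled = enable;
	m->sensor_on = enable;
	m->irq_enabled = enable;
}

/* Fixed logic: manual one-shot sample only when the zone is disabled. */
int new_get_temp(struct new_model *m, int *temp)
{
	bool valid;

	if (!m->tz_enabled)
		m->sensor_on = true;	/* power up for one sample */
	valid = m->sensor_on;
	if (!m->tz_enabled)
		m->sensor_on = false;	/* power down again */
	if (!valid)
		return -EAGAIN;

	*temp = m->hw_temp;
	if (!m->irq_enabled && *temp < m->trip)
		m->irq_enabled = true;	/* re-arm alarm after cooling */
	return 0;
}

/* Same sequence as the bug report: enable, overheat, cool, read again. */
int new_scenario(void)
{
	struct new_model m = { .hw_temp = 50, .trip = 40 };
	int t = 0, r;

	new_change_mode(&m, true);
	m.irq_enabled = false;		/* alarm fired, IRQ disabled */
	r = new_get_temp(&m, &t);
	assert(r == 0 && t == 50);
	m.hw_temp = 30;
	r = new_get_temp(&m, &t);	/* cooled: IRQ re-armed */
	assert(r == 0 && t == 30 && m.irq_enabled);
	return new_get_temp(&m, &t);	/* sensor still running: returns 0 */
}
```

Keeping the "zone enabled" state separate from the IRQ state is the
design point: the IRQ flag toggles on every trip crossing, while the
zone mode changes only through the mode-change callback.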

Fixes: d92ed2c9d3ff ("thermal: imx: Use driver's local data to decide whether to run a measurement")
Signed-off-by: Nicolas Cavallari <[email protected]>
---
drivers/thermal/imx_thermal.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/imx_thermal.c b/drivers/thermal/imx_thermal.c
index 2c7473d86a59..5a6ad5bae238 100644
--- a/drivers/thermal/imx_thermal.c
+++ b/drivers/thermal/imx_thermal.c
@@ -205,6 +205,7 @@ struct imx_thermal_data {
 	int alarm_temp;
 	int last_temp;
 	bool irq_enabled;
+	bool tz_enabled;
 	int irq;
 	struct clk *thermal_clk;
 	const struct thermal_soc_data *socdata;
@@ -252,11 +253,10 @@ static int imx_get_temp(struct thermal_zone_device *tz, int *temp)
 	const struct thermal_soc_data *soc_data = data->socdata;
 	struct regmap *map = data->tempmon;
 	unsigned int n_meas;
-	bool wait, run_measurement;
+	bool wait;
 	u32 val;
 
-	run_measurement = !data->irq_enabled;
-	if (!run_measurement) {
+	if (data->tz_enabled) {
 		/* Check if a measurement is currently in progress */
 		regmap_read(map, soc_data->temp_data, &val);
 		wait = !(val & soc_data->temp_valid_mask);
@@ -283,7 +283,7 @@ static int imx_get_temp(struct thermal_zone_device *tz, int *temp)
 
 	regmap_read(map, soc_data->temp_data, &val);
 
-	if (run_measurement) {
+	if (!data->tz_enabled) {
 		regmap_write(map, soc_data->sensor_ctrl + REG_CLR,
 			     soc_data->measure_temp_mask);
 		regmap_write(map, soc_data->sensor_ctrl + REG_SET,
@@ -339,6 +339,7 @@ static int imx_change_mode(struct thermal_zone_device *tz,
 	const struct thermal_soc_data *soc_data = data->socdata;
 
 	if (mode == THERMAL_DEVICE_ENABLED) {
+		data->tz_enabled = true;
 		regmap_write(map, soc_data->sensor_ctrl + REG_CLR,
 			     soc_data->power_down_mask);
 		regmap_write(map, soc_data->sensor_ctrl + REG_SET,
@@ -349,6 +350,7 @@ static int imx_change_mode(struct thermal_zone_device *tz,
 			enable_irq(data->irq);
 		}
 	} else {
+		data->tz_enabled = false;
 		regmap_write(map, soc_data->sensor_ctrl + REG_CLR,
 			     soc_data->measure_temp_mask);
 		regmap_write(map, soc_data->sensor_ctrl + REG_SET,
--
2.34.1