2020-06-29 21:14:27

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: [PATCH v7 00/11] Stop monitoring disabled devices

A respin of v6 with these changes:

v6..v7:
- removed duplicate S-o-b line from patch 6/11

v5..v6:
- staticized thermal_zone_device_set_mode() (kbuild test robot)

v4..v5:

- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL (Daniel)
- dropped unnecessary thermal_zone_device_enable() in int3400_thermal.c
and in thermal_of.c (Bartlomiej)

Andrzej Pietrasiewicz (11):
acpi: thermal: Fix error handling in the register function
thermal: Store thermal mode in a dedicated enum
thermal: Add current mode to thermal zone device
thermal: Store device mode in struct thermal_zone_device
thermal: remove get_mode() operation of drivers
thermal: Add mode helpers
thermal: Use mode helpers in drivers
thermal: Explicitly enable non-changing thermal zone devices
thermal: core: Stop polling DISABLED thermal devices
thermal: Simplify or eliminate unnecessary set_mode() methods
thermal: Rename set_mode() to change_mode()

drivers/acpi/thermal.c | 75 +++++----------
.../ethernet/chelsio/cxgb4/cxgb4_thermal.c | 8 ++
.../ethernet/mellanox/mlxsw/core_thermal.c | 91 ++++---------------
drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 9 +-
drivers/platform/x86/acerhdf.c | 33 +++----
drivers/platform/x86/intel_mid_thermal.c | 6 ++
drivers/power/supply/power_supply_core.c | 9 +-
drivers/thermal/armada_thermal.c | 6 ++
drivers/thermal/da9062-thermal.c | 16 +---
drivers/thermal/dove_thermal.c | 6 ++
drivers/thermal/hisi_thermal.c | 6 +-
drivers/thermal/imx_thermal.c | 57 ++++--------
.../intel/int340x_thermal/int3400_thermal.c | 38 ++------
.../int340x_thermal/int340x_thermal_zone.c | 5 +
drivers/thermal/intel/intel_pch_thermal.c | 5 +
.../thermal/intel/intel_quark_dts_thermal.c | 34 ++-----
drivers/thermal/intel/intel_soc_dts_iosf.c | 3 +
drivers/thermal/intel/x86_pkg_temp_thermal.c | 6 ++
drivers/thermal/kirkwood_thermal.c | 7 ++
drivers/thermal/rcar_thermal.c | 9 +-
drivers/thermal/rockchip_thermal.c | 6 +-
drivers/thermal/spear_thermal.c | 7 ++
drivers/thermal/sprd_thermal.c | 6 +-
drivers/thermal/st/st_thermal.c | 5 +
drivers/thermal/thermal_core.c | 76 ++++++++++++++--
drivers/thermal/thermal_of.c | 41 +--------
drivers/thermal/thermal_sysfs.c | 37 +-------
include/linux/thermal.h | 19 +++-
28 files changed, 275 insertions(+), 351 deletions(-)


base-commit: 9ebcfadb0610322ac537dd7aa5d9cbc2b2894c68
--
2.17.1


2020-06-29 21:14:44

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: [PATCH v7 02/11] thermal: Store thermal mode in a dedicated enum

Prepare for storing mode in struct thermal_zone_device.

Signed-off-by: Andrzej Pietrasiewicz <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
[for acerhdf]
Acked-by: Peter Kaestle <[email protected]>
Reviewed-by: Bartlomiej Zolnierkiewicz <[email protected]>
Reviewed-by: Amit Kucheria <[email protected]>
---
drivers/acpi/thermal.c | 27 +++++++++----------
drivers/platform/x86/acerhdf.c | 8 ++++--
.../intel/int340x_thermal/int3400_thermal.c | 18 +++++--------
3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index 6de8066ca1e7..fb46070c66d8 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -172,7 +172,7 @@ struct acpi_thermal {
struct acpi_thermal_trips trips;
struct acpi_handle_list devices;
struct thermal_zone_device *thermal_zone;
- int tz_enabled;
+ enum thermal_device_mode mode;
int kelvin_offset; /* in millidegrees */
struct work_struct thermal_check_work;
};
@@ -500,7 +500,7 @@ static void acpi_thermal_check(void *data)
{
struct acpi_thermal *tz = data;

- if (!tz->tz_enabled)
+ if (tz->mode != THERMAL_DEVICE_ENABLED)
return;

thermal_zone_device_update(tz->thermal_zone,
@@ -534,8 +534,7 @@ static int thermal_get_mode(struct thermal_zone_device *thermal,
if (!tz)
return -EINVAL;

- *mode = tz->tz_enabled ? THERMAL_DEVICE_ENABLED :
- THERMAL_DEVICE_DISABLED;
+ *mode = tz->mode;

return 0;
}
@@ -544,27 +543,25 @@ static int thermal_set_mode(struct thermal_zone_device *thermal,
enum thermal_device_mode mode)
{
struct acpi_thermal *tz = thermal->devdata;
- int enable;

if (!tz)
return -EINVAL;

+ if (mode != THERMAL_DEVICE_DISABLED &&
+ mode != THERMAL_DEVICE_ENABLED)
+ return -EINVAL;
/*
* enable/disable thermal management from ACPI thermal driver
*/
- if (mode == THERMAL_DEVICE_ENABLED)
- enable = 1;
- else if (mode == THERMAL_DEVICE_DISABLED) {
- enable = 0;
+ if (mode == THERMAL_DEVICE_DISABLED)
pr_warn("thermal zone will be disabled\n");
- } else
- return -EINVAL;

- if (enable != tz->tz_enabled) {
- tz->tz_enabled = enable;
+ if (mode != tz->mode) {
+ tz->mode = mode;
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"%s kernel ACPI thermal control\n",
- tz->tz_enabled ? "Enable" : "Disable"));
+ tz->mode == THERMAL_DEVICE_ENABLED ?
+ "Enable" : "Disable"));
acpi_thermal_check(tz);
}
return 0;
@@ -915,7 +912,7 @@ static int acpi_thermal_register_thermal_zone(struct acpi_thermal *tz)
goto remove_dev_link;
}

- tz->tz_enabled = 1;
+ tz->mode = THERMAL_DEVICE_ENABLED;

dev_info(&tz->device->dev, "registered as thermal_zone%d\n",
tz->thermal_zone->id);
diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
index 4df7609b4aa9..9d1030b1a4f4 100644
--- a/drivers/platform/x86/acerhdf.c
+++ b/drivers/platform/x86/acerhdf.c
@@ -68,6 +68,7 @@ static int kernelmode = 1;
#else
static int kernelmode;
#endif
+static enum thermal_device_mode thermal_mode;

static unsigned int interval = 10;
static unsigned int fanon = 60000;
@@ -397,6 +398,7 @@ static inline void acerhdf_revert_to_bios_mode(void)
{
acerhdf_change_fanstate(ACERHDF_FAN_AUTO);
kernelmode = 0;
+ thermal_mode = THERMAL_DEVICE_DISABLED;
if (thz_dev)
thz_dev->polling_delay = 0;
pr_notice("kernel mode fan control OFF\n");
@@ -404,6 +406,7 @@ static inline void acerhdf_revert_to_bios_mode(void)
static inline void acerhdf_enable_kernelmode(void)
{
kernelmode = 1;
+ thermal_mode = THERMAL_DEVICE_ENABLED;

thz_dev->polling_delay = interval*1000;
thermal_zone_device_update(thz_dev, THERMAL_EVENT_UNSPECIFIED);
@@ -416,8 +419,7 @@ static int acerhdf_get_mode(struct thermal_zone_device *thermal,
if (verbose)
pr_notice("kernel mode fan control %d\n", kernelmode);

- *mode = (kernelmode) ? THERMAL_DEVICE_ENABLED
- : THERMAL_DEVICE_DISABLED;
+ *mode = thermal_mode;

return 0;
}
@@ -739,6 +741,8 @@ static int __init acerhdf_register_thermal(void)
if (IS_ERR(cl_dev))
return -EINVAL;

+ thermal_mode = kernelmode ?
+ THERMAL_DEVICE_ENABLED : THERMAL_DEVICE_DISABLED;
thz_dev = thermal_zone_device_register("acerhdf", 2, 0, NULL,
&acerhdf_dev_ops,
&acerhdf_zone_params, 0,
diff --git a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
index 0b3a62655843..e84faaadff87 100644
--- a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
+++ b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
@@ -48,7 +48,7 @@ struct int3400_thermal_priv {
struct acpi_device *adev;
struct platform_device *pdev;
struct thermal_zone_device *thermal;
- int mode;
+ enum thermal_device_mode mode;
int art_count;
struct art *arts;
int trt_count;
@@ -395,24 +395,20 @@ static int int3400_thermal_set_mode(struct thermal_zone_device *thermal,
enum thermal_device_mode mode)
{
struct int3400_thermal_priv *priv = thermal->devdata;
- bool enable;
int result = 0;

if (!priv)
return -EINVAL;

- if (mode == THERMAL_DEVICE_ENABLED)
- enable = true;
- else if (mode == THERMAL_DEVICE_DISABLED)
- enable = false;
- else
+ if (mode != THERMAL_DEVICE_ENABLED &&
+ mode != THERMAL_DEVICE_DISABLED)
return -EINVAL;

- if (enable != priv->mode) {
- priv->mode = enable;
+ if (mode != priv->mode) {
+ priv->mode = mode;
result = int3400_thermal_run_osc(priv->adev->handle,
- priv->current_uuid_index,
- enable);
+ priv->current_uuid_index,
+ mode == THERMAL_DEVICE_ENABLED);
}

evaluate_odvp(priv);
--
2.17.1

2020-06-29 21:14:46

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: [PATCH v7 04/11] thermal: Store device mode in struct thermal_zone_device

Prepare for eliminating get_mode().

Signed-off-by: Andrzej Pietrasiewicz <[email protected]>
[for acerhdf]
Acked-by: Peter Kaestle <[email protected]>
Reviewed-by: Bartlomiej Zolnierkiewicz <[email protected]>
Reviewed-by: Amit Kucheria <[email protected]>
---
drivers/acpi/thermal.c | 18 ++++++----------
.../ethernet/mellanox/mlxsw/core_thermal.c | 21 +++++++------------
drivers/platform/x86/acerhdf.c | 15 ++++++-------
drivers/thermal/da9062-thermal.c | 6 ++----
drivers/thermal/imx_thermal.c | 17 +++++++--------
.../intel/int340x_thermal/int3400_thermal.c | 12 +++--------
.../thermal/intel/intel_quark_dts_thermal.c | 16 +++++++-------
drivers/thermal/thermal_of.c | 10 +++------
8 files changed, 44 insertions(+), 71 deletions(-)

diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index fb46070c66d8..4ba273f49d87 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -172,7 +172,6 @@ struct acpi_thermal {
struct acpi_thermal_trips trips;
struct acpi_handle_list devices;
struct thermal_zone_device *thermal_zone;
- enum thermal_device_mode mode;
int kelvin_offset; /* in millidegrees */
struct work_struct thermal_check_work;
};
@@ -500,7 +499,7 @@ static void acpi_thermal_check(void *data)
{
struct acpi_thermal *tz = data;

- if (tz->mode != THERMAL_DEVICE_ENABLED)
+ if (tz->thermal_zone->mode != THERMAL_DEVICE_ENABLED)
return;

thermal_zone_device_update(tz->thermal_zone,
@@ -529,12 +528,7 @@ static int thermal_get_temp(struct thermal_zone_device *thermal, int *temp)
static int thermal_get_mode(struct thermal_zone_device *thermal,
enum thermal_device_mode *mode)
{
- struct acpi_thermal *tz = thermal->devdata;
-
- if (!tz)
- return -EINVAL;
-
- *mode = tz->mode;
+ *mode = thermal->mode;

return 0;
}
@@ -556,11 +550,11 @@ static int thermal_set_mode(struct thermal_zone_device *thermal,
if (mode == THERMAL_DEVICE_DISABLED)
pr_warn("thermal zone will be disabled\n");

- if (mode != tz->mode) {
- tz->mode = mode;
+ if (mode != tz->thermal_zone->mode) {
+ tz->thermal_zone->mode = mode;
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"%s kernel ACPI thermal control\n",
- tz->mode == THERMAL_DEVICE_ENABLED ?
+ tz->thermal_zone->mode == THERMAL_DEVICE_ENABLED ?
"Enable" : "Disable"));
acpi_thermal_check(tz);
}
@@ -912,7 +906,7 @@ static int acpi_thermal_register_thermal_zone(struct acpi_thermal *tz)
goto remove_dev_link;
}

- tz->mode = THERMAL_DEVICE_ENABLED;
+ tz->thermal_zone->mode = THERMAL_DEVICE_ENABLED;

dev_info(&tz->device->dev, "registered as thermal_zone%d\n",
tz->thermal_zone->id);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index 05f8d5a92862..51667ed99c21 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -98,7 +98,6 @@ struct mlxsw_thermal_module {
struct mlxsw_thermal *parent;
struct thermal_zone_device *tzdev;
struct mlxsw_thermal_trip trips[MLXSW_THERMAL_NUM_TRIPS];
- enum thermal_device_mode mode;
int module; /* Module or gearbox number */
};

@@ -110,7 +109,6 @@ struct mlxsw_thermal {
struct thermal_cooling_device *cdevs[MLXSW_MFCR_PWMS_MAX];
u8 cooling_levels[MLXSW_THERMAL_MAX_STATE + 1];
struct mlxsw_thermal_trip trips[MLXSW_THERMAL_NUM_TRIPS];
- enum thermal_device_mode mode;
struct mlxsw_thermal_module *tz_module_arr;
u8 tz_module_num;
struct mlxsw_thermal_module *tz_gearbox_arr;
@@ -280,9 +278,7 @@ static int mlxsw_thermal_unbind(struct thermal_zone_device *tzdev,
static int mlxsw_thermal_get_mode(struct thermal_zone_device *tzdev,
enum thermal_device_mode *mode)
{
- struct mlxsw_thermal *thermal = tzdev->devdata;
-
- *mode = thermal->mode;
+ *mode = tzdev->mode;

return 0;
}
@@ -299,9 +295,9 @@ static int mlxsw_thermal_set_mode(struct thermal_zone_device *tzdev,
else
tzdev->polling_delay = 0;

+ tzdev->mode = mode;
mutex_unlock(&tzdev->lock);

- thermal->mode = mode;
thermal_zone_device_update(tzdev, THERMAL_EVENT_UNSPECIFIED);

return 0;
@@ -468,9 +464,7 @@ static int mlxsw_thermal_module_unbind(struct thermal_zone_device *tzdev,
static int mlxsw_thermal_module_mode_get(struct thermal_zone_device *tzdev,
enum thermal_device_mode *mode)
{
- struct mlxsw_thermal_module *tz = tzdev->devdata;
-
- *mode = tz->mode;
+ *mode = tzdev->mode;

return 0;
}
@@ -488,9 +482,10 @@ static int mlxsw_thermal_module_mode_set(struct thermal_zone_device *tzdev,
else
tzdev->polling_delay = 0;

+ tzdev->mode = mode;
+
mutex_unlock(&tzdev->lock);

- tz->mode = mode;
thermal_zone_device_update(tzdev, THERMAL_EVENT_UNSPECIFIED);

return 0;
@@ -780,7 +775,7 @@ mlxsw_thermal_module_tz_init(struct mlxsw_thermal_module *module_tz)
return err;
}

- module_tz->mode = THERMAL_DEVICE_ENABLED;
+ module_tz->tzdev->mode = THERMAL_DEVICE_ENABLED;
return 0;
}

@@ -896,7 +891,7 @@ mlxsw_thermal_gearbox_tz_init(struct mlxsw_thermal_module *gearbox_tz)
if (IS_ERR(gearbox_tz->tzdev))
return PTR_ERR(gearbox_tz->tzdev);

- gearbox_tz->mode = THERMAL_DEVICE_ENABLED;
+ gearbox_tz->tzdev->mode = THERMAL_DEVICE_ENABLED;
return 0;
}

@@ -1065,7 +1060,7 @@ int mlxsw_thermal_init(struct mlxsw_core *core,
if (err)
goto err_unreg_modules_tzdev;

- thermal->mode = THERMAL_DEVICE_ENABLED;
+ thermal->tzdev->mode = THERMAL_DEVICE_ENABLED;
*p_thermal = thermal;
return 0;

diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
index 9d1030b1a4f4..6f21015e5fd9 100644
--- a/drivers/platform/x86/acerhdf.c
+++ b/drivers/platform/x86/acerhdf.c
@@ -68,7 +68,6 @@ static int kernelmode = 1;
#else
static int kernelmode;
#endif
-static enum thermal_device_mode thermal_mode;

static unsigned int interval = 10;
static unsigned int fanon = 60000;
@@ -398,15 +397,16 @@ static inline void acerhdf_revert_to_bios_mode(void)
{
acerhdf_change_fanstate(ACERHDF_FAN_AUTO);
kernelmode = 0;
- thermal_mode = THERMAL_DEVICE_DISABLED;
- if (thz_dev)
+ if (thz_dev) {
+ thz_dev->mode = THERMAL_DEVICE_DISABLED;
thz_dev->polling_delay = 0;
+ }
pr_notice("kernel mode fan control OFF\n");
}
static inline void acerhdf_enable_kernelmode(void)
{
kernelmode = 1;
- thermal_mode = THERMAL_DEVICE_ENABLED;
+ thz_dev->mode = THERMAL_DEVICE_ENABLED;

thz_dev->polling_delay = interval*1000;
thermal_zone_device_update(thz_dev, THERMAL_EVENT_UNSPECIFIED);
@@ -419,7 +419,7 @@ static int acerhdf_get_mode(struct thermal_zone_device *thermal,
if (verbose)
pr_notice("kernel mode fan control %d\n", kernelmode);

- *mode = thermal_mode;
+ *mode = thermal->mode;

return 0;
}
@@ -741,8 +741,6 @@ static int __init acerhdf_register_thermal(void)
if (IS_ERR(cl_dev))
return -EINVAL;

- thermal_mode = kernelmode ?
- THERMAL_DEVICE_ENABLED : THERMAL_DEVICE_DISABLED;
thz_dev = thermal_zone_device_register("acerhdf", 2, 0, NULL,
&acerhdf_dev_ops,
&acerhdf_zone_params, 0,
@@ -750,6 +748,9 @@ static int __init acerhdf_register_thermal(void)
if (IS_ERR(thz_dev))
return -EINVAL;

+ thz_dev->mode = kernelmode ?
+ THERMAL_DEVICE_ENABLED : THERMAL_DEVICE_DISABLED;
+
if (strcmp(thz_dev->governor->name,
acerhdf_zone_params.governor_name)) {
pr_err("Didn't get thermal governor %s, perhaps not compiled into thermal subsystem.\n",
diff --git a/drivers/thermal/da9062-thermal.c b/drivers/thermal/da9062-thermal.c
index c32709badeda..a14c7981c7c7 100644
--- a/drivers/thermal/da9062-thermal.c
+++ b/drivers/thermal/da9062-thermal.c
@@ -49,7 +49,6 @@ struct da9062_thermal {
struct da9062 *hw;
struct delayed_work work;
struct thermal_zone_device *zone;
- enum thermal_device_mode mode;
struct mutex lock; /* protection for da9062_thermal temperature */
int temperature;
int irq;
@@ -124,8 +123,7 @@ static irqreturn_t da9062_thermal_irq_handler(int irq, void *data)
static int da9062_thermal_get_mode(struct thermal_zone_device *z,
enum thermal_device_mode *mode)
{
- struct da9062_thermal *thermal = z->devdata;
- *mode = thermal->mode;
+ *mode = z->mode;
return 0;
}

@@ -233,7 +231,6 @@ static int da9062_thermal_probe(struct platform_device *pdev)

thermal->config = match->data;
thermal->hw = chip;
- thermal->mode = THERMAL_DEVICE_ENABLED;
thermal->dev = &pdev->dev;

INIT_DELAYED_WORK(&thermal->work, da9062_thermal_poll_on);
@@ -248,6 +245,7 @@ static int da9062_thermal_probe(struct platform_device *pdev)
ret = PTR_ERR(thermal->zone);
goto err;
}
+ thermal->zone->mode = THERMAL_DEVICE_ENABLED;

dev_dbg(&pdev->dev,
"TJUNC temperature polling period set at %d ms\n",
diff --git a/drivers/thermal/imx_thermal.c b/drivers/thermal/imx_thermal.c
index e761c9b42217..9a1114d721b6 100644
--- a/drivers/thermal/imx_thermal.c
+++ b/drivers/thermal/imx_thermal.c
@@ -197,7 +197,6 @@ struct imx_thermal_data {
struct cpufreq_policy *policy;
struct thermal_zone_device *tz;
struct thermal_cooling_device *cdev;
- enum thermal_device_mode mode;
struct regmap *tempmon;
u32 c1, c2; /* See formula in imx_init_calib() */
int temp_passive;
@@ -256,7 +255,7 @@ static int imx_get_temp(struct thermal_zone_device *tz, int *temp)
bool wait;
u32 val;

- if (data->mode == THERMAL_DEVICE_ENABLED) {
+ if (tz->mode == THERMAL_DEVICE_ENABLED) {
/* Check if a measurement is currently in progress */
regmap_read(map, soc_data->temp_data, &val);
wait = !(val & soc_data->temp_valid_mask);
@@ -283,7 +282,7 @@ static int imx_get_temp(struct thermal_zone_device *tz, int *temp)

regmap_read(map, soc_data->temp_data, &val);

- if (data->mode != THERMAL_DEVICE_ENABLED) {
+ if (tz->mode != THERMAL_DEVICE_ENABLED) {
regmap_write(map, soc_data->sensor_ctrl + REG_CLR,
soc_data->measure_temp_mask);
regmap_write(map, soc_data->sensor_ctrl + REG_SET,
@@ -334,9 +333,7 @@ static int imx_get_temp(struct thermal_zone_device *tz, int *temp)
static int imx_get_mode(struct thermal_zone_device *tz,
enum thermal_device_mode *mode)
{
- struct imx_thermal_data *data = tz->devdata;
-
- *mode = data->mode;
+ *mode = tz->mode;

return 0;
}
@@ -376,7 +373,7 @@ static int imx_set_mode(struct thermal_zone_device *tz,
}
}

- data->mode = mode;
+ tz->mode = mode;
thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);

return 0;
@@ -831,7 +828,7 @@ static int imx_thermal_probe(struct platform_device *pdev)
data->socdata->measure_temp_mask);

data->irq_enabled = true;
- data->mode = THERMAL_DEVICE_ENABLED;
+ data->tz->mode = THERMAL_DEVICE_ENABLED;

ret = devm_request_threaded_irq(&pdev->dev, data->irq,
imx_thermal_alarm_irq, imx_thermal_alarm_irq_thread,
@@ -885,7 +882,7 @@ static int __maybe_unused imx_thermal_suspend(struct device *dev)
data->socdata->measure_temp_mask);
regmap_write(map, data->socdata->sensor_ctrl + REG_SET,
data->socdata->power_down_mask);
- data->mode = THERMAL_DEVICE_DISABLED;
+ data->tz->mode = THERMAL_DEVICE_DISABLED;
clk_disable_unprepare(data->thermal_clk);

return 0;
@@ -905,7 +902,7 @@ static int __maybe_unused imx_thermal_resume(struct device *dev)
data->socdata->power_down_mask);
regmap_write(map, data->socdata->sensor_ctrl + REG_SET,
data->socdata->measure_temp_mask);
- data->mode = THERMAL_DEVICE_ENABLED;
+ data->tz->mode = THERMAL_DEVICE_ENABLED;

return 0;
}
diff --git a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
index e84faaadff87..f65b2fc09198 100644
--- a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
+++ b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
@@ -48,7 +48,6 @@ struct int3400_thermal_priv {
struct acpi_device *adev;
struct platform_device *pdev;
struct thermal_zone_device *thermal;
- enum thermal_device_mode mode;
int art_count;
struct art *arts;
int trt_count;
@@ -381,12 +380,7 @@ static int int3400_thermal_get_temp(struct thermal_zone_device *thermal,
static int int3400_thermal_get_mode(struct thermal_zone_device *thermal,
enum thermal_device_mode *mode)
{
- struct int3400_thermal_priv *priv = thermal->devdata;
-
- if (!priv)
- return -EINVAL;
-
- *mode = priv->mode;
+ *mode = thermal->mode;

return 0;
}
@@ -404,8 +398,8 @@ static int int3400_thermal_set_mode(struct thermal_zone_device *thermal,
mode != THERMAL_DEVICE_DISABLED)
return -EINVAL;

- if (mode != priv->mode) {
- priv->mode = mode;
+ if (mode != thermal->mode) {
+ thermal->mode = mode;
result = int3400_thermal_run_osc(priv->adev->handle,
priv->current_uuid_index,
mode == THERMAL_DEVICE_ENABLED);
diff --git a/drivers/thermal/intel/intel_quark_dts_thermal.c b/drivers/thermal/intel/intel_quark_dts_thermal.c
index d704fc104cfd..d77cb3df5ade 100644
--- a/drivers/thermal/intel/intel_quark_dts_thermal.c
+++ b/drivers/thermal/intel/intel_quark_dts_thermal.c
@@ -103,7 +103,6 @@ struct soc_sensor_entry {
bool locked;
u32 store_ptps;
u32 store_dts_enable;
- enum thermal_device_mode mode;
struct thermal_zone_device *tzone;
};

@@ -128,7 +127,7 @@ static int soc_dts_enable(struct thermal_zone_device *tzd)
return ret;

if (out & QRK_DTS_ENABLE_BIT) {
- aux_entry->mode = THERMAL_DEVICE_ENABLED;
+ tzd->mode = THERMAL_DEVICE_ENABLED;
return 0;
}

@@ -139,9 +138,9 @@ static int soc_dts_enable(struct thermal_zone_device *tzd)
if (ret)
return ret;

- aux_entry->mode = THERMAL_DEVICE_ENABLED;
+ tzd->mode = THERMAL_DEVICE_ENABLED;
} else {
- aux_entry->mode = THERMAL_DEVICE_DISABLED;
+ tzd->mode = THERMAL_DEVICE_DISABLED;
pr_info("DTS is locked. Cannot enable DTS\n");
ret = -EPERM;
}
@@ -161,7 +160,7 @@ static int soc_dts_disable(struct thermal_zone_device *tzd)
return ret;

if (!(out & QRK_DTS_ENABLE_BIT)) {
- aux_entry->mode = THERMAL_DEVICE_DISABLED;
+ tzd->mode = THERMAL_DEVICE_DISABLED;
return 0;
}

@@ -173,9 +172,9 @@ static int soc_dts_disable(struct thermal_zone_device *tzd)
if (ret)
return ret;

- aux_entry->mode = THERMAL_DEVICE_DISABLED;
+ tzd->mode = THERMAL_DEVICE_DISABLED;
} else {
- aux_entry->mode = THERMAL_DEVICE_ENABLED;
+ tzd->mode = THERMAL_DEVICE_ENABLED;
pr_info("DTS is locked. Cannot disable DTS\n");
ret = -EPERM;
}
@@ -312,8 +311,7 @@ static int sys_get_curr_temp(struct thermal_zone_device *tzd,
static int sys_get_mode(struct thermal_zone_device *tzd,
enum thermal_device_mode *mode)
{
- struct soc_sensor_entry *aux_entry = tzd->devdata;
- *mode = aux_entry->mode;
+ *mode = tzd->mode;
return 0;
}

diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index ddf88dbe7ba2..c495b1e48ef2 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -51,7 +51,6 @@ struct __thermal_bind_params {

/**
* struct __thermal_zone - internal representation of a thermal zone
- * @mode: current thermal zone device mode (enabled/disabled)
* @passive_delay: polling interval while passive cooling is activated
* @polling_delay: zone polling interval
* @slope: slope of the temperature adjustment curve
@@ -65,7 +64,6 @@ struct __thermal_bind_params {
*/

struct __thermal_zone {
- enum thermal_device_mode mode;
int passive_delay;
int polling_delay;
int slope;
@@ -272,9 +270,7 @@ static int of_thermal_unbind(struct thermal_zone_device *thermal,
static int of_thermal_get_mode(struct thermal_zone_device *tz,
enum thermal_device_mode *mode)
{
- struct __thermal_zone *data = tz->devdata;
-
- *mode = data->mode;
+ *mode = tz->mode;

return 0;
}
@@ -296,7 +292,7 @@ static int of_thermal_set_mode(struct thermal_zone_device *tz,

mutex_unlock(&tz->lock);

- data->mode = mode;
+ tz->mode = mode;
thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);

return 0;
@@ -979,7 +975,6 @@ __init *thermal_of_build_thermal_zone(struct device_node *np)

finish:
of_node_put(child);
- tz->mode = THERMAL_DEVICE_DISABLED;

return tz;

@@ -1134,6 +1129,7 @@ int __init of_parse_thermal_zones(void)
of_thermal_free_zone(tz);
/* attempting to build remaining zones still */
}
+ zone->mode = THERMAL_DEVICE_DISABLED;
}
of_node_put(np);

--
2.17.1

2020-06-29 21:14:59

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: [PATCH v7 01/11] acpi: thermal: Fix error handling in the register function

The acpi_thermal_register_thermal_zone() is missing any error handling.
This needs to be fixed.

Signed-off-by: Andrzej Pietrasiewicz <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Reviewed-by: Bartlomiej Zolnierkiewicz <[email protected]>
Reviewed-by: Amit Kucheria <[email protected]>
---
drivers/acpi/thermal.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index 19067a5e5293..6de8066ca1e7 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -901,23 +901,35 @@ static int acpi_thermal_register_thermal_zone(struct acpi_thermal *tz)
result = sysfs_create_link(&tz->device->dev.kobj,
&tz->thermal_zone->device.kobj, "thermal_zone");
if (result)
- return result;
+ goto unregister_tzd;

result = sysfs_create_link(&tz->thermal_zone->device.kobj,
&tz->device->dev.kobj, "device");
if (result)
- return result;
+ goto remove_tz_link;

status = acpi_bus_attach_private_data(tz->device->handle,
tz->thermal_zone);
- if (ACPI_FAILURE(status))
- return -ENODEV;
+ if (ACPI_FAILURE(status)) {
+ result = -ENODEV;
+ goto remove_dev_link;
+ }

tz->tz_enabled = 1;

dev_info(&tz->device->dev, "registered as thermal_zone%d\n",
tz->thermal_zone->id);
+
return 0;
+
+remove_dev_link:
+ sysfs_remove_link(&tz->thermal_zone->device.kobj, "device");
+remove_tz_link:
+ sysfs_remove_link(&tz->device->dev.kobj, "thermal_zone");
+unregister_tzd:
+ thermal_zone_device_unregister(tz->thermal_zone);
+
+ return result;
}

static void acpi_thermal_unregister_thermal_zone(struct acpi_thermal *tz)
--
2.17.1

2020-06-29 21:16:19

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: [PATCH v7 08/11] thermal: Explicitly enable non-changing thermal zone devices

Some thermal zone devices never change their state, so they should be
always enabled.

Signed-off-by: Andrzej Pietrasiewicz <[email protected]>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c | 8 ++++++++
drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 9 ++++++++-
drivers/platform/x86/intel_mid_thermal.c | 6 ++++++
drivers/power/supply/power_supply_core.c | 9 +++++++--
drivers/thermal/armada_thermal.c | 6 ++++++
drivers/thermal/dove_thermal.c | 6 ++++++
.../thermal/intel/int340x_thermal/int340x_thermal_zone.c | 5 +++++
drivers/thermal/intel/intel_pch_thermal.c | 5 +++++
drivers/thermal/intel/intel_soc_dts_iosf.c | 3 +++
drivers/thermal/intel/x86_pkg_temp_thermal.c | 6 ++++++
drivers/thermal/kirkwood_thermal.c | 7 +++++++
drivers/thermal/rcar_thermal.c | 9 ++++++++-
drivers/thermal/spear_thermal.c | 7 +++++++
drivers/thermal/st/st_thermal.c | 5 +++++
14 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c
index 3de8a5e83b6c..e3510e9b21f3 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c
@@ -92,6 +92,14 @@ int cxgb4_thermal_init(struct adapter *adap)
ch_thermal->tzdev = NULL;
return ret;
}
+
+ ret = thermal_zone_device_enable(ch_thermal->tzdev);
+ if (ret) {
+ dev_err(adap->pdev_dev, "Failed to enable thermal zone\n");
+ thermal_zone_device_unregister(adap->ch_thermal.tzdev);
+ return ret;
+ }
+
return 0;
}

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
index 418e59b7c671..0c95663bf9ed 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
@@ -733,7 +733,7 @@ static struct thermal_zone_device_ops tzone_ops = {

static void iwl_mvm_thermal_zone_register(struct iwl_mvm *mvm)
{
- int i;
+ int i, ret;
char name[16];
static atomic_t counter = ATOMIC_INIT(0);

@@ -759,6 +759,13 @@ static void iwl_mvm_thermal_zone_register(struct iwl_mvm *mvm)
return;
}

+ ret = thermal_zone_device_enable(mvm->tz_device.tzone);
+ if (ret) {
+ IWL_DEBUG_TEMP(mvm, "Failed to enable thermal zone\n");
+ thermal_zone_device_unregister(mvm->tz_device.tzone);
+ return;
+ }
+
/* 0 is a valid temperature,
* so initialize the array with S16_MIN which invalid temperature
*/
diff --git a/drivers/platform/x86/intel_mid_thermal.c b/drivers/platform/x86/intel_mid_thermal.c
index f402e2e74a38..f12f4e7bd971 100644
--- a/drivers/platform/x86/intel_mid_thermal.c
+++ b/drivers/platform/x86/intel_mid_thermal.c
@@ -493,6 +493,12 @@ static int mid_thermal_probe(struct platform_device *pdev)
ret = PTR_ERR(pinfo->tzd[i]);
goto err;
}
+ ret = thermal_zone_device_enable(pinfo->tzd[i]);
+ if (ret) {
+ kfree(td_info);
+ thermal_zone_device_unregister(pinfo->tzd[i]);
+ goto err;
+ }
}

pinfo->pdev = pdev;
diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
index 02b37fe6061c..90e56736d479 100644
--- a/drivers/power/supply/power_supply_core.c
+++ b/drivers/power/supply/power_supply_core.c
@@ -939,7 +939,7 @@ static struct thermal_zone_device_ops psy_tzd_ops = {

static int psy_register_thermal(struct power_supply *psy)
{
- int i;
+ int i, ret;

if (psy->desc->no_thermal)
return 0;
@@ -949,7 +949,12 @@ static int psy_register_thermal(struct power_supply *psy)
if (psy->desc->properties[i] == POWER_SUPPLY_PROP_TEMP) {
psy->tzd = thermal_zone_device_register(psy->desc->name,
0, 0, psy, &psy_tzd_ops, NULL, 0, 0);
- return PTR_ERR_OR_ZERO(psy->tzd);
+ if (IS_ERR(psy->tzd))
+ return PTR_ERR(psy->tzd);
+ ret = thermal_zone_device_enable(psy->tzd);
+ if (ret)
+ thermal_zone_device_unregister(psy->tzd);
+ return ret;
}
}
return 0;
diff --git a/drivers/thermal/armada_thermal.c b/drivers/thermal/armada_thermal.c
index 7c447cd149e7..c2ebfb5be4b3 100644
--- a/drivers/thermal/armada_thermal.c
+++ b/drivers/thermal/armada_thermal.c
@@ -874,6 +874,12 @@ static int armada_thermal_probe(struct platform_device *pdev)
return PTR_ERR(tz);
}

+ ret = thermal_zone_device_enable(tz);
+ if (ret) {
+ thermal_zone_device_unregister(tz);
+ return ret;
+ }
+
drvdata->type = LEGACY;
drvdata->data.tz = tz;
platform_set_drvdata(pdev, drvdata);
diff --git a/drivers/thermal/dove_thermal.c b/drivers/thermal/dove_thermal.c
index 75901ced4a62..73182eb94bc0 100644
--- a/drivers/thermal/dove_thermal.c
+++ b/drivers/thermal/dove_thermal.c
@@ -153,6 +153,12 @@ static int dove_thermal_probe(struct platform_device *pdev)
return PTR_ERR(thermal);
}

+ ret = thermal_zone_device_enable(thermal);
+ if (ret) {
+ thermal_zone_device_unregister(thermal);
+ return ret;
+ }
+
platform_set_drvdata(pdev, thermal);

return 0;
diff --git a/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c b/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
index 432213272f1e..6e479deff76b 100644
--- a/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
+++ b/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
@@ -259,9 +259,14 @@ struct int34x_thermal_zone *int340x_thermal_zone_add(struct acpi_device *adev,
ret = PTR_ERR(int34x_thermal_zone->zone);
goto err_thermal_zone;
}
+ ret = thermal_zone_device_enable(int34x_thermal_zone->zone);
+ if (ret)
+ goto err_enable;

return int34x_thermal_zone;

+err_enable:
+ thermal_zone_device_unregister(int34x_thermal_zone->zone);
err_thermal_zone:
acpi_lpat_free_conversion_table(int34x_thermal_zone->lpat_table);
kfree(int34x_thermal_zone->aux_trips);
diff --git a/drivers/thermal/intel/intel_pch_thermal.c b/drivers/thermal/intel/intel_pch_thermal.c
index 56401fd4708d..65702094f3d3 100644
--- a/drivers/thermal/intel/intel_pch_thermal.c
+++ b/drivers/thermal/intel/intel_pch_thermal.c
@@ -352,9 +352,14 @@ static int intel_pch_thermal_probe(struct pci_dev *pdev,
err = PTR_ERR(ptd->tzd);
goto error_cleanup;
}
+ err = thermal_zone_device_enable(ptd->tzd);
+ if (err)
+ goto err_unregister;

return 0;

+err_unregister:
+ thermal_zone_device_unregister(ptd->tzd);
error_cleanup:
iounmap(ptd->hw_base);
error_release:
diff --git a/drivers/thermal/intel/intel_soc_dts_iosf.c b/drivers/thermal/intel/intel_soc_dts_iosf.c
index f75271b669c6..4f1a2f7c016c 100644
--- a/drivers/thermal/intel/intel_soc_dts_iosf.c
+++ b/drivers/thermal/intel/intel_soc_dts_iosf.c
@@ -329,6 +329,9 @@ static int add_dts_thermal_zone(int id, struct intel_soc_dts_sensor_entry *dts,
ret = PTR_ERR(dts->tzone);
goto err_ret;
}
+ ret = thermal_zone_device_enable(dts->tzone);
+ if (ret)
+ goto err_enable;

ret = soc_dts_enable(id);
if (ret)
diff --git a/drivers/thermal/intel/x86_pkg_temp_thermal.c b/drivers/thermal/intel/x86_pkg_temp_thermal.c
index a006b9fd1d72..b81c33202f41 100644
--- a/drivers/thermal/intel/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/intel/x86_pkg_temp_thermal.c
@@ -363,6 +363,12 @@ static int pkg_temp_thermal_device_add(unsigned int cpu)
kfree(zonedev);
return err;
}
+ err = thermal_zone_device_enable(zonedev->tzone);
+ if (err) {
+ thermal_zone_device_unregister(zonedev->tzone);
+ kfree(zonedev);
+ return err;
+ }
/* Store MSR value for package thermal interrupt, to restore at exit */
rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, zonedev->msr_pkg_therm_low,
zonedev->msr_pkg_therm_high);
diff --git a/drivers/thermal/kirkwood_thermal.c b/drivers/thermal/kirkwood_thermal.c
index 189b675cf14d..7fb6e476c82a 100644
--- a/drivers/thermal/kirkwood_thermal.c
+++ b/drivers/thermal/kirkwood_thermal.c
@@ -65,6 +65,7 @@ static int kirkwood_thermal_probe(struct platform_device *pdev)
struct thermal_zone_device *thermal = NULL;
struct kirkwood_thermal_priv *priv;
struct resource *res;
+ int ret;

priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
@@ -82,6 +83,12 @@ static int kirkwood_thermal_probe(struct platform_device *pdev)
"Failed to register thermal zone device\n");
return PTR_ERR(thermal);
}
+ ret = thermal_zone_device_enable(thermal);
+ if (ret) {
+ thermal_zone_device_unregister(thermal);
+ dev_err(&pdev->dev, "Failed to enable thermal zone device\n");
+ return ret;
+ }

platform_set_drvdata(pdev, thermal);

diff --git a/drivers/thermal/rcar_thermal.c b/drivers/thermal/rcar_thermal.c
index 46aeb28b4e90..787710bb88fe 100644
--- a/drivers/thermal/rcar_thermal.c
+++ b/drivers/thermal/rcar_thermal.c
@@ -550,12 +550,19 @@ static int rcar_thermal_probe(struct platform_device *pdev)
priv->zone = devm_thermal_zone_of_sensor_register(
dev, i, priv,
&rcar_thermal_zone_of_ops);
- else
+ else {
priv->zone = thermal_zone_device_register(
"rcar_thermal",
1, 0, priv,
&rcar_thermal_zone_ops, NULL, 0,
idle);
+
+ ret = thermal_zone_device_enable(priv->zone);
+ if (ret) {
+ thermal_zone_device_unregister(priv->zone);
+ priv->zone = ERR_PTR(ret);
+ }
+ }
if (IS_ERR(priv->zone)) {
dev_err(dev, "can't register thermal zone\n");
ret = PTR_ERR(priv->zone);
diff --git a/drivers/thermal/spear_thermal.c b/drivers/thermal/spear_thermal.c
index f68f581fd669..ee33ed692e4f 100644
--- a/drivers/thermal/spear_thermal.c
+++ b/drivers/thermal/spear_thermal.c
@@ -131,6 +131,11 @@ static int spear_thermal_probe(struct platform_device *pdev)
ret = PTR_ERR(spear_thermal);
goto disable_clk;
}
+ ret = thermal_zone_device_enable(spear_thermal);
+ if (ret) {
+ dev_err(&pdev->dev, "Cannot enable thermal zone\n");
+ goto unregister_tzd;
+ }

platform_set_drvdata(pdev, spear_thermal);

@@ -139,6 +144,8 @@ static int spear_thermal_probe(struct platform_device *pdev)

return 0;

+unregister_tzd:
+ thermal_zone_device_unregister(spear_thermal);
disable_clk:
clk_disable(stdev->clk);

diff --git a/drivers/thermal/st/st_thermal.c b/drivers/thermal/st/st_thermal.c
index b928ca6a289b..1276b95604fe 100644
--- a/drivers/thermal/st/st_thermal.c
+++ b/drivers/thermal/st/st_thermal.c
@@ -246,11 +246,16 @@ int st_thermal_register(struct platform_device *pdev,
ret = PTR_ERR(sensor->thermal_dev);
goto sensor_off;
}
+ ret = thermal_zone_device_enable(sensor->thermal_dev);
+ if (ret)
+ goto tzd_unregister;

platform_set_drvdata(pdev, sensor);

return 0;

+tzd_unregister:
+ thermal_zone_device_unregister(sensor->thermal_dev);
sensor_off:
st_thermal_sensor_off(sensor);

--
2.17.1

2020-06-29 21:16:45

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: [PATCH v7 05/11] thermal: remove get_mode() operation of drivers

get_mode() is now redundant, as the state is stored in struct
thermal_zone_device.

Consequently the "mode" attribute in sysfs can always be visible, because
it is always possible to get the mode from struct tzd.

Signed-off-by: Andrzej Pietrasiewicz <[email protected]>
[for acerhdf]
Acked-by: Peter Kaestle <[email protected]>
Reviewed-by: Bartlomiej Zolnierkiewicz <[email protected]>
Reviewed-by: Amit Kucheria <[email protected]>
---
drivers/acpi/thermal.c | 9 ------
.../ethernet/mellanox/mlxsw/core_thermal.c | 19 ------------
drivers/platform/x86/acerhdf.c | 12 --------
drivers/thermal/da9062-thermal.c | 8 -----
drivers/thermal/imx_thermal.c | 9 ------
.../intel/int340x_thermal/int3400_thermal.c | 9 ------
.../thermal/intel/intel_quark_dts_thermal.c | 8 -----
drivers/thermal/thermal_core.c | 7 +----
drivers/thermal/thermal_of.c | 9 ------
drivers/thermal/thermal_sysfs.c | 30 ++-----------------
include/linux/thermal.h | 2 --
11 files changed, 3 insertions(+), 119 deletions(-)

diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index 4ba273f49d87..592be97c4456 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -525,14 +525,6 @@ static int thermal_get_temp(struct thermal_zone_device *thermal, int *temp)
return 0;
}

-static int thermal_get_mode(struct thermal_zone_device *thermal,
- enum thermal_device_mode *mode)
-{
- *mode = thermal->mode;
-
- return 0;
-}
-
static int thermal_set_mode(struct thermal_zone_device *thermal,
enum thermal_device_mode mode)
{
@@ -847,7 +839,6 @@ static struct thermal_zone_device_ops acpi_thermal_zone_ops = {
.bind = acpi_thermal_bind_cooling_device,
.unbind = acpi_thermal_unbind_cooling_device,
.get_temp = thermal_get_temp,
- .get_mode = thermal_get_mode,
.set_mode = thermal_set_mode,
.get_trip_type = thermal_get_trip_type,
.get_trip_temp = thermal_get_trip_temp,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index 51667ed99c21..ad61b2db30b8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -275,14 +275,6 @@ static int mlxsw_thermal_unbind(struct thermal_zone_device *tzdev,
return 0;
}

-static int mlxsw_thermal_get_mode(struct thermal_zone_device *tzdev,
- enum thermal_device_mode *mode)
-{
- *mode = tzdev->mode;
-
- return 0;
-}
-
static int mlxsw_thermal_set_mode(struct thermal_zone_device *tzdev,
enum thermal_device_mode mode)
{
@@ -402,7 +394,6 @@ static int mlxsw_thermal_trend_get(struct thermal_zone_device *tzdev,
static struct thermal_zone_device_ops mlxsw_thermal_ops = {
.bind = mlxsw_thermal_bind,
.unbind = mlxsw_thermal_unbind,
- .get_mode = mlxsw_thermal_get_mode,
.set_mode = mlxsw_thermal_set_mode,
.get_temp = mlxsw_thermal_get_temp,
.get_trip_type = mlxsw_thermal_get_trip_type,
@@ -461,14 +452,6 @@ static int mlxsw_thermal_module_unbind(struct thermal_zone_device *tzdev,
return err;
}

-static int mlxsw_thermal_module_mode_get(struct thermal_zone_device *tzdev,
- enum thermal_device_mode *mode)
-{
- *mode = tzdev->mode;
-
- return 0;
-}
-
static int mlxsw_thermal_module_mode_set(struct thermal_zone_device *tzdev,
enum thermal_device_mode mode)
{
@@ -606,7 +589,6 @@ static int mlxsw_thermal_module_trend_get(struct thermal_zone_device *tzdev,
static struct thermal_zone_device_ops mlxsw_thermal_module_ops = {
.bind = mlxsw_thermal_module_bind,
.unbind = mlxsw_thermal_module_unbind,
- .get_mode = mlxsw_thermal_module_mode_get,
.set_mode = mlxsw_thermal_module_mode_set,
.get_temp = mlxsw_thermal_module_temp_get,
.get_trip_type = mlxsw_thermal_module_trip_type_get,
@@ -645,7 +627,6 @@ static int mlxsw_thermal_gearbox_temp_get(struct thermal_zone_device *tzdev,
static struct thermal_zone_device_ops mlxsw_thermal_gearbox_ops = {
.bind = mlxsw_thermal_module_bind,
.unbind = mlxsw_thermal_module_unbind,
- .get_mode = mlxsw_thermal_module_mode_get,
.set_mode = mlxsw_thermal_module_mode_set,
.get_temp = mlxsw_thermal_gearbox_temp_get,
.get_trip_type = mlxsw_thermal_module_trip_type_get,
diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
index 6f21015e5fd9..58c4e1caaa09 100644
--- a/drivers/platform/x86/acerhdf.c
+++ b/drivers/platform/x86/acerhdf.c
@@ -413,17 +413,6 @@ static inline void acerhdf_enable_kernelmode(void)
pr_notice("kernel mode fan control ON\n");
}

-static int acerhdf_get_mode(struct thermal_zone_device *thermal,
- enum thermal_device_mode *mode)
-{
- if (verbose)
- pr_notice("kernel mode fan control %d\n", kernelmode);
-
- *mode = thermal->mode;
-
- return 0;
-}
-
/*
* set operation mode;
* enabled: the thermal layer of the kernel takes care about
@@ -490,7 +479,6 @@ static struct thermal_zone_device_ops acerhdf_dev_ops = {
.bind = acerhdf_bind,
.unbind = acerhdf_unbind,
.get_temp = acerhdf_get_ec_temp,
- .get_mode = acerhdf_get_mode,
.set_mode = acerhdf_set_mode,
.get_trip_type = acerhdf_get_trip_type,
.get_trip_hyst = acerhdf_get_trip_hyst,
diff --git a/drivers/thermal/da9062-thermal.c b/drivers/thermal/da9062-thermal.c
index a14c7981c7c7..a7ac8afb063e 100644
--- a/drivers/thermal/da9062-thermal.c
+++ b/drivers/thermal/da9062-thermal.c
@@ -120,13 +120,6 @@ static irqreturn_t da9062_thermal_irq_handler(int irq, void *data)
return IRQ_HANDLED;
}

-static int da9062_thermal_get_mode(struct thermal_zone_device *z,
- enum thermal_device_mode *mode)
-{
- *mode = z->mode;
- return 0;
-}
-
static int da9062_thermal_get_trip_type(struct thermal_zone_device *z,
int trip,
enum thermal_trip_type *type)
@@ -179,7 +172,6 @@ static int da9062_thermal_get_temp(struct thermal_zone_device *z,

static struct thermal_zone_device_ops da9062_thermal_ops = {
.get_temp = da9062_thermal_get_temp,
- .get_mode = da9062_thermal_get_mode,
.get_trip_type = da9062_thermal_get_trip_type,
.get_trip_temp = da9062_thermal_get_trip_temp,
};
diff --git a/drivers/thermal/imx_thermal.c b/drivers/thermal/imx_thermal.c
index 9a1114d721b6..2c7ee5da608a 100644
--- a/drivers/thermal/imx_thermal.c
+++ b/drivers/thermal/imx_thermal.c
@@ -330,14 +330,6 @@ static int imx_get_temp(struct thermal_zone_device *tz, int *temp)
return 0;
}

-static int imx_get_mode(struct thermal_zone_device *tz,
- enum thermal_device_mode *mode)
-{
- *mode = tz->mode;
-
- return 0;
-}
-
static int imx_set_mode(struct thermal_zone_device *tz,
enum thermal_device_mode mode)
{
@@ -464,7 +456,6 @@ static struct thermal_zone_device_ops imx_tz_ops = {
.bind = imx_bind,
.unbind = imx_unbind,
.get_temp = imx_get_temp,
- .get_mode = imx_get_mode,
.set_mode = imx_set_mode,
.get_trip_type = imx_get_trip_type,
.get_trip_temp = imx_get_trip_temp,
diff --git a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
index f65b2fc09198..9a622aaf29dd 100644
--- a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
+++ b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
@@ -377,14 +377,6 @@ static int int3400_thermal_get_temp(struct thermal_zone_device *thermal,
return 0;
}

-static int int3400_thermal_get_mode(struct thermal_zone_device *thermal,
- enum thermal_device_mode *mode)
-{
- *mode = thermal->mode;
-
- return 0;
-}
-
static int int3400_thermal_set_mode(struct thermal_zone_device *thermal,
enum thermal_device_mode mode)
{
@@ -412,7 +404,6 @@ static int int3400_thermal_set_mode(struct thermal_zone_device *thermal,

static struct thermal_zone_device_ops int3400_thermal_ops = {
.get_temp = int3400_thermal_get_temp,
- .get_mode = int3400_thermal_get_mode,
.set_mode = int3400_thermal_set_mode,
};

diff --git a/drivers/thermal/intel/intel_quark_dts_thermal.c b/drivers/thermal/intel/intel_quark_dts_thermal.c
index d77cb3df5ade..c4879b4bfbf1 100644
--- a/drivers/thermal/intel/intel_quark_dts_thermal.c
+++ b/drivers/thermal/intel/intel_quark_dts_thermal.c
@@ -308,13 +308,6 @@ static int sys_get_curr_temp(struct thermal_zone_device *tzd,
return 0;
}

-static int sys_get_mode(struct thermal_zone_device *tzd,
- enum thermal_device_mode *mode)
-{
- *mode = tzd->mode;
- return 0;
-}
-
static int sys_set_mode(struct thermal_zone_device *tzd,
enum thermal_device_mode mode)
{
@@ -336,7 +329,6 @@ static struct thermal_zone_device_ops tzone_ops = {
.get_trip_type = sys_get_trip_type,
.set_trip_temp = sys_set_trip_temp,
.get_crit_temp = sys_get_crit_temp,
- .get_mode = sys_get_mode,
.set_mode = sys_set_mode,
};

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index b71196eaf90e..14d3b1b94c4f 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1456,7 +1456,6 @@ static int thermal_pm_notify(struct notifier_block *nb,
unsigned long mode, void *_unused)
{
struct thermal_zone_device *tz;
- enum thermal_device_mode tz_mode;

switch (mode) {
case PM_HIBERNATION_PREPARE:
@@ -1469,11 +1468,7 @@ static int thermal_pm_notify(struct notifier_block *nb,
case PM_POST_SUSPEND:
atomic_set(&in_suspend, 0);
list_for_each_entry(tz, &thermal_tz_list, node) {
- tz_mode = THERMAL_DEVICE_ENABLED;
- if (tz->ops->get_mode)
- tz->ops->get_mode(tz, &tz_mode);
-
- if (tz_mode == THERMAL_DEVICE_DISABLED)
+ if (tz->mode == THERMAL_DEVICE_DISABLED)
continue;

thermal_zone_device_init(tz);
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index c495b1e48ef2..ba65d48a48cb 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -267,14 +267,6 @@ static int of_thermal_unbind(struct thermal_zone_device *thermal,
return 0;
}

-static int of_thermal_get_mode(struct thermal_zone_device *tz,
- enum thermal_device_mode *mode)
-{
- *mode = tz->mode;
-
- return 0;
-}
-
static int of_thermal_set_mode(struct thermal_zone_device *tz,
enum thermal_device_mode mode)
{
@@ -389,7 +381,6 @@ static int of_thermal_get_crit_temp(struct thermal_zone_device *tz,
}

static struct thermal_zone_device_ops of_thermal_ops = {
- .get_mode = of_thermal_get_mode,
.set_mode = of_thermal_set_mode,

.get_trip_type = of_thermal_get_trip_type,
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index aa99edb4dff7..096370977068 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -49,18 +49,9 @@ static ssize_t
mode_show(struct device *dev, struct device_attribute *attr, char *buf)
{
struct thermal_zone_device *tz = to_thermal_zone(dev);
- enum thermal_device_mode mode;
- int result;
-
- if (!tz->ops->get_mode)
- return -EPERM;

- result = tz->ops->get_mode(tz, &mode);
- if (result)
- return result;
-
- return sprintf(buf, "%s\n", mode == THERMAL_DEVICE_ENABLED ? "enabled"
- : "disabled");
+ return sprintf(buf, "%s\n", tz->mode == THERMAL_DEVICE_ENABLED ?
+ "enabled" : "disabled");
}

static ssize_t
@@ -428,30 +419,13 @@ static struct attribute_group thermal_zone_attribute_group = {
.attrs = thermal_zone_dev_attrs,
};

-/* We expose mode only if .get_mode is present */
static struct attribute *thermal_zone_mode_attrs[] = {
&dev_attr_mode.attr,
NULL,
};

-static umode_t thermal_zone_mode_is_visible(struct kobject *kobj,
- struct attribute *attr,
- int attrno)
-{
- struct device *dev = container_of(kobj, struct device, kobj);
- struct thermal_zone_device *tz;
-
- tz = container_of(dev, struct thermal_zone_device, device);
-
- if (tz->ops->get_mode)
- return attr->mode;
-
- return 0;
-}
-
static struct attribute_group thermal_zone_mode_attribute_group = {
.attrs = thermal_zone_mode_attrs,
- .is_visible = thermal_zone_mode_is_visible,
};

/* We expose passive only if passive trips are present */
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 5f91d7f04512..a808f6fa2777 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -76,8 +76,6 @@ struct thermal_zone_device_ops {
struct thermal_cooling_device *);
int (*get_temp) (struct thermal_zone_device *, int *);
int (*set_trips) (struct thermal_zone_device *, int, int);
- int (*get_mode) (struct thermal_zone_device *,
- enum thermal_device_mode *);
int (*set_mode) (struct thermal_zone_device *,
enum thermal_device_mode);
int (*get_trip_type) (struct thermal_zone_device *, int,
--
2.17.1

2020-06-29 21:17:07

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: [PATCH v7 09/11] thermal: core: Stop polling DISABLED thermal devices

Polling DISABLED devices is not desired, as all such "disabled" devices
are meant to be handled by userspace. This patch introduces and uses
should_stop_polling() to decide whether the device should be polled or not.

Signed-off-by: Andrzej Pietrasiewicz <[email protected]>
Reviewed-by: Bartlomiej Zolnierkiewicz <[email protected]>
---
drivers/thermal/thermal_core.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 52d136780577..e613f5c07bad 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -301,13 +301,22 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
cancel_delayed_work(&tz->poll_queue);
}

+static inline bool should_stop_polling(struct thermal_zone_device *tz)
+{
+ return !thermal_zone_device_is_enabled(tz);
+}
+
static void monitor_thermal_zone(struct thermal_zone_device *tz)
{
+ bool stop;
+
+ stop = should_stop_polling(tz);
+
mutex_lock(&tz->lock);

- if (tz->passive)
+ if (!stop && tz->passive)
thermal_zone_device_set_polling(tz, tz->passive_delay);
- else if (tz->polling_delay)
+ else if (!stop && tz->polling_delay)
thermal_zone_device_set_polling(tz, tz->polling_delay);
else
thermal_zone_device_set_polling(tz, 0);
@@ -517,6 +526,9 @@ void thermal_zone_device_update(struct thermal_zone_device *tz,
{
int count;

+ if (should_stop_polling(tz))
+ return;
+
if (atomic_read(&in_suspend))
return;

--
2.17.1

2020-06-29 21:48:17

by Amit Kucheria

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] thermal: core: Stop polling DISABLED thermal devices

On Mon, Jun 29, 2020 at 6:00 PM Andrzej Pietrasiewicz
<[email protected]> wrote:
>
> Polling DISABLED devices is not desired, as all such "disabled" devices
> are meant to be handled by userspace. This patch introduces and uses
> should_stop_polling() to decide whether the device should be polled or not.
>
> Signed-off-by: Andrzej Pietrasiewicz <[email protected]>
> Reviewed-by: Bartlomiej Zolnierkiewicz <[email protected]>

Reviewed-by: Amit Kucheria <[email protected]>

> ---
> drivers/thermal/thermal_core.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 52d136780577..e613f5c07bad 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -301,13 +301,22 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
> cancel_delayed_work(&tz->poll_queue);
> }
>
> +static inline bool should_stop_polling(struct thermal_zone_device *tz)
> +{
> + return !thermal_zone_device_is_enabled(tz);
> +}
> +
> static void monitor_thermal_zone(struct thermal_zone_device *tz)
> {
> + bool stop;
> +
> + stop = should_stop_polling(tz);
> +
> mutex_lock(&tz->lock);
>
> - if (tz->passive)
> + if (!stop && tz->passive)
> thermal_zone_device_set_polling(tz, tz->passive_delay);
> - else if (tz->polling_delay)
> + else if (!stop && tz->polling_delay)
> thermal_zone_device_set_polling(tz, tz->polling_delay);
> else
> thermal_zone_device_set_polling(tz, 0);
> @@ -517,6 +526,9 @@ void thermal_zone_device_update(struct thermal_zone_device *tz,
> {
> int count;
>
> + if (should_stop_polling(tz))
> + return;
> +
> if (atomic_read(&in_suspend))
> return;
>
> --
> 2.17.1
>

2020-06-30 05:11:40

by Amit Kucheria

[permalink] [raw]
Subject: Re: [PATCH v7 08/11] thermal: Explicitly enable non-changing thermal zone devices

On Mon, Jun 29, 2020 at 5:59 PM Andrzej Pietrasiewicz
<[email protected]> wrote:
>
> Some thermal zone devices never change their state, so they should be
> always enabled.
>
> Signed-off-by: Andrzej Pietrasiewicz <[email protected]>

Reviewed-by: Amit Kucheria <[email protected]>

> ---
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c | 8 ++++++++
> drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 9 ++++++++-
> drivers/platform/x86/intel_mid_thermal.c | 6 ++++++
> drivers/power/supply/power_supply_core.c | 9 +++++++--
> drivers/thermal/armada_thermal.c | 6 ++++++
> drivers/thermal/dove_thermal.c | 6 ++++++
> .../thermal/intel/int340x_thermal/int340x_thermal_zone.c | 5 +++++
> drivers/thermal/intel/intel_pch_thermal.c | 5 +++++
> drivers/thermal/intel/intel_soc_dts_iosf.c | 3 +++
> drivers/thermal/intel/x86_pkg_temp_thermal.c | 6 ++++++
> drivers/thermal/kirkwood_thermal.c | 7 +++++++
> drivers/thermal/rcar_thermal.c | 9 ++++++++-
> drivers/thermal/spear_thermal.c | 7 +++++++
> drivers/thermal/st/st_thermal.c | 5 +++++
> 14 files changed, 87 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c
> index 3de8a5e83b6c..e3510e9b21f3 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c
> @@ -92,6 +92,14 @@ int cxgb4_thermal_init(struct adapter *adap)
> ch_thermal->tzdev = NULL;
> return ret;
> }
> +
> + ret = thermal_zone_device_enable(ch_thermal->tzdev);
> + if (ret) {
> + dev_err(adap->pdev_dev, "Failed to enable thermal zone\n");
> + thermal_zone_device_unregister(adap->ch_thermal.tzdev);
> + return ret;
> + }
> +
> return 0;
> }
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> index 418e59b7c671..0c95663bf9ed 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> @@ -733,7 +733,7 @@ static struct thermal_zone_device_ops tzone_ops = {
>
> static void iwl_mvm_thermal_zone_register(struct iwl_mvm *mvm)
> {
> - int i;
> + int i, ret;
> char name[16];
> static atomic_t counter = ATOMIC_INIT(0);
>
> @@ -759,6 +759,13 @@ static void iwl_mvm_thermal_zone_register(struct iwl_mvm *mvm)
> return;
> }
>
> + ret = thermal_zone_device_enable(mvm->tz_device.tzone);
> + if (ret) {
> + IWL_DEBUG_TEMP(mvm, "Failed to enable thermal zone\n");
> + thermal_zone_device_unregister(mvm->tz_device.tzone);
> + return;
> + }
> +
> /* 0 is a valid temperature,
> * so initialize the array with S16_MIN which invalid temperature
> */
> diff --git a/drivers/platform/x86/intel_mid_thermal.c b/drivers/platform/x86/intel_mid_thermal.c
> index f402e2e74a38..f12f4e7bd971 100644
> --- a/drivers/platform/x86/intel_mid_thermal.c
> +++ b/drivers/platform/x86/intel_mid_thermal.c
> @@ -493,6 +493,12 @@ static int mid_thermal_probe(struct platform_device *pdev)
> ret = PTR_ERR(pinfo->tzd[i]);
> goto err;
> }
> + ret = thermal_zone_device_enable(pinfo->tzd[i]);
> + if (ret) {
> + kfree(td_info);
> + thermal_zone_device_unregister(pinfo->tzd[i]);
> + goto err;
> + }
> }
>
> pinfo->pdev = pdev;
> diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
> index 02b37fe6061c..90e56736d479 100644
> --- a/drivers/power/supply/power_supply_core.c
> +++ b/drivers/power/supply/power_supply_core.c
> @@ -939,7 +939,7 @@ static struct thermal_zone_device_ops psy_tzd_ops = {
>
> static int psy_register_thermal(struct power_supply *psy)
> {
> - int i;
> + int i, ret;
>
> if (psy->desc->no_thermal)
> return 0;
> @@ -949,7 +949,12 @@ static int psy_register_thermal(struct power_supply *psy)
> if (psy->desc->properties[i] == POWER_SUPPLY_PROP_TEMP) {
> psy->tzd = thermal_zone_device_register(psy->desc->name,
> 0, 0, psy, &psy_tzd_ops, NULL, 0, 0);
> - return PTR_ERR_OR_ZERO(psy->tzd);
> + if (IS_ERR(psy->tzd))
> + return PTR_ERR(psy->tzd);
> + ret = thermal_zone_device_enable(psy->tzd);
> + if (ret)
> + thermal_zone_device_unregister(psy->tzd);
> + return ret;
> }
> }
> return 0;
> diff --git a/drivers/thermal/armada_thermal.c b/drivers/thermal/armada_thermal.c
> index 7c447cd149e7..c2ebfb5be4b3 100644
> --- a/drivers/thermal/armada_thermal.c
> +++ b/drivers/thermal/armada_thermal.c
> @@ -874,6 +874,12 @@ static int armada_thermal_probe(struct platform_device *pdev)
> return PTR_ERR(tz);
> }
>
> + ret = thermal_zone_device_enable(tz);
> + if (ret) {
> + thermal_zone_device_unregister(tz);
> + return ret;
> + }
> +
> drvdata->type = LEGACY;
> drvdata->data.tz = tz;
> platform_set_drvdata(pdev, drvdata);
> diff --git a/drivers/thermal/dove_thermal.c b/drivers/thermal/dove_thermal.c
> index 75901ced4a62..73182eb94bc0 100644
> --- a/drivers/thermal/dove_thermal.c
> +++ b/drivers/thermal/dove_thermal.c
> @@ -153,6 +153,12 @@ static int dove_thermal_probe(struct platform_device *pdev)
> return PTR_ERR(thermal);
> }
>
> + ret = thermal_zone_device_enable(thermal);
> + if (ret) {
> + thermal_zone_device_unregister(thermal);
> + return ret;
> + }
> +
> platform_set_drvdata(pdev, thermal);
>
> return 0;
> diff --git a/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c b/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
> index 432213272f1e..6e479deff76b 100644
> --- a/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
> +++ b/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
> @@ -259,9 +259,14 @@ struct int34x_thermal_zone *int340x_thermal_zone_add(struct acpi_device *adev,
> ret = PTR_ERR(int34x_thermal_zone->zone);
> goto err_thermal_zone;
> }
> + ret = thermal_zone_device_enable(int34x_thermal_zone->zone);
> + if (ret)
> + goto err_enable;
>
> return int34x_thermal_zone;
>
> +err_enable:
> + thermal_zone_device_unregister(int34x_thermal_zone->zone);
> err_thermal_zone:
> acpi_lpat_free_conversion_table(int34x_thermal_zone->lpat_table);
> kfree(int34x_thermal_zone->aux_trips);
> diff --git a/drivers/thermal/intel/intel_pch_thermal.c b/drivers/thermal/intel/intel_pch_thermal.c
> index 56401fd4708d..65702094f3d3 100644
> --- a/drivers/thermal/intel/intel_pch_thermal.c
> +++ b/drivers/thermal/intel/intel_pch_thermal.c
> @@ -352,9 +352,14 @@ static int intel_pch_thermal_probe(struct pci_dev *pdev,
> err = PTR_ERR(ptd->tzd);
> goto error_cleanup;
> }
> + err = thermal_zone_device_enable(ptd->tzd);
> + if (err)
> + goto err_unregister;
>
> return 0;
>
> +err_unregister:
> + thermal_zone_device_unregister(ptd->tzd);
> error_cleanup:
> iounmap(ptd->hw_base);
> error_release:
> diff --git a/drivers/thermal/intel/intel_soc_dts_iosf.c b/drivers/thermal/intel/intel_soc_dts_iosf.c
> index f75271b669c6..4f1a2f7c016c 100644
> --- a/drivers/thermal/intel/intel_soc_dts_iosf.c
> +++ b/drivers/thermal/intel/intel_soc_dts_iosf.c
> @@ -329,6 +329,9 @@ static int add_dts_thermal_zone(int id, struct intel_soc_dts_sensor_entry *dts,
> ret = PTR_ERR(dts->tzone);
> goto err_ret;
> }
> + ret = thermal_zone_device_enable(dts->tzone);
> + if (ret)
> + goto err_enable;
>
> ret = soc_dts_enable(id);
> if (ret)
> diff --git a/drivers/thermal/intel/x86_pkg_temp_thermal.c b/drivers/thermal/intel/x86_pkg_temp_thermal.c
> index a006b9fd1d72..b81c33202f41 100644
> --- a/drivers/thermal/intel/x86_pkg_temp_thermal.c
> +++ b/drivers/thermal/intel/x86_pkg_temp_thermal.c
> @@ -363,6 +363,12 @@ static int pkg_temp_thermal_device_add(unsigned int cpu)
> kfree(zonedev);
> return err;
> }
> + err = thermal_zone_device_enable(zonedev->tzone);
> + if (err) {
> + thermal_zone_device_unregister(zonedev->tzone);
> + kfree(zonedev);
> + return err;
> + }
> /* Store MSR value for package thermal interrupt, to restore at exit */
> rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, zonedev->msr_pkg_therm_low,
> zonedev->msr_pkg_therm_high);
> diff --git a/drivers/thermal/kirkwood_thermal.c b/drivers/thermal/kirkwood_thermal.c
> index 189b675cf14d..7fb6e476c82a 100644
> --- a/drivers/thermal/kirkwood_thermal.c
> +++ b/drivers/thermal/kirkwood_thermal.c
> @@ -65,6 +65,7 @@ static int kirkwood_thermal_probe(struct platform_device *pdev)
> struct thermal_zone_device *thermal = NULL;
> struct kirkwood_thermal_priv *priv;
> struct resource *res;
> + int ret;
>
> priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
> if (!priv)
> @@ -82,6 +83,12 @@ static int kirkwood_thermal_probe(struct platform_device *pdev)
> "Failed to register thermal zone device\n");
> return PTR_ERR(thermal);
> }
> + ret = thermal_zone_device_enable(thermal);
> + if (ret) {
> + thermal_zone_device_unregister(thermal);
> + dev_err(&pdev->dev, "Failed to enable thermal zone device\n");
> + return ret;
> + }
>
> platform_set_drvdata(pdev, thermal);
>
> diff --git a/drivers/thermal/rcar_thermal.c b/drivers/thermal/rcar_thermal.c
> index 46aeb28b4e90..787710bb88fe 100644
> --- a/drivers/thermal/rcar_thermal.c
> +++ b/drivers/thermal/rcar_thermal.c
> @@ -550,12 +550,19 @@ static int rcar_thermal_probe(struct platform_device *pdev)
> priv->zone = devm_thermal_zone_of_sensor_register(
> dev, i, priv,
> &rcar_thermal_zone_of_ops);
> - else
> + else {
> priv->zone = thermal_zone_device_register(
> "rcar_thermal",
> 1, 0, priv,
> &rcar_thermal_zone_ops, NULL, 0,
> idle);
> +
> + ret = thermal_zone_device_enable(priv->zone);
> + if (ret) {
> + thermal_zone_device_unregister(priv->zone);
> + priv->zone = ERR_PTR(ret);
> + }
> + }
> if (IS_ERR(priv->zone)) {
> dev_err(dev, "can't register thermal zone\n");
> ret = PTR_ERR(priv->zone);
> diff --git a/drivers/thermal/spear_thermal.c b/drivers/thermal/spear_thermal.c
> index f68f581fd669..ee33ed692e4f 100644
> --- a/drivers/thermal/spear_thermal.c
> +++ b/drivers/thermal/spear_thermal.c
> @@ -131,6 +131,11 @@ static int spear_thermal_probe(struct platform_device *pdev)
> ret = PTR_ERR(spear_thermal);
> goto disable_clk;
> }
> + ret = thermal_zone_device_enable(spear_thermal);
> + if (ret) {
> + dev_err(&pdev->dev, "Cannot enable thermal zone\n");
> + goto unregister_tzd;
> + }
>
> platform_set_drvdata(pdev, spear_thermal);
>
> @@ -139,6 +144,8 @@ static int spear_thermal_probe(struct platform_device *pdev)
>
> return 0;
>
> +unregister_tzd:
> + thermal_zone_device_unregister(spear_thermal);
> disable_clk:
> clk_disable(stdev->clk);
>
> diff --git a/drivers/thermal/st/st_thermal.c b/drivers/thermal/st/st_thermal.c
> index b928ca6a289b..1276b95604fe 100644
> --- a/drivers/thermal/st/st_thermal.c
> +++ b/drivers/thermal/st/st_thermal.c
> @@ -246,11 +246,16 @@ int st_thermal_register(struct platform_device *pdev,
> ret = PTR_ERR(sensor->thermal_dev);
> goto sensor_off;
> }
> + ret = thermal_zone_device_enable(sensor->thermal_dev);
> + if (ret)
> + goto tzd_unregister;
>
> platform_set_drvdata(pdev, sensor);
>
> return 0;
>
> +tzd_unregister:
> + thermal_zone_device_unregister(sensor->thermal_dev);
> sensor_off:
> st_thermal_sensor_off(sensor);
>
> --
> 2.17.1
>

2020-06-30 12:59:42

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices


Hi Andrzej,

I've tested your series with kernelci and there are 3 regressions for
the imx6.

https://kernelci.org/test/job/thermal/branch/testing/kernel/v5.8-rc3-11-gf5e50bf4d3ef/plan/baseline/


On 29/06/2020 14:29, Andrzej Pietrasiewicz wrote:
> A respin of v6 with these changes:
>
> v6..v7:
> - removed duplicate S-o-b line from patch 6/11
>
> v5..v6:
> - staticized thermal_zone_device_set_mode() (kbuild test robot)
>
> v4..v5:
>
> - EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL (Daniel)
> - dropped unnecessary thermal_zone_device_enable() in int3400_thermal.c
> and in thermal_of.c (Bartlomiej)
>
> Andrzej Pietrasiewicz (11):
> acpi: thermal: Fix error handling in the register function
> thermal: Store thermal mode in a dedicated enum
> thermal: Add current mode to thermal zone device
> thermal: Store device mode in struct thermal_zone_device
> thermal: remove get_mode() operation of drivers
> thermal: Add mode helpers
> thermal: Use mode helpers in drivers
> thermal: Explicitly enable non-changing thermal zone devices
> thermal: core: Stop polling DISABLED thermal devices
> thermal: Simplify or eliminate unnecessary set_mode() methods
> thermal: Rename set_mode() to change_mode()
>
> drivers/acpi/thermal.c | 75 +++++----------
> .../ethernet/chelsio/cxgb4/cxgb4_thermal.c | 8 ++
> .../ethernet/mellanox/mlxsw/core_thermal.c | 91 ++++---------------
> drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 9 +-
> drivers/platform/x86/acerhdf.c | 33 +++----
> drivers/platform/x86/intel_mid_thermal.c | 6 ++
> drivers/power/supply/power_supply_core.c | 9 +-
> drivers/thermal/armada_thermal.c | 6 ++
> drivers/thermal/da9062-thermal.c | 16 +---
> drivers/thermal/dove_thermal.c | 6 ++
> drivers/thermal/hisi_thermal.c | 6 +-
> drivers/thermal/imx_thermal.c | 57 ++++--------
> .../intel/int340x_thermal/int3400_thermal.c | 38 ++------
> .../int340x_thermal/int340x_thermal_zone.c | 5 +
> drivers/thermal/intel/intel_pch_thermal.c | 5 +
> .../thermal/intel/intel_quark_dts_thermal.c | 34 ++-----
> drivers/thermal/intel/intel_soc_dts_iosf.c | 3 +
> drivers/thermal/intel/x86_pkg_temp_thermal.c | 6 ++
> drivers/thermal/kirkwood_thermal.c | 7 ++
> drivers/thermal/rcar_thermal.c | 9 +-
> drivers/thermal/rockchip_thermal.c | 6 +-
> drivers/thermal/spear_thermal.c | 7 ++
> drivers/thermal/sprd_thermal.c | 6 +-
> drivers/thermal/st/st_thermal.c | 5 +
> drivers/thermal/thermal_core.c | 76 ++++++++++++++--
> drivers/thermal/thermal_of.c | 41 +--------
> drivers/thermal/thermal_sysfs.c | 37 +-------
> include/linux/thermal.h | 19 +++-
> 28 files changed, 275 insertions(+), 351 deletions(-)
>
>
> base-commit: 9ebcfadb0610322ac537dd7aa5d9cbc2b2894c68
>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-06-30 13:54:10

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

Hi Daniel,

I am reading the logs and can't find anything specific to thermal.

What I can see is

"random: crng init done"

with large times (~200s) and then e.g.

'auto-login-action timed out after 283 seconds'

I'm looking at e.g.
https://storage.kernelci.org/thermal/testing/v5.8-rc3-11-gf5e50bf4d3ef/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-imx6q-sabrelite.html

Is there anywhere else I can look at?

Andrzej

W dniu 30.06.2020 o 14:57, Daniel Lezcano pisze:
>
> Hi Andrzej,
>
> I've tested your series with kernelci and there are 3 regressions for
> the imx6.
>
> https://kernelci.org/test/job/thermal/branch/testing/kernel/v5.8-rc3-11-gf5e50bf4d3ef/plan/baseline/
>
>
> On 29/06/2020 14:29, Andrzej Pietrasiewicz wrote:
>> A respin of v6 with these changes:
>>
>> v6..v7:
>> - removed duplicate S-o-b line from patch 6/11
>>
>> v5..v6:
>> - staticized thermal_zone_device_set_mode() (kbuild test robot)
>>
>> v4..v5:
>>
>> - EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL (Daniel)
>> - dropped unnecessary thermal_zone_device_enable() in int3400_thermal.c
>> and in thermal_of.c (Bartlomiej)
>>
>> Andrzej Pietrasiewicz (11):
>> acpi: thermal: Fix error handling in the register function
>> thermal: Store thermal mode in a dedicated enum
>> thermal: Add current mode to thermal zone device
>> thermal: Store device mode in struct thermal_zone_device
>> thermal: remove get_mode() operation of drivers
>> thermal: Add mode helpers
>> thermal: Use mode helpers in drivers
>> thermal: Explicitly enable non-changing thermal zone devices
>> thermal: core: Stop polling DISABLED thermal devices
>> thermal: Simplify or eliminate unnecessary set_mode() methods
>> thermal: Rename set_mode() to change_mode()
>>
>> drivers/acpi/thermal.c | 75 +++++----------
>> .../ethernet/chelsio/cxgb4/cxgb4_thermal.c | 8 ++
>> .../ethernet/mellanox/mlxsw/core_thermal.c | 91 ++++---------------
>> drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 9 +-
>> drivers/platform/x86/acerhdf.c | 33 +++----
>> drivers/platform/x86/intel_mid_thermal.c | 6 ++
>> drivers/power/supply/power_supply_core.c | 9 +-
>> drivers/thermal/armada_thermal.c | 6 ++
>> drivers/thermal/da9062-thermal.c | 16 +---
>> drivers/thermal/dove_thermal.c | 6 ++
>> drivers/thermal/hisi_thermal.c | 6 +-
>> drivers/thermal/imx_thermal.c | 57 ++++--------
>> .../intel/int340x_thermal/int3400_thermal.c | 38 ++------
>> .../int340x_thermal/int340x_thermal_zone.c | 5 +
>> drivers/thermal/intel/intel_pch_thermal.c | 5 +
>> .../thermal/intel/intel_quark_dts_thermal.c | 34 ++-----
>> drivers/thermal/intel/intel_soc_dts_iosf.c | 3 +
>> drivers/thermal/intel/x86_pkg_temp_thermal.c | 6 ++
>> drivers/thermal/kirkwood_thermal.c | 7 ++
>> drivers/thermal/rcar_thermal.c | 9 +-
>> drivers/thermal/rockchip_thermal.c | 6 +-
>> drivers/thermal/spear_thermal.c | 7 ++
>> drivers/thermal/sprd_thermal.c | 6 +-
>> drivers/thermal/st/st_thermal.c | 5 +
>> drivers/thermal/thermal_core.c | 76 ++++++++++++++--
>> drivers/thermal/thermal_of.c | 41 +--------
>> drivers/thermal/thermal_sysfs.c | 37 +-------
>> include/linux/thermal.h | 19 +++-
>> 28 files changed, 275 insertions(+), 351 deletions(-)
>>
>>
>> base-commit: 9ebcfadb0610322ac537dd7aa5d9cbc2b2894c68
>>
>
>

2020-06-30 14:55:10

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 30/06/2020 15:43, Andrzej Pietrasiewicz wrote:
> Hi Daniel,
>
> I am reading the logs and can't find anything specific to thermal.
>
> What I can see is
>
> "random: crng init done"
>
> with large times (~200s) and then e.g.
>
> 'auto-login-action timed out after 283 seconds'
>
> I'm looking at e.g.
> https://storage.kernelci.org/thermal/testing/v5.8-rc3-11-gf5e50bf4d3ef/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-imx6q-sabrelite.html
>
>
> Is there anywhere else I can look at?

No unfortunately :/

I have a Meerkat96 which uses the same sensor as the imx6q.

I'll give a try to see if I can reproduce and let you know.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-06-30 15:30:35

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

Hi Daniel,

W dniu 30.06.2020 o 16:53, Daniel Lezcano pisze:
> On 30/06/2020 15:43, Andrzej Pietrasiewicz wrote:
>> Hi Daniel,
>>
>> I am reading the logs and can't find anything specific to thermal.
>>
>> What I can see is
>>
>> "random: crng init done"
>>
>> with large times (~200s) and then e.g.
>>
>> 'auto-login-action timed out after 283 seconds'
>>
>> I'm looking at e.g.
>> https://storage.kernelci.org/thermal/testing/v5.8-rc3-11-gf5e50bf4d3ef/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-imx6q-sabrelite.html
>>

f5e50bf4d3ef is PATCH 11/11. Does the problem happen at PATCH 1-10/11?
PATCH 11/11 renames a method and the code compiles, so it seems
unlikely that this is causing problems. One should never say never,
though ;)

The reported failure is not due to some test failing but rather due
to timeout logging into the test system. Could it be that there is
some other problem?

Andrzej

2020-06-30 15:55:05

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 30/06/2020 17:29, Andrzej Pietrasiewicz wrote:
> Hi Daniel,
>
> W dniu 30.06.2020 o 16:53, Daniel Lezcano pisze:
>> On 30/06/2020 15:43, Andrzej Pietrasiewicz wrote:
>>> Hi Daniel,
>>>
>>> I am reading the logs and can't find anything specific to thermal.
>>>
>>> What I can see is
>>>
>>> "random: crng init done"
>>>
>>> with large times (~200s) and then e.g.
>>>
>>> 'auto-login-action timed out after 283 seconds'
>>>
>>> I'm looking at e.g.
>>> https://storage.kernelci.org/thermal/testing/v5.8-rc3-11-gf5e50bf4d3ef/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-imx6q-sabrelite.html
>>>
>>>
>
> f5e50bf4d3ef is PATCH 11/11. Does the problem happen at PATCH 1-10/11?
> PATCH 11/11 renames a method and the code compiles, so it seems
> unlikely that this is causing problems. One should never say never,
> though ;)

The sha1 is just the HEAD for the kernel reference. The regression
happens with your series, somewhere.

> The reported failure is not due to some test failing but rather due
> to timeout logging into the test system. Could it be that there is
> some other problem?

I did reproduce:

v5.8-rc3 + series => imx6 hang at boot time
v5.8-rc3 => imx6 boots correctly


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-06-30 18:43:22

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

Hi,

W dniu 30.06.2020 o 17:53, Daniel Lezcano pisze:
> On 30/06/2020 17:29, Andrzej Pietrasiewicz wrote:
>> Hi Daniel,
>>
>> W dniu 30.06.2020 o 16:53, Daniel Lezcano pisze:
>>> On 30/06/2020 15:43, Andrzej Pietrasiewicz wrote:
>>>> Hi Daniel,
>>>>
>>>> I am reading the logs and can't find anything specific to thermal.
>>>>
>>>> What I can see is
>>>>
>>>> "random: crng init done"
>>>>
>>>> with large times (~200s) and then e.g.
>>>>
>>>> 'auto-login-action timed out after 283 seconds'
>>>>
>>>> I'm looking at e.g.
>>>> https://storage.kernelci.org/thermal/testing/v5.8-rc3-11-gf5e50bf4d3ef/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-imx6q-sabrelite.html
>>>>
>>>>
>>
>> f5e50bf4d3ef is PATCH 11/11. Does the problem happen at PATCH 1-10/11?
>> PATCH 11/11 renames a method and the code compiles, so it seems
>> unlikely that this is causing problems. One should never say never,
>> though ;)
>
> The sha1 is just the HEAD for the kernel reference. The regression
> happens with your series, somewhere.
>
>> The reported failure is not due to some test failing but rather due
>> to timeout logging into the test system. Could it be that there is
>> some other problem?
>
> I did reproduce:
>
> v5.8-rc3 + series => imx6 hang at boot time
> v5.8-rc3 => imx6 boots correctly
>

I kindly ask for a bisect.

Andrzej

2020-06-30 20:52:13

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 30/06/2020 18:56, Andrzej Pietrasiewicz wrote:
> Hi,
>
> W dniu 30.06.2020 o 17:53, Daniel Lezcano pisze:
>> On 30/06/2020 17:29, Andrzej Pietrasiewicz wrote:
>>> Hi Daniel,
>>>
>>> W dniu 30.06.2020 o 16:53, Daniel Lezcano pisze:
>>>> On 30/06/2020 15:43, Andrzej Pietrasiewicz wrote:
>>>>> Hi Daniel,
>>>>>
>>>>> I am reading the logs and can't find anything specific to thermal.
>>>>>
>>>>> What I can see is
>>>>>
>>>>> "random: crng init done"
>>>>>
>>>>> with large times (~200s) and then e.g.
>>>>>
>>>>> 'auto-login-action timed out after 283 seconds'
>>>>>
>>>>> I'm looking at e.g.
>>>>> https://storage.kernelci.org/thermal/testing/v5.8-rc3-11-gf5e50bf4d3ef/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-imx6q-sabrelite.html
>>>>>
>>>>>
>>>>>
>>>
>>> f5e50bf4d3ef is PATCH 11/11. Does the problem happen at PATCH 1-10/11?
>>> PATCH 11/11 renames a method and the code compiles, so it seems
>>> unlikely that this is causing problems. One should never say never,
>>> though ;)
>>
>> The sha1 is just the HEAD for the kernel reference. The regression
>> happens with your series, somewhere.
>>
>>> The reported failure is not due to some test failing but rather due
>>> to timeout logging into the test system. Could it be that there is
>>> some other problem?
>>
>> I did reproduce:
>>
>> v5.8-rc3 + series => imx6 hang at boot time
>> v5.8-rc3 => imx6 boots correctly
>>
>
> I kindly ask for a bisect.

I will give a try but it is a very long process as the board is running
on kernelci.

I was not able to reproduce it on imx7 despite it is the same sensor :/


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-07-01 10:24:21

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

Hi,

W dniu 30.06.2020 o 20:33, Daniel Lezcano pisze:
> On 30/06/2020 18:56, Andrzej Pietrasiewicz wrote:
>> Hi,
>>
>> W dniu 30.06.2020 o 17:53, Daniel Lezcano pisze:
>>> On 30/06/2020 17:29, Andrzej Pietrasiewicz wrote:
>>>> Hi Daniel,
>>>>
>>>> W dniu 30.06.2020 o 16:53, Daniel Lezcano pisze:
>>>>> On 30/06/2020 15:43, Andrzej Pietrasiewicz wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> I am reading the logs and can't find anything specific to thermal.
>>>>>>
>>>>>> What I can see is
>>>>>>
>>>>>> "random: crng init done"
>>>>>>
>>>>>> with large times (~200s) and then e.g.
>>>>>>
>>>>>> 'auto-login-action timed out after 283 seconds'
>>>>>>
>>>>>> I'm looking at e.g.
>>>>>> https://storage.kernelci.org/thermal/testing/v5.8-rc3-11-gf5e50bf4d3ef/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-imx6q-sabrelite.html
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>> f5e50bf4d3ef is PATCH 11/11. Does the problem happen at PATCH 1-10/11?
>>>> PATCH 11/11 renames a method and the code compiles, so it seems
>>>> unlikely that this is causing problems. One should never say never,
>>>> though ;)
>>>
>>> The sha1 is just the HEAD for the kernel reference. The regression
>>> happens with your series, somewhere.
>>>
>>>> The reported failure is not due to some test failing but rather due
>>>> to timeout logging into the test system. Could it be that there is
>>>> some other problem?
>>>
>>> I did reproduce:
>>>
>>> v5.8-rc3 + series => imx6 hang at boot time
>>> v5.8-rc3 => imx6 boots correctly
>>>

What did you reproduce? Timeout logging in to the test system or a "real"
failure of a test?

>>
>> I kindly ask for a bisect.
>
> I will give a try but it is a very long process as the board is running
> on kernelci.
>
> I was not able to reproduce it on imx7 despite it is the same sensor :/
>
>

Could it be that the thermal sensors somehow contribute to entropy and after
the series is applied on some machines it takes more time to gather enough
entropy?

Andrzej

2020-07-01 14:27:09

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 01/07/2020 12:23, Andrzej Pietrasiewicz wrote:
> Hi,
>

[ ... ]

>>>>
>>>> I did reproduce:
>>>>
>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>> v5.8-rc3 => imx6 boots correctly
>>>>
>
> What did you reproduce? Timeout logging in to the test system or a
> "real" failure of a test?

Timeout logging. Boot hangs.

>>> I kindly ask for a bisect.
>>
>> I will give a try but it is a very long process as the board is running
>> on kernelci.
>>
>> I was not able to reproduce it on imx7 despite it is the same sensor :/
>>
>>
>
> Could it be that the thermal sensors somehow contribute to entropy and
> after
> the series is applied on some machines it takes more time to gather enough
> entropy?

I assume you are talking about the entropy for random?

It would be really surprising if it is the case. The message appears
asynchronously, I believe the boot flow is stuck in a mutex.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-07-02 13:50:26

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 01/07/2020 12:23, Andrzej Pietrasiewicz wrote:
> Hi,
>
> W dniu 30.06.2020 o 20:33, Daniel Lezcano pisze:
>> On 30/06/2020 18:56, Andrzej Pietrasiewicz wrote:
>>> Hi,
>>>
>>> W dniu 30.06.2020 o 17:53, Daniel Lezcano pisze:
>>>> On 30/06/2020 17:29, Andrzej Pietrasiewicz wrote:
>>>>> Hi Daniel,
>>>>>
>>>>> W dniu 30.06.2020 o 16:53, Daniel Lezcano pisze:
>>>>>> On 30/06/2020 15:43, Andrzej Pietrasiewicz wrote:
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> I am reading the logs and can't find anything specific to thermal.
>>>>>>>
>>>>>>> What I can see is
>>>>>>>
>>>>>>> "random: crng init done"
>>>>>>>
>>>>>>> with large times (~200s) and then e.g.
>>>>>>>
>>>>>>> 'auto-login-action timed out after 283 seconds'
>>>>>>>
>>>>>>> I'm looking at e.g.
>>>>>>> https://storage.kernelci.org/thermal/testing/v5.8-rc3-11-gf5e50bf4d3ef/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-imx6q-sabrelite.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>> f5e50bf4d3ef is PATCH 11/11. Does the problem happen at PATCH 1-10/11?
>>>>> PATCH 11/11 renames a method and the code compiles, so it seems
>>>>> unlikely that this is causing problems. One should never say never,
>>>>> though ;)
>>>>
>>>> The sha1 is just the HEAD for the kernel reference. The regression
>>>> happens with your series, somewhere.
>>>>
>>>>> The reported failure is not due to some test failing but rather due
>>>>> to timeout logging into the test system. Could it be that there is
>>>>> some other problem?
>>>>
>>>> I did reproduce:
>>>>
>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>> v5.8-rc3 => imx6 boots correctly

So finally I succeeded to reproduce it on my imx7 locally. The sensor
was failing to initialize for another reason related to the legacy
cooling device, this is why it is not appearing on the imx7.

I can now git-bisect :)



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-07-02 13:54:37

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

Hi Daniel,

<snip>

>>>>>
>>>>> I did reproduce:
>>>>>
>>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>>> v5.8-rc3 => imx6 boots correctly
>
> So finally I succeeded to reproduce it on my imx7 locally. The sensor
> was failing to initialize for another reason related to the legacy
> cooling device, this is why it is not appearing on the imx7.
>
> I can now git-bisect :)
>

That would be very kind of you, thank you!

Andrzej

2020-07-02 14:59:12

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
> Hi Daniel,
>
> <snip>
>
>>>>>>
>>>>>> I did reproduce:
>>>>>>
>>>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>>>> v5.8-rc3 => imx6 boots correctly
>>
>> So finally I succeeded to reproduce it on my imx7 locally. The sensor
>> was failing to initialize for another reason related to the legacy
>> cooling device, this is why it is not appearing on the imx7.
>>
>> I can now git-bisect :)
>>
>
> That would be very kind of you, thank you!

Author: Andrzej Pietrasiewicz <[email protected]>
Date: Mon Jun 29 14:29:21 2020 +0200

thermal: Use mode helpers in drivers

Use thermal_zone_device_{en|dis}able() and
thermal_zone_device_is_enabled().



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-07-02 17:02:10

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
> Hi Daniel,
>
> <snip>
>
>>>>>>
>>>>>> I did reproduce:
>>>>>>
>>>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>>>> v5.8-rc3 => imx6 boots correctly
>>
>> So finally I succeeded to reproduce it on my imx7 locally. The sensor
>> was failing to initialize for another reason related to the legacy
>> cooling device, this is why it is not appearing on the imx7.
>>
>> I can now git-bisect :)
>>
>
> That would be very kind of you, thank you!

With the lock correctness option enabled:

[ 4.179223] imx_thermal tempmon: Extended Commercial CPU temperature
grade - max:105C critical:100C passive:95C
[ 4.189557]
[ 4.191060] ============================================
[ 4.196378] WARNING: possible recursive locking detected
[ 4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
[ 4.207102] --------------------------------------------
[ 4.212421] kworker/0:3/54 is trying to acquire lock:
[ 4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
thermal_zone_device_is_enabled+0x18/0x34
[ 4.225777]
[ 4.225777] but task is already holding lock:
[ 4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
thermal_zone_get_temp+0x38/0x6c
[ 4.239121]
[ 4.239121] other info that might help us debug this:
[ 4.245655] Possible unsafe locking scenario:
[ 4.245655]
[ 4.251579] CPU0
[ 4.254031] ----
[ 4.256481] lock(&tz->lock);
[ 4.259544] lock(&tz->lock);
[ 4.262608]
[ 4.262608] *** DEADLOCK ***
[ 4.262608]
[ 4.268533] May be due to missing lock nesting notation
[ 4.268533]
[ 4.275329] 4 locks held by kworker/0:3/54:
[ 4.279517] #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, at:
process_one_work+0x224/0x808
[ 4.288241] #1: ca075f10 (deferred_probe_work){+.+.}-{0:0}, at:
process_one_work+0x224/0x808
[ 4.296787] #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
__device_attach+0x30/0x140
[ 4.304468] #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
thermal_zone_get_temp+0x38/0x6c
[ 4.312408]
[ 4.312408] stack backtrace:
[ 4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
5.8.0-rc3-00011-gf5e50bf4d3ef #42
[ 4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
[ 4.330809] Workqueue: events deferred_probe_work_func
[ 4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
(show_stack+0x10/0x14)
[ 4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
(dump_stack+0xe8/0x114)
[ 4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
(__lock_acquire+0xbfc/0x2cb4)
[ 4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
(lock_acquire+0xf4/0x4e4)
[ 4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
(__mutex_lock+0xb0/0xaa8)
[ 4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
(mutex_lock_nested+0x1c/0x24)
[ 4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
(thermal_zone_device_is_enabled+0x18/0x34)
[ 4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
[<c0d9da90>] (imx_get_temp+0x30/0x208)
[ 4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
(thermal_zone_get_temp+0x4c/0x6c)
[ 4.409640] [<c0d97484>] (thermal_zone_get_temp) from [<c0d93df0>]
(thermal_zone_device_update+0x8c/0x258)
[ 4.419310] [<c0d93df0>] (thermal_zone_device_update) from
[<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
[ 4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
[<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
[ 4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
(platform_drv_probe+0x48/0x98)
[ 4.447622] [<c0a78388>] (platform_drv_probe) from [<c0a7603c>]
(really_probe+0x218/0x348)
[ 4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
(driver_probe_device+0x5c/0xb4)
[ 4.464098] [<c0a76278>] (driver_probe_device) from [<c0a743bc>]
(bus_for_each_drv+0x58/0xb8)
[ 4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
(__device_attach+0xd4/0x140)
[ 4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
(bus_probe_device+0x88/0x90)
[ 4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
(deferred_probe_work_func+0x68/0x98)
[ 4.498088] [<c0a75564>] (deferred_probe_work_func) from [<c0369988>]
(process_one_work+0x2e0/0x808)
[ 4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
(worker_thread+0x2a0/0x59c)
[ 4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
(kthread+0x16c/0x178)
[ 4.522843] [<c0372208>] (kthread) from [<c0300174>]
(ret_from_fork+0x14/0x20)
[ 4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
[ 4.535138] 5fa0: 00000000
00000000 00000000 00000000
[ 4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[ 4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-07-02 17:20:37

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

Hi,

W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze:
> On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
>> Hi Daniel,
>>
>> <snip>
>>
>>>>>>>
>>>>>>> I did reproduce:
>>>>>>>
>>>>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>>>>> v5.8-rc3 => imx6 boots correctly
>>>
>>> So finally I succeeded to reproduce it on my imx7 locally. The sensor
>>> was failing to initialize for another reason related to the legacy
>>> cooling device, this is why it is not appearing on the imx7.
>>>
>>> I can now git-bisect :)
>>>
>>
>> That would be very kind of you, thank you!
>
> With the lock correctness option enabled:
>
> [ 4.179223] imx_thermal tempmon: Extended Commercial CPU temperature
> grade - max:105C critical:100C passive:95C
> [ 4.189557]
> [ 4.191060] ============================================
> [ 4.196378] WARNING: possible recursive locking detected
> [ 4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
> [ 4.207102] --------------------------------------------
> [ 4.212421] kworker/0:3/54 is trying to acquire lock:
> [ 4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> thermal_zone_device_is_enabled+0x18/0x34
> [ 4.225777]
> [ 4.225777] but task is already holding lock:
> [ 4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> thermal_zone_get_temp+0x38/0x6c
> [ 4.239121]
> [ 4.239121] other info that might help us debug this:
> [ 4.245655] Possible unsafe locking scenario:
> [ 4.245655]
> [ 4.251579] CPU0
> [ 4.254031] ----
> [ 4.256481] lock(&tz->lock);
> [ 4.259544] lock(&tz->lock);
> [ 4.262608]
> [ 4.262608] *** DEADLOCK ***
> [ 4.262608]
> [ 4.268533] May be due to missing lock nesting notation
> [ 4.268533]
> [ 4.275329] 4 locks held by kworker/0:3/54:
> [ 4.279517] #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, at:
> process_one_work+0x224/0x808
> [ 4.288241] #1: ca075f10 (deferred_probe_work){+.+.}-{0:0}, at:
> process_one_work+0x224/0x808
> [ 4.296787] #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
> __device_attach+0x30/0x140
> [ 4.304468] #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> thermal_zone_get_temp+0x38/0x6c
> [ 4.312408]
> [ 4.312408] stack backtrace:
> [ 4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
> 5.8.0-rc3-00011-gf5e50bf4d3ef #42
> [ 4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
> [ 4.330809] Workqueue: events deferred_probe_work_func
> [ 4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
> (show_stack+0x10/0x14)
> [ 4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
> (dump_stack+0xe8/0x114)
> [ 4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
> (__lock_acquire+0xbfc/0x2cb4)
> [ 4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
> (lock_acquire+0xf4/0x4e4)
> [ 4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
> (__mutex_lock+0xb0/0xaa8)
> [ 4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
> (mutex_lock_nested+0x1c/0x24)
> [ 4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
> (thermal_zone_device_is_enabled+0x18/0x34)
> [ 4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
> [<c0d9da90>] (imx_get_temp+0x30/0x208)
> [ 4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
> (thermal_zone_get_temp+0x4c/0x6c)
> [ 4.409640] [<c0d97484>] (thermal_zone_get_temp) from [<c0d93df0>]
> (thermal_zone_device_update+0x8c/0x258)
> [ 4.419310] [<c0d93df0>] (thermal_zone_device_update) from
> [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
> [ 4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
> [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
> [ 4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
> (platform_drv_probe+0x48/0x98)
> [ 4.447622] [<c0a78388>] (platform_drv_probe) from [<c0a7603c>]
> (really_probe+0x218/0x348)
> [ 4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
> (driver_probe_device+0x5c/0xb4)
> [ 4.464098] [<c0a76278>] (driver_probe_device) from [<c0a743bc>]
> (bus_for_each_drv+0x58/0xb8)
> [ 4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
> (__device_attach+0xd4/0x140)
> [ 4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
> (bus_probe_device+0x88/0x90)
> [ 4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
> (deferred_probe_work_func+0x68/0x98)
> [ 4.498088] [<c0a75564>] (deferred_probe_work_func) from [<c0369988>]
> (process_one_work+0x2e0/0x808)
> [ 4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
> (worker_thread+0x2a0/0x59c)
> [ 4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
> (kthread+0x16c/0x178)
> [ 4.522843] [<c0372208>] (kthread) from [<c0300174>]
> (ret_from_fork+0x14/0x20)
> [ 4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
> [ 4.535138] 5fa0: 00000000
> 00000000 00000000 00000000
> [ 4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> [ 4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>
>
>

Thanks!

That confirms your suspicions.

So the reason is that ->get_temp() is called while the mutex is held and
thermal_zone_device_is_enabled() wants to take the same mutex.

Is adding a comment to thermal_zone_device_is_enabled() to never call
it while the mutex is held and adding another version of it which does
not take the mutex ok?

Andrzej

2020-07-02 17:51:08

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote:
> Hi,
>
> W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze:
>> On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
>>> Hi Daniel,
>>>
>>> <snip>
>>>
>>>>>>>>
>>>>>>>> I did reproduce:
>>>>>>>>
>>>>>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>>>>>> v5.8-rc3 => imx6 boots correctly
>>>>
>>>> So finally I succeeded to reproduce it on my imx7 locally. The sensor
>>>> was failing to initialize for another reason related to the legacy
>>>> cooling device, this is why it is not appearing on the imx7.
>>>>
>>>> I can now git-bisect :)
>>>>
>>>
>>> That would be very kind of you, thank you!
>>
>> With the lock correctness option enabled:
>>
>> [    4.179223] imx_thermal tempmon: Extended Commercial CPU temperature
>> grade - max:105C critical:100C passive:95C
>> [    4.189557]
>> [    4.191060] ============================================
>> [    4.196378] WARNING: possible recursive locking detected
>> [    4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
>> [    4.207102] --------------------------------------------
>> [    4.212421] kworker/0:3/54 is trying to acquire lock:
>> [    4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
>> thermal_zone_device_is_enabled+0x18/0x34
>> [    4.225777]
>> [    4.225777] but task is already holding lock:
>> [    4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
>> thermal_zone_get_temp+0x38/0x6c
>> [    4.239121]
>> [    4.239121] other info that might help us debug this:
>> [    4.245655]  Possible unsafe locking scenario:
>> [    4.245655]
>> [    4.251579]        CPU0
>> [    4.254031]        ----
>> [    4.256481]   lock(&tz->lock);
>> [    4.259544]   lock(&tz->lock);
>> [    4.262608]
>> [    4.262608]  *** DEADLOCK ***
>> [    4.262608]
>> [    4.268533]  May be due to missing lock nesting notation
>> [    4.268533]
>> [    4.275329] 4 locks held by kworker/0:3/54:
>> [    4.279517]  #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, at:
>> process_one_work+0x224/0x808
>> [    4.288241]  #1: ca075f10 (deferred_probe_work){+.+.}-{0:0}, at:
>> process_one_work+0x224/0x808
>> [    4.296787]  #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
>> __device_attach+0x30/0x140
>> [    4.304468]  #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
>> thermal_zone_get_temp+0x38/0x6c
>> [    4.312408]
>> [    4.312408] stack backtrace:
>> [    4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
>> 5.8.0-rc3-00011-gf5e50bf4d3ef #42
>> [    4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
>> [    4.330809] Workqueue: events deferred_probe_work_func
>> [    4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
>> (show_stack+0x10/0x14)
>> [    4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
>> (dump_stack+0xe8/0x114)
>> [    4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
>> (__lock_acquire+0xbfc/0x2cb4)
>> [    4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
>> (lock_acquire+0xf4/0x4e4)
>> [    4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
>> (__mutex_lock+0xb0/0xaa8)
>> [    4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
>> (mutex_lock_nested+0x1c/0x24)
>> [    4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
>> (thermal_zone_device_is_enabled+0x18/0x34)
>> [    4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
>> [<c0d9da90>] (imx_get_temp+0x30/0x208)
>> [    4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
>> (thermal_zone_get_temp+0x4c/0x6c)
>> [    4.409640] [<c0d97484>] (thermal_zone_get_temp) from [<c0d93df0>]
>> (thermal_zone_device_update+0x8c/0x258)
>> [    4.419310] [<c0d93df0>] (thermal_zone_device_update) from
>> [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
>> [    4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
>> [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
>> [    4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
>> (platform_drv_probe+0x48/0x98)
>> [    4.447622] [<c0a78388>] (platform_drv_probe) from [<c0a7603c>]
>> (really_probe+0x218/0x348)
>> [    4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
>> (driver_probe_device+0x5c/0xb4)
>> [    4.464098] [<c0a76278>] (driver_probe_device) from [<c0a743bc>]
>> (bus_for_each_drv+0x58/0xb8)
>> [    4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
>> (__device_attach+0xd4/0x140)
>> [    4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
>> (bus_probe_device+0x88/0x90)
>> [    4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
>> (deferred_probe_work_func+0x68/0x98)
>> [    4.498088] [<c0a75564>] (deferred_probe_work_func) from [<c0369988>]
>> (process_one_work+0x2e0/0x808)
>> [    4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
>> (worker_thread+0x2a0/0x59c)
>> [    4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
>> (kthread+0x16c/0x178)
>> [    4.522843] [<c0372208>] (kthread) from [<c0300174>]
>> (ret_from_fork+0x14/0x20)
>> [    4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
>> [    4.535138] 5fa0:                                     00000000
>> 00000000 00000000 00000000
>> [    4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
>> 00000000 00000000 00000000
>> [    4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013
>> 00000000
>>
>>
>>
>
> Thanks!
>
> That confirms your suspicions.
>
> So the reason is that ->get_temp() is called while the mutex is held and
> thermal_zone_device_is_enabled() wants to take the same mutex.

Yes, that's correct.

> Is adding a comment to thermal_zone_device_is_enabled() to never call
> it while the mutex is held and adding another version of it which does
> not take the mutex ok?

The thermal_zone_device_is_enabled() is only used in two places, acpi
and this imx driver, and given:

1. as soon as the mutex is released, there is no guarantee the thermal
zone won't be changed right after, the lock is pointless, thus the
information also.

2. from a design point of view, I don't see why a driver should know if
a thermal zone is disabled or not

It would make sense to end with this function and do not give the
different drivers an opportunity to access this information.

Why not add change_mode for the acpi in order to enable or disable the
events and for imx_thermal use irq_enabled flag instead of the thermal
zone mode? Moreover it is very unclear why this function is needed in
imx_get_temp(), and I suspect we should be able to get rid of it.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-07-02 17:53:07

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 02/07/2020 19:49, Daniel Lezcano wrote:

[ ... ]

>> Thanks!
>>
>> That confirms your suspicions.
>>
>> So the reason is that ->get_temp() is called while the mutex is held and
>> thermal_zone_device_is_enabled() wants to take the same mutex.
>
> Yes, that's correct.
>
>> Is adding a comment to thermal_zone_device_is_enabled() to never call
>> it while the mutex is held and adding another version of it which does
>> not take the mutex ok?
>
> The thermal_zone_device_is_enabled() is only used in two places, acpi
> and this imx driver, and given:
>
> 1. as soon as the mutex is released, there is no guarantee the thermal
> zone won't be changed right after, the lock is pointless, thus the
> information also.
>
> 2. from a design point of view, I don't see why a driver should know if
> a thermal zone is disabled or not
>
> It would make sense to end with this function and do not give the
> different drivers an opportunity to access this information.
>
> Why not add change_mode for the acpi in order to enable or disable the
> events and for imx_thermal use irq_enabled flag instead of the thermal
> zone mode? Moreover it is very unclear why this function is needed in
> imx_get_temp(), and I suspect we should be able to get rid of it.

If you agree with that you can send a patch on top of your series so I
can test it fixes the imx platform.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-07-03 01:51:58

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On Thu, 2020-07-02 at 19:49 +0200, Daniel Lezcano wrote:
> On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote:
> > Hi,
> >
> > W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze:
> > > On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
> > > > Hi Daniel,
> > > >
> > > > <snip>
> > > >
> > > > > > > > >
> > > > > > > > > I did reproduce:
> > > > > > > > >
> > > > > > > > > v5.8-rc3 + series => imx6 hang at boot time
> > > > > > > > > v5.8-rc3 => imx6 boots correctly
> > > > >
> > > > > So finally I succeeded to reproduce it on my imx7 locally.
> > > > > The sensor
> > > > > was failing to initialize for another reason related to the
> > > > > legacy
> > > > > cooling device, this is why it is not appearing on the imx7.
> > > > >
> > > > > I can now git-bisect :)
> > > > >
> > > >
> > > > That would be very kind of you, thank you!
> > >
> > > With the lock correctness option enabled:
> > >
> > > [ 4.179223] imx_thermal tempmon: Extended Commercial CPU
> > > temperature
> > > grade - max:105C critical:100C passive:95C
> > > [ 4.189557]
> > > [ 4.191060] ============================================
> > > [ 4.196378] WARNING: possible recursive locking detected
> > > [ 4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
> > > [ 4.207102] --------------------------------------------
> > > [ 4.212421] kworker/0:3/54 is trying to acquire lock:
> > > [ 4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_device_is_enabled+0x18/0x34
> > > [ 4.225777]
> > > [ 4.225777] but task is already holding lock:
> > > [ 4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_get_temp+0x38/0x6c
> > > [ 4.239121]
> > > [ 4.239121] other info that might help us debug this:
> > > [ 4.245655] Possible unsafe locking scenario:
> > > [ 4.245655]
> > > [ 4.251579] CPU0
> > > [ 4.254031] ----
> > > [ 4.256481] lock(&tz->lock);
> > > [ 4.259544] lock(&tz->lock);
> > > [ 4.262608]
> > > [ 4.262608] *** DEADLOCK ***
> > > [ 4.262608]
> > > [ 4.268533] May be due to missing lock nesting notation
> > > [ 4.268533]
> > > [ 4.275329] 4 locks held by kworker/0:3/54:
> > > [ 4.279517] #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0},
> > > at:
> > > process_one_work+0x224/0x808
> > > [ 4.288241] #1: ca075f10 (deferred_probe_work){+.+.}-{0:0},
> > > at:
> > > process_one_work+0x224/0x808
> > > [ 4.296787] #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
> > > __device_attach+0x30/0x140
> > > [ 4.304468] #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_get_temp+0x38/0x6c
> > > [ 4.312408]
> > > [ 4.312408] stack backtrace:
> > > [ 4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
> > > 5.8.0-rc3-00011-gf5e50bf4d3ef #42
> > > [ 4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
> > > [ 4.330809] Workqueue: events deferred_probe_work_func
> > > [ 4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
> > > (show_stack+0x10/0x14)
> > > [ 4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
> > > (dump_stack+0xe8/0x114)
> > > [ 4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
> > > (__lock_acquire+0xbfc/0x2cb4)
> > > [ 4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
> > > (lock_acquire+0xf4/0x4e4)
> > > [ 4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
> > > (__mutex_lock+0xb0/0xaa8)
> > > [ 4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
> > > (mutex_lock_nested+0x1c/0x24)
> > > [ 4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
> > > (thermal_zone_device_is_enabled+0x18/0x34)
> > > [ 4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
> > > [<c0d9da90>] (imx_get_temp+0x30/0x208)
> > > [ 4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
> > > (thermal_zone_get_temp+0x4c/0x6c)
> > > [ 4.409640] [<c0d97484>] (thermal_zone_get_temp) from
> > > [<c0d93df0>]
> > > (thermal_zone_device_update+0x8c/0x258)
> > > [ 4.419310] [<c0d93df0>] (thermal_zone_device_update) from
> > > [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
> > > [ 4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
> > > [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
> > > [ 4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
> > > (platform_drv_probe+0x48/0x98)
> > > [ 4.447622] [<c0a78388>] (platform_drv_probe) from
> > > [<c0a7603c>]
> > > (really_probe+0x218/0x348)
> > > [ 4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
> > > (driver_probe_device+0x5c/0xb4)
> > > [ 4.464098] [<c0a76278>] (driver_probe_device) from
> > > [<c0a743bc>]
> > > (bus_for_each_drv+0x58/0xb8)
> > > [ 4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
> > > (__device_attach+0xd4/0x140)
> > > [ 4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
> > > (bus_probe_device+0x88/0x90)
> > > [ 4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
> > > (deferred_probe_work_func+0x68/0x98)
> > > [ 4.498088] [<c0a75564>] (deferred_probe_work_func) from
> > > [<c0369988>]
> > > (process_one_work+0x2e0/0x808)
> > > [ 4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
> > > (worker_thread+0x2a0/0x59c)
> > > [ 4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
> > > (kthread+0x16c/0x178)
> > > [ 4.522843] [<c0372208>] (kthread) from [<c0300174>]
> > > (ret_from_fork+0x14/0x20)
> > > [ 4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
> > > [ 4.535138] 5fa0: 00000000
> > > 00000000 00000000 00000000
> > > [ 4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000 00000000
> > > [ 4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013
> > > 00000000
> > >
> > >
> > >
> >
> > Thanks!
> >
> > That confirms your suspicions.
> >
> > So the reason is that ->get_temp() is called while the mutex is
> > held and
> > thermal_zone_device_is_enabled() wants to take the same mutex.
>
> Yes, that's correct.
>
> > Is adding a comment to thermal_zone_device_is_enabled() to never
> > call
> > it while the mutex is held and adding another version of it which
> > does
> > not take the mutex ok?
>
> The thermal_zone_device_is_enabled() is only used in two places, acpi
> and this imx driver, and given:
>
> 1. as soon as the mutex is released, there is no guarantee the
> thermal
> zone won't be changed right after, the lock is pointless, thus the
> information also.
>
> 2. from a design point of view, I don't see why a driver should know
> if
> a thermal zone is disabled or not
>
> It would make sense to end with this function and do not give the
> different drivers an opportunity to access this information.

I agree.
>
> Why not add change_mode for the acpi in order to enable or disable
> the
> events

thermal_zone_device_is_enabled() is invoked in acpi thermal driver
because we only want to do thermal_zone_device_update() when the acpi
thermal zone is enabled.

As thermal_zone_device_update() can handle a disabled thermal zone now,
we can just remove the check.

thanks,
rui

> and for imx_thermal use irq_enabled flag instead of the thermal
> zone mode? Moreover it is very unclear why this function is needed in
> imx_get_temp(), and I suspect we should be able to get rid of it.
>
>

2020-07-03 06:39:49

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 03/07/2020 03:49, Zhang Rui wrote:
> On Thu, 2020-07-02 at 19:49 +0200, Daniel Lezcano wrote:

[ ... ]

>>> So the reason is that ->get_temp() is called while the mutex is
>>> held and
>>> thermal_zone_device_is_enabled() wants to take the same mutex.
>>
>> Yes, that's correct.
>>
>>> Is adding a comment to thermal_zone_device_is_enabled() to never
>>> call
>>> it while the mutex is held and adding another version of it which
>>> does
>>> not take the mutex ok?
>>
>> The thermal_zone_device_is_enabled() is only used in two places, acpi
>> and this imx driver, and given:
>>
>> 1. as soon as the mutex is released, there is no guarantee the
>> thermal
>> zone won't be changed right after, the lock is pointless, thus the
>> information also.
>>
>> 2. from a design point of view, I don't see why a driver should know
>> if
>> a thermal zone is disabled or not
>>
>> It would make sense to end with this function and do not give the
>> different drivers an opportunity to access this information.
>
> I agree.
>>
>> Why not add change_mode for the acpi in order to enable or disable
>> the
>> events
>
> thermal_zone_device_is_enabled() is invoked in acpi thermal driver
> because we only want to do thermal_zone_device_update() when the acpi
> thermal zone is enabled.
>
> As thermal_zone_device_update() can handle a disabled thermal zone now,
> we can just remove the check.

Ah yes, good point!



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-07-03 10:46:21

by Andrzej Pietrasiewicz

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

Hi,

W dniu 03.07.2020 o 08:38, Daniel Lezcano pisze:
> On 03/07/2020 03:49, Zhang Rui wrote:
>> On Thu, 2020-07-02 at 19:49 +0200, Daniel Lezcano wrote:
>
> [ ... ]
>
>>>> So the reason is that ->get_temp() is called while the mutex is
>>>> held and
>>>> thermal_zone_device_is_enabled() wants to take the same mutex.
>>>
>>> Yes, that's correct.
>>>
>>>> Is adding a comment to thermal_zone_device_is_enabled() to never
>>>> call
>>>> it while the mutex is held and adding another version of it which
>>>> does
>>>> not take the mutex ok?
>>>
>>> The thermal_zone_device_is_enabled() is only used in two places, acpi
>>> and this imx driver, and given:
>>>
>>> 1. as soon as the mutex is released, there is no guarantee the
>>> thermal
>>> zone won't be changed right after, the lock is pointless, thus the
>>> information also.
>>>
>>> 2. from a design point of view, I don't see why a driver should know
>>> if
>>> a thermal zone is disabled or not
>>>
>>> It would make sense to end with this function and do not give the
>>> different drivers an opportunity to access this information.
>>
>> I agree.
>>>
>>> Why not add change_mode for the acpi in order to enable or disable
>>> the
>>> events
>>
>> thermal_zone_device_is_enabled() is invoked in acpi thermal driver
>> because we only want to do thermal_zone_device_update() when the acpi
>> thermal zone is enabled.
>>
>> As thermal_zone_device_update() can handle a disabled thermal zone now,
>> we can just remove the check.
>
> Ah yes, good point!
>
>
>

I sent a short series with fixes. Daniel, can you kindly test it?

Regards,

Andrzej

2020-07-03 11:05:55

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices

On 03/07/2020 12:45, Andrzej Pietrasiewicz wrote:
> Hi,
>
> W dniu 03.07.2020 o 08:38, Daniel Lezcano pisze:
>> On 03/07/2020 03:49, Zhang Rui wrote:
>>> On Thu, 2020-07-02 at 19:49 +0200, Daniel Lezcano wrote:
>>
>> [ ... ]
>>
>>>>> So the reason is that ->get_temp() is called while the mutex is
>>>>> held and
>>>>> thermal_zone_device_is_enabled() wants to take the same mutex.
>>>>
>>>> Yes, that's correct.
>>>>
>>>>> Is adding a comment to thermal_zone_device_is_enabled() to never
>>>>> call
>>>>> it while the mutex is held and adding another version of it which
>>>>> does
>>>>> not take the mutex ok?
>>>>
>>>> The thermal_zone_device_is_enabled() is only used in two places, acpi
>>>> and this imx driver, and given:
>>>>
>>>> 1. as soon as the mutex is released, there is no guarantee the
>>>> thermal
>>>> zone won't be changed right after, the lock is pointless, thus the
>>>> information also.
>>>>
>>>> 2. from a design point of view, I don't see why a driver should know
>>>> if
>>>> a thermal zone is disabled or not
>>>>
>>>> It would make sense to end with this function and do not give the
>>>> different drivers an opportunity to access this information.
>>>
>>> I agree.
>>>>
>>>> Why not add change_mode for the acpi in order to enable or disable
>>>> the
>>>> events
>>>
>>> thermal_zone_device_is_enabled() is invoked in acpi thermal driver
>>> because we only want to do thermal_zone_device_update() when the acpi
>>> thermal zone is enabled.
>>>
>>> As thermal_zone_device_update() can handle a disabled thermal zone now,
>>> we can just remove the check.
>>
>> Ah yes, good point!
>>
>>
>>
>
> I sent a short series with fixes. Daniel, can you kindly test it?

I confirm the i.MX is now correctly booting with the thermal zone
temperature available.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog