Subject: [PATCH v2] thermal/core: Clear all mitigation when thermal zone is disabled

Whenever a thermal zone is in trip violated state, there is a chance
that the same thermal zone mode can be disabled either via thermal
core API or via thermal zone sysfs. Once it is disabled, the framework
bails out any re-evaluation of thermal zone. It leads to a case where
if it is already in mitigation state, it will stay the same state
until it is re-enabled.

To avoid above mentioned issue, on thermal zone disable request
reset thermal zone and clear mitigation for each trip explicitly.

Signed-off-by: Manaf Meethalavalappu Pallikunhi <[email protected]>
---
drivers/thermal/thermal_core.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 51374f4..5f4e35b 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -427,6 +427,7 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
enum thermal_device_mode mode)
{
int ret = 0;
+ int trip;

mutex_lock(&tz->lock);

@@ -449,8 +450,14 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,

if (mode == THERMAL_DEVICE_ENABLED)
thermal_notify_tz_enable(tz->id);
- else
+ else {
+ /* make sure all previous throttlings are cleared */
+ thermal_zone_device_init(tz);
+ for (trip = 0; trip < tz->trips; trip++)
+ handle_thermal_trip(tz, trip);
+
thermal_notify_tz_disable(tz->id);
+ }

return ret;
}



2021-12-30 16:10:00

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] thermal/core: Clear all mitigation when thermal zone is disabled

On Wed, Dec 29, 2021 at 8:03 AM Manaf Meethalavalappu Pallikunhi
<[email protected]> wrote:
>
> Whenever a thermal zone is in trip violated state, there is a chance
> that the same thermal zone mode can be disabled either via thermal
> core API or via thermal zone sysfs. Once it is disabled, the framework
> bails out any re-evaluation of thermal zone. It leads to a case where
> if it is already in mitigation state, it will stay the same state
> until it is re-enabled.

You seem to be arguing that disabling a thermal zone should prevent it
from throttling anything, which is reasonable, but I'm not sure if the
change below is sufficient for that.

> To avoid above mentioned issue, on thermal zone disable request
> reset thermal zone and clear mitigation for each trip explicitly.
>
> Signed-off-by: Manaf Meethalavalappu Pallikunhi <[email protected]>
> ---
> drivers/thermal/thermal_core.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 51374f4..5f4e35b 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -427,6 +427,7 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
> enum thermal_device_mode mode)
> {
> int ret = 0;
> + int trip;

This can be declared in the block in which it is used.

>
> mutex_lock(&tz->lock);
>
> @@ -449,8 +450,14 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
>
> if (mode == THERMAL_DEVICE_ENABLED)

The coding style asks for braces here if they are used after the else.

> thermal_notify_tz_enable(tz->id);
> - else
> + else {
> + /* make sure all previous throttlings are cleared */
> + thermal_zone_device_init(tz);
> + for (trip = 0; trip < tz->trips; trip++)
> + handle_thermal_trip(tz, trip);

So I'm not sure if this makes the throttling go away in all cases (eg.
what if the current temperature is still above the given trip at this
point?).

> +
> thermal_notify_tz_disable(tz->id);
> + }
>
> return ret;
> }
>

Subject: Re: [PATCH v2] thermal/core: Clear all mitigation when thermal zone is disabled


On 12/30/2021 9:39 PM, Rafael J. Wysocki wrote:
> On Wed, Dec 29, 2021 at 8:03 AM Manaf Meethalavalappu Pallikunhi
> <[email protected]> wrote:
>> Whenever a thermal zone is in trip violated state, there is a chance
>> that the same thermal zone mode can be disabled either via thermal
>> core API or via thermal zone sysfs. Once it is disabled, the framework
>> bails out any re-evaluation of thermal zone. It leads to a case where
>> if it is already in mitigation state, it will stay the same state
>> until it is re-enabled.
> You seem to be arguing that disabling a thermal zone should prevent it
> from throttling anything, which is reasonable, but I'm not sure if the
> change below is sufficient for that.
>
>> To avoid above mentioned issue, on thermal zone disable request
>> reset thermal zone and clear mitigation for each trip explicitly.
>>
>> Signed-off-by: Manaf Meethalavalappu Pallikunhi <[email protected]>
>> ---
>> drivers/thermal/thermal_core.c | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>> index 51374f4..5f4e35b 100644
>> --- a/drivers/thermal/thermal_core.c
>> +++ b/drivers/thermal/thermal_core.c
>> @@ -427,6 +427,7 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
>> enum thermal_device_mode mode)
>> {
>> int ret = 0;
>> + int trip;
> This can be declared in the block in which it is used.
Sure, will update in next patch version
>
>> mutex_lock(&tz->lock);
>>
>> @@ -449,8 +450,14 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
>>
>> if (mode == THERMAL_DEVICE_ENABLED)
> The coding style asks for braces here if they are used after the else.
Sure will update in next version
>> thermal_notify_tz_enable(tz->id);
>> - else
>> + else {
>> + /* make sure all previous throttlings are cleared */
>> + thermal_zone_device_init(tz);
>> + for (trip = 0; trip < tz->trips; trip++)
>> + handle_thermal_trip(tz, trip);
> So I'm not sure if this makes the throttling go away in all cases (eg.
> what if the current temperature is still above the given trip at this
> point?).

The thermal_zone_device_init() will reset current temperature with THERMAL_TEMP_INVALID. Then the following API
thandle_thermal_trip() doesn't call get_temp to sensor driver again instead it will use this reset temperature
value for each trip re-evaluation. Hence for above mentioned case, I think it will clear the throttling
irrespective of what is the actual current temperature at that instant. Otherwise please correct me

May I know is there any other possible cases where throttling will not go away completely ?

>> +
>> thermal_notify_tz_disable(tz->id);
>> + }
>>
>> return ret;
>> }
>>