2022-10-10 14:50:30

by Qibo Huang

Subject: [PATCH] thermal/governors: Fix cooling device setting cooling state failure

The __thermal_cdev_update() function traverses the
cooling_device->thermal_instances list to obtain the maximum
target state, and the cooling device is then set to that
maximum cooling state. However, power_actor_set_power() only
updates the target value in thermal_zone->thermal_instances
and does not update the target value in
cooling_device->thermal_instances, so the target there
stays 0 all the time.

Signed-off-by: Qibo Huang <[email protected]>
---
drivers/thermal/gov_power_allocator.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 2d1aeaba38a8..8a6a08906dd4 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -293,6 +293,7 @@ power_actor_set_power(struct thermal_cooling_device *cdev,
 		      struct thermal_instance *instance, u32 power)
 {
 	unsigned long state;
+	struct thermal_instance *cdev_instance;
 	int ret;
 
 	ret = cdev->ops->power2state(cdev, power, &state);
@@ -300,6 +301,10 @@ power_actor_set_power(struct thermal_cooling_device *cdev,
 		return ret;
 
 	instance->target = clamp_val(state, instance->lower, instance->upper);
+	list_for_each_entry(cdev_instance, &cdev->thermal_instances, cdev_node) {
+		if (cdev_instance->tz->id == instance->tz->id)
+			cdev_instance->target = state;
+	}
 	mutex_lock(&cdev->lock);
 	__thermal_cdev_update(cdev);
 	mutex_unlock(&cdev->lock);
--
2.37.1


2022-10-11 08:11:31

by Lukasz Luba

Subject: Re: [PATCH] thermal/governors: Fix cooling device setting cooling state failure

Hi Qibo,

Thank you for using IPA and sending a patch. Unfortunately,
this code cannot be added to the governor.

On 10/10/22 15:43, Qibo Huang wrote:
> The __thermal_cdev_update() function traverses the
> cooling_device->thermal_instances list to obtain the maximum
> target state, and the cooling device is then set to that
> maximum cooling state. However, power_actor_set_power() only
> updates the target value in thermal_zone->thermal_instances
> and does not update the target value in
> cooling_device->thermal_instances, so the target there
> stays 0 all the time.
>
> Signed-off-by: Qibo Huang <[email protected]>
> ---
> drivers/thermal/gov_power_allocator.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
> index 2d1aeaba38a8..8a6a08906dd4 100644
> --- a/drivers/thermal/gov_power_allocator.c
> +++ b/drivers/thermal/gov_power_allocator.c
> @@ -293,6 +293,7 @@ power_actor_set_power(struct thermal_cooling_device *cdev,
>  		      struct thermal_instance *instance, u32 power)
>  {
>  	unsigned long state;
> +	struct thermal_instance *cdev_instance;
>  	int ret;
>
>  	ret = cdev->ops->power2state(cdev, power, &state);
> @@ -300,6 +301,10 @@ power_actor_set_power(struct thermal_cooling_device *cdev,
>  		return ret;
>
>  	instance->target = clamp_val(state, instance->lower, instance->upper);

For our 'instance', which IPA was given by the framework, we set the new
'target' value above.
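As a simplified userspace illustration (not kernel code), assuming, as in drivers/thermal/thermal_core.h, that a single struct thermal_instance is linked into tz->thermal_instances through its tz_node member and into cdev->thermal_instances through its cdev_node member, the sketch below models why setting instance->target is already visible from the cooling device's list. The model_* names and the singly linked next pointers are hypothetical stand-ins for the kernel's list_head nodes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified userspace model, not kernel code. Each model_instance is
 * linked into a zone list (tz_next) and a cooling-device list
 * (cdev_next), standing in for the kernel's tz_node and cdev_node.
 */
struct model_instance {
	int tz_id;                        /* owning thermal zone id */
	unsigned long target;             /* requested cooling state */
	struct model_instance *tz_next;   /* next entry in the zone list */
	struct model_instance *cdev_next; /* next entry in the cdev list */
};

struct model_zone {
	int id;
	struct model_instance *instances; /* zone's list head */
};

struct model_cdev {
	struct model_instance *instances; /* cooling device's list head */
};

/* Bind: link the SAME object into both lists (head insertion). */
static void model_bind_instance(struct model_zone *tz, struct model_cdev *cdev,
				struct model_instance *inst)
{
	assert(tz != NULL && cdev != NULL && inst != NULL);
	inst->tz_id = tz->id;
	inst->tz_next = tz->instances;
	tz->instances = inst;
	inst->cdev_next = cdev->instances;
	cdev->instances = inst;
}

/* What the cooling device's list sees for one zone (0 if unbound). */
static unsigned long model_cdev_target(const struct model_cdev *cdev, int tz_id)
{
	const struct model_instance *it;

	for (it = cdev->instances; it != NULL; it = it->cdev_next)
		if (it->tz_id == tz_id)
			return it->target;
	return 0;
}
```

In this model, writing the target through the zone-side pointer and then reading it through the cooling device's list returns the same value, because both lists reach one object; no second loop over the cdev list is needed to "copy" the target.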

> +	list_for_each_entry(cdev_instance, &cdev->thermal_instances, cdev_node) {
> +		if (cdev_instance->tz->id == instance->tz->id)
> +			cdev_instance->target = state;

This is not the right approach for a governor, which handles a single
thermal zone. Please check the implementation of the function called
below, __thermal_cdev_update(). You will see the 'voting' code there.
That logic belongs to the framework code.

> +	}
>  	mutex_lock(&cdev->lock);
>  	__thermal_cdev_update(cdev);

This is the place where the right 'target' is picked for the cooling
device from the different thermal zones (and maybe governors). When a
new target value is higher than the other target values, the framework
throttles the device. Then other parts of the thermal framework are
kicked so that this change is propagated correctly.
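The 'voting' described above can be sketched in userspace roughly as follows. This is an assumption-laden model, not the kernel function: the targets array, model_vote_target() and MODEL_NO_TARGET are hypothetical, with MODEL_NO_TARGET standing in for the kernel's THERMAL_NO_TARGET ("this zone requests nothing"). The highest requested state among all bound zones wins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for THERMAL_NO_TARGET: "no vote from this zone". */
#define MODEL_NO_TARGET (~0UL)

/*
 * Userspace sketch of the framework-side vote: every thermal zone bound
 * to the cooling device contributes one target, entries without a
 * target are skipped, and the deepest (highest) state wins.
 */
static unsigned long model_vote_target(const unsigned long *targets, size_t n)
{
	unsigned long target = 0;
	size_t i;

	assert(targets != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (targets[i] == MODEL_NO_TARGET)
			continue; /* this zone is not asking for cooling */
		if (targets[i] > target)
			target = targets[i];
	}
	return target;
}
```

Because the maximum is taken across all zones, a governor only needs to publish its own zone's target; combining it with the other zones' requests is the framework's job.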

Also important: please note that this work is done under
cdev->lock. Code running outside that lock is not safe here, because,
e.g., the cdev->thermal_instances list might change in the meantime.
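To illustrate the locking point, here is a minimal userspace sketch that uses a POSIX mutex in the role cdev->lock plays; struct model_locked_cdev, model_add_vote() and model_update() are hypothetical names, not kernel APIs. Both the code that changes the list and the code that walks it take the same lock, so the walk never observes a half-updated list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One zone's requested state on a cooling device (hypothetical model). */
struct model_vote {
	unsigned long target;
	struct model_vote *next;
};

/* Hypothetical cooling device: 'lock' plays the role of cdev->lock. */
struct model_locked_cdev {
	pthread_mutex_t lock;
	struct model_vote *votes; /* protected by lock */
	unsigned long cur_state;  /* protected by lock */
};

/* Bind path: the list is only modified while holding the lock. */
static void model_add_vote(struct model_locked_cdev *cdev, struct model_vote *v)
{
	assert(cdev != NULL && v != NULL);
	pthread_mutex_lock(&cdev->lock);
	v->next = cdev->votes;
	cdev->votes = v;
	pthread_mutex_unlock(&cdev->lock);
}

/* Update path: walk the list and apply the result under the same lock,
 * so a concurrent model_add_vote() cannot change the list mid-walk. */
static void model_update(struct model_locked_cdev *cdev)
{
	const struct model_vote *it;
	unsigned long target = 0;

	pthread_mutex_lock(&cdev->lock);
	for (it = cdev->votes; it != NULL; it = it->next)
		if (it->target > target)
			target = it->target;
	cdev->cur_state = target;
	pthread_mutex_unlock(&cdev->lock);
}
```

The patch's extra list_for_each_entry() loop runs before mutex_lock(&cdev->lock), i.e. outside the lock, which is the pattern this model avoids.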

>  	mutex_unlock(&cdev->lock);

Regards,
Lukasz