2024-04-17 13:13:54

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH v1 0/3] thermal/debugfs: Fix and clean up trip point statistics updates

Hi Everyone,

The first patch in this series addresses the problem of updating trip
point statistics prematurely for trip points that have just been
crossed on the way down (please see the patch changelog for details).

The way it does that renders the following cleanup patch inapplicable:

https://lore.kernel.org/linux-pm/2321994.ElGaqSPkdT@kreacher/

The remaining two patches in the series are cleanups on top of the
first one.

This series is based on an older patch series posted last week:

https://lore.kernel.org/linux-pm/13515747.uLZWGnKmhe@kreacher/

but it can be trivially rebased on top of the current linux-next.

Thanks!





2024-04-17 13:16:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH v1 2/3] thermal/debugfs: Clean up thermal_debug_update_temp()

From: Rafael J. Wysocki <[email protected]>

Notice that it is not necessary to compute tze in every iteration of the
for () loop in thermal_debug_update_temp() because it is the same for all
trips, so compute it once before the loop starts.

Also use a trip_stats local variable to make the code in that loop easier
to follow and move the trip_id variable definition into that loop because
it is not used elsewhere in the function.

While at it, change to order of local variable definitions in the function
to follow the reverse-xmas-tree pattern.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/thermal/thermal_debugfs.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -679,9 +679,9 @@ out:
void thermal_debug_update_temp(struct thermal_zone_device *tz)
{
struct thermal_debugfs *thermal_dbg = tz->debugfs;
- struct tz_episode *tze;
struct tz_debugfs *tz_dbg;
- int trip_id, i;
+ struct tz_episode *tze;
+ int i;

if (!thermal_dbg)
return;
@@ -693,15 +693,16 @@ void thermal_debug_update_temp(struct th
if (!tz_dbg->nr_trips)
goto out;

+ tze = list_first_entry(&tz_dbg->tz_episodes, struct tz_episode, node);
+
for (i = 0; i < tz_dbg->nr_trips; i++) {
- trip_id = tz_dbg->trips_crossed[i];
- tze = list_first_entry(&tz_dbg->tz_episodes, struct tz_episode, node);
- tze->trip_stats[trip_id].count++;
- tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, tz->temperature);
- tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, tz->temperature);
- tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
- (tz->temperature - tze->trip_stats[trip_id].avg) /
- tze->trip_stats[trip_id].count;
+ int trip_id = tz_dbg->trips_crossed[i];
+ struct trip_stats *trip_stats = &tze->trip_stats[trip_id];
+
+ trip_stats->max = max(trip_stats->max, tz->temperature);
+ trip_stats->min = min(trip_stats->min, tz->temperature);
+ trip_stats->avg += (tz->temperature - trip_stats->avg) /
+ ++trip_stats->count;
}
out:
mutex_unlock(&thermal_dbg->lock);




2024-04-17 13:22:04

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH v1 1/3] thermal/debugfs: Avoid excessive updates of trip point statistics

From: Rafael J. Wysocki <[email protected]>

Since thermal_debug_update_temp() is called before invoking
thermal_debug_tz_trip_down() for the trips that were crossed by the
zone temperature on the way up, it updates the statistics for them
as though the current zone temperature was above the low temperature
of each of them. However, if a given trip has just been crossed on the
way down, the zone temperature is in fact below its low temperature,
but this is handled by thermal_debug_tz_trip_down() running after the
update of the trip statistics.

The remedy is to call thermal_debug_update_temp() after
thermal_debug_tz_trip_down() has been invoked for all of the
trips in question, but then thermal_debug_tz_trip_up() needs to
be adjusted, so it does not update the statistics for the trips
that has just been crossed on the way up, as that will be taken
care of by thermal_debug_update_temp() down the road.

Modify the code accordingly.

Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/thermal/thermal_core.c | 3 ++-
drivers/thermal/thermal_debugfs.c | 7 -------
2 files changed, 2 insertions(+), 8 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -427,7 +427,6 @@ static void update_temperature(struct th
trace_thermal_temperature(tz);

thermal_genl_sampling_temp(tz->id, temp);
- thermal_debug_update_temp(tz);
}

static void thermal_zone_device_check(struct work_struct *work)
@@ -505,6 +504,8 @@ void __thermal_zone_device_update(struct
if (governor->manage)
governor->manage(tz);

+ thermal_debug_update_temp(tz);
+
monitor_thermal_zone(tz);
}

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -545,7 +545,6 @@ void thermal_debug_tz_trip_up(struct the
struct tz_episode *tze;
struct tz_debugfs *tz_dbg;
struct thermal_debugfs *thermal_dbg = tz->debugfs;
- int temperature = tz->temperature;
int trip_id = thermal_zone_trip_id(tz, trip);
ktime_t now = ktime_get();

@@ -614,12 +613,6 @@ void thermal_debug_tz_trip_up(struct the

tze = list_first_entry(&tz_dbg->tz_episodes, struct tz_episode, node);
tze->trip_stats[trip_id].timestamp = now;
- tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
- tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
- tze->trip_stats[trip_id].count++;
- tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
- (temperature - tze->trip_stats[trip_id].avg) /
- tze->trip_stats[trip_id].count;

unlock:
mutex_unlock(&thermal_dbg->lock);




2024-04-22 11:30:54

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v1 2/3] thermal/debugfs: Clean up thermal_debug_update_temp()



On 4/17/24 14:10, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Notice that it is not necessary to compute tze in every iteration of the
> for () loop in thermal_debug_update_temp() because it is the same for all
> trips, so compute it once before the loop starts.
>
> Also use a trip_stats local variable to make the code in that loop easier
> to follow and move the trip_id variable definition into that loop because
> it is not used elsewhere in the function.
>
> While at it, change to order of local variable definitions in the function
> to follow the reverse-xmas-tree pattern.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/thermal/thermal_debugfs.c | 21 +++++++++++----------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_debugfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_debugfs.c
> +++ linux-pm/drivers/thermal/thermal_debugfs.c
> @@ -679,9 +679,9 @@ out:
> void thermal_debug_update_temp(struct thermal_zone_device *tz)
> {
> struct thermal_debugfs *thermal_dbg = tz->debugfs;
> - struct tz_episode *tze;
> struct tz_debugfs *tz_dbg;
> - int trip_id, i;
> + struct tz_episode *tze;
> + int i;
>
> if (!thermal_dbg)
> return;
> @@ -693,15 +693,16 @@ void thermal_debug_update_temp(struct th
> if (!tz_dbg->nr_trips)
> goto out;
>
> + tze = list_first_entry(&tz_dbg->tz_episodes, struct tz_episode, node);
> +
> for (i = 0; i < tz_dbg->nr_trips; i++) {
> - trip_id = tz_dbg->trips_crossed[i];
> - tze = list_first_entry(&tz_dbg->tz_episodes, struct tz_episode, node);
> - tze->trip_stats[trip_id].count++;
> - tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, tz->temperature);
> - tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, tz->temperature);
> - tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
> - (tz->temperature - tze->trip_stats[trip_id].avg) /
> - tze->trip_stats[trip_id].count;
> + int trip_id = tz_dbg->trips_crossed[i];
> + struct trip_stats *trip_stats = &tze->trip_stats[trip_id];
> +
> + trip_stats->max = max(trip_stats->max, tz->temperature);
> + trip_stats->min = min(trip_stats->min, tz->temperature);
> + trip_stats->avg += (tz->temperature - trip_stats->avg) /
> + ++trip_stats->count;
> }
> out:
> mutex_unlock(&thermal_dbg->lock);
>
>
>
>

Reviewed-by: Lukasz Luba <[email protected]>

2024-04-22 11:35:20

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] thermal/debugfs: Avoid excessive updates of trip point statistics



On 4/17/24 14:09, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Since thermal_debug_update_temp() is called before invoking
> thermal_debug_tz_trip_down() for the trips that were crossed by the
> zone temperature on the way up, it updates the statistics for them
> as though the current zone temperature was above the low temperature
> of each of them. However, if a given trip has just been crossed on the
> way down, the zone temperature is in fact below its low temperature,
> but this is handled by thermal_debug_tz_trip_down() running after the
> update of the trip statistics.
>
> The remedy is to call thermal_debug_update_temp() after
> thermal_debug_tz_trip_down() has been invoked for all of the
> trips in question, but then thermal_debug_tz_trip_up() needs to
> be adjusted, so it does not update the statistics for the trips
> that has just been crossed on the way up, as that will be taken
> care of by thermal_debug_update_temp() down the road.
>
> Modify the code accordingly.
>
> Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/thermal/thermal_core.c | 3 ++-
> drivers/thermal/thermal_debugfs.c | 7 -------
> 2 files changed, 2 insertions(+), 8 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -427,7 +427,6 @@ static void update_temperature(struct th
> trace_thermal_temperature(tz);
>
> thermal_genl_sampling_temp(tz->id, temp);
> - thermal_debug_update_temp(tz);
> }
>
> static void thermal_zone_device_check(struct work_struct *work)
> @@ -505,6 +504,8 @@ void __thermal_zone_device_update(struct
> if (governor->manage)
> governor->manage(tz);
>
> + thermal_debug_update_temp(tz);
> +
> monitor_thermal_zone(tz);
> }
>
> Index: linux-pm/drivers/thermal/thermal_debugfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_debugfs.c
> +++ linux-pm/drivers/thermal/thermal_debugfs.c
> @@ -545,7 +545,6 @@ void thermal_debug_tz_trip_up(struct the
> struct tz_episode *tze;
> struct tz_debugfs *tz_dbg;
> struct thermal_debugfs *thermal_dbg = tz->debugfs;
> - int temperature = tz->temperature;
> int trip_id = thermal_zone_trip_id(tz, trip);
> ktime_t now = ktime_get();
>
> @@ -614,12 +613,6 @@ void thermal_debug_tz_trip_up(struct the
>
> tze = list_first_entry(&tz_dbg->tz_episodes, struct tz_episode, node);
> tze->trip_stats[trip_id].timestamp = now;
> - tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
> - tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
> - tze->trip_stats[trip_id].count++;
> - tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
> - (temperature - tze->trip_stats[trip_id].avg) /
> - tze->trip_stats[trip_id].count;
>
> unlock:
> mutex_unlock(&thermal_dbg->lock);
>
>
>
>

Reviewed-by: Lukasz Luba <[email protected]>

2024-04-22 11:49:40

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] thermal/debugfs: Fix and clean up trip point statistics updates

Hi Rafael,

On 4/17/24 14:07, Rafael J. Wysocki wrote:
> Hi Everyone,
>
> The first patch in this series addresses the problem of updating trip
> point statistics prematurely for trip points that have just been
> crossed on the way down (please see the patch changelog for details).
>
> The way it does that renders the following cleanup patch inapplicable:
>
> https://lore.kernel.org/linux-pm/2321994.ElGaqSPkdT@kreacher/
>
> The remaining two patches in the series are cleanups on top of the
> first one.
>
> This series is based on an older patch series posted last week:
>
> https://lore.kernel.org/linux-pm/13515747.uLZWGnKmhe@kreacher/
>
> but it can be trivially rebased on top of the current linux-next.
>
> Thanks!
>
>
>

I've checked this patch patch set on top of your bleeding-edge
which has thermal re-work as well. The patch set looks good
and works properly.

Although, I have found some issue in this debug info files and
I'm not sure if this is expected or not. If not I can address this
and send some small fix for it.

When I read the cooling device residency statistics, I don't
get updates for the first time the state is used. It can only
be counted when that state was known and finished it's usage.

IMO it is not the right behavior, isn't it?

Experiment:
My trip points are 70degC and 75degC and I'm setting emulated
temperature to cross them and get cooling states 1 then 0.
As you can see the statistics counter only starts showing value after
after trip crossing down.
------------------------------------8<-----------------------------------

root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
root@arm:~#
root@arm:~#
root@arm:~# echo 71000 > /sys/class/thermal/thermal_zone0/emul_temp

root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
root@arm:~# echo 76000 > /sys/class/thermal/thermal_zone0/emul_temp

root@arm:~#
root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
0 518197
root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
0 518197
root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
0 518197
root@arm:~# echo 71000 > /sys/class/thermal/thermal_zone0/emul_temp

root@arm:~#
root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
0 520066
1 17567
root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
0 522653
1 17567
root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
0 526151
1 17567
root@arm:~# echo 66000 > /sys/class/thermal/thermal_zone0/emul_temp

root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
0 537366
1 17567
root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
0 544936
1 17567
root@arm:~# cat
/sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
State Residency
0 556694
1 17567
root@arm:~#

------------------------------->8----------------------------------------

Please let me know what do you think about that behavior.

Regards,
Lukasz

2024-04-22 14:20:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] thermal/debugfs: Fix and clean up trip point statistics updates

Hi Lukasz,

On Mon, Apr 22, 2024 at 1:37 PM Lukasz Luba <[email protected]> wrote:
>
> Hi Rafael,
>
> On 4/17/24 14:07, Rafael J. Wysocki wrote:
> > Hi Everyone,
> >
> > The first patch in this series addresses the problem of updating trip
> > point statistics prematurely for trip points that have just been
> > crossed on the way down (please see the patch changelog for details).
> >
> > The way it does that renders the following cleanup patch inapplicable:
> >
> > https://lore.kernel.org/linux-pm/2321994.ElGaqSPkdT@kreacher/
> >
> > The remaining two patches in the series are cleanups on top of the
> > first one.
> >
> > This series is based on an older patch series posted last week:
> >
> > https://lore.kernel.org/linux-pm/13515747.uLZWGnKmhe@kreacher/
> >
> > but it can be trivially rebased on top of the current linux-next.
> >
> > Thanks!
> >
> >
> >
>
> I've checked this patch patch set on top of your bleeding-edge
> which has thermal re-work as well. The patch set looks good
> and works properly.
>
> Although, I have found some issue in this debug info files and
> I'm not sure if this is expected or not. If not I can address this
> and send some small fix for it.
>
> When I read the cooling device residency statistics, I don't
> get updates for the first time the state is used. It can only
> be counted when that state was known and finished it's usage.
>
> IMO it is not the right behavior, isn't it?

I agree, it is not.

> Experiment:
> My trip points are 70degC and 75degC and I'm setting emulated
> temperature to cross them and get cooling states 1 then 0.
> As you can see the statistics counter only starts showing value after
> after trip crossing down.
> ------------------------------------8<-----------------------------------
>
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> root@arm:~#
> root@arm:~#
> root@arm:~# echo 71000 > /sys/class/thermal/thermal_zone0/emul_temp
>
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> root@arm:~# echo 76000 > /sys/class/thermal/thermal_zone0/emul_temp
>
> root@arm:~#
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> 0 518197
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> 0 518197
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> 0 518197
> root@arm:~# echo 71000 > /sys/class/thermal/thermal_zone0/emul_temp
>
> root@arm:~#
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> 0 520066
> 1 17567
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> 0 522653
> 1 17567
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> 0 526151
> 1 17567
> root@arm:~# echo 66000 > /sys/class/thermal/thermal_zone0/emul_temp
>
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> 0 537366
> 1 17567
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> 0 544936
> 1 17567
> root@arm:~# cat
> /sys/kernel/debug/thermal/cooling_devices/0/time_in_state_ms
> State Residency
> 0 556694
> 1 17567
> root@arm:~#
>
> ------------------------------->8----------------------------------------
>
> Please let me know what do you think about that behavior.

If you have thought about fixing it already, please do so.

Thanks!

2024-04-22 15:34:37

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] thermal/debugfs: Fix and clean up trip point statistics updates

On 22/04/2024 13:37, Lukasz Luba wrote:
> Hi Rafael,
>
> On 4/17/24 14:07, Rafael J. Wysocki wrote:
>> Hi Everyone,
>>
>> The first patch in this series addresses the problem of updating trip
>> point statistics prematurely for trip points that have just been
>> crossed on the way down (please see the patch changelog for details).
>>
>> The way it does that renders the following cleanup patch inapplicable:
>>
>> https://lore.kernel.org/linux-pm/2321994.ElGaqSPkdT@kreacher/
>>
>> The remaining two patches in the series are cleanups on top of the
>> first one.
>>
>> This series is based on an older patch series posted last week:
>>
>> https://lore.kernel.org/linux-pm/13515747.uLZWGnKmhe@kreacher/
>>
>> but it can be trivially rebased on top of the current linux-next.
>>
>> Thanks!
>>
>>
>>
>
> I've checked this patch patch set on top of your bleeding-edge
> which has thermal re-work as well. The patch set looks good
> and works properly.
>
> Although, I have found some issue in this debug info files and
> I'm not sure if this is expected or not. If not I can address this
> and send some small fix for it.
>
> When I read the cooling device residency statistics, I don't
> get updates for the first time the state is used. It can only
> be counted when that state was known and finished it's usage.
>
> IMO it is not the right behavior, isn't it?

Do you mean the right behavior is a regression or we should expect at
least the residency to be showed even if the mitigation state is not
closed ?


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


2024-04-22 15:48:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] thermal/debugfs: Fix and clean up trip point statistics updates

On Mon, Apr 22, 2024 at 5:34 PM Daniel Lezcano
<[email protected]> wrote:
>
> On 22/04/2024 13:37, Lukasz Luba wrote:
> > Hi Rafael,
> >
> > On 4/17/24 14:07, Rafael J. Wysocki wrote:
> >> Hi Everyone,
> >>
> >> The first patch in this series addresses the problem of updating trip
> >> point statistics prematurely for trip points that have just been
> >> crossed on the way down (please see the patch changelog for details).
> >>
> >> The way it does that renders the following cleanup patch inapplicable:
> >>
> >> https://lore.kernel.org/linux-pm/2321994.ElGaqSPkdT@kreacher/
> >>
> >> The remaining two patches in the series are cleanups on top of the
> >> first one.
> >>
> >> This series is based on an older patch series posted last week:
> >>
> >> https://lore.kernel.org/linux-pm/13515747.uLZWGnKmhe@kreacher/
> >>
> >> but it can be trivially rebased on top of the current linux-next.
> >>
> >> Thanks!
> >>
> >>
> >>
> >
> > I've checked this patch patch set on top of your bleeding-edge
> > which has thermal re-work as well. The patch set looks good
> > and works properly.
> >
> > Although, I have found some issue in this debug info files and
> > I'm not sure if this is expected or not. If not I can address this
> > and send some small fix for it.
> >
> > When I read the cooling device residency statistics, I don't
> > get updates for the first time the state is used. It can only
> > be counted when that state was known and finished it's usage.
> >
> > IMO it is not the right behavior, isn't it?
>
> Do you mean the right behavior is a regression

It has not changed AFAICS.

> or we should expect at least the residency to be showed even if the
> mitigation state is not closed ?

Well, in fact the device has already been in that state for some time
and the mitigation can continue for a while.

2024-04-22 16:12:35

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] thermal/debugfs: Fix and clean up trip point statistics updates

On 22/04/2024 17:48, Rafael J. Wysocki wrote:
> On Mon, Apr 22, 2024 at 5:34 PM Daniel Lezcano

[ ... ]

>> or we should expect at least the residency to be showed even if the
>> mitigation state is not closed ?
>
> Well, in fact the device has already been in that state for some time
> and the mitigation can continue for a while.

Yes, but when to update the residency time ?

When we cross a trip point point ?

or

When we read the information ?

The former is what we are currently doing AFAIR and the latter must add
the delta between the last update and the current time for the current
state, right ?


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


2024-04-23 12:28:18

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] thermal/debugfs: Fix and clean up trip point statistics updates

On Mon, Apr 22, 2024 at 6:12 PM Daniel Lezcano
<[email protected]> wrote:
>
> On 22/04/2024 17:48, Rafael J. Wysocki wrote:
> > On Mon, Apr 22, 2024 at 5:34 PM Daniel Lezcano
>
> [ ... ]
>
> >> or we should expect at least the residency to be showed even if the
> >> mitigation state is not closed ?
> >
> > Well, in fact the device has already been in that state for some time
> > and the mitigation can continue for a while.
>
> Yes, but when to update the residency time ?
>
> When we cross a trip point point ?
>
> or
>
> When we read the information ?
>
> The former is what we are currently doing AFAIR

Not really.

Records are added by thermal_debug_cdev_state_update() which only runs
when __thermal_cdev_update() is called, ie. from governors.

Moreover, it assumes the initial state to be 0 and checks if the new
state is equal to the current one before doing anything else, so it
will not make a record for the 0 state until the first transition.

> and the latter must add the delta between the last update and the current time for the current
> state, right ?

Yes, and it is doing this already AFAICS.

The problem is that it only creates a record for the old state, so if
the new one is seen for the first time, there will be no record for it
until it changes to some other state.

The appended patch (modulo GMail-induced white space breakage) should
help with this, but the initial state handling needs to be addressed
separately.

---
drivers/thermal/thermal_debugfs.c | 8 ++++++++
1 file changed, 8 insertions(+)

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -433,6 +433,14 @@ void thermal_debug_cdev_state_update(con
}

cdev_dbg->current_state = new_state;
+
+ /*
+ * Create a record for the new state if it is not there, so its
+ * duration will be printed by cdev_dt_seq_show() as expected if it
+ * runs before the next state transition.
+ */
+ thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations,
new_state);
+
transition = (old_state << 16) | new_state;

/*

2024-04-23 13:38:57

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] thermal/debugfs: Fix and clean up trip point statistics updates



On 4/23/24 13:26, Rafael J. Wysocki wrote:
> On Mon, Apr 22, 2024 at 6:12 PM Daniel Lezcano
> <[email protected]> wrote:
>>
>> On 22/04/2024 17:48, Rafael J. Wysocki wrote:
>>> On Mon, Apr 22, 2024 at 5:34 PM Daniel Lezcano
>>
>> [ ... ]
>>
>>>> or we should expect at least the residency to be showed even if the
>>>> mitigation state is not closed ?
>>>
>>> Well, in fact the device has already been in that state for some time
>>> and the mitigation can continue for a while.
>>
>> Yes, but when to update the residency time ?
>>
>> When we cross a trip point point ?
>>
>> or
>>
>> When we read the information ?
>>
>> The former is what we are currently doing AFAIR
>
> Not really.
>
> Records are added by thermal_debug_cdev_state_update() which only runs
> when __thermal_cdev_update() is called, ie. from governors.
>
> Moreover, it assumes the initial state to be 0 and checks if the new
> state is equal to the current one before doing anything else, so it
> will not make a record for the 0 state until the first transition.

Correct, AFAIKS.

>
>> and the latter must add the delta between the last update and the current time for the current
>> state, right ?
>
> Yes, and it is doing this already AFAICS.
>
> The problem is that it only creates a record for the old state, so if
> the new one is seen for the first time, there will be no record for it
> until it changes to some other state.

Exactly, it's not totally wrong what we have now, just some missing part
that needs to be added in the code, while we are counting/updating
these stats.

>
> The appended patch (modulo GMail-induced white space breakage) should
> help with this, but the initial state handling needs to be addressed
> separately.
>
> ---
> drivers/thermal/thermal_debugfs.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> Index: linux-pm/drivers/thermal/thermal_debugfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_debugfs.c
> +++ linux-pm/drivers/thermal/thermal_debugfs.c
> @@ -433,6 +433,14 @@ void thermal_debug_cdev_state_update(con
> }
>
> cdev_dbg->current_state = new_state;
> +
> + /*
> + * Create a record for the new state if it is not there, so its
> + * duration will be printed by cdev_dt_seq_show() as expected if it
> + * runs before the next state transition.
> + */
> + thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations,
> new_state);
> +
> transition = (old_state << 16) | new_state;
>
> /*

Yes, something like this should do the trick. We will get the record
entry in the list, so we at least enter the list loop in the
cdev_dt_seq_show().



2024-04-23 15:30:16

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1 2/3] thermal/debugfs: Clean up thermal_debug_update_temp()

On 17/04/2024 15:10, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Notice that it is not necessary to compute tze in every iteration of the
> for () loop in thermal_debug_update_temp() because it is the same for all
> trips, so compute it once before the loop starts.
>
> Also use a trip_stats local variable to make the code in that loop easier
> to follow and move the trip_id variable definition into that loop because
> it is not used elsewhere in the function.
>
> While at it, change to order of local variable definitions in the function
> to follow the reverse-xmas-tree pattern.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---

Acked-by: Daniel Lezcano <[email protected]>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


2024-04-23 15:54:14

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] thermal/debugfs: Avoid excessive updates of trip point statistics

On 17/04/2024 15:09, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Since thermal_debug_update_temp() is called before invoking
> thermal_debug_tz_trip_down() for the trips that were crossed by the
> zone temperature on the way up, it updates the statistics for them
> as though the current zone temperature was above the low temperature
> of each of them. However, if a given trip has just been crossed on the
> way down, the zone temperature is in fact below its low temperature,
> but this is handled by thermal_debug_tz_trip_down() running after the
> update of the trip statistics.
>
> The remedy is to call thermal_debug_update_temp() after
> thermal_debug_tz_trip_down() has been invoked for all of the
> trips in question, but then thermal_debug_tz_trip_up() needs to
> be adjusted, so it does not update the statistics for the trips
> that has just been crossed on the way up, as that will be taken
> care of by thermal_debug_update_temp() down the road.
>
> Modify the code accordingly.
>
> Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---

Acked-by: Daniel Lezcano <[email protected]>

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog