2024-04-25 14:14:03

by Rafael J. Wysocki

Subject: [PATCH v2 0/3] thermal/debugfs: Fix handling of cdev states and mitigation episodes in progress

Hi Everyone,

This is an update of

https://lore.kernel.org/linux-pm/5774279.DvuYhMxLoT@kreacher/

and the only non-trivial difference between it and the v1 is a small
rebase of the second patch (the v1 of which didn't apply).

It is generally based on top of

https://lore.kernel.org/linux-pm/12427744.O9o76ZdvQC@kreacher/

but it should apply on top of the linux-next branch in linux-pm.git as well.

It is present in the thermal-core-next branch in that tree, along with the
above series.

Thanks!





2024-04-25 14:28:31

by Rafael J. Wysocki

Subject: [PATCH v2 3/3] thermal/debugfs: Avoid printing zero duration for mitigation events in progress

From: Rafael J. Wysocki <[email protected]>

If a thermal mitigation event is in progress, its duration value has
not been updated yet, so 0 will be printed as the event duration by
tze_seq_show(), which is confusing.

Avoid doing that by marking the beginning of the event with the
KTIME_MIN duration value and making tze_seq_show() compute the current
event duration on the fly, in which case '>' will be printed instead of
'=' in the event duration value field.

Similarly, for trip points that have been crossed on the way down, mark
the end of mitigation with the KTIME_MAX timestamp value and make
tze_seq_show() compute the current duration on the fly for the trip
points still involved in the mitigation, in which cases the duration
value printed by it will be prepended with a '>' character.

Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/thermal/thermal_debugfs.c | 39 ++++++++++++++++++++++++++++++++------
1 file changed, 33 insertions(+), 6 deletions(-)

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -552,6 +552,7 @@ static struct tz_episode *thermal_debugf

INIT_LIST_HEAD(&tze->node);
tze->timestamp = now;
+ tze->duration = KTIME_MIN;

for (i = 0; i < tz->num_trips; i++) {
tze->trip_stats[i].min = INT_MAX;
@@ -680,6 +681,9 @@ void thermal_debug_tz_trip_down(struct t
tze->trip_stats[trip_id].duration =
ktime_add(delta, tze->trip_stats[trip_id].duration);

+ /* Mark the end of mitigation for this trip point. */
+ tze->trip_stats[trip_id].timestamp = KTIME_MAX;
+
/*
* This event closes the mitigation as we are crossing the
* last trip point the way down.
@@ -754,15 +758,25 @@ static int tze_seq_show(struct seq_file
struct thermal_trip_desc *td;
struct tz_episode *tze;
const char *type;
+ u64 duration_ms;
int trip_id;
+ char c;

tze = list_entry((struct list_head *)v, struct tz_episode, node);

- seq_printf(s, ",-Mitigation at %lluus, duration=%llums\n",
- ktime_to_us(tze->timestamp),
- ktime_to_ms(tze->duration));
+ if (tze->duration == KTIME_MIN) {
+ /* Mitigation in progress. */
+ duration_ms = ktime_to_ms(ktime_sub(ktime_get(), tze->timestamp));
+ c = '>';
+ } else {
+ duration_ms = ktime_to_ms(tze->duration);
+ c = '=';
+ }
+
+ seq_printf(s, ",-Mitigation at %lluus, duration%c%llums\n",
+ ktime_to_us(tze->timestamp), c, duration_ms);

- seq_printf(s, "| trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) |\n");
+ seq_printf(s, "| trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) |\n");

for_each_trip_desc(tz, td) {
const struct thermal_trip *trip = &td->trip;
@@ -794,12 +808,25 @@ static int tze_seq_show(struct seq_file
else
type = "hot";

- seq_printf(s, "| %*d | %*s | %*d | %*d | %*lld | %*d | %*d | %*d |\n",
+ if (trip_stats->timestamp != KTIME_MAX) {
+ /* Mitigation in progress. */
+ ktime_t delta = ktime_sub(ktime_get(),
+ trip_stats->timestamp);
+
+ delta = ktime_add(delta, trip_stats->duration);
+ duration_ms = ktime_to_ms(delta);
+ c = '>';
+ } else {
+ duration_ms = ktime_to_ms(trip_stats->duration);
+ c = ' ';
+ }
+
+ seq_printf(s, "| %*d | %*s | %*d | %*d | %c%*lld | %*d | %*d | %*d |\n",
4 , trip_id,
8, type,
9, trip->temperature,
9, trip->hysteresis,
- 10, ktime_to_ms(trip_stats->duration),
+ c, 10, duration_ms,
9, trip_stats->avg,
9, trip_stats->min,
9, trip_stats->max);
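
For readers following along, the sentinel scheme described in the changelog above can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions: the SKETCH_* macros, struct episode_sketch, struct trip_stats_sketch and both helper functions are hypothetical stand-ins, plain int64_t nanosecond values replace ktime_t, and the current time is passed in instead of being read with ktime_get().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for KTIME_MIN / KTIME_MAX. */
#define SKETCH_KTIME_MIN INT64_MIN
#define SKETCH_KTIME_MAX INT64_MAX
#define SKETCH_NSEC_PER_MSEC 1000000LL

/* Simplified episode: duration stays at SKETCH_KTIME_MIN while in progress. */
struct episode_sketch {
	int64_t timestamp_ns;	/* start of the mitigation episode */
	int64_t duration_ns;	/* final duration, or SKETCH_KTIME_MIN */
};

/* Simplified per-trip stats: timestamp is SKETCH_KTIME_MAX once crossed down. */
struct trip_stats_sketch {
	int64_t timestamp_ns;	/* last upward crossing, or SKETCH_KTIME_MAX */
	int64_t duration_ns;	/* time accumulated by earlier crossings */
};

/* Episode duration in ms; *c is '>' while in progress, '=' once finished. */
static int64_t episode_duration_ms(const struct episode_sketch *e,
				   int64_t now_ns, char *c)
{
	assert(e && c);

	if (e->duration_ns == SKETCH_KTIME_MIN) {
		/* Mitigation in progress: compute the duration on the fly. */
		*c = '>';
		return (now_ns - e->timestamp_ns) / SKETCH_NSEC_PER_MSEC;
	}

	*c = '=';
	return e->duration_ns / SKETCH_NSEC_PER_MSEC;
}

/* Trip duration in ms; *c is '>' while the trip is still involved, else ' '. */
static int64_t trip_duration_ms(const struct trip_stats_sketch *t,
				int64_t now_ns, char *c)
{
	assert(t && c);

	if (t->timestamp_ns != SKETCH_KTIME_MAX) {
		/* Add the running interval to what was accumulated before. */
		*c = '>';
		return (now_ns - t->timestamp_ns + t->duration_ns) /
			SKETCH_NSEC_PER_MSEC;
	}

	*c = ' ';
	return t->duration_ns / SKETCH_NSEC_PER_MSEC;
}
```

The point of using sentinels here is that no extra "in progress" flag is needed: a value that a real duration or timestamp is never expected to take doubles as the state marker.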




2024-04-25 14:30:06

by Rafael J. Wysocki

Subject: [PATCH v2 1/3] thermal/debugfs: Create records for cdev states as they get used

From: Rafael J. Wysocki <[email protected]>

Because thermal_debug_cdev_state_update() only creates a duration record
for the old state of a cooling device, if its new state is used for the
first time, there will be no record for it and cdev_dt_seq_show() will
not print the duration information for it even though it contains code
to compute the duration value in that case.

Address this by making thermal_debug_cdev_state_update() create a
duration record for the new state if there is none.

Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
Reported-by: Lukasz Luba <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/thermal/thermal_debugfs.c | 8 ++++++++
1 file changed, 8 insertions(+)

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -433,6 +433,14 @@ void thermal_debug_cdev_state_update(con
}

cdev_dbg->current_state = new_state;
+
+ /*
+ * Create a record for the new state if it is not there, so its
+ * duration will be printed by cdev_dt_seq_show() as expected if it
+ * runs before the next state transition.
+ */
+ thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations, new_state);
+
transition = (old_state << 16) | new_state;

/*
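
As a rough illustration of the idea in this patch, the get-or-create step for the new state can be sketched in userspace C as below. Everything here is an assumption made for the example: struct record_sketch, struct cdev_sketch, record_get(), state_update() and record_count() are hypothetical names, a singly linked list replaces the kernel's list handling, and locking and transition accounting are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-state residency record. */
struct record_sketch {
	struct record_sketch *next;
	int id;			/* cooling device state number */
	int64_t duration_ns;	/* time accumulated in this state */
};

/* Hypothetical, heavily simplified cooling device debug data. */
struct cdev_sketch {
	struct record_sketch *durations;
	int current_state;
	int64_t timestamp_ns;	/* when current_state was entered */
};

/* Return the record for @id, creating it if it does not exist yet. */
static struct record_sketch *record_get(struct record_sketch **head, int id)
{
	struct record_sketch *r;

	for (r = *head; r; r = r->next)
		if (r->id == id)
			return r;

	r = calloc(1, sizeof(*r));
	if (!r)
		return NULL;

	r->id = id;
	r->next = *head;
	*head = r;
	return r;
}

static void state_update(struct cdev_sketch *cdev, int new_state, int64_t now_ns)
{
	struct record_sketch *old;

	assert(cdev);

	if (cdev->current_state == new_state)
		return;

	/* Account the elapsed time to the state being left. */
	old = record_get(&cdev->durations, cdev->current_state);
	if (old) {
		old->duration_ns += now_ns - cdev->timestamp_ns;
		cdev->timestamp_ns = now_ns;
	}

	cdev->current_state = new_state;

	/*
	 * The step this patch adds: make sure the new state has a record,
	 * so a reader that runs before the next transition can find it.
	 */
	(void)record_get(&cdev->durations, new_state);
}

/* Count the records on the list (helper for checking the sketch). */
static int record_count(const struct record_sketch *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;

	return n;
}
```

Without the last record_get() call in state_update(), a state entered for the first time would have no list entry until it is left again, which is the gap described in the changelog.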




2024-04-25 19:08:22

by Lukasz Luba

Subject: Re: [PATCH v2 1/3] thermal/debugfs: Create records for cdev states as they get used



On 4/25/24 15:03, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Because thermal_debug_cdev_state_update() only creates a duration record
> for the old state of a cooling device, if its new state is used for the
> first time, there will be no record for it and cdev_dt_seq_show() will
> not print the duration information for it even though it contains code
> to compute the duration value in that case.
>
> Address this by making thermal_debug_cdev_state_update() create a
> duration record for the new state if there is none.
>
> Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
> Reported-by: Lukasz Luba <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/thermal/thermal_debugfs.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> Index: linux-pm/drivers/thermal/thermal_debugfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_debugfs.c
> +++ linux-pm/drivers/thermal/thermal_debugfs.c
> @@ -433,6 +433,14 @@ void thermal_debug_cdev_state_update(con
> }
>
> cdev_dbg->current_state = new_state;
> +
> + /*
> + * Create a record for the new state if it is not there, so its
> + * duration will be printed by cdev_dt_seq_show() as expected if it
> + * runs before the next state transition.
> + */
> + thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations, new_state);
> +
> transition = (old_state << 16) | new_state;
>
> /*
>
>
>
>

Reviewed-by: Lukasz Luba <[email protected]>

2024-04-25 20:54:44

by Lukasz Luba

Subject: Re: [PATCH v2 3/3] thermal/debugfs: Avoid printing zero duration for mitigation events in progress



On 4/25/24 15:05, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> If a thermal mitigation event is in progress, its duration value has
> not been updated yet, so 0 will be printed as the event duration by
> tze_seq_show(), which is confusing.
>
> Avoid doing that by marking the beginning of the event with the
> KTIME_MIN duration value and making tze_seq_show() compute the current
> event duration on the fly, in which case '>' will be printed instead of
> '=' in the event duration value field.
>
> Similarly, for trip points that have been crossed on the way down, mark
> the end of mitigation with the KTIME_MAX timestamp value and make
> tze_seq_show() compute the current duration on the fly for the trip
> points still involved in the mitigation, in which cases the duration
> value printed by it will be prepended with a '>' character.
>
> Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/thermal/thermal_debugfs.c | 39 ++++++++++++++++++++++++++++++++------
> 1 file changed, 33 insertions(+), 6 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_debugfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_debugfs.c
> +++ linux-pm/drivers/thermal/thermal_debugfs.c
> @@ -552,6 +552,7 @@ static struct tz_episode *thermal_debugf
>
> INIT_LIST_HEAD(&tze->node);
> tze->timestamp = now;
> + tze->duration = KTIME_MIN;
>
> for (i = 0; i < tz->num_trips; i++) {
> tze->trip_stats[i].min = INT_MAX;
> @@ -680,6 +681,9 @@ void thermal_debug_tz_trip_down(struct t
> tze->trip_stats[trip_id].duration =
> ktime_add(delta, tze->trip_stats[trip_id].duration);
>
> + /* Mark the end of mitigation for this trip point. */
> + tze->trip_stats[trip_id].timestamp = KTIME_MAX;
> +
> /*
> * This event closes the mitigation as we are crossing the
> * last trip point the way down.
> @@ -754,15 +758,25 @@ static int tze_seq_show(struct seq_file
> struct thermal_trip_desc *td;
> struct tz_episode *tze;
> const char *type;
> + u64 duration_ms;
> int trip_id;
> + char c;
>
> tze = list_entry((struct list_head *)v, struct tz_episode, node);
>
> - seq_printf(s, ",-Mitigation at %lluus, duration=%llums\n",
> - ktime_to_us(tze->timestamp),
> - ktime_to_ms(tze->duration));
> + if (tze->duration == KTIME_MIN) {
> + /* Mitigation in progress. */
> + duration_ms = ktime_to_ms(ktime_sub(ktime_get(), tze->timestamp));
> + c = '>';
> + } else {
> + duration_ms = ktime_to_ms(tze->duration);
> + c = '=';
> + }
> +
> + seq_printf(s, ",-Mitigation at %lluus, duration%c%llums\n",
> + ktime_to_us(tze->timestamp), c, duration_ms);
>
> - seq_printf(s, "| trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) |\n");
> + seq_printf(s, "| trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) |\n");

So this extra space accounts for the new 'c' character in the rows
below that header, in the 'duration' column. Makes sense.
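
The width arithmetic can be checked with a small userspace sketch. It assumes a hypothetical helper, format_duration_cell(), that formats only the duration cell with the same "%c%*lld" pattern and a width of 10, and it uses snprintf() in place of seq_printf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format just the duration cell of a row. The marker character takes
 * one column in front of the 10-character number field, so the cell
 * is one column wider than it was with "%*lld" alone.
 */
static int format_duration_cell(char *buf, size_t len, char marker,
				long long duration_ms)
{
	assert(buf && len > 0);

	return snprintf(buf, len, "| %c%*lld |", marker, 10, duration_ms);
}
```

Because the marker is printed even when it is a plain space, rows for finished and in-progress trips come out the same width, and the header only needs one additional space to stay aligned.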

>
> for_each_trip_desc(tz, td) {
> const struct thermal_trip *trip = &td->trip;
> @@ -794,12 +808,25 @@ static int tze_seq_show(struct seq_file
> else
> type = "hot";
>
> - seq_printf(s, "| %*d | %*s | %*d | %*d | %*lld | %*d | %*d | %*d |\n",
> + if (trip_stats->timestamp != KTIME_MAX) {
> + /* Mitigation in progress. */
> + ktime_t delta = ktime_sub(ktime_get(),
> + trip_stats->timestamp);
> +
> + delta = ktime_add(delta, trip_stats->duration);
> + duration_ms = ktime_to_ms(delta);
> + c = '>';
> + } else {
> + duration_ms = ktime_to_ms(trip_stats->duration);
> + c = ' ';
> + }
> +
> + seq_printf(s, "| %*d | %*s | %*d | %*d | %c%*lld | %*d | %*d | %*d |\n",
> 4 , trip_id,
> 8, type,
> 9, trip->temperature,
> 9, trip->hysteresis,
> - 10, ktime_to_ms(trip_stats->duration),
> + c, 10, duration_ms,
> 9, trip_stats->avg,
> 9, trip_stats->min,
> 9, trip_stats->max);
>
>
>
>

The code comments help in this particular case, since the special
handling of the KTIME_MIN/MAX values might not be obvious after a while.
That LGTM.

Reviewed-by: Lukasz Luba <[email protected]>

2024-04-25 20:55:54

by Lukasz Luba

Subject: Re: [PATCH v2 0/3] thermal/debugfs: Fix handling of cdev states and mitigation episodes in progress

Hi Rafael,

On 4/25/24 15:01, Rafael J. Wysocki wrote:
> Hi Everyone,
>
> This is an update of
>
> https://lore.kernel.org/linux-pm/5774279.DvuYhMxLoT@kreacher/
>
> and the only non-trivial difference between it and the v1 is a small
> rebase of the second patch (the v1 of which didn't apply).
>
> It generally has been based on top of
>
> https://lore.kernel.org/linux-pm/12427744.O9o76ZdvQC@kreacher/
>
> but it should apply on top of the linux-next branch in linux-pm.git as well.
>
> It is present in the thermal-core-next branch in that tree, along with the
> above series.
>
> Thanks!
>
>
>

I have also tested the patches, so feel free to add the tag as well:

Tested-by: Lukasz Luba <[email protected]>

2024-04-26 09:55:37

by Rafael J. Wysocki

Subject: Re: [PATCH v2 0/3] thermal/debugfs: Fix handling of cdev states and mitigation episodes in progress

On Thu, Apr 25, 2024 at 10:55 PM Lukasz Luba <[email protected]> wrote:
>
> Hi Rafael,
>
> On 4/25/24 15:01, Rafael J. Wysocki wrote:
> > Hi Everyone,
> >
> > This is an update of
> >
> > https://lore.kernel.org/linux-pm/5774279.DvuYhMxLoT@kreacher/
> >
> > and the only non-trivial difference between it and the v1 is a small
> > rebase of the second patch (the v1 of which didn't apply).
> >
> > It generally has been based on top of
> >
> > https://lore.kernel.org/linux-pm/12427744.O9o76ZdvQC@kreacher/
> >
> > but it should apply on top of the linux-next branch in linux-pm.git as well.
> >
> > It is present in the thermal-core-next branch in that tree, along with the
> > above series.
> >
> > Thanks!
> >
> >
> >
>
> I have also tested the patches, so feel free to add the tag as well:
>
> Tested-by: Lukasz Luba <[email protected]>

Thank you!