2023-05-19 03:32:10

by Eduardo Valentin

Subject: [PATCH 0/7] thermal: enhancements on thermal stats

Hello Rafael and Daniel

After a long hiatus, I am returning to more frequent contributions
to the thermal subsystem, at least until I drain some of the
commits I have in my trees.

This is a first series of several that will come as improvements
on the thermal subsystem that will enable using this subsystem
in the Baseboard Management Controller (BMC) space, as part
of the Nitro BMC project. To do so, a few improvements
and new features were written.

In this series in particular, I present a set of enhancements
on how we are handling statistics. The cooling device stats
are awesome, but I added a few new entries there. I also
introduce stats per thermal zone here too.

I tried to keep documentation as current as possible.
I may have missed a thing or two, so please help me out here.
Testing notes and examples are in each patch.

Let me know of any feedback,

BR,

Cc: "Rafael J. Wysocki" <[email protected]> (supporter:THERMAL)
Cc: Daniel Lezcano <[email protected]> (supporter:THERMAL)
Cc: Amit Kucheria <[email protected]> (reviewer:THERMAL)
Cc: Zhang Rui <[email protected]> (reviewer:THERMAL)
Cc: Jonathan Corbet <[email protected]> (maintainer:DOCUMENTATION)
Cc: [email protected] (open list:THERMAL)
Cc: [email protected] (open list:DOCUMENTATION)
Cc: [email protected] (open list)

Eduardo Valentin (7):
thermal: stats: track time each dev changes due to tz
thermal: stats: track number of change requests due to tz
thermal: stats: introduce thermal zone stats/ directory
thermal: stats: introduce thermal zone stats/min_gradient
thermal: stats: introduce tz time in trip
thermal: core: report errors to governors
thermal: stats: add error accounting to thermal zone

.../driver-api/thermal/sysfs-api.rst | 10 +
drivers/thermal/thermal_core.c | 15 +-
drivers/thermal/thermal_core.h | 16 +
drivers/thermal/thermal_helpers.c | 11 +-
drivers/thermal/thermal_sysfs.c | 495 +++++++++++++++++-
include/linux/thermal.h | 5 +
6 files changed, 539 insertions(+), 13 deletions(-)

--
2.34.1



2023-05-19 03:34:12

by Eduardo Valentin

Subject: [PATCH 4/7] thermal: stats: introduce thermal zone stats/min_gradient

From: Eduardo Valentin <[email protected]>

This patch adds a statistic that tracks
the minimum gradient (dT/dt) to the thermal zone
stats/ directory.

Samples:

$ echo 1000 > emul_temp
$ cat stats/min_gradient
0
$ echo 2000 > emul_temp
$ echo 1000 > emul_temp
$ cat stats/min_gradient
-3460

Cc: "Rafael J. Wysocki" <[email protected]> (supporter:THERMAL)
Cc: Daniel Lezcano <[email protected]> (supporter:THERMAL)
Cc: Amit Kucheria <[email protected]> (reviewer:THERMAL)
Cc: Zhang Rui <[email protected]> (reviewer:THERMAL)
Cc: Jonathan Corbet <[email protected]> (maintainer:DOCUMENTATION)
Cc: [email protected] (open list:THERMAL)
Cc: [email protected] (open list:DOCUMENTATION)
Cc: [email protected] (open list)

Signed-off-by: Eduardo Valentin <[email protected]>
---
.../driver-api/thermal/sysfs-api.rst | 1 +
drivers/thermal/thermal_sysfs.c | 23 +++++++++++++++++++
2 files changed, 24 insertions(+)

diff --git a/Documentation/driver-api/thermal/sysfs-api.rst b/Documentation/driver-api/thermal/sysfs-api.rst
index 18140dbb1ce1..ed5e6ba4e0d7 100644
--- a/Documentation/driver-api/thermal/sysfs-api.rst
+++ b/Documentation/driver-api/thermal/sysfs-api.rst
@@ -358,6 +358,7 @@ Thermal zone device sys I/F, created once it's registered::
|---stats: Directory containing thermal zone device's stats
|---stats/reset_tz_stats: Writes to this file resets the statistics.
|---stats/max_gradient: The maximum recorded dT/dt in uC/ms.
+ |---stats/min_gradient: The minimum recorded dT/dt in uC/ms.

Thermal cooling device sys I/F, created once it's registered::

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index aa28c1cae916..f89ec9a7e8c8 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -542,6 +542,7 @@ static void destroy_trip_attrs(struct thermal_zone_device *tz)
struct thermal_zone_device_stats {
spinlock_t lock; /* protects this struct */
s64 max_gradient;
+ s64 min_gradient;
ktime_t last_time;
};

@@ -569,6 +570,10 @@ static void temperature_stats_update(struct thermal_zone_device *tz)
/* update fastest temperature rise from our perspective */
if (cur_gradient > stats->max_gradient)
stats->max_gradient = cur_gradient;
+
+ /* update fastest temperature decay from our perspective */
+ if (cur_gradient < stats->min_gradient)
+ stats->min_gradient = cur_gradient;
}

void thermal_zone_device_stats_update(struct thermal_zone_device *tz)
@@ -595,6 +600,21 @@ static ssize_t max_gradient_show(struct device *dev,
return ret;
}

+static ssize_t min_gradient_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct thermal_zone_device *tz = to_thermal_zone(dev);
+ struct thermal_zone_device_stats *stats = tz->stats;
+ int ret;
+
+ spin_lock(&stats->lock);
+ temperature_stats_update(tz);
+ ret = snprintf(buf, PAGE_SIZE, "%lld\n", stats->min_gradient);
+ spin_unlock(&stats->lock);
+
+ return ret;
+}
+
static ssize_t
reset_tz_stats_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
@@ -604,6 +624,7 @@ reset_tz_stats_store(struct device *dev, struct device_attribute *attr,

spin_lock(&stats->lock);

+ stats->min_gradient = 0;
stats->max_gradient = 0;
stats->last_time = ktime_get();

@@ -612,10 +633,12 @@ reset_tz_stats_store(struct device *dev, struct device_attribute *attr,
return count;
}

+static DEVICE_ATTR_RO(min_gradient);
static DEVICE_ATTR_RO(max_gradient);
static DEVICE_ATTR_WO(reset_tz_stats);

static struct attribute *thermal_zone_device_stats_attrs[] = {
+ &dev_attr_min_gradient.attr,
&dev_attr_max_gradient.attr,
&dev_attr_reset_tz_stats.attr,
NULL
--
2.34.1


2023-05-19 03:55:43

by Eduardo Valentin

Subject: [PATCH 6/7] thermal: core: report errors to governors

From: Eduardo Valentin <[email protected]>

Currently the thermal governors are not able to
react to temperature read errors, as the thermal core
skips the handling and only logs an error in the kernel log buffer.
This patch adds a way to report these errors
to governors when they happen.

Now, if a governor wants to react to temperature read
errors, it can implement the .check_error() callback.

Cc: "Rafael J. Wysocki" <[email protected]> (supporter:THERMAL)
Cc: Daniel Lezcano <[email protected]> (supporter:THERMAL)
Cc: Amit Kucheria <[email protected]> (reviewer:THERMAL)
Cc: Zhang Rui <[email protected]> (reviewer:THERMAL)
Cc: Jonathan Corbet <[email protected]> (maintainer:DOCUMENTATION)
Cc: [email protected] (open list:THERMAL)
Cc: [email protected] (open list:DOCUMENTATION)
Cc: [email protected] (open list)

Signed-off-by: Eduardo Valentin <[email protected]>
---
drivers/thermal/thermal_core.c | 9 +++++++++
include/linux/thermal.h | 3 +++
2 files changed, 12 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 3ba970c0744f..2ff7d9c7c973 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -313,6 +313,12 @@ static void handle_non_critical_trips(struct thermal_zone_device *tz, int trip)
def_governor->throttle(tz, trip);
}

+static void handle_error_temperature(struct thermal_zone_device *tz, int error)
+{
+ if (tz->governor && tz->governor->check_error)
+ tz->governor->check_error(tz, error);
+}
+
void thermal_zone_device_critical(struct thermal_zone_device *tz)
{
/*
@@ -380,6 +386,9 @@ static void update_temperature(struct thermal_zone_device *tz)
dev_warn(&tz->device,
"failed to read out thermal zone (%d)\n",
ret);
+ /* tell the governor its source is hosed */
+ handle_error_temperature(tz, ret);
+
return;
}

diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 9dc8292f0314..82c8e09a63e0 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -199,6 +199,8 @@ struct thermal_zone_device {
* thermal zone.
* @throttle: callback called for every trip point even if temperature is
* below the trip point temperature
+ * @check_error: callback called whenever temperature updates fail.
+ * Opportunity for the governor to react on errors.
* @governor_list: node in thermal_governor_list (in thermal_core.c)
*/
struct thermal_governor {
@@ -206,6 +208,7 @@ struct thermal_governor {
int (*bind_to_tz)(struct thermal_zone_device *tz);
void (*unbind_from_tz)(struct thermal_zone_device *tz);
int (*throttle)(struct thermal_zone_device *tz, int trip);
+ void (*check_error)(struct thermal_zone_device *tz, int error);
struct list_head governor_list;
};

--
2.34.1


2023-05-24 18:49:05

by Rafael J. Wysocki

Subject: Re: [PATCH 0/7] thermal: enhancements on thermal stats

Hi Eduardo,

On Fri, May 19, 2023 at 5:27 AM Eduardo Valentin <[email protected]> wrote:
>
> Hello Rafael and Daniel
>
> After a long hiatus, I am returning to more frequent contributions
> to the thermal subsystem, at least until I drain some of the
> commits I have in my trees.
>
> This is a first series of several that will come as improvements
> on the thermal subsystem that will enable using this subsystem
> in the Baseboard Management Controller (BMC) space, as part
> of the Nitro BMC project. To do so, a few improvements
> and new features were written.
>
> In this series in particular, I present a set of enhancements
> on how we are handling statistics. The cooling device stats
> are awesome, but I added a few new entries there. I also
> introduce stats per thermal zone here too.
>
> I tried to keep documentation as current as possible.
> I may have missed a thing or two, so please help me out here.
> Testing notes and examples are in each patch.
>
> Let me know of any feedback,
>
> BR,
>
> Cc: "Rafael J. Wysocki" <[email protected]> (supporter:THERMAL)
> Cc: Daniel Lezcano <[email protected]> (supporter:THERMAL)
> Cc: Amit Kucheria <[email protected]> (reviewer:THERMAL)
> Cc: Zhang Rui <[email protected]> (reviewer:THERMAL)
> Cc: Jonathan Corbet <[email protected]> (maintainer:DOCUMENTATION)
> Cc: [email protected] (open list:THERMAL)
> Cc: [email protected] (open list:DOCUMENTATION)
> Cc: [email protected] (open list)
>
> Eduardo Valentin (7):
> thermal: stats: track time each dev changes due to tz
> thermal: stats: track number of change requests due to tz
> thermal: stats: introduce thermal zone stats/ directory
> thermal: stats: introduce thermal zone stats/min_gradient
> thermal: stats: introduce tz time in trip
> thermal: core: report errors to governors
> thermal: stats: add error accounting to thermal zone
>
> .../driver-api/thermal/sysfs-api.rst | 10 +
> drivers/thermal/thermal_core.c | 15 +-
> drivers/thermal/thermal_core.h | 16 +
> drivers/thermal/thermal_helpers.c | 11 +-
> drivers/thermal/thermal_sysfs.c | 495 +++++++++++++++++-
> include/linux/thermal.h | 5 +
> 6 files changed, 539 insertions(+), 13 deletions(-)
>
> --

There are still some other things I need to take care of before I can
get to this series, but I will get to it.

Thanks!

2023-06-05 23:39:27

by Eduardo Valentin

Subject: Re: [PATCH 0/7] thermal: enhancements on thermal stats

On Wed, May 24, 2023 at 08:22:11PM +0200, Rafael J. Wysocki wrote:
>
> There are still some other things I need to take care of before I can
> get to this series, but I will get to it.
>
> Thanks!

Ok, no worries.


--
All the best,
Eduardo Valentin

2023-06-20 18:10:32

by Rafael J. Wysocki

Subject: Re: [PATCH 6/7] thermal: core: report errors to governors

On Fri, May 19, 2023 at 5:27 AM Eduardo Valentin <[email protected]> wrote:
>
> From: Eduardo Valentin <[email protected]>
>
> Currently the thermal governors are not able to
> react to temperature read errors, as the thermal core
> skips the handling and only logs an error in the kernel log buffer.
> This patch adds a way to report these errors
> to governors when they happen.
>
> Now, if a governor wants to react to temperature read
> errors, it can implement the .check_error() callback.

Explaining the use case for this would help a lot.

2023-06-20 18:29:09

by Rafael J. Wysocki

Subject: Re: [PATCH 4/7] thermal: stats: introduce thermal zone stats/min_gradient

On Fri, May 19, 2023 at 5:27 AM Eduardo Valentin <[email protected]> wrote:
>
> From: Eduardo Valentin <[email protected]>
>
> This patch adds a statistic that tracks
> the minimum gradient (dT/dt) to the thermal zone
> stats/ directory.
>
> Samples:
>
> $ echo 1000 > emul_temp
> $ cat stats/min_gradient
> 0
> $ echo 2000 > emul_temp
> $ echo 1000 > emul_temp
> $ cat stats/min_gradient
> -3460
>
> Cc: "Rafael J. Wysocki" <[email protected]> (supporter:THERMAL)
> Cc: Daniel Lezcano <[email protected]> (supporter:THERMAL)
> Cc: Amit Kucheria <[email protected]> (reviewer:THERMAL)
> Cc: Zhang Rui <[email protected]> (reviewer:THERMAL)
> Cc: Jonathan Corbet <[email protected]> (maintainer:DOCUMENTATION)
> Cc: [email protected] (open list:THERMAL)
> Cc: [email protected] (open list:DOCUMENTATION)
> Cc: [email protected] (open list)
>
> Signed-off-by: Eduardo Valentin <[email protected]>

This can be easily folded into the previous patch IMO.

> ---
> .../driver-api/thermal/sysfs-api.rst | 1 +
> drivers/thermal/thermal_sysfs.c | 23 +++++++++++++++++++
> 2 files changed, 24 insertions(+)
>
> diff --git a/Documentation/driver-api/thermal/sysfs-api.rst b/Documentation/driver-api/thermal/sysfs-api.rst
> index 18140dbb1ce1..ed5e6ba4e0d7 100644
> --- a/Documentation/driver-api/thermal/sysfs-api.rst
> +++ b/Documentation/driver-api/thermal/sysfs-api.rst
> @@ -358,6 +358,7 @@ Thermal zone device sys I/F, created once it's registered::
> |---stats: Directory containing thermal zone device's stats
> |---stats/reset_tz_stats: Writes to this file resets the statistics.
> |---stats/max_gradient: The maximum recorded dT/dt in uC/ms.
> + |---stats/min_gradient: The minimum recorded dT/dt in uC/ms.
>
> Thermal cooling device sys I/F, created once it's registered::
>
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index aa28c1cae916..f89ec9a7e8c8 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -542,6 +542,7 @@ static void destroy_trip_attrs(struct thermal_zone_device *tz)
> struct thermal_zone_device_stats {
> spinlock_t lock; /* protects this struct */
> s64 max_gradient;
> + s64 min_gradient;
> ktime_t last_time;
> };
>
> @@ -569,6 +570,10 @@ static void temperature_stats_update(struct thermal_zone_device *tz)
> /* update fastest temperature rise from our perspective */
> if (cur_gradient > stats->max_gradient)
> stats->max_gradient = cur_gradient;
> +
> + /* update fastest temperature decay from our perspective */
> + if (cur_gradient < stats->min_gradient)
> + stats->min_gradient = cur_gradient;
> }
>
> void thermal_zone_device_stats_update(struct thermal_zone_device *tz)
> @@ -595,6 +600,21 @@ static ssize_t max_gradient_show(struct device *dev,
> return ret;
> }
>
> +static ssize_t min_gradient_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct thermal_zone_device *tz = to_thermal_zone(dev);
> + struct thermal_zone_device_stats *stats = tz->stats;
> + int ret;
> +
> + spin_lock(&stats->lock);
> + temperature_stats_update(tz);
> + ret = snprintf(buf, PAGE_SIZE, "%lld\n", stats->min_gradient);
> + spin_unlock(&stats->lock);
> +
> + return ret;
> +}
> +
> static ssize_t
> reset_tz_stats_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
> @@ -604,6 +624,7 @@ reset_tz_stats_store(struct device *dev, struct device_attribute *attr,
>
> spin_lock(&stats->lock);
>
> + stats->min_gradient = 0;
> stats->max_gradient = 0;
> stats->last_time = ktime_get();
>
> @@ -612,10 +633,12 @@ reset_tz_stats_store(struct device *dev, struct device_attribute *attr,
> return count;
> }
>
> +static DEVICE_ATTR_RO(min_gradient);
> static DEVICE_ATTR_RO(max_gradient);
> static DEVICE_ATTR_WO(reset_tz_stats);
>
> static struct attribute *thermal_zone_device_stats_attrs[] = {
> + &dev_attr_min_gradient.attr,
> &dev_attr_max_gradient.attr,
> &dev_attr_reset_tz_stats.attr,
> NULL
> --
> 2.34.1
>

2023-06-20 19:55:48

by Daniel Lezcano

Subject: Re: [PATCH 0/7] thermal: enhancements on thermal stats


Hi Eduardo,

On 19/05/2023 05:27, Eduardo Valentin wrote:
> Hello Rafael and Daniel
>
> After a long hiatus, I am returning to more frequent contributions
> to the thermal subsystem, at least until I drain some of the
> commits I have in my trees.
>
> This is a first series of several that will come as improvements
> on the thermal subsystem that will enable using this subsystem
> in the Baseboard Management Controller (BMC) space, as part
> of the Nitro BMC project. To do so, a few improvements
> and new features were written.
>
> In this series in particular, I present a set of enhancements
> on how we are handling statistics. The cooling device stats
> are awesome, but I added a few new entries there. I also
> introduce stats per thermal zone here too.

From my POV, that kind of information belongs in debugfs. sysfs is not
suitable for that.

The cdev stats are a total mess because of the page size limitation of
sysfs and the explosion of combinations when there is a large number
of states (e.g. a display has 1024 cooling device states, resulting in a
matrix of 1024 x 1024, so more than 4MB of memory).

For the record, I'm working on such statistics [1][2], and have optimized
the cooling device statistics in order to get rid of the existing
sysfs cdev stats.

Actually, all the stats rely on the mitigation episodes. However, for
that we need to correctly identify when they begin and when they end. We
can have a mitigation episode inside another mitigation episode (e.g. passive
mitigation@trip0 and active mitigation@trip1).

This is not working today because the trip point detection is incorrect,
thus the mitigation episodes are also incorrect, consequently the stats
are de facto incorrect.

There are more details at [3], but the change assumes the trip points are
ordered in ascending order, which is wrong; that is why it was not
merged.

The mitigation works but the detection is fuzzy, so the math is
inaccurate. As we are at the boundaries of a temperature limit, the
resulting statistics, when they are not totally inconsistent, do not
show us the information needed to optimize the governors.

All the work around the generic trip points is to fix that.

There is a proposal at LPC to add statistic/debug information for
thermal, maybe you can participate so we can join our efforts?

-- Daniel

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/log/?h=thermal/trip-crossed%2bdebugfs

[2]
https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/log/?h=thermal/debugfs-v2

[3]
https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/commit/?h=thermal/trip-crossed%2bdebugfs&id=7d713a9128ad9a153de9c3f5b854c6f1acfb3064



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


2023-06-21 04:28:42

by Eduardo Valentin

Subject: [PATCH 0/7] thermal: enhancements on thermal stats

On Tue, Jun 20, 2023 at 09:05:07PM +0200, Daniel Lezcano wrote:
>
>
>
> Hi Eduardo,
>
> On 19/05/2023 05:27, Eduardo Valentin wrote:
> > Hello Rafael and Daniel
> >
> > After a long hiatus, I am returning to more frequent contributions
> > to the thermal subsystem, at least until I drain some of the
> > commits I have in my trees.
> >
> > This is a first series of several that will come as improvements
> > on the thermal subsystem that will enable using this subsystem
> > in the Baseboard Management Controller (BMC) space, as part
> > of the Nitro BMC project. To do so, a few improvements
> > and new features were written.
> >
> > In this series in particular, I present a set of enhancements
> > on how we are handling statistics. The cooling device stats
> > are awesome, but I added a few new entries there. I also
> > introduce stats per thermal zone here too.
>
> From my POV, that kind of information belongs in debugfs. sysfs is not
> suitable for that.
>
> The cdev stats are a total mess because of the page size limitation of
> sysfs and the explosion of combinations when there is a large number
> of states (e.g. a display has 1024 cooling device states, resulting in a
> matrix of 1024 x 1024, so more than 4MB of memory).
>
> For the record, I'm working on such statistics [1][2], and have optimized
> the cooling device statistics in order to get rid of the existing
> sysfs cdev stats.
>
> Actually, all the stats rely on the mitigation episodes. However, for
> that we need to correctly identify when they begin and when they end. We
> can have a mitigation episode inside another mitigation episode (e.g. passive
> mitigation@trip0 and active mitigation@trip1).
>
> This is not working today because the trip point detection is incorrect,
> thus the mitigation episodes are also incorrect, consequently the stats
> are de facto incorrect.
>
> There are more details at [3], but the change assumes the trip points are
> ordered in ascending order, which is wrong; that is why it was not
> merged.
>
> The mitigation works but the detection is fuzzy, so the math is
> inaccurate. As we are at the boundaries of a temperature limit, the
> resulting statistics, when they are not totally inconsistent, do not
> show us the information needed to optimize the governors.
>
> All the work around the generic trip points is to fix that.
>
> There is a proposal at LPC to add statistic/debug information for
> thermal, maybe you can participate so we can join our efforts?

I am not sure I will be able to join, but I will look into this and
get back to you soon.

In fact, joining efforts would be awesome!

As for the cdev statistics, I believe the transition table is overkill.
In the cases I have been working with, with 20+ thermal zones and 10+
cdevs assigned to all of the thermal zones, the output is way beyond the
page size limit. Totally agree with that.

I agree this code deserves a cleanup. These patches build on top of what
is currently in mainline. I would also prefer to have this code moved
out of the -sysfs file and handled separately from the actual sysfs node
handling code.

As for debugfs vs. sysfs, my rule of thumb is whether I need this on a
production system or not. The content of the cdev and thermal zone stats
can in fact be used in both ways: (a) on a purely debug image, for a
developer to check governor behavior etc., which matches your view, but
also (b) on an actual production system, where statistics and residency
are collected across the entire population of devices running a
particular governor/settings. In the latter case, debugfs is not the
best fit.


>
> -- Daniel
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/log/?h=thermal/trip-crossed%2bdebugfs
>
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/log/?h=thermal/debugfs-v2
>
> [3]
> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/commit/?h=thermal/trip-crossed%2bdebugfs&id=7d713a9128ad9a153de9c3f5b854c6f1acfb3064
>

I will take a closer look at the above. Thanks for sharing.


>
>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>

--
All the best,
Eduardo Valentin

2023-06-21 04:51:30

by Eduardo Valentin

Subject: [PATCH 6/7] thermal: core: report errors to governors

On Tue, Jun 20, 2023 at 07:29:57PM +0200, Rafael J. Wysocki wrote:
>
>
>
> On Fri, May 19, 2023 at 5:27 AM Eduardo Valentin <[email protected]> wrote:
> >
> > From: Eduardo Valentin <[email protected]>
> >
> > Currently the thermal governors are not able to
> > react to temperature read errors, as the thermal core
> > skips the handling and only logs an error in the kernel log buffer.
> > This patch adds a way to report these errors
> > to governors when they happen.
> >
> > Now, if a governor wants to react to temperature read
> > errors, it can implement the .check_error() callback.
>
> Explaining the use case for this would help a lot.


Yeah, I agree. I also did not send the full series; I will add
the governor changes for this in the next patch series.

The use case here is primarily when temperature reads can fail.
A common example, though not the only one, is an I2C temperature sensor.
While it can be reliable in many cases, it is not guaranteed to
always return a successful temperature read. In fact, it is common to
see a sporadic temperature read failure followed by successful reads.

This patch series will enhance the core to communicate temperature
update errors to the governor, so the governor has the
opportunity to act upon sensor failure.

--
All the best,
Eduardo Valentin