The thermal trip down notification should be triggered when the
temperature goes below the trip temperature minus the hysteresis. But
while the temperature is correctly checked against that, the last
temperature is instead compared against the trip point temperature. The
end result is that the notification won't always be triggered, only when
the temperature happens to drop quick enough so that the last
temperature was still above the trip point temperature.
Fix the incorrect check.
Fixes: 55cdf0a283b8 ("thermal: core: Add notifications call in the framework")
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
---
drivers/thermal/thermal_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 58533ea75cd9..120fcf23b8e5 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -361,7 +361,7 @@ static void handle_thermal_trip(struct thermal_zone_device *tz, int trip_id)
tz->temperature >= trip.temperature)
thermal_notify_tz_trip_up(tz->id, trip_id,
tz->temperature);
- if (tz->last_temperature >= trip.temperature &&
+ if (tz->last_temperature >= (trip.temperature - trip.hysteresis) &&
tz->temperature < (trip.temperature - trip.hysteresis))
thermal_notify_tz_trip_down(tz->id, trip_id,
tz->temperature);
--
2.42.0
Hi Nicolas,
On 22/09/2023 21:27, Nícolas F. R. A. Prado wrote:
> The thermal trip down notification should be triggered when the
> temperature goes below the trip temperature minus the hysteresis. But
> while the temperature is correctly checked against that, the last
> temperature is instead compared against the trip point temperature. The
> end result is that the notification won't always be triggered, only when
> the temperature happens to drop quick enough so that the last
> temperature was still above the trip point temperature.
>
> Fix the incorrect check.
>
> Fixes: 55cdf0a283b8 ("thermal: core: Add notifications call in the framework")
> Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
>
> ---
>
> drivers/thermal/thermal_core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 58533ea75cd9..120fcf23b8e5 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -361,7 +361,7 @@ static void handle_thermal_trip(struct thermal_zone_device *tz, int trip_id)
> tz->temperature >= trip.temperature)
> thermal_notify_tz_trip_up(tz->id, trip_id,
> tz->temperature);
> - if (tz->last_temperature >= trip.temperature &&
> + if (tz->last_temperature >= (trip.temperature - trip.hysteresis) &&
> tz->temperature < (trip.temperature - trip.hysteresis))
> thermal_notify_tz_trip_down(tz->id, trip_id,
> tz->temperature);
We already did a try to fix the thermal trip cross notification but this
is not sufficient for a full fix.
We are receiving multiple notifications from the same event, all this
due to the hysteresis.
Let's say, we have a trip point T and a hysteresis H.
There is a trip point crossed the way up when:
last_temperature < T and temperature >= T
At this point, we send a notification
Now, the temperature decreases but it stays in the hysteresis:
last_temperature >= T and temperature > (T - H)
And then, the temperature increases again and is greater than T.
last_temperature > (T - H) and temperature >= T
We send a second notification.
With the mitigation kicking in at temp >= T, we end up with multiple
events 'temperature crossed the trip point the way up"'. That forces the
receiver of the events to detect duplicate events in order to ignore them.
More info:
https://lore.kernel.org/all/[email protected]/
We have done a lot of cleanups to use the 'generic trip points' and
remove those get_trip_* ops. So the trip point structure is a core
component being reused by the drivers and registered as an array.
Next step is to make sure the trip points are ordered in the array, so
we can implement the correct trip point crossing detection.
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Il 22/09/23 21:27, Nícolas F. R. A. Prado ha scritto:
> The thermal trip down notification should be triggered when the
> temperature goes below the trip temperature minus the hysteresis. But
> while the temperature is correctly checked against that, the last
> temperature is instead compared against the trip point temperature. The
> end result is that the notification won't always be triggered, only when
> the temperature happens to drop quick enough so that the last
> temperature was still above the trip point temperature.
>
> Fix the incorrect check.
>
> Fixes: 55cdf0a283b8 ("thermal: core: Add notifications call in the framework")
> Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
On Mon, Sep 25, 2023 at 12:19:18PM +0200, Daniel Lezcano wrote:
>
> Hi Nicolas,
>
> On 22/09/2023 21:27, Nícolas F. R. A. Prado wrote:
> > The thermal trip down notification should be triggered when the
> > temperature goes below the trip temperature minus the hysteresis. But
> > while the temperature is correctly checked against that, the last
> > temperature is instead compared against the trip point temperature. The
> > end result is that the notification won't always be triggered, only when
> > the temperature happens to drop quick enough so that the last
> > temperature was still above the trip point temperature.
> >
> > Fix the incorrect check.
> >
> > Fixes: 55cdf0a283b8 ("thermal: core: Add notifications call in the framework")
> > Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
> >
> > ---
> >
> > drivers/thermal/thermal_core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 58533ea75cd9..120fcf23b8e5 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -361,7 +361,7 @@ static void handle_thermal_trip(struct thermal_zone_device *tz, int trip_id)
> > tz->temperature >= trip.temperature)
> > thermal_notify_tz_trip_up(tz->id, trip_id,
> > tz->temperature);
> > - if (tz->last_temperature >= trip.temperature &&
> > + if (tz->last_temperature >= (trip.temperature - trip.hysteresis) &&
> > tz->temperature < (trip.temperature - trip.hysteresis))
> > thermal_notify_tz_trip_down(tz->id, trip_id,
> > tz->temperature);
>
> We already did a try to fix the thermal trip cross notification but this is
> not sufficient for a full fix.
>
> We are receiving multiple notifications from the same event, all this due to
> the hysteresis.
>
> Let's say, we have a trip point T and a hysteresis H.
>
> There is a trip point crossed the way up when:
>
> last_temperature < T and temperature >= T
>
> At this point, we send a notification
>
> Now, the temperature decreases but it stays in the hysteresis:
>
> last_temperature >= T and temperature > (T - H)
>
> And then, the temperature increases again and is greater than T.
>
> last_temperature > (T - H) and temperature >= T
>
> We send a second notification.
Right, I've observed this issue in the logic that updates the trip points, and
just sent the v2 fix for it:
https://lore.kernel.org/all/[email protected]
I wasn't aware of the cleanups you pointed to below, but at least in their
current form it doesn't seem they would take care of fixing the trip point
update logic like I did in that patch. So please do take a look.
It was while testing that patch that I stumbled upon this code for the
notification and just quickly fixed it. But indeed this patch is not a full fix
as the one you pointed to, so let's wait for that one.
Thanks,
Nícolas
>
> With the mitigation kicking in at temp >= T, we end up with multiple events
> 'temperature crossed the trip point the way up"'. That forces the receiver
> of the events to detect duplicate events in order to ignore them.
>
> More info:
>
> https://lore.kernel.org/all/[email protected]/
>
> We have done a lot of cleanups to use the 'generic trip points' and remove
> those get_trip_* ops. So the trip point structure is a core component being
> reused by the drivers and registered as an array.
>
> Next step is to make sure the trip points are ordered in the array, so we
> can implement the correct trip point crossing detection.
>
>
>
>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>