When searching for the trip points that need to be set, the nearest
higher trip point's temperature is used for the high trip, while the
nearest lower trip point's temperature minus the hysteresis is used for
the low trip. The issue with this logic is that when the current
temperature is inside a trip point's hysteresis range, both the high and
low trips come from the same trip point. As a consequence, instability
can still occur like this:
* the temperature rises slightly and enters the hysteresis range of a
  trip point
* polling happens and updates the trip points to the hysteresis range
* the temperature falls slightly, exiting the hysteresis range, crossing
  the trip point and triggering an IRQ; the trip points are updated again
* repeat

So even though the current hysteresis implementation prevents
instability caused by IRQs triggering on the same temperature value in
both directions, it doesn't prevent instability caused by an IRQ in one
direction and polling in the other.
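
To make the problem concrete, here is a minimal user-space sketch of the
selection logic as it stands before this patch. It is not the kernel
code: struct trip and old_select() are hypothetical stand-ins, and the
trip values used with it below (40000 with a hysteresis of 3000, plus a
second trip at 100000) are only inferred from the boundaries in the
logs further down.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the kernel's struct thermal_trip. */
struct trip {
	int temperature;	/* millidegrees Celsius */
	int hysteresis;		/* millidegrees Celsius */
};

/*
 * Pre-patch selection: the nearest (trip - hysteresis) below the current
 * temperature becomes "low", the nearest trip above it becomes "high".
 * Nothing records whether both values came from the same trip point.
 */
static void old_select(const struct trip *trips, int n, int temp,
		       int *low, int *high)
{
	assert(trips && low && high);

	*low = -INT_MAX;
	*high = INT_MAX;

	for (int i = 0; i < n; i++) {
		int trip_low = trips[i].temperature - trips[i].hysteresis;

		if (trip_low < temp && trip_low > *low)
			*low = trip_low;

		if (trips[i].temperature > temp &&
		    trips[i].temperature < *high)
			*high = trips[i].temperature;
	}
}
```

With trips { 40000, 3000 } and { 100000, 3000 }, a temperature of 37979
yields low = 37000 and high = 40000: both boundaries come from the first
trip point, which is the window the polling step programs.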

To properly implement hysteresis behavior, don't update the trip points
while the temperature is inside the hysteresis range. This way, the
previously set trip points stay in effect, which in a way remembers the
previous state (whether the temperature came from above or below the
range), so the right trip point is already set. The exception is when no
trip point was previously set, in which case there is no previous state,
so it's sensible to allow the hysteresis range as trip points.
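
The rule above can be sketched in user space roughly as follows. This
only models the logic in the diff below: struct zone and
zone_set_trips() are hypothetical stand-ins for the relevant parts of
struct thermal_zone_device and __thermal_zone_set_trips(), and the
return value (true when the window is reprogrammed) is added purely for
illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures. */
struct trip {
	int temperature;	/* millidegrees Celsius */
	int hysteresis;		/* millidegrees Celsius */
};

struct zone {
	const struct trip *trips;
	int num_trips;
	int temperature;	/* last measured temperature */
	int prev_low_trip;	/* -INT_MAX until first programmed */
	int prev_high_trip;	/* INT_MAX until first programmed */
};

/* Returns true if a new low/high window was programmed. */
static bool zone_set_trips(struct zone *tz)
{
	int low = -INT_MAX, high = INT_MAX;
	bool same_trip = false;

	assert(tz && tz->trips);

	for (int i = 0; i < tz->num_trips; i++) {
		const struct trip *t = &tz->trips[i];
		int trip_low = t->temperature - t->hysteresis;
		bool low_set = false;

		if (trip_low < tz->temperature && trip_low > low) {
			low = trip_low;
			low_set = true;
			same_trip = false;
		}

		/* "high" taken from the trip that just set "low"? */
		if (t->temperature > tz->temperature &&
		    t->temperature < high) {
			high = t->temperature;
			same_trip = low_set;
		}
	}

	/* No need to change trip points */
	if (tz->prev_low_trip == low && tz->prev_high_trip == high)
		return false;

	/* Inside a hysteresis range: keep the old window unless unset. */
	if (same_trip && (tz->prev_low_trip != -INT_MAX ||
			  tz->prev_high_trip != INT_MAX))
		return false;

	tz->prev_low_trip = low;
	tz->prev_high_trip = high;
	return true;
}
```

Feeding it the temperatures from the "with this patch applied" log
(36986, 37872, 40058, 38698), with trips assumed to be { 40000, 3000 }
and { 100000, 3000 }, reprograms the window only on the first and third
samples, which is consistent with the two "new temperature boundaries"
lines in that log.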
The following logs show the current behavior when running on a real
machine:
[ 202.524658] thermal thermal_zone0: new temperature boundaries: -2147483647 < x < 40000
203.562817: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=36986 temp=37979
[ 203.562845] thermal thermal_zone0: new temperature boundaries: 37000 < x < 40000
204.176059: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=37979 temp=40028
[ 204.176089] thermal thermal_zone0: new temperature boundaries: 37000 < x < 100000
205.226813: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=40028 temp=38652
[ 205.226842] thermal thermal_zone0: new temperature boundaries: 37000 < x < 40000
And with this patch applied:
[ 184.933415] thermal thermal_zone0: new temperature boundaries: -2147483647 < x < 40000
185.981182: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=36986 temp=37872
186.744685: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=37872 temp=40058
[ 186.744716] thermal thermal_zone0: new temperature boundaries: 37000 < x < 100000
187.773284: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=40058 temp=38698
Fixes: 060c034a9741 ("thermal: Add support for hardware-tracked trip points")
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
---
Changes in v2:
- Changed logic as suggested by Rafael
- Added log example to commit message
- Added fixes tag
drivers/thermal/thermal_trip.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c
index 024e2e365a26..597ac4144e33 100644
--- a/drivers/thermal/thermal_trip.c
+++ b/drivers/thermal/thermal_trip.c
@@ -55,6 +55,7 @@ void __thermal_zone_set_trips(struct thermal_zone_device *tz)
 {
 	struct thermal_trip trip;
 	int low = -INT_MAX, high = INT_MAX;
+	bool same_trip = false;
 	int i, ret;

 	lockdep_assert_held(&tz->lock);
@@ -63,6 +64,7 @@ void __thermal_zone_set_trips(struct thermal_zone_device *tz)
 		return;

 	for (i = 0; i < tz->num_trips; i++) {
+		bool low_set = false;
 		int trip_low;

 		ret = __thermal_zone_get_trip(tz, i , &trip);
@@ -71,18 +73,31 @@ void __thermal_zone_set_trips(struct thermal_zone_device *tz)

 		trip_low = trip.temperature - trip.hysteresis;

-		if (trip_low < tz->temperature && trip_low > low)
+		if (trip_low < tz->temperature && trip_low > low) {
 			low = trip_low;
+			low_set = true;
+			same_trip = false;
+		}

 		if (trip.temperature > tz->temperature &&
-		    trip.temperature < high)
+		    trip.temperature < high) {
 			high = trip.temperature;
+			same_trip = low_set;
+		}
 	}

 	/* No need to change trip points */
 	if (tz->prev_low_trip == low && tz->prev_high_trip == high)
 		return;

+	/*
+	 * If "high" and "low" are the same, skip the change unless this is the
+	 * first time.
+	 */
+	if (same_trip && (tz->prev_low_trip != -INT_MAX ||
+	    tz->prev_high_trip != INT_MAX))
+		return;
+
 	tz->prev_low_trip = low;
 	tz->prev_high_trip = high;
--
2.42.0
On 22/09/23 20:44, Nícolas F. R. A. Prado wrote:
> Fixes: 060c034a9741 ("thermal: Add support for hardware-tracked trip points")
> Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
On Mon, Sep 25, 2023 at 9:29 AM AngeloGioacchino Del Regno
<[email protected]> wrote:
>
> Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Applied as 6.7 material, but I added a Co-developed-by tag for myself,
because it has been based on my patch.
Thanks!
On Fri, Oct 13, 2023 at 05:27:08PM +0200, Rafael J. Wysocki wrote:
> Applied as 6.7 material, but I added a Co-developed-by tag for myself,
> because it has been based on my patch.
Sounds good, thanks! I'll add it myself in situations like this in the future.
Thanks,
Nícolas