Let's assume the following setup:
- trip 0 = 65°C
- trip 1 = 70°C
- trip 2 = 75°C
The current temperature is 35°C.
The interrupt is setup to fire at 65°C. If the thermal capacity is
saturated it is possible the temperature jumps to 72°c when reading
the temperature after the interrupt fired when 65°C was crossed. That
means we should have two events notified to userspace. The first one
for trip 0 and the second one for trip 1.
When the function thermal_zone_update() is called from the threaded
interrupt, it will read the temperature and then call for_each_trip()
which in turns call handle_trip_point().
This function will check:
if (tz->last_temperature < trip->temperature &&
tz->temperature >= trip->temperature)
thermal_notify_tz_trip_up()
So here, we will call this function with trip0 followed by trip1. That
will result in an event for each trip point, reflecting the trip point
being crossed the way up with a temperature raising. So far, so good.
Usually the sensors have an interrupt when the temperature is crossed
the way up but not the way down, so there an extra delay corresponding
to the passive polling where the temperature could have dropped and
crossed more than one trip point. This scenario is likely to happen
more often when multiple trip points are specified. So considering the
same setup after crossing the trip 2, we stop the workload responsible
of the heat and the temperature drops suddenly to 62°C. In this case,
the next polling will call thermal_zone_device_update(), then
for_each_trip() and handle_trip_point().
This function will check:
if (tz->last_temperature >= trip->temperature &&
tz->temperature < trip->temperature - trip->hysteresis)
thermal_notify_tz_trip_down()
The loop for_each_trip() will call trip0, 1 and 2. That will result in
generating the events for trip0, 1 and 2, in the wrong order. That is
not reflecting the thermal dynamic and puzzles the userspace
monitoring the temperature.
Fix this by inspecting the trend of the temperature. If it is raising,
then we browse the trip point in the ascending order, if it is falling
then we browse in the descending order.
Signed-off-by: Daniel Lezcano <[email protected]>
---
drivers/thermal/thermal_core.c | 8 ++++++--
drivers/thermal/thermal_core.h | 3 +++
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index dfaa6341694a..abb8ee5c9afe 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -473,8 +473,12 @@ void __thermal_zone_device_update(struct thermal_zone_device *tz,
tz->notify_event = event;
- for_each_trip(tz, trip)
- handle_thermal_trip(tz, trip);
+ if (tz->last_temperature < tz->temperature)
+ for_each_trip(tz, trip)
+ handle_thermal_trip(tz, trip);
+ else
+ for_each_trip_reverse(tz, trip)
+ handle_thermal_trip(tz, trip);
monitor_thermal_zone(tz);
}
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index e9c099ecdd0f..0072b3d4039e 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -123,6 +123,9 @@ void thermal_governor_update_tz(struct thermal_zone_device *tz,
#define for_each_trip(__tz, __trip) \
for (__trip = __tz->trips; __trip - __tz->trips < __tz->num_trips; __trip++)
+#define for_each_trip_reverse(__tz, __trip) \
+ for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
+
void __thermal_zone_set_trips(struct thermal_zone_device *tz);
int thermal_zone_trip_id(const struct thermal_zone_device *tz,
const struct thermal_trip *trip);
--
2.34.1
On Wed, Mar 6, 2024 at 9:54 AM Daniel Lezcano <[email protected]> wrote:
>
> Let's assume the following setup:
>
> - trip 0 = 65°C
> - trip 1 = 70°C
> - trip 2 = 75°C
>
> The current temperature is 35°C.
>
> The interrupt is setup to fire at 65°C. If the thermal capacity is
> saturated it is possible the temperature jumps to 72°c when reading
> the temperature after the interrupt fired when 65°C was crossed. That
> means we should have two events notified to userspace. The first one
> for trip 0 and the second one for trip 1.
>
> When the function thermal_zone_update() is called from the threaded
> interrupt, it will read the temperature and then call for_each_trip()
> which in turns call handle_trip_point().
>
> This function will check:
>
> if (tz->last_temperature < trip->temperature &&
> tz->temperature >= trip->temperature)
> thermal_notify_tz_trip_up()
For the mainline:
$ git grep handle_trip_point | cat
$
Do you mean handle_thermal_trip()?
But it doesn't do the above in the mainline. It does (comments omitted)
if (tz->last_temperature < trip->threshold) {
if (tz->temperature >= trip->temperature) {
thermal_notify_tz_trip_up(tz, trip);
thermal_debug_tz_trip_up(tz, trip);
trip->threshold = trip->temperature - trip->hysteresis;
} else {
trip->threshold = trip->temperature;
}
}
>
> So here, we will call this function with trip0 followed by trip1. That
> will result in an event for each trip point, reflecting the trip point
> being crossed the way up with a temperature raising. So far, so good.
>
> Usually the sensors have an interrupt when the temperature is crossed
> the way up but not the way down, so there an extra delay corresponding
> to the passive polling where the temperature could have dropped and
> crossed more than one trip point. This scenario is likely to happen
> more often when multiple trip points are specified. So considering the
> same setup after crossing the trip 2, we stop the workload responsible
> of the heat and the temperature drops suddenly to 62°C. In this case,
> the next polling will call thermal_zone_device_update(), then
> for_each_trip() and handle_trip_point().
>
> This function will check:
>
> if (tz->last_temperature >= trip->temperature &&
> tz->temperature < trip->temperature - trip->hysteresis)
> thermal_notify_tz_trip_down()
Again, assuming that you mean handle_thermal_trip(), the above is not
the current mainline code, which is (comments omitted)
if (tz->last_temperature >= trip->threshold) {
if (tz->temperature < trip->temperature - trip->hysteresis) {
thermal_notify_tz_trip_down(tz, trip);
thermal_debug_tz_trip_down(tz, trip);
trip->threshold = trip->temperature;
} else {
trip->threshold = trip->temperature - trip->hysteresis;
}
}
I guess this doesn't matter here?
> The loop for_each_trip() will call trip0, 1 and 2. That will result in
> generating the events for trip0, 1 and 2, in the wrong order. That is
> not reflecting the thermal dynamic and puzzles the userspace
> monitoring the temperature.
Only if the trips are ordered in a specific way, but they don't need
to be ordered in any way.
> Fix this by inspecting the trend of the temperature. If it is raising,
> then we browse the trip point in the ascending order, if it is falling
> then we browse in the descending order.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> ---
> drivers/thermal/thermal_core.c | 8 ++++++--
> drivers/thermal/thermal_core.h | 3 +++
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index dfaa6341694a..abb8ee5c9afe 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -473,8 +473,12 @@ void __thermal_zone_device_update(struct thermal_zone_device *tz,
>
> tz->notify_event = event;
>
> - for_each_trip(tz, trip)
> - handle_thermal_trip(tz, trip);
> + if (tz->last_temperature < tz->temperature)
> + for_each_trip(tz, trip)
> + handle_thermal_trip(tz, trip);
> + else
> + for_each_trip_reverse(tz, trip)
> + handle_thermal_trip(tz, trip);
This works assuming a "proper" ordering of the trips.
>
> monitor_thermal_zone(tz);
> }
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index e9c099ecdd0f..0072b3d4039e 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -123,6 +123,9 @@ void thermal_governor_update_tz(struct thermal_zone_device *tz,
> #define for_each_trip(__tz, __trip) \
> for (__trip = __tz->trips; __trip - __tz->trips < __tz->num_trips; __trip++)
>
> +#define for_each_trip_reverse(__tz, __trip) \
> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
> +
> void __thermal_zone_set_trips(struct thermal_zone_device *tz);
> int thermal_zone_trip_id(const struct thermal_zone_device *tz,
> const struct thermal_trip *trip);
> --
Generally speaking, this is a matter of getting alignment on the
expectations between the kernel and user space.
It looks like user space expects to get the notifications in the order
of either growing or falling temperatures, depending on the direction
of the temperature change. Ordering the trips in the kernel is not
practical, but the notifications can be ordered in principle. Is this
what you'd like to do?
Or can user space be bothered with recognizing that it may get the
notifications for different trips out of order?
On 06/03/2024 13:02, Rafael J. Wysocki wrote:
[ ... ]
>> +#define for_each_trip_reverse(__tz, __trip) \
>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
>> +
>> void __thermal_zone_set_trips(struct thermal_zone_device *tz);
>> int thermal_zone_trip_id(const struct thermal_zone_device *tz,
>> const struct thermal_trip *trip);
>> --
>
> Generally speaking, this is a matter of getting alignment on the
> expectations between the kernel and user space.
>
> It looks like user space expects to get the notifications in the order
> of either growing or falling temperatures, depending on the direction
> of the temperature change. Ordering the trips in the kernel is not
> practical, but the notifications can be ordered in principle. Is this
> what you'd like to do?
Yes
> Or can user space be bothered with recognizing that it may get the
> notifications for different trips out of order?
IMO it is a bad information if the trip points events are coming
unordered. The temperature signal is a time related measurements, the
userspace should receive thermal information from this signal in the
right order. It sounds strange to track the temperature signal in the
kernel, then scramble the information, pass it to the userspace and
except it to apply some kind of logic to unscramble it.
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <[email protected]> wrote:
>
> On 06/03/2024 13:02, Rafael J. Wysocki wrote:
>
> [ ... ]
>
> >> +#define for_each_trip_reverse(__tz, __trip) \
> >> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
> >> +
> >> void __thermal_zone_set_trips(struct thermal_zone_device *tz);
> >> int thermal_zone_trip_id(const struct thermal_zone_device *tz,
> >> const struct thermal_trip *trip);
> >> --
> >
> > Generally speaking, this is a matter of getting alignment on the
> > expectations between the kernel and user space.
> >
> > It looks like user space expects to get the notifications in the order
> > of either growing or falling temperatures, depending on the direction
> > of the temperature change. Ordering the trips in the kernel is not
> > practical, but the notifications can be ordered in principle. Is this
> > what you'd like to do?
>
> Yes
>
> > Or can user space be bothered with recognizing that it may get the
> > notifications for different trips out of order?
>
> IMO it is a bad information if the trip points events are coming
> unordered. The temperature signal is a time related measurements, the
> userspace should receive thermal information from this signal in the
> right order. It sounds strange to track the temperature signal in the
> kernel, then scramble the information, pass it to the userspace and
> except it to apply some kind of logic to unscramble it.
So the notifications can be ordered before sending them out, as long
as they are produced by a single __thermal_zone_device_update() call.
I guess you also would like the thermal_debug_tz_trip_up/down() calls
to be ordered, wouldn't you?
On 06/03/2024 13:53, Rafael J. Wysocki wrote:
> On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <[email protected]> wrote:
>>
>> On 06/03/2024 13:02, Rafael J. Wysocki wrote:
>>
>> [ ... ]
>>
>>>> +#define for_each_trip_reverse(__tz, __trip) \
>>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
>>>> +
>>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz);
>>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz,
>>>> const struct thermal_trip *trip);
>>>> --
>>>
>>> Generally speaking, this is a matter of getting alignment on the
>>> expectations between the kernel and user space.
>>>
>>> It looks like user space expects to get the notifications in the order
>>> of either growing or falling temperatures, depending on the direction
>>> of the temperature change. Ordering the trips in the kernel is not
>>> practical, but the notifications can be ordered in principle. Is this
>>> what you'd like to do?
>>
>> Yes
>>
>>> Or can user space be bothered with recognizing that it may get the
>>> notifications for different trips out of order?
>>
>> IMO it is a bad information if the trip points events are coming
>> unordered. The temperature signal is a time related measurements, the
>> userspace should receive thermal information from this signal in the
>> right order. It sounds strange to track the temperature signal in the
>> kernel, then scramble the information, pass it to the userspace and
>> except it to apply some kind of logic to unscramble it.
>
> So the notifications can be ordered before sending them out, as long
> as they are produced by a single __thermal_zone_device_update() call.
>
> I guess you also would like the thermal_debug_tz_trip_up/down() calls
> to be ordered, wouldn't you?
Right
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Wed, Mar 6, 2024 at 2:16 PM Daniel Lezcano <[email protected]> wrote:
>
> On 06/03/2024 13:53, Rafael J. Wysocki wrote:
> > On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <[email protected]> wrote:
> >>
> >> On 06/03/2024 13:02, Rafael J. Wysocki wrote:
> >>
> >> [ ... ]
> >>
> >>>> +#define for_each_trip_reverse(__tz, __trip) \
> >>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
> >>>> +
> >>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz);
> >>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz,
> >>>> const struct thermal_trip *trip);
> >>>> --
> >>>
> >>> Generally speaking, this is a matter of getting alignment on the
> >>> expectations between the kernel and user space.
> >>>
> >>> It looks like user space expects to get the notifications in the order
> >>> of either growing or falling temperatures, depending on the direction
> >>> of the temperature change. Ordering the trips in the kernel is not
> >>> practical, but the notifications can be ordered in principle. Is this
> >>> what you'd like to do?
> >>
> >> Yes
> >>
> >>> Or can user space be bothered with recognizing that it may get the
> >>> notifications for different trips out of order?
> >>
> >> IMO it is a bad information if the trip points events are coming
> >> unordered. The temperature signal is a time related measurements, the
> >> userspace should receive thermal information from this signal in the
> >> right order. It sounds strange to track the temperature signal in the
> >> kernel, then scramble the information, pass it to the userspace and
> >> except it to apply some kind of logic to unscramble it.
> >
> > So the notifications can be ordered before sending them out, as long
> > as they are produced by a single __thermal_zone_device_update() call.
> >
> > I guess you also would like the thermal_debug_tz_trip_up/down() calls
> > to be ordered, wouldn't you?
>
> Right
I have an idea how to do this, but it is based on a couple of patches
that I've been working on in the meantime.
Let me post these patches first and then I'll send a prototype patch
addressing this on top of them.
On 06/03/2024 16:41, Rafael J. Wysocki wrote:
> On Wed, Mar 6, 2024 at 2:16 PM Daniel Lezcano <[email protected]> wrote:
>>
>> On 06/03/2024 13:53, Rafael J. Wysocki wrote:
>>> On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <[email protected]> wrote:
>>>>
>>>> On 06/03/2024 13:02, Rafael J. Wysocki wrote:
>>>>
>>>> [ ... ]
>>>>
>>>>>> +#define for_each_trip_reverse(__tz, __trip) \
>>>>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
>>>>>> +
>>>>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz);
>>>>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz,
>>>>>> const struct thermal_trip *trip);
>>>>>> --
>>>>>
>>>>> Generally speaking, this is a matter of getting alignment on the
>>>>> expectations between the kernel and user space.
>>>>>
>>>>> It looks like user space expects to get the notifications in the order
>>>>> of either growing or falling temperatures, depending on the direction
>>>>> of the temperature change. Ordering the trips in the kernel is not
>>>>> practical, but the notifications can be ordered in principle. Is this
>>>>> what you'd like to do?
>>>>
>>>> Yes
>>>>
>>>>> Or can user space be bothered with recognizing that it may get the
>>>>> notifications for different trips out of order?
>>>>
>>>> IMO it is a bad information if the trip points events are coming
>>>> unordered. The temperature signal is a time related measurements, the
>>>> userspace should receive thermal information from this signal in the
>>>> right order. It sounds strange to track the temperature signal in the
>>>> kernel, then scramble the information, pass it to the userspace and
>>>> except it to apply some kind of logic to unscramble it.
>>>
>>> So the notifications can be ordered before sending them out, as long
>>> as they are produced by a single __thermal_zone_device_update() call.
>>>
>>> I guess you also would like the thermal_debug_tz_trip_up/down() calls
>>> to be ordered, wouldn't you?
>>
>> Right
>
> I have an idea how to do this, but it is based on a couple of patches
> that I've been working on in the meantime.
>
> Let me post these patches first and then I'll send a prototype patch
> addressing this on top of them.
That is awesome, thanks !
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Wednesday, March 6, 2024 4:55:51 PM CET Daniel Lezcano wrote:
> On 06/03/2024 16:41, Rafael J. Wysocki wrote:
> > On Wed, Mar 6, 2024 at 2:16 PM Daniel Lezcano <[email protected]> wrote:
> >>
> >> On 06/03/2024 13:53, Rafael J. Wysocki wrote:
> >>> On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <[email protected]> wrote:
> >>>>
> >>>> On 06/03/2024 13:02, Rafael J. Wysocki wrote:
> >>>>
> >>>> [ ... ]
> >>>>
> >>>>>> +#define for_each_trip_reverse(__tz, __trip) \
> >>>>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
> >>>>>> +
> >>>>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz);
> >>>>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz,
> >>>>>> const struct thermal_trip *trip);
> >>>>>> --
> >>>>>
> >>>>> Generally speaking, this is a matter of getting alignment on the
> >>>>> expectations between the kernel and user space.
> >>>>>
> >>>>> It looks like user space expects to get the notifications in the order
> >>>>> of either growing or falling temperatures, depending on the direction
> >>>>> of the temperature change. Ordering the trips in the kernel is not
> >>>>> practical, but the notifications can be ordered in principle. Is this
> >>>>> what you'd like to do?
> >>>>
> >>>> Yes
> >>>>
> >>>>> Or can user space be bothered with recognizing that it may get the
> >>>>> notifications for different trips out of order?
> >>>>
> >>>> IMO it is a bad information if the trip points events are coming
> >>>> unordered. The temperature signal is a time related measurements, the
> >>>> userspace should receive thermal information from this signal in the
> >>>> right order. It sounds strange to track the temperature signal in the
> >>>> kernel, then scramble the information, pass it to the userspace and
> >>>> except it to apply some kind of logic to unscramble it.
> >>>
> >>> So the notifications can be ordered before sending them out, as long
> >>> as they are produced by a single __thermal_zone_device_update() call.
> >>>
> >>> I guess you also would like the thermal_debug_tz_trip_up/down() calls
> >>> to be ordered, wouldn't you?
> >>
> >> Right
> >
> > I have an idea how to do this, but it is based on a couple of patches
> > that I've been working on in the meantime.
> >
> > Let me post these patches first and then I'll send a prototype patch
> > addressing this on top of them.
>
> That is awesome, thanks !
Anytime!
Now that I've posted this series:
https://lore.kernel.org/linux-pm/4558384.LvFx2qVVIh@kreacher/
I can append the patch below that is based on it.
The idea is really straightforward: Instead of sending the notifications
and recording the stats right away, create two lists of trips for which
they need to be send, sort them and then send the notifications etc in
the right order. I want to avoid explicit memory allocations that can
fail in principle, which is why lists are used.
The reason why two lists are used is in case the trips are updated and
that's why they appear to be crossed (which may not depend on the actual
temperature change).
One caveat is that the lists are sorted by trip thresholds (because they
are the real values take into account in the code), but user space may
expect them to be sorted by trip temperatures instead. That can be changed.
---
drivers/thermal/thermal_core.c | 39 +++++++++++++++++++++++++++++++++------
drivers/thermal/thermal_core.h | 1 +
2 files changed, 34 insertions(+), 6 deletions(-)
Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -15,6 +15,7 @@
#include <linux/slab.h>
#include <linux/kdev_t.h>
#include <linux/idr.h>
+#include <linux/list_sort.h>
#include <linux/thermal.h>
#include <linux/reboot.h>
#include <linux/string.h>
@@ -361,7 +362,9 @@ static void handle_critical_trips(struct
}
static void handle_thermal_trip(struct thermal_zone_device *tz,
- struct thermal_trip_desc *td)
+ struct thermal_trip_desc *td,
+ struct list_head *way_up_list,
+ struct list_head *way_down_list)
{
const struct thermal_trip *trip = &td->trip;
@@ -382,8 +385,7 @@ static void handle_thermal_trip(struct t
* the threshold and the trip temperature will be equal.
*/
if (tz->temperature >= trip->temperature) {
- thermal_notify_tz_trip_up(tz, trip);
- thermal_debug_tz_trip_up(tz, trip);
+ list_add_tail(&td->notify_list_node, way_up_list);
td->threshold = trip->temperature - trip->hysteresis;
} else {
td->threshold = trip->temperature;
@@ -400,8 +402,7 @@ static void handle_thermal_trip(struct t
* the trip.
*/
if (tz->temperature < trip->temperature - trip->hysteresis) {
- thermal_notify_tz_trip_down(tz, trip);
- thermal_debug_tz_trip_down(tz, trip);
+ list_add(&td->notify_list_node, way_down_list);
td->threshold = trip->temperature;
} else {
td->threshold = trip->temperature - trip->hysteresis;
@@ -457,10 +458,24 @@ static void thermal_zone_device_init(str
pos->initialized = false;
}
+static int thermal_trip_notify_cmp(void *ascending, const struct list_head *a,
+ const struct list_head *b)
+{
+ struct thermal_trip_desc *tda = container_of(a, struct thermal_trip_desc,
+ notify_list_node);
+ struct thermal_trip_desc *tdb = container_of(b, struct thermal_trip_desc,
+ notify_list_node);
+ int ret = tdb->threshold - tda->threshold;
+
+ return ascending ? ret : -ret;
+}
+
void __thermal_zone_device_update(struct thermal_zone_device *tz,
enum thermal_notify_event event)
{
struct thermal_trip_desc *td;
+ LIST_HEAD(way_down_list);
+ LIST_HEAD(way_up_list);
if (tz->suspended)
return;
@@ -475,7 +490,19 @@ void __thermal_zone_device_update(struct
tz->notify_event = event;
for_each_trip_desc(tz, td)
- handle_thermal_trip(tz, td);
+ handle_thermal_trip(tz, td, &way_up_list, &way_down_list);
+
+ list_sort((void *)true, &way_up_list, thermal_trip_notify_cmp);
+ list_for_each_entry(td, &way_up_list, notify_list_node) {
+ thermal_notify_tz_trip_up(tz, &td->trip);
+ thermal_debug_tz_trip_up(tz, &td->trip);
+ }
+
+ list_sort(NULL, &way_down_list, thermal_trip_notify_cmp);
+ list_for_each_entry(td, &way_down_list, notify_list_node) {
+ thermal_notify_tz_trip_down(tz, &td->trip);
+ thermal_debug_tz_trip_down(tz, &td->trip);
+ }
monitor_thermal_zone(tz);
}
Index: linux-pm/drivers/thermal/thermal_core.h
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.h
+++ linux-pm/drivers/thermal/thermal_core.h
@@ -17,6 +17,7 @@
struct thermal_trip_desc {
struct thermal_trip trip;
+ struct list_head notify_list_node;
int threshold;
};
On Wed, Mar 6, 2024 at 8:32 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Wednesday, March 6, 2024 4:55:51 PM CET Daniel Lezcano wrote:
> > On 06/03/2024 16:41, Rafael J. Wysocki wrote:
> > > On Wed, Mar 6, 2024 at 2:16 PM Daniel Lezcano <[email protected]> wrote:
> > >>
> > >> On 06/03/2024 13:53, Rafael J. Wysocki wrote:
> > >>> On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <[email protected]> wrote:
> > >>>>
> > >>>> On 06/03/2024 13:02, Rafael J. Wysocki wrote:
> > >>>>
> > >>>> [ ... ]
> > >>>>
> > >>>>>> +#define for_each_trip_reverse(__tz, __trip) \
> > >>>>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--)
> > >>>>>> +
> > >>>>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz);
> > >>>>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz,
> > >>>>>> const struct thermal_trip *trip);
> > >>>>>> --
> > >>>>>
> > >>>>> Generally speaking, this is a matter of getting alignment on the
> > >>>>> expectations between the kernel and user space.
> > >>>>>
> > >>>>> It looks like user space expects to get the notifications in the order
> > >>>>> of either growing or falling temperatures, depending on the direction
> > >>>>> of the temperature change. Ordering the trips in the kernel is not
> > >>>>> practical, but the notifications can be ordered in principle. Is this
> > >>>>> what you'd like to do?
> > >>>>
> > >>>> Yes
> > >>>>
> > >>>>> Or can user space be bothered with recognizing that it may get the
> > >>>>> notifications for different trips out of order?
> > >>>>
> > >>>> IMO it is a bad information if the trip points events are coming
> > >>>> unordered. The temperature signal is a time related measurements, the
> > >>>> userspace should receive thermal information from this signal in the
> > >>>> right order. It sounds strange to track the temperature signal in the
> > >>>> kernel, then scramble the information, pass it to the userspace and
> > >>>> except it to apply some kind of logic to unscramble it.
> > >>>
> > >>> So the notifications can be ordered before sending them out, as long
> > >>> as they are produced by a single __thermal_zone_device_update() call.
> > >>>
> > >>> I guess you also would like the thermal_debug_tz_trip_up/down() calls
> > >>> to be ordered, wouldn't you?
> > >>
> > >> Right
> > >
> > > I have an idea how to do this, but it is based on a couple of patches
> > > that I've been working on in the meantime.
> > >
> > > Let me post these patches first and then I'll send a prototype patch
> > > addressing this on top of them.
> >
> > That is awesome, thanks !
>
> Anytime!
>
> Now that I've posted this series:
>
> https://lore.kernel.org/linux-pm/4558384.LvFx2qVVIh@kreacher/
>
> I can append the patch below that is based on it.
>
> The idea is really straightforward: Instead of sending the notifications
> and recording the stats right away, create two lists of trips for which
> they need to be send, sort them and then send the notifications etc in
> the right order. I want to avoid explicit memory allocations that can
> fail in principle, which is why lists are used.
>
> The reason why two lists are used is in case the trips are updated and
> that's why they appear to be crossed (which may not depend on the actual
> temperature change).
>
> One caveat is that the lists are sorted by trip thresholds (because they
> are the real values take into account in the code), but user space may
> expect them to be sorted by trip temperatures instead. That can be changed.
I've concluded that it is better to sort by trip temperature, because
the thresholds used here are the new ones (computed after the trips
have been crossed) which may be confusing.
> ---
> drivers/thermal/thermal_core.c | 39 +++++++++++++++++++++++++++++++++------
> drivers/thermal/thermal_core.h | 1 +
> 2 files changed, 34 insertions(+), 6 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -15,6 +15,7 @@
> #include <linux/slab.h>
> #include <linux/kdev_t.h>
> #include <linux/idr.h>
> +#include <linux/list_sort.h>
> #include <linux/thermal.h>
> #include <linux/reboot.h>
> #include <linux/string.h>
> @@ -361,7 +362,9 @@ static void handle_critical_trips(struct
> }
>
> static void handle_thermal_trip(struct thermal_zone_device *tz,
> - struct thermal_trip_desc *td)
> + struct thermal_trip_desc *td,
> + struct list_head *way_up_list,
> + struct list_head *way_down_list)
> {
> const struct thermal_trip *trip = &td->trip;
>
> @@ -382,8 +385,7 @@ static void handle_thermal_trip(struct t
> * the threshold and the trip temperature will be equal.
> */
> if (tz->temperature >= trip->temperature) {
> - thermal_notify_tz_trip_up(tz, trip);
> - thermal_debug_tz_trip_up(tz, trip);
> + list_add_tail(&td->notify_list_node, way_up_list);
> td->threshold = trip->temperature - trip->hysteresis;
> } else {
> td->threshold = trip->temperature;
> @@ -400,8 +402,7 @@ static void handle_thermal_trip(struct t
> * the trip.
> */
> if (tz->temperature < trip->temperature - trip->hysteresis) {
> - thermal_notify_tz_trip_down(tz, trip);
> - thermal_debug_tz_trip_down(tz, trip);
> + list_add(&td->notify_list_node, way_down_list);
> td->threshold = trip->temperature;
> } else {
> td->threshold = trip->temperature - trip->hysteresis;
> @@ -457,10 +458,24 @@ static void thermal_zone_device_init(str
> pos->initialized = false;
> }
>
> +static int thermal_trip_notify_cmp(void *ascending, const struct list_head *a,
> + const struct list_head *b)
> +{
> + struct thermal_trip_desc *tda = container_of(a, struct thermal_trip_desc,
> + notify_list_node);
> + struct thermal_trip_desc *tdb = container_of(b, struct thermal_trip_desc,
> + notify_list_node);
> + int ret = tdb->threshold - tda->threshold;
So this will become
+ int ret = tdb->trip.temperature - tda->trip.temperature;
> +
> + return ascending ? ret : -ret;
> +}
> +
> void __thermal_zone_device_update(struct thermal_zone_device *tz,
> enum thermal_notify_event event)
> {
> struct thermal_trip_desc *td;
> + LIST_HEAD(way_down_list);
> + LIST_HEAD(way_up_list);
>
> if (tz->suspended)
> return;
> @@ -475,7 +490,19 @@ void __thermal_zone_device_update(struct
> tz->notify_event = event;
>
> for_each_trip_desc(tz, td)
> - handle_thermal_trip(tz, td);
> + handle_thermal_trip(tz, td, &way_up_list, &way_down_list);
> +
> + list_sort((void *)true, &way_up_list, thermal_trip_notify_cmp);
> + list_for_each_entry(td, &way_up_list, notify_list_node) {
> + thermal_notify_tz_trip_up(tz, &td->trip);
> + thermal_debug_tz_trip_up(tz, &td->trip);
> + }
> +
> + list_sort(NULL, &way_down_list, thermal_trip_notify_cmp);
> + list_for_each_entry(td, &way_down_list, notify_list_node) {
> + thermal_notify_tz_trip_down(tz, &td->trip);
> + thermal_debug_tz_trip_down(tz, &td->trip);
> + }
>
> monitor_thermal_zone(tz);
> }
> Index: linux-pm/drivers/thermal/thermal_core.h
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.h
> +++ linux-pm/drivers/thermal/thermal_core.h
> @@ -17,6 +17,7 @@
>
> struct thermal_trip_desc {
> struct thermal_trip trip;
> + struct list_head notify_list_node;
> int threshold;
> };
>