Thermal framework today supports monitoring for rising temperatures and
subsequently initiating cooling action in case of a thermal trip point
being crossed. There are scenarios where a SoC need some warming action to
be activated if the temperature falls below a cetain permissible limit.
Since warming action can be considered mirror opposite of cooling action,
most of the thermal framework can be re-used to achieve this.
This patch series is yet another attempt to add support for monitoring
falling temperature in thermal framework. Unlike the first attempt[1]
(where a new property was added to thermal trip point binding to indicate
direction of temperature monitoring), this series introduces a new trip
point type (THERMAL_TRIP_COLD) to indicate a trip point at which falling
temperature monitoring must be triggered. This patch series uses Daniel
Lezcano's recently added thermal genetlink interface[2] to notify userspace
of falling temperature and rising temperature at the cold trip point. This
will enable a user space engine to trigger the relevant mitigation for
falling temperature. At present, no support is added to any of the thermal
governors to monitor and mitigate falling temperature at the cold trip
point;rather all governors return doing nothing if triggered for a cold
trip point. As future extension, monitoring of falling temperature can be
added to the relevant thermal governor.
Patch series introducing various resources that are used as warming devices
on Qualcomm sdm845:
https://lkml.org/lkml/2019/7/29/749 (already merged)
https://lkml.org/lkml/2020/6/3/1112 (under review)
1.https://lkml.org/lkml/2019/9/18/1180
2.https://lkml.org/lkml/2020/7/6/238
Thara Gopinath (4):
dt-bindings:thermal:Add cold trip point type
thermal: Add support for cold trip point
thermal:core:Add genetlink notifications for monitoring falling
temperature
thermal: Modify thermal governors to do nothing for "cold" trip points
.../devicetree/bindings/thermal/thermal.txt | 1 +
drivers/thermal/gov_bang_bang.c | 8 +++++++
drivers/thermal/gov_fair_share.c | 8 +++++++
drivers/thermal/gov_power_allocator.c | 8 +++++++
drivers/thermal/gov_step_wise.c | 8 +++++++
drivers/thermal/thermal_core.c | 21 +++++++++++++------
drivers/thermal/thermal_of.c | 1 +
include/uapi/linux/thermal.h | 1 +
8 files changed, 50 insertions(+), 6 deletions(-)
--
2.25.1
Add notification calls for trip type THERMAL_TRIP_COLD when temperature
crosses the trip point in either direction.
Signed-off-by: Thara Gopinath <[email protected]>
---
drivers/thermal/thermal_core.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 750a89f0c20a..e2302ca1cd3b 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -429,12 +429,21 @@ static void handle_thermal_trip(struct thermal_zone_device *tz, int trip)
tz->ops->get_trip_hyst(tz, trip, &hyst);
if (tz->last_temperature != THERMAL_TEMP_INVALID) {
- if (tz->last_temperature < trip_temp &&
- tz->temperature >= trip_temp)
- thermal_notify_tz_trip_up(tz->id, trip);
- if (tz->last_temperature >= trip_temp &&
- tz->temperature < (trip_temp - hyst))
- thermal_notify_tz_trip_down(tz->id, trip);
+ if (type == THERMAL_TRIP_COLD) {
+ if (tz->last_temperature > trip_temp &&
+ tz->temperature <= trip_temp)
+ thermal_notify_tz_trip_down(tz->id, trip);
+ if (tz->last_temperature <= trip_temp &&
+ tz->temperature > (trip_temp + hyst))
+ thermal_notify_tz_trip_up(tz->id, trip);
+ } else {
+ if (tz->last_temperature < trip_temp &&
+ tz->temperature >= trip_temp)
+ thermal_notify_tz_trip_up(tz->id, trip);
+ if (tz->last_temperature >= trip_temp &&
+ tz->temperature < (trip_temp - hyst))
+ thermal_notify_tz_trip_down(tz->id, trip);
+ }
}
if (type == THERMAL_TRIP_CRITICAL || type == THERMAL_TRIP_HOT)
--
2.25.1
On 10/07/2020 15:51, Thara Gopinath wrote:
> Thermal framework today supports monitoring for rising temperatures and
> subsequently initiating cooling action in case of a thermal trip point
> being crossed. There are scenarios where a SoC need some warming action to
> be activated if the temperature falls below a cetain permissible limit.
> Since warming action can be considered mirror opposite of cooling action,
> most of the thermal framework can be re-used to achieve this.
>
> This patch series is yet another attempt to add support for monitoring
> falling temperature in thermal framework. Unlike the first attempt[1]
> (where a new property was added to thermal trip point binding to indicate
> direction of temperature monitoring), this series introduces a new trip
> point type (THERMAL_TRIP_COLD) to indicate a trip point at which falling
> temperature monitoring must be triggered. This patch series uses Daniel
> Lezcano's recently added thermal genetlink interface[2] to notify userspace
> of falling temperature and rising temperature at the cold trip point. This
> will enable a user space engine to trigger the relevant mitigation for
> falling temperature. At present, no support is added to any of the thermal
> governors to monitor and mitigate falling temperature at the cold trip
> point;rather all governors return doing nothing if triggered for a cold
> trip point. As future extension, monitoring of falling temperature can be
> added to the relevant thermal governor.
I agree we need a cold trip point in order to introduce the functioning
temperature range in the thermal framework.
Rui, what is your opinion ?
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Mon, 2020-07-13 at 17:03 +0200, Daniel Lezcano wrote:
> On 10/07/2020 15:51, Thara Gopinath wrote:
> > Thermal framework today supports monitoring for rising temperatures
> > and
> > subsequently initiating cooling action in case of a thermal trip
> > point
> > being crossed. There are scenarios where a SoC need some warming
> > action to
> > be activated if the temperature falls below a cetain permissible
> > limit.
> > Since warming action can be considered mirror opposite of cooling
> > action,
> > most of the thermal framework can be re-used to achieve this.
> >
> > This patch series is yet another attempt to add support for
> > monitoring
> > falling temperature in thermal framework. Unlike the first
> > attempt[1]
> > (where a new property was added to thermal trip point binding to
> > indicate
> > direction of temperature monitoring), this series introduces a new
> > trip
> > point type (THERMAL_TRIP_COLD) to indicate a trip point at which
> > falling
> > temperature monitoring must be triggered. This patch series uses
> > Daniel
> > Lezcano's recently added thermal genetlink interface[2] to notify
> > userspace
> > of falling temperature and rising temperature at the cold trip
> > point. This
> > will enable a user space engine to trigger the relevant mitigation
> > for
> > falling temperature. At present, no support is added to any of the
> > thermal
> > governors to monitor and mitigate falling temperature at the cold
> > trip
> > point;rather all governors return doing nothing if triggered for a
> > cold
> > trip point. As future extension, monitoring of falling temperature
> > can be
> > added to the relevant thermal governor.
>
> I agree we need a cold trip point in order to introduce the
> functioning
> temperature range in the thermal framework.
>
> Rui, what is your opinion ?
I agree with the concept of "cold" trip point.
In this patch set, the cold trip point is defined with only netlink
event support. But there are still quite a lot of things unclear,
especially what we should do in thermal framework?
For example, to support this, we can
either
introduce both "cold" trip points and "warming devices", and introduce
new logic in thermal framework and governors to handle them,
Or
introduce "cold" trip point and "warming" device, but only
semantically, and treat them just like normal trip points and cooling
devices. And strictly define cooling state 0 as the state that
generates most heat, and define max cooling state as the state that
generates least heat. Then, say, we have a trip point at -10C, the
"warming" device is set to cooling state 0 when the temperature is
lower than -10C, and in most cases, this thermal zone is always in a
"overheating" state (temperature higher than -10C), and the "warming"
device for this thermal zone is "throttled" to generate as least heat
as possible. And this is pretty much what the current code has always
been doing, right?
I can not say which one is better for now as I don't have the
background of this requirement. It's nice that Thara sent this RFC
series for discussion, but from upstream point of view, I'd prefer to
see a full stack solution, before taking any code.
thanks,
Rui
On 7/14/20 9:49 AM, Zhang Rui wrote:
> On Mon, 2020-07-13 at 17:03 +0200, Daniel Lezcano wrote:
>> On 10/07/2020 15:51, Thara Gopinath wrote:
>>> Thermal framework today supports monitoring for rising temperatures
>>> and
>>> subsequently initiating cooling action in case of a thermal trip
>>> point
>>> being crossed. There are scenarios where a SoC need some warming
>>> action to
>>> be activated if the temperature falls below a cetain permissible
>>> limit.
>>> Since warming action can be considered mirror opposite of cooling
>>> action,
>>> most of the thermal framework can be re-used to achieve this.
>>>
>>> This patch series is yet another attempt to add support for
>>> monitoring
>>> falling temperature in thermal framework. Unlike the first
>>> attempt[1]
>>> (where a new property was added to thermal trip point binding to
>>> indicate
>>> direction of temperature monitoring), this series introduces a new
>>> trip
>>> point type (THERMAL_TRIP_COLD) to indicate a trip point at which
>>> falling
>>> temperature monitoring must be triggered. This patch series uses
>>> Daniel
>>> Lezcano's recently added thermal genetlink interface[2] to notify
>>> userspace
>>> of falling temperature and rising temperature at the cold trip
>>> point. This
>>> will enable a user space engine to trigger the relevant mitigation
>>> for
>>> falling temperature. At present, no support is added to any of the
>>> thermal
>>> governors to monitor and mitigate falling temperature at the cold
>>> trip
>>> point;rather all governors return doing nothing if triggered for a
>>> cold
>>> trip point. As future extension, monitoring of falling temperature
>>> can be
>>> added to the relevant thermal governor.
>>
>> I agree we need a cold trip point in order to introduce the
>> functioning
>> temperature range in the thermal framework.
>>
>> Rui, what is your opinion ?
>
> I agree with the concept of "cold" trip point.
> In this patch set, the cold trip point is defined with only netlink
> event support. But there are still quite a lot of things unclear,
> especially what we should do in thermal framework?
Hi Rui,
Thanks for the comments.
You are right that cold trip points are dealt with only by netlink
events in this patch series. Eventually IMHO, governors should handle
them with a logic opposite to what is being currently done for non-cold
trip points.
>
> For example, to support this, we can
> either
> introduce both "cold" trip points and "warming devices", and introduce
> new logic in thermal framework and governors to handle them,
> Or
> introduce "cold" trip point and "warming" device, but only
> semantically, and treat them just like normal trip points and cooling
> devices. And strictly define cooling state 0 as the state that
> generates most heat, and define max cooling state as the state that
> generates least heat. Then, say, we have a trip point at -10C, the
> "warming" device is set to cooling state 0 when the temperature is
> lower than -10C, and in most cases, this thermal zone is always in a
> "overheating" state (temperature higher than -10C), and the "warming"
> device for this thermal zone is "throttled" to generate as least heat
> as possible. And this is pretty much what the current code has always
> been doing, right?
IMHO, thermal framework should move to a direction where the term
"mitigation" is used rather than cooling or warming. In this case
"cooling dev" and "warming dev" should will become
"temp-mitigating-dev". So going by this, I think what you mention as
option 1 is more suitable where new logic is introduced into the
framework and governors to handle the trip points marked as "cold".
Also in the current set of requirements, we have a few power domain
rails and other resources that are used exclusively in the thermal
framework for warming alone as in they are not used ever for cooling
down a zone. But then one of the requirements we have discussed is for
cpufreq and gpu scaling to be behave as warming devices where the
minimum operating point/ voltage of the relevant cpu/gpu is restricted.
So in this case, Daniel had this suggestion of introducing negative
states for presently what is defined as cooling devices. So cooling dev
/ temp-mitigation-dev states can range from say -3 to 5 with 0 as the
good state where no mitigation is happening. This is an interesting idea
though I have not proto-typed it yet.
>
> I can not say which one is better for now as I don't have the
> background of this requirement. It's nice that Thara sent this RFC
> series for discussion, but from upstream point of view, I'd prefer to
> see a full stack solution, before taking any code.
We had done a session at ELC on this requirement. Here is the link to
the presentation. Hopefully it gives you some back ground on this.
https://elinux.org/images/f/f7/ELC-2020-Thara-Ram-Linux-Kernel-Thermal-Warming.pdf
I have sent across some patches for introducing a generic power domain
warming device which is under review by Daniel.
So how do you want to proceed on this? Can you elaborate a bit more on
what you mean by a full stack solution.
>
> thanks,
> Rui
>
--
Warm Regards
Thara
Hi, Thara,
On Tue, 2020-07-14 at 17:39 -0400, Thara Gopinath wrote:
>
> >
> > For example, to support this, we can
> > either
> > introduce both "cold" trip points and "warming devices", and
> > introduce
> > new logic in thermal framework and governors to handle them,
> > Or
> > introduce "cold" trip point and "warming" device, but only
> > semantically, and treat them just like normal trip points and
> > cooling
> > devices. And strictly define cooling state 0 as the state that
> > generates most heat, and define max cooling state as the state that
> > generates least heat. Then, say, we have a trip point at -10C, the
> > "warming" device is set to cooling state 0 when the temperature is
> > lower than -10C, and in most cases, this thermal zone is always in
> > a
> > "overheating" state (temperature higher than -10C), and the
> > "warming"
> > device for this thermal zone is "throttled" to generate as least
> > heat
> > as possible. And this is pretty much what the current code has
> > always
> > been doing, right?
>
>
> IMHO, thermal framework should move to a direction where the term
> "mitigation" is used rather than cooling or warming. In this case
> "cooling dev" and "warming dev" should will become
> "temp-mitigating-dev". So going by this, I think what you mention as
> option 1 is more suitable where new logic is introduced into the
> framework and governors to handle the trip points marked as "cold".
>
> Also in the current set of requirements, we have a few power domain
> rails and other resources that are used exclusively in the thermal
> framework for warming alone as in they are not used ever for cooling
> down a zone. But then one of the requirements we have discussed is
> for cpufreq and gpu scaling to be behave as warming devices where
> the minimum operating point/ voltage of the relevant cpu/gpu is
> restricted.
> So in this case, Daniel had this suggestion of introducing negative
> states for presently what is defined as cooling devices. So cooling
> dev
> / temp-mitigation-dev states can range from say -3 to 5 with 0 as
> the
> good state where no mitigation is happening. This is an interesting
> idea
> though I have not proto-typed it yet.
Agreed. If some devices support both "cooling" and "warning", we should
have only one "temp-mitigating-dev" instead.
>
> >
> > I can not say which one is better for now as I don't have the
> > background of this requirement. It's nice that Thara sent this RFC
> > series for discussion, but from upstream point of view, I'd prefer
> > to
> > see a full stack solution, before taking any code.
>
> We had done a session at ELC on this requirement. Here is the link
> to
> the presentation. Hopefully it gives you some back ground on this.
yes, it helps. :)
>
>
https://elinux.org/images/f/f7/ELC-2020-Thara-Ram-Linux-Kernel-Thermal-Warming.pdf
>
> I have sent across some patches for introducing a generic power
> domain
> warming device which is under review by Daniel.
>
> So how do you want to proceed on this? Can you elaborate a bit more
> on
> what you mean by a full stack solution.
I mean, the patches, and the idea look good to me, just with some minor
comments. But applying this patch series, alone, does not bring us
anything because we don't have a thermal zone driver that supports cold
trip point, right?
I'd like to see this patch series together with the support in
thermal_core/governors and real users like updated/new thermal
zone/cdev drivers that supports the cold trip point and warming
actions.
Or else I've the concern that this piece of code may be changed back
and forth when prototyping the rest of the support.
thanks,
rui
On Fri, 2020-07-10 at 09:51 -0400, Thara Gopinath wrote:
> Add notification calls for trip type THERMAL_TRIP_COLD when
> temperature
> crosses the trip point in either direction.
>
> Signed-off-by: Thara Gopinath <[email protected]>
> ---
> drivers/thermal/thermal_core.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c
> b/drivers/thermal/thermal_core.c
> index 750a89f0c20a..e2302ca1cd3b 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -429,12 +429,21 @@ static void handle_thermal_trip(struct
> thermal_zone_device *tz, int trip)
> tz->ops->get_trip_hyst(tz, trip, &hyst);
>
> if (tz->last_temperature != THERMAL_TEMP_INVALID) {
> - if (tz->last_temperature < trip_temp &&
> - tz->temperature >= trip_temp)
> - thermal_notify_tz_trip_up(tz->id, trip);
> - if (tz->last_temperature >= trip_temp &&
> - tz->temperature < (trip_temp - hyst))
> - thermal_notify_tz_trip_down(tz->id, trip);
> + if (type == THERMAL_TRIP_COLD) {
> + if (tz->last_temperature > trip_temp &&
> + tz->temperature <= trip_temp)
> + thermal_notify_tz_trip_down(tz->id,
> trip);
trip_type should also be part of the event because trip_down/trip_up
for hot trip and cold trip have different meanings.
Or can we use some more generic names like trip_on/trip_off? trip_on
means the trip point is violated or actions need to be taken for the
specific trip points, for both hot and cold trips. I know
trip_on/trip_off doesn't represent what I mean clearly, but surely you
can find a better name.
thanks,
rui
> + if (tz->last_temperature <= trip_temp &&
> + tz->temperature > (trip_temp + hyst))
> + thermal_notify_tz_trip_up(tz->id,
> trip);
> + } else {
> + if (tz->last_temperature < trip_temp &&
> + tz->temperature >= trip_temp)
> + thermal_notify_tz_trip_up(tz->id,
> trip);
> + if (tz->last_temperature >= trip_temp &&
> + tz->temperature < (trip_temp - hyst))
> + thermal_notify_tz_trip_down(tz->id,
> trip);
> + }
> }
>
> if (type == THERMAL_TRIP_CRITICAL || type == THERMAL_TRIP_HOT)
On 7/15/20 4:27 AM, Zhang Rui wrote:
> Hi, Thara,
>
> On Tue, 2020-07-14 at 17:39 -0400, Thara Gopinath wrote:
>>
>>>
>>> For example, to support this, we can
>>> either
>>> introduce both "cold" trip points and "warming devices", and
>>> introduce
>>> new logic in thermal framework and governors to handle them,
>>> Or
>>> introduce "cold" trip point and "warming" device, but only
>>> semantically, and treat them just like normal trip points and
>>> cooling
>>> devices. And strictly define cooling state 0 as the state that
>>> generates most heat, and define max cooling state as the state that
>>> generates least heat. Then, say, we have a trip point at -10C, the
>>> "warming" device is set to cooling state 0 when the temperature is
>>> lower than -10C, and in most cases, this thermal zone is always in
>>> a
>>> "overheating" state (temperature higher than -10C), and the
>>> "warming"
>>> device for this thermal zone is "throttled" to generate as least
>>> heat
>>> as possible. And this is pretty much what the current code has
>>> always
>>> been doing, right?
>>
>>
>> IMHO, thermal framework should move to a direction where the term
>> "mitigation" is used rather than cooling or warming. In this case
>> "cooling dev" and "warming dev" should will become
>> "temp-mitigating-dev". So going by this, I think what you mention as
>> option 1 is more suitable where new logic is introduced into the
>> framework and governors to handle the trip points marked as "cold".
>>
>> Also in the current set of requirements, we have a few power domain
>> rails and other resources that are used exclusively in the thermal
>> framework for warming alone as in they are not used ever for cooling
>> down a zone. But then one of the requirements we have discussed is
>> for cpufreq and gpu scaling to be behave as warming devices where
>> the minimum operating point/ voltage of the relevant cpu/gpu is
>> restricted.
>> So in this case, Daniel had this suggestion of introducing negative
>> states for presently what is defined as cooling devices. So cooling
>> dev
>> / temp-mitigation-dev states can range from say -3 to 5 with 0 as
>> the
>> good state where no mitigation is happening. This is an interesting
>> idea
>> though I have not proto-typed it yet.
>
> Agreed. If some devices support both "cooling" and "warning", we should
> have only one "temp-mitigating-dev" instead.
>>
>>>
>>> I can not say which one is better for now as I don't have the
>>> background of this requirement. It's nice that Thara sent this RFC
>>> series for discussion, but from upstream point of view, I'd prefer
>>> to
>>> see a full stack solution, before taking any code.
>>
>> We had done a session at ELC on this requirement. Here is the link
>> to
>> the presentation. Hopefully it gives you some back ground on this.
>
> yes, it helps. :)
>>
>>
> https://elinux.org/images/f/f7/ELC-2020-Thara-Ram-Linux-Kernel-Thermal-Warming.pdf
>>
>> I have sent across some patches for introducing a generic power
>> domain
>> warming device which is under review by Daniel.
>>
>> So how do you want to proceed on this? Can you elaborate a bit more
>> on
>> what you mean by a full stack solution.
>
> I mean, the patches, and the idea look good to me, just with some minor
> comments. But applying this patch series, alone, does not bring us
> anything because we don't have a thermal zone driver that supports cold
> trip point, right?
> I'd like to see this patch series together with the support in
> thermal_core/governors and real users like updated/new thermal
> zone/cdev drivers that supports the cold trip point and warming
> actions.
> Or else I've the concern that this piece of code may be changed back
> and forth when prototyping the rest of the support.
Got it! I will try to include more pieces in the next version.
>
> thanks,
> rui
>
--
Warm Regards
Thara
On 7/15/20 4:46 AM, Zhang Rui wrote:
> On Fri, 2020-07-10 at 09:51 -0400, Thara Gopinath wrote:
>> Add notification calls for trip type THERMAL_TRIP_COLD when
>> temperature
>> crosses the trip point in either direction.
>>
>> Signed-off-by: Thara Gopinath <[email protected]>
>> ---
>> drivers/thermal/thermal_core.c | 21 +++++++++++++++------
>> 1 file changed, 15 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/thermal/thermal_core.c
>> b/drivers/thermal/thermal_core.c
>> index 750a89f0c20a..e2302ca1cd3b 100644
>> --- a/drivers/thermal/thermal_core.c
>> +++ b/drivers/thermal/thermal_core.c
>> @@ -429,12 +429,21 @@ static void handle_thermal_trip(struct
>> thermal_zone_device *tz, int trip)
>> tz->ops->get_trip_hyst(tz, trip, &hyst);
>>
>> if (tz->last_temperature != THERMAL_TEMP_INVALID) {
>> - if (tz->last_temperature < trip_temp &&
>> - tz->temperature >= trip_temp)
>> - thermal_notify_tz_trip_up(tz->id, trip);
>> - if (tz->last_temperature >= trip_temp &&
>> - tz->temperature < (trip_temp - hyst))
>> - thermal_notify_tz_trip_down(tz->id, trip);
>> + if (type == THERMAL_TRIP_COLD) {
>> + if (tz->last_temperature > trip_temp &&
>> + tz->temperature <= trip_temp)
>> + thermal_notify_tz_trip_down(tz->id,
>> trip);
>
> trip_type should also be part of the event because trip_down/trip_up
> for hot trip and cold trip have different meanings.
> Or can we use some more generic names like trip_on/trip_off? trip_on
> means the trip point is violated or actions need to be taken for the
> specific trip points, for both hot and cold trips. I know
> trip_on/trip_off doesn't represent what I mean clearly, but surely you
> can find a better name.
Makes sense.. I will fix this in the next version. I don't have a good
name at the moment but will think of something!
--
Warm Regards
Thara