2023-12-20 23:17:06

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 0/9] Add callback for cooling list update to speed-up IPA

Hi all,

The patch set adds a new callback for thermal governors and implementation for
Intelligent Power Allocator.

The goal is to move some heavy operations like the memory allocations and heavy
computations (multiplications) out of throttle() callback hot path.

The new callback is generic enough to handle other important update events.
It re-uses existing thermal_notify_event definitions.

In addition there are some small clean-ups for IPA code.

The patch set is based on current pm/bleeding-edge branch (20 Dec).

changes:
v3:
- changed helper name to thermal_governor_update_tz() with also
"reason" argument (Rafael)
- added thermal_governor_update_tz() to thermal_core.h for use from sysfs
functions
- changed names of the events (THERMAL_TZ_BIND_CDEV) (Rafael)
- patch 2/9 changed header and comment for function (Rafael)
- patch 3/9: used unsigned types for num_actors, buffer_size (Rafael)
- changed trace functions and added new patch 4/9 to be prepare tracing for
different internal IPA array; it also drops dynamic array inside trace event
- used new structure power_actor and changed the code in patch 5/9 (Rafael)
- keept the local num_actors variable (Rafael)
- patch 6/9 skipped redundant parens and changed the header desc. (Rafael)
- patch 7/9 changed header and used instance->tz->lock (Rafael)
- patch 8/9 removed handle_weight_update() and renamed new event to
THERMAL_INSTANCE_WEIGHT_CHANGE (Rafael)
- patch 9/9 aliged to use THERMAL_INSTANCE_WEIGHT_CHANGE is switch (Rafael)

v2 can be found here [1]

Regards,
Lukasz

[1] https://lore.kernel.org/lkml/[email protected]/

Lukasz Luba (9):
thermal: core: Add governor callback for thermal zone change
thermal: gov_power_allocator: Refactor check_power_actors()
thermal: gov_power_allocator: Refactor checks in divvy_up_power()
thermal: gov_power_allocator: Change trace functions
thermal: gov_power_allocator: Move memory allocation out of throttle()
thermal: gov_power_allocator: Simplify checks for valid power actor
thermal/sysfs: Update instance->weight under tz lock
thermal/sysfs: Update governors when the 'weight' has changed
thermal: gov_power_allocator: Support new update callback of weights

drivers/thermal/gov_power_allocator.c | 269 ++++++++++++++++----------
drivers/thermal/thermal_core.c | 14 ++
drivers/thermal/thermal_core.h | 2 +
drivers/thermal/thermal_sysfs.c | 7 +
drivers/thermal/thermal_trace_ipa.h | 50 +++--
include/linux/thermal.h | 7 +
6 files changed, 226 insertions(+), 123 deletions(-)

--
2.25.1



2023-12-20 23:17:15

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 1/9] thermal: core: Add governor callback for thermal zone change

Add a new callback to the struct thermal_governor. It can be used for
updating governors when there is a change in the thermal zone internals,
e.g. thermal cooling device is bind to the thermal zone.

That makes possible to move some heavy operations like memory allocations
related to the number of cooling instances out of the throttle() callback.

Both callback code paths (throttle() and update_tz()) are protected with
the same thermal zone lock, which guaranties the consistency.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/thermal_core.c | 14 ++++++++++++++
drivers/thermal/thermal_core.h | 2 ++
include/linux/thermal.h | 6 ++++++
3 files changed, 22 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 0d761afb7cbc..62979c5401c3 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -311,6 +311,15 @@ static void handle_non_critical_trips(struct thermal_zone_device *tz,
def_governor->throttle(tz, trip);
}

+void thermal_governor_update_tz(struct thermal_zone_device *tz,
+ enum thermal_notify_event reason)
+{
+ if (!tz->governor || !tz->governor->update_tz)
+ return;
+
+ tz->governor->update_tz(tz, reason);
+}
+
static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdown)
{
/*
@@ -727,6 +736,8 @@ int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
list_add_tail(&dev->tz_node, &tz->thermal_instances);
list_add_tail(&dev->cdev_node, &cdev->thermal_instances);
atomic_set(&tz->need_update, 1);
+
+ thermal_governor_update_tz(tz, THERMAL_TZ_BIND_CDEV);
}
mutex_unlock(&cdev->lock);
mutex_unlock(&tz->lock);
@@ -785,6 +796,9 @@ int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
list_del(&pos->tz_node);
list_del(&pos->cdev_node);
+
+ thermal_governor_update_tz(tz, THERMAL_TZ_UNBIND_CDEV);
+
mutex_unlock(&cdev->lock);
mutex_unlock(&tz->lock);
goto unbind;
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index b5e6743bd157..e6a2b6f97be8 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -115,6 +115,8 @@ int thermal_build_list_of_policies(char *buf);
void __thermal_zone_device_update(struct thermal_zone_device *tz,
enum thermal_notify_event event);
void thermal_zone_device_critical_reboot(struct thermal_zone_device *tz);
+void thermal_governor_update_tz(struct thermal_zone_device *tz,
+ enum thermal_notify_event reason);

/* Helpers */
#define for_each_trip(__tz, __trip) \
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index c0fabbcdefb9..1b76878cea46 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -51,6 +51,8 @@ enum thermal_notify_event {
THERMAL_DEVICE_POWER_CAPABILITY_CHANGED, /* power capability changed */
THERMAL_TABLE_CHANGED, /* Thermal table(s) changed */
THERMAL_EVENT_KEEP_ALIVE, /* Request for user space handler to respond */
+ THERMAL_TZ_BIND_CDEV, /* Cooling dev is bind to the thermal zone */
+ THERMAL_TZ_UNBIND_CDEV, /* Cooling dev is unbind from the thermal zone */
};

/**
@@ -197,6 +199,8 @@ struct thermal_zone_device {
* thermal zone.
* @throttle: callback called for every trip point even if temperature is
* below the trip point temperature
+ * @update_tz: callback called when thermal zone internals have changed, e.g.
+ * thermal cooling instance was added/removed
* @governor_list: node in thermal_governor_list (in thermal_core.c)
*/
struct thermal_governor {
@@ -205,6 +209,8 @@ struct thermal_governor {
void (*unbind_from_tz)(struct thermal_zone_device *tz);
int (*throttle)(struct thermal_zone_device *tz,
const struct thermal_trip *trip);
+ void (*update_tz)(struct thermal_zone_device *tz,
+ enum thermal_notify_event reason);
struct list_head governor_list;
};

--
2.25.1


2023-12-20 23:17:35

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 2/9] thermal: gov_power_allocator: Refactor check_power_actors()

In preparation for a subsequent change, rearrange check_power_actors().

No intentional functional impact.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/gov_power_allocator.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 785fff14223d..d9175d9f5e3f 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -581,8 +581,9 @@ static void allow_maximum_power(struct thermal_zone_device *tz, bool update)
* power actor API. The warning should help to investigate the issue, which
* could be e.g. lack of Energy Model for a given device.
*
- * Return: 0 on success, -EINVAL if any cooling device does not implement
- * the power actor API.
+ * If all of the cooling devices currently attached to @tz implement the power
+ * actor API, return the number of them (which may be 0, because some cooling
+ * devices may be attached later). Otherwise, return -EINVAL.
*/
static int check_power_actors(struct thermal_zone_device *tz,
struct power_allocator_params *params)
@@ -597,8 +598,9 @@ static int check_power_actors(struct thermal_zone_device *tz,
if (!cdev_is_power_actor(instance->cdev)) {
dev_warn(&tz->device, "power_allocator: %s is not a power actor\n",
instance->cdev->type);
- ret = -EINVAL;
+ return -EINVAL;
}
+ ret++;
}

return ret;
@@ -631,7 +633,7 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
}

ret = check_power_actors(tz, params);
- if (ret) {
+ if (ret < 0) {
dev_warn(&tz->device, "power_allocator: binding failed\n");
kfree(params);
return ret;
--
2.25.1


2023-12-20 23:17:49

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 3/9] thermal: gov_power_allocator: Refactor checks in divvy_up_power()

Simplify the code and remove one extra 'if' block.

No intentional functional impact.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/gov_power_allocator.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index d9175d9f5e3f..9e35ebd7cb03 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -332,7 +332,8 @@ static void divvy_up_power(u32 *req_power, u32 *max_power, int num_actors,
u32 total_req_power, u32 power_range,
u32 *granted_power, u32 *extra_actor_power)
{
- u32 extra_power, capped_extra_power;
+ u32 capped_extra_power = 0;
+ u32 extra_power = 0;
int i;

/*
@@ -341,8 +342,6 @@ static void divvy_up_power(u32 *req_power, u32 *max_power, int num_actors,
if (!total_req_power)
total_req_power = 1;

- capped_extra_power = 0;
- extra_power = 0;
for (i = 0; i < num_actors; i++) {
u64 req_range = (u64)req_power[i] * power_range;

@@ -358,7 +357,7 @@ static void divvy_up_power(u32 *req_power, u32 *max_power, int num_actors,
capped_extra_power += extra_actor_power[i];
}

- if (!extra_power)
+ if (!extra_power || !capped_extra_power)
return;

/*
@@ -366,12 +365,13 @@ static void divvy_up_power(u32 *req_power, u32 *max_power, int num_actors,
* how far they are from the max
*/
extra_power = min(extra_power, capped_extra_power);
- if (capped_extra_power > 0)
- for (i = 0; i < num_actors; i++) {
- u64 extra_range = (u64)extra_actor_power[i] * extra_power;
- granted_power[i] += DIV_ROUND_CLOSEST_ULL(extra_range,
- capped_extra_power);
- }
+
+ for (i = 0; i < num_actors; i++) {
+ u64 extra_range = (u64)extra_actor_power[i] * extra_power;
+
+ granted_power[i] += DIV_ROUND_CLOSEST_ULL(extra_range,
+ capped_extra_power);
+ }
}

static int allocate_power(struct thermal_zone_device *tz, int control_temp)
--
2.25.1


2023-12-20 23:18:06

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 4/9] thermal: gov_power_allocator: Change trace functions

Change trace event trace_thermal_power_allocator() to not use dynamic
array for requested power and granted power for all power actors.
Instead, simplify the trace event and print other simple values.

Add new trace event to print power actor information of requested power
and granted power. That trace event would be called in a loop for each
power actor. The trace data would be easier to parse comparing to the
dynamic array implementation.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/gov_power_allocator.c | 5 +--
drivers/thermal/thermal_trace_ipa.h | 50 ++++++++++++++++-----------
2 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 9e35ebd7cb03..53283fd8a944 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -469,11 +469,12 @@ static int allocate_power(struct thermal_zone_device *tz, int control_temp)
granted_power[i]);
total_granted_power += granted_power[i];

+ trace_thermal_power_actor(tz, i, req_power[i],
+ granted_power[i]);
i++;
}

- trace_thermal_power_allocator(tz, req_power, total_req_power,
- granted_power, total_granted_power,
+ trace_thermal_power_allocator(tz, total_req_power, total_granted_power,
num_actors, power_range,
max_allocatable_power, tz->temperature,
control_temp - tz->temperature);
diff --git a/drivers/thermal/thermal_trace_ipa.h b/drivers/thermal/thermal_trace_ipa.h
index 84568db5421b..b16b5dd863d9 100644
--- a/drivers/thermal/thermal_trace_ipa.h
+++ b/drivers/thermal/thermal_trace_ipa.h
@@ -8,19 +8,14 @@
#include <linux/tracepoint.h>

TRACE_EVENT(thermal_power_allocator,
- TP_PROTO(struct thermal_zone_device *tz, u32 *req_power,
- u32 total_req_power, u32 *granted_power,
- u32 total_granted_power, size_t num_actors,
- u32 power_range, u32 max_allocatable_power,
- int current_temp, s32 delta_temp),
- TP_ARGS(tz, req_power, total_req_power, granted_power,
- total_granted_power, num_actors, power_range,
- max_allocatable_power, current_temp, delta_temp),
+ TP_PROTO(struct thermal_zone_device *tz, u32 total_req_power,
+ u32 total_granted_power, int num_actors, u32 power_range,
+ u32 max_allocatable_power, int current_temp, s32 delta_temp),
+ TP_ARGS(tz, total_req_power, total_granted_power, num_actors,
+ power_range, max_allocatable_power, current_temp, delta_temp),
TP_STRUCT__entry(
__field(int, tz_id )
- __dynamic_array(u32, req_power, num_actors )
__field(u32, total_req_power )
- __dynamic_array(u32, granted_power, num_actors)
__field(u32, total_granted_power )
__field(size_t, num_actors )
__field(u32, power_range )
@@ -30,11 +25,7 @@ TRACE_EVENT(thermal_power_allocator,
),
TP_fast_assign(
__entry->tz_id = tz->id;
- memcpy(__get_dynamic_array(req_power), req_power,
- num_actors * sizeof(*req_power));
__entry->total_req_power = total_req_power;
- memcpy(__get_dynamic_array(granted_power), granted_power,
- num_actors * sizeof(*granted_power));
__entry->total_granted_power = total_granted_power;
__entry->num_actors = num_actors;
__entry->power_range = power_range;
@@ -43,18 +34,35 @@ TRACE_EVENT(thermal_power_allocator,
__entry->delta_temp = delta_temp;
),

- TP_printk("thermal_zone_id=%d req_power={%s} total_req_power=%u granted_power={%s} total_granted_power=%u power_range=%u max_allocatable_power=%u current_temperature=%d delta_temperature=%d",
- __entry->tz_id,
- __print_array(__get_dynamic_array(req_power),
- __entry->num_actors, 4),
- __entry->total_req_power,
- __print_array(__get_dynamic_array(granted_power),
- __entry->num_actors, 4),
+ TP_printk("thermal_zone_id=%d total_req_power=%u total_granted_power=%u power_range=%u max_allocatable_power=%u current_temperature=%d delta_temperature=%d",
+ __entry->tz_id, __entry->total_req_power,
__entry->total_granted_power, __entry->power_range,
__entry->max_allocatable_power, __entry->current_temp,
__entry->delta_temp)
);

+TRACE_EVENT(thermal_power_actor,
+ TP_PROTO(struct thermal_zone_device *tz, int actor_id, u32 req_power,
+ u32 granted_power),
+ TP_ARGS(tz, actor_id, req_power, granted_power),
+ TP_STRUCT__entry(
+ __field(int, tz_id)
+ __field(int, actor_id)
+ __field(u32, req_power)
+ __field(u32, granted_power)
+ ),
+ TP_fast_assign(
+ __entry->tz_id = tz->id;
+ __entry->actor_id = actor_id;
+ __entry->req_power = req_power;
+ __entry->granted_power = granted_power;
+ ),
+
+ TP_printk("thermal_zone_id=%d actor_id=%d req_power=%u granted_power=%u",
+ __entry->tz_id, __entry->actor_id, __entry->req_power,
+ __entry->granted_power)
+);
+
TRACE_EVENT(thermal_power_allocator_pid,
TP_PROTO(struct thermal_zone_device *tz, s32 err, s32 err_integral,
s64 p, s64 i, s64 d, s32 output),
--
2.25.1


2023-12-20 23:18:22

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 5/9] thermal: gov_power_allocator: Move memory allocation out of throttle()

The new thermal callback allows to react to the change of cooling
instances in the thermal zone. Move the memory allocation to that new
callback and save CPU cycles in the throttle() code path.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/gov_power_allocator.c | 207 +++++++++++++++++---------
1 file changed, 136 insertions(+), 71 deletions(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 53283fd8a944..626c635f137f 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -46,6 +46,22 @@ static inline s64 div_frac(s64 x, s64 y)
return div_s64(x << FRAC_BITS, y);
}

+/**
+ * struct power_actor - internal power information for power actor
+ * @req_power: requested power value (not weighted)
+ * @max_power: max allocatable power for this actor
+ * @granted_power: granted power for this actor
+ * @extra_actor_power: extra power that this actor can receive
+ * @weighted_req_power: weighted requested power as input to IPA
+ */
+struct power_actor {
+ u32 req_power;
+ u32 max_power;
+ u32 granted_power;
+ u32 extra_actor_power;
+ u32 weighted_req_power;
+};
+
/**
* struct power_allocator_params - parameters for the power allocator governor
* @allocated_tzp: whether we have allocated tzp for this thermal zone and
@@ -61,6 +77,9 @@ static inline s64 div_frac(s64 x, s64 y)
* @trip_switch_on should be NULL.
* @trip_max: last passive trip point of the thermal zone. The
* temperature we are controlling for.
+ * @num_actors: number of cooling devices supporting IPA callbacks
+ * @buffer_size: internal buffer size, to avoid runtime re-calculation
+ * @power: buffer for all power actors internal power information
*/
struct power_allocator_params {
bool allocated_tzp;
@@ -69,6 +88,9 @@ struct power_allocator_params {
u32 sustainable_power;
const struct thermal_trip *trip_switch_on;
const struct thermal_trip *trip_max;
+ unsigned int num_actors;
+ unsigned int buffer_size;
+ struct power_actor *power;
};

/**
@@ -303,15 +325,10 @@ power_actor_set_power(struct thermal_cooling_device *cdev,

/**
* divvy_up_power() - divvy the allocated power between the actors
- * @req_power: each actor's requested power
- * @max_power: each actor's maximum available power
- * @num_actors: size of the @req_power, @max_power and @granted_power's array
- * @total_req_power: sum of @req_power
+ * @power: buffer for all power actors internal power information
+ * @num_actors: number of power actors in this thermal zone
+ * @total_req_power: sum of all weighted requested power for all actors
* @power_range: total allocated power
- * @granted_power: output array: each actor's granted power
- * @extra_actor_power: an appropriately sized array to be used in the
- * function as temporary storage of the extra power given
- * to the actors
*
* This function divides the total allocated power (@power_range)
* fairly between the actors. It first tries to give each actor a
@@ -324,13 +341,9 @@ power_actor_set_power(struct thermal_cooling_device *cdev,
* If any actor received more than their maximum power, then that
* surplus is re-divvied among the actors based on how far they are
* from their respective maximums.
- *
- * Granted power for each actor is written to @granted_power, which
- * should've been allocated by the calling function.
*/
-static void divvy_up_power(u32 *req_power, u32 *max_power, int num_actors,
- u32 total_req_power, u32 power_range,
- u32 *granted_power, u32 *extra_actor_power)
+static void divvy_up_power(struct power_actor *power, int num_actors,
+ u32 total_req_power, u32 power_range)
{
u32 capped_extra_power = 0;
u32 extra_power = 0;
@@ -343,18 +356,19 @@ static void divvy_up_power(u32 *req_power, u32 *max_power, int num_actors,
total_req_power = 1;

for (i = 0; i < num_actors; i++) {
- u64 req_range = (u64)req_power[i] * power_range;
+ struct power_actor *pa = &power[i];
+ u64 req_range = (u64)pa->req_power * power_range;

- granted_power[i] = DIV_ROUND_CLOSEST_ULL(req_range,
- total_req_power);
+ pa->granted_power = DIV_ROUND_CLOSEST_ULL(req_range,
+ total_req_power);

- if (granted_power[i] > max_power[i]) {
- extra_power += granted_power[i] - max_power[i];
- granted_power[i] = max_power[i];
+ if (pa->granted_power > pa->max_power) {
+ extra_power += pa->granted_power - pa->max_power;
+ pa->granted_power = pa->max_power;
}

- extra_actor_power[i] = max_power[i] - granted_power[i];
- capped_extra_power += extra_actor_power[i];
+ pa->extra_actor_power = pa->max_power - pa->granted_power;
+ capped_extra_power += pa->extra_actor_power;
}

if (!extra_power || !capped_extra_power)
@@ -367,61 +381,44 @@ static void divvy_up_power(u32 *req_power, u32 *max_power, int num_actors,
extra_power = min(extra_power, capped_extra_power);

for (i = 0; i < num_actors; i++) {
- u64 extra_range = (u64)extra_actor_power[i] * extra_power;
+ struct power_actor *pa = &power[i];
+ u64 extra_range = pa->extra_actor_power;

- granted_power[i] += DIV_ROUND_CLOSEST_ULL(extra_range,
- capped_extra_power);
+ extra_range *= extra_power;
+ pa->granted_power += DIV_ROUND_CLOSEST_ULL(extra_range,
+ capped_extra_power);
}
}

static int allocate_power(struct thermal_zone_device *tz, int control_temp)
{
- u32 *req_power, *max_power, *granted_power, *extra_actor_power;
struct power_allocator_params *params = tz->governor_data;
+ unsigned int num_actors = params->num_actors;
+ struct power_actor *power = params->power;
struct thermal_cooling_device *cdev;
struct thermal_instance *instance;
u32 total_weighted_req_power = 0;
u32 max_allocatable_power = 0;
u32 total_granted_power = 0;
u32 total_req_power = 0;
- u32 *weighted_req_power;
u32 power_range, weight;
int total_weight = 0;
- int num_actors = 0;
- int i = 0;
-
- list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
- if ((instance->trip == params->trip_max) &&
- cdev_is_power_actor(instance->cdev)) {
- num_actors++;
- total_weight += instance->weight;
- }
- }
+ int i = 0, ret;

if (!num_actors)
return -ENODEV;

- /*
- * We need to allocate five arrays of the same size:
- * req_power, max_power, granted_power, extra_actor_power and
- * weighted_req_power. They are going to be needed until this
- * function returns. Allocate them all in one go to simplify
- * the allocation and deallocation logic.
- */
- BUILD_BUG_ON(sizeof(*req_power) != sizeof(*max_power));
- BUILD_BUG_ON(sizeof(*req_power) != sizeof(*granted_power));
- BUILD_BUG_ON(sizeof(*req_power) != sizeof(*extra_actor_power));
- BUILD_BUG_ON(sizeof(*req_power) != sizeof(*weighted_req_power));
- req_power = kcalloc(num_actors * 5, sizeof(*req_power), GFP_KERNEL);
- if (!req_power)
- return -ENOMEM;
+ list_for_each_entry(instance, &tz->thermal_instances, tz_node)
+ if ((instance->trip == params->trip_max) &&
+ cdev_is_power_actor(instance->cdev))
+ total_weight += instance->weight;

- max_power = &req_power[num_actors];
- granted_power = &req_power[2 * num_actors];
- extra_actor_power = &req_power[3 * num_actors];
- weighted_req_power = &req_power[4 * num_actors];
+ /* Clean all buffers for new power estimations */
+ memset(power, 0, params->buffer_size);

list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
+ struct power_actor *pa = &power[i];
+
cdev = instance->cdev;

if (instance->trip != params->trip_max)
@@ -430,7 +427,8 @@ static int allocate_power(struct thermal_zone_device *tz, int control_temp)
if (!cdev_is_power_actor(cdev))
continue;

- if (cdev->ops->get_requested_power(cdev, &req_power[i]))
+ ret = cdev->ops->get_requested_power(cdev, &pa->req_power);
+ if (ret)
continue;

if (!total_weight)
@@ -438,27 +436,29 @@ static int allocate_power(struct thermal_zone_device *tz, int control_temp)
else
weight = instance->weight;

- weighted_req_power[i] = frac_to_int(weight * req_power[i]);
+ pa->weighted_req_power = frac_to_int(weight * pa->req_power);

- if (cdev->ops->state2power(cdev, instance->lower,
- &max_power[i]))
+ ret = cdev->ops->state2power(cdev, instance->lower,
+ &pa->max_power);
+ if (ret)
continue;

- total_req_power += req_power[i];
- max_allocatable_power += max_power[i];
- total_weighted_req_power += weighted_req_power[i];
+ total_req_power += pa->req_power;
+ max_allocatable_power += pa->max_power;
+ total_weighted_req_power += pa->weighted_req_power;

i++;
}

power_range = pid_controller(tz, control_temp, max_allocatable_power);

- divvy_up_power(weighted_req_power, max_power, num_actors,
- total_weighted_req_power, power_range, granted_power,
- extra_actor_power);
+ divvy_up_power(power, num_actors, total_weighted_req_power,
+ power_range);

i = 0;
list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
+ struct power_actor *pa = &power[i];
+
if (instance->trip != params->trip_max)
continue;

@@ -466,11 +466,11 @@ static int allocate_power(struct thermal_zone_device *tz, int control_temp)
continue;

power_actor_set_power(instance->cdev, instance,
- granted_power[i]);
- total_granted_power += granted_power[i];
+ pa->granted_power);
+ total_granted_power += pa->granted_power;

- trace_thermal_power_actor(tz, i, req_power[i],
- granted_power[i]);
+ trace_thermal_power_actor(tz, i, pa->req_power,
+ pa->granted_power);
i++;
}

@@ -479,8 +479,6 @@ static int allocate_power(struct thermal_zone_device *tz, int control_temp)
max_allocatable_power, tz->temperature,
control_temp - tz->temperature);

- kfree(req_power);
-
return 0;
}

@@ -607,6 +605,63 @@ static int check_power_actors(struct thermal_zone_device *tz,
return ret;
}

+static int allocate_actors_buffer(struct power_allocator_params *params,
+ int num_actors)
+{
+ int ret;
+
+ kfree(params->power);
+
+ /* There might be no cooling devices yet. */
+ if (!num_actors) {
+ ret = -EINVAL;
+ goto clean_state;
+ }
+
+ params->power = kcalloc(num_actors, sizeof(struct power_actor),
+ GFP_KERNEL);
+ if (!params->power) {
+ ret = -ENOMEM;
+ goto clean_state;
+ }
+
+ params->num_actors = num_actors;
+ params->buffer_size = num_actors * sizeof(struct power_actor);
+
+ return 0;
+
+clean_state:
+ params->num_actors = 0;
+ params->buffer_size = 0;
+ params->power = NULL;
+ return ret;
+}
+
+static void power_allocator_update_tz(struct thermal_zone_device *tz,
+ enum thermal_notify_event reason)
+{
+ struct power_allocator_params *params = tz->governor_data;
+ struct thermal_instance *instance;
+ int num_actors = 0;
+
+ switch (reason) {
+ case THERMAL_TZ_BIND_CDEV:
+ case THERMAL_TZ_UNBIND_CDEV:
+ list_for_each_entry(instance, &tz->thermal_instances, tz_node)
+ if ((instance->trip == params->trip_max) &&
+ cdev_is_power_actor(instance->cdev))
+ num_actors++;
+
+ if (num_actors == params->num_actors)
+ return;
+
+ allocate_actors_buffer(params, num_actors);
+ break;
+ default:
+ break;
+ }
+}
+
/**
* power_allocator_bind() - bind the power_allocator governor to a thermal zone
* @tz: thermal zone to bind it to
@@ -640,6 +695,13 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
return ret;
}

+ ret = allocate_actors_buffer(params, ret);
+ if (ret) {
+ dev_warn(&tz->device, "power_allocator: allocation failed\n");
+ kfree(params);
+ return ret;
+ }
+
if (!tz->tzp) {
tz->tzp = kzalloc(sizeof(*tz->tzp), GFP_KERNEL);
if (!tz->tzp) {
@@ -664,6 +726,7 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
return 0;

free_params:
+ kfree(params->power);
kfree(params);

return ret;
@@ -680,6 +743,7 @@ static void power_allocator_unbind(struct thermal_zone_device *tz)
tz->tzp = NULL;
}

+ kfree(params->power);
kfree(tz->governor_data);
tz->governor_data = NULL;
}
@@ -718,5 +782,6 @@ static struct thermal_governor thermal_gov_power_allocator = {
.bind_to_tz = power_allocator_bind,
.unbind_from_tz = power_allocator_unbind,
.throttle = power_allocator_throttle,
+ .update_tz = power_allocator_update_tz,
};
THERMAL_GOVERNOR_DECLARE(thermal_gov_power_allocator);
--
2.25.1


2023-12-20 23:18:38

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 6/9] thermal: gov_power_allocator: Simplify checks for valid power actor

There is a need to check if the cooling device in the thermal zone
supports IPA callback and is set for control trip point.
Refactor the code which validates the power actor capabilities and
make it more consistent in all places.

No intentional functional impact.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/gov_power_allocator.c | 40 ++++++++++++---------------
1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 626c635f137f..b5ec60ae7efd 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -93,6 +93,13 @@ struct power_allocator_params {
struct power_actor *power;
};

+static bool power_actor_is_valid(struct power_allocator_params *params,
+ struct thermal_instance *instance)
+{
+ return (instance->trip == params->trip_max &&
+ cdev_is_power_actor(instance->cdev));
+}
+
/**
* estimate_sustainable_power() - Estimate the sustainable power of a thermal zone
* @tz: thermal zone we are operating in
@@ -113,14 +120,10 @@ static u32 estimate_sustainable_power(struct thermal_zone_device *tz)
u32 min_power;

list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
- cdev = instance->cdev;
-
- if (instance->trip != params->trip_max)
- continue;
-
- if (!cdev_is_power_actor(cdev))
+ if (!power_actor_is_valid(params, instance))
continue;

+ cdev = instance->cdev;
if (cdev->ops->state2power(cdev, instance->upper, &min_power))
continue;

@@ -409,8 +412,7 @@ static int allocate_power(struct thermal_zone_device *tz, int control_temp)
return -ENODEV;

list_for_each_entry(instance, &tz->thermal_instances, tz_node)
- if ((instance->trip == params->trip_max) &&
- cdev_is_power_actor(instance->cdev))
+ if (power_actor_is_valid(params, instance))
total_weight += instance->weight;

/* Clean all buffers for new power estimations */
@@ -419,13 +421,10 @@ static int allocate_power(struct thermal_zone_device *tz, int control_temp)
list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
struct power_actor *pa = &power[i];

- cdev = instance->cdev;
-
- if (instance->trip != params->trip_max)
+ if (!power_actor_is_valid(params, instance))
continue;

- if (!cdev_is_power_actor(cdev))
- continue;
+ cdev = instance->cdev;

ret = cdev->ops->get_requested_power(cdev, &pa->req_power);
if (ret)
@@ -459,10 +458,7 @@ static int allocate_power(struct thermal_zone_device *tz, int control_temp)
list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
struct power_actor *pa = &power[i];

- if (instance->trip != params->trip_max)
- continue;
-
- if (!cdev_is_power_actor(instance->cdev))
+ if (!power_actor_is_valid(params, instance))
continue;

power_actor_set_power(instance->cdev, instance,
@@ -548,12 +544,11 @@ static void allow_maximum_power(struct thermal_zone_device *tz, bool update)
u32 req_power;

list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
- cdev = instance->cdev;
-
- if (instance->trip != params->trip_max ||
- !cdev_is_power_actor(instance->cdev))
+ if (!power_actor_is_valid(params, instance))
continue;

+ cdev = instance->cdev;
+
instance->target = 0;
mutex_lock(&cdev->lock);
/*
@@ -648,8 +643,7 @@ static void power_allocator_update_tz(struct thermal_zone_device *tz,
case THERMAL_TZ_BIND_CDEV:
case THERMAL_TZ_UNBIND_CDEV:
list_for_each_entry(instance, &tz->thermal_instances, tz_node)
- if ((instance->trip == params->trip_max) &&
- cdev_is_power_actor(instance->cdev))
+ if (power_actor_is_valid(params, instance))
num_actors++;

if (num_actors == params->num_actors)
--
2.25.1


2023-12-20 23:18:53

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 7/9] thermal/sysfs: Update instance->weight under tz lock

User space can change the weight of a thermal instance via sysfs while the
.throttle() callback is running for a governor, because weight_store()
does not use the zone lock.

The IPA governor uses instance weight values for power calculations and
caches the sum of them as total_weight, so it gets confused when one of
them changes while its .throttle() callback is running.

To prevent that from happening, use thermal zone locking in
weight_store().

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/thermal_sysfs.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index d8ff74a4338a..299c0fb16593 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -936,7 +936,11 @@ ssize_t weight_store(struct device *dev, struct device_attribute *attr,
return ret;

instance = container_of(attr, struct thermal_instance, weight_attr);
+
+ /* Don't race with governors using the 'weight' value */
+ mutex_lock(&instance->tz->lock);
instance->weight = weight;
+ mutex_unlock(&instance->tz->lock);

return count;
}
--
2.25.1


2023-12-20 23:19:10

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 8/9] thermal/sysfs: Update governors when the 'weight' has changed

Support governors update when the thermal instance's weight has changed.
This allows to adjust internal state for the governor.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/thermal_sysfs.c | 3 +++
include/linux/thermal.h | 1 +
2 files changed, 4 insertions(+)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 299c0fb16593..5abf6d136c24 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -940,6 +940,9 @@ ssize_t weight_store(struct device *dev, struct device_attribute *attr,
/* Don't race with governors using the 'weight' value */
mutex_lock(&instance->tz->lock);
instance->weight = weight;
+
+ thermal_governor_update_tz(instance->tz,
+ THERMAL_INSTANCE_WEIGHT_CHANGED);
mutex_unlock(&instance->tz->lock);

return count;
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 1b76878cea46..a759570de364 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -53,6 +53,7 @@ enum thermal_notify_event {
THERMAL_EVENT_KEEP_ALIVE, /* Request for user space handler to respond */
THERMAL_TZ_BIND_CDEV, /* Cooling dev is bind to the thermal zone */
THERMAL_TZ_UNBIND_CDEV, /* Cooling dev is unbind from the thermal zone */
+ THERMAL_INSTANCE_WEIGHT_CHANGED, /* Thermal instance weight changed */
};

/**
--
2.25.1


2023-12-20 23:19:25

by Lukasz Luba

[permalink] [raw]
Subject: [PATCH v3 9/9] thermal: gov_power_allocator: Support new update callback of weights

When the thermal instance's weight is updated from the sysfs the governor
update_tz() callback is triggered. Implement proper reaction to this
event in the IPA, which would save CPU cycles spent in throttle().
This will speed-up the main throttle() IPA function and clean it up
a bit.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/gov_power_allocator.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index b5ec60ae7efd..7b6aa265ff6a 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -77,6 +77,7 @@ struct power_actor {
* @trip_switch_on should be NULL.
* @trip_max: last passive trip point of the thermal zone. The
* temperature we are controlling for.
+ * @total_weight: Sum of all thermal instances weights
* @num_actors: number of cooling devices supporting IPA callbacks
* @buffer_size: internal buffer size, to avoid runtime re-calculation
* @power: buffer for all power actors internal power information
@@ -88,6 +89,7 @@ struct power_allocator_params {
u32 sustainable_power;
const struct thermal_trip *trip_switch_on;
const struct thermal_trip *trip_max;
+ int total_weight;
unsigned int num_actors;
unsigned int buffer_size;
struct power_actor *power;
@@ -405,16 +407,11 @@ static int allocate_power(struct thermal_zone_device *tz, int control_temp)
u32 total_granted_power = 0;
u32 total_req_power = 0;
u32 power_range, weight;
- int total_weight = 0;
int i = 0, ret;

if (!num_actors)
return -ENODEV;

- list_for_each_entry(instance, &tz->thermal_instances, tz_node)
- if (power_actor_is_valid(params, instance))
- total_weight += instance->weight;
-
/* Clean all buffers for new power estimations */
memset(power, 0, params->buffer_size);

@@ -430,7 +427,7 @@ static int allocate_power(struct thermal_zone_device *tz, int control_temp)
if (ret)
continue;

- if (!total_weight)
+ if (!params->total_weight)
weight = 1 << FRAC_BITS;
else
weight = instance->weight;
@@ -651,6 +648,12 @@ static void power_allocator_update_tz(struct thermal_zone_device *tz,

allocate_actors_buffer(params, num_actors);
break;
+ case THERMAL_INSTANCE_WEIGHT_CHANGED:
+ params->total_weight = 0;
+ list_for_each_entry(instance, &tz->thermal_instances, tz_node)
+ if (power_actor_is_valid(params, instance))
+ params->total_weight += instance->weight;
+ break;
default:
break;
}
--
2.25.1


2023-12-29 17:15:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] Add callback for cooling list update to speed-up IPA

On Thu, Dec 21, 2023 at 12:16 AM Lukasz Luba <[email protected]> wrote:
>
> Hi all,
>
> The patch set adds a new callback for thermal governors and implementation for
> Intelligent Power Allocator.
>
> The goal is to move some heavy operations like the memory allocations and heavy
> computations (multiplications) out of throttle() callback hot path.
>
> The new callback is generic enough to handle other important update events.
> It re-uses existing thermal_notify_event definitions.
>
> In addition there are some small clean-ups for IPA code.
>
> The patch set is based on current pm/bleeding-edge branch (20 Dec).
>
> changes:
> v3:
> - changed helper name to thermal_governor_update_tz() with also
> "reason" argument (Rafael)
> - added thermal_governor_update_tz() to thermal_core.h for use from sysfs
> functions
> - changed names of the events (THERMAL_TZ_BIND_CDEV) (Rafael)
> - patch 2/9 changed header and comment for function (Rafael)
> - patch 3/9: used unsigned types for num_actors, buffer_size (Rafael)
> - changed trace functions and added new patch 4/9 to be prepare tracing for
> different internal IPA array; it also drops dynamic array inside trace event
> - used new structure power_actor and changed the code in patch 5/9 (Rafael)
> - keept the local num_actors variable (Rafael)
> - patch 6/9 skipped redundant parens and changed the header desc. (Rafael)
> - patch 7/9 changed header and used instance->tz->lock (Rafael)
> - patch 8/9 removed handle_weight_update() and renamed new event to
> THERMAL_INSTANCE_WEIGHT_CHANGE (Rafael)
> - patch 9/9 aliged to use THERMAL_INSTANCE_WEIGHT_CHANGE is switch (Rafael)
>
> v2 can be found here [1]
>
> Regards,
> Lukasz
>
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> Lukasz Luba (9):
> thermal: core: Add governor callback for thermal zone change
> thermal: gov_power_allocator: Refactor check_power_actors()
> thermal: gov_power_allocator: Refactor checks in divvy_up_power()
> thermal: gov_power_allocator: Change trace functions
> thermal: gov_power_allocator: Move memory allocation out of throttle()
> thermal: gov_power_allocator: Simplify checks for valid power actor
> thermal/sysfs: Update instance->weight under tz lock
> thermal/sysfs: Update governors when the 'weight' has changed
> thermal: gov_power_allocator: Support new update callback of weights
>
> drivers/thermal/gov_power_allocator.c | 269 ++++++++++++++++----------
> drivers/thermal/thermal_core.c | 14 ++
> drivers/thermal/thermal_core.h | 2 +
> drivers/thermal/thermal_sysfs.c | 7 +
> drivers/thermal/thermal_trace_ipa.h | 50 +++--
> include/linux/thermal.h | 7 +
> 6 files changed, 226 insertions(+), 123 deletions(-)
>
> --

All patches in the series applied as 6.8 material, with minor white
space adjustment in patch [8/9].

Thanks!

2024-01-02 09:38:29

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] Add callback for cooling list update to speed-up IPA



On 12/29/23 17:15, Rafael J. Wysocki wrote:
> On Thu, Dec 21, 2023 at 12:16 AM Lukasz Luba <[email protected]> wrote:
>>
>> Hi all,
>>
>> The patch set adds a new callback for thermal governors and implementation for
>> Intelligent Power Allocator.
>>
>> The goal is to move some heavy operations like the memory allocations and heavy
>> computations (multiplications) out of throttle() callback hot path.
>>
>> The new callback is generic enough to handle other important update events.
>> It re-uses existing thermal_notify_event definitions.
>>
>> In addition there are some small clean-ups for IPA code.
>>
>> The patch set is based on current pm/bleeding-edge branch (20 Dec).
>>
>> changes:
>> v3:
>> - changed helper name to thermal_governor_update_tz() with also
>> "reason" argument (Rafael)
>> - added thermal_governor_update_tz() to thermal_core.h for use from sysfs
>> functions
>> - changed names of the events (THERMAL_TZ_BIND_CDEV) (Rafael)
>> - patch 2/9 changed header and comment for function (Rafael)
>> - patch 3/9: used unsigned types for num_actors, buffer_size (Rafael)
>> - changed trace functions and added new patch 4/9 to be prepare tracing for
>> different internal IPA array; it also drops dynamic array inside trace event
>> - used new structure power_actor and changed the code in patch 5/9 (Rafael)
>> - keept the local num_actors variable (Rafael)
>> - patch 6/9 skipped redundant parens and changed the header desc. (Rafael)
>> - patch 7/9 changed header and used instance->tz->lock (Rafael)
>> - patch 8/9 removed handle_weight_update() and renamed new event to
>> THERMAL_INSTANCE_WEIGHT_CHANGE (Rafael)
>> - patch 9/9 aliged to use THERMAL_INSTANCE_WEIGHT_CHANGE is switch (Rafael)
>>
>> v2 can be found here [1]
>>
>> Regards,
>> Lukasz
>>
>> [1] https://lore.kernel.org/lkml/[email protected]/
>>
>> Lukasz Luba (9):
>> thermal: core: Add governor callback for thermal zone change
>> thermal: gov_power_allocator: Refactor check_power_actors()
>> thermal: gov_power_allocator: Refactor checks in divvy_up_power()
>> thermal: gov_power_allocator: Change trace functions
>> thermal: gov_power_allocator: Move memory allocation out of throttle()
>> thermal: gov_power_allocator: Simplify checks for valid power actor
>> thermal/sysfs: Update instance->weight under tz lock
>> thermal/sysfs: Update governors when the 'weight' has changed
>> thermal: gov_power_allocator: Support new update callback of weights
>>
>> drivers/thermal/gov_power_allocator.c | 269 ++++++++++++++++----------
>> drivers/thermal/thermal_core.c | 14 ++
>> drivers/thermal/thermal_core.h | 2 +
>> drivers/thermal/thermal_sysfs.c | 7 +
>> drivers/thermal/thermal_trace_ipa.h | 50 +++--
>> include/linux/thermal.h | 7 +
>> 6 files changed, 226 insertions(+), 123 deletions(-)
>>
>> --
>
> All patches in the series applied as 6.8 material, with minor white
> space adjustment in patch [8/9].
>
> Thanks!
>

Thank you Rafael!