2021-12-17 18:49:19

by Chetankumar Mistry

[permalink] [raw]
Subject: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic

Implement the Ziegler-Nichols Heuristic algorithm to better
estimate the PID Coefficients for a running platform.
The values are tuned to minimuse the amount of overshoot in
the temperature of the platform and subsequently minimise
the number of switches for cdev states.

Signed-off-by: Chetankumar Mistry <[email protected]>
---
Changelog v2:
- Updated Kernel-Docs to use ':' delimiter (asked by Randy Dunlap)
- Changed divide operation to use div_frac (requested by kernel_test_robot)

drivers/thermal/gov_power_allocator.c | 418 ++++++++++++++++++++++++++
drivers/thermal/thermal_sysfs.c | 2 +
include/linux/thermal.h | 7 +
3 files changed, 427 insertions(+)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 13e375751d22..b7e85ee8a673 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -49,6 +49,59 @@ static inline s64 div_frac(s64 x, s64 y)
return div_s64(x << FRAC_BITS, y);
}

+/**
+ * enum pivot_type - Values representing what type of pivot the current error
+ * value is
+ * @PEAK: The current error is a peak
+ * @TROUGH: The current error is a trough
+ * @MIDPOINT: The current error is neither a peak or trough and is some midpoint
+ * in between
+ */
+enum pivot_type { PEAK = 1, TROUGH = -1, MIDPOINT = 0 };
+
+/**
+ * enum ZN_VALUES - Values which the Ziegler-Nichols variable can take. This
+ * determines which set of PID Coefficients to use
+ * @ZN_ORIGINAL: Use the Original PID Coefficients when the thermal zone was
+ * initially bound
+ * @ZN_OFF: Use the current set of PID Coefficients
+ * @ZN_ON: Use Ziegler-Nichols to determine the best set of PID Coefficients
+ * @ZN_RESET: Reset the Ziegler-Nichols set of PID Coefficients so they can be
+ * found again
+ */
+enum ZN_VALUES { ZN_ORIGINAL = -1, ZN_OFF = 0, ZN_ON = 1, ZN_RESET = 2 };
+
+/**
+ * struct zn_coefficients - values used by the Ziegler-Nichols Heuristic to
+ * determine what the optimal PID coefficients are
+ * @zn_found: Determine whether we have found or are still searching for
+ * optimal PID coefficients
+ * @prev_err: Previous err logged
+ * @curr_err: Current err being processed
+ * @t_prev_peak: Timestamp for the previous "Peak"
+ * @period: Period of osciallation
+ * @k_ultimate: Value of k_P which produces stable oscillations
+ * @base_peak: Err value of the current peak
+ * @base_trough: Err value fo the current trough
+ * @oscillation_count: Number of stable oscillations we have observed
+ * @prev_pivot: Whether the previous pivot was a peak or trough
+ * @zn_state: Current Ziegler-Nichols state
+ *
+ */
+struct zn_coefficients {
+ bool zn_found;
+ s32 prev_err;
+ s32 curr_err;
+ u32 t_prev_peak;
+ u32 period;
+ u32 k_ultimate;
+
+ s32 base_peak;
+ s32 base_trough;
+ s32 oscillation_count;
+ enum pivot_type prev_pivot;
+};
+
/**
* struct power_allocator_params - parameters for the power allocator governor
* @allocated_tzp: whether we have allocated tzp for this thermal zone and
@@ -65,6 +118,8 @@ static inline s64 div_frac(s64 x, s64 y)
* controlling for.
* @sustainable_power: Sustainable power (heat) that this thermal zone can
* dissipate
+ * @zn_coeffs: Structure to hold information used by the Ziegler-Nichols
+ * heuristic
*/
struct power_allocator_params {
bool allocated_tzp;
@@ -73,6 +128,7 @@ struct power_allocator_params {
int trip_switch_on;
int trip_max_desired_temperature;
u32 sustainable_power;
+ struct zn_coefficients *zn_coeffs;
};

/**
@@ -85,6 +141,8 @@ struct power_allocator_params {
* can give some degree of functionality. For optimal performance of
* this governor, provide a sustainable_power in the thermal zone's
* thermal_zone_params.
+ *
+ * Return: the sustainable power for this thermal_zone
*/
static u32 estimate_sustainable_power(struct thermal_zone_device *tz)
{
@@ -171,6 +229,8 @@ static void estimate_pid_constants(struct thermal_zone_device *tz,
* on variables which might be updated by the user sysfs interface. If that
* happen the new value is going to be estimated and updated. It is also used
* after thermal zone binding, where the initial values where set to 0.
+ *
+ * Return: The sustainable power for this thermal_zone
*/
static u32 get_sustainable_power(struct thermal_zone_device *tz,
struct power_allocator_params *params,
@@ -196,6 +256,330 @@ static u32 get_sustainable_power(struct thermal_zone_device *tz,
return sustainable_power;
}

+/**
+ * set_original_pid_coefficients() - Reset PID Coefficients in the Thermal Zone
+ * to original values
+ * @tzp: Thermal Zone Parameters we want to update
+ *
+ */
+static inline void set_original_pid_coefficients(struct thermal_zone_params *tzp)
+{
+ static bool init = true;
+ static s32 k_po, k_pu, k_i, k_d, integral_cutoff;
+
+ if (init) {
+ k_po = tzp->k_po;
+ k_pu = tzp->k_pu;
+ k_i = tzp->k_i;
+ k_d = tzp->k_d;
+ integral_cutoff = tzp->integral_cutoff;
+ init = false;
+ } else {
+ tzp->k_po = k_po;
+ tzp->k_pu = k_pu;
+ tzp->k_i = k_i;
+ tzp->k_d = k_d;
+ tzp->integral_cutoff = integral_cutoff;
+ }
+}
+
+/**
+ * set_zn_pid_coefficients() - Calculate and set PID Coefficients based
+ * on Ziegler-Nichols Heuristic
+ * @tzp: thermal zone params to set
+ * @period: time taken for error to cycle 1 period
+ * @k_ultimate: the Ultimate Proportional Gain value at which
+ * the error oscillates around the set-point
+ *
+ * This function sets the PID Coefficients of the thermal device
+ */
+static inline void set_zn_pid_coefficients(struct thermal_zone_params *tzp,
+ u32 period, s32 k_ultimate)
+{
+ /* Convert time in ms for 1 cycle to cycles/s */
+ s32 freq = 1000 / period;
+
+ /* Make k_pu and k_po identical so it represents k_p */
+ tzp->k_pu = k_ultimate * 1 / 10;
+ tzp->k_po = tzp->k_pu;
+
+ tzp->k_i = freq / 2;
+ /* We want an integral term so if the value is 0, set it to 1 */
+ tzp->k_i = tzp->k_i > 0 ? tzp->k_i : 1;
+
+ tzp->k_d = (33 * freq) / 100;
+ /* We want an integral term so if the value is 0, set it to 1 */
+ tzp->k_d = tzp->k_d > 0 ? tzp->k_d : 1;
+}
+
+/**
+ * is_error_acceptable() - Check whether the error determined to be a pivot
+ * point is within the acceptable range
+ * @err: error value we are checking
+ * @base: the base_line value we are comparing against
+ *
+ * This function is used to determine whether our current pivot point is within
+ * the acceptable limits. The value of base is the first pivot point within
+ * this series of oscillations
+ *
+ * Return: boolean representing whether or not the error was within the acceptable
+ * range
+ */
+static inline bool is_error_acceptable(s32 err, s32 base)
+{
+ /* Margin for error in milli-celcius */
+ const s32 MARGIN = 500;
+ s32 lower = abs(base) - MARGIN;
+ s32 upper = abs(base) + MARGIN;
+
+ if (lower < abs(err) && abs(err) < upper)
+ return true;
+ return false;
+}
+
+/**
+ * is_error_pivot() - Determine whether an error value is a pivot based on the
+ * previous and next error values
+ * @next_err: the next error in a series
+ * @curr_err: the current error value we are checking
+ * @prev_err: the previous error in a series
+ * @peak_trough: integer value to output what kind of pivot (if any)
+ * the error value is
+ *
+ * Determine whether or not the current value of error is a pivot and if it is
+ * a pivot, which type of pivot it is (peak or trough).
+ *
+ * Return: Bool representing whether the current value is a pivot point and
+ * integer set to PEAK, TROUGH or MIDPOINT
+ */
+static inline bool is_error_pivot(s32 next_err, s32 curr_err, s32 prev_err,
+ enum pivot_type *peak_trough)
+{
+ /*
+ * Check whether curr_err is at it's highest value compared to its neighbours and that error
+ * value is positive
+ */
+ if (prev_err < curr_err && curr_err > next_err && curr_err > 0) {
+ *peak_trough = PEAK;
+ return true;
+ }
+ /*
+ * Check whether curr_err is at it's lowest value compared to its neighbours and that error
+ * value is negative
+ */
+ if (prev_err > curr_err && curr_err < next_err && curr_err < 0) {
+ *peak_trough = TROUGH;
+ return true;
+ }
+ /* If the error is not a pivot then it must be somewhere between pivots */
+ *peak_trough = MIDPOINT;
+ return false;
+}
+
+/**
+ * update_oscillation_count() - Update the Oscillation Count for this set of pivots
+ * @curr_err: the current error value we are checking
+ * @base_pivot: the amplitude we are comparing against
+ * @peak_trough: the type of pivot we are currently processing
+ * @zn_coeffs: the data structure holding information used by the Ziegler-Nichols Hueristic
+ *
+ * Update the number of times we have oscillated based on our current error value being within the
+ * accepted range from the amplitude of previous pivots in this oscillation series.
+ *
+ * Return: Integer count of the number of oscillations
+ */
+static inline s32 update_oscillation_count(s32 curr_err, s32 *base_pivot,
+ enum pivot_type peak_trough,
+ struct zn_coefficients *zn_coeffs)
+{
+ if (is_error_acceptable(curr_err, *base_pivot) &&
+ zn_coeffs->prev_pivot == -peak_trough) {
+ zn_coeffs->oscillation_count++;
+ } else {
+ zn_coeffs->oscillation_count = 0;
+ *base_pivot = curr_err;
+ }
+ zn_coeffs->prev_pivot = peak_trough;
+ return zn_coeffs->oscillation_count;
+}
+
+/**
+ * get_oscillation_count() - Update and get the number of times we have oscillated
+ * @curr_err: the current error value we are checking
+ * @peak_trough: the type of pivot we are currently processing
+ * @zn_coeffs: the data structure holding information used by the
+ * Ziegler-Nichols Hueristic
+ *
+ * Return: The number of times we have oscillated for this k_ultimate
+ */
+static inline s32 get_oscillation_count(s32 curr_err,
+ enum pivot_type peak_trough,
+ struct zn_coefficients *zn_coeffs)
+{
+ s32 *base_pivot = 0;
+
+ if (peak_trough == PEAK)
+ base_pivot = &zn_coeffs->base_peak;
+ else if (peak_trough == TROUGH)
+ base_pivot = &zn_coeffs->base_trough;
+
+ return update_oscillation_count(curr_err, base_pivot, peak_trough,
+ zn_coeffs);
+}
+
+/**
+ * get_zn_state() - Update and get the current Ziegler-Nichols State
+ * @tzp: The thermal zone params to check to determine the current state
+ * @zn_state: The current state which should be returned if no changes are
+ * made
+ *
+ * Return: The next zieger-nichols state for this pass of the PID controller
+ */
+static inline int get_zn_state(struct thermal_zone_params *tzp, int zn_state)
+{
+ if (tzp->k_po == ZN_RESET && tzp->k_pu == ZN_RESET)
+ return ZN_RESET;
+
+ if (tzp->k_po == ZN_ORIGINAL && tzp->k_pu == ZN_ORIGINAL)
+ return ZN_ORIGINAL;
+
+ if (tzp->k_po == ZN_ON && tzp->k_pu == ZN_ON)
+ return ZN_ON;
+
+ return zn_state;
+}
+
+/**
+ * is_temperature_safe() - Check if the current temperature is within 10% of
+ * the target
+ *
+ * @current_temperature: Current reported temperature
+ * @control_temp: Control Temperature we are targeting
+ *
+ * Return: True if current temperature is within 10% of the target, False otherwise
+ */
+static inline bool is_temperature_safe(int current_temperature,
+ int control_temp)
+{
+ return (current_temperature - control_temp) < (control_temp / 10) ?
+ true :
+ false;
+}
+
+/**
+ * reset_ziegler_nichols() - Reset the Values used to Track Ziegler-Nichols
+ *
+ * @zn_coeffs: the data structure holding information used by the Ziegler-Nichols Hueristic
+ *
+ */
+static inline void reset_ziegler_nichols(struct zn_coefficients *zn_coeffs)
+{
+ zn_coeffs->zn_found = false;
+ zn_coeffs->k_ultimate = 10;
+ zn_coeffs->prev_err = 0;
+ zn_coeffs->curr_err = 0;
+ zn_coeffs->t_prev_peak = 0;
+ zn_coeffs->period = 0;
+ /* Manually input INT_MAX as a previous value so the system cannot use it accidentally */
+ zn_coeffs->oscillation_count = update_oscillation_count(
+ INT_MAX, &zn_coeffs->curr_err, PEAK, zn_coeffs);
+}
+
+/**
+ * ziegler_nichols() - Calculate the k_ultimate and period for the thermal device
+ * and use these values to calculate and set the PID coefficients based on
+ * the Ziegler-Nichols Heuristic
+ * @tz: The thermal device we are operating on
+ * @next_err: The next error value to be used for calculations
+ * @control_temp: The temperature we are trying to target
+ *
+ * The Ziegler-Nichols PID Coefficient Tuning Method works by determining a K_Ultimate value. This
+ * is the largest K_P which yields a stable set of oscillations in error. By using historic and
+ * current values of error, this function attempts to determine whether or not it is oscillating,
+ * and increment the value of K_Ultimate accordingly. Once it has determined that the system is
+ * oscillating, it calculates the time between "peaks" to determine its period
+ *
+ */
+static inline void ziegler_nichols(struct thermal_zone_device *tz, s32 next_err,
+ int control_temp)
+{
+ struct power_allocator_params *params = tz->governor_data;
+ struct zn_coefficients *zn_coeffs = params->zn_coeffs;
+ const int NUMBER_OF_OSCILLATIONS = 10;
+
+ u32 t_now = (u32)div_frac(ktime_get_real_ns(), 1000000);
+ enum pivot_type peak_trough = MIDPOINT;
+ s32 oscillation_count = 0;
+ bool is_pivot;
+ bool is_safe =
+ is_temperature_safe((control_temp - next_err), control_temp);
+
+ if (tz->tzp->ziegler_nichols == ZN_RESET) {
+ reset_ziegler_nichols(zn_coeffs);
+ tz->tzp->ziegler_nichols = ZN_ON;
+ }
+
+ /* Override default PID Coefficients. These will be updated later according to the
+ * Heuristic
+ */
+ tz->tzp->k_po = zn_coeffs->k_ultimate;
+ tz->tzp->k_pu = zn_coeffs->k_ultimate;
+ tz->tzp->k_i = 0;
+ tz->tzp->k_d = 0;
+
+ if (!zn_coeffs->zn_found) {
+ /* Make sure that the previous errors have been logged and this isn't executed on
+ * first pass
+ */
+ if (zn_coeffs->curr_err != zn_coeffs->prev_err &&
+ zn_coeffs->prev_err != 0) {
+ if (!is_safe)
+ goto set_zn;
+ is_pivot = is_error_pivot(next_err, zn_coeffs->curr_err,
+ zn_coeffs->prev_err,
+ &peak_trough);
+ if (is_pivot) {
+ oscillation_count = get_oscillation_count(
+ zn_coeffs->curr_err, peak_trough,
+ zn_coeffs);
+ if (oscillation_count >=
+ NUMBER_OF_OSCILLATIONS) {
+ goto set_zn;
+ }
+ if (peak_trough == PEAK)
+ zn_coeffs->t_prev_peak = t_now;
+ }
+ if (!is_pivot || !oscillation_count)
+ zn_coeffs->k_ultimate += 10;
+ }
+ goto update_errors;
+ } else {
+ set_zn_pid_coefficients(tz->tzp, zn_coeffs->period,
+ zn_coeffs->k_ultimate);
+ tz->tzp->ziegler_nichols = ZN_OFF;
+ }
+ return;
+
+update_errors:
+ zn_coeffs->prev_err = zn_coeffs->curr_err;
+ zn_coeffs->curr_err = next_err;
+ return;
+
+set_zn:
+ if (zn_coeffs->t_prev_peak) {
+ zn_coeffs->zn_found = true;
+ zn_coeffs->period = abs(t_now - zn_coeffs->t_prev_peak);
+ set_zn_pid_coefficients(tz->tzp, zn_coeffs->period,
+ zn_coeffs->k_ultimate);
+ ((struct power_allocator_params *)tz->governor_data)
+ ->err_integral = 0;
+ tz->tzp->ziegler_nichols = ZN_OFF;
+ } else {
+ if (peak_trough == PEAK)
+ zn_coeffs->t_prev_peak = t_now;
+ }
+}
+
/**
* pid_controller() - PID controller
* @tz: thermal zone we are operating in
@@ -228,6 +612,26 @@ static u32 pid_controller(struct thermal_zone_device *tz,
sustainable_power = get_sustainable_power(tz, params, control_temp);

err = control_temp - tz->temperature;
+
+ switch (tz->tzp->ziegler_nichols) {
+ case ZN_ORIGINAL: {
+ set_original_pid_coefficients(tz->tzp);
+ tz->tzp->ziegler_nichols = ZN_OFF;
+ break;
+ }
+ case ZN_RESET: {
+ ziegler_nichols(tz, err, control_temp);
+ tz->tzp->ziegler_nichols = ZN_ON;
+ break;
+ }
+ case ZN_ON: {
+ ziegler_nichols(tz, err, control_temp);
+ break;
+ }
+ default:
+ break;
+ }
+
err = int_to_frac(err);

/* Calculate the proportional term */
@@ -375,6 +779,7 @@ static void divvy_up_power(u32 *req_power, u32 *max_power, int num_actors,
if (capped_extra_power > 0)
for (i = 0; i < num_actors; i++) {
u64 extra_range = (u64)extra_actor_power[i] * extra_power;
+
granted_power[i] += DIV_ROUND_CLOSEST_ULL(extra_range,
capped_extra_power);
}
@@ -644,6 +1049,7 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
int ret;
struct power_allocator_params *params;
int control_temp;
+ struct zn_coefficients *zn_coeffs;

ret = check_power_actors(tz);
if (ret)
@@ -653,6 +1059,12 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
if (!params)
return -ENOMEM;

+ zn_coeffs = kzalloc(sizeof(*zn_coeffs), GFP_KERNEL);
+ if (!zn_coeffs)
+ return -ENOMEM;
+
+ params->zn_coeffs = zn_coeffs;
+
if (!tz->tzp) {
tz->tzp = kzalloc(sizeof(*tz->tzp), GFP_KERNEL);
if (!tz->tzp) {
@@ -676,6 +1088,8 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
estimate_pid_constants(tz, tz->tzp->sustainable_power,
params->trip_switch_on,
control_temp);
+ /* Store the original PID coefficient values */
+ set_original_pid_coefficients(tz->tzp);
}

reset_pid_controller(params);
@@ -696,6 +1110,9 @@ static void power_allocator_unbind(struct thermal_zone_device *tz)

dev_dbg(&tz->device, "Unbinding from thermal zone %d\n", tz->id);

+ kfree(params->zn_coeffs);
+ params->zn_coeffs = NULL;
+
if (params->allocated_tzp) {
kfree(tz->tzp);
tz->tzp = NULL;
@@ -749,4 +1166,5 @@ static struct thermal_governor thermal_gov_power_allocator = {
.unbind_from_tz = power_allocator_unbind,
.throttle = power_allocator_throttle,
};
+
THERMAL_GOVERNOR_DECLARE(thermal_gov_power_allocator);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index f154bada2906..d2f410a33995 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -342,6 +342,7 @@ create_s32_tzp_attr(k_po);
create_s32_tzp_attr(k_pu);
create_s32_tzp_attr(k_i);
create_s32_tzp_attr(k_d);
+create_s32_tzp_attr(ziegler_nichols);
create_s32_tzp_attr(integral_cutoff);
create_s32_tzp_attr(slope);
create_s32_tzp_attr(offset);
@@ -375,6 +376,7 @@ static struct attribute *thermal_zone_dev_attrs[] = {
&dev_attr_k_pu.attr,
&dev_attr_k_i.attr,
&dev_attr_k_d.attr,
+ &dev_attr_ziegler_nichols.attr,
&dev_attr_integral_cutoff.attr,
&dev_attr_slope.attr,
&dev_attr_offset.attr,
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index c314893970b3..ed8cd6a826ed 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -282,6 +282,13 @@ struct thermal_zone_params {
* Used by thermal zone drivers (default 0).
*/
int offset;
+
+ /*
+ * Ziegler-Nichols estimation setting. Allows the user to decide
+ * whether to use original PID coefficients or calculate using
+ * the Ziegler-Nichols algorithm
+ */
+ s32 ziegler_nichols;
};

/**
--
2.25.1



2021-12-17 18:49:21

by Chetankumar Mistry

[permalink] [raw]
Subject: [PATCH v2][RFC 2/2] Remove sysfs entry

The previous patch introduced a sysfs entry to track the
Ziegler-Nichols state.
This patch will remove it, if it is unwanted.

Signed-off-by: Chetankumar Mistry <[email protected]>
---
drivers/thermal/gov_power_allocator.c | 44 +++++++++++++--------------
drivers/thermal/thermal_sysfs.c | 2 --
include/linux/thermal.h | 7 -----
3 files changed, 21 insertions(+), 32 deletions(-)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index b7e85ee8a673..c0be03d0d161 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -100,6 +100,8 @@ struct zn_coefficients {
s32 base_trough;
s32 oscillation_count;
enum pivot_type prev_pivot;
+
+ int zn_state;
};

/**
@@ -431,7 +433,7 @@ static inline s32 get_oscillation_count(s32 curr_err,
* get_zn_state() - Update and get the current Ziegler-Nichols State
* @tzp: The thermal zone params to check to determine the current state
* @zn_state: The current state which should be returned if no changes are
- * made
+ * made
*
* Return: The next zieger-nichols state for this pass of the PID controller
*/
@@ -514,9 +516,21 @@ static inline void ziegler_nichols(struct thermal_zone_device *tz, s32 next_err,
bool is_safe =
is_temperature_safe((control_temp - next_err), control_temp);

- if (tz->tzp->ziegler_nichols == ZN_RESET) {
+ zn_coeffs->zn_state = get_zn_state(tz->tzp, zn_coeffs->zn_state);
+ switch (zn_coeffs->zn_state) {
+ case ZN_ORIGINAL: {
+ set_original_pid_coefficients(tz->tzp);
+ zn_coeffs->zn_state = ZN_OFF;
+ return;
+ }
+ case ZN_RESET: {
reset_ziegler_nichols(zn_coeffs);
- tz->tzp->ziegler_nichols = ZN_ON;
+ zn_coeffs->zn_state = ZN_ON;
+ break;
+ }
+
+ case ZN_OFF:
+ return;
}

/* Override default PID Coefficients. These will be updated later according to the
@@ -556,7 +570,7 @@ static inline void ziegler_nichols(struct thermal_zone_device *tz, s32 next_err,
} else {
set_zn_pid_coefficients(tz->tzp, zn_coeffs->period,
zn_coeffs->k_ultimate);
- tz->tzp->ziegler_nichols = ZN_OFF;
+ zn_coeffs->zn_state = ZN_OFF;
}
return;

@@ -573,7 +587,7 @@ static inline void ziegler_nichols(struct thermal_zone_device *tz, s32 next_err,
zn_coeffs->k_ultimate);
((struct power_allocator_params *)tz->governor_data)
->err_integral = 0;
- tz->tzp->ziegler_nichols = ZN_OFF;
+ zn_coeffs->zn_state = ZN_OFF;
} else {
if (peak_trough == PEAK)
zn_coeffs->t_prev_peak = t_now;
@@ -613,24 +627,7 @@ static u32 pid_controller(struct thermal_zone_device *tz,

err = control_temp - tz->temperature;

- switch (tz->tzp->ziegler_nichols) {
- case ZN_ORIGINAL: {
- set_original_pid_coefficients(tz->tzp);
- tz->tzp->ziegler_nichols = ZN_OFF;
- break;
- }
- case ZN_RESET: {
- ziegler_nichols(tz, err, control_temp);
- tz->tzp->ziegler_nichols = ZN_ON;
- break;
- }
- case ZN_ON: {
- ziegler_nichols(tz, err, control_temp);
- break;
- }
- default:
- break;
- }
+ ziegler_nichols(tz, err, control_temp);

err = int_to_frac(err);

@@ -1064,6 +1061,7 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
return -ENOMEM;

params->zn_coeffs = zn_coeffs;
+ zn_coeffs->zn_state = ZN_ON;

if (!tz->tzp) {
tz->tzp = kzalloc(sizeof(*tz->tzp), GFP_KERNEL);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index d2f410a33995..f154bada2906 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -342,7 +342,6 @@ create_s32_tzp_attr(k_po);
create_s32_tzp_attr(k_pu);
create_s32_tzp_attr(k_i);
create_s32_tzp_attr(k_d);
-create_s32_tzp_attr(ziegler_nichols);
create_s32_tzp_attr(integral_cutoff);
create_s32_tzp_attr(slope);
create_s32_tzp_attr(offset);
@@ -376,7 +375,6 @@ static struct attribute *thermal_zone_dev_attrs[] = {
&dev_attr_k_pu.attr,
&dev_attr_k_i.attr,
&dev_attr_k_d.attr,
- &dev_attr_ziegler_nichols.attr,
&dev_attr_integral_cutoff.attr,
&dev_attr_slope.attr,
&dev_attr_offset.attr,
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index ed8cd6a826ed..c314893970b3 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -282,13 +282,6 @@ struct thermal_zone_params {
* Used by thermal zone drivers (default 0).
*/
int offset;
-
- /*
- * Ziegler-Nichols estimation setting. Allows the user to decide
- * whether to use original PID coefficients or calculate using
- * the Ziegler-Nichols algorithm
- */
- s32 ziegler_nichols;
};

/**
--
2.25.1


2021-12-18 06:07:32

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic



On 12/17/21 10:49, Chetankumar Mistry wrote:
> Implement the Ziegler-Nichols Heuristic algorithm to better
> estimate the PID Coefficients for a running platform.
> The values are tuned to minimuse the amount of overshoot in
> the temperature of the platform and subsequently minimise
> the number of switches for cdev states.
>
> Signed-off-by: Chetankumar Mistry <[email protected]>

The kernel-doc changes all look good and don't cause any warnings.
Thanks.

> ---
> Changelog v2:
> - Updated Kernel-Docs to use ':' delimiter (asked by Randy Dunlap)
> - Changed divide operation to use div_frac (requested by kernel_test_robot)
>
> drivers/thermal/gov_power_allocator.c | 418 ++++++++++++++++++++++++++
> drivers/thermal/thermal_sysfs.c | 2 +
> include/linux/thermal.h | 7 +
> 3 files changed, 427 insertions(+)

--
~Randy

2021-12-20 09:59:02

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic

Hi Randy,

Chetan is on holidays and I'll respond to feedback.

On 12/18/21 6:07 AM, Randy Dunlap wrote:
>
>
> On 12/17/21 10:49, Chetankumar Mistry wrote:
>> Implement the Ziegler-Nichols Heuristic algorithm to better
>> estimate the PID Coefficients for a running platform.
>> The values are tuned to minimuse the amount of overshoot in
>> the temperature of the platform and subsequently minimise
>> the number of switches for cdev states.
>>
>> Signed-off-by: Chetankumar Mistry <[email protected]>
>
> The kernel-doc changes all look good and don't cause any warnings.
> Thanks.

Thank you for the testing it.

Regards,
Lukasz

2022-01-06 11:54:12

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic

Hi Daniel,

Could you have a look at this, please?


On 12/17/21 6:49 PM, Chetankumar Mistry wrote:
> Implement the Ziegler-Nichols Heuristic algorithm to better
> estimate the PID Coefficients for a running platform.
> The values are tuned to minimuse the amount of overshoot in
> the temperature of the platform and subsequently minimise
> the number of switches for cdev states.
>
> Signed-off-by: Chetankumar Mistry <[email protected]>


This is the continuation of the previous idea to have
better k_* values. You might remember this conversation [1].

I've spent some time researching papers how and what can be done
in this field and if possible to plumb in to the kernel.
We had internal discussions (~2017) of one method fuzzy-logic that I
found back then, but died at the begging not fitting into this
IPA kernel specific environment and user-space. Your suggestion with
observing undershooting and overshooting results sparked better idea.
I thought it's worth to invest in it but I didn't have
time. We are lucky, Chetan was designated to help me and
experiment/implement/test these ideas and here is the patch set.

He's chosen the Ziegler-Nichols method, which shows really
good results in benchmarks (Geekbench and GFXbench on hikey960 Android).
The improved performance in Geekbench is ~10% (vs. old IPA).

The main question from our side is the sysfs interface
which we could be used to trigger this algorithm for
better coefficients estimations.
We ask user to echo to some sysfs files in thermal zone
and start his/her workload. This new IPA 'learns' the system
utilization and reaction in temperature. After a few rounds,
we get better fitted coefficients.
If you need more background about the code or mechanisms, or tests,
I'm sure Chetan is happy to provide you those.

If you are interested in those analyses we can find a way to share a
.html file with the results from LISA notebook.

We are waiting for your decision regarding the design and user
interface.

Regards,
Lukasz

[1]
https://lore.kernel.org/lkml/[email protected]/

2022-01-06 12:16:37

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic


Hi Lukasz,

On 06/01/2022 12:54, Lukasz Luba wrote:
> Hi Daniel,
>
> Could you have a look at this, please?

Yes, I had a quick look at the code and went to the algorithm description.

Still digesting ...

> On 12/17/21 6:49 PM, Chetankumar Mistry wrote:
>> Implement the Ziegler-Nichols Heuristic algorithm to better
>> estimate the PID Coefficients for a running platform.
>> The values are tuned to minimuse the amount of overshoot in
>> the temperature of the platform and subsequently minimise
>> the number of switches for cdev states.
>>
>> Signed-off-by: Chetankumar Mistry <[email protected]>
>
>
> This is the continuation of the previous idea to have
> better k_* values. You might remember this conversation [1].
>
> I've spent some time researching papers how and what can be done
> in this field and if possible to plumb in to the kernel.
> We had internal discussions (~2017) of one method fuzzy-logic that I
> found back then, but died at the begging not fitting into this
> IPA kernel specific environment and user-space. Your suggestion with
> observing undershooting and overshooting results sparked better idea.
> I thought it's worth to invest in it but I didn't have
> time. We are lucky, Chetan was designated to help me and
> experiment/implement/test these ideas and here is the patch set.
>
> He's chosen the Ziegler-Nichols method, which shows really
> good results in benchmarks (Geekbench and GFXbench on hikey960 Android).
> The improved performance in Geekbench is ~10% (vs. old IPA).

+10% perf improvements sounds great. What about the temperature
mitigation (temp avg + stddev) ?

> The main question from our side is the sysfs interface
> which we could be used to trigger this algorithm for
> better coefficients estimations.
> We ask user to echo to some sysfs files in thermal zone
> and start his/her workload. This new IPA 'learns' the system
> utilization and reaction in temperature. After a few rounds,
> we get better fitted coefficients.
> If you need more background about the code or mechanisms, or tests,
> I'm sure Chetan is happy to provide you those.

I'm worried about the complexity of the algorithm and the overhead implied.

The k_* factors are tied with the system and the thermal setup (fan,
heatsink, processor, opp, ...). So IIUC when the factors are found, they
should not change and could be part of the system setup.

Would the algorithm fit better in a separate userspace kernel tooling?
So we can run it once and find the k_* for a board.

Additionally, the values can be stored in the Documentation for
different board and a documentation on how to use the tool.

Then up to the SoC vendor to setup the k_* in sysfs, so no need to
change any interface.

> If you are interested in those analyses we can find a way to share a> .html file with the results from LISA notebook.

Yes,

> We are waiting for your decision regarding the design and user
> interface.
>
> Regards,
> Lukasz
>
> [1]
> https://lore.kernel.org/lkml/[email protected]/


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2022-01-06 13:16:41

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic


Thank you for fast response!

On 1/6/22 12:16 PM, Daniel Lezcano wrote:
>
> Hi Lukasz,
>
> On 06/01/2022 12:54, Lukasz Luba wrote:
>> Hi Daniel,
>>
>> Could you have a look at this, please?
>
> Yes, I had a quick look at the code and went to the algorithm description.
>
> Still digesting ...
>
>> On 12/17/21 6:49 PM, Chetankumar Mistry wrote:
>>> Implement the Ziegler-Nichols Heuristic algorithm to better
>>> estimate the PID Coefficients for a running platform.
>>> The values are tuned to minimuse the amount of overshoot in
>>> the temperature of the platform and subsequently minimise
>>> the number of switches for cdev states.
>>>
>>> Signed-off-by: Chetankumar Mistry <[email protected]>
>>
>>
>> This is the continuation of the previous idea to have
>> better k_* values. You might remember this conversation [1].
>>
>> I've spent some time researching papers how and what can be done
>> in this field and if possible to plumb in to the kernel.
>> We had internal discussions (~2017) of one method fuzzy-logic that I
>> found back then, but died at the begging not fitting into this
>> IPA kernel specific environment and user-space. Your suggestion with
>> observing undershooting and overshooting results sparked better idea.
>> I thought it's worth to invest in it but I didn't have
>> time. We are lucky, Chetan was designated to help me and
>> experiment/implement/test these ideas and here is the patch set.
>>
>> He's chosen the Ziegler-Nichols method, which shows really
>> good results in benchmarks (Geekbench and GFXbench on hikey960 Android).
>> The improved performance in Geekbench is ~10% (vs. old IPA).
>
> +10% perf improvements sounds great. What about the temperature
> mitigation (temp avg + stddev) ?

Chetan would respond about that with the link to the .html file.
We just have to create an official public server space for it.
I hope till Monday evening we would get something.

>
>> The main question from our side is the sysfs interface
>> which we could be used to trigger this algorithm for
>> better coefficients estimations.
>> We ask user to echo to some sysfs files in thermal zone
>> and start his/her workload. This new IPA 'learns' the system
>> utilization and reaction in temperature. After a few rounds,
>> we get better fitted coefficients.
>> If you need more background about the code or mechanisms, or tests,
>> I'm sure Chetan is happy to provide you those.
>
> I'm worried about the complexity of the algorithm and the overhead implied.
>
> The k_* factors are tied with the system and the thermal setup (fan,
> heatsink, processor, opp, ...). So IIUC when the factors are found, they
> should not change and could be part of the system setup.

True, they are found and will be fixed for that board.

>
> Would the algorithm fit better in a separate userspace kernel tooling?
> So we can run it once and find the k_* for a board.

We wanted to be part of IPA in kernel because:
- the logic needs access to internals of IPA
- it would be easy accessible for all distros out-of-box
- no additional maintenance and keeping in sync two codes, especially
those in some packages for user-space

>
> Additionally, the values can be stored in the Documentation for
> different board and a documentation on how to use the tool.
>
> Then up to the SoC vendor to setup the k_* in sysfs, so no need to
> change any interface.

It wouldn't be for SoC vendor, but up to the OEM or board designer,
because the same SoC might have different thermal headroom thanks
to better cooling or bigger PCB, etc.

I agree that these optimized k_* values might be shared in the kernel.
Ideally I would see them in the board's DT file, in the thermal zone,
but I'm afraid they are not 'Device description' so don't fit into DT
scope. They are rather optimizations of pure kernel mechanism.

where would be a good place for it? Maybe a new IPA Documentation/
sub-directory?

>
>> If you are interested in those analyses we can find a way to share a> .html file with the results from LISA notebook.
>
> Yes,

Sure thing, let me arrange that.

2022-01-07 11:52:50

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic

On 06/01/2022 14:16, Lukasz Luba wrote:
>
> Thank you for fast response!
>
> On 1/6/22 12:16 PM, Daniel Lezcano wrote:
>>
>> Hi Lukasz,
>>
>> On 06/01/2022 12:54, Lukasz Luba wrote:
>>> Hi Daniel,
>>>
>>> Could you have a look at this, please?
>>
>> Yes, I had a quick look at the code and went to the algorithm
>> description.
>>
>> Still digesting ...
>>
>>> On 12/17/21 6:49 PM, Chetankumar Mistry wrote:
>>>> Implement the Ziegler-Nichols Heuristic algorithm to better
>>>> estimate the PID Coefficients for a running platform.
>>>> The values are tuned to minimuse the amount of overshoot in
>>>> the temperature of the platform and subsequently minimise
>>>> the number of switches for cdev states.
>>>>
>>>> Signed-off-by: Chetankumar Mistry <[email protected]>
>>>
>>>
>>> This is the continuation of the previous idea to have
>>> better k_* values. You might remember this conversation [1].
>>>
>>> I've spent some time researching papers how and what can be done
>>> in this field and if possible to plumb in to the kernel.
>>> We had internal discussions (~2017) of one method fuzzy-logic that I
>>> found back then, but died at the begging not fitting into this
>>> IPA kernel specific environment and user-space. Your suggestion with
>>> observing undershooting and overshooting results sparked better idea.
>>> I thought it's worth to invest in it but I didn't have
>>> time. We are lucky, Chetan was designated to help me and
>>> experiment/implement/test these ideas and here is the patch set.
>>>
>>> He's chosen the Ziegler-Nichols method, which shows really
>>> good results in benchmarks (Geekbench and GFXbench on hikey960 Android).
>>> The improved performance in Geekbench is ~10% (vs. old IPA).
>>
>> +10% perf improvements sounds great. What about the temperature
>> mitigation (temp avg + stddev) ?
>
> Chetan would respond about that with the link to the .html file.
> We just have to create an official public server space for it.
> I hope till Monday evening we would get something.
>
>>
>>> The main question from our side is the sysfs interface
>>> which we could be used to trigger this algorithm for
>>> better coefficients estimations.
>>> We ask user to echo to some sysfs files in thermal zone
>>> and start his/her workload. This new IPA 'learns' the system
>>> utilization and reaction in temperature. After a few rounds,
>>> we get better fitted coefficients.
>>> If you need more background about the code or mechanisms, or tests,
>>> I'm sure Chetan is happy to provide you those.
>>
>> I'm worried about the complexity of the algorithm and the overhead
>> implied.
>>
>> The k_* factors are tied with the system and the thermal setup (fan,
>> heatsink, processor, opp, ...). So IIUC when the factors are found, they
>> should not change and could be part of the system setup.
>
> True, they are found and will be fixed for that board.
>
>>
>> Would the algorithm fit better in a separate userspace kernel tooling?
>> So we can run it once and find the k_* for a board.
>
> We wanted to be part of IPA in kernel because:
> - the logic needs access to internals of IPA
> - it would be easy accessible for all distros out-of-box
> - no additional maintenance and keeping in sync two codes, especially
>   those in some packages for user-space

Sorry, I'm not convinced :/

AFAICT, the temperature and the sampling rate should be enough
information to find out the k_*

IMO, an userspace tool in ./tools/thermal/ipa is the right place

So if you give the tooling to the SoC vendor via the thermal ones, with
a file containing the temp + timestamp, they should be able to find the
k_* and setup their boards.

Actually my opinion is the kernel should not handle everything and the
SoC vendor should at least do some work to setup their system. If they
are able to find out the sustainable power, they can do the same for the
right coefficients.

>> Additionally, the values can be stored in the Documentation for
>> different board and a documentation on how to use the tool.
>>
>> Then up to the SoC vendor to setup the k_* in sysfs, so no need to
>> change any interface.
>
> It wouldn't be for SoC vendor, but up to the OEM or board designer,
> because the same SoC might have different thermal headroom thanks
> to better cooling or bigger PCB, etc.

Right, s/SoC vendor/SoC platform/

> I agree that these optimized k_* values might be shared in the kernel.
> Ideally I would see them in the board's DT file, in the thermal zone,
> but I'm afraid they are not 'Device description' so don't fit into DT
> scope. They are rather optimizations of pure kernel mechanism.
>
> where would be a good place for it? Maybe a new IPA Documentation/
> sub-directory?

You can improve the documentation in:

Documentation/driver-api/thermal/power_allocator.rst

And if we agree on a tools/thermal/ipa, the documentation with examples
and some SoC reference can be put there also

>>> If you are interested in those analyses we can find a way to share a>
>>> .html file with the results from LISA notebook.
>>
>> Yes,
>
> Sure thing, let me arrange that.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2022-01-07 13:39:52

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic



On 1/7/22 11:52 AM, Daniel Lezcano wrote:
> On 06/01/2022 14:16, Lukasz Luba wrote:
>>
>> Thank you for fast response!
>>
>> On 1/6/22 12:16 PM, Daniel Lezcano wrote:
>>>
>>> Hi Lukasz,
>>>
>>> On 06/01/2022 12:54, Lukasz Luba wrote:
>>>> Hi Daniel,
>>>>
>>>> Could you have a look at this, please?
>>>
>>> Yes, I had a quick look at the code and went to the algorithm
>>> description.
>>>
>>> Still digesting ...
>>>
>>>> On 12/17/21 6:49 PM, Chetankumar Mistry wrote:
>>>>> Implement the Ziegler-Nichols Heuristic algorithm to better
>>>>> estimate the PID Coefficients for a running platform.
>>>>> The values are tuned to minimuse the amount of overshoot in
>>>>> the temperature of the platform and subsequently minimise
>>>>> the number of switches for cdev states.
>>>>>
>>>>> Signed-off-by: Chetankumar Mistry <[email protected]>
>>>>
>>>>
>>>> This is the continuation of the previous idea to have
>>>> better k_* values. You might remember this conversation [1].
>>>>
>>>> I've spent some time researching papers how and what can be done
>>>> in this field and if possible to plumb in to the kernel.
>>>> We had internal discussions (~2017) of one method fuzzy-logic that I
>>>> found back then, but died at the begging not fitting into this
>>>> IPA kernel specific environment and user-space. Your suggestion with
>>>> observing undershooting and overshooting results sparked better idea.
>>>> I thought it's worth to invest in it but I didn't have
>>>> time. We are lucky, Chetan was designated to help me and
>>>> experiment/implement/test these ideas and here is the patch set.
>>>>
>>>> He's chosen the Ziegler-Nichols method, which shows really
>>>> good results in benchmarks (Geekbench and GFXbench on hikey960 Android).
>>>> The improved performance in Geekbench is ~10% (vs. old IPA).
>>>
>>> +10% perf improvements sounds great. What about the temperature
>>> mitigation (temp avg + stddev) ?
>>
>> Chetan would respond about that with the link to the .html file.
>> We just have to create an official public server space for it.
>> I hope till Monday evening we would get something.
>>
>>>
>>>> The main question from our side is the sysfs interface
>>>> which we could be used to trigger this algorithm for
>>>> better coefficients estimations.
>>>> We ask user to echo to some sysfs files in thermal zone
>>>> and start his/her workload. This new IPA 'learns' the system
>>>> utilization and reaction in temperature. After a few rounds,
>>>> we get better fitted coefficients.
>>>> If you need more background about the code or mechanisms, or tests,
>>>> I'm sure Chetan is happy to provide you those.
>>>
>>> I'm worried about the complexity of the algorithm and the overhead
>>> implied.
>>>
>>> The k_* factors are tied with the system and the thermal setup (fan,
>>> heatsink, processor, opp, ...). So IIUC when the factors are found, they
>>> should not change and could be part of the system setup.
>>
>> True, they are found and will be fixed for that board.
>>
>>>
>>> Would the algorithm fit better in a separate userspace kernel tooling?
>>> So we can run it once and find the k_* for a board.
>>
>> We wanted to be part of IPA in kernel because:
>> - the logic needs access to internals of IPA
>> - it would be easy accessible for all distros out-of-box
>> - no additional maintenance and keeping in sync two codes, especially
>>   those in some packages for user-space
>
> Sorry, I'm not convinced :/
>
> AFAICT, the temperature and the sampling rate should be enough
> information to find out the k_*

We are allowing to overshoot the temperature by not capping the
power actors, but we have a safety net to not overshoot too much.
It's internal decision insdie IPA. Userspace would have to
re-implement whole IPA logic and take control over cooling
device states - which would contradict the decision from IPA
controlling the same thermal zone. The finding of coefficients
is by testing many values while running. The post processing
of the data (temp., power requests, frequency, etc) won't tell us
the the limits. We have to check them. That means the user-space tool
would have to re-implement major part of IPA, but also somehow
get the CPUs utilization, then use the obsolete user-space
governor API to experiment with them.

>
> IMO, an userspace tool in ./tools/thermal/ipa is the right place
>
> So if you give the tooling to the SoC vendor via the thermal ones, with
> a file containing the temp + timestamp, they should be able to find the
> k_* and setup their boards.

This feature is aimed for every user of the device, even w/o
expert knowledge in thermal/power. If vendor or OEM didn't
support properly the board, users could do this and they don't
have to be restricted (IMO).
Apart from that, I see vendors are rather interested in investing
in their proprietary solutions, not willing to share know-how
in open source power/thermal mechanisms. Then user is restricted
IMO in using the board. This might block e.g. research of some
PhD students, who have good ideas, but the restrictions of the platform
prevent them or cause the behavior of the board that they cannot
control. I would like to enable everyone to use fully the potential
of the HW/SW - even end-user.

>
> Actually my opinion is the kernel should not handle everything and the
> SoC vendor should at least do some work to setup their system. If they
> are able to find out the sustainable power, they can do the same for the
> right coefficients.

Ideally, yes, I would also like to see that. As you said a few months
ago during review of my former patches, a lot of this 'sustainable
power' entries in DT are Linaro contribution. I wish vendors also
contribute, but c'est la vie.

>
>>> Additionally, the values can be stored in the Documentation for
>>> different board and a documentation on how to use the tool.
>>>
>>> Then up to the SoC vendor to setup the k_* in sysfs, so no need to
>>> change any interface.
>>
>> It wouldn't be for SoC vendor, but up to the OEM or board designer,
>> because the same SoC might have different thermal headroom thanks
>> to better cooling or bigger PCB, etc.
>
> Right, s/SoC vendor/SoC platform/
>
>> I agree that these optimized k_* values might be shared in the kernel.
>> Ideally I would see them in the board's DT file, in the thermal zone,
>> but I'm afraid they are not 'Device description' so don't fit into DT
>> scope. They are rather optimizations of pure kernel mechanism.
>>
>> where would be a good place for it? Maybe a new IPA Documentation/
>> sub-directory?
>
> You can improve the documentation in:
>
> Documentation/driver-api/thermal/power_allocator.rst
>
> And if we agree on a tools/thermal/ipa, the documentation with examples
> and some SoC reference can be put there also
>
>>>> If you are interested in those analyses we can find a way to share a>
>>>> .html file with the results from LISA notebook.
>>>
>>> Yes,
>>
>> Sure thing, let me arrange that.
>
>

2022-02-04 23:53:10

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic


Thanks!

On 04/02/2022 12:48, Chetan Mistry wrote:
> Hi Daniel,
>
> Here is a link to the Compiled Jupyter Notebook with the performance
> analysis of the proposed system vs what is currently there:
>
> https://nbviewer.org/gist/chemis01/0e86ad81508860659a57338dae8274f9
> <https://nbviewer.org/gist/chemis01/0e86ad81508860659a57338dae8274f9>
>
> Kind Regards,
>
> Chetan
>
> *From: *Lukasz Luba <[email protected]>
> *Date: *Friday, 7 January 2022 at 13:39
> *To: *Daniel Lezcano <[email protected]>
> *Cc: *[email protected] <[email protected]>,
> [email protected] <[email protected]>,
> [email protected] <[email protected]>, Chetan Mistry
> <[email protected]>, [email protected] <[email protected]>,
> [email protected] <[email protected]>
> *Subject: *Re: [PATCH v2][RFC 1/2] Implement Ziegler-Nichols Heuristic
>
>
>
> On 1/7/22 11:52 AM, Daniel Lezcano wrote:
>> On 06/01/2022 14:16, Lukasz Luba wrote:
>>>
>>> Thank you for fast response!
>>>
>>> On 1/6/22 12:16 PM, Daniel Lezcano wrote:
>>>>
>>>> Hi Lukasz,
>>>>
>>>> On 06/01/2022 12:54, Lukasz Luba wrote:
>>>>> Hi Daniel,
>>>>>
>>>>> Could you have a look at this, please?
>>>>
>>>> Yes, I had a quick look at the code and went to the algorithm
>>>> description.
>>>>
>>>> Still digesting ...
>>>>
>>>>> On 12/17/21 6:49 PM, Chetankumar Mistry wrote:
>>>>>> Implement the Ziegler-Nichols Heuristic algorithm to better
>>>>>> estimate the PID Coefficients for a running platform.
>>>>>> The values are tuned to minimuse the amount of overshoot in
>>>>>> the temperature of the platform and subsequently minimise
>>>>>> the number of switches for cdev states.
>>>>>>
>>>>>> Signed-off-by: Chetankumar Mistry <[email protected]>
>>>>>
>>>>>
>>>>> This is the continuation of the previous idea to have
>>>>> better k_* values. You might remember this conversation [1].
>>>>>
>>>>> I've spent some time researching papers how and what can be done
>>>>> in this field and if possible to plumb in to the kernel.
>>>>> We had internal discussions (~2017) of one method fuzzy-logic that I
>>>>> found back then, but died at the begging not fitting into this
>>>>> IPA kernel specific environment and user-space. Your suggestion with
>>>>> observing undershooting and overshooting results sparked better idea.
>>>>> I thought it's worth to invest in it but I didn't have
>>>>> time. We are lucky, Chetan was designated to help me and
>>>>> experiment/implement/test these ideas and here is the patch set.
>>>>>
>>>>> He's chosen the Ziegler-Nichols method, which shows really
>>>>> good results in benchmarks (Geekbench and GFXbench on hikey960 Android).
>>>>> The improved performance in Geekbench is ~10% (vs. old IPA).
>>>>
>>>> +10% perf improvements sounds great. What about the temperature
>>>> mitigation (temp avg + stddev) ?
>>>
>>> Chetan would respond about that with the link to the .html file.
>>> We just have to create an official public server space for it.
>>> I hope till Monday evening we would get something.
>>>
>>>>
>>>>> The main question from our side is the sysfs interface
>>>>> which we could be used to trigger this algorithm for
>>>>> better coefficients estimations.
>>>>> We ask user to echo to some sysfs files in thermal zone
>>>>> and start his/her workload. This new IPA 'learns' the system
>>>>> utilization and reaction in temperature. After a few rounds,
>>>>> we get better fitted coefficients.
>>>>> If you need more background about the code or mechanisms, or tests,
>>>>> I'm sure Chetan is happy to provide you those.
>>>>
>>>> I'm worried about the complexity of the algorithm and the overhead
>>>> implied.
>>>>
>>>> The k_* factors are tied with the system and the thermal setup (fan,
>>>> heatsink, processor, opp, ...). So IIUC when the factors are found, they
>>>> should not change and could be part of the system setup.
>>>
>>> True, they are found and will be fixed for that board.
>>>
>>>>
>>>> Would the algorithm fit better in a separate userspace kernel tooling?
>>>> So we can run it once and find the k_* for a board.
>>>
>>> We wanted to be part of IPA in kernel because:
>>> - the logic needs access to internals of IPA
>>> - it would be easy accessible for all distros out-of-box
>>> - no additional maintenance and keeping in sync two codes, especially
>>>    those in some packages for user-space
>>
>> Sorry, I'm not convinced :/
>>
>> AFAICT, the temperature and the sampling rate should be enough
>> information to find out the k_*
>
> We are allowing to overshoot the temperature by not capping the
> power actors, but we have a safety net to not overshoot too much.
> It's internal decision insdie IPA. Userspace would have to
> re-implement whole IPA logic and take control over cooling
> device states - which would contradict the decision from IPA
> controlling the same thermal zone. The finding of coefficients
> is by testing many values while running. The post processing
> of the data (temp., power requests, frequency, etc) won't tell us
> the the limits. We have to check them. That means the user-space tool
> would have to re-implement major part of IPA, but also somehow
> get the CPUs utilization, then use the obsolete user-space
> governor API to experiment with them.
>
>>
>> IMO, an userspace tool in ./tools/thermal/ipa is the right place
>>
>> So if you give the tooling to the SoC vendor via the thermal ones, with
>> a file containing the temp + timestamp, they should be able to find the
>> k_* and setup their boards.
>
> This feature is aimed for every user of the device, even w/o
> expert knowledge in thermal/power. If vendor or OEM didn't
> support properly the board, users could do this and they don't
> have to be restricted (IMO).
> Apart from that, I see vendors are rather interested in investing
> in their proprietary solutions, not willing to share know-how
> in open source power/thermal mechanisms. Then user is restricted
> IMO in using the board. This might block e.g. research of some
> PhD students, who have good ideas, but the restrictions of the platform
> prevent them or cause the behavior of the board that they cannot
> control. I would like to enable everyone to use fully the potential
> of the HW/SW - even end-user.
>
>>
>> Actually my opinion is the kernel should not handle everything and the
>> SoC vendor should at least do some work to setup their system. If they
>> are able to find out the sustainable power, they can do the same for the
>> right coefficients.
>
> Ideally, yes, I would also like to see that. As you said a few months
> ago during review of my former patches, a lot of this 'sustainable
> power' entries in DT are Linaro contribution. I wish vendors also
> contribute, but c'est la vie.
>
>>
>>>> Additionally, the values can be stored in the Documentation for
>>>> different board and a documentation on how to use the tool.
>>>>
>>>> Then up to the SoC vendor to setup the k_* in sysfs, so no need to
>>>> change any interface.
>>>
>>> It wouldn't be for SoC vendor, but up to the OEM or board designer,
>>> because the same SoC might have different thermal headroom thanks
>>> to better cooling or bigger PCB, etc.
>>
>> Right, s/SoC vendor/SoC platform/
>>
>>> I agree that these optimized k_* values might be shared in the kernel.
>>> Ideally I would see them in the board's DT file, in the thermal zone,
>>> but I'm afraid they are not 'Device description' so don't fit into DT
>>> scope. They are rather optimizations of pure kernel mechanism.
>>>
>>> where would be a good place for it? Maybe a new IPA Documentation/
>>> sub-directory?
>>
>> You can improve the documentation in:
>>
>> Documentation/driver-api/thermal/power_allocator.rst
>>
>> And if we agree on a tools/thermal/ipa, the documentation with examples
>> and some SoC reference can be put there also
>>
>>>>> If you are interested in those analyses we can find a way to share a>
>>>>> .html file with the results from LISA notebook.
>>>>
>>>> Yes,
>>>
>>> Sure thing, let me arrange that.
>>
>>
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy
> the information in any medium. Thank you.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog