2022-03-21 21:16:12

by Lukasz Luba

[permalink] [raw]
Subject: [RESEND][PATCH 0/8] Introduce support for artificial Energy Model

Hi all,

This patch set adds new callback and support for artificial Energy Model (EM).
The new EMs have artificially generated performance states.
Such EMs can be created from lean information sources, such
as the relative energy efficiency between CPUs. The ACPI based
platforms provide this information
(ACPI 6.4, s5.2.12.14 'GIC CPU Interface (GICC) Structure'
'Processor Power efficiency Class' field).

Artificial EMs might require to directly provide the 'cost' of
the generated performance state. This patch set adds a new callback
.get_cost() for this. The EM framework does not force any model
or formula, it's up to the platform code.

Artificial EMs aim to leverage the Energy Aware Scheduler
(EAS). Other frameworks relying on performance states
information (i.e. IPA/DTPM) must be informed of the
EM type and might be prevented from using it. This patch
sets also does this by introducing a new flag:
EM_PERF_DOMAIN_ARTIFICIAL.

The patch set is based on current linux-next, where some
changes to OPP & EM are queuing.

The patch set also contains (patch 7/8 and patch 8/8) logic which prevents
two EM's client frameworks from using this new EM type. Some other approach,
using 'milli-watts', has been proposed and discussed, but refused [1].
This new flag is more precised and should not leave space for
wrong interpretation.

Shortly after this patch set you will see a patch set implementing the
platform code and registering this new EM.

Regards,
Lukasz Luba

[1] https://lore.kernel.org/linux-pm/[email protected]/

Lukasz Luba (7):
PM: EM: Add .get_cost() callback
PM: EM: Use the new .get_cost() callback while registering EM
PM: EM: Change the order of arguments in the .active_power() callback
PM: EM: Remove old debugfs files and print all 'flags'
Documentation: EM: Add artificial EM registration description
thermal: cooling: Check Energy Model type in cpufreq_cooling and
devfreq_cooling
powercap: DTPM: Check for Energy Model type

Pierre Gondois (1):
PM: EM: Add artificial EM flag

Documentation/power/energy-model.rst | 24 +++++++++-
drivers/cpufreq/mediatek-cpufreq-hw.c | 4 +-
drivers/cpufreq/scmi-cpufreq.c | 4 +-
drivers/opp/of.c | 6 +--
drivers/powercap/dtpm_cpu.c | 2 +-
drivers/thermal/cpufreq_cooling.c | 2 +-
drivers/thermal/devfreq_cooling.c | 8 ++--
include/linux/energy_model.h | 35 +++++++++++++--
kernel/power/energy_model.c | 63 +++++++++++++++------------
9 files changed, 101 insertions(+), 47 deletions(-)

--
2.17.1


2022-03-21 21:33:10

by Lukasz Luba

[permalink] [raw]
Subject: [RESEND][PATCH 1/8] PM: EM: Add .get_cost() callback

The Energy Model (EM) supports devices which report abstract power scale,
not only real Watts. The primary goal for EM is to enable the Energy Aware
Scheduler (EAS) for a given platform. Some of the platforms might not be
able to deliver proper power values. The only information that they might
have is the relative efficiency between CPU types.

Thus, it makes sense to remove some restrictions in the EM framework and
introduce a mechanism which would support those platforms. What is crucial
for EAS to operate is the 'cost' field in the EM. The 'cost' is calculated
internally in EM framework based on knowledge from 'power' values.
The 'cost' values must be strictly increasing. The existing API with its
'power' value size restrictions cannot guarantee that the 'cost' will meet
this requirement.

Since the platform is missing this detailed information, but has only
efficiency details, introduce a new custom callback in the EM framework.
The new callback would allow to provide the 'cost' values which reflect
efficiency of the CPUs. This would allow to provide EAS information which
has different relation than what would be forced by the EM internal
formulas calculating 'cost' values. Thanks to this new callback it is
possible to create a system view for EAS which has no overlapping
performance states across many Performance Domains.

Signed-off-by: Lukasz Luba <[email protected]>
---
include/linux/energy_model.h | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 9f3c400bc52d..0a3a5663177b 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -114,9 +114,30 @@ struct em_data_callback {
*/
int (*active_power)(unsigned long *power, unsigned long *freq,
struct device *dev);
+
+ /**
+ * get_cost() - Provide the cost at the given performance state of
+ * a device
+ * @dev : Device for which we do this operation (can be a CPU)
+ * @freq : Frequency at the performance state in kHz
+ * @cost : The cost value for the performance state
+ * (modified)
+ *
+ * In case of CPUs, the cost is the one of a single CPU in the domain.
+ * It is expected to fit in the [0, EM_MAX_POWER] range due to internal
+ * usage in EAS calculation.
+ *
+ * Return 0 on success, or appropriate error value in case of failure.
+ */
+ int (*get_cost)(struct device *dev, unsigned long freq,
+ unsigned long *cost);
};
-#define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb }
#define EM_SET_ACTIVE_POWER_CB(em_cb, cb) ((em_cb).active_power = cb)
+#define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) \
+ { .active_power = _active_power_cb, \
+ .get_cost = _cost_cb }
+#define EM_DATA_CB(_active_power_cb) \
+ EM_ADV_DATA_CB(_active_power_cb, NULL)

struct em_perf_domain *em_cpu_get(int cpu);
struct em_perf_domain *em_pd_get(struct device *dev);
@@ -264,6 +285,7 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)

#else
struct em_data_callback {};
+#define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) { }
#define EM_DATA_CB(_active_power_cb) { }
#define EM_SET_ACTIVE_POWER_CB(em_cb, cb) do { } while (0)

--
2.17.1

2022-03-21 22:07:27

by Lukasz Luba

[permalink] [raw]
Subject: [RESEND][PATCH 8/8] powercap: DTPM: Check for Energy Model type

The Energy Model power values might be artificial. In such case
it's safe to bail out during the registration, since the PowerCap
framework supports only micro-Watts.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/powercap/dtpm_cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
index bca2f912d349..f5eced0842b3 100644
--- a/drivers/powercap/dtpm_cpu.c
+++ b/drivers/powercap/dtpm_cpu.c
@@ -211,7 +211,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
return 0;

pd = em_cpu_get(cpu);
- if (!pd)
+ if (!pd || em_is_artificial(pd))
return -EINVAL;

dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
--
2.17.1

2022-03-21 22:14:11

by Lukasz Luba

[permalink] [raw]
Subject: [RESEND][PATCH 3/8] PM: EM: Use the new .get_cost() callback while registering EM

The Energy Model (EM) allows to provide the 'cost' values when the device
driver provides the .get_cost() optional callback. This removes
restriction which is in the EM calculation function of the 'cost'
for each performance state. Now, the driver is in charge of providing
the right values which are then used by Energy Aware Scheduler.

Signed-off-by: Lukasz Luba <[email protected]>
---
kernel/power/energy_model.c | 39 ++++++++++++++++++++++++++-----------
1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 6ecee99af842..95a3b33001f6 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -121,7 +121,8 @@ static void em_debug_remove_pd(struct device *dev) {}
#endif

static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
- int nr_states, struct em_data_callback *cb)
+ int nr_states, struct em_data_callback *cb,
+ unsigned long flags)
{
unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
struct em_perf_state *table;
@@ -173,10 +174,22 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
/* Compute the cost of each performance state. */
fmax = (u64) table[nr_states - 1].frequency;
for (i = nr_states - 1; i >= 0; i--) {
- unsigned long power_res = em_scale_power(table[i].power);
+ unsigned long power_res, cost;
+
+ if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
+ ret = cb->get_cost(dev, table[i].frequency, &cost);
+ if (ret || !cost || cost > EM_MAX_POWER) {
+ dev_err(dev, "EM: invalid cost %lu %d\n",
+ cost, ret);
+ goto free_ps_table;
+ }
+ } else {
+ power_res = em_scale_power(table[i].power);
+ cost = div64_u64(fmax * power_res, table[i].frequency);
+ }
+
+ table[i].cost = cost;

- table[i].cost = div64_u64(fmax * power_res,
- table[i].frequency);
if (table[i].cost >= prev_cost) {
table[i].flags = EM_PERF_STATE_INEFFICIENT;
dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
@@ -197,7 +210,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
}

static int em_create_pd(struct device *dev, int nr_states,
- struct em_data_callback *cb, cpumask_t *cpus)
+ struct em_data_callback *cb, cpumask_t *cpus,
+ unsigned long flags)
{
struct em_perf_domain *pd;
struct device *cpu_dev;
@@ -215,7 +229,7 @@ static int em_create_pd(struct device *dev, int nr_states,
return -ENOMEM;
}

- ret = em_create_perf_table(dev, pd, nr_states, cb);
+ ret = em_create_perf_table(dev, pd, nr_states, cb, flags);
if (ret) {
kfree(pd);
return ret;
@@ -332,6 +346,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
bool milliwatts)
{
unsigned long cap, prev_cap = 0;
+ unsigned long flags = 0;
int cpu, ret;

if (!dev || !nr_states || !cb)
@@ -378,14 +393,16 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
}
}

- ret = em_create_pd(dev, nr_states, cb, cpus);
+ if (milliwatts)
+ flags |= EM_PERF_DOMAIN_MILLIWATTS;
+ else if (cb->get_cost)
+ flags |= EM_PERF_DOMAIN_ARTIFICIAL;
+
+ ret = em_create_pd(dev, nr_states, cb, cpus, flags);
if (ret)
goto unlock;

- if (milliwatts)
- dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS;
- else if (cb->get_cost)
- dev->em_pd->flags |= EM_PERF_DOMAIN_ARTIFICIAL;
+ dev->em_pd->flags |= flags;

em_cpufreq_update_efficiencies(dev);

--
2.17.1

2022-03-21 22:41:10

by Lukasz Luba

[permalink] [raw]
Subject: [RESEND][PATCH 7/8] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling

The Energy Model can now be artificial, which means the power values
are mathematically generated to leverage EAS while not expected to be on
an uniform scale with other devices providing power information. If this
EM type is in use, the thermal governor IPA should not be allowed to
operate, since the relation between cooling devices is not properly
defined. Thus, it might be possible that big GPU has lower power values
than a Little CPU. To mitigate a misbehaviour of the thermal control
algorithm, simply do not register the cooling device as IPA's power
actor.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/cpufreq_cooling.c | 2 +-
drivers/thermal/devfreq_cooling.c | 8 +++++---
2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index 0bfb8eebd126..b8151d95a806 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -328,7 +328,7 @@ static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev,
struct cpufreq_policy *policy;
unsigned int nr_levels;

- if (!em)
+ if (!em || em_is_artificial(em))
return false;

policy = cpufreq_cdev->policy;
diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 4310cb342a9f..b04dcbbf721a 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -358,6 +358,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
struct thermal_cooling_device *cdev;
struct device *dev = df->dev.parent;
struct devfreq_cooling_device *dfc;
+ struct em_perf_domain *em;
char *name;
int err, num_opps;

@@ -367,8 +368,9 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,

dfc->devfreq = df;

- dfc->em_pd = em_pd_get(dev);
- if (dfc->em_pd) {
+ em = em_pd_get(dev);
+ if (em && !em_is_artificial(em)) {
+ dfc->em_pd = em;
devfreq_cooling_ops.get_requested_power =
devfreq_cooling_get_requested_power;
devfreq_cooling_ops.state2power = devfreq_cooling_state2power;
@@ -379,7 +381,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
num_opps = em_pd_nr_perf_states(dfc->em_pd);
} else {
/* Backward compatibility for drivers which do not use IPA */
- dev_dbg(dev, "missing EM for cooling device\n");
+ dev_dbg(dev, "missing proper EM for cooling device\n");

num_opps = dev_pm_opp_get_opp_count(dev);

--
2.17.1

2022-03-21 22:45:45

by Lukasz Luba

[permalink] [raw]
Subject: [RESEND][PATCH 6/8] Documentation: EM: Add artificial EM registration description

Add description about new artificial EM registration and use cases.
Update also the documentation with the new .get_cost() callback
description and usage.

Signed-off-by: Lukasz Luba <[email protected]>
---
Documentation/power/energy-model.rst | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
index fd29ed2506c0..feb257b7f350 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -123,6 +123,26 @@ allows a platform to register EM power values which are reflecting total power
(static + dynamic). These power values might be coming directly from
experiments and measurements.

+Registration of 'artificial' EM
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There is an option to provide a custom callback for drivers missing detailed
+knowledge about power value for each performance state. The callback
+.get_cost() is optional and provides the 'cost' values used by the EAS.
+This is useful for platforms that only provide information on relative
+efficiency between CPU types, where one could use the information to
+create an abstract power model. But even an abstract power model can
+sometimes be hard to fit in, given the input power value size restrictions.
+The .get_cost() allows to provide the 'cost' values which reflect the
+efficiency of the CPUs. This would allow to provide EAS information which
+has different relation than what would be forced by the EM internal
+formulas calculating 'cost' values. To register an EM for such platform, the
+driver must set the flag 'milliwatts' to 0, provide .get_power() callback
+and provide .get_cost() callback. The EM framework would handle such platform
+properly during registration. A flag EM_PERF_DOMAIN_ARTIFICIAL is set for such
+platform. Special care should be taken by other frameworks which are using EM
+to test and treat this flag properly.
+
Registration of 'simple' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
2.17.1

2022-03-21 23:25:59

by Lukasz Luba

[permalink] [raw]
Subject: [RESEND][PATCH 5/8] PM: EM: Remove old debugfs files and print all 'flags'

The Energy Model gets more bits used in 'flags'. Avoid adding another
debugfs file just to print what is the status of a new flag. Simply
remove old debugfs files and add one generic which prints all flags
as a hex value.

Signed-off-by: Lukasz Luba <[email protected]>
---
kernel/power/energy_model.c | 24 +++++-------------------
1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index babefc72085d..092513575e4e 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -54,28 +54,15 @@ static int em_debug_cpus_show(struct seq_file *s, void *unused)
}
DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);

-static int em_debug_units_show(struct seq_file *s, void *unused)
+static int em_debug_flags_show(struct seq_file *s, void *unused)
{
struct em_perf_domain *pd = s->private;
- char *units = (pd->flags & EM_PERF_DOMAIN_MILLIWATTS) ?
- "milliWatts" : "bogoWatts";

- seq_printf(s, "%s\n", units);
+ seq_printf(s, "%#lx\n", pd->flags);

return 0;
}
-DEFINE_SHOW_ATTRIBUTE(em_debug_units);
-
-static int em_debug_skip_inefficiencies_show(struct seq_file *s, void *unused)
-{
- struct em_perf_domain *pd = s->private;
- int enabled = (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES) ? 1 : 0;
-
- seq_printf(s, "%d\n", enabled);
-
- return 0;
-}
-DEFINE_SHOW_ATTRIBUTE(em_debug_skip_inefficiencies);
+DEFINE_SHOW_ATTRIBUTE(em_debug_flags);

static void em_debug_create_pd(struct device *dev)
{
@@ -89,9 +76,8 @@ static void em_debug_create_pd(struct device *dev)
debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus,
&em_debug_cpus_fops);

- debugfs_create_file("units", 0444, d, dev->em_pd, &em_debug_units_fops);
- debugfs_create_file("skip-inefficiencies", 0444, d, dev->em_pd,
- &em_debug_skip_inefficiencies_fops);
+ debugfs_create_file("flags", 0444, d, dev->em_pd,
+ &em_debug_flags_fops);

/* Create a sub-directory for each performance state */
for (i = 0; i < dev->em_pd->nr_perf_states; i++)
--
2.17.1

2022-03-21 23:26:41

by Lukasz Luba

[permalink] [raw]
Subject: [RESEND][PATCH 4/8] PM: EM: Change the order of arguments in the .active_power() callback

The .active_power() callback passes the device pointer when it's called.
Aligned with a convetion present in other subsystems and pass the 'dev'
as a first argument. It looks more cleaner.

Adjust all affected drivers which implement that API callback.

Suggested-by: Ionela Voinescu <[email protected]>
Signed-off-by: Lukasz Luba <[email protected]>
---
Documentation/power/energy-model.rst | 4 ++--
drivers/cpufreq/mediatek-cpufreq-hw.c | 4 ++--
drivers/cpufreq/scmi-cpufreq.c | 4 ++--
drivers/opp/of.c | 6 +++---
include/linux/energy_model.h | 6 +++---
kernel/power/energy_model.c | 2 +-
6 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
index 49549aab41b4..fd29ed2506c0 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -181,8 +181,8 @@ EM framework::

-> drivers/cpufreq/foo_cpufreq.c

- 01 static int est_power(unsigned long *mW, unsigned long *KHz,
- 02 struct device *dev)
+ 01 static int est_power(struct device *dev, unsigned long *mW,
+ 02 unsigned long *KHz)
03 {
04 long freq, power;
05
diff --git a/drivers/cpufreq/mediatek-cpufreq-hw.c b/drivers/cpufreq/mediatek-cpufreq-hw.c
index 0a94c56ddad2..813cccbfe934 100644
--- a/drivers/cpufreq/mediatek-cpufreq-hw.c
+++ b/drivers/cpufreq/mediatek-cpufreq-hw.c
@@ -51,8 +51,8 @@ static const u16 cpufreq_mtk_offsets[REG_ARRAY_SIZE] = {
};

static int __maybe_unused
-mtk_cpufreq_get_cpu_power(unsigned long *mW,
- unsigned long *KHz, struct device *cpu_dev)
+mtk_cpufreq_get_cpu_power(struct device *cpu_dev, unsigned long *mW,
+ unsigned long *KHz)
{
struct mtk_cpufreq_data *data;
struct cpufreq_policy *policy;
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index 919fa6e3f462..6d2a4cf46db7 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -96,8 +96,8 @@ scmi_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask)
}

static int __maybe_unused
-scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
- struct device *cpu_dev)
+scmi_get_cpu_power(struct device *cpu_dev, unsigned long *power,
+ unsigned long *KHz)
{
unsigned long Hz;
int ret, domain;
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 440ab5a03df9..485ea980bde7 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -1448,7 +1448,7 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
* Returns 0 on success or a proper -EINVAL value in case of error.
*/
static int __maybe_unused
-_get_dt_power(unsigned long *mW, unsigned long *kHz, struct device *dev)
+_get_dt_power(struct device *dev, unsigned long *mW, unsigned long *kHz)
{
struct dev_pm_opp *opp;
unsigned long opp_freq, opp_power;
@@ -1482,8 +1482,8 @@ _get_dt_power(unsigned long *mW, unsigned long *kHz, struct device *dev)
* Returns -EINVAL if the power calculation failed because of missing
* parameters, 0 otherwise.
*/
-static int __maybe_unused _get_power(unsigned long *mW, unsigned long *kHz,
- struct device *dev)
+static int __maybe_unused _get_power(struct device *dev, unsigned long *mW,
+ unsigned long *kHz)
{
struct dev_pm_opp *opp;
struct device_node *np;
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 92e82a322859..8419bffb4398 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -101,11 +101,11 @@ struct em_data_callback {
/**
* active_power() - Provide power at the next performance state of
* a device
+ * @dev : Device for which we do this operation (can be a CPU)
* @power : Active power at the performance state
* (modified)
* @freq : Frequency at the performance state in kHz
* (modified)
- * @dev : Device for which we do this operation (can be a CPU)
*
* active_power() must find the lowest performance state of 'dev' above
* 'freq' and update 'power' and 'freq' to the matching active power
@@ -117,8 +117,8 @@ struct em_data_callback {
*
* Return 0 on success.
*/
- int (*active_power)(unsigned long *power, unsigned long *freq,
- struct device *dev);
+ int (*active_power)(struct device *dev, unsigned long *power,
+ unsigned long *freq);

/**
* get_cost() - Provide the cost at the given performance state of
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 95a3b33001f6..babefc72085d 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -140,7 +140,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
* lowest performance state of 'dev' above 'freq' and updates
* 'power' and 'freq' accordingly.
*/
- ret = cb->active_power(&power, &freq, dev);
+ ret = cb->active_power(dev, &power, &freq);
if (ret) {
dev_err(dev, "EM: invalid perf. state: %d\n",
ret);
--
2.17.1

2022-03-28 10:11:02

by Viresh Kumar

[permalink] [raw]
Subject: Re: [RESEND][PATCH 7/8] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling

On 21-03-22, 09:57, Lukasz Luba wrote:
> The Energy Model can now be artificial, which means the power values
> are mathematically generated to leverage EAS while not expected to be on
> an uniform scale with other devices providing power information. If this
> EM type is in use, the thermal governor IPA should not be allowed to
> operate, since the relation between cooling devices is not properly
> defined. Thus, it might be possible that big GPU has lower power values
> than a Little CPU. To mitigate a misbehaviour of the thermal control
> algorithm, simply do not register the cooling device as IPA's power
> actor.
>
> Signed-off-by: Lukasz Luba <[email protected]>
> ---
> drivers/thermal/cpufreq_cooling.c | 2 +-
> drivers/thermal/devfreq_cooling.c | 8 +++++---
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
> index 0bfb8eebd126..b8151d95a806 100644
> --- a/drivers/thermal/cpufreq_cooling.c
> +++ b/drivers/thermal/cpufreq_cooling.c
> @@ -328,7 +328,7 @@ static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev,
> struct cpufreq_policy *policy;
> unsigned int nr_levels;
>
> - if (!em)
> + if (!em || em_is_artificial(em))
> return false;
>
> policy = cpufreq_cdev->policy;
> diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
> index 4310cb342a9f..b04dcbbf721a 100644
> --- a/drivers/thermal/devfreq_cooling.c
> +++ b/drivers/thermal/devfreq_cooling.c
> @@ -358,6 +358,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
> struct thermal_cooling_device *cdev;
> struct device *dev = df->dev.parent;
> struct devfreq_cooling_device *dfc;
> + struct em_perf_domain *em;
> char *name;
> int err, num_opps;
>
> @@ -367,8 +368,9 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
>
> dfc->devfreq = df;
>
> - dfc->em_pd = em_pd_get(dev);
> - if (dfc->em_pd) {
> + em = em_pd_get(dev);
> + if (em && !em_is_artificial(em)) {
> + dfc->em_pd = em;
> devfreq_cooling_ops.get_requested_power =
> devfreq_cooling_get_requested_power;
> devfreq_cooling_ops.state2power = devfreq_cooling_state2power;
> @@ -379,7 +381,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
> num_opps = em_pd_nr_perf_states(dfc->em_pd);
> } else {
> /* Backward compatibility for drivers which do not use IPA */
> - dev_dbg(dev, "missing EM for cooling device\n");
> + dev_dbg(dev, "missing proper EM for cooling device\n");
>
> num_opps = dev_pm_opp_get_opp_count(dev);

Acked-by: Viresh Kumar <[email protected]>

--
viresh

2022-03-29 19:44:56

by Lukasz Luba

[permalink] [raw]
Subject: Re: [RESEND][PATCH 7/8] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling



On 3/28/22 04:27, Viresh Kumar wrote:
> On 21-03-22, 09:57, Lukasz Luba wrote:
>> The Energy Model can now be artificial, which means the power values
>> are mathematically generated to leverage EAS while not expected to be on
>> an uniform scale with other devices providing power information. If this
>> EM type is in use, the thermal governor IPA should not be allowed to
>> operate, since the relation between cooling devices is not properly
>> defined. Thus, it might be possible that big GPU has lower power values
>> than a Little CPU. To mitigate a misbehaviour of the thermal control
>> algorithm, simply do not register the cooling device as IPA's power
>> actor.
>>
>> Signed-off-by: Lukasz Luba <[email protected]>
>> ---
>> drivers/thermal/cpufreq_cooling.c | 2 +-
>> drivers/thermal/devfreq_cooling.c | 8 +++++---
>> 2 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
>> index 0bfb8eebd126..b8151d95a806 100644
>> --- a/drivers/thermal/cpufreq_cooling.c
>> +++ b/drivers/thermal/cpufreq_cooling.c
>> @@ -328,7 +328,7 @@ static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev,
>> struct cpufreq_policy *policy;
>> unsigned int nr_levels;
>>
>> - if (!em)
>> + if (!em || em_is_artificial(em))
>> return false;
>>
>> policy = cpufreq_cdev->policy;
>> diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
>> index 4310cb342a9f..b04dcbbf721a 100644
>> --- a/drivers/thermal/devfreq_cooling.c
>> +++ b/drivers/thermal/devfreq_cooling.c
>> @@ -358,6 +358,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
>> struct thermal_cooling_device *cdev;
>> struct device *dev = df->dev.parent;
>> struct devfreq_cooling_device *dfc;
>> + struct em_perf_domain *em;
>> char *name;
>> int err, num_opps;
>>
>> @@ -367,8 +368,9 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
>>
>> dfc->devfreq = df;
>>
>> - dfc->em_pd = em_pd_get(dev);
>> - if (dfc->em_pd) {
>> + em = em_pd_get(dev);
>> + if (em && !em_is_artificial(em)) {
>> + dfc->em_pd = em;
>> devfreq_cooling_ops.get_requested_power =
>> devfreq_cooling_get_requested_power;
>> devfreq_cooling_ops.state2power = devfreq_cooling_state2power;
>> @@ -379,7 +381,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
>> num_opps = em_pd_nr_perf_states(dfc->em_pd);
>> } else {
>> /* Backward compatibility for drivers which do not use IPA */
>> - dev_dbg(dev, "missing EM for cooling device\n");
>> + dev_dbg(dev, "missing proper EM for cooling device\n");
>>
>> num_opps = dev_pm_opp_get_opp_count(dev);
>
> Acked-by: Viresh Kumar <[email protected]>
>

Thank you Viresh for the ACK!

2022-04-04 23:13:19

by Lukasz Luba

[permalink] [raw]
Subject: Re: [RESEND][PATCH 0/8] Introduce support for artificial Energy Model

Hi Rafael,


The patch set has been on LKML for quite a while and got one ACK,
for the code touching something outside the EM (thermal cooling).

AFAICS there is no interest and objections from others for this code.

Therefore, I have a question:
What would be be process to have merge this code?
(We had internally a few reviews of this code)

On 3/21/22 09:57, Lukasz Luba wrote:
> Hi all,
>
> This patch set adds new callback and support for artificial Energy Model (EM).
> The new EMs have artificially generated performance states.
> Such EMs can be created from lean information sources, such
> as the relative energy efficiency between CPUs. The ACPI based
> platforms provide this information
> (ACPI 6.4, s5.2.12.14 'GIC CPU Interface (GICC) Structure'
> 'Processor Power efficiency Class' field).
>
> Artificial EMs might require to directly provide the 'cost' of
> the generated performance state. This patch set adds a new callback
> .get_cost() for this. The EM framework does not force any model
> or formula, it's up to the platform code.
>
> Artificial EMs aim to leverage the Energy Aware Scheduler
> (EAS). Other frameworks relying on performance states
> information (i.e. IPA/DTPM) must be informed of the
> EM type and might be prevented from using it. This patch
> sets also does this by introducing a new flag:
> EM_PERF_DOMAIN_ARTIFICIAL.
>
> The patch set is based on current linux-next, where some
> changes to OPP & EM are queuing.
>
> The patch set also contains (patch 7/8 and patch 8/8) logic which prevents
> two EM's client frameworks from using this new EM type. Some other approach,
> using 'milli-watts', has been proposed and discussed, but refused [1].
> This new flag is more precised and should not leave space for
> wrong interpretation.
>
> Shortly after this patch set you will see a patch set implementing the
> platform code and registering this new EM.
>


No one from Arm is an official maintainer of the EM code.

Regards,
Lukasz

2022-04-05 00:28:48

by Ionela Voinescu

[permalink] [raw]
Subject: Re: [RESEND][PATCH 1/8] PM: EM: Add .get_cost() callback

On Monday 21 Mar 2022 at 09:57:22 (+0000), Lukasz Luba wrote:
> The Energy Model (EM) supports devices which report abstract power scale,
> not only real Watts. The primary goal for EM is to enable the Energy Aware
> Scheduler (EAS) for a given platform. Some of the platforms might not be
> able to deliver proper power values. The only information that they might
> have is the relative efficiency between CPU types.
>
> Thus, it makes sense to remove some restrictions in the EM framework and
> introduce a mechanism which would support those platforms. What is crucial
> for EAS to operate is the 'cost' field in the EM. The 'cost' is calculated
> internally in EM framework based on knowledge from 'power' values.
> The 'cost' values must be strictly increasing. The existing API with its
> 'power' value size restrictions cannot guarantee that the 'cost' will meet
> this requirement.
>
> Since the platform is missing this detailed information, but has only
> efficiency details, introduce a new custom callback in the EM framework.
> The new callback would allow to provide the 'cost' values which reflect
> efficiency of the CPUs. This would allow to provide EAS information which
> has different relation than what would be forced by the EM internal
> formulas calculating 'cost' values. Thanks to this new callback it is
> possible to create a system view for EAS which has no overlapping
> performance states across many Performance Domains.
>
> Signed-off-by: Lukasz Luba <[email protected]>
> ---
> include/linux/energy_model.h | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 9f3c400bc52d..0a3a5663177b 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -114,9 +114,30 @@ struct em_data_callback {
> */
> int (*active_power)(unsigned long *power, unsigned long *freq,
> struct device *dev);
> +
> + /**
> + * get_cost() - Provide the cost at the given performance state of
> + * a device
> + * @dev : Device for which we do this operation (can be a CPU)
> + * @freq : Frequency at the performance state in kHz
> + * @cost : The cost value for the performance state
> + * (modified)
> + *
> + * In case of CPUs, the cost is the one of a single CPU in the domain.
> + * It is expected to fit in the [0, EM_MAX_POWER] range due to internal
> + * usage in EAS calculation.
> + *
> + * Return 0 on success, or appropriate error value in case of failure.
> + */
> + int (*get_cost)(struct device *dev, unsigned long freq,
> + unsigned long *cost);
> };
> -#define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb }
> #define EM_SET_ACTIVE_POWER_CB(em_cb, cb) ((em_cb).active_power = cb)
> +#define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) \
> + { .active_power = _active_power_cb, \
> + .get_cost = _cost_cb }
> +#define EM_DATA_CB(_active_power_cb) \
> + EM_ADV_DATA_CB(_active_power_cb, NULL)
>
> struct em_perf_domain *em_cpu_get(int cpu);
> struct em_perf_domain *em_pd_get(struct device *dev);
> @@ -264,6 +285,7 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
>
> #else
> struct em_data_callback {};
> +#define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) { }
> #define EM_DATA_CB(_active_power_cb) { }
> #define EM_SET_ACTIVE_POWER_CB(em_cb, cb) do { } while (0)
>

Reviewed-by: Ionela Voinescu <[email protected]>

Thanks,
Ionela.

> --
> 2.17.1
>

2022-04-05 01:41:10

by Ionela Voinescu

[permalink] [raw]
Subject: Re: [RESEND][PATCH 8/8] powercap: DTPM: Check for Energy Model type

On Monday 21 Mar 2022 at 09:57:29 (+0000), Lukasz Luba wrote:
> The Energy Model power values might be artificial. In such case
> it's safe to bail out during the registration, since the PowerCap
> framework supports only micro-Watts.
>
> Signed-off-by: Lukasz Luba <[email protected]>
> ---
> drivers/powercap/dtpm_cpu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
> index bca2f912d349..f5eced0842b3 100644
> --- a/drivers/powercap/dtpm_cpu.c
> +++ b/drivers/powercap/dtpm_cpu.c
> @@ -211,7 +211,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
> return 0;
>
> pd = em_cpu_get(cpu);
> - if (!pd)
> + if (!pd || em_is_artificial(pd))
> return -EINVAL;
>
> dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);

Reviewed-by: Ionela Voinescu <[email protected]>

> --
> 2.17.1
>

2022-04-05 02:10:07

by Ionela Voinescu

[permalink] [raw]
Subject: Re: [RESEND][PATCH 7/8] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling

On Monday 21 Mar 2022 at 09:57:28 (+0000), Lukasz Luba wrote:
> The Energy Model can now be artificial, which means the power values
> are mathematically generated to leverage EAS while not expected to be on
> an uniform scale with other devices providing power information. If this
> EM type is in use, the thermal governor IPA should not be allowed to
> operate, since the relation between cooling devices is not properly
> defined. Thus, it might be possible that big GPU has lower power values
> than a Little CPU. To mitigate a misbehaviour of the thermal control
> algorithm, simply do not register the cooling device as IPA's power
> actor.
>
> Signed-off-by: Lukasz Luba <[email protected]>
> ---
> drivers/thermal/cpufreq_cooling.c | 2 +-
> drivers/thermal/devfreq_cooling.c | 8 +++++---
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
> index 0bfb8eebd126..b8151d95a806 100644
> --- a/drivers/thermal/cpufreq_cooling.c
> +++ b/drivers/thermal/cpufreq_cooling.c
> @@ -328,7 +328,7 @@ static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev,
> struct cpufreq_policy *policy;
> unsigned int nr_levels;
>
> - if (!em)
> + if (!em || em_is_artificial(em))
> return false;
>
> policy = cpufreq_cdev->policy;
> diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
> index 4310cb342a9f..b04dcbbf721a 100644
> --- a/drivers/thermal/devfreq_cooling.c
> +++ b/drivers/thermal/devfreq_cooling.c
> @@ -358,6 +358,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
> struct thermal_cooling_device *cdev;
> struct device *dev = df->dev.parent;
> struct devfreq_cooling_device *dfc;
> + struct em_perf_domain *em;
> char *name;
> int err, num_opps;
>
> @@ -367,8 +368,9 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
>
> dfc->devfreq = df;
>
> - dfc->em_pd = em_pd_get(dev);
> - if (dfc->em_pd) {
> + em = em_pd_get(dev);
> + if (em && !em_is_artificial(em)) {
> + dfc->em_pd = em;
> devfreq_cooling_ops.get_requested_power =
> devfreq_cooling_get_requested_power;
> devfreq_cooling_ops.state2power = devfreq_cooling_state2power;
> @@ -379,7 +381,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
> num_opps = em_pd_nr_perf_states(dfc->em_pd);
> } else {
> /* Backward compatibility for drivers which do not use IPA */
> - dev_dbg(dev, "missing EM for cooling device\n");
> + dev_dbg(dev, "missing proper EM for cooling device\n");
>
> num_opps = dev_pm_opp_get_opp_count(dev);
>

If needed,

Reviewed-by: Ionela Voinescu <[email protected]>

> --
> 2.17.1
>

2022-04-05 02:19:21

by Ionela Voinescu

[permalink] [raw]
Subject: Re: [RESEND][PATCH 5/8] PM: EM: Remove old debugfs files and print all 'flags'

Hi Lukasz,

IMO the debugfs files were fine as they were:
- They offered information on units and inefficiencies without having
to dig into the code to see which bit is for which flag.
- I believe the artificial EM power values fit under bogoWatts as unit,
so that part would still be correct.

On the other hand, your new file offers more information: we'd be able
to see in debugfs whether we're dealing with an artificial EM, despite
needing a bit more looking over the code to understand the output.

I don't have a strong opinion and the code looks fine, so:

Reviewed-by: Ionela Voinescu <[email protected]>


On Monday 21 Mar 2022 at 09:57:26 (+0000), Lukasz Luba wrote:
> The Energy Model gets more bits used in 'flags'. Avoid adding another
> debugfs file just to print what is the status of a new flag. Simply
> remove old debugfs files and add one generic which prints all flags
> as a hex value.
>
> Signed-off-by: Lukasz Luba <[email protected]>
> ---
> kernel/power/energy_model.c | 24 +++++-------------------
> 1 file changed, 5 insertions(+), 19 deletions(-)
>
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index babefc72085d..092513575e4e 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -54,28 +54,15 @@ static int em_debug_cpus_show(struct seq_file *s, void *unused)
> }
> DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
>
> -static int em_debug_units_show(struct seq_file *s, void *unused)
> +static int em_debug_flags_show(struct seq_file *s, void *unused)
> {
> struct em_perf_domain *pd = s->private;
> - char *units = (pd->flags & EM_PERF_DOMAIN_MILLIWATTS) ?
> - "milliWatts" : "bogoWatts";
>
> - seq_printf(s, "%s\n", units);
> + seq_printf(s, "%#lx\n", pd->flags);
>
> return 0;
> }
> -DEFINE_SHOW_ATTRIBUTE(em_debug_units);
> -
> -static int em_debug_skip_inefficiencies_show(struct seq_file *s, void *unused)
> -{
> - struct em_perf_domain *pd = s->private;
> - int enabled = (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES) ? 1 : 0;
> -
> - seq_printf(s, "%d\n", enabled);
> -
> - return 0;
> -}
> -DEFINE_SHOW_ATTRIBUTE(em_debug_skip_inefficiencies);
> +DEFINE_SHOW_ATTRIBUTE(em_debug_flags);
>
> static void em_debug_create_pd(struct device *dev)
> {
> @@ -89,9 +76,8 @@ static void em_debug_create_pd(struct device *dev)
> debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus,
> &em_debug_cpus_fops);
>
> - debugfs_create_file("units", 0444, d, dev->em_pd, &em_debug_units_fops);
> - debugfs_create_file("skip-inefficiencies", 0444, d, dev->em_pd,
> - &em_debug_skip_inefficiencies_fops);
> + debugfs_create_file("flags", 0444, d, dev->em_pd,
> + &em_debug_flags_fops);
>
> /* Create a sub-directory for each performance state */
> for (i = 0; i < dev->em_pd->nr_perf_states; i++)
> --
> 2.17.1
>

2022-04-05 02:24:42

by Ionela Voinescu

[permalink] [raw]
Subject: Re: [RESEND][PATCH 3/8] PM: EM: Use the new .get_cost() callback while registering EM

Hi Lukasz,

On Monday 21 Mar 2022 at 09:57:24 (+0000), Lukasz Luba wrote:
> The Energy Model (EM) allows to provide the 'cost' values when the device
> driver provides the .get_cost() optional callback. This removes
> restriction which is in the EM calculation function of the 'cost'
> for each performance state. Now, the driver is in charge of providing
> the right values which are then used by Energy Aware Scheduler.
>
> Signed-off-by: Lukasz Luba <[email protected]>
> ---
> kernel/power/energy_model.c | 39 ++++++++++++++++++++++++++-----------
> 1 file changed, 28 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 6ecee99af842..95a3b33001f6 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -121,7 +121,8 @@ static void em_debug_remove_pd(struct device *dev) {}
> #endif
>
> static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
> - int nr_states, struct em_data_callback *cb)
> + int nr_states, struct em_data_callback *cb,
> + unsigned long flags)
> {
> unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
> struct em_perf_state *table;
> @@ -173,10 +174,22 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
> /* Compute the cost of each performance state. */
> fmax = (u64) table[nr_states - 1].frequency;
> for (i = nr_states - 1; i >= 0; i--) {
> - unsigned long power_res = em_scale_power(table[i].power);
> + unsigned long power_res, cost;
> +
> + if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
> + ret = cb->get_cost(dev, table[i].frequency, &cost);
> + if (ret || !cost || cost > EM_MAX_POWER) {
> + dev_err(dev, "EM: invalid cost %lu %d\n",
> + cost, ret);
> + goto free_ps_table;
> + }
> + } else {
> + power_res = em_scale_power(table[i].power);
> + cost = div64_u64(fmax * power_res, table[i].frequency);
> + }
> +
> + table[i].cost = cost;
>
> - table[i].cost = div64_u64(fmax * power_res,
> - table[i].frequency);
> if (table[i].cost >= prev_cost) {
> table[i].flags = EM_PERF_STATE_INEFFICIENT;
> dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
> @@ -197,7 +210,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
> }
>
> static int em_create_pd(struct device *dev, int nr_states,
> - struct em_data_callback *cb, cpumask_t *cpus)
> + struct em_data_callback *cb, cpumask_t *cpus,
> + unsigned long flags)
> {
> struct em_perf_domain *pd;
> struct device *cpu_dev;
> @@ -215,7 +229,7 @@ static int em_create_pd(struct device *dev, int nr_states,
> return -ENOMEM;
> }
>
> - ret = em_create_perf_table(dev, pd, nr_states, cb);
> + ret = em_create_perf_table(dev, pd, nr_states, cb, flags);
> if (ret) {
> kfree(pd);
> return ret;
> @@ -332,6 +346,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
> bool milliwatts)
> {
> unsigned long cap, prev_cap = 0;
> + unsigned long flags = 0;
> int cpu, ret;
>
> if (!dev || !nr_states || !cb)
> @@ -378,14 +393,16 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
> }
> }
>
> - ret = em_create_pd(dev, nr_states, cb, cpus);
> + if (milliwatts)
> + flags |= EM_PERF_DOMAIN_MILLIWATTS;
> + else if (cb->get_cost)
> + flags |= EM_PERF_DOMAIN_ARTIFICIAL;
> +
> + ret = em_create_pd(dev, nr_states, cb, cpus, flags);
> if (ret)
> goto unlock;
>
> - if (milliwatts)
> - dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS;
> - else if (cb->get_cost)
> - dev->em_pd->flags |= EM_PERF_DOMAIN_ARTIFICIAL;
> + dev->em_pd->flags |= flags;
>

nit: given that there is a designated function to create the PD which
now requires information on flags to initilise the domains properly,
might be better to have dev->em_pd->flags set inside that function,
rather than here.

> em_cpufreq_update_efficiencies(dev);
>

Reviewed-by: Ionela Voinescu <[email protected]>

Thanks,
Ionela.

> --
> 2.17.1
>

2022-04-05 02:25:41

by Lukasz Luba

[permalink] [raw]
Subject: Re: [RESEND][PATCH 5/8] PM: EM: Remove old debugfs files and print all 'flags'

Hi Ionela,

Thank you for reviewing these patches.

On 4/4/22 17:02, Ionela Voinescu wrote:
> Hi Lukasz,
>
> IMO the debugfs files were fine as they were:
> - They offered information on units and inefficiencies without having
> to dig into the code to see which bit is for which flag.
> - I believe the artificial EM power values fit under bogoWatts as unit,
> so that part would still be correct.
>
> On the other hand, your new file offers more information: we'd be able
> to see in debugfs whether we're dealing with an artificial EM, despite
> needing a bit more looking over the code to understand the output.

I have consolidated them, so we would support more features in the flag
automatically when there will be a need.

In Android kernel we don't have unfortunately the debugfs compiled-in,
so this information is still only for kernel testing with some Linux
distro.

I have been thinking to switch into sysfs interface, so we could
have it in Android as well. This patch change adding a generic 'flags'
which would be more 'stable' (which is a requirement for sysfs contract)
is also a closer step into that direction. But that is for longer
discussion and not for this $subject.

>
> I don't have a strong opinion and the code looks fine, so:
>
> Reviewed-by: Ionela Voinescu <[email protected]>
>

2022-04-05 03:01:24

by Ionela Voinescu

[permalink] [raw]
Subject: Re: [RESEND][PATCH 6/8] Documentation: EM: Add artificial EM registration description

Thanks for documenting this as well.

On Monday 21 Mar 2022 at 09:57:27 (+0000), Lukasz Luba wrote:
> Add description about new artificial EM registration and use cases.
> Update also the documentation with the new .get_cost() callback
> description and usage.
>
> Signed-off-by: Lukasz Luba <[email protected]>
> ---
> Documentation/power/energy-model.rst | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
> index fd29ed2506c0..feb257b7f350 100644
> --- a/Documentation/power/energy-model.rst
> +++ b/Documentation/power/energy-model.rst
> @@ -123,6 +123,26 @@ allows a platform to register EM power values which are reflecting total power
> (static + dynamic). These power values might be coming directly from
> experiments and measurements.
>
> +Registration of 'artificial' EM
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There is an option to provide a custom callback for drivers missing detailed
> +knowledge about power value for each performance state. The callback
> +.get_cost() is optional and provides the 'cost' values used by the EAS.
> +This is useful for platforms that only provide information on relative
> +efficiency between CPU types, where one could use the information to
> +create an abstract power model. But even an abstract power model can
> +sometimes be hard to fit in, given the input power value size restrictions.
> +The .get_cost() allows to provide the 'cost' values which reflect the
> +efficiency of the CPUs. This would allow to provide EAS information which
> +has different relation than what would be forced by the EM internal
> +formulas calculating 'cost' values. To register an EM for such platform, the
> +driver must set the flag 'milliwatts' to 0, provide .get_power() callback
> +and provide .get_cost() callback. The EM framework would handle such platform

Maybe it worth adding a note here that the power values provided with
.get_power() are only for information purposes and that they don't have
any functional use.

> +properly during registration. A flag EM_PERF_DOMAIN_ARTIFICIAL is set for such
> +platform. Special care should be taken by other frameworks which are using EM
> +to test and treat this flag properly.
> +
> Registration of 'simple' EM
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>

Reviewed-by: Ionela Voinescu <[email protected]>

> --
> 2.17.1
>

2022-04-05 03:19:46

by Ionela Voinescu

[permalink] [raw]
Subject: Re: [RESEND][PATCH 4/8] PM: EM: Change the order of arguments in the .active_power() callback

Thanks!

On Monday 21 Mar 2022 at 09:57:25 (+0000), Lukasz Luba wrote:
> The .active_power() callback passes the device pointer when it's called.
> Aligned with a convetion present in other subsystems and pass the 'dev'
> as a first argument. It looks more cleaner.
>
> Adjust all affected drivers which implement that API callback.
>
> Suggested-by: Ionela Voinescu <[email protected]>
> Signed-off-by: Lukasz Luba <[email protected]>
> ---
> Documentation/power/energy-model.rst | 4 ++--
> drivers/cpufreq/mediatek-cpufreq-hw.c | 4 ++--
> drivers/cpufreq/scmi-cpufreq.c | 4 ++--
> drivers/opp/of.c | 6 +++---
> include/linux/energy_model.h | 6 +++---
> kernel/power/energy_model.c | 2 +-
> 6 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
> index 49549aab41b4..fd29ed2506c0 100644
> --- a/Documentation/power/energy-model.rst
> +++ b/Documentation/power/energy-model.rst
> @@ -181,8 +181,8 @@ EM framework::
>
> -> drivers/cpufreq/foo_cpufreq.c
>
> - 01 static int est_power(unsigned long *mW, unsigned long *KHz,
> - 02 struct device *dev)
> + 01 static int est_power(struct device *dev, unsigned long *mW,
> + 02 unsigned long *KHz)
> 03 {
> 04 long freq, power;
> 05
> diff --git a/drivers/cpufreq/mediatek-cpufreq-hw.c b/drivers/cpufreq/mediatek-cpufreq-hw.c
> index 0a94c56ddad2..813cccbfe934 100644
> --- a/drivers/cpufreq/mediatek-cpufreq-hw.c
> +++ b/drivers/cpufreq/mediatek-cpufreq-hw.c
> @@ -51,8 +51,8 @@ static const u16 cpufreq_mtk_offsets[REG_ARRAY_SIZE] = {
> };
>
> static int __maybe_unused
> -mtk_cpufreq_get_cpu_power(unsigned long *mW,
> - unsigned long *KHz, struct device *cpu_dev)
> +mtk_cpufreq_get_cpu_power(struct device *cpu_dev, unsigned long *mW,
> + unsigned long *KHz)
> {
> struct mtk_cpufreq_data *data;
> struct cpufreq_policy *policy;
> diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
> index 919fa6e3f462..6d2a4cf46db7 100644
> --- a/drivers/cpufreq/scmi-cpufreq.c
> +++ b/drivers/cpufreq/scmi-cpufreq.c
> @@ -96,8 +96,8 @@ scmi_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask)
> }
>
> static int __maybe_unused
> -scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
> - struct device *cpu_dev)
> +scmi_get_cpu_power(struct device *cpu_dev, unsigned long *power,
> + unsigned long *KHz)
> {
> unsigned long Hz;
> int ret, domain;
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index 440ab5a03df9..485ea980bde7 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -1448,7 +1448,7 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
> * Returns 0 on success or a proper -EINVAL value in case of error.
> */
> static int __maybe_unused
> -_get_dt_power(unsigned long *mW, unsigned long *kHz, struct device *dev)
> +_get_dt_power(struct device *dev, unsigned long *mW, unsigned long *kHz)
> {
> struct dev_pm_opp *opp;
> unsigned long opp_freq, opp_power;
> @@ -1482,8 +1482,8 @@ _get_dt_power(unsigned long *mW, unsigned long *kHz, struct device *dev)
> * Returns -EINVAL if the power calculation failed because of missing
> * parameters, 0 otherwise.
> */
> -static int __maybe_unused _get_power(unsigned long *mW, unsigned long *kHz,
> - struct device *dev)
> +static int __maybe_unused _get_power(struct device *dev, unsigned long *mW,
> + unsigned long *kHz)
> {
> struct dev_pm_opp *opp;
> struct device_node *np;
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index 92e82a322859..8419bffb4398 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -101,11 +101,11 @@ struct em_data_callback {
> /**
> * active_power() - Provide power at the next performance state of
> * a device
> + * @dev : Device for which we do this operation (can be a CPU)
> * @power : Active power at the performance state
> * (modified)
> * @freq : Frequency at the performance state in kHz
> * (modified)
> - * @dev : Device for which we do this operation (can be a CPU)
> *
> * active_power() must find the lowest performance state of 'dev' above
> * 'freq' and update 'power' and 'freq' to the matching active power
> @@ -117,8 +117,8 @@ struct em_data_callback {
> *
> * Return 0 on success.
> */
> - int (*active_power)(unsigned long *power, unsigned long *freq,
> - struct device *dev);
> + int (*active_power)(struct device *dev, unsigned long *power,
> + unsigned long *freq);
>
> /**
> * get_cost() - Provide the cost at the given performance state of
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 95a3b33001f6..babefc72085d 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -140,7 +140,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
> * lowest performance state of 'dev' above 'freq' and updates
> * 'power' and 'freq' accordingly.
> */
> - ret = cb->active_power(&power, &freq, dev);
> + ret = cb->active_power(dev, &power, &freq);
> if (ret) {
> dev_err(dev, "EM: invalid perf. state: %d\n",
> ret);

Reviewed-by: Ionela Voinescu <[email protected]>

> --
> 2.17.1
>

2022-04-12 23:52:53

by Lukasz Luba

[permalink] [raw]
Subject: Re: [RESEND][PATCH 0/8] Introduce support for artificial Energy Model

Hi Rafael,

gentle ping. If you need some help with this maintenance,
we can help.

Regards,
Lukasz

On 4/4/22 13:35, Lukasz Luba wrote:
> Hi Rafael,
>
>
> The patch set has been on LKML for quite a while and got one ACK,
> for the code touching something outside the EM (thermal cooling).
>
> AFAICS there is no interest and objections from others for this code.
>
> Therefore, I have a question:
> What would be be process to have merge this code?
> (We had internally a few reviews of this code)
>
> On 3/21/22 09:57, Lukasz Luba wrote:
>> Hi all,
>>
>> This patch set adds new callback and support for artificial Energy
>> Model (EM).
>> The new EMs have artificially generated performance states.
>> Such EMs can be created from lean information sources, such
>> as the relative energy efficiency between CPUs. The ACPI based
>> platforms provide this information
>> (ACPI 6.4, s5.2.12.14 'GIC CPU Interface (GICC) Structure'
>> 'Processor Power efficiency Class' field).
>>
>> Artificial EMs might require to directly provide the 'cost' of
>> the generated performance state. This patch set adds a new callback
>> .get_cost() for this. The EM framework does not force any model
>> or formula, it's up to the platform code.
>>
>> Artificial EMs aim to leverage the Energy Aware Scheduler
>> (EAS). Other frameworks relying on performance states
>> information (i.e. IPA/DTPM) must be informed of the
>> EM type and might be prevented from using it. This patch
>> sets also does this by introducing a new flag:
>> EM_PERF_DOMAIN_ARTIFICIAL.
>>
>> The patch set is based on current linux-next, where some
>> changes to OPP & EM are queuing.
>>
>> The patch set also contains (patch 7/8 and patch 8/8) logic which
>> prevents
>> two EM's client frameworks from using this new EM type. Some other
>> approach,
>> using 'milli-watts', has been proposed and discussed, but refused [1].
>> This new flag is more precised and should not leave space for
>> wrong interpretation.
>>
>> Shortly after this patch set you will see a patch set implementing the
>> platform code and registering this new EM.
>>
>
>
> No one from Arm is an official maintainer of the EM code.
>
> Regards,
> Lukasz

2022-04-13 19:13:05

by Lukasz Luba

[permalink] [raw]
Subject: Re: [RESEND][PATCH 0/8] Introduce support for artificial Energy Model



On 4/13/22 15:29, Rafael J. Wysocki wrote:
> Hi,
>
> On Tue, Apr 12, 2022 at 8:53 AM Lukasz Luba <[email protected]> wrote:
>>
>> Hi Rafael,
>>
>> gentle ping. If you need some help with this maintenance,
>> we can help.
>
> Sorry for the delay.
>
> Given the lack of objections or concerns, I've applied the whole
> series as 5.19 material.

Thank you Rafael!

>
> Thanks!

2022-04-14 12:08:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RESEND][PATCH 0/8] Introduce support for artificial Energy Model

Hi,

On Tue, Apr 12, 2022 at 8:53 AM Lukasz Luba <[email protected]> wrote:
>
> Hi Rafael,
>
> gentle ping. If you need some help with this maintenance,
> we can help.

Sorry for the delay.

Given the lack of objections or concerns, I've applied the whole
series as 5.19 material.

Thanks!