LinuxLists.cc - [0/8] Introduce support for artificial Energy Model

2022-03-17 00:29:25

Subject: [0/8] Introduce support for artificial Energy Model

Hi all,

This patch set adds a new callback and support for artificial Energy Models (EMs).
The new EMs have artificially generated performance states.
Such EMs can be created from lean information sources, such
as the relative energy efficiency between CPUs. The ACPI based
platforms provide this information
(ACPI 6.4, s5.2.12.14 'GIC CPU Interface (GICC) Structure'
'Processor Power efficiency Class' field).

Artificial EMs might need to provide the 'cost' of the generated
performance state directly. This patch set adds a new callback,
.get_cost(), for this. The EM framework does not impose any model
or formula; that is up to the platform code.
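
As a rough illustration of what a platform-side .get_cost() could look like, the userspace sketch below mocks the device structure and derives a cost from a per-CPU efficiency class plus the frequency. Everything prefixed `demo_` or `DEMO_` is an assumption made for this example (in particular the class-to-cost formula and the 4096 stride); only the callback shape and the upper bound EM_MAX_POWER (0xFFFF) come from the patches in this series.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_EM_MAX_POWER 0xFFFFUL  /* mirrors EM_MAX_POWER from the series */
#define DEMO_CLASS_STRIDE 4096UL    /* hypothetical gap between efficiency classes */

/* Userspace stand-in for the kernel's struct device (assumption). */
struct demo_device {
	unsigned int efficiency_class;  /* e.g. derived from the ACPI GICC field */
};

/*
 * Same argument order as the proposed .get_cost(): 'freq' is in kHz and
 * 'cost' is written only on success. The formula is purely illustrative.
 */
static int demo_get_cost(struct demo_device *dev, unsigned long freq,
			 unsigned long *cost)
{
	unsigned long c;

	if (!dev || !cost || !freq)
		return -EINVAL;

	/* Higher class and higher frequency both raise the cost. */
	c = dev->efficiency_class * DEMO_CLASS_STRIDE + freq / 1000;

	/* Patch 3/8 rejects a zero cost and anything above EM_MAX_POWER. */
	if (!c || c > DEMO_EM_MAX_POWER)
		return -EINVAL;

	*cost = c;
	return 0;
}
```

With this hypothetical formula, costs of different classes do not overlap as long as every frequency stays below about 4 GHz, which is one way to obtain the non-overlapping view described in patch 1/8.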

Artificial EMs aim to leverage the Energy Aware Scheduler
(EAS). Other frameworks relying on performance state
information (e.g. IPA/DTPM) must be informed of the
EM type and might be prevented from using it. This patch
set also does this by introducing a new flag:
EM_PERF_DOMAIN_ARTIFICIAL.

The patch set is based on current linux-next, where some
changes to OPP & EM are queued.

The patch set also contains (patch 7/8 and patch 8/8) logic which prevents
two EM client frameworks from using this new EM type. Another approach,
using 'milli-watts', was proposed and discussed, but rejected [1].
This new flag is more precise and should leave no room for
misinterpretation.

Shortly after this patch set you will see a patch set implementing the
platform code and registering this new EM.

Regards,
Lukasz Luba

[1] https://lore.kernel.org/linux-pm/[email protected]/

Lukasz Luba (7):
PM: EM: Add .get_cost() callback
PM: EM: Use the new .get_cost() callback while registering EM
PM: EM: Change the order of arguments in the .active_power() callback
PM: EM: Remove old debugfs files and print all 'flags'
Documentation: EM: Add artificial EM registration description
thermal: cooling: Check Energy Model type in cpufreq_cooling and
devfreq_cooling
powercap: DTPM: Check for Energy Model type

Pierre Gondois (1):
PM: EM: Add artificial EM flag

Documentation/power/energy-model.rst | 24 +++++++++-
drivers/cpufreq/mediatek-cpufreq-hw.c | 4 +-
drivers/cpufreq/scmi-cpufreq.c | 4 +-
drivers/opp/of.c | 6 +--
drivers/powercap/dtpm_cpu.c | 2 +-
drivers/thermal/cpufreq_cooling.c | 2 +-
drivers/thermal/devfreq_cooling.c | 8 ++--
include/linux/energy_model.h | 35 +++++++++++++--
kernel/power/energy_model.c | 63 +++++++++++++++------------
9 files changed, 101 insertions(+), 47 deletions(-)

--
2.17.1

2022-03-17 04:38:01

by Lukasz Luba

Subject: [PATCH 4/8] PM: EM: Change the order of arguments in the .active_power() callback

The .active_power() callback passes the device pointer when it's called.
Align with the convention present in other subsystems and pass 'dev'
as the first argument. It looks cleaner.

Adjust all affected drivers which implement that API callback.

Suggested-by: Ionela Voinescu <[email protected]>
Signed-off-by: Lukasz Luba <[email protected]>
---
Documentation/power/energy-model.rst | 4 ++--
drivers/cpufreq/mediatek-cpufreq-hw.c | 4 ++--
drivers/cpufreq/scmi-cpufreq.c | 4 ++--
drivers/opp/of.c | 6 +++---
include/linux/energy_model.h | 6 +++---
kernel/power/energy_model.c | 2 +-
6 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
index 49549aab41b4..fd29ed2506c0 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -181,8 +181,8 @@ EM framework::

-> drivers/cpufreq/foo_cpufreq.c

- 01 static int est_power(unsigned long *mW, unsigned long *KHz,
- 02 struct device *dev)
+ 01 static int est_power(struct device *dev, unsigned long *mW,
+ 02 unsigned long *KHz)
03 {
04 long freq, power;
05
diff --git a/drivers/cpufreq/mediatek-cpufreq-hw.c b/drivers/cpufreq/mediatek-cpufreq-hw.c
index 0a94c56ddad2..813cccbfe934 100644
--- a/drivers/cpufreq/mediatek-cpufreq-hw.c
+++ b/drivers/cpufreq/mediatek-cpufreq-hw.c
@@ -51,8 +51,8 @@ static const u16 cpufreq_mtk_offsets[REG_ARRAY_SIZE] = {
};

static int __maybe_unused
-mtk_cpufreq_get_cpu_power(unsigned long *mW,
- unsigned long *KHz, struct device *cpu_dev)
+mtk_cpufreq_get_cpu_power(struct device *cpu_dev, unsigned long *mW,
+ unsigned long *KHz)
{
struct mtk_cpufreq_data *data;
struct cpufreq_policy *policy;
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index 919fa6e3f462..6d2a4cf46db7 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -96,8 +96,8 @@ scmi_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask)
}

static int __maybe_unused
-scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
- struct device *cpu_dev)
+scmi_get_cpu_power(struct device *cpu_dev, unsigned long *power,
+ unsigned long *KHz)
{
unsigned long Hz;
int ret, domain;
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 440ab5a03df9..485ea980bde7 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -1448,7 +1448,7 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
* Returns 0 on success or a proper -EINVAL value in case of error.
*/
static int __maybe_unused
-_get_dt_power(unsigned long *mW, unsigned long *kHz, struct device *dev)
+_get_dt_power(struct device *dev, unsigned long *mW, unsigned long *kHz)
{
struct dev_pm_opp *opp;
unsigned long opp_freq, opp_power;
@@ -1482,8 +1482,8 @@ _get_dt_power(unsigned long *mW, unsigned long *kHz, struct device *dev)
* Returns -EINVAL if the power calculation failed because of missing
* parameters, 0 otherwise.
*/
-static int __maybe_unused _get_power(unsigned long *mW, unsigned long *kHz,
- struct device *dev)
+static int __maybe_unused _get_power(struct device *dev, unsigned long *mW,
+ unsigned long *kHz)
{
struct dev_pm_opp *opp;
struct device_node *np;
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 92e82a322859..8419bffb4398 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -101,11 +101,11 @@ struct em_data_callback {
/**
* active_power() - Provide power at the next performance state of
* a device
+ * @dev : Device for which we do this operation (can be a CPU)
* @power : Active power at the performance state
* (modified)
* @freq : Frequency at the performance state in kHz
* (modified)
- * @dev : Device for which we do this operation (can be a CPU)
*
* active_power() must find the lowest performance state of 'dev' above
* 'freq' and update 'power' and 'freq' to the matching active power
@@ -117,8 +117,8 @@ struct em_data_callback {
*
* Return 0 on success.
*/
- int (*active_power)(unsigned long *power, unsigned long *freq,
- struct device *dev);
+ int (*active_power)(struct device *dev, unsigned long *power,
+ unsigned long *freq);

/**
* get_cost() - Provide the cost at the given performance state of
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 95a3b33001f6..babefc72085d 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -140,7 +140,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
* lowest performance state of 'dev' above 'freq' and updates
* 'power' and 'freq' accordingly.
*/
- ret = cb->active_power(&power, &freq, dev);
+ ret = cb->active_power(dev, &power, &freq);
if (ret) {
dev_err(dev, "EM: invalid perf. state: %d\n",
ret);
--
2.17.1
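
To make the new argument order concrete, here is a minimal userspace sketch of an .active_power()-style callback with 'dev' as the first parameter. The `demo_` types, the OPP table layout and the inclusive "first state at or above the requested frequency" lookup are assumptions for illustration; the real drivers touched by this patch use their own lookup helpers.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical operating point: frequency in kHz and power in mW. */
struct demo_opp {
	unsigned long khz;
	unsigned long mw;
};

/* Userspace stand-in for struct device (assumption). */
struct demo_device {
	const struct demo_opp *opps;  /* sorted by ascending frequency */
	size_t nr_opps;
};

/* Callback table with the post-patch order: dev, power, freq. */
struct demo_em_data_callback {
	int (*active_power)(struct demo_device *dev, unsigned long *power,
			    unsigned long *freq);
};

/*
 * Find the lowest state at or above '*freq', then update both '*freq'
 * and '*power' to match it, as the callback documentation requires.
 */
static int demo_active_power(struct demo_device *dev, unsigned long *power,
			     unsigned long *freq)
{
	size_t i;

	if (!dev || !power || !freq)
		return -EINVAL;

	for (i = 0; i < dev->nr_opps; i++) {
		if (dev->opps[i].khz >= *freq) {
			*freq = dev->opps[i].khz;
			*power = dev->opps[i].mw;
			return 0;
		}
	}

	return -EINVAL;  /* no state at or above the requested frequency */
}
```

A caller walks the table by bumping the returned frequency by one before the next call, so each call yields the next performance state.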

2022-03-17 05:03:04

by Lukasz Luba

Subject: [PATCH 6/8] Documentation: EM: Add artificial EM registration description

Add a description of the new artificial EM registration and its use cases.
Also update the documentation with the description and usage of the new
.get_cost() callback.

Signed-off-by: Lukasz Luba <[email protected]>
---
Documentation/power/energy-model.rst | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
index fd29ed2506c0..feb257b7f350 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -123,6 +123,26 @@ allows a platform to register EM power values which are reflecting total power
(static + dynamic). These power values might be coming directly from
experiments and measurements.

+Registration of 'artificial' EM
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There is an option to provide a custom callback for drivers missing detailed
+knowledge about power value for each performance state. The callback
+.get_cost() is optional and provides the 'cost' values used by the EAS.
+This is useful for platforms that only provide information on relative
+efficiency between CPU types, where one could use the information to
+create an abstract power model. But even an abstract power model can
+sometimes be hard to fit in, given the input power value size restrictions.
+The .get_cost() allows to provide the 'cost' values which reflect the
+efficiency of the CPUs. This would allow to provide EAS information which
+has different relation than what would be forced by the EM internal
+formulas calculating 'cost' values. To register an EM for such platform, the
+driver must set the flag 'milliwatts' to 0, provide .get_power() callback
+and provide .get_cost() callback. The EM framework would handle such platform
+properly during registration. A flag EM_PERF_DOMAIN_ARTIFICIAL is set for such
+platform. Special care should be taken by other frameworks which are using EM
+to test and treat this flag properly.
+
Registration of 'simple' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
2.17.1

2022-03-17 05:47:38

by Lukasz Luba

Subject: [PATCH 1/8] PM: EM: Add .get_cost() callback

The Energy Model (EM) supports devices which report power on an abstract
scale, not only in real Watts. The primary goal of the EM is to enable the
Energy Aware Scheduler (EAS) for a given platform. Some platforms might
not be able to deliver proper power values. The only information that they
might have is the relative efficiency between CPU types.

Thus, it makes sense to remove some restrictions in the EM framework and
introduce a mechanism which would support those platforms. What is crucial
for EAS to operate is the 'cost' field in the EM. The 'cost' is calculated
internally in EM framework based on knowledge from 'power' values.
The 'cost' values must be strictly increasing. The existing API with its
'power' value size restrictions cannot guarantee that the 'cost' will meet
this requirement.

Since such a platform lacks this detailed information and has only
efficiency details, introduce a new custom callback in the EM framework.
The new callback allows the platform to provide 'cost' values which reflect
the efficiency of the CPUs. This gives EAS information with a different
relation than the one that would be forced by the EM's internal formulas
for calculating 'cost' values. Thanks to this new callback it is possible
to create a system view for EAS which has no overlapping performance
states across many Performance Domains.

Signed-off-by: Lukasz Luba <[email protected]>
---
include/linux/energy_model.h | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 9f3c400bc52d..0a3a5663177b 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -114,9 +114,30 @@ struct em_data_callback {
*/
int (*active_power)(unsigned long *power, unsigned long *freq,
struct device *dev);
+
+ /**
+ * get_cost() - Provide the cost at the given performance state of
+ * a device
+ * @dev : Device for which we do this operation (can be a CPU)
+ * @freq : Frequency at the performance state in kHz
+ * @cost : The cost value for the performance state
+ * (modified)
+ *
+ * In case of CPUs, the cost is the one of a single CPU in the domain.
+ * It is expected to fit in the [0, EM_MAX_POWER] range due to internal
+ * usage in EAS calculation.
+ *
+ * Return 0 on success, or appropriate error value in case of failure.
+ */
+ int (*get_cost)(struct device *dev, unsigned long freq,
+ unsigned long *cost);
};
-#define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb }
#define EM_SET_ACTIVE_POWER_CB(em_cb, cb) ((em_cb).active_power = cb)
+#define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) \
+ { .active_power = _active_power_cb, \
+ .get_cost = _cost_cb }
+#define EM_DATA_CB(_active_power_cb) \
+ EM_ADV_DATA_CB(_active_power_cb, NULL)

struct em_perf_domain *em_cpu_get(int cpu);
struct em_perf_domain *em_pd_get(struct device *dev);
@@ -264,6 +285,7 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)

#else
struct em_data_callback {};
+#define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) { }
#define EM_DATA_CB(_active_power_cb) { }
#define EM_SET_ACTIVE_POWER_CB(em_cb, cb) do { } while (0)

--
2.17.1
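
The two initializer macros added by this patch can be tried out in userspace with mocked types, as sketched below. The `DEMO_` macros copy the shape of EM_ADV_DATA_CB() and EM_DATA_CB() from the diff above; the struct, the two callbacks and their return values are hypothetical stand-ins, and the .active_power() prototype uses the 'dev'-first order from patch 4/8.

```c
#include <stddef.h>

struct demo_device;  /* opaque stand-in for struct device (assumption) */

/* Mock of struct em_data_callback with both callbacks. */
struct demo_em_data_callback {
	int (*active_power)(struct demo_device *dev, unsigned long *power,
			    unsigned long *freq);
	int (*get_cost)(struct demo_device *dev, unsigned long freq,
			unsigned long *cost);
};

/* Same structure as the macros in the patch, renamed to avoid confusion. */
#define DEMO_EM_ADV_DATA_CB(_active_power_cb, _cost_cb)	\
	{ .active_power = _active_power_cb,		\
	  .get_cost = _cost_cb }
#define DEMO_EM_DATA_CB(_active_power_cb)		\
	DEMO_EM_ADV_DATA_CB(_active_power_cb, NULL)

/* Hypothetical callbacks returning fixed or trivially derived values. */
static int demo_power(struct demo_device *dev, unsigned long *power,
		      unsigned long *freq)
{
	(void)dev;
	(void)freq;
	*power = 100;
	return 0;
}

static int demo_cost(struct demo_device *dev, unsigned long freq,
		     unsigned long *cost)
{
	(void)dev;
	*cost = freq / 1000;
	return 0;
}
```

A driver with real power data would keep using the one-argument form, which leaves .get_cost set to NULL; only a driver building an artificial EM would pass a second callback.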

2022-03-17 06:05:03

by Lukasz Luba

Subject: [PATCH 2/8] PM: EM: Add artificial EM flag

From: Pierre Gondois <[email protected]>

The Energy Model (EM) can be used on platforms which are missing real
power information. Those platforms would implement .get_cost(), which
populates the values needed by the Energy Aware Scheduler (EAS). The EAS
doesn't use the 'power' fields from the EM, but other frameworks might use
them. Thus, to avoid misuse of this specific type of EM, introduce a new
flag which can be checked by other frameworks.

Signed-off-by: Pierre Gondois <[email protected]>
Signed-off-by: Lukasz Luba <[email protected]>
---
include/linux/energy_model.h | 5 +++++
kernel/power/energy_model.c | 2 ++
2 files changed, 7 insertions(+)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 0a3a5663177b..92e82a322859 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -67,11 +67,16 @@ struct em_perf_domain {
*
* EM_PERF_DOMAIN_SKIP_INEFFICIENCIES: Skip inefficient states when estimating
* energy consumption.
+ *
+ * EM_PERF_DOMAIN_ARTIFICIAL: The power values are artificial and might be
+ * created by platform missing real power information
*/
#define EM_PERF_DOMAIN_MILLIWATTS BIT(0)
#define EM_PERF_DOMAIN_SKIP_INEFFICIENCIES BIT(1)
+#define EM_PERF_DOMAIN_ARTIFICIAL BIT(2)

#define em_span_cpus(em) (to_cpumask((em)->cpus))
+#define em_is_artificial(em) ((em)->flags & EM_PERF_DOMAIN_ARTIFICIAL)

#ifdef CONFIG_ENERGY_MODEL
#define EM_MAX_POWER 0xFFFF
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 0153b0ca7b23..6ecee99af842 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -384,6 +384,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,

if (milliwatts)
dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS;
+ else if (cb->get_cost)
+ dev->em_pd->flags |= EM_PERF_DOMAIN_ARTIFICIAL;

em_cpufreq_update_efficiencies(dev);

--
2.17.1
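
The flag selection in the hunk above boils down to a small decision, sketched here as a standalone userspace function. The bit positions are copied from this patch; the `demo_` function name and the boolean 'has_get_cost' parameter (standing in for the `cb->get_cost` pointer check) are assumptions for illustration.

```c
#include <stdbool.h>

#define DEMO_BIT(n)				(1UL << (n))
#define DEMO_EM_PERF_DOMAIN_MILLIWATTS		DEMO_BIT(0)
#define DEMO_EM_PERF_DOMAIN_SKIP_INEFFICIENCIES	DEMO_BIT(1)
#define DEMO_EM_PERF_DOMAIN_ARTIFICIAL		DEMO_BIT(2)

/*
 * Mirror of the registration logic: real milli-Watts win, and the
 * artificial flag is only set when no milli-Watts are claimed and a
 * cost callback is present.
 */
static unsigned long demo_em_registration_flags(bool milliwatts,
						bool has_get_cost)
{
	unsigned long flags = 0;

	if (milliwatts)
		flags |= DEMO_EM_PERF_DOMAIN_MILLIWATTS;
	else if (has_get_cost)
		flags |= DEMO_EM_PERF_DOMAIN_ARTIFICIAL;

	return flags;
}
```

Note that, with this logic, a driver registering with milliwatts set and a .get_cost() callback present would not be marked artificial.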

2022-03-17 06:08:06

by Lukasz Luba

Subject: [PATCH 8/8] powercap: DTPM: Check for Energy Model type

The Energy Model power values might be artificial. In such a case
it's safe to bail out during the registration, since the PowerCap
framework supports only micro-Watts.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/powercap/dtpm_cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
index bca2f912d349..f5eced0842b3 100644
--- a/drivers/powercap/dtpm_cpu.c
+++ b/drivers/powercap/dtpm_cpu.c
@@ -211,7 +211,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
return 0;

pd = em_cpu_get(cpu);
- if (!pd)
+ if (!pd || em_is_artificial(pd))
return -EINVAL;

dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
--
2.17.1

2022-03-17 06:17:03

by Lukasz Luba

Subject: [PATCH 5/8] PM: EM: Remove old debugfs files and print all 'flags'

The Energy Model gets more bits used in 'flags'. Avoid adding another
debugfs file just to print the status of a new flag. Simply
remove the old debugfs files and add one generic file which prints all flags
as a hex value.

Signed-off-by: Lukasz Luba <[email protected]>
---
kernel/power/energy_model.c | 24 +++++-------------------
1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index babefc72085d..092513575e4e 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -54,28 +54,15 @@ static int em_debug_cpus_show(struct seq_file *s, void *unused)
}
DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);

-static int em_debug_units_show(struct seq_file *s, void *unused)
+static int em_debug_flags_show(struct seq_file *s, void *unused)
{
struct em_perf_domain *pd = s->private;
- char *units = (pd->flags & EM_PERF_DOMAIN_MILLIWATTS) ?
- "milliWatts" : "bogoWatts";

- seq_printf(s, "%s\n", units);
+ seq_printf(s, "%#lx\n", pd->flags);

return 0;
}
-DEFINE_SHOW_ATTRIBUTE(em_debug_units);
-
-static int em_debug_skip_inefficiencies_show(struct seq_file *s, void *unused)
-{
- struct em_perf_domain *pd = s->private;
- int enabled = (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES) ? 1 : 0;
-
- seq_printf(s, "%d\n", enabled);
-
- return 0;
-}
-DEFINE_SHOW_ATTRIBUTE(em_debug_skip_inefficiencies);
+DEFINE_SHOW_ATTRIBUTE(em_debug_flags);

static void em_debug_create_pd(struct device *dev)
{
@@ -89,9 +76,8 @@ static void em_debug_create_pd(struct device *dev)
debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus,
&em_debug_cpus_fops);

- debugfs_create_file("units", 0444, d, dev->em_pd, &em_debug_units_fops);
- debugfs_create_file("skip-inefficiencies", 0444, d, dev->em_pd,
- &em_debug_skip_inefficiencies_fops);
+ debugfs_create_file("flags", 0444, d, dev->em_pd,
+ &em_debug_flags_fops);

/* Create a sub-directory for each performance state */
for (i = 0; i < dev->em_pd->nr_perf_states; i++)
--
2.17.1
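
Since the new 'flags' file prints a raw hex value, a reader has to decode the bits by hand. The userspace sketch below shows one way to do that, assuming the bit layout introduced in this series (bit 0 milli-Watts, bit 1 skip-inefficiencies, bit 2 artificial); the `demo_` names and the short labels are hypothetical, and the layout may differ in other kernel versions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Bit layout as of this series (assumption for any other kernel). */
static const struct {
	unsigned long bit;
	const char *name;
} demo_flag_names[] = {
	{ 1UL << 0, "MILLIWATTS" },
	{ 1UL << 1, "SKIP_INEFFICIENCIES" },
	{ 1UL << 2, "ARTIFICIAL" },
};

/*
 * Parse the text read from the 'flags' file (e.g. "0x6\n") and write a
 * '|'-separated list of set flags into 'out'. Returns 0 on success and
 * -1 on a parse error or when 'out' is too small.
 */
static int demo_decode_flags(const char *text, char *out, size_t out_len)
{
	unsigned long flags;
	size_t i, used = 0;
	char *end;

	if (!text || !out || !out_len)
		return -1;
	out[0] = '\0';

	/* Base 16 accepts an optional "0x" prefix, matching "%#lx" output. */
	flags = strtoul(text, &end, 16);
	if (end == text)
		return -1;

	for (i = 0; i < sizeof(demo_flag_names) / sizeof(demo_flag_names[0]); i++) {
		int n;

		if (!(flags & demo_flag_names[i].bit))
			continue;

		n = snprintf(out + used, out_len - used, "%s%s",
			     used ? "|" : "", demo_flag_names[i].name);
		if (n < 0 || (size_t)n >= out_len - used)
			return -1;
		used += (size_t)n;
	}

	return 0;
}
```

An empty result string means that no known flag is set, which is what the old 'units' file reported as "bogoWatts" when no cost callback was involved.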

2022-03-17 06:35:14

by Lukasz Luba

Subject: [PATCH 7/8] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling

The Energy Model can now be artificial, which means the power values
are mathematically generated to leverage EAS and are not expected to be on
a uniform scale with other devices providing power information. If this
EM type is in use, the thermal governor IPA should not be allowed to
operate, since the relation between cooling devices is not properly
defined. For example, a big GPU might have lower power values
than a Little CPU. To mitigate misbehaviour of the thermal control
algorithm, simply do not register the cooling device as IPA's power
actor.

Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/thermal/cpufreq_cooling.c | 2 +-
drivers/thermal/devfreq_cooling.c | 8 +++++---
2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index 0bfb8eebd126..b8151d95a806 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -328,7 +328,7 @@ static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev,
struct cpufreq_policy *policy;
unsigned int nr_levels;

- if (!em)
+ if (!em || em_is_artificial(em))
return false;

policy = cpufreq_cdev->policy;
diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 4310cb342a9f..b04dcbbf721a 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -358,6 +358,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
struct thermal_cooling_device *cdev;
struct device *dev = df->dev.parent;
struct devfreq_cooling_device *dfc;
+ struct em_perf_domain *em;
char *name;
int err, num_opps;

@@ -367,8 +368,9 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,

dfc->devfreq = df;

- dfc->em_pd = em_pd_get(dev);
- if (dfc->em_pd) {
+ em = em_pd_get(dev);
+ if (em && !em_is_artificial(em)) {
+ dfc->em_pd = em;
devfreq_cooling_ops.get_requested_power =
devfreq_cooling_get_requested_power;
devfreq_cooling_ops.state2power = devfreq_cooling_state2power;
@@ -379,7 +381,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
num_opps = em_pd_nr_perf_states(dfc->em_pd);
} else {
/* Backward compatibility for drivers which do not use IPA */
- dev_dbg(dev, "missing EM for cooling device\n");
+ dev_dbg(dev, "missing proper EM for cooling device\n");

num_opps = dev_pm_opp_get_opp_count(dev);

--
2.17.1
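
The consumer-side check added to both cooling drivers follows one pattern: treat a missing EM and an artificial EM the same way. A userspace sketch of that pattern is shown below; the mocked struct and the `demo_can_be_power_actor()` helper are hypothetical, while the macro mirrors em_is_artificial() from patch 2/8.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_EM_PERF_DOMAIN_ARTIFICIAL	(1UL << 2)

/* Minimal mock of struct em_perf_domain: only 'flags' matters here. */
struct demo_em_perf_domain {
	unsigned long flags;
};

#define demo_em_is_artificial(em) \
	((em)->flags & DEMO_EM_PERF_DOMAIN_ARTIFICIAL)

/*
 * A power-based client may use the EM only when it exists and its
 * power values are not artificially generated.
 */
static bool demo_can_be_power_actor(const struct demo_em_perf_domain *em)
{
	return em && !demo_em_is_artificial(em);
}
```

When the helper returns false, a client in the style of devfreq_cooling would fall back to its non-EM path, here the OPP count, instead of registering power callbacks.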

2022-03-17 06:40:33

by Lukasz Luba

Subject: [PATCH 3/8] PM: EM: Use the new .get_cost() callback while registering EM

The Energy Model (EM) allows the 'cost' values to be provided when the
device driver implements the optional .get_cost() callback. This removes
the restriction imposed by the EM's internal 'cost' calculation
for each performance state. Now, the driver is in charge of providing
the right values, which are then used by the Energy Aware Scheduler.

Signed-off-by: Lukasz Luba <[email protected]>
---
kernel/power/energy_model.c | 39 ++++++++++++++++++++++++++-----------
1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 6ecee99af842..95a3b33001f6 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -121,7 +121,8 @@ static void em_debug_remove_pd(struct device *dev) {}
#endif

static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
- int nr_states, struct em_data_callback *cb)
+ int nr_states, struct em_data_callback *cb,
+ unsigned long flags)
{
unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
struct em_perf_state *table;
@@ -173,10 +174,22 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
/* Compute the cost of each performance state. */
fmax = (u64) table[nr_states - 1].frequency;
for (i = nr_states - 1; i >= 0; i--) {
- unsigned long power_res = em_scale_power(table[i].power);
+ unsigned long power_res, cost;
+
+ if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
+ ret = cb->get_cost(dev, table[i].frequency, &cost);
+ if (ret || !cost || cost > EM_MAX_POWER) {
+ dev_err(dev, "EM: invalid cost %lu %d\n",
+ cost, ret);
+ goto free_ps_table;
+ }
+ } else {
+ power_res = em_scale_power(table[i].power);
+ cost = div64_u64(fmax * power_res, table[i].frequency);
+ }
+
+ table[i].cost = cost;

- table[i].cost = div64_u64(fmax * power_res,
- table[i].frequency);
if (table[i].cost >= prev_cost) {
table[i].flags = EM_PERF_STATE_INEFFICIENT;
dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
@@ -197,7 +210,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
}

static int em_create_pd(struct device *dev, int nr_states,
- struct em_data_callback *cb, cpumask_t *cpus)
+ struct em_data_callback *cb, cpumask_t *cpus,
+ unsigned long flags)
{
struct em_perf_domain *pd;
struct device *cpu_dev;
@@ -215,7 +229,7 @@ static int em_create_pd(struct device *dev, int nr_states,
return -ENOMEM;
}

- ret = em_create_perf_table(dev, pd, nr_states, cb);
+ ret = em_create_perf_table(dev, pd, nr_states, cb, flags);
if (ret) {
kfree(pd);
return ret;
@@ -332,6 +346,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
bool milliwatts)
{
unsigned long cap, prev_cap = 0;
+ unsigned long flags = 0;
int cpu, ret;

if (!dev || !nr_states || !cb)
@@ -378,14 +393,16 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
}
}

- ret = em_create_pd(dev, nr_states, cb, cpus);
+ if (milliwatts)
+ flags |= EM_PERF_DOMAIN_MILLIWATTS;
+ else if (cb->get_cost)
+ flags |= EM_PERF_DOMAIN_ARTIFICIAL;
+
+ ret = em_create_pd(dev, nr_states, cb, cpus, flags);
if (ret)
goto unlock;

- if (milliwatts)
- dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS;
- else if (cb->get_cost)
- dev->em_pd->flags |= EM_PERF_DOMAIN_ARTIFICIAL;
+ dev->em_pd->flags |= flags;

em_cpufreq_update_efficiencies(dev);

--
2.17.1
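
The cost loop changed by this patch can be reproduced in userspace to see how the two branches behave. The sketch below is a simplified mock, not the kernel code: the `demo_` types are assumptions, the callback drops the 'dev' argument, all frequencies are assumed to be non-zero, and the em_scale_power() step applied by the kernel before the division is left out.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

#define DEMO_EM_MAX_POWER		0xFFFFUL
#define DEMO_PERF_STATE_INEFFICIENT	1UL

struct demo_perf_state {
	unsigned long frequency;  /* kHz, ascending order in the table */
	unsigned long power;
	unsigned long cost;
	unsigned long flags;
};

/* Simplified cost callback (no device argument in this mock). */
typedef int (*demo_get_cost_fn)(unsigned long freq, unsigned long *cost);

/* Walk from the highest state down, as the kernel loop does. */
static int demo_fill_costs(struct demo_perf_state *table, int nr_states,
			   demo_get_cost_fn get_cost)
{
	unsigned long prev_cost = ULONG_MAX;
	uint64_t fmax;
	int i;

	if (!table || nr_states <= 0)
		return -EINVAL;

	fmax = table[nr_states - 1].frequency;
	for (i = nr_states - 1; i >= 0; i--) {
		unsigned long cost = 0;

		if (get_cost) {
			/* Artificial EM: the platform supplies the cost. */
			int ret = get_cost(table[i].frequency, &cost);

			if (ret || !cost || cost > DEMO_EM_MAX_POWER)
				return -EINVAL;
		} else {
			/* Regular EM: cost = fmax * power / frequency. */
			cost = (unsigned long)(fmax * table[i].power /
					       table[i].frequency);
		}

		table[i].cost = cost;

		/* A state is inefficient if a faster one costs no more. */
		if (cost >= prev_cost)
			table[i].flags = DEMO_PERF_STATE_INEFFICIENT;
		else
			prev_cost = cost;
	}

	return 0;
}

/* Hypothetical platform formula: cost equals the frequency in MHz. */
static int demo_cost_from_mhz(unsigned long freq, unsigned long *cost)
{
	*cost = freq / 1000;
	return 0;
}
```

With the formula branch, a middle state whose power is disproportionately high ends up flagged as inefficient, while a monotonic cost callback such as the hypothetical one above never triggers the flag.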

2022-03-29 22:06:30

by Cristian Marussi

Subject: Re: [0/8] Introduce support for artificial Energy Model

On Wed, Mar 16, 2022 at 11:52:03PM +0000, Lukasz Luba wrote:
> Hi all,
>

Hi Lukasz,

> This patch set adds new callback and support for artificial Energy Model (EM).
> The new EMs have artificially generated performance states.
> Such EMs can be created from lean information sources, such
> as the relative energy efficiency between CPUs. The ACPI based
> platforms provide this information
> (ACPI 6.4, s5.2.12.14 'GIC CPU Interface (GICC) Structure'
> 'Processor Power efficiency Class' field).
>
> Artificial EMs might require to directly provide the 'cost' of
> the generated performance state. This patch set adds a new callback
> .get_cost() for this. The EM framework does not force any model
> or formula, it's up to the platform code.
>
> Artificial EMs aim to leverage the Energy Aware Scheduler
> (EAS). Other frameworks relying on performance states
> information (i.e. IPA/DTPM) must be informed of the
> EM type and might be prevented from using it. This patch
> sets also does this by introducing a new flag:
> EM_PERF_DOMAIN_ARTIFICIAL.
>
> The patch set is based on current linux-next, where some
> changes to OPP & EM are queuing.
>
> The patch set also contains (patch 7/8 and patch 8/8) logic which prevents
> two EM's client frameworks from using this new EM type. Some other approach,
> using 'milli-watts', has been proposed and discussed, but refused [1].
> This new flag is more precised and should not leave space for
> wrong interpretation.
>
> Shortly after this patch set you will see a patch set implementing the
> platform code and registering this new EM.
>

Just to let you know that in the next few days I'm going to post the first
chunk of some SCMIv3.1 additions that also includes (as you probably
know) the SCMI Perf protocol support for reporting perf_domain costs in
micro-watts and not only in milli-watts.

Given that it does not seem that as of now the em_ API used by the SCMI
cpufreq driver can make use of this new scale (and being not at all
familiar with EM/EAS for sure :P), the SCMIv3.1 'Perf micro-watts' patch
which I will post (I'll CC you) does NOT expose any new interface but only
takes care to store the new micro-watts capability internally in a flag
(if advertised by an SCMIv3.1 backend server), so that, basically, you'll
keep seeing from the SCMI cpufreq driver that the scale is milli-watt
(when milli-watts are used of course) or non-milli-watt (for abstract and
micro-watts scales).

This is of course intended to be a first step, laying out just the bare
minimum common internal SCMI support, until we figure out how to properly
expose this from the SCMI Perf in order to make it usable for EM
(if needed at all).

Thanks,
Cristian

2022-03-31 02:34:53

by Lukasz Luba

Subject: Re: [0/8] Introduce support for artificial Energy Model

Hi Cristian,

On 3/29/22 14:29, Cristian Marussi wrote:
> On Wed, Mar 16, 2022 at 11:52:03PM +0000, Lukasz Luba wrote:
>> Hi all,
>>
>
> Hi Lukasz,
>
>> This patch set adds new callback and support for artificial Energy Model (EM).
>> The new EMs have artificially generated performance states.
>> Such EMs can be created from lean information sources, such
>> as the relative energy efficiency between CPUs. The ACPI based
>> platforms provide this information
>> (ACPI 6.4, s5.2.12.14 'GIC CPU Interface (GICC) Structure'
>> 'Processor Power efficiency Class' field).
>>
>> Artificial EMs might require to directly provide the 'cost' of
>> the generated performance state. This patch set adds a new callback
>> .get_cost() for this. The EM framework does not force any model
>> or formula, it's up to the platform code.
>>
>> Artificial EMs aim to leverage the Energy Aware Scheduler
>> (EAS). Other frameworks relying on performance states
>> information (i.e. IPA/DTPM) must be informed of the
>> EM type and might be prevented from using it. This patch
>> sets also does this by introducing a new flag:
>> EM_PERF_DOMAIN_ARTIFICIAL.
>>
>> The patch set is based on current linux-next, where some
>> changes to OPP & EM are queuing.
>>
>> The patch set also contains (patch 7/8 and patch 8/8) logic which prevents
>> two EM's client frameworks from using this new EM type. Some other approach,
>> using 'milli-watts', has been proposed and discussed, but refused [1].
>> This new flag is more precised and should not leave space for
>> wrong interpretation.
>>
>> Shortly after this patch set you will see a patch set implementing the
>> platform code and registering this new EM.
>>
>
> Just to let you know that in the few days I'm going to post the first
> chunk of some SCMIv3.1 additions that includes also (as you probably
> know) the SCMI Perf protocol support for reporting perf_domain costs in
> micro-watts and not only in milli-watts.
>
> Given that it does not seem that as of now the em_ API used by the SCMI
> cpufreq driver can make use of this new scale (and being not at all
> familiar with EM/EAS for sure :P), the SCMIv3.1 'Perf micro-watts' patch
> which I will post (I'll CC you) does NOT expose any new interface but only
> takes care to store the new micro-watts capability internally in a flag
> (if advertised by an SCMIv3.1 backend server), so that, basically, you'll
> keep seeing from the SCMI cpufreq driver that the scale is milli-watt
> (when milli-watts are used of course) or non-milli-watt (for abstract and
> micro-watts scales).

Sounds good!

>
> This is of course intended to be a first step, laying out just the bare
> minimum common internal SCMI support, until we figure out how to properly
> expose this from the SCMI Perf in order to make it usable for EM
> (if needed at all).
>

I had such a patch for the EM, to keep the power in micro-Watts.
We can glue these two layers together (high-level EM and low-level SCMI
perf). Let's sort it out.

Regards,
Lukasz