Hi all,
The Energy Model supports power values expressed in an abstract scale.
This has an impact on Intelligent Power Allocation (IPA) and should be
documented properly. Kernel sub-systems like EAS, IPA and DTPM
(new comming PowerCap framework) would use the new flag to capture
potential miss-configuration where the devices have registered different
power scales, thus cannot operate together.
There was a discussion below v2 of this patch series, which might help
you to get context of these changes [2].
The agreed approach is to have the DT as a source of power values expressed
always in milli-Watts and the only way to submit with abstract scale values
is via the em_dev_register_perf_domain() API.
Changes:
v4:
- change bool to int type for 'miliwatts' in struct em_perf_domain
(suggested by Quentin)
- removed one sentence from patch 2/4 in IPA doc power_allocator.rst
(suggested by Quentin)
- added reviewed-by from Quentin to 1/4, 3/4, 4/4 patches
v3 [3]:
- added boolean flag to struct em_perf_domain and registration function
indicating if EM holds real power values in milli-Watts (suggested by
Daniel and aggreed with Quentin)
- updated documentation regarding this new flag
- dropped DT binding change for 'sustainable-power'
- added more maintainers on CC (due to patch 1/3 touching different things)
v2 [2]:
- updated sustainable power section in IPA documentation
- updated DT binding for the 'sustainable-power'
v1 [1]:
- simple documenation update with new 'abstract scale' in EAS, EM, IPA
Regards,
Lukasz Luba
[1] https://lore.kernel.org/linux-doc/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://lore.kernel.org/lkml/[email protected]/
Lukasz Luba (4):
PM / EM: Add a flag indicating units of power values in Energy Model
docs: Clarify abstract scale usage for power values in Energy Model
PM / EM: update the comments related to power scale
docs: power: Update Energy Model with new flag indicating power scale
.../driver-api/thermal/power_allocator.rst | 12 +++++++-
Documentation/power/energy-model.rst | 30 +++++++++++++++----
Documentation/scheduler/sched-energy.rst | 5 ++++
drivers/cpufreq/scmi-cpufreq.c | 3 +-
drivers/opp/of.c | 2 +-
include/linux/energy_model.h | 20 ++++++++-----
kernel/power/energy_model.c | 26 ++++++++++++++--
7 files changed, 80 insertions(+), 18 deletions(-)
--
2.17.1
There are different platforms and devices which might use different scale
for the power values. Kernel sub-systems might need to check if all
Energy Model (EM) devices are using the same scale. Address that issue and
store the information inside EM for each device. Thanks to that they can
be easily compared and proper action triggered.
Suggested-by: Daniel Lezcano <[email protected]>
Reviewed-by: Quentin Perret <[email protected]>
Signed-off-by: Lukasz Luba <[email protected]>
---
drivers/cpufreq/scmi-cpufreq.c | 3 ++-
drivers/opp/of.c | 2 +-
include/linux/energy_model.h | 9 +++++++--
kernel/power/energy_model.c | 24 +++++++++++++++++++++++-
4 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index e855e8612a67..3714a4cd07fa 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -188,7 +188,8 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
policy->fast_switch_possible =
handle->perf_ops->fast_switch_possible(handle, cpu_dev);
- em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
+ em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
+ false);
return 0;
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 9faeb83e4b32..16f39e2127a5 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -1335,7 +1335,7 @@ int dev_pm_opp_of_register_em(struct device *dev, struct cpumask *cpus)
goto failed;
}
- ret = em_dev_register_perf_domain(dev, nr_opp, &em_cb, cpus);
+ ret = em_dev_register_perf_domain(dev, nr_opp, &em_cb, cpus, true);
if (ret)
goto failed;
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index b67a51c574b9..685d9e919137 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -29,6 +29,8 @@ struct em_perf_state {
* em_perf_domain - Performance domain
* @table: List of performance states, in ascending order
* @nr_perf_states: Number of performance states
+ * @milliwatts: Flag indicating the power values are in milli-Watts
+ * or some other scale.
* @cpus: Cpumask covering the CPUs of the domain. It's here
* for performance reasons to avoid potential cache
* misses during energy calculations in the scheduler
@@ -43,6 +45,7 @@ struct em_perf_state {
struct em_perf_domain {
struct em_perf_state *table;
int nr_perf_states;
+ int milliwatts;
unsigned long cpus[];
};
@@ -79,7 +82,8 @@ struct em_data_callback {
struct em_perf_domain *em_cpu_get(int cpu);
struct em_perf_domain *em_pd_get(struct device *dev);
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
- struct em_data_callback *cb, cpumask_t *span);
+ struct em_data_callback *cb, cpumask_t *spani,
+ bool milliwatts);
void em_dev_unregister_perf_domain(struct device *dev);
/**
@@ -186,7 +190,8 @@ struct em_data_callback {};
static inline
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
- struct em_data_callback *cb, cpumask_t *span)
+ struct em_data_callback *cb, cpumask_t *span,
+ bool milliwatts)
{
return -EINVAL;
}
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index c1ff7fa030ab..efe2a595988e 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -52,6 +52,17 @@ static int em_debug_cpus_show(struct seq_file *s, void *unused)
}
DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
+static int em_debug_units_show(struct seq_file *s, void *unused)
+{
+ struct em_perf_domain *pd = s->private;
+ char *units = pd->milliwatts ? "milliWatts" : "bogoWatts";
+
+ seq_printf(s, "%s\n", units);
+
+ return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(em_debug_units);
+
static void em_debug_create_pd(struct device *dev)
{
struct dentry *d;
@@ -64,6 +75,8 @@ static void em_debug_create_pd(struct device *dev)
debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus,
&em_debug_cpus_fops);
+ debugfs_create_file("units", 0444, d, dev->em_pd, &em_debug_units_fops);
+
/* Create a sub-directory for each performance state */
for (i = 0; i < dev->em_pd->nr_perf_states; i++)
em_debug_create_ps(&dev->em_pd->table[i], d);
@@ -250,17 +263,24 @@ EXPORT_SYMBOL_GPL(em_cpu_get);
* @cpus : Pointer to cpumask_t, which in case of a CPU device is
* obligatory. It can be taken from i.e. 'policy->cpus'. For other
* type of devices this should be set to NULL.
+ * @milliwatts : Flag indicating that the power values are in milliWatts or
+ * in some other scale. It must be set properly.
*
* Create Energy Model tables for a performance domain using the callbacks
* defined in cb.
*
+ * The @milliwatts is important to set with correct value. Some kernel
+ * sub-systems might rely on this flag and check if all devices in the EM are
+ * using the same scale.
+ *
* If multiple clients register the same performance domain, all but the first
* registration will be ignored.
*
* Return 0 on success
*/
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
- struct em_data_callback *cb, cpumask_t *cpus)
+ struct em_data_callback *cb, cpumask_t *cpus,
+ bool milliwatts)
{
unsigned long cap, prev_cap = 0;
int cpu, ret;
@@ -313,6 +333,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
if (ret)
goto unlock;
+ dev->em_pd->milliwatts = milliwatts;
+
em_debug_create_pd(dev);
dev_info(dev, "EM: created perf domain\n");
--
2.17.1
The Energy Model (EM) can store power values in milli-Watts or in abstract
scale. This might cause issues in the subsystems which use the EM for
estimating the device power, such as:
- mixing of different scales in a subsystem which uses multiple
(cooling) devices (e.g. thermal Intelligent Power Allocation (IPA))
- assuming that energy [milli-Joules] can be derived from the EM power
values which might not be possible since the power scale doesn't have to
be in milli-Watts
To avoid misconfiguration add the needed documentation to the EM and
related subsystems: EAS and IPA.
Signed-off-by: Lukasz Luba <[email protected]>
---
.../driver-api/thermal/power_allocator.rst | 12 +++++++++++-
Documentation/power/energy-model.rst | 13 +++++++++++++
Documentation/scheduler/sched-energy.rst | 5 +++++
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/Documentation/driver-api/thermal/power_allocator.rst b/Documentation/driver-api/thermal/power_allocator.rst
index 67b6a3297238..aa5f66552d6f 100644
--- a/Documentation/driver-api/thermal/power_allocator.rst
+++ b/Documentation/driver-api/thermal/power_allocator.rst
@@ -71,7 +71,9 @@ to the speed-grade of the silicon. `sustainable_power` is therefore
simply an estimate, and may be tuned to affect the aggressiveness of
the thermal ramp. For reference, the sustainable power of a 4" phone
is typically 2000mW, while on a 10" tablet is around 4500mW (may vary
-depending on screen size).
+depending on screen size). It is possible to have the power value
+expressed in an abstract scale. The sustained power should be aligned
+to the scale used by the related cooling devices.
If you are using device tree, do add it as a property of the
thermal-zone. For example::
@@ -269,3 +271,11 @@ won't be very good. Note that this is not particular to this
governor, step-wise will also misbehave if you call its throttle()
faster than the normal thermal framework tick (due to interrupts for
example) as it will overreact.
+
+Energy Model requirements
+=========================
+
+Another important thing is the consistent scale of the power values
+provided by the cooling devices. All of the cooling devices in a single
+thermal zone should have power values reported either in milli-Watts
+or scaled to the same 'abstract scale'.
diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
index a6fb986abe3c..ba7aa581b307 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -20,6 +20,19 @@ possible source of information on its own, the EM framework intervenes as an
abstraction layer which standardizes the format of power cost tables in the
kernel, hence enabling to avoid redundant work.
+The power values might be expressed in milli-Watts or in an 'abstract scale'.
+Multiple subsystems might use the EM and it is up to the system integrator to
+check that the requirements for the power value scale types are met. An example
+can be found in the Energy-Aware Scheduler documentation
+Documentation/scheduler/sched-energy.rst. For some subsystems like thermal or
+powercap power values expressed in an 'abstract scale' might cause issues.
+These subsystems are more interested in estimation of power used in the past,
+thus the real milli-Watts might be needed. An example of these requirements can
+be found in the Intelligent Power Allocation in
+Documentation/driver-api/thermal/power_allocator.rst.
+Important thing to keep in mind is that when the power values are expressed in
+an 'abstract scale' deriving real energy in milli-Joules would not be possible.
+
The figure below depicts an example of drivers (Arm-specific here, but the
approach is applicable to any architecture) providing power costs to the EM
framework, and interested clients reading the data from it::
diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst
index 001e09c95e1d..afe02d394402 100644
--- a/Documentation/scheduler/sched-energy.rst
+++ b/Documentation/scheduler/sched-energy.rst
@@ -350,6 +350,11 @@ independent EM framework in Documentation/power/energy-model.rst.
Please also note that the scheduling domains need to be re-built after the
EM has been registered in order to start EAS.
+EAS uses the EM to make a forecasting decision on energy usage and thus it is
+more focused on the difference when checking possible options for task
+placement. For EAS it doesn't matter whether the EM power values are expressed
+in milli-Watts or in an 'abstract scale'.
+
6.3 - Energy Model complexity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--
2.17.1
The Energy Model supports power values expressed in milli-Watts or in an
'abstract scale'. Update the related comments is the code to reflect that
state.
Reviewed-by: Quentin Perret <[email protected]>
Signed-off-by: Lukasz Luba <[email protected]>
---
include/linux/energy_model.h | 11 +++++------
kernel/power/energy_model.c | 2 +-
2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 685d9e919137..65a0c6ce0616 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -13,9 +13,8 @@
/**
* em_perf_state - Performance state of a performance domain
* @frequency: The frequency in KHz, for consistency with CPUFreq
- * @power: The power consumed at this level, in milli-watts (by 1 CPU or
- by a registered device). It can be a total power: static and
- dynamic.
+ * @power: The power consumed at this level (by 1 CPU or by a registered
+ * device). It can be a total power: static and dynamic.
* @cost: The cost coefficient associated with this level, used during
* energy calculation. Equal to: power * max_frequency / frequency
*/
@@ -58,7 +57,7 @@ struct em_data_callback {
/**
* active_power() - Provide power at the next performance state of
* a device
- * @power : Active power at the performance state in mW
+ * @power : Active power at the performance state
* (modified)
* @freq : Frequency at the performance state in kHz
* (modified)
@@ -69,8 +68,8 @@ struct em_data_callback {
* and frequency.
*
* In case of CPUs, the power is the one of a single CPU in the domain,
- * expressed in milli-watts. It is expected to fit in the
- * [0, EM_MAX_POWER] range.
+ * expressed in milli-Watts or an abstract scale. It is expected to
+ * fit in the [0, EM_MAX_POWER] range.
*
* Return 0 on success.
*/
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index efe2a595988e..1358fa4abfa8 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -143,7 +143,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
/*
* The power returned by active_state() is expected to be
- * positive, in milli-watts and to fit into 16 bits.
+ * positive and to fit into 16 bits.
*/
if (!power || power > EM_MAX_POWER) {
dev_err(dev, "EM: invalid power: %lu\n",
--
2.17.1
Update description and meaning of a new flag, which indicates the type of
power scale used for a registered Energy Model (EM) device.
Reviewed-by: Quentin Perret <[email protected]>
Signed-off-by: Lukasz Luba <[email protected]>
---
Documentation/power/energy-model.rst | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
index ba7aa581b307..60ac091d3b0d 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -30,6 +30,8 @@ These subsystems are more interested in estimation of power used in the past,
thus the real milli-Watts might be needed. An example of these requirements can
be found in the Intelligent Power Allocation in
Documentation/driver-api/thermal/power_allocator.rst.
+Kernel subsystems might implement automatic detection to check whether EM
+registered devices have inconsistent scale (based on EM internal flag).
Important thing to keep in mind is that when the power values are expressed in
an 'abstract scale' deriving real energy in milli-Joules would not be possible.
@@ -86,7 +88,7 @@ Drivers are expected to register performance domains into the EM framework by
calling the following API::
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
- struct em_data_callback *cb, cpumask_t *cpus);
+ struct em_data_callback *cb, cpumask_t *cpus, bool milliwatts);
Drivers must provide a callback function returning <frequency, power> tuples
for each performance state. The callback function provided by the driver is free
@@ -94,6 +96,10 @@ to fetch data from any relevant location (DT, firmware, ...), and by any mean
deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
performance domains using cpumask. For other devices than CPUs the last
argument must be set to NULL.
+The last argument 'milliwatts' is important to set with correct value. Kernel
+subsystems which use EM might rely on this flag to check if all EM devices use
+the same scale. If there are different scales, these subsystems might decide
+to: return warning/error, stop working or panic.
See Section 3. for an example of driver implementing this
callback, and kernel/power/energy_model.c for further documentation on this
API.
@@ -169,7 +175,8 @@ EM framework::
37 nr_opp = foo_get_nr_opp(policy);
38
39 /* And register the new performance domain */
- 40 em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
- 41
- 42 return 0;
- 43 }
+ 40 em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
+ 41 true);
+ 42
+ 43 return 0;
+ 44 }
--
2.17.1
Hi Rafael,
On 11/3/20 9:05 AM, Lukasz Luba wrote:
> Hi all,
>
> The Energy Model supports power values expressed in an abstract scale.
> This has an impact on Intelligent Power Allocation (IPA) and should be
> documented properly. Kernel sub-systems like EAS, IPA and DTPM
> (new comming PowerCap framework) would use the new flag to capture
> potential miss-configuration where the devices have registered different
> power scales, thus cannot operate together.
>
> There was a discussion below v2 of this patch series, which might help
> you to get context of these changes [2].
>
> The agreed approach is to have the DT as a source of power values expressed
> always in milli-Watts and the only way to submit with abstract scale values
> is via the em_dev_register_perf_domain() API.
>
> Changes:
> v4:
> - change bool to int type for 'miliwatts' in struct em_perf_domain
> (suggested by Quentin)
> - removed one sentence from patch 2/4 in IPA doc power_allocator.rst
> (suggested by Quentin)
> - added reviewed-by from Quentin to 1/4, 3/4, 4/4 patches
There was no major objections in the v3 and this v4 just addressed
minor comments. The important discussions mostly happen in v2.
Could you take the patches via your tree, please?
Regards,
Lukasz
On Tue, Nov 03, 2020 at 09:05:57AM +0000, Lukasz Luba wrote:
> @@ -79,7 +82,8 @@ struct em_data_callback {
> struct em_perf_domain *em_cpu_get(int cpu);
> struct em_perf_domain *em_pd_get(struct device *dev);
> int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
> - struct em_data_callback *cb, cpumask_t *span);
> + struct em_data_callback *cb, cpumask_t *spani,
"spani" looks like a typo?
On 11/5/20 9:18 AM, Morten Rasmussen wrote:
> On Tue, Nov 03, 2020 at 09:05:57AM +0000, Lukasz Luba wrote:
>> @@ -79,7 +82,8 @@ struct em_data_callback {
>> struct em_perf_domain *em_cpu_get(int cpu);
>> struct em_perf_domain *em_pd_get(struct device *dev);
>> int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
>> - struct em_data_callback *cb, cpumask_t *span);
>> + struct em_data_callback *cb, cpumask_t *spani,
>
> "spani" looks like a typo?
>
Good catch, yes, the vim 'i'.
Thank you Morten. I will resend this patch when you don't
find other issues in the rest of patches.
On Thu, Nov 05, 2020 at 10:09:05AM +0000, Lukasz Luba wrote:
>
>
> On 11/5/20 9:18 AM, Morten Rasmussen wrote:
> > On Tue, Nov 03, 2020 at 09:05:57AM +0000, Lukasz Luba wrote:
> > > @@ -79,7 +82,8 @@ struct em_data_callback {
> > > struct em_perf_domain *em_cpu_get(int cpu);
> > > struct em_perf_domain *em_pd_get(struct device *dev);
> > > int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
> > > - struct em_data_callback *cb, cpumask_t *span);
> > > + struct em_data_callback *cb, cpumask_t *spani,
> >
> > "spani" looks like a typo?
> >
>
> Good catch, yes, the vim 'i'.
>
> Thank you Morten. I will resend this patch when you don't
> find other issues in the rest of patches.
The rest of the series looks okay to me.
Morten
There are different platforms and devices which might use different scale
for the power values. Kernel sub-systems might need to check if all
Energy Model (EM) devices are using the same scale. Address that issue and
store the information inside EM for each device. Thanks to that they can
be easily compared and proper action triggered.
Suggested-by: Daniel Lezcano <[email protected]>
Reviewed-by: Quentin Perret <[email protected]>
Signed-off-by: Lukasz Luba <[email protected]>
---
This is a just a small change which addresses typo
in function argument name (s/spani/span) pointed by Morten [1].
Regards,
Lukasz
[1] https://lore.kernel.org/linux-pm/20201105091759.GA8237@e123083-lin/
drivers/cpufreq/scmi-cpufreq.c | 3 ++-
drivers/opp/of.c | 2 +-
include/linux/energy_model.h | 9 +++++++--
kernel/power/energy_model.c | 24 +++++++++++++++++++++++-
4 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index e855e8612a67..3714a4cd07fa 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -188,7 +188,8 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
policy->fast_switch_possible =
handle->perf_ops->fast_switch_possible(handle, cpu_dev);
- em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
+ em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
+ false);
return 0;
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 9faeb83e4b32..16f39e2127a5 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -1335,7 +1335,7 @@ int dev_pm_opp_of_register_em(struct device *dev, struct cpumask *cpus)
goto failed;
}
- ret = em_dev_register_perf_domain(dev, nr_opp, &em_cb, cpus);
+ ret = em_dev_register_perf_domain(dev, nr_opp, &em_cb, cpus, true);
if (ret)
goto failed;
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index b67a51c574b9..3a33c738d876 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -29,6 +29,8 @@ struct em_perf_state {
* em_perf_domain - Performance domain
* @table: List of performance states, in ascending order
* @nr_perf_states: Number of performance states
+ * @milliwatts: Flag indicating the power values are in milli-Watts
+ * or some other scale.
* @cpus: Cpumask covering the CPUs of the domain. It's here
* for performance reasons to avoid potential cache
* misses during energy calculations in the scheduler
@@ -43,6 +45,7 @@ struct em_perf_state {
struct em_perf_domain {
struct em_perf_state *table;
int nr_perf_states;
+ int milliwatts;
unsigned long cpus[];
};
@@ -79,7 +82,8 @@ struct em_data_callback {
struct em_perf_domain *em_cpu_get(int cpu);
struct em_perf_domain *em_pd_get(struct device *dev);
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
- struct em_data_callback *cb, cpumask_t *span);
+ struct em_data_callback *cb, cpumask_t *span,
+ bool milliwatts);
void em_dev_unregister_perf_domain(struct device *dev);
/**
@@ -186,7 +190,8 @@ struct em_data_callback {};
static inline
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
- struct em_data_callback *cb, cpumask_t *span)
+ struct em_data_callback *cb, cpumask_t *span,
+ bool milliwatts)
{
return -EINVAL;
}
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index c1ff7fa030ab..efe2a595988e 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -52,6 +52,17 @@ static int em_debug_cpus_show(struct seq_file *s, void *unused)
}
DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
+static int em_debug_units_show(struct seq_file *s, void *unused)
+{
+ struct em_perf_domain *pd = s->private;
+ char *units = pd->milliwatts ? "milliWatts" : "bogoWatts";
+
+ seq_printf(s, "%s\n", units);
+
+ return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(em_debug_units);
+
static void em_debug_create_pd(struct device *dev)
{
struct dentry *d;
@@ -64,6 +75,8 @@ static void em_debug_create_pd(struct device *dev)
debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus,
&em_debug_cpus_fops);
+ debugfs_create_file("units", 0444, d, dev->em_pd, &em_debug_units_fops);
+
/* Create a sub-directory for each performance state */
for (i = 0; i < dev->em_pd->nr_perf_states; i++)
em_debug_create_ps(&dev->em_pd->table[i], d);
@@ -250,17 +263,24 @@ EXPORT_SYMBOL_GPL(em_cpu_get);
* @cpus : Pointer to cpumask_t, which in case of a CPU device is
* obligatory. It can be taken from i.e. 'policy->cpus'. For other
* type of devices this should be set to NULL.
+ * @milliwatts : Flag indicating that the power values are in milliWatts or
+ * in some other scale. It must be set properly.
*
* Create Energy Model tables for a performance domain using the callbacks
* defined in cb.
*
+ * The @milliwatts is important to set with correct value. Some kernel
+ * sub-systems might rely on this flag and check if all devices in the EM are
+ * using the same scale.
+ *
* If multiple clients register the same performance domain, all but the first
* registration will be ignored.
*
* Return 0 on success
*/
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
- struct em_data_callback *cb, cpumask_t *cpus)
+ struct em_data_callback *cb, cpumask_t *cpus,
+ bool milliwatts)
{
unsigned long cap, prev_cap = 0;
int cpu, ret;
@@ -313,6 +333,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
if (ret)
goto unlock;
+ dev->em_pd->milliwatts = milliwatts;
+
em_debug_create_pd(dev);
dev_info(dev, "EM: created perf domain\n");
--
2.17.1
On 11/5/20 10:56 AM, Morten Rasmussen wrote:
> On Thu, Nov 05, 2020 at 10:09:05AM +0000, Lukasz Luba wrote:
>>
>>
>> On 11/5/20 9:18 AM, Morten Rasmussen wrote:
>>> On Tue, Nov 03, 2020 at 09:05:57AM +0000, Lukasz Luba wrote:
>>>> @@ -79,7 +82,8 @@ struct em_data_callback {
>>>> struct em_perf_domain *em_cpu_get(int cpu);
>>>> struct em_perf_domain *em_pd_get(struct device *dev);
>>>> int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
>>>> - struct em_data_callback *cb, cpumask_t *span);
>>>> + struct em_data_callback *cb, cpumask_t *spani,
>>>
>>> "spani" looks like a typo?
>>>
>>
>> Good catch, yes, the vim 'i'.
>>
>> Thank you Morten. I will resend this patch when you don't
>> find other issues in the rest of patches.
>
> The rest of the series looks okay to me.
Thank you for checking the whole series. I have re-sent this patch only.
Lukasz
>
> Morten
>
On Wed, Nov 4, 2020 at 11:58 AM Lukasz Luba <[email protected]> wrote:
>
> Hi Rafael,
>
> On 11/3/20 9:05 AM, Lukasz Luba wrote:
> > Hi all,
> >
> > The Energy Model supports power values expressed in an abstract scale.
> > This has an impact on Intelligent Power Allocation (IPA) and should be
> > documented properly. Kernel sub-systems like EAS, IPA and DTPM
> > (new comming PowerCap framework) would use the new flag to capture
> > potential miss-configuration where the devices have registered different
> > power scales, thus cannot operate together.
> >
> > There was a discussion below v2 of this patch series, which might help
> > you to get context of these changes [2].
> >
> > The agreed approach is to have the DT as a source of power values expressed
> > always in milli-Watts and the only way to submit with abstract scale values
> > is via the em_dev_register_perf_domain() API.
> >
> > Changes:
> > v4:
> > - change bool to int type for 'miliwatts' in struct em_perf_domain
> > (suggested by Quentin)
> > - removed one sentence from patch 2/4 in IPA doc power_allocator.rst
> > (suggested by Quentin)
> > - added reviewed-by from Quentin to 1/4, 3/4, 4/4 patches
>
> There was no major objections in the v3 and this v4 just addressed
> minor comments. The important discussions mostly happen in v2.
>
> Could you take the patches via your tree, please?
Applied as 5.11 material, thanks!
On 11/10/20 7:32 PM, Rafael J. Wysocki wrote:
> On Wed, Nov 4, 2020 at 11:58 AM Lukasz Luba <[email protected]> wrote:
>>
>> Hi Rafael,
>>
>> On 11/3/20 9:05 AM, Lukasz Luba wrote:
>>> Hi all,
>>>
>>> The Energy Model supports power values expressed in an abstract scale.
>>> This has an impact on Intelligent Power Allocation (IPA) and should be
>>> documented properly. Kernel sub-systems like EAS, IPA and DTPM
>>> (new comming PowerCap framework) would use the new flag to capture
>>> potential miss-configuration where the devices have registered different
>>> power scales, thus cannot operate together.
>>>
>>> There was a discussion below v2 of this patch series, which might help
>>> you to get context of these changes [2].
>>>
>>> The agreed approach is to have the DT as a source of power values expressed
>>> always in milli-Watts and the only way to submit with abstract scale values
>>> is via the em_dev_register_perf_domain() API.
>>>
>>> Changes:
>>> v4:
>>> - change bool to int type for 'miliwatts' in struct em_perf_domain
>>> (suggested by Quentin)
>>> - removed one sentence from patch 2/4 in IPA doc power_allocator.rst
>>> (suggested by Quentin)
>>> - added reviewed-by from Quentin to 1/4, 3/4, 4/4 patches
>>
>> There was no major objections in the v3 and this v4 just addressed
>> minor comments. The important discussions mostly happen in v2.
>>
>> Could you take the patches via your tree, please?
>
> Applied as 5.11 material, thanks!
>
Thank you Rafael!
Regards,
Lukasz