2016-03-18 14:23:36

by Juri Lelli

Subject: [PATCH v4 0/8] CPUs capacity information for heterogeneous systems

Hi all,

this is take 4 of "CPUs capacity information for heterogeneous systems"
patchset [1]; some context follows.

ARM systems may be configured to have CPUs with different power/performance
characteristics within the same chip. In this case, additional information has
to be made available to the kernel (the scheduler in particular) for it to be
aware of such differences and take decisions accordingly. This RFC stems from
the ongoing discussion about introducing a simple platform energy cost model to
guide scheduling decisions (a.k.a. Energy Aware Scheduling [2]), but it also
aims to be an independent track to standardise the way we make the scheduler
aware of heterogeneous CPU systems. With these patches, plus the patches
from [2] (which make the scheduler wakeup paths aware of heterogeneous CPU
systems), we enable the scheduler to have good default performance on such
systems. In addition, we get a clearly defined way of providing the scheduler
with the information it needs about CPU capacity on such systems.

CPU capacity is defined in this context as a number that provides the scheduler
information about CPU heterogeneity. Such heterogeneity can come from
micro-architectural differences (e.g., ARM big.LITTLE systems) or the maximum
frequency at which CPUs can run (e.g., SMP systems with multiple frequency
domains and different max frequencies). Heterogeneity in this context is about
differing performance characteristics; in practice, the binding that we propose
in this RFC tries to capture a first-order approximation of the relative
performance of CPUs.

After discussing the pros and cons of the alternatives presented on the list at
the recent Linaro Connect, we seem to have come to the conclusion that a new DT
binding is reasonable: the new property is only a first-order approximation
that is useful to get acceptably good behaviour during boot and early
execution; it can then be overwritten using the sysfs interface if needed.
I thus rebased v1 of this set on mainline as of today and I also removed the
capacity-scale property (as agreed during v1 review): CPU capacity properties
are now normalized w.r.t. the biggest capacity found while parsing the DT.
The capacity property name and definition didn't change w.r.t. v1, as I
intended this to be an almost pure refresh of that posting. Comments and
feedback on what needs to be changed there are highly welcome.
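As a rough illustration of the normalization just described, the sketch below mirrors the arithmetic of normalize_cpu_capacity() from patches 3/8 and 5/8 in plain user-space C: the largest raw value becomes capacity_scale, and each CPU gets raw * 1024 / capacity_scale. The helper name normalize_capacities() is hypothetical, SCHED_CAPACITY_SHIFT is redefined locally to the kernel's value of 10, and the zero-scale guard is an addition of this sketch, not something the patches do.

```c
#include <stddef.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10  /* local copy: 1 << 10 == 1024 */

/*
 * Hypothetical helper: rescale raw DT capacities so that the biggest one
 * becomes 1024. Returns 0 on success, -1 if there is nothing to scale by.
 */
static int normalize_capacities(const uint32_t *raw, uint64_t *out, size_t n)
{
	uint32_t capacity_scale = 0;
	size_t i;

	/* First pass: find the biggest capacity, as the DT parser does. */
	for (i = 0; i < n; i++)
		if (raw[i] > capacity_scale)
			capacity_scale = raw[i];

	/* Guard added in this sketch: avoid dividing by zero. */
	if (capacity_scale == 0)
		return -1;

	/* Second pass: widen before shifting so large raw values cannot wrap. */
	for (i = 0; i < n; i++)
		out[i] = ((uint64_t)raw[i] << SCHED_CAPACITY_SHIFT) / capacity_scale;

	return 0;
}
```

With the values from Example 2 of the binding documentation (2, 2, 1, 1) this yields 1024, 1024, 512, 512, while the TC2 values (1024, 430) come out unchanged because 1024 is already the maximum.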

Patches high level description:

o 01/08 cleans up how cpu_scale is initialized in arm (already landed on
Russell's patch system)
o 02/08 introduces documentation for the new optional DT binding
o [03-06]/08 add the capacity property to the TC2 and Juno DTs and provide
parsing of such information at boot time
o [07-08]/08 introduce a sysfs cpu_capacity attribute

The patchset is based on top of mainline as of today (4.5). Changelog comments
regarding changes relative to previous versions, where present, refer to v1, as
v2 and v3 didn't contain patches 02-06.

In case you would like to test this out, I pushed a branch here:

git://linux-arm.org/linux-jl.git upstream/default_caps_v4

This branch contains additional patches, useful to better understand how CPU
capacity information is actually used by the scheduler. However, discussion
regarding these additional patches is outside the scope of this posting.

Best,

- Juri

[1] v1 - https://lkml.org/lkml/2015/11/23/391
v2 - https://lkml.org/lkml/2016/1/8/417
v3 - https://lkml.org/lkml/2016/2/3/405
[2] https://lkml.org/lkml/2015/7/7/754

Juri Lelli (8):
ARM: initialize cpu_scale to its default
Documentation: arm: define DT cpu capacity bindings
arm: parse cpu capacity from DT
arm, dts: add TC2 cpu capacity information
arm64: parse cpu capacity from DT
arm64, dts: add Juno cpu capacity information
arm: add sysfs cpu_capacity attribute
arm64: add sysfs cpu_capacity attribute

.../devicetree/bindings/arm/cpu-capacity.txt | 222 +++++++++++++++++++++
Documentation/devicetree/bindings/arm/cpus.txt | 9 +
arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts | 5 +
arch/arm/kernel/topology.c | 150 +++++++++++++-
arch/arm64/boot/dts/arm/juno.dts | 6 +
arch/arm64/kernel/topology.c | 143 +++++++++++++
6 files changed, 531 insertions(+), 4 deletions(-)
create mode 100644 Documentation/devicetree/bindings/arm/cpu-capacity.txt

--
2.7.0


2016-03-18 14:23:39

by Juri Lelli

Subject: [PATCH v4 1/8] ARM: initialize cpu_scale to its default

Instead of looping through all cpus calling set_capacity_scale, we can
initialise the cpu_scale per-cpu variables to SCHED_CAPACITY_SCALE at their
definition.

Cc: Russell King <[email protected]>
Acked-by: Vincent Guittot <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
---

Applied:
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8497/1
---
arch/arm/kernel/topology.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 08b7847..ec279d1 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -40,7 +40,7 @@
* to run the rebalance_domains for all idle cores and the cpu_capacity can be
* updated during this sequence.
*/
-static DEFINE_PER_CPU(unsigned long, cpu_scale);
+static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;

unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
{
@@ -306,8 +306,6 @@ void __init init_cpu_topology(void)
cpu_topo->socket_id = -1;
cpumask_clear(&cpu_topo->core_sibling);
cpumask_clear(&cpu_topo->thread_sibling);
-
- set_capacity_scale(cpu, SCHED_CAPACITY_SCALE);
}
smp_wmb();

--
2.7.0

2016-03-18 14:23:50

by Juri Lelli

Subject: [PATCH v4 3/8] arm: parse cpu capacity from DT

With the introduction of cpu capacity bindings, CPU capacities can now be
extracted from DT. Add parsing of such information at boot time. We keep
the code that produces the same information from other DT properties and
hard-coded values as a fall-back, for backward compatibility.

Caveat: the information provided by this patch will start to be used in
the future, by properly defining arch_scale_cpu_capacity().

Cc: Russell King <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
---

Changes from v1:
- normalize w.r.t. highest capacity found in DT
- bailout conditions (all-or-nothing)
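The all-or-nothing bailout can be modelled outside the kernel as follows. This is a simplified sketch, not the patch's code: a hypothetical has_capacity[] flag stands in for the return value of of_property_read_u32(), apply_dt_capacities() and DEFAULT_CAPACITY are illustrative names, and a single missing entry makes every CPU fall back to 1024, as the binding documentation requires.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_CAPACITY 1024u  /* plays the role of SCHED_CAPACITY_SCALE */

/*
 * Hypothetical model of the DT parsing: raw[i] is only meaningful when
 * has_capacity[i] is true. Returns true if the DT values were used.
 */
static bool apply_dt_capacities(const uint32_t *raw, const bool *has_capacity,
				uint64_t *out, size_t n)
{
	uint32_t scale = 0;
	size_t i;

	/* Pass 1: every CPU node must provide a value (all-or-nothing). */
	for (i = 0; i < n; i++) {
		if (!has_capacity[i])
			goto fallback;	/* one missing node disables DT capacities */
		if (raw[i] > scale)
			scale = raw[i];
	}
	if (scale == 0)
		goto fallback;	/* guard added in this sketch, not in the patch */

	/* Pass 2: normalize w.r.t. the biggest capacity found. */
	for (i = 0; i < n; i++)
		out[i] = ((uint64_t)raw[i] << 10) / scale;
	return true;

fallback:
	for (i = 0; i < n; i++)
		out[i] = DEFAULT_CAPACITY;
	return false;
}
```

Doing the check in a first pass keeps the decision in one place; the patch instead sets cap_parsing_failed as soon as a node lacks the property and skips normalize_cpu_capacity() later.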
---
arch/arm/kernel/topology.c | 78 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 77 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index ec279d1..53c13c1 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -78,6 +78,67 @@ static unsigned long *__cpu_capacity;
#define cpu_capacity(cpu) __cpu_capacity[cpu]

static unsigned long middle_capacity = 1;
+static bool cap_from_dt = true;
+static u32 *raw_capacity;
+static bool cap_parsing_failed;
+static u32 capacity_scale;
+
+static int __init parse_cpu_capacity(struct device_node *cpu_node, int cpu)
+{
+ int ret = 1;
+ u32 cpu_capacity;
+
+ if (cap_parsing_failed)
+ return !ret;
+
+ ret = of_property_read_u32(cpu_node,
+ "capacity",
+ &cpu_capacity);
+ if (!ret) {
+ if (!raw_capacity) {
+ raw_capacity = kzalloc(sizeof(*raw_capacity) *
+ num_possible_cpus(), GFP_KERNEL);
+ if (!raw_capacity) {
+ pr_err("cpu_capacity: failed to allocate memory"
+ " for raw capacities\n");
+ cap_parsing_failed = true;
+ return !ret;
+ }
+ }
+ capacity_scale = max(cpu_capacity, capacity_scale);
+ raw_capacity[cpu] = cpu_capacity;
+ pr_debug("cpu_capacity: %s cpu_capacity=%u (raw)\n",
+ cpu_node->full_name, raw_capacity[cpu]);
+ } else {
+ pr_err("cpu_capacity: missing %s raw capacity "
+ "(fallback to 1024 for all CPUs)\n",
+ cpu_node->full_name);
+ cap_parsing_failed = true;
+ kfree(raw_capacity);
+ }
+
+ return !ret;
+}
+
+static void __init normalize_cpu_capacity(void)
+{
+ u64 capacity;
+ int cpu;
+
+ if (cap_parsing_failed)
+ return;
+
+ pr_info("cpu_capacity: capacity_scale=%u\n", capacity_scale);
+ for_each_possible_cpu(cpu) {
+ capacity = (raw_capacity[cpu] << SCHED_CAPACITY_SHIFT)
+ / capacity_scale;
+ set_capacity_scale(cpu, capacity);
+ pr_info("cpu_capacity: CPU%d cpu_capacity=%lu\n",
+ cpu, arch_scale_cpu_capacity(NULL, cpu));
+ }
+
+ kfree(raw_capacity);
+}

/*
* Iterate all CPUs' descriptor in DT and compute the efficiency
@@ -99,6 +160,12 @@ static void __init parse_dt_topology(void)
__cpu_capacity = kcalloc(nr_cpu_ids, sizeof(*__cpu_capacity),
GFP_NOWAIT);

+ cn = of_find_node_by_path("/cpus");
+ if (!cn) {
+ pr_err("No CPU information found in DT\n");
+ return;
+ }
+
for_each_possible_cpu(cpu) {
const u32 *rate;
int len;
@@ -110,6 +177,13 @@ static void __init parse_dt_topology(void)
continue;
}

+ if (parse_cpu_capacity(cn, cpu)) {
+ of_node_put(cn);
+ continue;
+ }
+
+ cap_from_dt = false;
+
for (cpu_eff = table_efficiency; cpu_eff->compatible; cpu_eff++)
if (of_device_is_compatible(cn, cpu_eff->compatible))
break;
@@ -151,6 +225,8 @@ static void __init parse_dt_topology(void)
middle_capacity = ((max_capacity / 3)
>> (SCHED_CAPACITY_SHIFT-1)) + 1;

+ if (cap_from_dt && !cap_parsing_failed)
+ normalize_cpu_capacity();
}

/*
@@ -160,7 +236,7 @@ static void __init parse_dt_topology(void)
*/
static void update_cpu_capacity(unsigned int cpu)
{
- if (!cpu_capacity(cpu))
+ if (!cpu_capacity(cpu) || cap_from_dt)
return;

set_capacity_scale(cpu, cpu_capacity(cpu) / middle_capacity);
--
2.7.0

2016-03-18 14:23:58

by Juri Lelli

Subject: [PATCH v4 4/8] arm, dts: add TC2 cpu capacity information

Add TC2 cpu capacity binding information.

Cc: Liviu Dudau <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Lorenzo Pieralisi <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Pawel Moll <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Ian Campbell <[email protected]>
Cc: Kumar Gala <[email protected]>
Cc: Russell King <[email protected]>
Cc: [email protected]
Signed-off-by: Juri Lelli <[email protected]>
---

Changes from v1:
- capacity-scale removed
---
arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts b/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
index 17f63f7..0924844 100644
--- a/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
+++ b/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
@@ -39,6 +39,7 @@
reg = <0>;
cci-control-port = <&cci_control1>;
cpu-idle-states = <&CLUSTER_SLEEP_BIG>;
+ capacity = <1024>;
};

cpu1: cpu@1 {
@@ -47,6 +48,7 @@
reg = <1>;
cci-control-port = <&cci_control1>;
cpu-idle-states = <&CLUSTER_SLEEP_BIG>;
+ capacity = <1024>;
};

cpu2: cpu@2 {
@@ -55,6 +57,7 @@
reg = <0x100>;
cci-control-port = <&cci_control2>;
cpu-idle-states = <&CLUSTER_SLEEP_LITTLE>;
+ capacity = <430>;
};

cpu3: cpu@3 {
@@ -63,6 +66,7 @@
reg = <0x101>;
cci-control-port = <&cci_control2>;
cpu-idle-states = <&CLUSTER_SLEEP_LITTLE>;
+ capacity = <430>;
};

cpu4: cpu@4 {
@@ -71,6 +75,7 @@
reg = <0x102>;
cci-control-port = <&cci_control2>;
cpu-idle-states = <&CLUSTER_SLEEP_LITTLE>;
+ capacity = <430>;
};

idle-states {
--
2.7.0

2016-03-18 14:24:05

by Juri Lelli

Subject: [PATCH v4 7/8] arm: add sysfs cpu_capacity attribute

Add a sysfs cpu_capacity attribute with which it is possible to read and
write (thus overwriting the default values) CPU capacities. This might be
useful in situations where values need changing after boot.

The new attribute shows up as:

/sys/devices/system/cpu/cpu*/cpu_capacity
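The write path of store_cpu_capacity() below boils down to: parse an unsigned long, reject anything above SCHED_CAPACITY_SCALE, then apply the value to every CPU in the writer's core_sibling mask. A user-space model of that logic might look like this; a plain 64-bit bitmask and array stand in for the cpumask and the per-CPU cpu_scale variables, and strtoul() only approximates kstrtoul() (for instance, it tolerates leading whitespace). All names here are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SCHED_CAPACITY_SCALE 1024ul
#define NR_CPUS_MODEL 8

/* Stand-in for the per-CPU cpu_scale variables. */
static unsigned long cpu_scale_model[NR_CPUS_MODEL];

/*
 * Hypothetical model of store_cpu_capacity(): siblings is a bitmask of the
 * CPUs sharing a cluster with the CPU whose attribute was written.
 */
static int store_capacity_model(const char *buf, uint64_t siblings)
{
	char *end;
	unsigned long val;
	int cpu;

	errno = 0;
	val = strtoul(buf, &end, 0);  /* base 0, like kstrtoul(p, 0, ...) */
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	if (val > SCHED_CAPACITY_SCALE)
		return -EINVAL;

	/* The whole cluster gets the new value, not just the written CPU. */
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (siblings & (1ull << cpu))
			cpu_scale_model[cpu] = val;
	return 0;
}
```

Note the design choice this mirrors: writing to any one CPU's file updates all of its core siblings, so capacities stay uniform within a cluster.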

Cc: Russell King <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
---
arch/arm/kernel/topology.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 53c13c1..28a4029 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -52,6 +52,74 @@ static void set_capacity_scale(unsigned int cpu, unsigned long capacity)
per_cpu(cpu_scale, cpu) = capacity;
}

+#ifdef CONFIG_PROC_SYSCTL
+#include <asm/cpu.h>
+#include <linux/string.h>
+static ssize_t show_cpu_capacity(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ ssize_t rc;
+ int cpunum = cpu->dev.id;
+ unsigned long capacity = arch_scale_cpu_capacity(NULL, cpunum);
+
+ rc = sprintf(buf, "%lu\n", capacity);
+
+ return rc;
+}
+
+static ssize_t store_cpu_capacity(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf,
+ size_t count)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ int this_cpu = cpu->dev.id, i;
+ unsigned long new_capacity;
+ ssize_t ret;
+
+ if (count) {
+ char *p = (char *) buf;
+
+ ret = kstrtoul(p, 0, &new_capacity);
+ if (ret)
+ return ret;
+ if (new_capacity > SCHED_CAPACITY_SCALE)
+ return -EINVAL;
+
+ for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
+ set_capacity_scale(i, new_capacity);
+ }
+
+ return count;
+}
+
+static DEVICE_ATTR(cpu_capacity,
+ 0644,
+ show_cpu_capacity,
+ store_cpu_capacity);
+
+static int register_cpu_capacity_sysctl(void)
+{
+ int i;
+ struct device *cpu;
+
+ for_each_possible_cpu(i) {
+ cpu = get_cpu_device(i);
+ if (!cpu) {
+ pr_err("%s: too early to get CPU%d device!\n",
+ __func__, i);
+ continue;
+ }
+ device_create_file(cpu, &dev_attr_cpu_capacity);
+ }
+
+ return 0;
+}
+late_initcall(register_cpu_capacity_sysctl);
+#endif
+
#ifdef CONFIG_OF
struct cpu_efficiency {
const char *compatible;
--
2.7.0

2016-03-18 14:24:15

by Juri Lelli

Subject: [PATCH v4 6/8] arm64, dts: add Juno cpu capacity information

Add Juno cpu capacity binding information.

Cc: Rob Herring <[email protected]>
Cc: Pawel Moll <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Ian Campbell <[email protected]>
Cc: Kumar Gala <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Liviu Dudau <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Jon Medhurst <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: [email protected]
Signed-off-by: Juri Lelli <[email protected]>
---

Changes from v1:
- capacity-scale removed
---
arch/arm64/boot/dts/arm/juno.dts | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/arm/juno.dts b/arch/arm64/boot/dts/arm/juno.dts
index dcfcf15..a15c781 100644
--- a/arch/arm64/boot/dts/arm/juno.dts
+++ b/arch/arm64/boot/dts/arm/juno.dts
@@ -90,6 +90,7 @@
next-level-cache = <&A57_L2>;
clocks = <&scpi_dvfs 0>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <1024>;
};

A57_1: cpu@1 {
@@ -100,6 +101,7 @@
next-level-cache = <&A57_L2>;
clocks = <&scpi_dvfs 0>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <1024>;
};

A53_0: cpu@100 {
@@ -110,6 +112,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <447>;
};

A53_1: cpu@101 {
@@ -120,6 +123,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <447>;
};

A53_2: cpu@102 {
@@ -130,6 +134,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <447>;
};

A53_3: cpu@103 {
@@ -140,6 +145,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <447>;
};

A57_L2: l2-cache0 {
--
2.7.0

2016-03-18 14:24:11

by Juri Lelli

Subject: [PATCH v4 8/8] arm64: add sysfs cpu_capacity attribute

Add a sysfs cpu_capacity attribute with which it is possible to read and
write (thus overwriting the default values) CPU capacities. This might be
useful in situations where values need changing after boot.

The new attribute shows up as:

/sys/devices/system/cpu/cpu*/cpu_capacity
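From user space the attribute is an ordinary text file holding one decimal value per CPU. A small reader might look like the sketch below; read_capacity_file() and read_cpu_capacity() are hypothetical helpers, and the path simply follows the pattern above. Reading works for any user, while writing needs root since the attribute is created with mode 0644.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one unsigned value from a sysfs-style file.
 * Returns -1 if the file is missing or does not start with a number.
 */
static long read_capacity_file(const char *path)
{
	FILE *f = fopen(path, "r");
	unsigned long val;
	int n;

	if (!f)
		return -1;
	n = fscanf(f, "%lu", &val);
	fclose(f);
	return n == 1 ? (long)val : -1;
}

/* Build the per-CPU attribute path and read it. */
static long read_cpu_capacity(int cpu)
{
	char path[64];

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/cpu_capacity", cpu);
	return read_capacity_file(path);
}
```

On a kernel without this series the file does not exist and the helpers return -1, which a tool could treat as "all CPUs have the default capacity".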

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Sudeep Holla <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
---
arch/arm64/kernel/topology.c | 68 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 69229b3..4d1fddb 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -36,6 +36,74 @@ static void set_capacity_scale(unsigned int cpu, unsigned long capacity)
per_cpu(cpu_scale, cpu) = capacity;
}

+#ifdef CONFIG_PROC_SYSCTL
+#include <asm/cpu.h>
+#include <linux/string.h>
+static ssize_t show_cpu_capacity(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ ssize_t rc;
+ int cpunum = cpu->dev.id;
+ unsigned long capacity = arch_scale_cpu_capacity(NULL, cpunum);
+
+ rc = sprintf(buf, "%lu\n", capacity);
+
+ return rc;
+}
+
+static ssize_t store_cpu_capacity(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf,
+ size_t count)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ int this_cpu = cpu->dev.id, i;
+ unsigned long new_capacity;
+ ssize_t ret;
+
+ if (count) {
+ char *p = (char *) buf;
+
+ ret = kstrtoul(p, 0, &new_capacity);
+ if (ret)
+ return ret;
+ if (new_capacity > SCHED_CAPACITY_SCALE)
+ return -EINVAL;
+
+ for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
+ set_capacity_scale(i, new_capacity);
+ }
+
+ return count;
+}
+
+static DEVICE_ATTR(cpu_capacity,
+ 0644,
+ show_cpu_capacity,
+ store_cpu_capacity);
+
+static int register_cpu_capacity_sysctl(void)
+{
+ int i;
+ struct device *cpu;
+
+ for_each_possible_cpu(i) {
+ cpu = get_cpu_device(i);
+ if (!cpu) {
+ pr_err("%s: too early to get CPU%d device!\n",
+ __func__, i);
+ continue;
+ }
+ device_create_file(cpu, &dev_attr_cpu_capacity);
+ }
+
+ return 0;
+}
+late_initcall(register_cpu_capacity_sysctl);
+#endif
+
static u32 capacity_scale;
static u32 *raw_capacity;
static bool cap_parsing_failed;
--
2.7.0

2016-03-18 14:23:55

by Juri Lelli

Subject: [PATCH v4 5/8] arm64: parse cpu capacity from DT

With the introduction of cpu capacity bindings, CPU capacities can now be
extracted from DT. Add parsing of such information at boot time. Also,
store such information using per CPU variables, as we do for arm.

Caveat: the information provided by this patch will start to be used in
the future, by properly defining arch_scale_cpu_capacity().

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Sudeep Holla <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
---

Changes from v1:
- normalize w.r.t. highest capacity found in DT
- bailout conditions (all-or-nothing)
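One detail worth checking in normalize_cpu_capacity() below: raw_capacity[] is u32, so `raw_capacity[cpu] << SCHED_CAPACITY_SHIFT` is computed in 32-bit arithmetic before being stored into the u64 capacity, and a raw value of 2^22 or more wraps. The values used in the DTs of this series (1024, 447, 430) are far from that limit. The user-space sketch below, with hypothetical helper names, shows the difference; on 32-bit arm the widened division would typically go through div_u64() from <linux/math64.h> rather than a plain `/`.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10

/* As written in the patch: the shift is done in 32-bit arithmetic. */
static uint64_t scale_narrow(uint32_t raw, uint32_t scale)
{
	return (raw << SCHED_CAPACITY_SHIFT) / scale;  /* may wrap mod 2^32 */
}

/* Widening before the shift keeps every bit of the intermediate value. */
static uint64_t scale_wide(uint32_t raw, uint32_t scale)
{
	return ((uint64_t)raw << SCHED_CAPACITY_SHIFT) / scale;
}
```

For ordinary values both helpers agree (447 scaled by 1024 gives 447 either way); only raw values of 2^22 and above diverge.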
---
arch/arm64/kernel/topology.c | 75 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 694f6de..69229b3 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -19,10 +19,82 @@
#include <linux/nodemask.h>
#include <linux/of.h>
#include <linux/sched.h>
+#include <linux/slab.h>

#include <asm/cputype.h>
#include <asm/topology.h>

+static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
+
+unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
+{
+ return per_cpu(cpu_scale, cpu);
+}
+
+static void set_capacity_scale(unsigned int cpu, unsigned long capacity)
+{
+ per_cpu(cpu_scale, cpu) = capacity;
+}
+
+static u32 capacity_scale;
+static u32 *raw_capacity;
+static bool cap_parsing_failed;
+
+static void __init parse_cpu_capacity(struct device_node *cpu_node, int cpu)
+{
+ int ret;
+ u32 cpu_capacity;
+
+ if (cap_parsing_failed)
+ return;
+
+ ret = of_property_read_u32(cpu_node,
+ "capacity",
+ &cpu_capacity);
+ if (!ret) {
+ if (!raw_capacity) {
+ raw_capacity = kzalloc(sizeof(*raw_capacity) *
+ num_possible_cpus(), GFP_KERNEL);
+ if (!raw_capacity) {
+ pr_err("cpu_capacity: failed to allocate memory"
+ " for raw capacities\n");
+ cap_parsing_failed = true;
+ return;
+ }
+ }
+ capacity_scale = max(cpu_capacity, capacity_scale);
+ raw_capacity[cpu] = cpu_capacity;
+ pr_debug("cpu_capacity: %s cpu_capacity=%u (raw)\n",
+ cpu_node->full_name, raw_capacity[cpu]);
+ } else {
+ pr_err("cpu_capacity: missing %s raw capacity "
+ "(fallback to 1024 for all CPUs)\n",
+ cpu_node->full_name);
+ cap_parsing_failed = true;
+ kfree(raw_capacity);
+ }
+}
+
+static void __init normalize_cpu_capacity(void)
+{
+ u64 capacity;
+ int cpu;
+
+ if (cap_parsing_failed)
+ return;
+
+ pr_info("cpu_capacity: capacity_scale=%u\n", capacity_scale);
+ for_each_possible_cpu(cpu) {
+ capacity = (raw_capacity[cpu] << SCHED_CAPACITY_SHIFT)
+ / capacity_scale;
+ set_capacity_scale(cpu, capacity);
+ pr_info("cpu_capacity: CPU%d cpu_capacity=%lu\n",
+ cpu, arch_scale_cpu_capacity(NULL, cpu));
+ }
+
+ kfree(raw_capacity);
+}
+
static int __init get_cpu_for_node(struct device_node *node)
{
struct device_node *cpu_node;
@@ -34,6 +106,7 @@ static int __init get_cpu_for_node(struct device_node *node)

for_each_possible_cpu(cpu) {
if (of_get_cpu_node(cpu, NULL) == cpu_node) {
+ parse_cpu_capacity(cpu_node, cpu);
of_node_put(cpu_node);
return cpu;
}
@@ -185,6 +258,8 @@ static int __init parse_dt_topology(void)
if (ret != 0)
goto out_map;

+ normalize_cpu_capacity();
+
/*
* Check that all cores are in the topology; the SMP code will
* only mark cores described in the DT as possible.
--
2.7.0

2016-03-18 14:33:46

by Juri Lelli

Subject: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

ARM systems may be configured to have cpus with different power/performance
characteristics within the same chip. In this case, additional information
has to be made available to the kernel (the scheduler in particular) for it
to be aware of such differences and take decisions accordingly.

Therefore, this patch aims at standardizing cpu capacities device tree
bindings for ARM platforms. Bindings define cpu capacity parameter, to
allow operating systems to retrieve such information from the device tree
and initialize related kernel structures, paving the way for common code in
the kernel to deal with heterogeneity.

Cc: Rob Herring <[email protected]>
Cc: Pawel Moll <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Ian Campbell <[email protected]>
Cc: Kumar Gala <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: Gregory CLEMENT <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Thomas Petazzoni <[email protected]>
Cc: [email protected]
Signed-off-by: Juri Lelli <[email protected]>
---

Changes from v1:
- removed section regarding capacity-scale
- added information regarding normalization
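Section 2 of the binding below suggests estimating capacities by running a user-space benchmark on each CPU at maximum frequency and normalizing w.r.t. the best performer. Turning such scores into binding values could be done as in this sketch; scores_to_capacity() is a hypothetical helper, scores are assumed to be non-negative with higher meaning faster (e.g. events per second), and scaling the best CPU to 1024 follows Example 1 but is only a convention, since the kernel renormalizes anyway.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert per-CPU benchmark scores (non-negative,
 * higher is better) into capacity values where the best CPU gets 1024.
 * If no score is positive, every capacity is left at 0.
 */
static void scores_to_capacity(const double *score, uint32_t *cap, size_t n)
{
	double best = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		if (score[i] > best)
			best = score[i];

	/* Round to nearest; valid because all scores are non-negative. */
	for (i = 0; i < n; i++)
		cap[i] = best > 0.0 ?
			 (uint32_t)(score[i] * 1024.0 / best + 0.5) : 0;
}
```

For hypothetical scores of 1000, 500 and 250 this produces 1024, 512 and 256, which could then be written into the capacity property of the corresponding cpu nodes.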
---
.../devicetree/bindings/arm/cpu-capacity.txt | 222 +++++++++++++++++++++
Documentation/devicetree/bindings/arm/cpus.txt | 9 +
2 files changed, 231 insertions(+)
create mode 100644 Documentation/devicetree/bindings/arm/cpu-capacity.txt

diff --git a/Documentation/devicetree/bindings/arm/cpu-capacity.txt b/Documentation/devicetree/bindings/arm/cpu-capacity.txt
new file mode 100644
index 0000000..fdfc453
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/cpu-capacity.txt
@@ -0,0 +1,222 @@
+==========================================
+ARM CPUs capacity bindings
+==========================================
+
+==========================================
+1 - Introduction
+==========================================
+
+ARM systems may be configured to have cpus with different power/performance
+characteristics within the same chip. In this case, additional information
+has to be made available to the kernel (the scheduler in particular) for
+it to be aware of such differences and take decisions accordingly.
+
+==========================================
+2 - CPU capacity definition
+==========================================
+
+CPU capacity is a number that provides the scheduler information about CPUs
+heterogeneity. Such heterogeneity can come from micro-architectural differences
+(e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
+(e.g., SMP systems with multiple frequency domains). Heterogeneity in this
+context is about differing performance characteristics; this binding tries to
+capture a first-order approximation of the relative performance of CPUs.
+
+One simple way to estimate CPU capacities is to iteratively run a well-known
+CPU user space benchmark (e.g., sysbench) on each CPU at maximum frequency and
+then normalize values w.r.t. the best performing CPU. One can also do a
+statistically significant study of a wide collection of benchmarks, but pros
+of such an approach are not really evident at the time of writing.
+
+==========================================
+3 - capacity
+==========================================
+
+capacity is an optional cpu node [1] property: u32 value representing CPU
+capacity. Values are normalized w.r.t. the biggest capacity found while
+parsing the DT.
+
+The capacity property is all-or-nothing: if it is specified for a cpu node, it
+has to be specified for every other cpu node, or the system will fall back to
+the default capacity value for every CPU.
+
+===========================================
+4 - Examples
+===========================================
+
+Example 1 (ARM 64-bit, 6-cpu system, two clusters):
+capacities are scaled w.r.t. 1024 (cpu@0 and cpu@1)
+
+cpus {
+ #address-cells = <2>;
+ #size-cells = <0>;
+
+ cpu-map {
+ cluster0 {
+ core0 {
+ cpu = <&A57_0>;
+ };
+ core1 {
+ cpu = <&A57_1>;
+ };
+ };
+
+ cluster1 {
+ core0 {
+ cpu = <&A53_0>;
+ };
+ core1 {
+ cpu = <&A53_1>;
+ };
+ core2 {
+ cpu = <&A53_2>;
+ };
+ core3 {
+ cpu = <&A53_3>;
+ };
+ };
+ };
+
+ idle-states {
+ entry-method = "arm,psci";
+
+ CPU_SLEEP_0: cpu-sleep-0 {
+ compatible = "arm,idle-state";
+ arm,psci-suspend-param = <0x0010000>;
+ local-timer-stop;
+ entry-latency-us = <100>;
+ exit-latency-us = <250>;
+ min-residency-us = <150>;
+ };
+
+ CLUSTER_SLEEP_0: cluster-sleep-0 {
+ compatible = "arm,idle-state";
+ arm,psci-suspend-param = <0x1010000>;
+ local-timer-stop;
+ entry-latency-us = <800>;
+ exit-latency-us = <700>;
+ min-residency-us = <2500>;
+ };
+ };
+
+ A57_0: cpu@0 {
+ compatible = "arm,cortex-a57","arm,armv8";
+ reg = <0x0 0x0>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A57_L2>;
+ clocks = <&scpi_dvfs 0>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <1024>;
+ };
+
+ A57_1: cpu@1 {
+ compatible = "arm,cortex-a57","arm,armv8";
+ reg = <0x0 0x1>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A57_L2>;
+ clocks = <&scpi_dvfs 0>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <1024>;
+ };
+
+ A53_0: cpu@100 {
+ compatible = "arm,cortex-a53","arm,armv8";
+ reg = <0x0 0x100>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A53_L2>;
+ clocks = <&scpi_dvfs 1>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <447>;
+ };
+
+ A53_1: cpu@101 {
+ compatible = "arm,cortex-a53","arm,armv8";
+ reg = <0x0 0x101>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A53_L2>;
+ clocks = <&scpi_dvfs 1>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <447>;
+ };
+
+ A53_2: cpu@102 {
+ compatible = "arm,cortex-a53","arm,armv8";
+ reg = <0x0 0x102>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A53_L2>;
+ clocks = <&scpi_dvfs 1>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <447>;
+ };
+
+ A53_3: cpu@103 {
+ compatible = "arm,cortex-a53","arm,armv8";
+ reg = <0x0 0x103>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A53_L2>;
+ clocks = <&scpi_dvfs 1>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ capacity = <447>;
+ };
+
+ A57_L2: l2-cache0 {
+ compatible = "cache";
+ };
+
+ A53_L2: l2-cache1 {
+ compatible = "cache";
+ };
+};
+
+Example 2 (ARM 32-bit, 4-cpu system, two clusters,
+ cpus 0,1@1GHz, cpus 2,3@500MHz):
+capacities are scaled w.r.t. 2 (cpu@0 and cpu@1); this means that the first
+cluster is twice as fast as the second cluster (i.e., cpu@0 and cpu@1 might be
+running at twice the clock frequency of cpu@2 and cpu@3)
+
+cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ cpu0: cpu@0 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a15";
+ reg = <0>;
+ capacity = <2>;
+ };
+
+ cpu1: cpu@1 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a15";
+ reg = <1>;
+ capacity = <2>;
+ };
+
+ cpu2: cpu@2 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a15";
+ reg = <0x100>;
+ capacity = <1>;
+ };
+
+ cpu3: cpu@3 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a15";
+ reg = <0x101>;
+ capacity = <1>;
+ };
+};
+
+===========================================
+5 - References
+===========================================
+
+[1] ARM Linux Kernel documentation - CPUs bindings
+ Documentation/devicetree/bindings/arm/cpus.txt
diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
index ae9be07..efd6151 100644
--- a/Documentation/devicetree/bindings/arm/cpus.txt
+++ b/Documentation/devicetree/bindings/arm/cpus.txt
@@ -237,6 +237,13 @@ nodes to be present and contain the properties described below.
# List of phandles to idle state nodes supported
by this cpu [3].

+ - capacity
+ Usage: Optional
+ Value type: <u32>
+ Definition:
+ # u32 value representing CPU capacity [4], relative to
+ highest capacity in the system.
+
- rockchip,pmu
Usage: optional for systems that have an "enable-method"
property value of "rockchip,rk3066-smp"
@@ -460,3 +467,5 @@ cpus {
[2] arm/msm/qcom,kpss-acc.txt
[3] ARM Linux kernel documentation - idle states bindings
Documentation/devicetree/bindings/arm/idle-states.txt
+[4] ARM Linux kernel documentation - cpu capacity bindings
+ Documentation/devicetree/bindings/arm/cpu-capacity.txt
--
2.7.0

2016-03-18 17:49:57

by Sai Gurrappadi

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

Hi Juri,

On 03/18/2016 07:24 AM, Juri Lelli wrote:

<snip>

> +
> +==========================================
> +2 - CPU capacity definition
> +==========================================
> +
> +CPU capacity is a number that provides the scheduler information about CPUs
> +heterogeneity. Such heterogeneity can come from micro-architectural differences
> +(e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
> +(e.g., SMP systems with multiple frequency domains). Heterogeneity in this
> +context is about differing performance characteristics; this binding tries to
> +capture a first-order approximation of the relative performance of CPUs.

Any reason why this capacity number is not dynamically generated based on the
max frequency for each CPU? The DT property would then instead specify just
the micro-architectural differences between the CPU types.

-Sai

2016-03-20 01:15:30

by Rob Herring (Arm)

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

On Fri, Mar 18, 2016 at 02:24:08PM +0000, Juri Lelli wrote:
> ARM systems may be configured to have cpus with different power/performance
> characteristics within the same chip. In this case, additional information
> has to be made available to the kernel (the scheduler in particular) for it
> to be aware of such differences and take decisions accordingly.
>
> Therefore, this patch aims at standardizing cpu capacities device tree
> bindings for ARM platforms. Bindings define cpu capacity parameter, to
> allow operating systems to retrieve such information from the device tree
> and initialize related kernel structures, paving the way for common code in
> the kernel to deal with heterogeneity.
>
> Cc: Rob Herring <[email protected]>
> Cc: Pawel Moll <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Ian Campbell <[email protected]>
> Cc: Kumar Gala <[email protected]>
> Cc: Maxime Ripard <[email protected]>
> Cc: Olof Johansson <[email protected]>
> Cc: Gregory CLEMENT <[email protected]>
> Cc: Paul Walmsley <[email protected]>
> Cc: Linus Walleij <[email protected]>
> Cc: Chen-Yu Tsai <[email protected]>
> Cc: Thomas Petazzoni <[email protected]>
> Cc: [email protected]
> Signed-off-by: Juri Lelli <[email protected]>
> ---
>
> Changes from v1:
> - removed section regarding capacity-scale
> - added information regarding normalization
> ---
> .../devicetree/bindings/arm/cpu-capacity.txt | 222 +++++++++++++++++++++
> Documentation/devicetree/bindings/arm/cpus.txt | 9 +
> 2 files changed, 231 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/arm/cpu-capacity.txt
>
> diff --git a/Documentation/devicetree/bindings/arm/cpu-capacity.txt b/Documentation/devicetree/bindings/arm/cpu-capacity.txt
> new file mode 100644
> index 0000000..fdfc453
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/cpu-capacity.txt
> @@ -0,0 +1,222 @@
> +==========================================
> +ARM CPUs capacity bindings
> +==========================================
> +
> +==========================================
> +1 - Introduction
> +==========================================
> +
> +ARM systems may be configured to have cpus with different power/performance
> +characteristics within the same chip. In this case, additional information
> +has to be made available to the kernel (the scheduler in particular) for
> +it to be aware of such differences and take decisions accordingly.
> +
> +==========================================
> +2 - CPU capacity definition
> +==========================================
> +
> +CPU capacity is a number that provides the scheduler information about CPUs
> +heterogeneity. Such heterogeneity can come from micro-architectural differences
> +(e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
> +(e.g., SMP systems with multiple frequency domains). Heterogeneity in this
> +context is about differing performance characteristics; this binding tries to
> +capture a first-order approximation of the relative performance of CPUs.
> +
> +One simple way to estimate CPU capacities is to iteratively run a well-known
> +CPU user space benchmark (e.g., sysbench) on each CPU at maximum frequency and
> +then normalize values w.r.t. the best performing CPU. One can also do a
> +statistically significant study of a wide collection of benchmarks, but pros
> +of such an approach are not really evident at the time of writing.

I'll say again what I said previously. I don't have a problem with this
being in DT, but I want to see a defined method for determining the
value. The above is a pretty vague statement. That could be "run X to
generate the value on the CPU", or ARM providing the "golden" value for
each core. As you said, it is only a first-order approximation, so
vendor-to-vendor implementation variations should not matter.

I also worry about what happens in more complex cases with lots of
possible OPPs such as Qualcomm chips. This single value may not be
sufficient.

Rob

2016-03-21 10:52:21

by Juri Lelli

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

Hi Sai,

On 18/03/16 10:49, Sai Gurrappadi wrote:
> Hi Juri,
>
> On 03/18/2016 07:24 AM, Juri Lelli wrote:
>
> <snip>
>
> > +
> > +==========================================
> > +2 - CPU capacity definition
> > +==========================================
> > +
> > +CPU capacity is a number that provides the scheduler information about CPUs
> > +heterogeneity. Such heterogeneity can come from micro-architectural differences
> > +(e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
> > +(e.g., SMP systems with multiple frequency domains). Heterogeneity in this
> > +context is about differing performance characteristics; this binding tries to
> > +capture a first-order approximation of the relative performance of CPUs.
>
> Any reason why this capacity number is not dynamically generated based on the
> max frequency for each CPU? The DT property would then instead specify just
> the micro-architectural differences between the CPU types.
>

I'm not sure I clearly understand your question, so I'll try to
reiterate it.

Are you asking why we don't dynamically profile the system, at boot for
example, to get this number? Or do you ask why this number couldn't be
only describing micro-arch differences (so, if I get it right, we should
then multiply it by max freq to get the capacity of a CPU)?

We already played with the first option (please refer to v2 and v3), but
we ended up agreeing that dynamic profiling adds overhead to the boot
process (while a DT approach can provide information to speed up boot)
and it is in general not repeatable/reliable (as numbers can vary from
boot to boot for different reasons).

The second option I think can be feasible, but I'm not sure what we gain
in practice. We will still need to specify a per-platform number, right?

Best,

- Juri

2016-03-21 11:10:01

by Vincent Guittot

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

On 21 March 2016 at 11:53, Juri Lelli <[email protected]> wrote:
> Hi Sai,
>
> On 18/03/16 10:49, Sai Gurrappadi wrote:
>> Hi Juri,
>>
>> On 03/18/2016 07:24 AM, Juri Lelli wrote:
>>
>> <snip>
>>
>> > +
>> > +==========================================
>> > +2 - CPU capacity definition
>> > +==========================================
>> > +
>> > +CPU capacity is a number that provides the scheduler information about CPUs
>> > +heterogeneity. Such heterogeneity can come from micro-architectural differences
>> > +(e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
>> > +(e.g., SMP systems with multiple frequency domains). Heterogeneity in this
>> > +context is about differing performance characteristics; this binding tries to
>> > +capture a first-order approximation of the relative performance of CPUs.
>>
>> Any reason why this capacity number is not dynamically generated based on the
>> max frequency for each CPU? The DT property would then instead specify just
>> the micro-architectural differences between the CPU types.
>>
>
> I'm not sure I clearly understand your question, so I'll try to
> reiterate it.
>
> Are you asking why we don't dynamically profile the system, at boot for
> example, to get this number? Or do you ask why this number couldn't be
> only describing micro-arch differences (so, if I get it right, we should
> then multiply it by max freq to get the capacity of a CPU)?
>
> We already played with the first option (please refer to v2 and v3), but
> we ended up agreeing that dynamic profiling adds overhead to the boot
> process (while a DT approach can provide information to speed up boot)
> and it is in general not repeatable/reliable (as numbers can vary from
> boot to boot for different reasons).
>
> The second option I think can be feasible, but I'm not sure what we gain
> in practice. We will still need to specify a per-platform number, right?

So could we use a DT binding like dhrystone = <xyz> (with a unit like
DMIPS/MHz), which could then be combined with the OPP table to give a
first-order approximation of each CPU's capacity?

>
> Best,
>
> - Juri

2016-03-21 11:38:29

by Juri Lelli

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

On 19/03/16 20:15, Rob Herring wrote:
> On Fri, Mar 18, 2016 at 02:24:08PM +0000, Juri Lelli wrote:
> > ARM systems may be configured to have cpus with different power/performance
> > characteristics within the same chip. In this case, additional information
> > has to be made available to the kernel (the scheduler in particular) for it
> > to be aware of such differences and take decisions accordingly.
> >
> > Therefore, this patch aims at standardizing cpu capacities device tree
> > bindings for ARM platforms. Bindings define cpu capacity parameter, to
> > allow operating systems to retrieve such information from the device tree
> > and initialize related kernel structures, paving the way for common code in
> > the kernel to deal with heterogeneity.
> >
> > Cc: Rob Herring <[email protected]>
> > Cc: Pawel Moll <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Cc: Ian Campbell <[email protected]>
> > Cc: Kumar Gala <[email protected]>
> > Cc: Maxime Ripard <[email protected]>
> > Cc: Olof Johansson <[email protected]>
> > Cc: Gregory CLEMENT <[email protected]>
> > Cc: Paul Walmsley <[email protected]>
> > Cc: Linus Walleij <[email protected]>
> > Cc: Chen-Yu Tsai <[email protected]>
> > Cc: Thomas Petazzoni <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Juri Lelli <[email protected]>
> > ---
> >
> > Changes from v1:
> > - removed section regarding capacity-scale
> > - added information regarding normalization
> > ---
> > .../devicetree/bindings/arm/cpu-capacity.txt | 222 +++++++++++++++++++++
> > Documentation/devicetree/bindings/arm/cpus.txt | 9 +
> > 2 files changed, 231 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/arm/cpu-capacity.txt
> >
> > diff --git a/Documentation/devicetree/bindings/arm/cpu-capacity.txt b/Documentation/devicetree/bindings/arm/cpu-capacity.txt
> > new file mode 100644
> > index 0000000..fdfc453
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/arm/cpu-capacity.txt
> > @@ -0,0 +1,222 @@
> > +==========================================
> > +ARM CPUs capacity bindings
> > +==========================================
> > +
> > +==========================================
> > +1 - Introduction
> > +==========================================
> > +
> > +ARM systems may be configured to have cpus with different power/performance
> > +characteristics within the same chip. In this case, additional information
> > +has to be made available to the kernel (the scheduler in particular) for
> > +it to be aware of such differences and take decisions accordingly.
> > +
> > +==========================================
> > +2 - CPU capacity definition
> > +==========================================
> > +
> > +CPU capacity is a number that provides the scheduler information about CPUs
> > +heterogeneity. Such heterogeneity can come from micro-architectural differences
> > +(e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
> > +(e.g., SMP systems with multiple frequency domains). Heterogeneity in this
> > +context is about differing performance characteristics; this binding tries to
> > +capture a first-order approximation of the relative performance of CPUs.
> > +
> > +One simple way to estimate CPU capacities is to iteratively run a well-known
> > +CPU user space benchmark (e.g., sysbench) on each CPU at maximum frequency and
> > +then normalize values w.r.t. the best performing CPU. One can also do a
> > +statistically significant study of a wide collection of benchmarks, but pros
> > +of such an approach are not really evident at the time of writing.
>
> I'll say again what I said previously. I don't have a problem with this
> being in DT, but I want to see a defined method for determining the
> value. The above is a pretty vague statement. That could be "run X to
> generate the value on the CPU", or ARM providing the "golden" value for
> each core. As you said, it is only a first-order approximation, so
> vendor-to-vendor implementation variations should not matter.
>

OK, sorry if I didn't get it. :-)

What we usually do to come up with these numbers for a new platform is
really something as simple as:

- set every CPU to the performance governor
- run the following on the first CPU of each cluster
# taskset '<CPUmask>' sysbench --test=cpu --num-threads=1 --max-time=10 \
run | grep "events:" | awk '{print $5}'
- normalize the numbers w.r.t. the highest value obtained by running the above

I'm not sure we can put something like this in the definition above, but
I won't raise any objections if we actually can. :-)

I don't think the "golden" value solution is feasible. Different
implementations of the same CPU, and different cache configurations
etc., will end up giving different numbers. These values have to be a
per-platform thing, IMHO. Also, since it is a per-platform, relative number,
it will be "confined" to a certain platform only (comparing capacities
across different DTs has no meaning).

> I also worry about what happens in more complex cases with lots of
> possible OPPs such as Qualcomm chips. This single value may not be
> sufficient.
>

Having many OPPs is not a problem. This value only describes
micro-architectural differences, and it is used to obtain the CPU scale
invariance component. We then have a frequency-invariant component to handle
clock frequency differences (there is also an ongoing discussion about this
[1]). The capacity values are to be obtained while running at max frequency.

Thanks,

- Juri

[1] https://lkml.org/lkml/2016/3/14/64

2016-03-21 11:48:46

by Juri Lelli

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

Hi Vincent,

On 21/03/16 12:09, Vincent Guittot wrote:
> On 21 March 2016 at 11:53, Juri Lelli <[email protected]> wrote:
> > Hi Sai,
> >
> > On 18/03/16 10:49, Sai Gurrappadi wrote:
> >> Hi Juri,
> >>
> >> On 03/18/2016 07:24 AM, Juri Lelli wrote:
> >>
> >> <snip>
> >>
> >> > +
> >> > +==========================================
> >> > +2 - CPU capacity definition
> >> > +==========================================
> >> > +
> >> > +CPU capacity is a number that provides the scheduler information about CPUs
> >> > +heterogeneity. Such heterogeneity can come from micro-architectural differences
> >> > +(e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
> >> > +(e.g., SMP systems with multiple frequency domains). Heterogeneity in this
> >> > +context is about differing performance characteristics; this binding tries to
> >> > +capture a first-order approximation of the relative performance of CPUs.
> >>
> >> Any reason why this capacity number is not dynamically generated based on the
> >> max frequency for each CPU? The DT property would then instead specify just
> >> the micro-architectural differences between the CPU types.
> >>
> >
> > I'm not sure I clearly understand your question, so I'll try to
> > reiterate it.
> >
> > Are you asking why we don't dynamically profile the system, at boot for
> > example, to get this number? Or do you ask why this number couldn't be
> > only describing micro-arch differences (so, if I get it right, we should
> > then multiply it by max freq to get the capacity of a CPU)?
> >
> > We already played with the first option (please refer to v2 and v3), but
> > we ended up agreeing that dynamic profiling adds overhead to the boot
> > process (while a DT approach can provide information to speed up boot)
> > and it is in general not repeatable/reliable (as numbers can vary from
> > boot to boot for different reasons).
> >
> > The second option I think can be feasible, but I'm not sure what we gain
> > in practice. We will still need to specify a per-platform number, right?
>
> So could we use a DT binding like dhrystone = <xyz> (with a unit like
> DMIPS/MHz), which could then be combined with the OPP table to give a
> first-order approximation of each CPU's capacity?
>

But we'll still need to normalize this w.r.t. the highest score we get on
a specific platform, right? And since we are normalizing it anyway, it is
probably simpler if we keep the frequency component as part of the
number, IMHO. But maybe keeping the frequency component separate is
more acceptable from a DT binding perspective?

Thanks,

- Juri

2016-03-21 12:12:34

by Mark Brown

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

On Mon, Mar 21, 2016 at 11:49:56AM +0000, Juri Lelli wrote:

> But we'll still need to normalize this w.r.t. the highest score we get on
> a specific platform, right? And since we are normalizing it anyway, it is
> probably simpler if we keep the frequency component as part of the
> number, IMHO. But maybe keeping the frequency component separate is
> more acceptable from a DT binding perspective?

One possible issue with that: if we keep the frequency number as part of
the core number then that might cause issues for devices with variants
or system deployment decisions that remove some OPPs from a table. If
the top OPP gets removed that would throw off the numbers.


2016-03-21 17:23:39

by Juri Lelli

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

Hi Mark,

On 21/03/16 12:12, Mark Brown wrote:
> On Mon, Mar 21, 2016 at 11:49:56AM +0000, Juri Lelli wrote:
>
> > But we'll still need to normalize this w.r.t. the highest score we get on
> > a specific platform, right? And since we are normalizing it anyway, it is
> > probably simpler if we keep the frequency component as part of the
> > number, IMHO. But maybe keeping the frequency component separate is
> > more acceptable from a DT binding perspective?
>
> One possible issue with that: if we keep the frequency number as part of
> the core number then that might cause issues for devices with variants
> or system deployment decisions that remove some OPPs from a table. If
> the top OPP gets removed that would throw off the numbers.

If we want to remove the frequency number from the capacity values I
think what we could do is:

- agree on a benchmark (it seems to me this is what Rob is also asking
for) (e.g., sysbench); this is maybe optional, as what below should
work for any kind of benchmark for which events/operations performed
can be measured
- for that benchmark measure the number of operations performed in a
second (e.g., sysbench number of events per second or SEPS)
- divide the number by the frequency we did the profiling at (e.g.,
SEPS/MHz * 1024, to end up with an integer number)
- normalize values and put them in DT (IMHO, we don't want absolute
values there)

To compute the capacities at boot we then have to:

- multiply the value parsed from DT by the max frequency (e.g.,
SEPS/MHz * max_freq)
- normalize capacities obtained with the step above w.r.t. the max
capacity of the system

I think this should work, but we have to understand how we obtain the
max frequency of each cluster while parsing DT. OPP bindings are
helpful, but AFAIK there are platforms for which firmware is responsible
for setting up and advertising available OPPs. I'm not sure if this
happens later on during the boot process. We might still be able to use
the clock-frequency property in this case, but that might need changing
again if the top OPP gets removed/changed.

OTOH, we might simply want to say that capacity values are to be obtained
once the platform is "stable" (no additional changes to configuration,
OPPs, etc.). But maybe this is not acceptable?

Also, I fear that for variants of a particular implementation we will
still have to redo the profiling anyway (like we already did for Juno
and Juno-r2 for example).

Thanks,

- Juri

2016-03-21 17:51:34

by Mark Brown

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

On Mon, Mar 21, 2016 at 05:24:52PM +0000, Juri Lelli wrote:

> I think this should work, but we have to understand how we obtain the
> max frequency of each cluster while parsing DT. OPP bindings are
> helpful, but AFAIK there are platforms for which firmware is responsible
> for setting up and advertising available OPPs. I'm not sure if this
> happens later on during the boot process. We might still be able to use
> the clock-frequency property in this case, but that might need changing
> again if the top OPP gets removed/changed.

> OTOH, we might simply want to say that capacity values are to be obtained
> once the platform is "stable" (no additional changes to configuration,
> OPPs, etc.). But maybe this is not acceptable?

How about we just punt and let the cpufreq driver tell us - it can parse
DT, use built in tables or whatever? We could even remember the raw
values and recalculate if it ever decides to change for some reason.
Until cpufreq comes up we'll be stuck at whatever OPP that we're at on
startup which may not match whatever we define the numbers relative to
anyway.

> Also, I fear that for variants of a particular implementation we will
> still have to redo the profiling anyway (like we already did for Juno
> and Juno-r2 for example).

This sometimes happens either through binning or through board design
decisions rather than through new silicon so we might be able to reuse.


2016-03-21 19:25:32

by Sai Gurrappadi

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings



On 03/21/2016 03:53 AM, Juri Lelli wrote:
> Hi Sai,
>
> On 18/03/16 10:49, Sai Gurrappadi wrote:
>> Hi Juri,
>>
>> On 03/18/2016 07:24 AM, Juri Lelli wrote:
>>
>> <snip>
>>
>>> +
>>> +==========================================
>>> +2 - CPU capacity definition
>>> +==========================================
>>> +
>>> +CPU capacity is a number that provides the scheduler information about CPUs
>>> +heterogeneity. Such heterogeneity can come from micro-architectural differences
>>> +(e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
>>> +(e.g., SMP systems with multiple frequency domains). Heterogeneity in this
>>> +context is about differing performance characteristics; this binding tries to
>>> +capture a first-order approximation of the relative performance of CPUs.
>>
>> Any reason why this capacity number is not dynamically generated based on the
>> max frequency for each CPU? The DT property would then instead specify just
>> the micro-architectural differences between the CPU types.
>>
>
> I'm not sure I clearly understand your question, so I'll try to
> reiterate it.
>
> Are you asking why we don't dynamically profile the system, at boot for
> example, to get this number? Or do you ask why this number couldn't be
> only describing micro-arch differences (so, if I get it right, we should
> then multiply it by max freq to get the capacity of a CPU)?
>
> We already played with the first option (please refer to v2 and v3), but
> we ended up agreeing that dynamic profiling adds overhead to the boot
> process (while a DT approach can provide information to speed up boot)
> and it is in general not repeatable/reliable (as numbers can vary from
> boot to boot for different reasons).
>
> The second option I think can be feasible, but I'm not sure what we gain
> in practice. We will still need to specify a per-platform number, right?

I meant the second bit. We only need some per-platform fudge factor for
micro-architectural differences. Tying in the Fmax statically like this means
that we need to manually keep the DVFS tables and this number in sync, which
seems unnecessary given that the kernel already has this info at boot.

>
> Best,
>
> - Juri
>

2016-03-22 09:49:16

by Juri Lelli

Subject: Re: [PATCH v4 2/8] Documentation: arm: define DT cpu capacity bindings

On 21/03/16 17:51, Mark Brown wrote:
> On Mon, Mar 21, 2016 at 05:24:52PM +0000, Juri Lelli wrote:
>
> > I think this should work, but we have to understand how we obtain the
> > max frequency of each cluster while parsing DT. OPP bindings are
> > helpful, but AFAIK there are platforms for which firmware is responsible
> > for setting up and advertising available OPPs. I'm not sure if this
> > happens later on during the boot process. We might still be able to use
> > the clock-frequency property in this case, but that might need changing
> > again if the top OPP gets removed/changed.
>
> > OTOH, we might simply want to say that capacity values are to be obtained
> > once the platform is "stable" (no additional changes to configuration,
> > OPPs, etc.). But maybe this is not acceptable?
>
> How about we just punt and let the cpufreq driver tell us - it can parse
> DT, use built in tables or whatever? We could even remember the raw
> values and recalculate if it ever decides to change for some reason.
> Until cpufreq comes up we'll be stuck at whatever OPP that we're at on
> startup which may not match whatever we define the numbers relative to
> anyway.
>

OK, I'll try and see how that can be done.

Thanks,

- Juri

> > Also, I fear that for variants of a particular implementation we will
> > still have to redo the profiling anyway (like we already did for Juno
> > and Juno-r2 for example).
>
> This sometimes happens either through binning or through board design
> decisions rather than through new silicon so we might be able to reuse.