LinuxLists.cc - [RFC 0/3] Unify CPU topology across ARM64 & RISC-V

2018-11-09 01:51:35

Subject: [RFC 0/3] Unify CPU topology across ARM64 & RISC-V

The cpu-map DT entry in ARM64 can describe the CPU topology in
much better way compared to other existing approaches. RISC-V can
easily adopt this binding to represent it's own CPU topology.
Thus, both cpu-map DT binding and topology parsing code can be
moved to a common location so that RISC-V or any other
architecture can leverage that.

The relevant discussion regarding unifying cpu topology can be
found in [1].

arch_topology seems to be a perfect place to move the common
code. I have not introduced any functional changes in the moved
to code. The only downside in this approach is that the capacity
code will be executed for RISC-V as well. But, it will exit
immediately after not able to find the appropriate DT node. If
the overhead is considered too much, we can always compile out
capacity related functions under a different config for the
architectures that do not support them.

The patches have been tested for RISC-V and compile tested for
ARM64.

The socket changes[2] can be merged on top of this series or vice
versa.

[1] https://lkml.org/lkml/2018/11/6/19
[2] https://lkml.org/lkml/2018/11/7/918

Atish Patra (3):
dt-binding: cpu-topology: Move cpu-map to a common binding.
cpu-topology: Move cpu topology code to common code.
RISC-V: Parse cpu topology during boot.

Documentation/devicetree/bindings/arm/topology.txt | 475 -------------------
.../devicetree/bindings/cpu/cpu-topology.txt | 526 +++++++++++++++++++++
arch/arm64/include/asm/topology.h | 23 +-
arch/arm64/kernel/topology.c | 305 +-----------
arch/riscv/Kconfig | 1 +
arch/riscv/kernel/smpboot.c | 6 +-
drivers/base/arch_topology.c | 303 ++++++++++++
include/linux/arch_topology.h | 23 +
include/linux/topology.h | 1 +
9 files changed, 864 insertions(+), 799 deletions(-)
delete mode 100644 Documentation/devicetree/bindings/arm/topology.txt
create mode 100644 Documentation/devicetree/bindings/cpu/cpu-topology.txt

--
2.7.4

2018-11-09 01:52:10

by Atish Patra

[permalink] [raw]

Subject: [RFC 2/3] cpu-topology: Move cpu topology code to common code.

Both RISC-V & ARM64 are using cpu-map device tree to describe
their cpu topology. It's better to move the relevant code to
a common place instead of duplicate code.
No functional changes done.

Signed-off-by: Atish Patra <[email protected]>
---
arch/arm64/include/asm/topology.h | 23 +--
arch/arm64/kernel/topology.c | 305 +-------------------------------------
drivers/base/arch_topology.c | 303 +++++++++++++++++++++++++++++++++++++
include/linux/arch_topology.h | 23 +++
include/linux/topology.h | 1 +
5 files changed, 332 insertions(+), 323 deletions(-)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 0524f243..5371a8f7 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -4,29 +4,8 @@

#include <linux/cpumask.h>

-struct cpu_topology {
- int thread_id;
- int core_id;
- int package_id;
- int llc_id;
- cpumask_t thread_sibling;
- cpumask_t core_sibling;
- cpumask_t llc_sibling;
-};
-
-extern struct cpu_topology cpu_topology[NR_CPUS];
-
-#define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id)
-#define topology_core_id(cpu) (cpu_topology[cpu].core_id)
-#define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling)
-#define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
-#define topology_llc_cpumask(cpu) (&cpu_topology[cpu].llc_sibling)
-
-void init_cpu_topology(void);
void store_cpu_topology(unsigned int cpuid);
-void remove_cpu_topology(unsigned int cpuid);
-const struct cpumask *cpu_coregroup_mask(int cpu);
-
+int parse_cpu_topology(void);
#ifdef CONFIG_NUMA

struct pci_bus;
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 0825c4a8..f28a710d 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -14,250 +14,13 @@
#include <linux/acpi.h>
#include <linux/arch_topology.h>
#include <linux/cacheinfo.h>
-#include <linux/cpu.h>
-#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/percpu.h>
-#include <linux/node.h>
-#include <linux/nodemask.h>
-#include <linux/of.h>
-#include <linux/sched.h>
-#include <linux/sched/topology.h>
-#include <linux/slab.h>
-#include <linux/smp.h>
-#include <linux/string.h>

#include <asm/cpu.h>
#include <asm/cputype.h>
#include <asm/topology.h>

-static int __init get_cpu_for_node(struct device_node *node)
-{
- struct device_node *cpu_node;
- int cpu;
-
- cpu_node = of_parse_phandle(node, "cpu", 0);
- if (!cpu_node)
- return -1;
-
- cpu = of_cpu_node_to_id(cpu_node);
- if (cpu >= 0)
- topology_parse_cpu_capacity(cpu_node, cpu);
- else
- pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
-
- of_node_put(cpu_node);
- return cpu;
-}
-
-static int __init parse_core(struct device_node *core, int package_id,
- int core_id)
-{
- char name[10];
- bool leaf = true;
- int i = 0;
- int cpu;
- struct device_node *t;
-
- do {
- snprintf(name, sizeof(name), "thread%d", i);
- t = of_get_child_by_name(core, name);
- if (t) {
- leaf = false;
- cpu = get_cpu_for_node(t);
- if (cpu >= 0) {
- cpu_topology[cpu].package_id = package_id;
- cpu_topology[cpu].core_id = core_id;
- cpu_topology[cpu].thread_id = i;
- } else {
- pr_err("%pOF: Can't get CPU for thread\n",
- t);
- of_node_put(t);
- return -EINVAL;
- }
- of_node_put(t);
- }
- i++;
- } while (t);
-
- cpu = get_cpu_for_node(core);
- if (cpu >= 0) {
- if (!leaf) {
- pr_err("%pOF: Core has both threads and CPU\n",
- core);
- return -EINVAL;
- }
-
- cpu_topology[cpu].package_id = package_id;
- cpu_topology[cpu].core_id = core_id;
- } else if (leaf) {
- pr_err("%pOF: Can't get CPU for leaf core\n", core);
- return -EINVAL;
- }
-
- return 0;
-}
-
-static int __init parse_cluster(struct device_node *cluster, int depth)
-{
- char name[10];
- bool leaf = true;
- bool has_cores = false;
- struct device_node *c;
- static int package_id __initdata;
- int core_id = 0;
- int i, ret;
-
- /*
- * First check for child clusters; we currently ignore any
- * information about the nesting of clusters and present the
- * scheduler with a flat list of them.
- */
- i = 0;
- do {
- snprintf(name, sizeof(name), "cluster%d", i);
- c = of_get_child_by_name(cluster, name);
- if (c) {
- leaf = false;
- ret = parse_cluster(c, depth + 1);
- of_node_put(c);
- if (ret != 0)
- return ret;
- }
- i++;
- } while (c);
-
- /* Now check for cores */
- i = 0;
- do {
- snprintf(name, sizeof(name), "core%d", i);
- c = of_get_child_by_name(cluster, name);
- if (c) {
- has_cores = true;
-
- if (depth == 0) {
- pr_err("%pOF: cpu-map children should be clusters\n",
- c);
- of_node_put(c);
- return -EINVAL;
- }
-
- if (leaf) {
- ret = parse_core(c, package_id, core_id++);
- } else {
- pr_err("%pOF: Non-leaf cluster with core %s\n",
- cluster, name);
- ret = -EINVAL;
- }
-
- of_node_put(c);
- if (ret != 0)
- return ret;
- }
- i++;
- } while (c);
-
- if (leaf && !has_cores)
- pr_warn("%pOF: empty cluster\n", cluster);
-
- if (leaf)
- package_id++;
-
- return 0;
-}
-
-static int __init parse_dt_topology(void)
-{
- struct device_node *cn, *map;
- int ret = 0;
- int cpu;
-
- cn = of_find_node_by_path("/cpus");
- if (!cn) {
- pr_err("No CPU information found in DT\n");
- return 0;
- }
-
- /*
- * When topology is provided cpu-map is essentially a root
- * cluster with restricted subnodes.
- */
- map = of_get_child_by_name(cn, "cpu-map");
- if (!map)
- goto out;
-
- ret = parse_cluster(map, 0);
- if (ret != 0)
- goto out_map;
-
- topology_normalize_cpu_scale();
-
- /*
- * Check that all cores are in the topology; the SMP code will
- * only mark cores described in the DT as possible.
- */
- for_each_possible_cpu(cpu)
- if (cpu_topology[cpu].package_id == -1)
- ret = -EINVAL;
-
-out_map:
- of_node_put(map);
-out:
- of_node_put(cn);
- return ret;
-}
-
-/*
- * cpu topology table
- */
-struct cpu_topology cpu_topology[NR_CPUS];
-EXPORT_SYMBOL_GPL(cpu_topology);
-
-const struct cpumask *cpu_coregroup_mask(int cpu)
-{
- const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
-
- /* Find the smaller of NUMA, core or LLC siblings */
- if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
- /* not numa in package, lets use the package siblings */
- core_mask = &cpu_topology[cpu].core_sibling;
- }
- if (cpu_topology[cpu].llc_id != -1) {
- if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
- core_mask = &cpu_topology[cpu].llc_sibling;
- }
-
- return core_mask;
-}
-
-static void update_siblings_masks(unsigned int cpuid)
-{
- struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
- int cpu;
-
- /* update core and thread sibling masks */
- for_each_online_cpu(cpu) {
- cpu_topo = &cpu_topology[cpu];
-
- if (cpuid_topo->llc_id == cpu_topo->llc_id) {
- cpumask_set_cpu(cpu, &cpuid_topo->llc_sibling);
- cpumask_set_cpu(cpuid, &cpu_topo->llc_sibling);
- }
-
- if (cpuid_topo->package_id != cpu_topo->package_id)
- continue;
-
- cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
- cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
-
- if (cpuid_topo->core_id != cpu_topo->core_id)
- continue;
-
- cpumask_set_cpu(cpuid, &cpu_topo->thread_sibling);
- cpumask_set_cpu(cpu, &cpuid_topo->thread_sibling);
- }
-}
-
void store_cpu_topology(unsigned int cpuid)
{
struct cpu_topology *cpuid_topo = &cpu_topology[cpuid];
@@ -296,59 +59,18 @@ void store_cpu_topology(unsigned int cpuid)
update_siblings_masks(cpuid);
}

-static void clear_cpu_topology(int cpu)
-{
- struct cpu_topology *cpu_topo = &cpu_topology[cpu];
-
- cpumask_clear(&cpu_topo->llc_sibling);
- cpumask_set_cpu(cpu, &cpu_topo->llc_sibling);
-
- cpumask_clear(&cpu_topo->core_sibling);
- cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
- cpumask_clear(&cpu_topo->thread_sibling);
- cpumask_set_cpu(cpu, &cpu_topo->thread_sibling);
-}
-
-static void __init reset_cpu_topology(void)
-{
- unsigned int cpu;
-
- for_each_possible_cpu(cpu) {
- struct cpu_topology *cpu_topo = &cpu_topology[cpu];
-
- cpu_topo->thread_id = -1;
- cpu_topo->core_id = 0;
- cpu_topo->package_id = -1;
- cpu_topo->llc_id = -1;
-
- clear_cpu_topology(cpu);
- }
-}
-
-void remove_cpu_topology(unsigned int cpu)
-{
- int sibling;
-
- for_each_cpu(sibling, topology_core_cpumask(cpu))
- cpumask_clear_cpu(cpu, topology_core_cpumask(sibling));
- for_each_cpu(sibling, topology_sibling_cpumask(cpu))
- cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
- for_each_cpu(sibling, topology_llc_cpumask(cpu))
- cpumask_clear_cpu(cpu, topology_llc_cpumask(sibling));
-
- clear_cpu_topology(cpu);
-}
-
-#ifdef CONFIG_ACPI
/*
* Propagate the topology information of the processor_topology_node tree to the
* cpu_topology array.
*/
-static int __init parse_acpi_topology(void)
+int __init parse_acpi_topology(void)
{
bool is_threaded;
int cpu, topology_id;

+ if (acpi_disabled)
+ return 0;
+
is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;

for_each_possible_cpu(cpu) {
@@ -385,23 +107,4 @@ static int __init parse_acpi_topology(void)
return 0;
}

-#else
-static inline int __init parse_acpi_topology(void)
-{
- return -EINVAL;
-}
-#endif

-void __init init_cpu_topology(void)
-{
- reset_cpu_topology();
-
- /*
- * Discard anything that was parsed if we hit an error so we
- * don't use partial information.
- */
- if (!acpi_disabled && parse_acpi_topology())
- reset_cpu_topology();
- else if (of_have_populated_dt() && parse_dt_topology())
- reset_cpu_topology();
-}
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index edfcf8d9..1dc502ea 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -16,6 +16,11 @@
#include <linux/string.h>
#include <linux/sched/topology.h>
#include <linux/cpuset.h>
+#include <linux/cpumask.h>
+#include <linux/init.h>
+#include <linux/percpu.h>
+#include <linux/sched.h>
+#include <linux/smp.h>

DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;

@@ -278,3 +283,301 @@ static void parsing_done_workfn(struct work_struct *work)
#else
core_initcall(free_raw_capacity);
#endif
+
+static int __init get_cpu_for_node(struct device_node *node)
+{
+ struct device_node *cpu_node;
+ int cpu;
+
+ cpu_node = of_parse_phandle(node, "cpu", 0);
+ if (!cpu_node)
+ return -1;
+
+ cpu = of_cpu_node_to_id(cpu_node);
+ if (cpu >= 0)
+ topology_parse_cpu_capacity(cpu_node, cpu);
+ else
+ pr_crit("Unable to find CPU node for %pOF\n", cpu_node);
+
+ of_node_put(cpu_node);
+ return cpu;
+}
+
+static int __init parse_core(struct device_node *core, int package_id,
+ int core_id)
+{
+ char name[10];
+ bool leaf = true;
+ int i = 0;
+ int cpu;
+ struct device_node *t;
+
+ do {
+ snprintf(name, sizeof(name), "thread%d", i);
+ t = of_get_child_by_name(core, name);
+ if (t) {
+ leaf = false;
+ cpu = get_cpu_for_node(t);
+ if (cpu >= 0) {
+ cpu_topology[cpu].package_id = package_id;
+ cpu_topology[cpu].core_id = core_id;
+ cpu_topology[cpu].thread_id = i;
+ } else {
+ pr_err("%pOF: Can't get CPU for thread\n",
+ t);
+ of_node_put(t);
+ return -EINVAL;
+ }
+ of_node_put(t);
+ }
+ i++;
+ } while (t);
+
+ cpu = get_cpu_for_node(core);
+ if (cpu >= 0) {
+ if (!leaf) {
+ pr_err("%pOF: Core has both threads and CPU\n",
+ core);
+ return -EINVAL;
+ }
+
+ cpu_topology[cpu].package_id = package_id;
+ cpu_topology[cpu].core_id = core_id;
+ } else if (leaf) {
+ pr_err("%pOF: Can't get CPU for leaf core\n", core);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int __init parse_cluster(struct device_node *cluster, int depth)
+{
+ char name[10];
+ bool leaf = true;
+ bool has_cores = false;
+ struct device_node *c;
+ static int package_id __initdata;
+ int core_id = 0;
+ int i, ret;
+
+ /*
+ * First check for child clusters; we currently ignore any
+ * information about the nesting of clusters and present the
+ * scheduler with a flat list of them.
+ */
+ i = 0;
+ do {
+ snprintf(name, sizeof(name), "cluster%d", i);
+ c = of_get_child_by_name(cluster, name);
+ if (c) {
+ leaf = false;
+ ret = parse_cluster(c, depth + 1);
+ of_node_put(c);
+ if (ret != 0)
+ return ret;
+ }
+ i++;
+ } while (c);
+
+ /* Now check for cores */
+ i = 0;
+ do {
+ snprintf(name, sizeof(name), "core%d", i);
+ c = of_get_child_by_name(cluster, name);
+ if (c) {
+ has_cores = true;
+
+ if (depth == 0) {
+ pr_err("%pOF: cpu-map children should be clusters\n",
+ c);
+ of_node_put(c);
+ return -EINVAL;
+ }
+
+ if (leaf) {
+ ret = parse_core(c, package_id, core_id++);
+ } else {
+ pr_err("%pOF: Non-leaf cluster with core %s\n",
+ cluster, name);
+ ret = -EINVAL;
+ }
+
+ of_node_put(c);
+ if (ret != 0)
+ return ret;
+ }
+ i++;
+ } while (c);
+
+ if (leaf && !has_cores)
+ pr_warn("%pOF: empty cluster\n", cluster);
+
+ if (leaf)
+ package_id++;
+
+ return 0;
+}
+
+static int __init parse_dt_topology(void)
+{
+ struct device_node *cn, *map;
+ int ret = 0;
+ int cpu;
+
+ cn = of_find_node_by_path("/cpus");
+ if (!cn) {
+ pr_err("No CPU information found in DT\n");
+ return 0;
+ }
+
+ /*
+ * When topology is provided cpu-map is essentially a root
+ * cluster with restricted subnodes.
+ */
+ map = of_get_child_by_name(cn, "cpu-map");
+ if (!map)
+ goto out;
+
+ ret = parse_cluster(map, 0);
+ if (ret != 0)
+ goto out_map;
+
+ topology_normalize_cpu_scale();
+
+ /*
+ * Check that all cores are in the topology; the SMP code will
+ * only mark cores described in the DT as possible.
+ */
+ for_each_possible_cpu(cpu)
+ if (cpu_topology[cpu].package_id == -1)
+ ret = -EINVAL;
+
+out_map:
+ of_node_put(map);
+out:
+ of_node_put(cn);
+ return ret;
+}
+
+/*
+ * cpu topology table
+ */
+struct cpu_topology cpu_topology[NR_CPUS];
+EXPORT_SYMBOL_GPL(cpu_topology);
+
+const struct cpumask *cpu_coregroup_mask(int cpu)
+{
+ const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
+
+ /* Find the smaller of NUMA, core or LLC siblings */
+ if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
+ /* not numa in package, lets use the package siblings */
+ core_mask = &cpu_topology[cpu].core_sibling;
+ }
+ if (cpu_topology[cpu].llc_id != -1) {
+ if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
+ core_mask = &cpu_topology[cpu].llc_sibling;
+ }
+
+ return core_mask;
+}
+
+void update_siblings_masks(unsigned int cpuid)
+{
+ struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
+ int cpu;
+
+ /* update core and thread sibling masks */
+ for_each_online_cpu(cpu) {
+ cpu_topo = &cpu_topology[cpu];
+
+ if (cpuid_topo->llc_id == cpu_topo->llc_id) {
+ cpumask_set_cpu(cpu, &cpuid_topo->llc_sibling);
+ cpumask_set_cpu(cpuid, &cpu_topo->llc_sibling);
+ }
+
+ if (cpuid_topo->package_id != cpu_topo->package_id)
+ continue;
+
+ cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
+ cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
+
+ if (cpuid_topo->core_id != cpu_topo->core_id)
+ continue;
+
+ cpumask_set_cpu(cpuid, &cpu_topo->thread_sibling);
+ cpumask_set_cpu(cpu, &cpuid_topo->thread_sibling);
+ }
+}
+
+static void clear_cpu_topology(int cpu)
+{
+ struct cpu_topology *cpu_topo = &cpu_topology[cpu];
+
+ cpumask_clear(&cpu_topo->llc_sibling);
+ cpumask_set_cpu(cpu, &cpu_topo->llc_sibling);
+
+ cpumask_clear(&cpu_topo->core_sibling);
+ cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
+ cpumask_clear(&cpu_topo->thread_sibling);
+ cpumask_set_cpu(cpu, &cpu_topo->thread_sibling);
+}
+
+static void __init reset_cpu_topology(void)
+{
+ unsigned int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct cpu_topology *cpu_topo = &cpu_topology[cpu];
+
+ cpu_topo->thread_id = -1;
+ cpu_topo->core_id = 0;
+ cpu_topo->package_id = -1;
+ cpu_topo->llc_id = -1;
+
+ clear_cpu_topology(cpu);
+ }
+}
+
+void remove_cpu_topology(unsigned int cpu)
+{
+ int sibling;
+
+ for_each_cpu(sibling, topology_core_cpumask(cpu))
+ cpumask_clear_cpu(cpu, topology_core_cpumask(sibling));
+ for_each_cpu(sibling, topology_sibling_cpumask(cpu))
+ cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
+ for_each_cpu(sibling, topology_llc_cpumask(cpu))
+ cpumask_clear_cpu(cpu, topology_llc_cpumask(sibling));
+
+ clear_cpu_topology(cpu);
+}
+
+
+#ifdef CONFIG_ACPI
+__weak int __init parse_acpi_topology(void)
+{
+ return 0;
+}
+
+#else
+static inline int __init parse_acpi_topology(void)
+{
+ return 0;
+}
+#endif
+
+void __init init_cpu_topology(void)
+{
+ reset_cpu_topology();
+
+ /*
+ * Discard anything that was parsed if we hit an error so we
+ * don't use partial information.
+ */
+ if (parse_acpi_topology())
+ reset_cpu_topology();
+ else if (of_have_populated_dt() && parse_dt_topology())
+ reset_cpu_topology();
+}
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index d9bdc1a7..8f412906 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -8,6 +8,29 @@
#include <linux/types.h>
#include <linux/percpu.h>

+struct cpu_topology {
+ int thread_id;
+ int core_id;
+ int package_id;
+ int llc_id;
+ cpumask_t thread_sibling;
+ cpumask_t core_sibling;
+ cpumask_t llc_sibling;
+};
+
+extern struct cpu_topology cpu_topology[NR_CPUS];
+
+#define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id)
+#define topology_core_id(cpu) (cpu_topology[cpu].core_id)
+#define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling)
+#define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
+#define topology_llc_cpumask(cpu) (&cpu_topology[cpu].llc_sibling)
+
+void init_cpu_topology(void);
+void update_siblings_masks(unsigned int cpu);
+void remove_cpu_topology(unsigned int cpuid);
+const struct cpumask *cpu_coregroup_mask(int cpu);
+
void topology_normalize_cpu_scale(void);
int topology_update_cpu_topology(void);

diff --git a/include/linux/topology.h b/include/linux/topology.h
index cb0775e1..81501fac 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -32,6 +32,7 @@
#include <linux/mmzone.h>
#include <linux/smp.h>
#include <linux/percpu.h>
+#include <linux/arch_topology.h>
#include <asm/topology.h>

#ifndef nr_cpus_node
--
2.7.4

2018-11-09 01:52:14

by Atish Patra

[permalink] [raw]

Subject: [RFC 3/3] RISC-V: Parse cpu topology during boot.

Currently, there are no topology defined for RISC-V.
Parse the cpu-map node from device tree and setup the
cpu topology.

CPU topology after applying the patch.
$cat /sys/devices/system/cpu/cpu2/topology/core_siblings_list
0-3
$cat /sys/devices/system/cpu/cpu3/topology/core_siblings_list
0-3
$cat /sys/devices/system/cpu/cpu3/topology/physical_package_id
0
$cat /sys/devices/system/cpu/cpu3/topology/core_id
3

Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/Kconfig | 1 +
arch/riscv/kernel/smpboot.c | 6 +++++-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 55da93f4..e2d8ddb5 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -42,6 +42,7 @@ config RISCV
select THREAD_INFO_IN_TASK
select RISCV_TIMER
select GENERIC_IRQ_MULTI_HANDLER
+ select GENERIC_ARCH_TOPOLOGY
select ARCH_HAS_PTE_SPECIAL

config MMU
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 18cda0e8..6fa95442 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -31,6 +31,7 @@
#include <linux/of.h>
#include <linux/sched/task_stack.h>
#include <linux/sched/mm.h>
+#include <linux/arch_topology.h>
#include <asm/irq.h>
#include <asm/mmu_context.h>
#include <asm/tlbflush.h>
@@ -42,6 +43,7 @@ void *__cpu_up_task_pointer[NR_CPUS];

void __init smp_prepare_boot_cpu(void)
{
+ init_cpu_topology();
}

void __init smp_prepare_cpus(unsigned int max_cpus)
@@ -108,13 +110,15 @@ void __init smp_cpus_done(unsigned int max_cpus)
asmlinkage void __init smp_callin(void)
{
struct mm_struct *mm = &init_mm;
+ unsigned int cpu = smp_processor_id();

/* All kernel threads share the same mm context. */
mmgrab(mm);
current->active_mm = mm;

trap_init();
- notify_cpu_starting(smp_processor_id());
+ notify_cpu_starting(cpu);
+ update_siblings_masks(cpu);
set_cpu_online(smp_processor_id(), 1);
/*
* Remote TLB flushes are ignored while the CPU is offline, so emit
--
2.7.4

2018-11-09 01:52:27

by Atish Patra

[permalink] [raw]

Subject: [RFC 1/3] dt-binding: cpu-topology: Move cpu-map to a common binding.

cpu-map binding can be used to described cpu topology for both
RISC-V & ARM. It makes more sense to move the binding to document
to a common place.

The relevant discussion can be found here.
https://lkml.org/lkml/2018/11/6/19

Signed-off-by: Atish Patra <[email protected]>
---
Documentation/devicetree/bindings/arm/topology.txt | 475 -------------------
.../devicetree/bindings/cpu/cpu-topology.txt | 526 +++++++++++++++++++++
2 files changed, 526 insertions(+), 475 deletions(-)
delete mode 100644 Documentation/devicetree/bindings/arm/topology.txt
create mode 100644 Documentation/devicetree/bindings/cpu/cpu-topology.txt

diff --git a/Documentation/devicetree/bindings/arm/topology.txt b/Documentation/devicetree/bindings/arm/topology.txt
deleted file mode 100644
index de9eb048..00000000
--- a/Documentation/devicetree/bindings/arm/topology.txt
+++ /dev/null
@@ -1,475 +0,0 @@
-===========================================
-ARM topology binding description
-===========================================
-
-===========================================
-1 - Introduction
-===========================================
-
-In an ARM system, the hierarchy of CPUs is defined through three entities that
-are used to describe the layout of physical CPUs in the system:
-
-- cluster
-- core
-- thread
-
-The cpu nodes (bindings defined in [1]) represent the devices that
-correspond to physical CPUs and are to be mapped to the hierarchy levels.
-
-The bottom hierarchy level sits at core or thread level depending on whether
-symmetric multi-threading (SMT) is supported or not.
-
-For instance in a system where CPUs support SMT, "cpu" nodes represent all
-threads existing in the system and map to the hierarchy level "thread" above.
-In systems where SMT is not supported "cpu" nodes represent all cores present
-in the system and map to the hierarchy level "core" above.
-
-ARM topology bindings allow one to associate cpu nodes with hierarchical groups
-corresponding to the system hierarchy; syntactically they are defined as device
-tree nodes.
-
-The remainder of this document provides the topology bindings for ARM, based
-on the Devicetree Specification, available from:
-
-https://www.devicetree.org/specifications/
-
-If not stated otherwise, whenever a reference to a cpu node phandle is made its
-value must point to a cpu node compliant with the cpu node bindings as
-documented in [1].
-A topology description containing phandles to cpu nodes that are not compliant
-with bindings standardized in [1] is therefore considered invalid.
-
-===========================================
-2 - cpu-map node
-===========================================
-
-The ARM CPU topology is defined within the cpu-map node, which is a direct
-child of the cpus node and provides a container where the actual topology
-nodes are listed.
-
-- cpu-map node
-
- Usage: Optional - On ARM SMP systems provide CPUs topology to the OS.
- ARM uniprocessor systems do not require a topology
- description and therefore should not define a
- cpu-map node.
-
- Description: The cpu-map node is just a container node where its
- subnodes describe the CPU topology.
-
- Node name must be "cpu-map".
-
- The cpu-map node's parent node must be the cpus node.
-
- The cpu-map node's child nodes can be:
-
- - one or more cluster nodes
-
- Any other configuration is considered invalid.
-
-The cpu-map node can only contain three types of child nodes:
-
-- cluster node
-- core node
-- thread node
-
-whose bindings are described in paragraph 3.
-
-The nodes describing the CPU topology (cluster/core/thread) can only
-be defined within the cpu-map node and every core/thread in the system
-must be defined within the topology. Any other configuration is
-invalid and therefore must be ignored.
-
-===========================================
-2.1 - cpu-map child nodes naming convention
-===========================================
-
-cpu-map child nodes must follow a naming convention where the node name
-must be "clusterN", "coreN", "threadN" depending on the node type (ie
-cluster/core/thread) (where N = {0, 1, ...} is the node number; nodes which
-are siblings within a single common parent node must be given a unique and
-sequential N value, starting from 0).
-cpu-map child nodes which do not share a common parent node can have the same
-name (ie same number N as other cpu-map child nodes at different device tree
-levels) since name uniqueness will be guaranteed by the device tree hierarchy.
-
-===========================================
-3 - cluster/core/thread node bindings
-===========================================
-
-Bindings for cluster/cpu/thread nodes are defined as follows:
-
-- cluster node
-
- Description: must be declared within a cpu-map node, one node
- per cluster. A system can contain several layers of
- clustering and cluster nodes can be contained in parent
- cluster nodes.
-
- The cluster node name must be "clusterN" as described in 2.1 above.
- A cluster node can not be a leaf node.
-
- A cluster node's child nodes must be:
-
- - one or more cluster nodes; or
- - one or more core nodes
-
- Any other configuration is considered invalid.
-
-- core node
-
- Description: must be declared in a cluster node, one node per core in
- the cluster. If the system does not support SMT, core
- nodes are leaf nodes, otherwise they become containers of
- thread nodes.
-
- The core node name must be "coreN" as described in 2.1 above.
-
- A core node must be a leaf node if SMT is not supported.
-
- Properties for core nodes that are leaf nodes:
-
- - cpu
- Usage: required
- Value type: <phandle>
- Definition: a phandle to the cpu node that corresponds to the
- core node.
-
- If a core node is not a leaf node (CPUs supporting SMT) a core node's
- child nodes can be:
-
- - one or more thread nodes
-
- Any other configuration is considered invalid.
-
-- thread node
-
- Description: must be declared in a core node, one node per thread
- in the core if the system supports SMT. Thread nodes are
- always leaf nodes in the device tree.
-
- The thread node name must be "threadN" as described in 2.1 above.
-
- A thread node must be a leaf node.
-
- A thread node must contain the following property:
-
- - cpu
- Usage: required
- Value type: <phandle>
- Definition: a phandle to the cpu node that corresponds to
- the thread node.
-
-===========================================
-4 - Example dts
-===========================================
-
-Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters):
-
-cpus {
- #size-cells = <0>;
- #address-cells = <2>;
-
- cpu-map {
- cluster0 {
- cluster0 {
- core0 {
- thread0 {
- cpu = <&CPU0>;
- };
- thread1 {
- cpu = <&CPU1>;
- };
- };
-
- core1 {
- thread0 {
- cpu = <&CPU2>;
- };
- thread1 {
- cpu = <&CPU3>;
- };
- };
- };
-
- cluster1 {
- core0 {
- thread0 {
- cpu = <&CPU4>;
- };
- thread1 {
- cpu = <&CPU5>;
- };
- };
-
- core1 {
- thread0 {
- cpu = <&CPU6>;
- };
- thread1 {
- cpu = <&CPU7>;
- };
- };
- };
- };
-
- cluster1 {
- cluster0 {
- core0 {
- thread0 {
- cpu = <&CPU8>;
- };
- thread1 {
- cpu = <&CPU9>;
- };
- };
- core1 {
- thread0 {
- cpu = <&CPU10>;
- };
- thread1 {
- cpu = <&CPU11>;
- };
- };
- };
-
- cluster1 {
- core0 {
- thread0 {
- cpu = <&CPU12>;
- };
- thread1 {
- cpu = <&CPU13>;
- };
- };
- core1 {
- thread0 {
- cpu = <&CPU14>;
- };
- thread1 {
- cpu = <&CPU15>;
- };
- };
- };
- };
- };
-
- CPU0: cpu@0 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x0 0x0>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU1: cpu@1 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x0 0x1>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU2: cpu@100 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x0 0x100>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU3: cpu@101 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x0 0x101>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU4: cpu@10000 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x0 0x10000>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU5: cpu@10001 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x0 0x10001>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU6: cpu@10100 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x0 0x10100>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU7: cpu@10101 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x0 0x10101>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU8: cpu@100000000 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x1 0x0>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU9: cpu@100000001 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x1 0x1>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU10: cpu@100000100 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x1 0x100>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU11: cpu@100000101 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x1 0x101>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU12: cpu@100010000 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x1 0x10000>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU13: cpu@100010001 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x1 0x10001>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU14: cpu@100010100 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x1 0x10100>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-
- CPU15: cpu@100010101 {
- device_type = "cpu";
- compatible = "arm,cortex-a57";
- reg = <0x1 0x10101>;
- enable-method = "spin-table";
- cpu-release-addr = <0 0x20000000>;
- };
-};
-
-Example 2 (ARM 32-bit, dual-cluster, 8-cpu system, no SMT):
-
-cpus {
- #size-cells = <0>;
- #address-cells = <1>;
-
- cpu-map {
- cluster0 {
- core0 {
- cpu = <&CPU0>;
- };
- core1 {
- cpu = <&CPU1>;
- };
- core2 {
- cpu = <&CPU2>;
- };
- core3 {
- cpu = <&CPU3>;
- };
- };
-
- cluster1 {
- core0 {
- cpu = <&CPU4>;
- };
- core1 {
- cpu = <&CPU5>;
- };
- core2 {
- cpu = <&CPU6>;
- };
- core3 {
- cpu = <&CPU7>;
- };
- };
- };
-
- CPU0: cpu@0 {
- device_type = "cpu";
- compatible = "arm,cortex-a15";
- reg = <0x0>;
- };
-
- CPU1: cpu@1 {
- device_type = "cpu";
- compatible = "arm,cortex-a15";
- reg = <0x1>;
- };
-
- CPU2: cpu@2 {
- device_type = "cpu";
- compatible = "arm,cortex-a15";
- reg = <0x2>;
- };
-
- CPU3: cpu@3 {
- device_type = "cpu";
- compatible = "arm,cortex-a15";
- reg = <0x3>;
- };
-
- CPU4: cpu@100 {
- device_type = "cpu";
- compatible = "arm,cortex-a7";
- reg = <0x100>;
- };
-
- CPU5: cpu@101 {
- device_type = "cpu";
- compatible = "arm,cortex-a7";
- reg = <0x101>;
- };
-
- CPU6: cpu@102 {
- device_type = "cpu";
- compatible = "arm,cortex-a7";
- reg = <0x102>;
- };
-
- CPU7: cpu@103 {
- device_type = "cpu";
- compatible = "arm,cortex-a7";
- reg = <0x103>;
- };
-};
-
-===============================================================================
-[1] ARM Linux kernel documentation
- Documentation/devicetree/bindings/arm/cpus.txt
diff --git a/Documentation/devicetree/bindings/cpu/cpu-topology.txt b/Documentation/devicetree/bindings/cpu/cpu-topology.txt
new file mode 100644
index 00000000..d8d1daef
--- /dev/null
+++ b/Documentation/devicetree/bindings/cpu/cpu-topology.txt
@@ -0,0 +1,526 @@
+===========================================
+CPU topology binding description
+===========================================
+
+===========================================
+1 - Introduction
+===========================================
+
+In an ARM/RISC-V system, the hierarchy of CPUs is defined through three entities that
+are used to describe the layout of physical CPUs in the system:
+
+- cluster
+- core
+- thread
+
+The cpu nodes (bindings defined in [1] for ARM or [2] for RISC-V) represent the devices that
+correspond to physical CPUs and are to be mapped to the hierarchy levels.
+
+The bottom hierarchy level sits at core or thread level depending on whether
+symmetric multi-threading (SMT) is supported or not.
+
+For instance in a system where CPUs support SMT, "cpu" nodes represent all
+threads existing in the system and map to the hierarchy level "thread" above.
+In systems where SMT is not supported "cpu" nodes represent all cores present
+in the system and map to the hierarchy level "core" above.
+
+CPU topology bindings allow one to associate cpu nodes with hierarchical groups
+corresponding to the system hierarchy; syntactically they are defined as device
+tree nodes.
+
+The remainder of this document provides the topology bindings for ARM/RISC-V, based
+on the Devicetree Specification, available from:
+
+https://www.devicetree.org/specifications/
+
+If not stated otherwise, whenever a reference to a cpu node phandle is made its
+value must point to a cpu node compliant with the cpu node bindings as
+documented in [1].
+A topology description containing phandles to cpu nodes that are not compliant
+with bindings standardized in [1] is therefore considered invalid.
+
+===========================================
+2 - cpu-map node
+===========================================
+
+The ARM/RISC-V CPU topology is defined within the cpu-map node, which is a direct
+child of the cpus node and provides a container where the actual topology
+nodes are listed.
+
+- cpu-map node
+
+ Usage: Optional - On SMP systems provide CPUs topology to the OS.
+ Uniprocessor systems do not require a topology
+ description and therefore should not define a
+ cpu-map node.
+
+ Description: The cpu-map node is just a container node where its
+ subnodes describe the CPU topology.
+
+ Node name must be "cpu-map".
+
+ The cpu-map node's parent node must be the cpus node.
+
+ The cpu-map node's child nodes can be:
+
+ - one or more cluster nodes
+
+ Any other configuration is considered invalid.
+
+The cpu-map node can only contain three types of child nodes:
+
+- cluster node
+- core node
+- thread node
+
+whose bindings are described in paragraph 3.
+
+The nodes describing the CPU topology (cluster/core/thread) can only
+be defined within the cpu-map node and every core/thread in the system
+must be defined within the topology. Any other configuration is
+invalid and therefore must be ignored.
+
+===========================================
+2.1 - cpu-map child nodes naming convention
+===========================================
+
+cpu-map child nodes must follow a naming convention where the node name
+must be "clusterN", "coreN", "threadN" depending on the node type (ie
+cluster/core/thread) (where N = {0, 1, ...} is the node number; nodes which
+are siblings within a single common parent node must be given a unique and
+sequential N value, starting from 0).
+cpu-map child nodes which do not share a common parent node can have the same
+name (ie same number N as other cpu-map child nodes at different device tree
+levels) since name uniqueness will be guaranteed by the device tree hierarchy.
+
+===========================================
+3 - cluster/core/thread node bindings
+===========================================
+
+Bindings for cluster/cpu/thread nodes are defined as follows:
+
+- cluster node
+
+ Description: must be declared within a cpu-map node, one node
+ per cluster. A system can contain several layers of
+ clustering and cluster nodes can be contained in parent
+ cluster nodes.
+
+ The cluster node name must be "clusterN" as described in 2.1 above.
+ A cluster node can not be a leaf node.
+
+ A cluster node's child nodes must be:
+
+ - one or more cluster nodes; or
+ - one or more core nodes
+
+ Any other configuration is considered invalid.
+
+- core node
+
+ Description: must be declared in a cluster node, one node per core in
+ the cluster. If the system does not support SMT, core
+ nodes are leaf nodes, otherwise they become containers of
+ thread nodes.
+
+ The core node name must be "coreN" as described in 2.1 above.
+
+ A core node must be a leaf node if SMT is not supported.
+
+ Properties for core nodes that are leaf nodes:
+
+ - cpu
+ Usage: required
+ Value type: <phandle>
+ Definition: a phandle to the cpu node that corresponds to the
+ core node.
+
+ If a core node is not a leaf node (CPUs supporting SMT) a core node's
+ child nodes can be:
+
+ - one or more thread nodes
+
+ Any other configuration is considered invalid.
+
+- thread node
+
+ Description: must be declared in a core node, one node per thread
+ in the core if the system supports SMT. Thread nodes are
+ always leaf nodes in the device tree.
+
+ The thread node name must be "threadN" as described in 2.1 above.
+
+ A thread node must be a leaf node.
+
+ A thread node must contain the following property:
+
+ - cpu
+ Usage: required
+ Value type: <phandle>
+ Definition: a phandle to the cpu node that corresponds to
+ the thread node.
+
+===========================================
+4 - Example dts
+===========================================
+
+Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters):
+
+cpus {
+ #size-cells = <0>;
+ #address-cells = <2>;
+
+ cpu-map {
+ cluster0 {
+ cluster0 {
+ core0 {
+ thread0 {
+ cpu = <&CPU0>;
+ };
+ thread1 {
+ cpu = <&CPU1>;
+ };
+ };
+
+ core1 {
+ thread0 {
+ cpu = <&CPU2>;
+ };
+ thread1 {
+ cpu = <&CPU3>;
+ };
+ };
+ };
+
+ cluster1 {
+ core0 {
+ thread0 {
+ cpu = <&CPU4>;
+ };
+ thread1 {
+ cpu = <&CPU5>;
+ };
+ };
+
+ core1 {
+ thread0 {
+ cpu = <&CPU6>;
+ };
+ thread1 {
+ cpu = <&CPU7>;
+ };
+ };
+ };
+ };
+
+ cluster1 {
+ cluster0 {
+ core0 {
+ thread0 {
+ cpu = <&CPU8>;
+ };
+ thread1 {
+ cpu = <&CPU9>;
+ };
+ };
+ core1 {
+ thread0 {
+ cpu = <&CPU10>;
+ };
+ thread1 {
+ cpu = <&CPU11>;
+ };
+ };
+ };
+
+ cluster1 {
+ core0 {
+ thread0 {
+ cpu = <&CPU12>;
+ };
+ thread1 {
+ cpu = <&CPU13>;
+ };
+ };
+ core1 {
+ thread0 {
+ cpu = <&CPU14>;
+ };
+ thread1 {
+ cpu = <&CPU15>;
+ };
+ };
+ };
+ };
+ };
+
+ CPU0: cpu@0 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x0 0x0>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU1: cpu@1 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x0 0x1>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU2: cpu@100 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x0 0x100>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU3: cpu@101 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x0 0x101>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU4: cpu@10000 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x0 0x10000>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU5: cpu@10001 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x0 0x10001>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU6: cpu@10100 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x0 0x10100>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU7: cpu@10101 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x0 0x10101>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU8: cpu@100000000 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x1 0x0>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU9: cpu@100000001 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x1 0x1>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU10: cpu@100000100 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x1 0x100>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU11: cpu@100000101 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x1 0x101>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU12: cpu@100010000 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x1 0x10000>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU13: cpu@100010001 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x1 0x10001>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU14: cpu@100010100 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x1 0x10100>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+
+ CPU15: cpu@100010101 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a57";
+ reg = <0x1 0x10101>;
+ enable-method = "spin-table";
+ cpu-release-addr = <0 0x20000000>;
+ };
+};
+
+Example 2 (ARM 32-bit, dual-cluster, 8-cpu system, no SMT):
+
+cpus {
+ #size-cells = <0>;
+ #address-cells = <1>;
+
+ cpu-map {
+ cluster0 {
+ core0 {
+ cpu = <&CPU0>;
+ };
+ core1 {
+ cpu = <&CPU1>;
+ };
+ core2 {
+ cpu = <&CPU2>;
+ };
+ core3 {
+ cpu = <&CPU3>;
+ };
+ };
+
+ cluster1 {
+ core0 {
+ cpu = <&CPU4>;
+ };
+ core1 {
+ cpu = <&CPU5>;
+ };
+ core2 {
+ cpu = <&CPU6>;
+ };
+ core3 {
+ cpu = <&CPU7>;
+ };
+ };
+ };
+
+ CPU0: cpu@0 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a15";
+ reg = <0x0>;
+ };
+
+ CPU1: cpu@1 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a15";
+ reg = <0x1>;
+ };
+
+ CPU2: cpu@2 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a15";
+ reg = <0x2>;
+ };
+
+ CPU3: cpu@3 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a15";
+ reg = <0x3>;
+ };
+
+ CPU4: cpu@100 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a7";
+ reg = <0x100>;
+ };
+
+ CPU5: cpu@101 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a7";
+ reg = <0x101>;
+ };
+
+ CPU6: cpu@102 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a7";
+ reg = <0x102>;
+ };
+
+ CPU7: cpu@103 {
+ device_type = "cpu";
+ compatible = "arm,cortex-a7";
+ reg = <0x103>;
+ };
+};
+
+Example 3: HiFive Unleashed (RISC-V 64 bit, 4 core system)
+
+cpus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ compatible = "sifive,fu540g", "sifive,fu500";
+ model = "sifive,hifive-unleashed-a00";
+
+ ...
+
+ cpu-map {
+ cluster0 {
+ core0 {
+ cpu = <&L12>;
+ };
+ core1 {
+ cpu = <&L15>;
+ };
+ core2 {
+ cpu0 = <&L18>;
+ };
+ core3 {
+ cpu0 = <&L21>;
+ };
+ };
+ };
+
+ L12: cpu@1 {
+ device_type = "cpu";
+ compatible = "sifive,rocket0", "riscv";
+ reg = <0x1>;
+ }
+
+ L15: cpu@2 {
+ device_type = "cpu";
+ compatible = "sifive,rocket0", "riscv";
+ reg = <0x2>;
+ }
+ L18: cpu@3 {
+ device_type = "cpu";
+ compatible = "sifive,rocket0", "riscv";
+ reg = <0x3>;
+ }
+ L21: cpu@4 {
+ device_type = "cpu";
+ compatible = "sifive,rocket0", "riscv";
+ reg = <0x4>;
+ }
+};
+===============================================================================
+[1] ARM Linux kernel documentation
+ Documentation/devicetree/bindings/arm/cpus.txt
+[1] RISC-V Linux kernel documentation
+ Documentation/devicetree/bindings/riscv/cpus.txt
--
2.7.4

2018-11-15 18:32:45

by Jeffrey Hugo

[permalink] [raw]

Subject: Re: [RFC 0/3] Unify CPU topology across ARM64 & RISC-V

On 11/8/2018 6:50 PM, Atish Patra wrote:
> The cpu-map DT entry in ARM64 can describe the CPU topology in
> much better way compared to other existing approaches. RISC-V can
> easily adopt this binding to represent it's own CPU topology.
> Thus, both cpu-map DT binding and topology parsing code can be
> moved to a common location so that RISC-V or any other
> architecture can leverage that.
>
> The relevant discussion regarding unifying cpu topology can be
> found in [1].
>
> arch_topology seems to be a perfect place to move the common
> code. I have not introduced any functional changes in the moved
> to code. The only downside in this approach is that the capacity
> code will be executed for RISC-V as well. But, it will exit
> immediately after not able to find the appropriate DT node. If
> the overhead is considered too much, we can always compile out
> capacity related functions under a different config for the
> architectures that do not support them.
>
> The patches have been tested for RISC-V and compile tested for
> ARM64.
>
> The socket changes[2] can be merged on top of this series or vice
> versa.
>
> [1] https://lkml.org/lkml/2018/11/6/19
> [2] https://lkml.org/lkml/2018/11/7/918
>
> Atish Patra (3):
> dt-binding: cpu-topology: Move cpu-map to a common binding.
> cpu-topology: Move cpu topology code to common code.
> RISC-V: Parse cpu topology during boot.
>
> Documentation/devicetree/bindings/arm/topology.txt | 475 -------------------
> .../devicetree/bindings/cpu/cpu-topology.txt | 526 +++++++++++++++++++++
> arch/arm64/include/asm/topology.h | 23 +-
> arch/arm64/kernel/topology.c | 305 +-----------
> arch/riscv/Kconfig | 1 +
> arch/riscv/kernel/smpboot.c | 6 +-
> drivers/base/arch_topology.c | 303 ++++++++++++
> include/linux/arch_topology.h | 23 +
> include/linux/topology.h | 1 +
> 9 files changed, 864 insertions(+), 799 deletions(-)
> delete mode 100644 Documentation/devicetree/bindings/arm/topology.txt
> create mode 100644 Documentation/devicetree/bindings/cpu/cpu-topology.txt
>

I was interested in testing these on QDF2400, an ARM64 platform, since
this series touches core ARM64 code and I'd hate to see a regression.
However, I can't figure out what baseline to use to apply these.
Different patches cause different conflicts of a variety of baselines I
attempted.

What are these intended to apply to?

Also, you might want to run them through checkpatch next time. There
are several whitespace errors.

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

2018-11-17 16:35:29

by Rob Herring (Arm)

[permalink] [raw]

Subject: Re: [RFC 1/3] dt-binding: cpu-topology: Move cpu-map to a common binding.

On Thu, Nov 08, 2018 at 05:50:07PM -0800, Atish Patra wrote:
> cpu-map binding can be used to described cpu topology for both
> RISC-V & ARM. It makes more sense to move the binding to document
> to a common place.
>
> The relevant discussion can be found here.
> https://lkml.org/lkml/2018/11/6/19
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> Documentation/devicetree/bindings/arm/topology.txt | 475 -------------------
> .../devicetree/bindings/cpu/cpu-topology.txt | 526 +++++++++++++++++++++
> 2 files changed, 526 insertions(+), 475 deletions(-)
> delete mode 100644 Documentation/devicetree/bindings/arm/topology.txt
> create mode 100644 Documentation/devicetree/bindings/cpu/cpu-topology.txt

Please send this with '-M' option so we just get the real changes.

Rob

2018-11-19 17:48:19

by Atish Patra

[permalink] [raw]

Subject: Re: [RFC 0/3] Unify CPU topology across ARM64 & RISC-V

On 11/15/18 10:31 AM, Jeffrey Hugo wrote:
> On 11/8/2018 6:50 PM, Atish Patra wrote:
>> The cpu-map DT entry in ARM64 can describe the CPU topology in
>> much better way compared to other existing approaches. RISC-V can
>> easily adopt this binding to represent it's own CPU topology.
>> Thus, both cpu-map DT binding and topology parsing code can be
>> moved to a common location so that RISC-V or any other
>> architecture can leverage that.
>>
>> The relevant discussion regarding unifying cpu topology can be
>> found in [1].
>>
>> arch_topology seems to be a perfect place to move the common
>> code. I have not introduced any functional changes in the moved
>> to code. The only downside in this approach is that the capacity
>> code will be executed for RISC-V as well. But, it will exit
>> immediately after not able to find the appropriate DT node. If
>> the overhead is considered too much, we can always compile out
>> capacity related functions under a different config for the
>> architectures that do not support them.
>>
>> The patches have been tested for RISC-V and compile tested for
>> ARM64.
>>
>> The socket changes[2] can be merged on top of this series or vice
>> versa.
>>
>> [1] https://lkml.org/lkml/2018/11/6/19
>> [2] https://lkml.org/lkml/2018/11/7/918
>>
>> Atish Patra (3):
>> dt-binding: cpu-topology: Move cpu-map to a common binding.
>> cpu-topology: Move cpu topology code to common code.
>> RISC-V: Parse cpu topology during boot.
>>
>> Documentation/devicetree/bindings/arm/topology.txt | 475 -------------------
>> .../devicetree/bindings/cpu/cpu-topology.txt | 526 +++++++++++++++++++++
>> arch/arm64/include/asm/topology.h | 23 +-
>> arch/arm64/kernel/topology.c | 305 +-----------
>> arch/riscv/Kconfig | 1 +
>> arch/riscv/kernel/smpboot.c | 6 +-
>> drivers/base/arch_topology.c | 303 ++++++++++++
>> include/linux/arch_topology.h | 23 +
>> include/linux/topology.h | 1 +
>> 9 files changed, 864 insertions(+), 799 deletions(-)
>> delete mode 100644 Documentation/devicetree/bindings/arm/topology.txt
>> create mode 100644 Documentation/devicetree/bindings/cpu/cpu-topology.txt
>>
>
> I was interested in testing these on QDF2400, an ARM64 platform, since
> this series touches core ARM64 code and I'd hate to see a regression.
> However, I can't figure out what baseline to use to apply these.
> Different patches cause different conflicts of a variety of baselines I
> attempted.
>
> What are these intended to apply to?
>
I had rebased them on top of 4.20-rc1.

> Also, you might want to run them through checkpatch next time. There
> are several whitespace errors.
>
Sorry I missed couple of them.
Thanks for trying to test the patches. I will send a next version as Rob
suggested. Please test that.

Regards,
Atish

2018-11-19 18:52:49

by Atish Patra

[permalink] [raw]

Subject: Re: [RFC 1/3] dt-binding: cpu-topology: Move cpu-map to a common binding.

On 11/17/18 8:33 AM, Rob Herring wrote:
> On Thu, Nov 08, 2018 at 05:50:07PM -0800, Atish Patra wrote:
>> cpu-map binding can be used to described cpu topology for both
>> RISC-V & ARM. It makes more sense to move the binding to document
>> to a common place.
>>
>> The relevant discussion can be found here.
>> https://lkml.org/lkml/2018/11/6/19
>>
>> Signed-off-by: Atish Patra <[email protected]>
>> ---
>> Documentation/devicetree/bindings/arm/topology.txt | 475 -------------------
>> .../devicetree/bindings/cpu/cpu-topology.txt | 526 +++++++++++++++++++++
>> 2 files changed, 526 insertions(+), 475 deletions(-)
>> delete mode 100644 Documentation/devicetree/bindings/arm/topology.txt
>> create mode 100644 Documentation/devicetree/bindings/cpu/cpu-topology.txt
>
> Please send this with '-M' option so we just get the real changes.
>

Sure. I will resend it.

Regards,
Atish
> Rob
>

2018-11-20 11:13:07

by Sudeep Holla

[permalink] [raw]

Subject: Re: [RFC 0/3] Unify CPU topology across ARM64 & RISC-V

On Thu, Nov 15, 2018 at 11:31:33AM -0700, Jeffrey Hugo wrote:

[...]

>
> I was interested in testing these on QDF2400, an ARM64 platform, since this
> series touches core ARM64 code and I'd hate to see a regression. However, I
> can't figure out what baseline to use to apply these. Different patches
> cause different conflicts of a variety of baselines I attempted.
>

Good to know that we can test DT configuration on QDF2400. I always assumed
it's ACPI only.

> What are these intended to apply to?
>

The series alone may not get the package/socket ids correct on QDF2400.
I have not yet added support for the same as I wanted to get the initial
feedback on DT bindings. The movement of DT binding and corresponding
code should not regress and you should be able to validate only that
part.

--
Regards,
Sudeep

2018-11-20 15:35:27

by Jeffrey Hugo

[permalink] [raw]

Subject: Re: [RFC 0/3] Unify CPU topology across ARM64 & RISC-V

On 11/20/2018 4:11 AM, Sudeep Holla wrote:
> On Thu, Nov 15, 2018 at 11:31:33AM -0700, Jeffrey Hugo wrote:
>
> [...]
>
>>
>> I was interested in testing these on QDF2400, an ARM64 platform, since this
>> series touches core ARM64 code and I'd hate to see a regression. However, I
>> can't figure out what baseline to use to apply these. Different patches
>> cause different conflicts of a variety of baselines I attempted.
>>
>
> Good to know that we can test DT configuration on QDF2400. I always assumed
> it's ACPI only.

It is ACPI only in the production configuration. I suppose we could
hack things up to do basic DT sanity, but I expect it would be nasty and
non-trivial.

>
>> What are these intended to apply to?
>>
>
> The series alone may not get the package/socket ids correct on QDF2400.
> I have not yet added support for the same as I wanted to get the initial
> feedback on DT bindings. The movement of DT binding and corresponding
> code should not regress and you should be able to validate only that
> part.
>

On a cursory glance, it looks like some of the reorganized code would
also be used in the ACPI path (things that are common between DT and
ACPI). I do not expect problems, but I still feel its prudent to do a
sanity check on actual hardware.

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.