2020-07-21 11:41:26

by Srikar Dronamraju

[permalink] [raw]
Subject: [PATCH v2 00/10] Coregroup support on Powerpc

Changelog v1 -> v2:
v1: https://lore.kernel.org/linuxppc-dev/[email protected]/t/#u

powerpc/smp: Merge Power9 topology with Power topology
Replaced a reference to cpu_smt_mask with per_cpu(cpu_sibling_map, cpu)
since cpu_smt_mask is only defined under CONFIG_SCHED_SMT

powerpc/smp: Enable small core scheduling sooner
Restored the previous info msg (Jordan)
Moved big core topology fixup to fixup_topology (Gautham)

powerpc/smp: Dont assume l2-cache to be superset of sibling
Set cpumask after verifying l2-cache. (Gautham)

powerpc/smp: Generalize 2nd sched domain
Moved shared_cache topology fixup to fixup_topology (Gautham)

Powerpc/numa: Detect support for coregroup
Explained Coregroup in commit msg (Michael Ellerman)

Powerpc/smp: Create coregroup domain
Moved coregroup topology fixup to fixup_topology (Gautham)

powerpc/smp: Implement cpu_to_coregroup_id
Move coregroup_enabled before getting associativity (Gautham)

powerpc/smp: Provide an ability to disable coregroup
Patch dropped (Michael Ellerman)

Cleanup of existing powerpc topologies and add coregroup support on
Powerpc. Coregroup is a group of (subset of) cores of a DIE that share
a resource.

Patch 7 of this patch series: "Powerpc/numa: Detect support for coregroup"
depends on
https://lore.kernel.org/linuxppc-dev/[email protected]/t/#u
However it should be easy to rebase the patch without the above patch.

This patch series is based on top of current powerpc/next tree + the
above patch.

On Power 8 Systems
------------------
$ tail /proc/cpuinfo
processor : 255
cpu : POWER8 (architected), altivec supported
clock : 3724.000000MHz
revision : 2.1 (pvr 004b 0201)

timebase : 512000000
platform : pSeries
model : IBM,8408-E8E
machine : CHRP IBM,8408-E8E
MMU : Hash

Before the patchset
-------------------
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
SMT
DIE
NUMA
NUMA
$ head /proc/schedstat
version 15
timestamp 4295534931
cpu0 0 0 0 0 0 0 41389823338 17682779896 14117
domain0 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,00000000,00000000,00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 27087859050 152273672 10396
domain0 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

After the patchset
------------------
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
SMT
DIE
NUMA
NUMA
$ head /proc/schedstat
version 15
timestamp 4295534931
cpu0 0 0 0 0 0 0 41389823338 17682779896 14117
domain0 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,00000000,00000000,00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 27087859050 152273672 10396
domain0 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

On Power 9 (with device-tree enablement to show coregroups).
(hunks for mimicing a coregroup was posted at
https://lore.kernel.org/linuxppc-dev/[email protected]/t/#m2cb09bb11c7a93257d6123d1d27edb8212f8af21)
-----------------------------------------------------------
$ tail /proc/cpuinfo
processor : 127
cpu : POWER9 (architected), altivec supported
clock : 3000.000000MHz
revision : 2.2 (pvr 004e 0202)

timebase : 512000000
platform : pSeries
model : IBM,9008-22L
machine : CHRP IBM,9008-22L
MMU : Hash

Before patchset
--------------
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
SMT
CACHE
DIE
NUMA

$ head /proc/schedstat
version 15
timestamp 4318242208
cpu0 0 0 0 0 0 0 28077107004 4773387362 78205
domain0 00000000,00000000,00000000,00000055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 24177439200 413887604 75393
domain0 00000000,00000000,00000000,000000aa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

After patchset
--------------
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
SMT
CACHE
MC
DIE
NUMA

$ head /proc/schedstat
version 15
timestamp 4318242208
cpu0 0 0 0 0 0 0 28077107004 4773387362 78205
domain0 00000000,00000000,00000000,00000055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain4 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 24177439200 413887604 75393
domain0 00000000,00000000,00000000,000000aa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Cc: linuxppc-dev <[email protected]>
Cc: LKML <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Oliver OHalloran <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Gautham R Shenoy <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Cc: Jordan Niethe <[email protected]>

Srikar Dronamraju (10):
powerpc/smp: Cache node for reuse
powerpc/smp: Merge Power9 topology with Power topology
powerpc/smp: Move powerpc_topology above
powerpc/smp: Enable small core scheduling sooner
powerpc/smp: Dont assume l2-cache to be superset of sibling
powerpc/smp: Generalize 2nd sched domain
Powerpc/numa: Detect support for coregroup
powerpc/smp: Allocate cpumask only after searching thread group
Powerpc/smp: Create coregroup domain
powerpc/smp: Implement cpu_to_coregroup_id

arch/powerpc/include/asm/smp.h | 1 +
arch/powerpc/include/asm/topology.h | 10 ++
arch/powerpc/kernel/smp.c | 255 +++++++++++++++++-----------
arch/powerpc/mm/numa.c | 59 +++++--
4 files changed, 213 insertions(+), 112 deletions(-)

--
2.17.1


2020-07-21 11:41:27

by Srikar Dronamraju

[permalink] [raw]
Subject: [PATCH v2 09/10] Powerpc/smp: Create coregroup domain

Add percpu coregroup maps and masks to create coregroup domain.
If a coregroup doesn't exist, the coregroup domain will be degenerated
in favour of SMT/CACHE domain.

Cc: linuxppc-dev <[email protected]>
Cc: LKML <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Oliver OHalloran <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Gautham R Shenoy <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Cc: Jordan Niethe <[email protected]>
Signed-off-by: Srikar Dronamraju <[email protected]>
---
Changelog v1 -> v2:
Powerpc/smp: Create coregroup domain
Moved coregroup topology fixup to fixup_topology (Gautham)

arch/powerpc/include/asm/topology.h | 10 ++++++++
arch/powerpc/kernel/smp.c | 38 +++++++++++++++++++++++++++++
arch/powerpc/mm/numa.c | 5 ++++
3 files changed, 53 insertions(+)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index f0b6300e7dd3..6609174918ab 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -88,12 +88,22 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)

#if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
extern int find_and_online_cpu_nid(int cpu);
+extern int cpu_to_coregroup_id(int cpu);
#else
static inline int find_and_online_cpu_nid(int cpu)
{
return 0;
}

+static inline int cpu_to_coregroup_id(int cpu)
+{
+#ifdef CONFIG_SMP
+ return cpu_to_core_id(cpu);
+#else
+ return 0;
+#endif
+}
+
#endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */

#include <asm-generic/topology.h>
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 97b762a1944a..a7e1366b7fd3 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -80,6 +80,7 @@ DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
DEFINE_PER_CPU(cpumask_var_t, cpu_core_map);
+DEFINE_PER_CPU(cpumask_var_t, cpu_coregroup_map);

EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
@@ -91,6 +92,7 @@ enum {
smt_idx,
#endif
bigcore_idx,
+ mc_idx,
die_idx,
};

@@ -869,6 +871,21 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
}
#endif

+static struct cpumask *cpu_coregroup_mask(int cpu)
+{
+ return per_cpu(cpu_coregroup_map, cpu);
+}
+
+static bool has_coregroup_support(void)
+{
+ return coregroup_enabled;
+}
+
+static const struct cpumask *cpu_mc_mask(int cpu)
+{
+ return cpu_coregroup_mask(cpu);
+}
+
static const struct cpumask *cpu_bigcore_mask(int cpu)
{
return per_cpu(cpu_sibling_map, cpu);
@@ -879,6 +896,7 @@ static struct sched_domain_topology_level powerpc_topology[] = {
{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
#endif
{ cpu_bigcore_mask, SD_INIT_NAME(BIGCORE) },
+ { cpu_mc_mask, SD_INIT_NAME(MC) },
{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
{ NULL, },
};
@@ -927,6 +945,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
GFP_KERNEL, node);
zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
GFP_KERNEL, node);
+ if (has_coregroup_support())
+ zalloc_cpumask_var_node(&per_cpu(cpu_coregroup_map, cpu),
+ GFP_KERNEL, node);
+
#ifdef CONFIG_NEED_MULTIPLE_NODES
/*
* numa_node_id() works after this.
@@ -944,6 +966,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
cpumask_set_cpu(boot_cpuid, cpu_l2_cache_mask(boot_cpuid));
cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));

+ if (has_coregroup_support())
+ cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
+
init_big_cores();
if (has_big_cores) {
cpumask_set_cpu(boot_cpuid,
@@ -1235,6 +1260,8 @@ static void remove_cpu_from_masks(int cpu)
set_cpus_unrelated(cpu, i, cpu_sibling_mask);
if (has_big_cores)
set_cpus_unrelated(cpu, i, cpu_smallcore_mask);
+ if (has_coregroup_support())
+ set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
}
}
#endif
@@ -1295,6 +1322,14 @@ static void add_cpu_to_masks(int cpu)
add_cpu_to_smallcore_masks(cpu);
update_mask_by_l2(cpu, cpu_l2_cache_mask);

+ if (has_coregroup_support()) {
+ cpumask_set_cpu(cpu, cpu_coregroup_mask(cpu));
+ for_each_cpu(i, cpu_online_mask) {
+ if (cpu_to_coregroup_id(cpu) == cpu_to_coregroup_id(i))
+ set_cpus_related(cpu, i, cpu_coregroup_mask);
+ }
+ }
+
if (pkg_id == -1) {
struct cpumask *(*mask)(int) = cpu_sibling_mask;

@@ -1386,6 +1421,9 @@ int setup_profiling_timer(unsigned int multiplier)

static void fixup_topology(void)
{
+ if (!has_coregroup_support())
+ powerpc_topology[mc_idx].mask = cpu_bigcore_mask;
+
if (shared_caches) {
pr_info("Using shared cache scheduler topology\n");
powerpc_topology[bigcore_idx].mask = shared_cache_mask;
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 3248160c0327..ef8aa580da21 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1216,6 +1216,11 @@ int find_and_online_cpu_nid(int cpu)
return new_nid;
}

+int cpu_to_coregroup_id(int cpu)
+{
+ return cpu_to_core_id(cpu);
+}
+
static int topology_update_init(void)
{
topology_inited = 1;
--
2.17.1

2020-07-21 11:41:43

by Srikar Dronamraju

[permalink] [raw]
Subject: [PATCH v2 03/10] powerpc/smp: Move powerpc_topology above

Just moving the powerpc_topology description above.
This will help in using functions in this file and avoid declarations.

No other functional changes

Cc: linuxppc-dev <[email protected]>
Cc: LKML <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Oliver OHalloran <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Gautham R Shenoy <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Cc: Jordan Niethe <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Srikar Dronamraju <[email protected]>
---
arch/powerpc/kernel/smp.c | 116 +++++++++++++++++++-------------------
1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 0e0b118d9b6e..1ce95da00cb6 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -818,6 +818,64 @@ static int init_cpu_l1_cache_map(int cpu)
return err;
}

+static bool shared_caches;
+
+#ifdef CONFIG_SCHED_SMT
+/* cpumask of CPUs with asymmetric SMT dependency */
+static int powerpc_smt_flags(void)
+{
+ int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
+
+ if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
+ printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
+ flags |= SD_ASYM_PACKING;
+ }
+ return flags;
+}
+#endif
+
+/*
+ * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
+ * This topology makes it *much* cheaper to migrate tasks between adjacent cores
+ * since the migrated task remains cache hot. We want to take advantage of this
+ * at the scheduler level so an extra topology level is required.
+ */
+static int powerpc_shared_cache_flags(void)
+{
+ return SD_SHARE_PKG_RESOURCES;
+}
+
+/*
+ * We can't just pass cpu_l2_cache_mask() directly because
+ * returns a non-const pointer and the compiler barfs on that.
+ */
+static const struct cpumask *shared_cache_mask(int cpu)
+{
+ if (shared_caches)
+ return cpu_l2_cache_mask(cpu);
+
+ if (has_big_cores)
+ return cpu_smallcore_mask(cpu);
+
+ return per_cpu(cpu_sibling_map, cpu);
+}
+
+#ifdef CONFIG_SCHED_SMT
+static const struct cpumask *smallcore_smt_mask(int cpu)
+{
+ return cpu_smallcore_mask(cpu);
+}
+#endif
+
+static struct sched_domain_topology_level powerpc_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+ { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+ { shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
+ { cpu_cpu_mask, SD_INIT_NAME(DIE) },
+ { NULL, },
+};
+
static int init_big_cores(void)
{
int cpu;
@@ -1249,8 +1307,6 @@ static void add_cpu_to_masks(int cpu)
set_cpus_related(cpu, i, cpu_core_mask);
}

-static bool shared_caches;
-
/* Activate a secondary processor. */
void start_secondary(void *unused)
{
@@ -1314,62 +1370,6 @@ int setup_profiling_timer(unsigned int multiplier)
return 0;
}

-#ifdef CONFIG_SCHED_SMT
-/* cpumask of CPUs with asymmetric SMT dependency */
-static int powerpc_smt_flags(void)
-{
- int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
-
- if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
- printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
- flags |= SD_ASYM_PACKING;
- }
- return flags;
-}
-#endif
-
-/*
- * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
- * This topology makes it *much* cheaper to migrate tasks between adjacent cores
- * since the migrated task remains cache hot. We want to take advantage of this
- * at the scheduler level so an extra topology level is required.
- */
-static int powerpc_shared_cache_flags(void)
-{
- return SD_SHARE_PKG_RESOURCES;
-}
-
-/*
- * We can't just pass cpu_l2_cache_mask() directly because
- * returns a non-const pointer and the compiler barfs on that.
- */
-static const struct cpumask *shared_cache_mask(int cpu)
-{
- if (shared_caches)
- return cpu_l2_cache_mask(cpu);
-
- if (has_big_cores)
- return cpu_smallcore_mask(cpu);
-
- return per_cpu(cpu_sibling_map, cpu);
-}
-
-#ifdef CONFIG_SCHED_SMT
-static const struct cpumask *smallcore_smt_mask(int cpu)
-{
- return cpu_smallcore_mask(cpu);
-}
-#endif
-
-static struct sched_domain_topology_level powerpc_topology[] = {
-#ifdef CONFIG_SCHED_SMT
- { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
-#endif
- { shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
- { cpu_cpu_mask, SD_INIT_NAME(DIE) },
- { NULL, },
-};
-
void __init smp_cpus_done(unsigned int max_cpus)
{
/*
--
2.17.1

2020-07-21 11:42:26

by Srikar Dronamraju

[permalink] [raw]
Subject: [PATCH v2 07/10] Powerpc/numa: Detect support for coregroup

Add support for grouping cores based on the device-tree classification.
- The last domain in the associativity domains always refers to the
core.
- If primary reference domain happens to be the penultimate domain in
the associativity domains device-tree property, then there are no
coregroups. However if its not a penultimate domain, then there are
coregroups. There can be more than one coregroup. For now we would be
interested in the last or the smallest coregroups.

Cc: linuxppc-dev <[email protected]>
Cc: LKML <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Oliver OHalloran <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Gautham R Shenoy <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Cc: Jordan Niethe <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Srikar Dronamraju <[email protected]>
---
Changelog v1 -> v2:
Powerpc/numa: Detect support for coregroup
Explained Coregroup in commit msg (Michael Ellerman)

arch/powerpc/include/asm/smp.h | 1 +
arch/powerpc/kernel/smp.c | 1 +
arch/powerpc/mm/numa.c | 34 +++++++++++++++++++++-------------
3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 49a25e2400f2..5bdc17a7049f 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -28,6 +28,7 @@
extern int boot_cpuid;
extern int spinning_secondaries;
extern u32 *cpu_to_phys_id;
+extern bool coregroup_enabled;

extern void cpu_die(void);
extern int cpu_to_chip_id(int cpu);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 933ebdf97432..320e36a0ec0b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -74,6 +74,7 @@ static DEFINE_PER_CPU(int, cpu_state) = { 0 };

struct task_struct *secondary_current;
bool has_big_cores;
+bool coregroup_enabled;

DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index bc5b2e8112c8..3248160c0327 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -886,7 +886,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
static void __init find_possible_nodes(void)
{
struct device_node *rtas;
- u32 numnodes, i;
+ const __be32 *domains;
+ int prop_length, max_nodes;
+ u32 i;

if (!numa_enabled)
return;
@@ -895,25 +897,31 @@ static void __init find_possible_nodes(void)
if (!rtas)
return;

- if (of_property_read_u32_index(rtas, "ibm,current-associativity-domains",
- min_common_depth, &numnodes)) {
- /*
- * ibm,current-associativity-domains is a fairly recent
- * property. If it doesn't exist, then fallback on
- * ibm,max-associativity-domains. Current denotes what the
- * platform can support compared to max which denotes what the
- * Hypervisor can support.
- */
- if (of_property_read_u32_index(rtas, "ibm,max-associativity-domains",
- min_common_depth, &numnodes))
+ /*
+ * ibm,current-associativity-domains is a fairly recent property. If
+ * it doesn't exist, then fallback on ibm,max-associativity-domains.
+ * Current denotes what the platform can support compared to max
+ * which denotes what the Hypervisor can support.
+ */
+ domains = of_get_property(rtas, "ibm,current-associativity-domains",
+ &prop_length);
+ if (!domains) {
+ domains = of_get_property(rtas, "ibm,max-associativity-domains",
+ &prop_length);
+ if (!domains)
goto out;
}

- for (i = 0; i < numnodes; i++) {
+ max_nodes = of_read_number(&domains[min_common_depth], 1);
+ for (i = 0; i < max_nodes; i++) {
if (!node_possible(i))
node_set(i, node_possible_map);
}

+ prop_length /= sizeof(int);
+ if (prop_length > min_common_depth + 2)
+ coregroup_enabled = 1;
+
out:
of_node_put(rtas);
}
--
2.17.1

2020-07-21 11:43:28

by Srikar Dronamraju

[permalink] [raw]
Subject: [PATCH v2 10/10] powerpc/smp: Implement cpu_to_coregroup_id

Lookup the coregroup id from the associativity array.

If unable to detect the coregroup id, fallback on the core id.
This way, ensure sched_domain degenerates and an extra sched domain is
not created.

Ideally this function should have been implemented in
arch/powerpc/kernel/smp.c. However if its implemented in mm/numa.c, we
don't need to find the primary domain again.

If the device-tree mentions more than one coregroup, then kernel
implements only the last or the smallest coregroup, which currently
corresponds to the penultimate domain in the device-tree.

Cc: linuxppc-dev <[email protected]>
Cc: LKML <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Oliver OHalloran <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Gautham R Shenoy <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Cc: Jordan Niethe <[email protected]>
Signed-off-by: Srikar Dronamraju <[email protected]>
---
Changelog v1 -> v2:
powerpc/smp: Implement cpu_to_coregroup_id
Move coregroup_enabled before getting associativity (Gautham)

arch/powerpc/mm/numa.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index ef8aa580da21..ae57b68beaee 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1218,6 +1218,26 @@ int find_and_online_cpu_nid(int cpu)

int cpu_to_coregroup_id(int cpu)
{
+ __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+ int index;
+
+ if (cpu < 0 || cpu > nr_cpu_ids)
+ return -1;
+
+ if (!coregroup_enabled)
+ goto out;
+
+ if (!firmware_has_feature(FW_FEATURE_VPHN))
+ goto out;
+
+ if (vphn_get_associativity(cpu, associativity))
+ goto out;
+
+ index = of_read_number(associativity, 1);
+ if (index > min_common_depth + 1)
+ return of_read_number(&associativity[index - 1], 1);
+
+out:
return cpu_to_core_id(cpu);
}

--
2.17.1

2020-07-22 07:07:42

by Gautham R Shenoy

[permalink] [raw]
Subject: Re: [PATCH v2 10/10] powerpc/smp: Implement cpu_to_coregroup_id

On Tue, Jul 21, 2020 at 05:08:14PM +0530, Srikar Dronamraju wrote:
> Lookup the coregroup id from the associativity array.
>
> If unable to detect the coregroup id, fallback on the core id.
> This way, ensure sched_domain degenerates and an extra sched domain is
> not created.
>
> Ideally this function should have been implemented in
> arch/powerpc/kernel/smp.c. However if its implemented in mm/numa.c, we
> don't need to find the primary domain again.
>
> If the device-tree mentions more than one coregroup, then kernel
> implements only the last or the smallest coregroup, which currently
> corresponds to the penultimate domain in the device-tree.
>
> Cc: linuxppc-dev <[email protected]>
> Cc: LKML <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Nick Piggin <[email protected]>
> Cc: Oliver OHalloran <[email protected]>
> Cc: Nathan Lynch <[email protected]>
> Cc: Michael Neuling <[email protected]>
> Cc: Anton Blanchard <[email protected]>
> Cc: Gautham R Shenoy <[email protected]>
> Cc: Vaidyanathan Srinivasan <[email protected]>
> Cc: Jordan Niethe <[email protected]>
> Signed-off-by: Srikar Dronamraju <[email protected]>

Looks good to me.

Reviewed-by : Gautham R. Shenoy <[email protected]>


> ---
> Changelog v1 -> v2:
> powerpc/smp: Implement cpu_to_coregroup_id
> Move coregroup_enabled before getting associativity (Gautham)
>
> arch/powerpc/mm/numa.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index ef8aa580da21..ae57b68beaee 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1218,6 +1218,26 @@ int find_and_online_cpu_nid(int cpu)
>
> int cpu_to_coregroup_id(int cpu)
> {
> + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> + int index;
> +
> + if (cpu < 0 || cpu > nr_cpu_ids)
> + return -1;
> +
> + if (!coregroup_enabled)
> + goto out;
> +
> + if (!firmware_has_feature(FW_FEATURE_VPHN))
> + goto out;
> +
> + if (vphn_get_associativity(cpu, associativity))
> + goto out;
> +
> + index = of_read_number(associativity, 1);
> + if (index > min_common_depth + 1)
> + return of_read_number(&associativity[index - 1], 1);
> +
> +out:
> return cpu_to_core_id(cpu);
> }
>
> --
> 2.17.1
>

2020-07-22 07:08:09

by Gautham R Shenoy

[permalink] [raw]
Subject: Re: [PATCH v2 09/10] Powerpc/smp: Create coregroup domain

Hi Srikar,

On Tue, Jul 21, 2020 at 05:08:13PM +0530, Srikar Dronamraju wrote:
> Add percpu coregroup maps and masks to create coregroup domain.
> If a coregroup doesn't exist, the coregroup domain will be degenerated
> in favour of SMT/CACHE domain.
>
> Cc: linuxppc-dev <[email protected]>
> Cc: LKML <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Nick Piggin <[email protected]>
> Cc: Oliver OHalloran <[email protected]>
> Cc: Nathan Lynch <[email protected]>
> Cc: Michael Neuling <[email protected]>
> Cc: Anton Blanchard <[email protected]>
> Cc: Gautham R Shenoy <[email protected]>
> Cc: Vaidyanathan Srinivasan <[email protected]>
> Cc: Jordan Niethe <[email protected]>
> Signed-off-by: Srikar Dronamraju <[email protected]>

A query below.

> ---
> Changelog v1 -> v2:
> Powerpc/smp: Create coregroup domain
> Moved coregroup topology fixup to fixup_topology (Gautham)
>
> arch/powerpc/include/asm/topology.h | 10 ++++++++
> arch/powerpc/kernel/smp.c | 38 +++++++++++++++++++++++++++++
> arch/powerpc/mm/numa.c | 5 ++++
> 3 files changed, 53 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index f0b6300e7dd3..6609174918ab 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h

[..snip..]

> @@ -91,6 +92,7 @@ enum {
> smt_idx,
> #endif
> bigcore_idx,
> + mc_idx,
> die_idx,
> };
>


[..snip..]

> @@ -879,6 +896,7 @@ static struct sched_domain_topology_level powerpc_topology[] = {
> { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> #endif
> { cpu_bigcore_mask, SD_INIT_NAME(BIGCORE) },
> + { cpu_mc_mask, SD_INIT_NAME(MC) },
> { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> { NULL, },
> };


[..snip..]

> @@ -1386,6 +1421,9 @@ int setup_profiling_timer(unsigned int multiplier)
>
> static void fixup_topology(void)
> {
> + if (!has_coregroup_support())
> + powerpc_topology[mc_idx].mask = cpu_bigcore_mask;
> +

Shouldn't we move this condition after doing the fixup for shared
caches ? Because if we have shared_caches, but not core_group, then we
want the coregroup domain to degenerate correctly.


> if (shared_caches) {
> pr_info("Using shared cache scheduler topology\n");
> powerpc_topology[bigcore_idx].mask = shared_cache_mask;


--
Thanks and regards
gautham.

2020-07-22 07:33:01

by Srikar Dronamraju

[permalink] [raw]
Subject: Re: [PATCH v2 09/10] Powerpc/smp: Create coregroup domain

> > @@ -1386,6 +1421,9 @@ int setup_profiling_timer(unsigned int multiplier)
> >
> > static void fixup_topology(void)
> > {
> > + if (!has_coregroup_support())
> > + powerpc_topology[mc_idx].mask = cpu_bigcore_mask;
> > +
>
> Shouldn't we move this condition after doing the fixup for shared
> caches ? Because if we have shared_caches, but not core_group, then we
> want the coregroup domain to degenerate correctly.
>

Currently we aren't consolidating, and hence the order doesn't matter for
degeneration. However even if in future, if we want to consolidate the
domains before calling set_sched_topology(), this order would be ideal.

>
> > if (shared_caches) {
> > pr_info("Using shared cache scheduler topology\n");
> > powerpc_topology[bigcore_idx].mask = shared_cache_mask;
>

--
Thanks and Regards
Srikar Dronamraju