2022-04-22 21:02:42

by 王擎 (Wang Qing)

Subject: [PATCH V2 2/2] arm64: Add complex scheduler level for arm64

From: Wang Qing <[email protected]>

The DSU-110 DynamIQ™ cluster supports blocks called complexes, each of which
contains up to two cores of the same type plus some shared logic. Sharing
logic between the cores makes a complex area-efficient.

This patch adds a scheduler domain level for complexes, which
automatically enables load balancing among complexes. This directly
benefits workloads that need more resources, such as memory bandwidth
and caches.
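To make the new level concrete, here is a minimal userspace sketch of what a per-CPU complex mask contains, assuming the 8-core layout of the test machine described below. The `complex_id` table, `cpumask_model_t` and `complex_mask()` are hypothetical illustration names, not kernel APIs, and the CPU numbering is an assumption. Each CPU's complex mask is the set of CPUs with the same complex ID, so paired little cores see a two-CPU mask while every other core sees only itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 8-CPU layout: CPUs 0-3 are little cores paired into two
 * complexes, CPUs 4-6 are medium cores and CPU 7 is the big core. A CPU
 * that is not part of a multi-core complex gets a unique ID. These IDs
 * are assumptions for illustration, not values read from firmware.
 */
#define NR_CPUS 8

static const int complex_id[NR_CPUS] = { 0, 0, 1, 1, 2, 3, 4, 5 };

/* Model a cpumask as a bitmask: bit N set means CPU N is in the mask. */
typedef uint32_t cpumask_model_t;

/* Return the mask of CPUs that share a complex with @cpu, including itself. */
static cpumask_model_t complex_mask(int cpu)
{
	cpumask_model_t mask = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);
	for (int i = 0; i < NR_CPUS; i++)
		if (complex_id[i] == complex_id[cpu])
			mask |= (cpumask_model_t)1 << i;
	return mask;
}
```

For example, `complex_mask(0)` is `0x3` (CPUs 0 and 1) and `complex_mask(7)` is `0x80` (the big core alone). A level whose span is a single CPU is normally degenerated (dropped) by the scheduler for that CPU, so in this layout only the little cores would gain a CPL domain.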

Testing was done with the Stream benchmark:
8 threads of stream (2 complexes x 2 little cores + 3 medium cores + 1 big core)

                          stream              stream
                       w/o patch            w/ patch
MB/sec copy     37579.2 ( 0.00%)    39127.3 ( 4.12%)
MB/sec scale    38261.1 ( 0.00%)    39195.4 ( 2.44%)
MB/sec add      39497.0 ( 0.00%)    41101.5 ( 4.06%)
MB/sec triad    39885.6 ( 0.00%)    40772.7 ( 2.22%)
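The percentages in parentheses are the relative change against the w/o-patch column. The small sketch below reproduces that arithmetic; `pct_change()` and `round2()` are illustrative helper names, not part of the benchmark's tooling.

```c
#include <assert.h>

/* Relative change of @after versus @before, in percent. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/* Round a non-negative value to two decimals, as printed in the table. */
static double round2(double x)
{
	return (double)(long)(x * 100.0 + 0.5) / 100.0;
}
```

For copy, (39127.3 - 37579.2) / 37579.2 * 100 is about 4.1196, which rounds to the 4.12% shown above.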

To support this feature, the patch defines arm64_topology, an arm64-specific
scheduler topology table that is registered with set_sched_topology().
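arm64_topology lists levels from smallest to largest and ends with a `{ NULL, }` sentinel. The scheduler expects each level's span to contain the span of the level below it; when that fails, kernel/sched/topology.c reports a "BUG: arch topology borken" error. The userspace sketch below walks such a table and checks that nesting. It assumes a hypothetical 8-CPU layout and uses a simplified, hypothetical `struct topo_level` in place of the real `struct sched_domain_topology_level`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit N set means CPU N is in the mask (illustrative stand-in for cpumask). */
typedef uint32_t mask_t;

/* Simplified, hypothetical stand-in for struct sched_domain_topology_level. */
struct topo_level {
	mask_t (*mask)(int cpu);
	const char *name;
};

/* Hypothetical layout: CPUs 0-3 form two complexes, {0,1} and {2,3}. */
static mask_t cpl_mask(int cpu)
{
	if (cpu < 4)
		return (mask_t)0x3 << (cpu & ~1);
	return (mask_t)1 << cpu;	/* not in a multi-core complex */
}

/* Assume all eight cores share one core group and one die. */
static mask_t mc_mask(int cpu)  { (void)cpu; return 0xFF; }
static mask_t die_mask(int cpu) { (void)cpu; return 0xFF; }

/* A deliberately wrong level that leaves out CPUs 0-3. */
static mask_t bad_mask(int cpu) { (void)cpu; return 0xF0; }

/* Levels ordered smallest to largest, NULL-terminated like arm64_topology. */
static const struct topo_level good_topology[] = {
	{ cpl_mask, "CPL" }, { mc_mask, "MC" }, { die_mask, "DIE" }, { NULL, NULL },
};
static const struct topo_level bad_topology[] = {
	{ cpl_mask, "CPL" }, { bad_mask, "MC" }, { die_mask, "DIE" }, { NULL, NULL },
};

/*
 * Walk the table up to the NULL sentinel and check that, for every CPU,
 * each level's span contains the span of the level below it.
 */
static bool topology_nested(const struct topo_level *tl, int nr_cpus)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		mask_t child = 0;

		for (const struct topo_level *l = tl; l->mask; l++) {
			mask_t span = l->mask(cpu);

			if (child & ~span)	/* child is not a subset of parent */
				return false;
			child = span;
		}
	}
	return true;
}
```

`topology_nested(good_topology, 8)` returns true, while `bad_topology` fails on CPU 0 because its CPL span `0x3` is not inside the bogus MC span `0xF0`.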

V2:
Fix the commit log and loop in more people.

Signed-off-by: Wang Qing <[email protected]>
---
arch/arm64/Kconfig | 13 +++++++++++
arch/arm64/kernel/smp.c | 48 ++++++++++++++++++++++++++++++++++++++++-
2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index edbe035cb0e3..4063de8c6153 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1207,6 +1207,19 @@ config SCHED_CLUSTER
by sharing mid-level caches, last-level cache tags or internal
busses.

+config SCHED_COMPLEX
+ bool "Complex scheduler support"
+ help
+ DSU supports blocks that are called complexes, which contain up to
+ two cores of the same type and some shared logic. Sharing logic
+ between the cores makes a complex area-efficient.
+
+ A complex can also be considered a shared-cache group smaller
+ than a cluster.
+
+ Complex scheduler support improves the CPU scheduler's decision
+ making when dealing with machines that have complexes of CPUs.
+
config SCHED_SMT
bool "SMT scheduler support"
help
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 3b46041f2b97..526765112146 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -14,6 +14,7 @@
#include <linux/sched/mm.h>
#include <linux/sched/hotplug.h>
#include <linux/sched/task_stack.h>
+#include <linux/sched/topology.h>
#include <linux/interrupt.h>
#include <linux/cache.h>
#include <linux/profile.h>
@@ -57,6 +58,10 @@
DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
EXPORT_PER_CPU_SYMBOL(cpu_number);

+#ifdef SCHED_COMPLEX
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_t, cpu_complex_map);
+#endif
+
/*
* as from 2.5, kernels no longer have an init_tasks structure
* so we need some other way of telling a new secondary core
@@ -715,6 +720,47 @@ void __init smp_init_cpus(void)
}
}

+#ifdef SCHED_COMPLEX
+static int arm64_complex_flags(void)
+{
+ return SD_SHARE_PKG_RESOURCES;
+}
+
+const struct cpumask *arm64_complex_mask(int cpu)
+{
+ const struct cpumask *core_mask = cpu_cpu_mask(cpu);
+
+ /* Find a shared-cache level smaller than the cluster and core groups */
+#ifdef CONFIG_SCHED_MC
+ core_mask = cpu_coregroup_mask(cpu);
+#endif
+#ifdef CONFIG_SCHED_CLUSTER
+ core_mask = cpu_clustergroup_mask(cpu);
+#endif
+
+ find_subset_of_share_cache(core_mask, cpu, &per_cpu(cpu_complex_map, cpu));
+
+ return &per_cpu(cpu_complex_map, cpu);
+}
+#endif
+
+static struct sched_domain_topology_level arm64_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+ { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+#ifdef CONFIG_SCHED_COMPLEX
+ { arm64_complex_mask, arm64_complex_flags, SD_INIT_NAME(CPL) },
+#endif
+#ifdef CONFIG_SCHED_CLUSTER
+ { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
+#endif
+#ifdef CONFIG_SCHED_MC
+ { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+#endif
+ { cpu_cpu_mask, SD_INIT_NAME(DIE) },
+ { NULL, },
+};
+
void __init smp_prepare_cpus(unsigned int max_cpus)
{
const struct cpu_operations *ops;
@@ -723,9 +769,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
unsigned int this_cpu;

init_cpu_topology();
-
this_cpu = smp_processor_id();
store_cpu_topology(this_cpu);
+ set_sched_topology(arm64_topology);
numa_store_cpu_info(this_cpu);
numa_add_cpu(this_cpu);

--
2.7.4
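arm64_complex_mask() first narrows the search scope and then calls find_subset_of_share_cache(), which comes from patch 1/2 and is not shown here. Because the #ifdef blocks assign core_mask in sequence, the smallest enabled level wins: the cluster group if CONFIG_SCHED_CLUSTER is set, otherwise the core group if CONFIG_SCHED_MC is set, otherwise cpu_cpu_mask(). The userspace sketch below mirrors that order. The per-level spans, `shared_cache_mask()` and `complex_span()` are hypothetical stand-ins, and the sketch assumes (not confirmed by this patch) that the 1/2 helper returns the CPUs of the parent mask that share a cache with the given CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t mask_t;	/* bit N set means CPU N is in the mask */

/*
 * Hypothetical spans on an 8-CPU machine: little cores 0-3 form one
 * cluster, medium cores 4-6 another, big core 7 a third; all eight
 * share one core group and one die.
 */
static mask_t die_span(int cpu) { (void)cpu; return 0xFF; }
static mask_t mc_span(int cpu)  { (void)cpu; return 0xFF; }
static mask_t cls_span(int cpu)
{
	if (cpu < 4)
		return 0x0F;
	if (cpu < 7)
		return 0x70;
	return 0x80;
}

/* Hypothetical: little cores share a complex in pairs {0,1} and {2,3}. */
static mask_t shared_cache_mask(int cpu)
{
	if (cpu < 4)
		return (mask_t)0x3 << (cpu & ~1);
	return (mask_t)1 << cpu;
}

/*
 * Mirror the #ifdef cascade in arm64_complex_mask(): start from the die
 * span, then let each enabled, smaller level overwrite it.
 */
static mask_t complex_parent(int cpu, bool sched_mc, bool sched_cluster)
{
	mask_t parent = die_span(cpu);		/* cpu_cpu_mask() */

	if (sched_mc)
		parent = mc_span(cpu);		/* cpu_coregroup_mask() */
	if (sched_cluster)
		parent = cls_span(cpu);		/* cpu_clustergroup_mask() */
	return parent;
}

/* Assumed behaviour of the 1/2 helper: parent CPUs sharing a complex with @cpu. */
static mask_t complex_span(int cpu, bool sched_mc, bool sched_cluster)
{
	return complex_parent(cpu, sched_mc, sched_cluster) & shared_cache_mask(cpu);
}
```

With both MC and cluster support enabled, CPU 2's parent is its cluster (`0x0F`) and its complex span is `0xC`; with neither enabled, the search falls back to the whole die.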


2022-04-27 10:26:31

by kernel test robot

Subject: Re: [PATCH V2 2/2] arm64: Add complex scheduler level for arm64

Hi Qing,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on driver-core/driver-core-testing linus/master arm-perf/for-next/perf v5.18-rc4 next-20220422]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Qing-Wang/Add-complex-scheduler-level-for-arm64/20220422-201107
base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20220427/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1cddcfdc3c683b393df1a5c9063252eb60e52818)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/3b18155ccd99fb790e719fa432366dfdb97ab57c
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Qing-Wang/Add-complex-scheduler-level-for-arm64/20220422-201107
git checkout 3b18155ccd99fb790e719fa432366dfdb97ab57c
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

>> arch/arm64/kernel/smp.c:752:4: error: use of undeclared identifier 'arm64_complex_mask'
        { arm64_complex_mask, arm64_complex_flags, SD_INIT_NAME(CPL) },
          ^
>> arch/arm64/kernel/smp.c:752:24: error: use of undeclared identifier 'arm64_complex_flags'
        { arm64_complex_mask, arm64_complex_flags, SD_INIT_NAME(CPL) },
                              ^
2 errors generated.


vim +/arm64_complex_mask +752 arch/arm64/kernel/smp.c

746
747 static struct sched_domain_topology_level arm64_topology[] = {
748 #ifdef CONFIG_SCHED_SMT
749 { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
750 #endif
751 #ifdef CONFIG_SCHED_COMPLEX
> 752 { arm64_complex_mask, arm64_complex_flags, SD_INIT_NAME(CPL) },
753 #endif
754 #ifdef CONFIG_SCHED_CLUSTER
755 { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
756 #endif
757 #ifdef CONFIG_SCHED_MC
758 { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
759 #endif
760 { cpu_cpu_mask, SD_INIT_NAME(DIE) },
761 { NULL, },
762 };
763
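Both errors appear to trace back to the guards in the patch: cpu_complex_map, arm64_complex_flags() and arm64_complex_mask() sit under `#ifdef SCHED_COMPLEX`, while the table entry at line 752 is guarded by `#ifdef CONFIG_SCHED_COMPLEX`. Kconfig options reach C code only with the CONFIG_ prefix, so with CONFIG_SCHED_COMPLEX=y (as in allyesconfig) the definitions are compiled out while the table still references them. A likely fix is to test CONFIG_SCHED_COMPLEX in both places; the resulting code would still rely on find_subset_of_share_cache() from patch 1/2. The standalone sketch below demonstrates the preprocessor behaviour; the two flag variables are illustrative only.

```c
#include <assert.h>

/* Kconfig exports enabled options to C with a CONFIG_ prefix (autoconf.h). */
#define CONFIG_SCHED_COMPLEX 1

#ifdef SCHED_COMPLEX		/* as in the patch: never defined, so compiled out */
static int buggy_guard_compiled = 1;
#else
static int buggy_guard_compiled = 0;
#endif

#ifdef CONFIG_SCHED_COMPLEX	/* matches the guard used around the table entry */
static int fixed_guard_compiled = 1;
#else
static int fixed_guard_compiled = 0;
#endif
```

Only the CONFIG_-prefixed guard sees the option, which is why the table entry is compiled in while the functions it names are not.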

--
0-DAY CI Kernel Test Service
https://01.org/lkp