2022-04-22 20:25:08

by Greg Kroah-Hartman

Subject: Re: [PATCH V2 2/2] arm64: Add complex scheduler level for arm64

On Fri, Apr 22, 2022 at 04:51:26AM -0700, Qing Wang wrote:
> From: Wang Qing <[email protected]>
>
> The DSU-110 DynamIQ™ cluster supports blocks that are called complexes
> which contain up to two cores of the same type and some shared logic.
> Sharing some logic between the cores can make a complex area efficient.
>
> This patch adds complex level for complexs and automatically enables
> the load balance among complexs. It will directly benefit a lot of
> workload which loves more resources such as memory bandwidth, caches.
>
> Testing has been done with Stream benchmark:
> 8threads stream (2 little cores * 2(complex) + 3 medium cores + 1 big core)
>                 stream                 stream
>                 w/o patch              w/ patch
> MB/sec copy     37579.2 (   0.00%)    39127.3 (   4.12%)
> MB/sec scale    38261.1 (   0.00%)    39195.4 (   2.44%)
> MB/sec add      39497.0 (   0.00%)    41101.5 (   4.06%)
> MB/sec triad    39885.6 (   0.00%)    40772.7 (   2.22%)
>
> And in order to support this features, we defined arm64_topology.
>
> V2:
> fix commit log and loop more
>
> Signed-off-by: Wang Qing <[email protected]>
> ---
>  arch/arm64/Kconfig      | 13 +++++++++++
>  arch/arm64/kernel/smp.c | 48 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index edbe035cb0e3..4063de8c6153 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1207,6 +1207,19 @@ config SCHED_CLUSTER
>          by sharing mid-level caches, last-level cache tags or internal
>          busses.
>
> +config SCHED_COMPLEX
> +     bool "Complex scheduler support"
> +     help
> +       DSU supports blocks that are called complexes which contain up to
> +       two cores of the same type and some shared logic. Sharing some logic
> +       between the cores can make a complex area efficient.
> +
> +       Complex also can be considered as a shared cache group smaller
> +       than cluster.
> +
> +       Complex scheduler support improves the CPU scheduler's decision
> +       making when dealing with machines that have complexs of CPUs.
> +
>  config SCHED_SMT
>        bool "SMT scheduler support"
>        help
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 3b46041f2b97..526765112146 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -14,6 +14,7 @@
> #include <linux/sched/mm.h>
> #include <linux/sched/hotplug.h>
> #include <linux/sched/task_stack.h>
> +#include <linux/sched/topology.h>
> #include <linux/interrupt.h>
> #include <linux/cache.h>
> #include <linux/profile.h>
> @@ -57,6 +58,10 @@
> DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
> EXPORT_PER_CPU_SYMBOL(cpu_number);
>
> +#ifdef SCHED_COMPLEX
> +DEFINE_PER_CPU_READ_MOSTLY(cpumask_t, cpu_complex_map);
> +#endif

ifdefs should not be in .c files.


> +
>  /*
>   * as from 2.5, kernels no longer have an init_tasks structure
>   * so we need some other way of telling a new secondary core
> @@ -715,6 +720,47 @@ void __init smp_init_cpus(void)
>        }
>  }
>
> +#ifdef SCHED_COMPLEX

same here.

> +static int arm64_complex_flags(void)
> +{
> +     return SD_SHARE_PKG_RESOURCES;
> +}
> +
> +const struct cpumask *arm64_complex_mask(int cpu)
> +{
> +     const struct cpumask *core_mask = cpu_cpu_mask(cpu);
> +
> +     /* Find the smaller shared cache level than clustergroup and coregroup*/
> +#ifdef CONFIG_SCHED_MC
> +     core_mask = cpu_coregroup_mask(cpu);
> +#endif
> +#ifdef CONFIG_SCHED_CLUSTER
> +     core_mask = cpu_clustergroup_mask(cpu);
> +#endif

See, same here. This is a mess and unmaintainable.

thanks,

greg k-h
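
For readers unfamiliar with the convention behind "ifdefs should not be in .c files": the kernel coding style suggests keeping preprocessor conditionals in a header and giving the disabled case a stub, so the .c file calls one function unconditionally. The following is only a user-space sketch of that idea, not code from the patch; cpu_complex_id() and cpus_share_complex() are hypothetical names invented for illustration, and the "header" and ".c" parts are shown in one file for brevity.

```c
/* --- Would live in a header (hypothetical, for illustration only) --- */
#ifdef CONFIG_SCHED_COMPLEX
/* Real lookup, implemented elsewhere when the option is enabled. */
int cpu_complex_id(int cpu);
#else
/* Stub: without complex support, treat every CPU as its own complex. */
static inline int cpu_complex_id(int cpu)
{
	return cpu;
}
#endif

/* --- Would live in the .c file: no preprocessor conditionals here --- */
int cpus_share_complex(int a, int b)
{
	return cpu_complex_id(a) == cpu_complex_id(b);
}
```

With CONFIG_SCHED_COMPLEX left undefined, as in this sketch, the stub is used and two CPUs share a complex only when they are the same CPU.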


2022-04-24 03:26:41

by 王擎

Subject: [PATCH V2 2/2] arm64: Add complex scheduler level for arm64


>> From: Wang Qing <[email protected]>
>>
>> The DSU-110 DynamIQ™ cluster supports blocks that are called complexes
>> which contain up to two cores of the same type and some shared logic.
>> Sharing some logic between the cores can make a complex area efficient.
>>
>> This patch adds complex level for complexs and automatically enables
>> the load balance among complexs. It will directly benefit a lot of
>> workload which loves more resources such as memory bandwidth, caches.
>>
>> Testing has been done with Stream benchmark:
>> 8threads stream (2 little cores * 2(complex) + 3 medium cores + 1 big core)
>>                 stream                 stream
>>                 w/o patch              w/ patch
>> MB/sec copy     37579.2 (   0.00%)    39127.3 (   4.12%)
>> MB/sec scale    38261.1 (   0.00%)    39195.4 (   2.44%)
>> MB/sec add      39497.0 (   0.00%)    41101.5 (   4.06%)
>> MB/sec triad    39885.6 (   0.00%)    40772.7 (   2.22%)
>>
>> And in order to support this features, we defined arm64_topology.
>>
>> V2:
>> fix commit log and loop more
>>
>> Signed-off-by: Wang Qing <[email protected]>
>> ---
>>  arch/arm64/Kconfig      | 13 +++++++++++
>>  arch/arm64/kernel/smp.c | 48 ++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 60 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index edbe035cb0e3..4063de8c6153 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1207,6 +1207,19 @@ config SCHED_CLUSTER
>>          by sharing mid-level caches, last-level cache tags or internal
>>          busses.
>> 
>> +config SCHED_COMPLEX
>> +     bool "Complex scheduler support"
>> +     help
>> +       DSU supports blocks that are called complexes which contain up to
>> +       two cores of the same type and some shared logic. Sharing some logic
>> +       between the cores can make a complex area efficient.
>> +
>> +       Complex also can be considered as a shared cache group smaller
>> +       than cluster.
>> +
>> +       Complex scheduler support improves the CPU scheduler's decision
>> +       making when dealing with machines that have complexs of CPUs.
>> +
>>  config SCHED_SMT
>>        bool "SMT scheduler support"
>>        help
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index 3b46041f2b97..526765112146 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -14,6 +14,7 @@
>>  #include <linux/sched/mm.h>
>>  #include <linux/sched/hotplug.h>
>>  #include <linux/sched/task_stack.h>
>> +#include <linux/sched/topology.h>
>>  #include <linux/interrupt.h>
>>  #include <linux/cache.h>
>>  #include <linux/profile.h>
>> @@ -57,6 +58,10 @@
>>  DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
>>  EXPORT_PER_CPU_SYMBOL(cpu_number);
>> 
>> +#ifdef SCHED_COMPLEX
>> +DEFINE_PER_CPU_READ_MOSTLY(cpumask_t, cpu_complex_map);
>> +#endif
>
>ifdefs should not be in .c files.

But I see a lot of ifdefs in .c files. Should I change them to IS_ENABLED() instead?
I'm just following what x86_topology and default_topology do.

Thanks,
Qing

>
>
>> +
>>  /*
>>   * as from 2.5, kernels no longer have an init_tasks structure
>>   * so we need some other way of telling a new secondary core
>> @@ -715,6 +720,47 @@ void __init smp_init_cpus(void)
>>        }
>>  }
>> 
>> +#ifdef SCHED_COMPLEX
>
>same here.
>
>> +static int arm64_complex_flags(void)
>> +{
>> +     return SD_SHARE_PKG_RESOURCES;
>> +}
>> +
>> +const struct cpumask *arm64_complex_mask(int cpu)
>> +{
>> +     const struct cpumask *core_mask = cpu_cpu_mask(cpu);
>> +
>> +     /* Find the smaller shared cache level than clustergroup and coregroup*/
>> +#ifdef CONFIG_SCHED_MC
>> +     core_mask = cpu_coregroup_mask(cpu);
>> +#endif
>> +#ifdef CONFIG_SCHED_CLUSTER
>> +     core_mask = cpu_clustergroup_mask(cpu);
>> +#endif
>
>See, same here.  This is a mess and unmaintainable.
>
>thanks,
>
>greg k-h
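
On the IS_ENABLED() question raised in the reply above, the sketch below shows roughly how such a macro lets the mask selection be written as ordinary C conditions. It is a user-space imitation under stated assumptions: SK_IS_ENABLED() is a simplified stand-in that handles only options defined to 1 (no module case), the two CONFIG_ settings are chosen arbitrarily for the demo, and pick_mask_level() with its enum is a hypothetical helper that only mirrors the order of the two #ifdef blocks in the quoted arm64_complex_mask().

```c
/* Demo configuration (assumed): cluster scheduling on, MC left undefined. */
#define CONFIG_SCHED_CLUSTER 1

/* Simplified stand-in for the kernel macro: 1 if option is defined to 1. */
#define SK_PLACEHOLDER_1 0,
#define SK_SECOND(ignored, val, ...) val
#define SK_IS_DEFINED__(arg1_or_junk) SK_SECOND(arg1_or_junk 1, 0)
#define SK_IS_DEFINED_(val) SK_IS_DEFINED__(SK_PLACEHOLDER_##val)
#define SK_IS_DEFINED(x) SK_IS_DEFINED_(x)
#define SK_IS_ENABLED(option) SK_IS_DEFINED(option)

/* Hypothetical labels for the masks the quoted function chooses between. */
enum mask_level { LEVEL_CPU, LEVEL_COREGROUP, LEVEL_CLUSTERGROUP };

/* Same ordering as the quoted code: the later enabled option wins. */
enum mask_level pick_mask_level(void)
{
	enum mask_level level = LEVEL_CPU;

	if (SK_IS_ENABLED(CONFIG_SCHED_MC))
		level = LEVEL_COREGROUP;
	if (SK_IS_ENABLED(CONFIG_SCHED_CLUSTER))
		level = LEVEL_CLUSTERGROUP;
	return level;
}
```

One design point to keep in mind: unlike #ifdef, both branches are still seen by the compiler, so every function referenced inside them must be declared even when the option is disabled.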