2021-05-27 17:36:19

by Beata Michalska

Subject: [PATCH v6 0/3] Rework CPU capacity asymmetry detection

Currently, asym_cpu_capacity_level tries to locate the lowest
topology level at which the highest available CPU capacity is
visible to all CPUs. This works fine for most existing asymmetric
designs. However, for some possible and completely valid setups that
combine different CPU microarchitectures within clusters, it might
not be the best approach: it can end up pointing at a level at which
some of the domains do not see any asymmetry at all. This could be
problematic for misfit migration and/or energy aware placement, and
affected platforms might need custom changes to the wake-up and CPU
selection paths as a result.

As mentioned in the previous version, based on the available sources,
one of the platforms potentially affected by the original approach might be
the Exynos 9820/990, with its 'sliced' LLC (SLC) divided between the two
custom (big) cores and the remaining A75/A55 cores, which seems to be
reflected in the DT entries made available for those platforms.

The following patches rework how the asymmetry detection is carried
out, pinning the asymmetric topology level to the lowest one at which
the full range of CPU capacities is visible to all CPUs within a given
sched domain. The asym_cpu_capacity_level will also keep track of those
levels where any scope of asymmetry is observed, to mark the
corresponding sched domains with the SD_ASYM_CPUCAPACITY flag
and to enable misfit migration for them.

In order to distinguish the sched domains with partial vs full range
of CPU capacity asymmetry, a new sched domain flag has been introduced:
SD_ASYM_CPUCAPACITY_FULL.

The overall idea of changing the asymmetry detection has been suggested
by Valentin Schneider <[email protected]>

Verified on (mostly):
- QEMU (version 4.2.1) with variants of possible asymmetric topologies
  - machine: virt
  - modifying the device-tree 'cpus' node for virt machine:

    # Dump the device tree generated for the virt machine
    qemu-system-aarch64 -kernel $KERNEL_IMG \
        -drive format=qcow2,file=$IMAGE \
        -append 'root=/dev/vda earlycon console=ttyAMA0 sched_debug sched_verbose loglevel=15 kmemleak=on' \
        -m 2G --nographic \
        -cpu cortex-a57 -machine virt -smp cores=8 \
        -machine dumpdtb=$CUSTOM_DTB.dtb

    # Decompile the blob to source form
    $KERNEL_PATH/scripts/dtc/dtc -I dtb -O dts $CUSTOM_DTB.dtb > \
        $CUSTOM_DTB.dts

    (modify the dts)

    # Recompile the modified source
    $KERNEL_PATH/scripts/dtc/dtc -I dts -O dtb $CUSTOM_DTB.dts > \
        $CUSTOM_DTB.dtb

    # Boot with the customised device tree
    qemu-system-aarch64 -kernel $KERNEL_IMG \
        -drive format=qcow2,file=$IMAGE \
        -append 'root=/dev/vda earlycon console=ttyAMA0 sched_debug sched_verbose loglevel=15 kmemleak=on' \
        -m 2G --nographic \
        -cpu cortex-a57 -machine virt -smp cores=8 \
        -machine dtb=$CUSTOM_DTB.dtb

v6:
- improving code readability
v5:
- building CPUs list based on their capacity now triggered upon init
and explicit request from arch specific code to rebuild sched domains
- detecting asymmetry scope now done directly in sd_init
v4:
- Based on Peter's idea, reworking asym detection to use per-cpu
capacity list to serve as base for determining the asym scope
v3:
- Additional style/doc fixes
v2:
- Fixed style issues
- Reworked accessing the cached topology data as suggested by Valentin



Beata Michalska (3):
sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
sched/topology: Rework CPU capacity asymmetry detection
sched/doc: Update the CPU capacity asymmetry bits

Documentation/scheduler/sched-capacity.rst | 6 +-
Documentation/scheduler/sched-energy.rst | 2 +-
include/linux/sched/sd_flags.h | 10 ++
kernel/sched/topology.c | 194 +++++++++++++--------
4 files changed, 133 insertions(+), 79 deletions(-)

--
2.17.1


2021-05-27 17:36:25

by Beata Michalska

Subject: [PATCH v6 1/3] sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag

Introduce a new sched_domain topology flag, complementary to
SD_ASYM_CPUCAPACITY, to distinguish between sched_domains where any CPU
capacity asymmetry is detected (SD_ASYM_CPUCAPACITY) and ones where
the full set of CPU capacities is visible to all domain members
(SD_ASYM_CPUCAPACITY_FULL).

With the distinction between full and partial CPU capacity asymmetry
brought in by the new flag, the scope of the original
SD_ASYM_CPUCAPACITY flag shifts. It retains the existing behaviour
when asymmetry is detected on a given sched domain, allowing
misfit migrations within sched domains that do not observe the full
range of CPU capacities but still have members with different capacity
values. It does, however, lose its meaning with regard to the per-CPU
pointer to the lowest CPU asymmetry sched_domain level, which is now
denoted by the SD_ASYM_CPUCAPACITY_FULL flag.

Signed-off-by: Beata Michalska <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 34b21e971d77..57bde66d95f7 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -90,6 +90,16 @@ SD_FLAG(SD_WAKE_AFFINE, SDF_SHARED_CHILD)
*/
SD_FLAG(SD_ASYM_CPUCAPACITY, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)

+/*
+ * Domain members have different CPU capacities spanning all unique CPU
+ * capacity values.
+ *
+ * SHARED_PARENT: Set from the topmost domain down to the first domain where
+ * all available CPU capacities are visible
+ * NEEDS_GROUPS: Per-CPU capacity is asymmetric between groups.
+ */
+SD_FLAG(SD_ASYM_CPUCAPACITY_FULL, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
+
/*
* Domain members share CPU capacity (i.e. SMT)
*
--
2.17.1

2021-06-02 19:11:48

by Dietmar Eggemann

Subject: Re: [PATCH v6 0/3] Rework CPU capacity asymmetry detection

On 27/05/2021 17:38, Beata Michalska wrote:

[...]

> Beata Michalska (3):
> sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
> sched/topology: Rework CPU capacity asymmetry detection
> sched/doc: Update the CPU capacity asymmetry bits
>
> Documentation/scheduler/sched-capacity.rst | 6 +-
> Documentation/scheduler/sched-energy.rst | 2 +-
> include/linux/sched/sd_flags.h | 10 ++
> kernel/sched/topology.c | 194 +++++++++++++--------
> 4 files changed, 133 insertions(+), 79 deletions(-)

Looks good to me, even though I would like to see a more compact version
of asym_cpu_capacity_classify(). Details in my response to [PATCH v6 2/3].

Did some level of testing myself and wasn't able to break it.

Reviewed-by: Dietmar Eggemann <[email protected]>