As of now, asym_cpu_capacity_level() tries to locate the lowest topology
level at which the highest available CPU capacity is visible to all CPUs.
This works fine for most of the existing asymmetric designs out there,
but for some possible and completely valid setups, combining different
CPU microarchitectures within clusters, it is not the best approach: it
may end up pointing at a level at which some of the domains do not see
any asymmetry at all. This is problematic for misfit migration and/or
energy aware placement, and for the affected platforms it might result
in custom changes to the wake-up and CPU selection paths.
As mentioned in the previous version, based on the publicly available
sources, one of the platforms potentially affected by the original
approach is the Exynos 9820/990, with its 'sliced' LLC (SLC) split
between the two custom (big) cores and the remaining A75/A55 cores,
which seems to be reflected in the available dt entries for those
platforms.
The following patches rework how the asymmetry detection is carried out,
pinning the asymmetric topology level to the lowest one at which the
full range of CPU capacities is visible to all CPUs within a given
sched domain. The detection also keeps track of those levels where any
scope of asymmetry is observed, so that the corresponding sched domains
get denoted with the SD_ASYM_CPUCAPACITY flag and misfit migration is
enabled for them.
In order to distinguish sched domains with a partial vs the full range
of CPU capacity asymmetry, a new sched domain flag has been introduced:
SD_ASYM_CPUCAPACITY_FULL.
The overall idea of changing the asymmetry detection has been suggested
by Valentin Schneider <[email protected]>
Verified on (mostly):
- QEMU (version 4.2.1) with variants of possible asymmetric topologies
- machine: virt
- modifying the device-tree 'cpus' node for virt machine:
qemu-system-aarch64 -kernel $KERNEL_IMG
-drive format=qcow2,file=$IMAGE
-append 'root=/dev/vda earlycon console=ttyAMA0 sched_debug
sched_verbose loglevel=15 kmemleak=on' -m 2G --nographic
-cpu cortex-a57 -machine virt -smp cores=8
-machine dumpdtb=$CUSTOM_DTB.dtb
$KERNEL_PATH/scripts/dtc/dtc -I dtb -O dts $CUSTOM_DTB.dtb >
$CUSTOM_DTB.dts
(modify the dts, e.g. by assigning different capacity-dmips-mhz values
to the cpu nodes to create the desired capacity asymmetry)
$KERNEL_PATH/scripts/dtc/dtc -I dts -O dtb $CUSTOM_DTB.dts >
$CUSTOM_DTB.dtb
qemu-system-aarch64 -kernel $KERNEL_IMG
-drive format=qcow2,file=$IMAGE
-append 'root=/dev/vda earlycon console=ttyAMA0 sched_debug
sched_verbose loglevel=15 kmemleak=on' -m 2G --nographic
-cpu cortex-a57 -machine virt -smp cores=8
-machine dtb=$CUSTOM_DTB.dtb
v5:
- building CPUs list based on their capacity now triggered upon init
and explicit request from arch specific code to rebuild sched domains
- detecting asymmetry scope now done directly in sd_init
v4:
- Based on Peter's idea, reworking asym detection to use per-cpu
capacity list to serve as base for determining the asym scope
v3:
- Additional style/doc fixes
v2:
- Fixed style issues
- Reworked accessing the cached topology data as suggested by Valentin
Beata Michalska (3):
sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
sched/topology: Rework CPU capacity asymmetry detection
sched/doc: Update the CPU capacity asymmetry bits
Documentation/scheduler/sched-capacity.rst | 6 +-
Documentation/scheduler/sched-energy.rst | 2 +-
include/linux/sched/sd_flags.h | 10 ++
kernel/sched/topology.c | 194 +++++++++++++--------
4 files changed, 133 insertions(+), 79 deletions(-)
--
2.17.1
Currently the CPU capacity asymmetry detection, performed through
asym_cpu_capacity_level, tries to identify the lowest topology level
at which the highest CPU capacity is being observed, not necessarily
finding the level at which all possible capacity values are visible
to all CPUs, which might be bit problematic for some possible/valid
asymmetric topologies i.e.:
DIE [ ]
MC [ ][ ]
CPU [0] [1] [2] [3] [4] [5] [6] [7]
Capacity |.....| |.....| |.....| |.....|
L M B B
Where:
arch_scale_cpu_capacity(L) = 512
arch_scale_cpu_capacity(M) = 871
arch_scale_cpu_capacity(B) = 1024
In this particular case, the asymmetric topology level will point
at MC, as all possible CPU masks for that level do cover the CPU
with the highest capacity. It will work just fine for the first
cluster, not so much for the second one though (consider the
find_energy_efficient_cpu which might end up attempting the energy
aware wake-up for a domain that does not see any asymmetry at all)
Rework the way the capacity asymmetry levels are being detected,
allowing to point to the lowest topology level (for a given CPU), where
full set of available CPU capacities is visible to all CPUs within given
domain. As a result, the per-cpu sd_asym_cpucapacity might differ across
the domains. This will have an impact on EAS wake-up placement in a way
that it might see different rage of CPUs to be considered, depending on
the given current and target CPUs.
Additionally, those levels, where any range of asymmetry (not
necessarily full) is being detected will get identified as well.
The selected asymmetric topology level will be denoted by
SD_ASYM_CPUCAPACITY_FULL sched domain flag whereas the 'sub-levels'
would receive the already used SD_ASYM_CPUCAPACITY flag. This allows
maintaining the current behaviour for asymmetric topologies, with
misfit migration operating correctly on lower levels, if applicable,
as any asymmetry is enough to trigger the misfit migration.
The logic there relies on the SD_ASYM_CPUCAPACITY flag and does not
relate to the full asymmetry level denoted by the sd_asym_cpucapacity
pointer.
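To illustrate, for the topology above this would result in (a sketch,
assuming all CPUs are online):
MC [0-3] : SD_ASYM_CPUCAPACITY (sees L and M, but not B)
MC [4-7] : no asymmetry flags (sees B only)
DIE      : SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL
so the per-CPU sd_asym_cpucapacity pointer ends up at DIE for all CPUs,
while misfit migration can still rely on SD_ASYM_CPUCAPACITY at the
first MC domain.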
Detecting the CPU capacity asymmetry is being based on a set of
available CPU capacities for all possible CPUs. This data is being
generated upon init and updated once CPU topology changes are being
detected (through arch_update_cpu_topology). As such, any changes
to identified CPU capacities (like initializing cpufreq) need to be
explicitly advertised by corresponding archs to trigger rebuilding
the data.
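A minimal sketch of what such an arch-side trigger could look like (the
function and variable names below are purely illustrative, not an
existing arch implementation):

static int foo_topology_changed; /* illustrative state flag */

int arch_update_cpu_topology(void)
{
	return foo_topology_changed;
}

/* called by the arch once CPU capacities have been re-evaluated */
static void foo_advertise_new_capacities(void)
{
	foo_topology_changed = 1;
	/*
	 * rebuild_sched_domains() ends up in
	 * partition_sched_domains_locked(), which sees the topology
	 * change and re-runs asym_cpu_capacity_scan() before building
	 * the new domains.
	 */
	rebuild_sched_domains();
	foo_topology_changed = 0;
}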
This patch also removes the additional -dflags- parameter used when
building sched domains as the asymmetry flags are now being set
directly in sd_init.
Suggested-by: Peter Zijlstra <[email protected]>
Suggested-by: Valentin Schneider <[email protected]>
Signed-off-by: Beata Michalska <[email protected]>
---
kernel/sched/topology.c | 194 ++++++++++++++++++++++++----------------
1 file changed, 118 insertions(+), 76 deletions(-)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 55a0a243e871..bb3d3f6b5d98 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -675,7 +675,7 @@ static void update_top_cache_domain(int cpu)
sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
- sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY);
+ sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
}
@@ -1266,6 +1266,112 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
update_group_capacity(sd, cpu);
}
+/**
+ * Asymmetric CPU capacity bits
+ */
+struct asym_cap_data {
+ struct list_head link;
+ unsigned long capacity;
+ struct cpumask *cpu_mask;
+};
+
+/*
+ * Set of available CPUs grouped by their corresponding capacities
+ * Each list entry contains a CPU mask reflecting CPUs that share the same
+ * capacity.
+ * The lifespan of data is unlimited.
+ */
+static LIST_HEAD(asym_cap_list);
+
+/*
+ * Verify whether there is any CPU capacity asymmetry in a given sched domain
+ * Provides sd_flags reflecting the asymmetry scope.
+ */
+static inline int
+asym_cpu_capacity_classify(struct sched_domain *sd,
+ const struct cpumask *cpu_map)
+{
+ int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
+ struct asym_cap_data *entry;
+ int asym_cap_count = 0;
+
+ if (list_is_singular(&asym_cap_list))
+ goto leave;
+
+ list_for_each_entry(entry, &asym_cap_list, link) {
+ if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
+ ++asym_cap_count;
+ } else {
+ /*
+ * CPUs with given capacity might be offline
+ * so make sure this is not the case
+ */
+ if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
+ sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
+ if (asym_cap_count > 1)
+ break;
+ }
+ }
+ }
+ WARN_ON_ONCE(!asym_cap_count);
+leave:
+ return asym_cap_count > 1 ? sd_asym_flags : 0;
+}
+
+static inline struct asym_cap_data *
+asym_cpu_capacity_get_data(unsigned long capacity)
+{
+ struct asym_cap_data *entry = NULL;
+
+ list_for_each_entry(entry, &asym_cap_list, link) {
+ if (capacity == entry->capacity)
+ goto done;
+ }
+
+ entry = kzalloc(sizeof(*entry) + cpumask_size(), GFP_KERNEL);
+ if (WARN_ONCE(!entry, "Failed to allocate memory for asymmetry data\n"))
+ goto done;
+ entry->capacity = capacity;
+ entry->cpu_mask = (struct cpumask *)((char *)entry + sizeof(*entry));
+ list_add(&entry->link, &asym_cap_list);
+done:
+ return entry;
+}
+
+/*
+ * Build-up/update list of CPUs grouped by their capacities
+ * An update requires explicit request to rebuild sched domains
+ * with state indicating CPU topology changes.
+ */
+static void asym_cpu_capacity_scan(void)
+{
+ struct asym_cap_data *entry, *next;
+ int cpu;
+
+ list_for_each_entry(entry, &asym_cap_list, link)
+ cpumask_clear(entry->cpu_mask);
+
+ entry = list_first_entry_or_null(&asym_cap_list,
+ struct asym_cap_data, link);
+
+ for_each_cpu_and(cpu, cpu_possible_mask,
+ housekeeping_cpumask(HK_FLAG_DOMAIN)) {
+ unsigned long capacity = arch_scale_cpu_capacity(cpu);
+
+ if (!entry || capacity != entry->capacity)
+ entry = asym_cpu_capacity_get_data(capacity);
+ if (entry)
+ __cpumask_set_cpu(cpu, entry->cpu_mask);
+ }
+
+ list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
+ if (cpumask_empty(entry->cpu_mask)) {
+ list_del(&entry->link);
+ kfree(entry);
+ }
+ }
+}
+
/*
* Initializers for schedule domains
* Non-inlined to reduce accumulated stack pressure in build_sched_domains()
@@ -1399,7 +1505,7 @@ int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE;
static struct sched_domain *
sd_init(struct sched_domain_topology_level *tl,
const struct cpumask *cpu_map,
- struct sched_domain *child, int dflags, int cpu)
+ struct sched_domain *child, int cpu)
{
struct sd_data *sdd = &tl->data;
struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu);
@@ -1420,9 +1526,6 @@ sd_init(struct sched_domain_topology_level *tl,
"wrong sd_flags in topology description\n"))
sd_flags &= TOPOLOGY_SD_FLAGS;
- /* Apply detected topology flags */
- sd_flags |= dflags;
-
*sd = (struct sched_domain){
.min_interval = sd_weight,
.max_interval = 2*sd_weight,
@@ -1457,10 +1560,10 @@ sd_init(struct sched_domain_topology_level *tl,
cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
sd_id = cpumask_first(sched_domain_span(sd));
+ sd->flags |= asym_cpu_capacity_classify(sd, cpu_map);
/*
* Convert topological properties into behaviour.
*/
-
/* Don't attempt to spread across CPUs of different capacities. */
if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
sd->child->flags &= ~SD_PREFER_SIBLING;
@@ -1926,9 +2029,9 @@ static void __sdt_free(const struct cpumask *cpu_map)
static struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
const struct cpumask *cpu_map, struct sched_domain_attr *attr,
- struct sched_domain *child, int dflags, int cpu)
+ struct sched_domain *child, int cpu)
{
- struct sched_domain *sd = sd_init(tl, cpu_map, child, dflags, cpu);
+ struct sched_domain *sd = sd_init(tl, cpu_map, child, cpu);
if (child) {
sd->level = child->level + 1;
@@ -1990,65 +2093,6 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
return true;
}
-/*
- * Find the sched_domain_topology_level where all CPU capacities are visible
- * for all CPUs.
- */
-static struct sched_domain_topology_level
-*asym_cpu_capacity_level(const struct cpumask *cpu_map)
-{
- int i, j, asym_level = 0;
- bool asym = false;
- struct sched_domain_topology_level *tl, *asym_tl = NULL;
- unsigned long cap;
-
- /* Is there any asymmetry? */
- cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));
-
- for_each_cpu(i, cpu_map) {
- if (arch_scale_cpu_capacity(i) != cap) {
- asym = true;
- break;
- }
- }
-
- if (!asym)
- return NULL;
-
- /*
- * Examine topology from all CPU's point of views to detect the lowest
- * sched_domain_topology_level where a highest capacity CPU is visible
- * to everyone.
- */
- for_each_cpu(i, cpu_map) {
- unsigned long max_capacity = arch_scale_cpu_capacity(i);
- int tl_id = 0;
-
- for_each_sd_topology(tl) {
- if (tl_id < asym_level)
- goto next_level;
-
- for_each_cpu_and(j, tl->mask(i), cpu_map) {
- unsigned long capacity;
-
- capacity = arch_scale_cpu_capacity(j);
-
- if (capacity <= max_capacity)
- continue;
-
- max_capacity = capacity;
- asym_level = tl_id;
- asym_tl = tl;
- }
-next_level:
- tl_id++;
- }
- }
-
- return asym_tl;
-}
-
-
/*
* Build sched domains for a given set of CPUs and attach the sched domains
* to the individual CPUs
@@ -2061,7 +2105,6 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
struct s_data d;
struct rq *rq = NULL;
int i, ret = -ENOMEM;
- struct sched_domain_topology_level *tl_asym;
bool has_asym = false;
if (WARN_ON(cpumask_empty(cpu_map)))
@@ -2071,24 +2114,19 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
if (alloc_state != sa_rootdomain)
goto error;
- tl_asym = asym_cpu_capacity_level(cpu_map);
-
/* Set up domains for CPUs specified by the cpu_map: */
for_each_cpu(i, cpu_map) {
struct sched_domain_topology_level *tl;
- int dflags = 0;
sd = NULL;
for_each_sd_topology(tl) {
- if (tl == tl_asym) {
- dflags |= SD_ASYM_CPUCAPACITY;
- has_asym = true;
- }
if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
goto error;
- sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i);
+ sd = build_sched_domain(tl, cpu_map, attr, sd, i);
+
+ has_asym |= sd->flags & SD_ASYM_CPUCAPACITY;
if (tl == sched_domain_topology)
*per_cpu_ptr(d.sd, i) = sd;
@@ -2217,6 +2255,7 @@ int sched_init_domains(const struct cpumask *cpu_map)
zalloc_cpumask_var(&fallback_doms, GFP_KERNEL);
arch_update_cpu_topology();
+ asym_cpu_capacity_scan();
ndoms_cur = 1;
doms_cur = alloc_sched_domains(ndoms_cur);
if (!doms_cur)
@@ -2299,6 +2338,9 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
/* Let the architecture update CPU core mappings: */
new_topology = arch_update_cpu_topology();
+ /* Trigger rebuilding CPU capacity asymmetry data */
+ if (new_topology)
+ asym_cpu_capacity_scan();
if (!doms_new) {
WARN_ON_ONCE(dattr_new);
--
2.17.1
Update the documentation bits referring to capacity aware scheduling
with regards to newly introduced SD_ASYM_CPUCAPACITY_FULL sched_domain
flag.
Signed-off-by: Beata Michalska <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
---
Documentation/scheduler/sched-capacity.rst | 6 ++++--
Documentation/scheduler/sched-energy.rst | 2 +-
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/Documentation/scheduler/sched-capacity.rst b/Documentation/scheduler/sched-capacity.rst
index 9b7cbe43b2d1..805f85f330b5 100644
--- a/Documentation/scheduler/sched-capacity.rst
+++ b/Documentation/scheduler/sched-capacity.rst
@@ -284,8 +284,10 @@ whether the system exhibits asymmetric CPU capacities. Should that be the
case:
- The sched_asym_cpucapacity static key will be enabled.
-- The SD_ASYM_CPUCAPACITY flag will be set at the lowest sched_domain level that
- spans all unique CPU capacity values.
+- The SD_ASYM_CPUCAPACITY_FULL flag will be set at the lowest sched_domain
+ level that spans all unique CPU capacity values.
+- The SD_ASYM_CPUCAPACITY flag will be set for any sched_domain that spans
+ CPUs with any range of asymmetry.
The sched_asym_cpucapacity static key is intended to guard sections of code that
cater to asymmetric CPU capacity systems. Do note however that said key is
diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst
index afe02d394402..8fbce5e767d9 100644
--- a/Documentation/scheduler/sched-energy.rst
+++ b/Documentation/scheduler/sched-energy.rst
@@ -328,7 +328,7 @@ section lists these dependencies and provides hints as to how they can be met.
As mentioned in the introduction, EAS is only supported on platforms with
asymmetric CPU topologies for now. This requirement is checked at run-time by
-looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
+looking for the presence of the SD_ASYM_CPUCAPACITY_FULL flag when the scheduling
domains are built.
See Documentation/scheduler/sched-capacity.rst for requirements to be met for this
--
2.17.1
Hi Beata,
On 24/05/21 11:16, Beata Michalska wrote:
> Currently the CPU capacity asymmetry detection, performed through
> asym_cpu_capacity_level, tries to identify the lowest topology level
> at which the highest CPU capacity is being observed, not necessarily
> finding the level at which all possible capacity values are visible
> to all CPUs, which might be bit problematic for some possible/valid
> asymmetric topologies i.e.:
>
> DIE [ ]
> MC [ ][ ]
>
> CPU [0] [1] [2] [3] [4] [5] [6] [7]
> Capacity |.....| |.....| |.....| |.....|
> L M B B
>
> Where:
> arch_scale_cpu_capacity(L) = 512
> arch_scale_cpu_capacity(M) = 871
> arch_scale_cpu_capacity(B) = 1024
>
> In this particular case, the asymmetric topology level will point
> at MC, as all possible CPU masks for that level do cover the CPU
> with the highest capacity. It will work just fine for the first
> cluster, not so much for the second one though (consider the
> find_energy_efficient_cpu which might end up attempting the energy
> aware wake-up for a domain that does not see any asymmetry at all)
>
> Rework the way the capacity asymmetry levels are being detected,
> allowing to point to the lowest topology level (for a given CPU), where
> full set of available CPU capacities is visible to all CPUs within given
> domain. As a result, the per-cpu sd_asym_cpucapacity might differ across
> the domains. This will have an impact on EAS wake-up placement in a way
> that it might see different rage of CPUs to be considered, depending on
> the given current and target CPUs.
>
> Additionally, those levels, where any range of asymmetry (not
> necessarily full) is being detected will get identified as well.
> The selected asymmetric topology level will be denoted by
> SD_ASYM_CPUCAPACITY_FULL sched domain flag whereas the 'sub-levels'
> would receive the already used SD_ASYM_CPUCAPACITY flag. This allows
> maintaining the current behaviour for asymmetric topologies, with
> misfit migration operating correctly on lower levels, if applicable,
> as any asymmetry is enough to trigger the misfit migration.
> The logic there relies on the SD_ASYM_CPUCAPACITY flag and does not
> relate to the full asymmetry level denoted by the sd_asym_cpucapacity
> pointer.
>
> Detecting the CPU capacity asymmetry is being based on a set of
> available CPU capacities for all possible CPUs. This data is being
> generated upon init and updated once CPU topology changes are being
> detected (through arch_update_cpu_topology). As such, any changes
> to identified CPU capacities (like initializing cpufreq) need to be
> explicitly advertised by corresponding archs to trigger rebuilding
> the data.
>
> This patch also removes the additional -dflags- parameter used when
^^^^^^^^^^^^^^^^^^^^^^^
s/^/Also remove/
> building sched domains as the asymmetry flags are now being set
> directly in sd_init.
>
Few nits below, but beyond that:
Tested-by: Valentin Schneider <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
> +static inline int
> +asym_cpu_capacity_classify(struct sched_domain *sd,
> + const struct cpumask *cpu_map)
> +{
> + int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
> + struct asym_cap_data *entry;
> + int asym_cap_count = 0;
> +
> + if (list_is_singular(&asym_cap_list))
> + goto leave;
> +
> + list_for_each_entry(entry, &asym_cap_list, link) {
> + if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> + ++asym_cap_count;
> + } else {
> + /*
> + * CPUs with given capacity might be offline
> + * so make sure this is not the case
> + */
> + if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> + sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> + if (asym_cap_count > 1)
> + break;
> + }
Readability nit: That could be made into an else if ().
> + }
> + }
> + WARN_ON_ONCE(!asym_cap_count);
> +leave:
> + return asym_cap_count > 1 ? sd_asym_flags : 0;
> +}
> +
> +static void asym_cpu_capacity_scan(void)
> +{
> + struct asym_cap_data *entry, *next;
> + int cpu;
> +
> + list_for_each_entry(entry, &asym_cap_list, link)
> + cpumask_clear(entry->cpu_mask);
> +
> + entry = list_first_entry_or_null(&asym_cap_list,
> + struct asym_cap_data, link);
> +
> + for_each_cpu_and(cpu, cpu_possible_mask,
> + housekeeping_cpumask(HK_FLAG_DOMAIN)) {
> + unsigned long capacity = arch_scale_cpu_capacity(cpu);
> +
> + if (!entry || capacity != entry->capacity)
> + entry = asym_cpu_capacity_get_data(capacity);
> + if (entry)
> + __cpumask_set_cpu(cpu, entry->cpu_mask);
That 'if' is only there in case the alloc within the helper failed, which
is a bit of a shame.
You could pass the CPU to that helper function and have it set the right
bit, or you could even forgo the capacity != entry->capacity check here and
let the helper function do it all.
Yes, that means more asym_cap_list iterations, but that's
O(nr_cpus * nr_caps); a topology rebuild is along the lines of
O(nr_cpus² * nr_topology_levels), so not such a big deal comparatively.
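For illustration only, one possible shape of that suggestion (the
function name and exact factoring are assumptions, not part of the
posted patch):

static void asym_cpu_capacity_update_data(int cpu)
{
	unsigned long capacity = arch_scale_cpu_capacity(cpu);
	struct asym_cap_data *entry;

	list_for_each_entry(entry, &asym_cap_list, link) {
		if (capacity == entry->capacity)
			goto done;
	}

	entry = kzalloc(sizeof(*entry) + cpumask_size(), GFP_KERNEL);
	if (WARN_ONCE(!entry, "Failed to allocate memory for asymmetry data\n"))
		return;
	entry->capacity = capacity;
	entry->cpu_mask = (struct cpumask *)((char *)entry + sizeof(*entry));
	list_add(&entry->link, &asym_cap_list);
done:
	__cpumask_set_cpu(cpu, entry->cpu_mask);
}

The scan loop would then just call this per CPU, with no capacity check
or NULL handling left in the caller.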
> + }
> +
> + list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
> + if (cpumask_empty(entry->cpu_mask)) {
> + list_del(&entry->link);
> + kfree(entry);
> + }
> + }
> +}
> +
On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
> Hi Beata,
>
> On 24/05/21 11:16, Beata Michalska wrote:
> > Currently the CPU capacity asymmetry detection, performed through
> > asym_cpu_capacity_level, tries to identify the lowest topology level
> > at which the highest CPU capacity is being observed, not necessarily
> > finding the level at which all possible capacity values are visible
> > to all CPUs, which might be bit problematic for some possible/valid
> > asymmetric topologies i.e.:
> >
> > DIE [ ]
> > MC [ ][ ]
> >
> > CPU [0] [1] [2] [3] [4] [5] [6] [7]
> > Capacity |.....| |.....| |.....| |.....|
> > L M B B
> >
> > Where:
> > arch_scale_cpu_capacity(L) = 512
> > arch_scale_cpu_capacity(M) = 871
> > arch_scale_cpu_capacity(B) = 1024
> >
> > In this particular case, the asymmetric topology level will point
> > at MC, as all possible CPU masks for that level do cover the CPU
> > with the highest capacity. It will work just fine for the first
> > cluster, not so much for the second one though (consider the
> > find_energy_efficient_cpu which might end up attempting the energy
> > aware wake-up for a domain that does not see any asymmetry at all)
> >
> > Rework the way the capacity asymmetry levels are being detected,
> > allowing to point to the lowest topology level (for a given CPU), where
> > full set of available CPU capacities is visible to all CPUs within given
> > domain. As a result, the per-cpu sd_asym_cpucapacity might differ across
> > the domains. This will have an impact on EAS wake-up placement in a way
> > that it might see different rage of CPUs to be considered, depending on
> > the given current and target CPUs.
> >
> > Additionally, those levels, where any range of asymmetry (not
> > necessarily full) is being detected will get identified as well.
> > The selected asymmetric topology level will be denoted by
> > SD_ASYM_CPUCAPACITY_FULL sched domain flag whereas the 'sub-levels'
> > would receive the already used SD_ASYM_CPUCAPACITY flag. This allows
> > maintaining the current behaviour for asymmetric topologies, with
> > misfit migration operating correctly on lower levels, if applicable,
> > as any asymmetry is enough to trigger the misfit migration.
> > The logic there relies on the SD_ASYM_CPUCAPACITY flag and does not
> > relate to the full asymmetry level denoted by the sd_asym_cpucapacity
> > pointer.
> >
> > Detecting the CPU capacity asymmetry is being based on a set of
> > available CPU capacities for all possible CPUs. This data is being
> > generated upon init and updated once CPU topology changes are being
> > detected (through arch_update_cpu_topology). As such, any changes
> > to identified CPU capacities (like initializing cpufreq) need to be
> > explicitly advertised by corresponding archs to trigger rebuilding
> > the data.
> >
> > This patch also removes the additional -dflags- parameter used when
> ^^^^^^^^^^^^^^^^^^^^^^^
> s/^/Also remove/
I would kind of ... disagree.
All the commit msg is constructed using passive structure, the suggestion
would actually break that. And it does 'sound' bit imperative but I guess
that is subjective. I'd rather stay with impersonal structure (which is
applied through out the whole patchset).
>
> > building sched domains as the asymmetry flags are now being set
> > directly in sd_init.
> >
>
> Few nits below, but beyond that:
>
> Tested-by: Valentin Schneider <[email protected]>
> Reviewed-by: Valentin Schneider <[email protected]>
>
Thanks a lot for the review and testing!
> > +static inline int
> > +asym_cpu_capacity_classify(struct sched_domain *sd,
> > + const struct cpumask *cpu_map)
> > +{
> > + int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
> > + struct asym_cap_data *entry;
> > + int asym_cap_count = 0;
> > +
> > + if (list_is_singular(&asym_cap_list))
> > + goto leave;
> > +
> > + list_for_each_entry(entry, &asym_cap_list, link) {
> > + if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> > + ++asym_cap_count;
> > + } else {
> > + /*
> > + * CPUs with given capacity might be offline
> > + * so make sure this is not the case
> > + */
> > + if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> > + sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> > + if (asym_cap_count > 1)
> > + break;
> > + }
>
> Readability nit: That could be made into an else if ().
It could but then this way the -comment- gets more exposed.
But that might be my personal perception so I can change that.
>
>
> > + }
> > + }
> > + WARN_ON_ONCE(!asym_cap_count);
> > +leave:
> > + return asym_cap_count > 1 ? sd_asym_flags : 0;
> > +}
> > +
>
> > +static void asym_cpu_capacity_scan(void)
> > +{
> > + struct asym_cap_data *entry, *next;
> > + int cpu;
> > +
> > + list_for_each_entry(entry, &asym_cap_list, link)
> > + cpumask_clear(entry->cpu_mask);
> > +
> > + entry = list_first_entry_or_null(&asym_cap_list,
> > + struct asym_cap_data, link);
> > +
> > + for_each_cpu_and(cpu, cpu_possible_mask,
> > + housekeeping_cpumask(HK_FLAG_DOMAIN)) {
> > + unsigned long capacity = arch_scale_cpu_capacity(cpu);
> > +
> > + if (!entry || capacity != entry->capacity)
> > + entry = asym_cpu_capacity_get_data(capacity);
> > + if (entry)
> > + __cpumask_set_cpu(cpu, entry->cpu_mask);
>
> That 'if' is only there in case the alloc within the helper failed, which
> is a bit of a shame.
>
> You could pass the CPU to that helper function and have it set the right
> bit, or you could even forgo the capacity != entry->capacity check here and
> let the helper function do it all.
>
> Yes, that means more asym_cap_list iterations, but that's
> O(nr_cpus * nr_caps); a topology rebuild is along the lines of
> O(nr_cpus² * nr_topology_levels), so not such a big deal comparatively.
>
I could drop that check and make the helper function update the CPUs mask
(along with dropping the initial grabbing of the first entry)
+
switching to list_for_each_entry_reverse which would result in less
iterations for most (if not all) of the use cases.
---
BR
B
> > + }
> > +
> > + list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
> > + if (cpumask_empty(entry->cpu_mask)) {
> > + list_del(&entry->link);
> > + kfree(entry);
> > + }
> > + }
> > +}
> > +
On Mon, May 24, 2021 at 11:55:08PM +0100, Beata Michalska wrote:
> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
> > Hi Beata,
> >
> > On 24/05/21 11:16, Beata Michalska wrote:
> > > Currently the CPU capacity asymmetry detection, performed through
> > > asym_cpu_capacity_level, tries to identify the lowest topology level
> > > at which the highest CPU capacity is being observed, not necessarily
> > > finding the level at which all possible capacity values are visible
> > > to all CPUs, which might be bit problematic for some possible/valid
> > > asymmetric topologies i.e.:
> > >
> > > DIE [ ]
> > > MC [ ][ ]
> > >
> > > CPU [0] [1] [2] [3] [4] [5] [6] [7]
> > > Capacity |.....| |.....| |.....| |.....|
> > > L M B B
> > >
> > > Where:
> > > arch_scale_cpu_capacity(L) = 512
> > > arch_scale_cpu_capacity(M) = 871
> > > arch_scale_cpu_capacity(B) = 1024
> > >
> > > In this particular case, the asymmetric topology level will point
> > > at MC, as all possible CPU masks for that level do cover the CPU
> > > with the highest capacity. It will work just fine for the first
> > > cluster, not so much for the second one though (consider the
> > > find_energy_efficient_cpu which might end up attempting the energy
> > > aware wake-up for a domain that does not see any asymmetry at all)
> > >
> > > Rework the way the capacity asymmetry levels are being detected,
> > > allowing to point to the lowest topology level (for a given CPU), where
> > > full set of available CPU capacities is visible to all CPUs within given
> > > domain. As a result, the per-cpu sd_asym_cpucapacity might differ across
> > > the domains. This will have an impact on EAS wake-up placement in a way
> > > that it might see different rage of CPUs to be considered, depending on
> > > the given current and target CPUs.
> > >
> > > Additionally, those levels, where any range of asymmetry (not
> > > necessarily full) is being detected will get identified as well.
> > > The selected asymmetric topology level will be denoted by
> > > SD_ASYM_CPUCAPACITY_FULL sched domain flag whereas the 'sub-levels'
> > > would receive the already used SD_ASYM_CPUCAPACITY flag. This allows
> > > maintaining the current behaviour for asymmetric topologies, with
> > > misfit migration operating correctly on lower levels, if applicable,
> > > as any asymmetry is enough to trigger the misfit migration.
> > > The logic there relies on the SD_ASYM_CPUCAPACITY flag and does not
> > > relate to the full asymmetry level denoted by the sd_asym_cpucapacity
> > > pointer.
> > >
> > > Detecting the CPU capacity asymmetry is being based on a set of
> > > available CPU capacities for all possible CPUs. This data is being
> > > generated upon init and updated once CPU topology changes are being
> > > detected (through arch_update_cpu_topology). As such, any changes
> > > to identified CPU capacities (like initializing cpufreq) need to be
> > > explicitly advertised by corresponding archs to trigger rebuilding
> > > the data.
> > >
> > > This patch also removes the additional -dflags- parameter used when
> > ^^^^^^^^^^^^^^^^^^^^^^^
> > s/^/Also remove/
> I would kind of ... disagree.
> All the commit msg is constructed using passive structure, the suggestion
> would actually break that. And it does 'sound' bit imperative but I guess
> that is subjective. I'd rather stay with impersonal structure (which is
> applied through out the whole patchset).
> >
> > > building sched domains as the asymmetry flags are now being set
> > > directly in sd_init.
> > >
> >
> > Few nits below, but beyond that:
> >
> > Tested-by: Valentin Schneider <[email protected]>
> > Reviewed-by: Valentin Schneider <[email protected]>
> >
> Thanks a lot for the review and testing!
>
> > > +static inline int
> > > +asym_cpu_capacity_classify(struct sched_domain *sd,
> > > + const struct cpumask *cpu_map)
> > > +{
> > > + int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
> > > + struct asym_cap_data *entry;
> > > + int asym_cap_count = 0;
> > > +
> > > + if (list_is_singular(&asym_cap_list))
> > > + goto leave;
> > > +
> > > + list_for_each_entry(entry, &asym_cap_list, link) {
> > > + if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> > > + ++asym_cap_count;
> > > + } else {
> > > + /*
> > > + * CPUs with given capacity might be offline
> > > + * so make sure this is not the case
> > > + */
> > > + if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> > > + sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> > > + if (asym_cap_count > 1)
> > > + break;
> > > + }
> >
> > Readability nit: That could be made into an else if ().
> It could but then this way the -comment- gets more exposed.
> But that might be my personal perception so I can change that.
> >
> >
> > > + }
> > > + }
> > > + WARN_ON_ONCE(!asym_cap_count);
> > > +leave:
> > > + return asym_cap_count > 1 ? sd_asym_flags : 0;
> > > +}
> > > +
> >
> > > +static void asym_cpu_capacity_scan(void)
> > > +{
> > > + struct asym_cap_data *entry, *next;
> > > + int cpu;
> > > +
> > > + list_for_each_entry(entry, &asym_cap_list, link)
> > > + cpumask_clear(entry->cpu_mask);
> > > +
> > > + entry = list_first_entry_or_null(&asym_cap_list,
> > > + struct asym_cap_data, link);
> > > +
> > > + for_each_cpu_and(cpu, cpu_possible_mask,
> > > + housekeeping_cpumask(HK_FLAG_DOMAIN)) {
> > > + unsigned long capacity = arch_scale_cpu_capacity(cpu);
> > > +
> > > + if (!entry || capacity != entry->capacity)
> > > + entry = asym_cpu_capacity_get_data(capacity);
> > > + if (entry)
> > > + __cpumask_set_cpu(cpu, entry->cpu_mask);
> >
> > That 'if' is only there in case the alloc within the helper failed, which
> > is a bit of a shame.
> >
> > You could pass the CPU to that helper function and have it set the right
> > bit, or you could even forgo the capacity != entry->capacity check here and
> > let the helper function do it all.
> >
> > Yes, that means more asym_cap_list iterations, but that's
> > O(nr_cpus * nr_caps); a topology rebuild is along the lines of
> > O(nr_cpus² * nr_topology_levels), so not such a big deal comparatively.
> >
> I could drop that check and make the helper function update the CPUs mask
> (along with dropping the initial grabbing of the first entry)
> +
> switching to list_for_each_entry_reverse which would result in less
> iterations for most (if not all) of the use cases.
>
Ignore the 'reverse' idea - the items are already prepended so regular
iteration should pick the last item added.
---
BR
B.
>
> ---
> BR
> B
> > > + }
> > > +
> > > + list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
> > > + if (cpumask_empty(entry->cpu_mask)) {
> > > + list_del(&entry->link);
> > > + kfree(entry);
> > > + }
> > > + }
> > > +}
> > > +
On 24/05/2021 12:16, Beata Michalska wrote:
[...]
> Rework the way the capacity asymmetry levels are being detected,
> allowing to point to the lowest topology level (for a given CPU), where
> full set of available CPU capacities is visible to all CPUs within given
> domain. As a result, the per-cpu sd_asym_cpucapacity might differ across
> the domains. This will have an impact on EAS wake-up placement in a way
> that it might see different rage of CPUs to be considered, depending on
s/rage/range ;-)
[...]
> @@ -1266,6 +1266,112 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> update_group_capacity(sd, cpu);
> }
>
> +/**
> + * Asymmetric CPU capacity bits
> + */
> +struct asym_cap_data {
> + struct list_head link;
> + unsigned long capacity;
> + struct cpumask *cpu_mask;
Not sure if this has been discussed already but shouldn't the flexible
array members` approach known from struct sched_group, struct
sched_domain or struct em_perf_domain be used here?
IIRC the last time this has been discussed in this thread:
https://lkml.kernel.org/r/[email protected]
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 0de6eef91bc8..03e492e91bd7 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1271,8 +1271,8 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
*/
struct asym_cap_data {
struct list_head link;
- unsigned long capacity;
- struct cpumask *cpu_mask;
+ unsigned long capacity;
+ unsigned long cpumask[];
};
/*
@@ -1299,14 +1299,14 @@ asym_cpu_capacity_classify(struct sched_domain *sd,
goto leave;
list_for_each_entry(entry, &asym_cap_list, link) {
- if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
+ if (cpumask_intersects(sched_domain_span(sd), to_cpumask(entry->cpumask))) {
++asym_cap_count;
} else {
/*
* CPUs with given capacity might be offline
* so make sure this is not the case
*/
- if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
+ if (cpumask_intersects(to_cpumask(entry->cpumask), cpu_map)) {
sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
if (asym_cap_count > 1)
break;
@@ -1332,7 +1332,6 @@ asym_cpu_capacity_get_data(unsigned long capacity)
if (WARN_ONCE(!entry, "Failed to allocate memory for asymmetry data\n"))
goto done;
entry->capacity = capacity;
- entry->cpu_mask = (struct cpumask *)((char *)entry + sizeof(*entry));
list_add(&entry->link, &asym_cap_list);
done:
return entry;
@@ -1349,7 +1348,7 @@ static void asym_cpu_capacity_scan(void)
int cpu;
list_for_each_entry(entry, &asym_cap_list, link)
- cpumask_clear(entry->cpu_mask);
+ cpumask_clear(to_cpumask(entry->cpumask));
entry = list_first_entry_or_null(&asym_cap_list,
struct asym_cap_data, link);
@@ -1361,11 +1360,11 @@ static void asym_cpu_capacity_scan(void)
if (!entry || capacity != entry->capacity)
entry = asym_cpu_capacity_get_data(capacity);
if (entry)
- __cpumask_set_cpu(cpu, entry->cpu_mask);
+ __cpumask_set_cpu(cpu, to_cpumask(entry->cpumask));
}
list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
- if (cpumask_empty(entry->cpu_mask)) {
+ if (cpumask_empty(to_cpumask(entry->cpumask))) {
list_del(&entry->link);
kfree(entry);
}
[...]
On Tue, May 25, 2021 at 10:25:36AM +0200, Dietmar Eggemann wrote:
> On 24/05/2021 12:16, Beata Michalska wrote:
>
> [...]
>
> > Rework the way the capacity asymmetry levels are being detected,
> > allowing to point to the lowest topology level (for a given CPU), where
> > full set of available CPU capacities is visible to all CPUs within given
> > domain. As a result, the per-cpu sd_asym_cpucapacity might differ across
> > the domains. This will have an impact on EAS wake-up placement in a way
> > that it might see different rage of CPUs to be considered, depending on
>
> s/rage/range ;-)
Right ..... :)
>
> [...]
>
> > @@ -1266,6 +1266,112 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> > update_group_capacity(sd, cpu);
> > }
> >
> > +/**
> > + * Asymmetric CPU capacity bits
> > + */
> > +struct asym_cap_data {
> > + struct list_head link;
> > + unsigned long capacity;
> > + struct cpumask *cpu_mask;
>
> Not sure if this has been discussed already but shouldn't the flexible
> array members` approach known from struct sched_group, struct
> sched_domain or struct em_perf_domain be used here?
> IIRC the last time this has been discussed in this thread:
> https://lkml.kernel.org/r/[email protected]
>
If I got right the discussion you have pointed to, it was about using
cpumask_var_t which is not the case here. I do not mind moving the code
to use the array but I am not sure if this changes much. Looking at the
code changes to support that (to_cpumask namely) it was introduced for
cases where cpumask_var_t was not appropriate, which again isn't the case
here.
---
BR
B.
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 0de6eef91bc8..03e492e91bd7 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1271,8 +1271,8 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> */
> struct asym_cap_data {
> struct list_head link;
> - unsigned long capacity;
> - struct cpumask *cpu_mask;
> + unsigned long capacity;
> + unsigned long cpumask[];
> };
>
> /*
> @@ -1299,14 +1299,14 @@ asym_cpu_capacity_classify(struct sched_domain *sd,
> goto leave;
>
> list_for_each_entry(entry, &asym_cap_list, link) {
> - if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> + if (cpumask_intersects(sched_domain_span(sd), to_cpumask(entry->cpumask))) {
> ++asym_cap_count;
> } else {
> /*
> * CPUs with given capacity might be offline
> * so make sure this is not the case
> */
> - if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> + if (cpumask_intersects(to_cpumask(entry->cpumask), cpu_map)) {
> sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> if (asym_cap_count > 1)
> break;
> @@ -1332,7 +1332,6 @@ asym_cpu_capacity_get_data(unsigned long capacity)
> if (WARN_ONCE(!entry, "Failed to allocate memory for asymmetry data\n"))
> goto done;
> entry->capacity = capacity;
> - entry->cpu_mask = (struct cpumask *)((char *)entry + sizeof(*entry));
> list_add(&entry->link, &asym_cap_list);
> done:
> return entry;
> @@ -1349,7 +1348,7 @@ static void asym_cpu_capacity_scan(void)
> int cpu;
>
> list_for_each_entry(entry, &asym_cap_list, link)
> - cpumask_clear(entry->cpu_mask);
> + cpumask_clear(to_cpumask(entry->cpumask));
>
> entry = list_first_entry_or_null(&asym_cap_list,
> struct asym_cap_data, link);
> @@ -1361,11 +1360,11 @@ static void asym_cpu_capacity_scan(void)
> if (!entry || capacity != entry->capacity)
> entry = asym_cpu_capacity_get_data(capacity);
> if (entry)
> - __cpumask_set_cpu(cpu, entry->cpu_mask);
> + __cpumask_set_cpu(cpu, to_cpumask(entry->cpumask));
> }
>
> list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
> - if (cpumask_empty(entry->cpu_mask)) {
> + if (cpumask_empty(to_cpumask(entry->cpumask))) {
> list_del(&entry->link);
> kfree(entry);
> }
>
> [...]
On 24/05/21 23:55, Beata Michalska wrote:
> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
>> On 24/05/21 11:16, Beata Michalska wrote:
>> > This patch also removes the additional -dflags- parameter used when
>> ^^^^^^^^^^^^^^^^^^^^^^^
>> s/^/Also remove/
> I would kind of ... disagree.
> All the commit msg is constructed using passive structure, the suggestion
> would actually break that. And it does 'sound' bit imperative but I guess
> that is subjective. I'd rather stay with impersonal structure (which is
> applied through out the whole patchset).
It's mainly about the 'This patch' formulation, some take exception to that :-)
>>
>> > building sched domains as the asymmetry flags are now being set
>> > directly in sd_init.
>> >
>>
>> Few nits below, but beyond that:
>>
>> Tested-by: Valentin Schneider <[email protected]>
>> Reviewed-by: Valentin Schneider <[email protected]>
>>
> Thanks a lot for the review and testing!
>
>> > +static inline int
>> > +asym_cpu_capacity_classify(struct sched_domain *sd,
>> > + const struct cpumask *cpu_map)
>> > +{
>> > + int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
>> > + struct asym_cap_data *entry;
>> > + int asym_cap_count = 0;
>> > +
>> > + if (list_is_singular(&asym_cap_list))
>> > + goto leave;
>> > +
>> > + list_for_each_entry(entry, &asym_cap_list, link) {
>> > + if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
>> > + ++asym_cap_count;
>> > + } else {
>> > + /*
>> > + * CPUs with given capacity might be offline
>> > + * so make sure this is not the case
>> > + */
>> > + if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
>> > + sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
>> > + if (asym_cap_count > 1)
>> > + break;
>> > + }
>>
>> Readability nit: That could be made into an else if ().
> It could but then this way the -comment- gets more exposed.
> But that might be my personal perception so I can change that.
As always those are quite subjective! Methink something like this would
still draw attention to the offline case:
/*
* Count how many unique capacities this domain covers. If a
* capacity isn't covered, we need to check if any CPU with
* that capacity is actually online, otherwise it can be
* ignored.
*/
if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
++asym_cap_count;
} else if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
if (asym_cap_count > 1)
break;
}
On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
> On 24/05/21 23:55, Beata Michalska wrote:
> > On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
> >> On 24/05/21 11:16, Beata Michalska wrote:
> >> > This patch also removes the additional -dflags- parameter used when
> >> ^^^^^^^^^^^^^^^^^^^^^^^
> >> s/^/Also remove/
> > I would kind of ... disagree.
> > All the commit msg is constructed using passive structure, the suggestion
> > would actually break that. And it does 'sound' bit imperative but I guess
> > that is subjective. I'd rather stay with impersonal structure (which is
> > applied through out the whole patchset).
>
> It's mainly about the 'This patch' formulation, some take exception to that :-)
>
Will rephrase
> >>
> >> > building sched domains as the asymmetry flags are now being set
> >> > directly in sd_init.
> >> >
> >>
> >> Few nits below, but beyond that:
> >>
> >> Tested-by: Valentin Schneider <[email protected]>
> >> Reviewed-by: Valentin Schneider <[email protected]>
> >>
> > Thanks a lot for the review and testing!
> >
> >> > +static inline int
> >> > +asym_cpu_capacity_classify(struct sched_domain *sd,
> >> > + const struct cpumask *cpu_map)
> >> > +{
> >> > + int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
> >> > + struct asym_cap_data *entry;
> >> > + int asym_cap_count = 0;
> >> > +
> >> > + if (list_is_singular(&asym_cap_list))
> >> > + goto leave;
> >> > +
> >> > + list_for_each_entry(entry, &asym_cap_list, link) {
> >> > + if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> >> > + ++asym_cap_count;
> >> > + } else {
> >> > + /*
> >> > + * CPUs with given capacity might be offline
> >> > + * so make sure this is not the case
> >> > + */
> >> > + if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> >> > + sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> >> > + if (asym_cap_count > 1)
> >> > + break;
> >> > + }
> >>
> >> Readability nit: That could be made into an else if ().
> > It could but then this way the -comment- gets more exposed.
> > But that might be my personal perception so I can change that.
>
> As always those are quite subjective! Methink something like this would
> still draw attention to the offline case:
>
> /*
> * Count how many unique capacities this domain covers. If a
> * capacity isn't covered, we need to check if any CPU with
> * that capacity is actually online, otherwise it can be
> * ignored.
> */
> if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> ++asym_cap_count;
> } else if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> if (asym_cap_count > 1)
> break;
> }
Noted.
Will wait for some more comments before sending out 'polished' version.
---
BR
B.
On 25/05/2021 11:30, Beata Michalska wrote:
> On Tue, May 25, 2021 at 10:25:36AM +0200, Dietmar Eggemann wrote:
>> On 24/05/2021 12:16, Beata Michalska wrote:
[...]
>>> @@ -1266,6 +1266,112 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
>>> update_group_capacity(sd, cpu);
>>> }
>>>
>>> +/**
>>> + * Asymmetric CPU capacity bits
>>> + */
>>> +struct asym_cap_data {
>>> + struct list_head link;
>>> + unsigned long capacity;
>>> + struct cpumask *cpu_mask;
>>
>> Not sure if this has been discussed already but shouldn't the flexible
>> array members` approach known from struct sched_group, struct
>> sched_domain or struct em_perf_domain be used here?
>> IIRC the last time this has been discussed in this thread:
>> https://lkml.kernel.org/r/[email protected]
>>
> If I got right the discussion you have pointed to, it was about using
> cpumask_var_t which is not the case here. I do not mind moving the code
> to use the array but I am not sure if this changes much. Looking at the
> code changes to support that (to_cpumask namely) it was introduced for
> cases where cpumask_var_t was not appropriate, which again isn't the case
> here.
Yeah, it was more about using `flexible array members` or allocating the
cpumask separately.
Looks like you're using some kind of a mixed approach:
(1) struct asym_cap_data {
...
struct cpumask *cpu_mask;
(2) entry = kzalloc(sizeof(*entry) + cpumask_size(), GFP_KERNEL);
(3) entry->cpu_mask = (struct cpumask *)((char *)entry +
sizeof(*entry));
(4) cpumask_intersects(foo, entry->cpu_mask)
E.g. struct em_perf_domain has
(1) struct em_perf_domain {
...
unsigned long cpus[];
(2) like yours
(3) is not needed.
(4) cpumask_copy(em_span_cpus(pd), foo)
with #define em_span_cpus(em) (to_cpumask((em)->cpus))
IMHO, it's better to keep this approach aligned between the different
data structures.
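For reference, a consolidated sketch of the flexible array variant (the
accessor macro name here is illustrative, simply mirroring em_span_cpus()):

struct asym_cap_data {
	struct list_head link;
	unsigned long capacity;
	unsigned long cpus[];		/* flexible array member */
};

#define asym_cap_span(e) (to_cpumask((e)->cpus))

/*
 * Allocation stays a single kzalloc(sizeof(*entry) + cpumask_size()),
 * the explicit cpu_mask pointer fixup goes away and all users switch
 * to asym_cap_span(entry).
 */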
On Tue, May 25, 2021 at 01:59:30PM +0200, Dietmar Eggemann wrote:
> On 25/05/2021 11:30, Beata Michalska wrote:
> > On Tue, May 25, 2021 at 10:25:36AM +0200, Dietmar Eggemann wrote:
> >> On 24/05/2021 12:16, Beata Michalska wrote:
>
> [...]
>
> >>> @@ -1266,6 +1266,112 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> >>> update_group_capacity(sd, cpu);
> >>> }
> >>>
> >>> +/**
> >>> + * Asymmetric CPU capacity bits
> >>> + */
> >>> +struct asym_cap_data {
> >>> + struct list_head link;
> >>> + unsigned long capacity;
> >>> + struct cpumask *cpu_mask;
> >>
> >> Not sure if this has been discussed already but shouldn't the flexible
> >> array members` approach known from struct sched_group, struct
> >> sched_domain or struct em_perf_domain be used here?
> >> IIRC the last time this has been discussed in this thread:
> >> https://lkml.kernel.org/r/[email protected]
> >>
> > If I got right the discussion you have pointed to, it was about using
> > cpumask_var_t which is not the case here. I do not mind moving the code
> > to use the array but I am not sure if this changes much. Looking at the
> > code changes to support that (to_cpumask namely) it was introduced for
> > cases where cpumask_var_t was not appropriate, which again isn't the case
> > here.
>
> Yeah, it was more about using `flexible array members` or allocating the
> cpumask separately.
>
> Looks like you're using some kind of a mixed approach:
>
> (1) struct asym_cap_data {
> ...
> struct cpumask *cpu_mask;
>
> (2) entry = kzalloc(sizeof(*entry) + cpumask_size(), GFP_KERNEL);
>
> (3) entry->cpu_mask = (struct cpumask *)((char *)entry +
> sizeof(*entry));
>
> (4) cpumask_intersects(foo, entry->cpu_mask)
>
>
> E.g. struct em_perf_domain has
>
> (1) struct em_perf_domain {
> ...
> unsigned long cpus[];
>
> (2) like yours
>
> (3) is not needed.
>
> (4) cpumask_copy(em_span_cpus(pd), foo)
>
> with #define em_span_cpus(em) (to_cpumask((em)->cpus))
>
> IMHO, it's better to keep this approach aligned between the different
> data structures.
I would actually go the other way round as it seems more 'clean'
that way and it does not need the conversion but I don't mind playing along.
---
BR
B.
On 25/05/2021 12:29, Beata Michalska wrote:
> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
>> On 24/05/21 23:55, Beata Michalska wrote:
>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
>>>> On 24/05/21 11:16, Beata Michalska wrote:
[...]
>>>>> +static inline int
>>>>> +asym_cpu_capacity_classify(struct sched_domain *sd,
>>>>> + const struct cpumask *cpu_map)
>>>>> +{
>>>>> + int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
>>>>> + struct asym_cap_data *entry;
>>>>> + int asym_cap_count = 0;
>>>>> +
>>>>> + if (list_is_singular(&asym_cap_list))
>>>>> + goto leave;
>>>>> +
>>>>> + list_for_each_entry(entry, &asym_cap_list, link) {
>>>>> + if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
>>>>> + ++asym_cap_count;
>>>>> + } else {
>>>>> + /*
>>>>> + * CPUs with given capacity might be offline
>>>>> + * so make sure this is not the case
>>>>> + */
>>>>> + if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
>>>>> + sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
>>>>> + if (asym_cap_count > 1)
>>>>> + break;
>>>>> + }
>>>>
>>>> Readability nit: That could be made into an else if ().
>>> It could but then this way the -comment- gets more exposed.
>>> But that might be my personal perception so I can change that.
>>
>> As always those are quite subjective! Methink something like this would
>> still draw attention to the offline case:
>>
>> /*
>> * Count how many unique capacities this domain covers. If a
>> * capacity isn't covered, we need to check if any CPU with
>> * that capacity is actually online, otherwise it can be
>> * ignored.
>> */
>> if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
>> ++asym_cap_count;
>> } else if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
>> sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
>> if (asym_cap_count > 1)
>> break;
>> }
> Noted.
> Will wait for some more comments before sending out 'polished' version.
For me asym_cpu_capacity_classify() is pretty hard to digest ;-) But I
wasn't able to break it. It also performs correctly on (non-existing SMT)
layer (with sd span eq. single CPU).
Something like this (separating asym_cap_list iteration and flags
construction would be easier for me. But like already said here,
it's subjective.
I left the two optimizations (list_is_singular(), break on
asym_cap_count > 1) out for now. asym_cap_list shouldn't have > 4 entries (;-)).
static inline int
asym_cpu_capacity_classify(struct sched_domain *sd,
const struct cpumask *cpu_map)
{
int sd_span_match = 0, cpu_map_match = 0, flags = 0;
struct asym_cap_data *entry;
list_for_each_entry(entry, &asym_cap_list, link) {
if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask))
++sd_span_match;
else if (cpumask_intersects(cpu_map, entry->cpu_mask))
++cpu_map_match;
}
WARN_ON_ONCE(!sd_span_match);
if (sd_span_match > 1) {
flags |= SD_ASYM_CPUCAPACITY;
if (!cpu_map_match)
flags |= SD_ASYM_CPUCAPACITY_FULL;
}
return flags;
}
BTW, how would this mechanism behave on a system with SMT and asymmetric CPU
capacity? Something EAS wouldn't allow but I guess asym_cap_list will be
constructed and the SD_ASYM_CPUCAPACITY_XXX flags will be set?
On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
> On 25/05/2021 12:29, Beata Michalska wrote:
> > On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
> >> On 24/05/21 23:55, Beata Michalska wrote:
> >>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
> >>>> On 24/05/21 11:16, Beata Michalska wrote:
>
> [...]
>
> >>>>> +static inline int
> >>>>> +asym_cpu_capacity_classify(struct sched_domain *sd,
> >>>>> + const struct cpumask *cpu_map)
> >>>>> +{
> >>>>> + int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
> >>>>> + struct asym_cap_data *entry;
> >>>>> + int asym_cap_count = 0;
> >>>>> +
> >>>>> + if (list_is_singular(&asym_cap_list))
> >>>>> + goto leave;
> >>>>> +
> >>>>> + list_for_each_entry(entry, &asym_cap_list, link) {
> >>>>> + if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> >>>>> + ++asym_cap_count;
> >>>>> + } else {
> >>>>> + /*
> >>>>> + * CPUs with given capacity might be offline
> >>>>> + * so make sure this is not the case
> >>>>> + */
> >>>>> + if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> >>>>> + sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> >>>>> + if (asym_cap_count > 1)
> >>>>> + break;
> >>>>> + }
> >>>>
> >>>> Readability nit: That could be made into an else if ().
> >>> It could but then this way the -comment- gets more exposed.
> >>> But that might be my personal perception so I can change that.
> >>
> >> As always those are quite subjective! Methink something like this would
> >> still draw attention to the offline case:
> >>
> >> /*
> >> * Count how many unique capacities this domain covers. If a
> >> * capacity isn't covered, we need to check if any CPU with
> >> * that capacity is actually online, otherwise it can be
> >> * ignored.
> >> */
> >> if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> >> ++asym_cap_count;
> >> } else if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> >> sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> >> if (asym_cap_count > 1)
> >> break;
> >> }
> > Noted.
> > Will wait for some more comments before sending out 'polished' version.
>
> For me asym_cpu_capacity_classify() is pretty hard to digest ;-) But I
> wasn't able to break it. It also performs correctly on (non-existing SMT)
> layer (with sd span eq. single CPU).
>
> Something like this (separating asym_cap_list iteration and flags
> construction would be easier for me. But like already said here,
> it's subjective.
> I left the two optimizations (list_is_singular(), break on
> asym_cap_count > 1) out for now. asym_cap_list shouldn't have > 4 entries (;-)).
>
> static inline int
> asym_cpu_capacity_classify(struct sched_domain *sd,
> const struct cpumask *cpu_map)
> {
> int sd_span_match = 0, cpu_map_match = 0, flags = 0;
> struct asym_cap_data *entry;
>
> list_for_each_entry(entry, &asym_cap_list, link) {
> if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask))
> ++sd_span_match;
> else if (cpumask_intersects(cpu_map, entry->cpu_mask))
> ++cpu_map_match;
> }
>
> WARN_ON_ONCE(!sd_span_match);
>
> if (sd_span_match > 1) {
> flags |= SD_ASYM_CPUCAPACITY;
> if (!cpu_map_match)
> flags |= SD_ASYM_CPUCAPACITY_FULL;
> }
>
> return flags;
> }
So I planned to drop the list_is_singular() check as it is needless, really.
Otherwise, I am not really convinced by the suggestion. I could add comments
around the current version to make it more ... 'digestible', but I'd rather
stay with it as it seems more compact to me (subjective).
>
> BTW, how would this mechanism behave on a system with SMT and asymmetric CPU
> capacity? Something EAS wouldn't allow but I guess asym_cap_list will be
> constructed and the SD_ASYM_CPUCAPACITY_XXX flags will be set?
Yes, the list would get created and flags set. I do not think there is
a difference with the current approach (?). So EAS would be disabled (it only cares
about the SD_ASYM_CPUCAPACITY_FULL flag) but the misfit might still kick in.
---
BR
B.
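For orientation, the "polished" version referred to here would presumably be
Beata's helper with the list_is_singular() check dropped and Valentin's
else-if plus comment folded in. The following is only a sketch assembled from
the snippets quoted above, not the code that was eventually posted:

static inline int
asym_cpu_capacity_classify(struct sched_domain *sd,
                           const struct cpumask *cpu_map)
{
        int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
        struct asym_cap_data *entry;
        int asym_cap_count = 0;

        list_for_each_entry(entry, &asym_cap_list, link) {
                /*
                 * Count how many unique capacities this domain covers. If a
                 * capacity isn't covered, check whether any CPU with that
                 * capacity is actually online; otherwise it can be ignored.
                 */
                if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
                        ++asym_cap_count;
                } else if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
                        sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
                        if (asym_cap_count > 1)
                                break;
                }
        }

        return asym_cap_count > 1 ? sd_asym_flags : 0;
}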
On Wed, May 26, 2021 at 01:15:46PM +0100, Beata Michalska wrote:
> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
> > On 25/05/2021 12:29, Beata Michalska wrote:
> > > On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
> > >> On 24/05/21 23:55, Beata Michalska wrote:
> > >>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
> > >>>> On 24/05/21 11:16, Beata Michalska wrote:
> >
> > [...]
> >
> > >>>>> +static inline int
> > >>>>> +asym_cpu_capacity_classify(struct sched_domain *sd,
> > >>>>> + const struct cpumask *cpu_map)
> > >>>>> +{
> > >>>>> + int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
> > >>>>> + struct asym_cap_data *entry;
> > >>>>> + int asym_cap_count = 0;
> > >>>>> +
> > >>>>> + if (list_is_singular(&asym_cap_list))
> > >>>>> + goto leave;
> > >>>>> +
> > >>>>> + list_for_each_entry(entry, &asym_cap_list, link) {
> > >>>>> + if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> > >>>>> + ++asym_cap_count;
> > >>>>> + } else {
> > >>>>> + /*
> > >>>>> + * CPUs with given capacity might be offline
> > >>>>> + * so make sure this is not the case
> > >>>>> + */
> > >>>>> + if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> > >>>>> + sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> > >>>>> + if (asym_cap_count > 1)
> > >>>>> + break;
> > >>>>> + }
> > >>>>
> > >>>> Readability nit: That could be made into an else if ().
> > >>> It could but then this way the -comment- gets more exposed.
> > >>> But that might be my personal perception so I can change that.
> > >>
> > >> As always those are quite subjective! Methink something like this would
> > >> still draw attention to the offline case:
> > >>
> > >> /*
> > >> * Count how many unique capacities this domain covers. If a
> > >> * capacity isn't covered, we need to check if any CPU with
> > >> * that capacity is actually online, otherwise it can be
> > >> * ignored.
> > >> */
> > >> if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> > >> ++asym_cap_count;
> > >> } else if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> > >> sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> > >> if (asym_cap_count > 1)
> > >> break;
> > >> }
> > > Noted.
> > > Will wait for some more comments before sending out 'polished' version.
> >
> > For me asym_cpu_capacity_classify() is pretty hard to digest ;-) But I
> > wasn't able to break it. It also performs correctly on (non-existing SMT)
> > layer (with sd span eq. single CPU).
> >
> > Something like this (separating asym_cap_list iteration and flags
> > construction would be easier for me. But like already said here,
> > it's subjective.
> > I left the two optimizations (list_is_singular(), break on asym_cap_count
> > > 1) out for now. asym_cap_list shouldn't have > 4 entries (;-)).
> >
> > static inline int
> > asym_cpu_capacity_classify(struct sched_domain *sd,
> > const struct cpumask *cpu_map)
> > {
> > int sd_span_match = 0, cpu_map_match = 0, flags = 0;
> > struct asym_cap_data *entry;
> >
> > list_for_each_entry(entry, &asym_cap_list, link) {
> > if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask))
> > ++sd_span_match;
> > else if (cpumask_intersects(cpu_map, entry->cpu_mask))
> > ++cpu_map_match;
> > }
> >
> > WARN_ON_ONCE(!sd_span_match);
> >
> > if (sd_span_match > 1) {
> > flags |= SD_ASYM_CPUCAPACITY;
> > if (!cpu_map_match)
> > flags |= SD_ASYM_CPUCAPACITY_FULL;
> > }
> >
> > return flags;
> > }
> So I planned to drop the list_is_singular check as it is needless really.
> Otherwise, I am not really convinced by the suggestion. I could add comments
> around current version to make it more ..... 'digestible' but I'd rather
> stay with it as it seems more compact to me (subjective).
>
> >
> > BTW, how would this mechanism behave on a system with SMT and asymmetric CPU
> > capacity? Something EAS wouldn't allow but I guess asym_cap_list will be
> > constructed and the SD_ASYM_CPUCAPACITY_XXX flags will be set?
> Yes, the list would get created and flags set. I do not think there is
> a difference with current approach (?). So EAS would be disabled (it only cares
> about SD_ASYM_CPUCAPACITY_FULL flag) but the misfit might still kick in.
>
That depends on arch_scale_cpu_capacity(). I would imagine it would
return SCHED_CAPACITY_SCALE for those, which means no asymmetry will
be detected?
> ---
> BR
> B.
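The point about arch_scale_cpu_capacity() is that the capacity list only gains
an entry per distinct capacity value: if every CPU (or hw thread) reports
SCHED_CAPACITY_SCALE, the list stays singular and no SD_ASYM_* flag can be set.
A rough, illustrative sketch of such a scan is shown below; the series uses
asym_cpu_capacity_get_data() for the lookup, while the helper name and the
remaining details here are made up for illustration only:

/*
 * Illustrative only: group CPUs by the capacity reported by
 * arch_scale_cpu_capacity(). CPUs with identical capacity values
 * collapse into a single asym_cap_data entry.
 */
static void asym_cpu_capacity_scan_sketch(const struct cpumask *cpu_map)
{
        struct asym_cap_data *entry;
        int cpu;

        for_each_cpu(cpu, cpu_map) {
                unsigned long capacity = arch_scale_cpu_capacity(cpu);

                /* Lookup (assumed to also create) the entry for this value. */
                entry = asym_cpu_capacity_get_data(capacity);
                if (entry)
                        cpumask_set_cpu(cpu, entry->cpu_mask);
        }
}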
On Wed, May 26, 2021 at 08:17:25PM +0200, Dietmar Eggemann wrote:
> On 26/05/2021 14:15, Beata Michalska wrote:
> > On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
> >> On 25/05/2021 12:29, Beata Michalska wrote:
> >>> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
> >>>> On 24/05/21 23:55, Beata Michalska wrote:
> >>>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
> >>>>>> On 24/05/21 11:16, Beata Michalska wrote:
>
> [...]
>
> >> static inline int
> >> asym_cpu_capacity_classify(struct sched_domain *sd,
> >> const struct cpumask *cpu_map)
> >> {
> >> int sd_span_match = 0, cpu_map_match = 0, flags = 0;
> >> struct asym_cap_data *entry;
> >>
> >> list_for_each_entry(entry, &asym_cap_list, link) {
> >> if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask))
> >> ++sd_span_match;
> >> else if (cpumask_intersects(cpu_map, entry->cpu_mask))
> >> ++cpu_map_match;
> >> }
> >>
> >> WARN_ON_ONCE(!sd_span_match);
> >>
> >> if (sd_span_match > 1) {
> >> flags |= SD_ASYM_CPUCAPACITY;
> >> if (!cpu_map_match)
> >> flags |= SD_ASYM_CPUCAPACITY_FULL;
> >> }
> >>
> >> return flags;
> >> }
> > So I planned to drop the list_is_singular check as it is needless really.
> > Otherwise, I am not really convinced by the suggestion. I could add comments
> > around current version to make it more ..... 'digestible' but I'd rather
> > stay with it as it seems more compact to me (subjective).
>
> You could pass in `const struct cpumask *sd_span` instead of `struct
> sched_domain *sd` though. To make it clear that both masks are used to
> compare against the cpumasks of the asym_cap_list entries.
>
I could definitely do that, though if I switch to arrays for the CPU masks,
it might get a bit confusing again.
No strong preferences here though. Can do either or both.
Thanks.
---
BR
B.
> static inline int
> -asym_cpu_capacity_classify(struct sched_domain *sd,
> +asym_cpu_capacity_classify(const struct cpumask *sd_span,
> const struct cpumask *cpu_map)
> {
> int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
> @@ -1377,14 +1378,14 @@ asym_cpu_capacity_classify(struct sched_domain *sd,
> goto leave;
>
> list_for_each_entry(entry, &asym_cap_list, link) {
> - if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
> + if (cpumask_intersects(sd_span, entry->cpu_mask)) {
> ++asym_cap_count;
> } else {
> /*
> * CPUs with given capacity might be offline
> * so make sure this is not the case
> */
> - if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
> + if (cpumask_intersects(cpu_map, entry->cpu_mask)) {
> sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
> if (asym_cap_count > 1)
> break;
> @@ -1395,7 +1396,6 @@ asym_cpu_capacity_classify(struct sched_domain *sd,
> leave:
> return asym_cap_count > 1 ? sd_asym_flags : 0;
> }
> -#endif
>
> static inline struct asym_cap_data *
> asym_cpu_capacity_get_data(unsigned long capacity)
> @@ -1589,6 +1589,7 @@ sd_init(struct sched_domain_topology_level *tl,
> struct sd_data *sdd = &tl->data;
> struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu);
> int sd_id, sd_weight, sd_flags = 0;
> + struct cpumask *sd_span;
>
> #ifdef CONFIG_NUMA
> /*
> @@ -1636,10 +1637,11 @@ sd_init(struct sched_domain_topology_level *tl,
> #endif
> };
>
> - cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
> - sd_id = cpumask_first(sched_domain_span(sd));
> + sd_span = sched_domain_span(sd);
> + cpumask_and(sd_span, cpu_map, tl->mask(cpu));
> + sd_id = cpumask_first(sd_span);
>
> - sd->flags |= asym_cpu_capacity_classify(sd, cpu_map);
> + sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
> /*
> * Convert topological properties into behaviour.
> */
On 26/05/2021 14:15, Beata Michalska wrote:
> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
>> On 25/05/2021 12:29, Beata Michalska wrote:
>>> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
>>>> On 24/05/21 23:55, Beata Michalska wrote:
>>>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
>>>>>> On 24/05/21 11:16, Beata Michalska wrote:
[...]
>> static inline int
>> asym_cpu_capacity_classify(struct sched_domain *sd,
>> const struct cpumask *cpu_map)
>> {
>> int sd_span_match = 0, cpu_map_match = 0, flags = 0;
>> struct asym_cap_data *entry;
>>
>> list_for_each_entry(entry, &asym_cap_list, link) {
>> if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask))
>> ++sd_span_match;
>> else if (cpumask_intersects(cpu_map, entry->cpu_mask))
>> ++cpu_map_match;
>> }
>>
>> WARN_ON_ONCE(!sd_span_match);
>>
>> if (sd_span_match > 1) {
>> flags |= SD_ASYM_CPUCAPACITY;
>> if (!cpu_map_match)
>> flags |= SD_ASYM_CPUCAPACITY_FULL;
>> }
>>
>> return flags;
>> }
> So I planned to drop the list_is_singular check as it is needless really.
> Otherwise, I am not really convinced by the suggestion. I could add comments
> around current version to make it more ..... 'digestible' but I'd rather
> stay with it as it seems more compact to me (subjective).
You could pass in `const struct cpumask *sd_span` instead of `struct
sched_domain *sd` though. To make it clear that both masks are used to
compare against the cpumasks of the asym_cap_list entries.
static inline int
-asym_cpu_capacity_classify(struct sched_domain *sd,
+asym_cpu_capacity_classify(const struct cpumask *sd_span,
                            const struct cpumask *cpu_map)
{
        int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
@@ -1377,14 +1378,14 @@ asym_cpu_capacity_classify(struct sched_domain *sd,
                goto leave;

        list_for_each_entry(entry, &asym_cap_list, link) {
-               if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
+               if (cpumask_intersects(sd_span, entry->cpu_mask)) {
                        ++asym_cap_count;
                } else {
                        /*
                         * CPUs with given capacity might be offline
                         * so make sure this is not the case
                         */
-                       if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
+                       if (cpumask_intersects(cpu_map, entry->cpu_mask)) {
                                sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
                                if (asym_cap_count > 1)
                                        break;
@@ -1395,7 +1396,6 @@ asym_cpu_capacity_classify(struct sched_domain *sd,
leave:
        return asym_cap_count > 1 ? sd_asym_flags : 0;
}
-#endif

static inline struct asym_cap_data *
asym_cpu_capacity_get_data(unsigned long capacity)
@@ -1589,6 +1589,7 @@ sd_init(struct sched_domain_topology_level *tl,
        struct sd_data *sdd = &tl->data;
        struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu);
        int sd_id, sd_weight, sd_flags = 0;
+       struct cpumask *sd_span;

#ifdef CONFIG_NUMA
        /*
@@ -1636,10 +1637,11 @@ sd_init(struct sched_domain_topology_level *tl,
#endif
        };

-       cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
-       sd_id = cpumask_first(sched_domain_span(sd));
+       sd_span = sched_domain_span(sd);
+       cpumask_and(sd_span, cpu_map, tl->mask(cpu));
+       sd_id = cpumask_first(sd_span);

-       sd->flags |= asym_cpu_capacity_classify(sd, cpu_map);
+       sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
        /*
         * Convert topological properties into behaviour.
         */
On 26/05/2021 14:51, Beata Michalska wrote:
> On Wed, May 26, 2021 at 01:15:46PM +0100, Beata Michalska wrote:
>> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
>>> On 25/05/2021 12:29, Beata Michalska wrote:
>>>> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
>>>>> On 24/05/21 23:55, Beata Michalska wrote:
>>>>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
>>>>>>> On 24/05/21 11:16, Beata Michalska wrote:
[...]
>>> BTW, how would this mechanism behave on a system with SMT and asymmetric CPU
>>> capacity? Something EAS wouldn't allow but I guess asym_cap_list will be
>>> constructed and the SD_ASYM_CPUCAPACITY_XXX flags will be set?
>> Yes, the list would get created and flags set. I do not think there is
>> a difference with current approach (?). So EAS would be disabled (it only cares
>> about SD_ASYM_CPUCAPACITY_FULL flag) but the misfit might still kick in.
>>
> That depends on the arch_scale_cpu_capacity. I would imagine it would
> return SCHED_CAPACITY_SCALE for those, which means no asymmetry will
> be detected ?
I was thinking about an erroneous dts file like:
cpu-map {
        cluster0 {
                core0 {
                        thread0 {
                                cpu = <&A53_0>;
                        };
                        thread1 {
                                cpu = <&A53_1>;
                        };
                };
                core1 {
                        thread0 {
                                cpu = <&A53_2>;
                        };
                        thread1 {
                                cpu = <&A53_3>;
                        };
                };
                core2 {
                        thread0 {
                                cpu = <&A53_4>;
                        };
                        thread1 {
                                cpu = <&A53_5>;
                        };
                };
        };

        cluster1 {
                core0 {
                        thread0 {
                                cpu = <&A53_6>;
                        };
                        thread1 {
                                cpu = <&A53_7>;
                        };
                };
        };
};

A53_0: cpu@0 {
        capacity-dmips-mhz = <446>;
A53_1: cpu@1 {
        capacity-dmips-mhz = <1024>;
A53_2: cpu@2 {
        capacity-dmips-mhz = <871>;
A53_3: cpu@3 {
        capacity-dmips-mhz = <1024>;
A53_4: cpu@4 {
        capacity-dmips-mhz = <446>;
A53_5: cpu@5 {
        capacity-dmips-mhz = <871>;
A53_6: cpu@6 {
        capacity-dmips-mhz = <1024>;
A53_7: cpu@7 {
        capacity-dmips-mhz = <1024>;
Here I guess SD_ASYM_CPUCAPACITY will be attached to SMT[0-5]. So this
'capacity-dmips-mhz' config error won't be detected.
In case all CPUs (i.e. hw threads) have the correct
capacity-dmips-mhz = <1024> (or it is not set, defaulting to 1024),
asym_cap_list would correctly only have 1 entry.
On Wed, May 26, 2021 at 08:17:41PM +0200, Dietmar Eggemann wrote:
> On 26/05/2021 14:51, Beata Michalska wrote:
> > On Wed, May 26, 2021 at 01:15:46PM +0100, Beata Michalska wrote:
> >> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
> >>> On 25/05/2021 12:29, Beata Michalska wrote:
> >>>> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
> >>>>> On 24/05/21 23:55, Beata Michalska wrote:
> >>>>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
> >>>>>>> On 24/05/21 11:16, Beata Michalska wrote:
>
> [...]
>
> >>> BTW, how would this mechanism behave on a system with SMT and asymmetric CPU
> >>> capacity? Something EAS wouldn't allow but I guess asym_cap_list will be
> >>> constructed and the SD_ASYM_CPUCAPACITY_XXX flags will be set?
> >> Yes, the list would get created and flags set. I do not think there is
> >> a difference with current approach (?). So EAS would be disabled (it only cares
> >> about SD_ASYM_CPUCAPACITY_FULL flag) but the misfit might still kick in.
> >>
> > That depends on the arch_scale_cpu_capacity. I would imagine it would
> > return SCHED_CAPACITY_SCALE for those, which means no asymmetry will
> > be detected ?
>
> I was thinking about an erroneous dts file like:
>
> cpu-map {
> cluster0 {
> core0 {
> thread0 {
> cpu = <&A53_0>;
> };
> thread1 {
> cpu = <&A53_1>;
> };
> };
> core1 {
> thread0 {
> cpu = <&A53_2>;
> };
> thread1 {
> cpu = <&A53_3>;
> };
> };
> core2 {
> thread0 {
> cpu = <&A53_4>;
> };
> thread1 {
> cpu = <&A53_5>;
> };
> };
> };
>
> cluster1 {
> core0 {
> thread0 {
> cpu = <&A53_6>;
> };
> thread1 {
> cpu = <&A53_7>;
> };
> };
> };
> };
>
> A53_0: cpu@0 {
> capacity-dmips-mhz = <446>;
> A53_1: cpu@1 {
> capacity-dmips-mhz = <1024>;
> A53_2: cpu@2 {
> capacity-dmips-mhz = <871>;
> A53_3: cpu@3 {
> capacity-dmips-mhz = <1024>;
> A53_4: cpu@4 {
> capacity-dmips-mhz = <446>;
> A53_5: cpu@5 {
> capacity-dmips-mhz = <871>;
> A53_6: cpu@6 {
> capacity-dmips-mhz = <1024>;
> A53_7: cpu@7 {
> capacity-dmips-mhz = <1024>;
>
> Here I guess SD_ASYM_CPUCAPACITY will be attached to SMT[0-5]. So this
> 'capacity-dmips-mhz' config error won't be detected.
>
> In case all CPUs (i.e. hw threads would have the correct
> capacity-dmips-mhz = <1024> or not being set (default 1024))
> asym_cap_list would correctly only have 1 entry.
We could possibly add a warning (like in EAS) if the asymmetry is detected
for SMT, which would give some indication that there is something wrong?
---
BR
B.
On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
> For me asym_cpu_capacity_classify() is pretty hard to digest ;-) But I
> wasn't able to break it. It also performs correctly on (non-existing SMT)
> layer (with sd span eq. single CPU).
This is the simplest form I could come up with this morning:
static inline int
asym_cpu_capacity_classify(struct sched_domain *sd,
                           const struct cpumask *cpu_map)
{
        struct asym_cap_data *entry;
        int i = 0, n = 0;

        list_for_each_entry(entry, &asym_cap_list, link) {
                if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask))
                        i++;
                else
                        n++;
        }

        if (WARN_ON_ONCE(!i) || i == 1)         /* no asymmetry */
                return 0;

        if (n)                                  /* partial asymmetry */
                return SD_ASYM_CPUCAPACITY;

        /* full asymmetry */
        return SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
}
The early termination and everything was cute; but this isn't
performance critical code and clarity is paramount.
On Thu, May 27, 2021 at 05:08:42PM +0200, Dietmar Eggemann wrote:
> On 26/05/2021 23:40, Beata Michalska wrote:
> > On Wed, May 26, 2021 at 08:17:41PM +0200, Dietmar Eggemann wrote:
> >> On 26/05/2021 14:51, Beata Michalska wrote:
> >>> On Wed, May 26, 2021 at 01:15:46PM +0100, Beata Michalska wrote:
> >>>> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
> >>>>> On 25/05/2021 12:29, Beata Michalska wrote:
> >>>>>> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
> >>>>>>> On 24/05/21 23:55, Beata Michalska wrote:
> >>>>>>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
> >>>>>>>>> On 24/05/21 11:16, Beata Michalska wrote:
>
> [...]
>
> >> cpu-map {
> >> cluster0 {
> >> core0 {
> >> thread0 {
> >> cpu = <&A53_0>;
> >> };
> >> thread1 {
> >> cpu = <&A53_1>;
> >> };
> >> };
> >> core1 {
> >> thread0 {
> >> cpu = <&A53_2>;
> >> };
> >> thread1 {
> >> cpu = <&A53_3>;
> >> };
> >> };
> >> core2 {
> >> thread0 {
> >> cpu = <&A53_4>;
> >> };
> >> thread1 {
> >> cpu = <&A53_5>;
> >> };
> >> };
> >> };
> >>
> >> cluster1 {
> >> core0 {
> >> thread0 {
> >> cpu = <&A53_6>;
> >> };
> >> thread1 {
> >> cpu = <&A53_7>;
> >> };
> >> };
> >> };
> >> };
> >>
> >> A53_0: cpu@0 {
> >> capacity-dmips-mhz = <446>;
> >> A53_1: cpu@1 {
> >> capacity-dmips-mhz = <1024>;
> >> A53_2: cpu@2 {
> >> capacity-dmips-mhz = <871>;
> >> A53_3: cpu@3 {
> >> capacity-dmips-mhz = <1024>;
> >> A53_4: cpu@4 {
> >> capacity-dmips-mhz = <446>;
> >> A53_5: cpu@5 {
> >> capacity-dmips-mhz = <871>;
> >> A53_6: cpu@6 {
> >> capacity-dmips-mhz = <1024>;
> >> A53_7: cpu@7 {
> >> capacity-dmips-mhz = <1024>;
> >>
> >> Here I guess SD_ASYM_CPUCAPACITY will be attached to SMT[0-5]. So this
> >> 'capacity-dmips-mhz' config error won't be detected.
> >>
> >> In case all CPUs (i.e. hw threads would have the correct
> >> capacity-dmips-mhz = <1024> or not being set (default 1024))
> >> asym_cap_list would correctly only have 1 entry.
> > We could possibly add a warning (like in EAS) if the asymmetry is detected
> > for SMT which would give some indication that there is smth ... wrong ?
>
> Maybe, in case you find an easy way to detect this.
>
> But the issue already exists today. Not with the topology mentioned
> above but in case we slightly change it to:
>
> cpus = { ([446 1024] [871 1024] [446 1024] ) ([1024 1024]) }
> ^^^^
> so that we have a 1024 CPU in the lowest sd for each CPU, we would get
> SD_ASYM_CPUCAPACITY on SMT.
The asymmetric capacity flags are being set at a sched domain level, so
we could use the SD_SHARE_CPUCAPACITY|SD_SHARE_PKG_RESOURCES (cpu_smt_flags)
flags to determine whether having asymmetry there is valid or not? If this is
enough, this could be handled by the classify function?
---
BR
B.
On 27/05/2021 09:03, Peter Zijlstra wrote:
> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
>
>> For me asym_cpu_capacity_classify() is pretty hard to digest ;-) But I
>> wasn't able to break it. It also performs correctly on (non-existing SMT)
>> layer (with sd span eq. single CPU).
>
> This is the simplest form I could come up with this morning:
>
> static inline int
> asym_cpu_capacity_classify(struct sched_domain *sd,
> const struct cpumask *cpu_map)
> {
> struct asym_cap_data *entry;
> int i = 0, n = 0;
>
> list_for_each_entry(entry, &asym_cap_list, link) {
> if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask))
> i++;
> else
> n++;
> }
>
> if (WARN_ON_ONCE(!i) || i == 1) /* no asymmetry */
> return 0;
>
> if (n) /* partial asymmetry */
> return SD_ASYM_CPUCAPACITY;
>
> /* full asymmetry */
> return SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
> }
>
>
> The early termination and everything was cute; but this isn't
> performance critical code and clarity is paramount.
This is definitely easier to grasp.
What about the missing `if (cpumask_intersects(entry->cpu_mask,
cpu_map))` condition in the else path to increment n?
Example:
cpus = {[446 446] [871 871] [1024 1024]}
So 3 asym_cap_list entries.
After hp'ing out CPU4 and CPU5:
DIE: 'partial asymmetry'
In case we would increment n only when the condition is met, we would
have `full asymmetry`.
I guess we want to allow EAS task placement, hence have
sd_asym_cpucapacity set in case there are only 446 and 871 left?
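One way to reconcile Peter's simplified form with the hotplug case above is to
count a missed capacity only when some CPU with that capacity is still present
in cpu_map, along these lines (a sketch of the idea, not the posted v7 code):

static inline int
asym_cpu_capacity_classify(struct sched_domain *sd,
                           const struct cpumask *cpu_map)
{
        struct asym_cap_data *entry;
        int asym_cap_count = 0, asym_cap_miss = 0;

        list_for_each_entry(entry, &asym_cap_list, link) {
                if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask))
                        asym_cap_count++;
                else if (cpumask_intersects(cpu_map, entry->cpu_mask))
                        asym_cap_miss++; /* not covered, but still online */
        }

        if (WARN_ON_ONCE(!asym_cap_count) || asym_cap_count == 1)
                return 0;                       /* no asymmetry */

        if (asym_cap_miss)
                return SD_ASYM_CPUCAPACITY;     /* partial asymmetry */

        /* full asymmetry */
        return SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
}

With that, hotplugging out the two 1024 CPUs in the {[446 446] [871 871]
[1024 1024]} example would classify DIE as full asymmetry again, since the
missed 1024 capacity is no longer backed by any online CPU.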
On Thu, May 27, 2021 at 02:22:52PM +0200, Dietmar Eggemann wrote:
> On 27/05/2021 09:03, Peter Zijlstra wrote:
> > On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
> >
> >> For me asym_cpu_capacity_classify() is pretty hard to digest ;-) But I
> >> wasn't able to break it. It also performs correctly on (non-existing SMT)
> >> layer (with sd span eq. single CPU).
> >
> > This is the simplest form I could come up with this morning:
> >
> > static inline int
> > asym_cpu_capacity_classify(struct sched_domain *sd,
> > const struct cpumask *cpu_map)
> > {
> > struct asym_cap_data *entry;
> > int i = 0, n = 0;
> >
> > list_for_each_entry(entry, &asym_cap_list, link) {
> > if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask))
> > i++;
> > else
> > n++;
> > }
> >
> > if (WARN_ON_ONCE(!i) || i == 1) /* no asymmetry */
> > return 0;
> >
> > if (n) /* partial asymmetry */
> > return SD_ASYM_CPUCAPACITY;
> >
> > /* full asymmetry */
> > return SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
> > }
> >
> >
> > The early termination and everything was cute; but this isn't
> > performance critical code and clarity is paramount.
>
> This is definitely easier to grasp.
>
> What about the missing `if (cpumask_intersects(entry->cpu_mask,
> cpu_map))` condition in the else path to increment n?
>
> Example:
>
> cpus = {[446 446] [871 871] [1024 1024]}
>
> So 3 asym_cap_list entries.
>
> After hp'ing out CPU4 and CPU5:
>
> DIE: 'partial asymmetry'
>
> In case we would increment n only when the condition is met, we would
> have `full asymmetry`.
>
> I guess we want to allow EAS task placement, hence have
> sd_asym_cpucapacity set in case there are only 446 and 871 left?
>
I will rewrite the function as per all the suggestions and make things ....
more readable.
---
BR
B.
On 26/05/2021 23:40, Beata Michalska wrote:
> On Wed, May 26, 2021 at 08:17:41PM +0200, Dietmar Eggemann wrote:
>> On 26/05/2021 14:51, Beata Michalska wrote:
>>> On Wed, May 26, 2021 at 01:15:46PM +0100, Beata Michalska wrote:
>>>> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
>>>>> On 25/05/2021 12:29, Beata Michalska wrote:
>>>>>> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
>>>>>>> On 24/05/21 23:55, Beata Michalska wrote:
>>>>>>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
>>>>>>>>> On 24/05/21 11:16, Beata Michalska wrote:
[...]
>> cpu-map {
>> cluster0 {
>> core0 {
>> thread0 {
>> cpu = <&A53_0>;
>> };
>> thread1 {
>> cpu = <&A53_1>;
>> };
>> };
>> core1 {
>> thread0 {
>> cpu = <&A53_2>;
>> };
>> thread1 {
>> cpu = <&A53_3>;
>> };
>> };
>> core2 {
>> thread0 {
>> cpu = <&A53_4>;
>> };
>> thread1 {
>> cpu = <&A53_5>;
>> };
>> };
>> };
>>
>> cluster1 {
>> core0 {
>> thread0 {
>> cpu = <&A53_6>;
>> };
>> thread1 {
>> cpu = <&A53_7>;
>> };
>> };
>> };
>> };
>>
>> A53_0: cpu@0 {
>> capacity-dmips-mhz = <446>;
>> A53_1: cpu@1 {
>> capacity-dmips-mhz = <1024>;
>> A53_2: cpu@2 {
>> capacity-dmips-mhz = <871>;
>> A53_3: cpu@3 {
>> capacity-dmips-mhz = <1024>;
>> A53_4: cpu@4 {
>> capacity-dmips-mhz = <446>;
>> A53_5: cpu@5 {
>> capacity-dmips-mhz = <871>;
>> A53_6: cpu@6 {
>> capacity-dmips-mhz = <1024>;
>> A53_7: cpu@7 {
>> capacity-dmips-mhz = <1024>;
>>
>> Here I guess SD_ASYM_CPUCAPACITY will be attached to SMT[0-5]. So this
>> 'capacity-dmips-mhz' config error won't be detected.
>>
>> In case all CPUs (i.e. hw threads would have the correct
>> capacity-dmips-mhz = <1024> or not being set (default 1024))
>> asym_cap_list would correctly only have 1 entry.
> We could possibly add a warning (like in EAS) if the asymmetry is detected
> for SMT which would give some indication that there is smth ... wrong ?
Maybe, in case you find an easy way to detect this.
But the issue already exists today. Not with the topology mentioned
above but in case we slightly change it to:
cpus = { ([446 1024] [871 1024] [446 1024] ) ([1024 1024]) }
^^^^
so that we have a 1024 CPU in the lowest sd for each CPU, we would get
SD_ASYM_CPUCAPACITY on SMT.
On 27/05/2021 19:07, Beata Michalska wrote:
> On Thu, May 27, 2021 at 05:08:42PM +0200, Dietmar Eggemann wrote:
>> On 26/05/2021 23:40, Beata Michalska wrote:
>>> On Wed, May 26, 2021 at 08:17:41PM +0200, Dietmar Eggemann wrote:
>>>> On 26/05/2021 14:51, Beata Michalska wrote:
>>>>> On Wed, May 26, 2021 at 01:15:46PM +0100, Beata Michalska wrote:
>>>>>> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
>>>>>>> On 25/05/2021 12:29, Beata Michalska wrote:
>>>>>>>> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
>>>>>>>>> On 24/05/21 23:55, Beata Michalska wrote:
>>>>>>>>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
>>>>>>>>>>> On 24/05/21 11:16, Beata Michalska wrote:
[...]
>>> We could possibly add a warning (like in EAS) if the asymmetry is detected
>>> for SMT which would give some indication that there is smth ... wrong ?
>>
>> Maybe, in case you find an easy way to detect this.
>>
>> But the issue already exists today. Not with the topology mentioned
>> above but in case we slightly change it to:
>>
>> cpus = { ([446 1024] [871 1024] [446 1024] ) ([1024 1024]) }
>> ^^^^
>> so that we have a 1024 CPU in the lowest sd for each CPU, we would get
>> SD_ASYM_CPUCAPACITY on SMT.
> The asymmetry capacity flags are being set on a sched domain level, so
> we could use the SD_SHARE_CPUCAPACITY|SD_SHARE_PKG_RESOURCES (cpu_smt_flags)
> flags to determine if having asymmetry is valid or not ? If this is enough
> this could be handled by the classify function?
Or maybe something directly in sd_init(), like the WARN_ONCE() which triggers
if somebody wants to sneak in a ~topology flag via a
sched_domain_topology_level table?
IMHO checking `SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY` will be sufficient
here.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 62d412013df8..77b73abbb9a4 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1561,6 +1561,11 @@ sd_init(struct sched_domain_topology_level *tl,
        sd_id = cpumask_first(sched_domain_span(sd));

        sd->flags |= asym_cpu_capacity_classify(sd, cpu_map);
+
+       WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
+                 (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),
+                 "CPU capacity asymmetry not supported on SMT\n");
+
        /*
         * Convert topological properties into behaviour.
         */
In case we can agree on something simple here I guess you can incorporate it into v7.
On Wed, Jun 02, 2021 at 07:17:12PM +0200, Dietmar Eggemann wrote:
> On 27/05/2021 19:07, Beata Michalska wrote:
> > On Thu, May 27, 2021 at 05:08:42PM +0200, Dietmar Eggemann wrote:
> >> On 26/05/2021 23:40, Beata Michalska wrote:
> >>> On Wed, May 26, 2021 at 08:17:41PM +0200, Dietmar Eggemann wrote:
> >>>> On 26/05/2021 14:51, Beata Michalska wrote:
> >>>>> On Wed, May 26, 2021 at 01:15:46PM +0100, Beata Michalska wrote:
> >>>>>> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
> >>>>>>> On 25/05/2021 12:29, Beata Michalska wrote:
> >>>>>>>> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
> >>>>>>>>> On 24/05/21 23:55, Beata Michalska wrote:
> >>>>>>>>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
> >>>>>>>>>>> On 24/05/21 11:16, Beata Michalska wrote:
>
> [...]
>
> >>> We could possibly add a warning (like in EAS) if the asymmetry is detected
> >>> for SMT which would give some indication that there is smth ... wrong ?
> >>
> >> Maybe, in case you find an easy way to detect this.
> >>
> >> But the issue already exists today. Not with the topology mentioned
> >> above but in case we slightly change it to:
> >>
> >> cpus = { ([446 1024] [871 1024] [446 1024] ) ([1024 1024]) }
> >> ^^^^
> >> so that we have a 1024 CPU in the lowest sd for each CPU, we would get
> >> SD_ASYM_CPUCAPACITY on SMT.
> > The asymmetry capacity flags are being set on a sched domain level, so
> > we could use the SD_SHARE_CPUCAPACITY|SD_SHARE_PKG_RESOURCES (cpu_smt_flags)
> > flags to determine if having asymmetry is valid or not ? If this is enough
> > this could be handled by the classify function?
>
> Or maybe something directly in sd_init(), like the WARN_ONCE() which triggers
> if somebody wants to sneak in a ~topology flag via a
> sched_domain_topology_level table?
>
> IMHO checking `SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY` will be sufficient
> here.
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 62d412013df8..77b73abbb9a4 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1561,6 +1561,11 @@ sd_init(struct sched_domain_topology_level *tl,
> sd_id = cpumask_first(sched_domain_span(sd));
>
> sd->flags |= asym_cpu_capacity_classify(sd, cpu_map);
> +
> + WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
> + (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),
> + "CPU capacity asymmetry not supported on SMT\n");
> +
> /*
> * Convert topological properties into behaviour.
> */
>
> In case we can agree on something simple here I guess you can incorporate it into v7.
So what I have done is:
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 77e6f79235ad..ec4ae225687e 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1324,6 +1324,7 @@ asym_cpu_capacity_classify(struct sched_domain *sd,
        if (!asym_cap_miss)
                sd_asym_flags |= SD_ASYM_CPUCAPACITY_FULL;

+       WARN_ONCE(cpu_smt_flags() & sd->flags, "Detected CPU capacity asymmetry on SMT level");
leave:
        return sd_asym_flags;
}
Comment can be adjusted.
This would sit in the classify function to nicely wrap the asymmetry bits in one
place. What do you think?
---
BR
B.
On 02/06/2021 21:48, Beata Michalska wrote:
> On Wed, Jun 02, 2021 at 07:17:12PM +0200, Dietmar Eggemann wrote:
>> On 27/05/2021 19:07, Beata Michalska wrote:
>>> On Thu, May 27, 2021 at 05:08:42PM +0200, Dietmar Eggemann wrote:
>>>> On 26/05/2021 23:40, Beata Michalska wrote:
>>>>> On Wed, May 26, 2021 at 08:17:41PM +0200, Dietmar Eggemann wrote:
>>>>>> On 26/05/2021 14:51, Beata Michalska wrote:
>>>>>>> On Wed, May 26, 2021 at 01:15:46PM +0100, Beata Michalska wrote:
>>>>>>>> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
>>>>>>>>> On 25/05/2021 12:29, Beata Michalska wrote:
>>>>>>>>>> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
>>>>>>>>>>> On 24/05/21 23:55, Beata Michalska wrote:
>>>>>>>>>>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
>>>>>>>>>>>>> On 24/05/21 11:16, Beata Michalska wrote:
[...]
> So what I have done is :
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 77e6f79235ad..ec4ae225687e 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1324,6 +1324,7 @@ asym_cpu_capacity_classify(struct sched_domain *sd,
> if (!asym_cap_miss)
> sd_asym_flags |= SD_ASYM_CPUCAPACITY_FULL;
>
> + WARN_ONCE(cpu_smt_flags() & sd->flags, "Detected CPU capacity asymmetry on SMT level");
> leave:
> return sd_asym_flags;
> }
>
> Comment can be adjusted.
> This would sit in the classify function to nicely wrap asymmetry bits in one
> place. What do you think ?
... and you would need to pass in the sched domain pointer ;-)
Still prefer to check it in sd_init() since that is where we set the flags.
But you can't do 'cpu_smt_flags() & sd->flags'. MC level would hit too,
since it has SD_SHARE_PKG_RESOURCES as well.
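To spell this out: cpu_smt_flags() is SD_SHARE_CPUCAPACITY |
SD_SHARE_PKG_RESOURCES, and the MC level carries SD_SHARE_PKG_RESOURCES as
well, so a plain bitwise AND fires on MC too. An illustrative comparison of
the two candidate checks (sketch only):

/* Too weak: MC also has SD_SHARE_PKG_RESOURCES set, so this matches there. */
bool smt_weak = cpu_smt_flags() & sd->flags;

/*
 * Stricter: only a domain that shares CPU capacity (SMT) and was also
 * flagged asymmetric matches.
 */
bool smt_asym = (sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
                (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY);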
On Thu, Jun 03, 2021 at 11:09:48AM +0200, Dietmar Eggemann wrote:
> On 02/06/2021 21:48, Beata Michalska wrote:
> > On Wed, Jun 02, 2021 at 07:17:12PM +0200, Dietmar Eggemann wrote:
> >> On 27/05/2021 19:07, Beata Michalska wrote:
> >>> On Thu, May 27, 2021 at 05:08:42PM +0200, Dietmar Eggemann wrote:
> >>>> On 26/05/2021 23:40, Beata Michalska wrote:
> >>>>> On Wed, May 26, 2021 at 08:17:41PM +0200, Dietmar Eggemann wrote:
> >>>>>> On 26/05/2021 14:51, Beata Michalska wrote:
> >>>>>>> On Wed, May 26, 2021 at 01:15:46PM +0100, Beata Michalska wrote:
> >>>>>>>> On Wed, May 26, 2021 at 11:52:25AM +0200, Dietmar Eggemann wrote:
> >>>>>>>>> On 25/05/2021 12:29, Beata Michalska wrote:
> >>>>>>>>>> On Tue, May 25, 2021 at 10:53:07AM +0100, Valentin Schneider wrote:
> >>>>>>>>>>> On 24/05/21 23:55, Beata Michalska wrote:
> >>>>>>>>>>>> On Mon, May 24, 2021 at 07:01:04PM +0100, Valentin Schneider wrote:
> >>>>>>>>>>>>> On 24/05/21 11:16, Beata Michalska wrote:
>
> [...]
>
> > So what I have done is :
> >
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 77e6f79235ad..ec4ae225687e 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -1324,6 +1324,7 @@ asym_cpu_capacity_classify(struct sched_domain *sd,
> > if (!asym_cap_miss)
> > sd_asym_flags |= SD_ASYM_CPUCAPACITY_FULL;
> >
> > + WARN_ONCE(cpu_smt_flags() & sd->flags, "Detected CPU capacity asymmetry on SMT level");
> > leave:
> > return sd_asym_flags;
> > }
> >
> > Comment can be adjusted.
> > This would sit in the classify function to nicely wrap asymmetry bits in one
> > place. What do you think ?
>
> ... and you would need to pass in the sched domain pointer ;-)
Yes, as that was for the current version.
>
> Still prefer to check it in sd_init() since there is where we set the flags.
>
> But you can't do 'cpu_smt_flags() & sd->flags'. MC level would hit too,
> since it has SD_SHARE_PKG_RESOURCES as well.
Yeah, I would need to check:
(cpu_smt_flags() & sd->flags) == cpu_smt_flags()
and if I am to move it to sd_init() then additionally check for
SD_ASYM_CPUCAPACITY ... and #ifdef for SMT ...
so I guess I will go with your proposal using SD_SHARE_CPUCAPACITY directly.
Will update in v7.
---
BR
B.
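Taken together, the sd_init() side might then end up looking roughly like the
sketch below, combining the sd_span argument change and the SMT warning
discussed above (an illustration of the agreed direction, not necessarily the
exact v7 code):

        sd_span = sched_domain_span(sd);
        cpumask_and(sd_span, cpu_map, tl->mask(cpu));
        sd_id = cpumask_first(sd_span);

        sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);

        /*
         * Capacity asymmetry within an SMT core points at a broken topology
         * description (e.g. per-thread capacity-dmips-mhz values).
         */
        WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
                  (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),
                  "CPU capacity asymmetry not supported on SMT\n");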