2020-08-12 12:54:08

by Valentin Schneider

Subject: [PATCH v5 00/17] sched: Instrument sched domain flags

Hi,

I've repeatedly stared at an SD flag and asked myself "how should that be
set up in the domain hierarchy anyway?". I figured that if we formalize our
flags zoology a bit, we could also do some runtime assertions on them -
this is what this series is all about.

Patches
=======

The idea is to associate the flags with metaflags that describe how they
should be set in a sched domain hierarchy ("if this SD has it, all its {parents,
children} have it") or how they behave wrt degeneration - details are in the
comments and commit logs.

The good thing is that the debugging bits go away when CONFIG_SCHED_DEBUG isn't
set. The bad thing is that this replaces SD_* flags definitions with some
unsavoury macros. This is mainly because I wanted to avoid having to duplicate
work between declaring the flags and declaring their metaflags. Conceptually
they are pretty close to the macros used for SCHED_FEAT.

o Patches 1-2 remove a derelict flag and align the arm scheduler topology
with arm64's
o Patches 3-8 instrument SD flags with metadata and add assertions
o Patches 9-10 are additional topology cleanups
o Patches 11-16 each add a new flag to SD_DEGENERATE_GROUPS_MASK
o Patch 17 leverages the previous 6 patches to further factorize domain
degeneration

Revisions
=========

v4 -> v5
--------

The final git diff between v4 and v5 isn't too big; there is no diff on the
domain degeneration side of things, it has just been split up into more
patches.

o Shuffled the series around to facilitate bisection (Ingo)
I kept the arm bits at the start because that was a bit simpler, and in
the unlikely case they need to be reverted it won't be too hard.
o Split the degeneration mask tweak into individual commits per new flag
(Ingo)
o Collected Reviewed-by from Dietmar; since I shuffled the whole lot I
didn't keep your Tested-by, sorry!

o Turned the SD flags into an enum with automagic power of 2 assignment.
I poked the dwarves and they assured me the SD flag values haven't
changed.
o [new patch] Made SD flag debug file output flag names

v3 -> v4
--------

o Reordered the series to have fixes / cleanups first

o Added SD_ASYM_CPUCAPACITY propagation (Quentin)
o Made ARM revert back to the default sched topology (Dietmar)
o Removed SD_SERIALIZE degeneration special case (Peter)

o Made SD_NUMA and SD_SERIALIZE have SDF_NEEDS_GROUPS

As discussed on v3, I thought this wasn't required, but after thinking some
more about it, there can be cases where leaving it out changes the current behaviour. For
instance, in the following wacky triangle:

    0\ 30
    | \
 20 |  2
    | /
    1/ 30

there are two unique distances, thus two NUMA topology levels; however, the
first one for node 2 would have the same span as its child domain and thus
should be degenerated. If we don't give SD_NUMA and SD_SERIALIZE
SDF_NEEDS_GROUPS, this domain wouldn't be degenerated since its child
*doesn't* have either SD_NUMA or SD_SERIALIZE (it's the first NUMA domain),
and we'd have this weird NUMA domain lingering with a single group.
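To make the triangle concrete, here is a small userspace sketch (not kernel code) that counts the unique non-local distances and computes, for each node, which nodes fall within a given distance threshold. The distance table, the local distance of 10 and the helpers nr_numa_levels() / numa_span() are assumptions for illustration; the kernel's real level construction in sched_init_numa() is more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES	3
#define LOCAL_DISTANCE	10
#define MAX_DISTANCE	256

/* The "wacky triangle" (assumed): d(0,1) = 20, d(0,2) = d(1,2) = 30. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 30 },
	{ 30, 30, 10 },
};

/* Hypothetical helper: one NUMA topology level per unique non-local distance. */
static int nr_numa_levels(void)
{
	bool seen[MAX_DISTANCE] = { false };
	int levels = 0;

	for (int i = 0; i < NR_NODES; i++) {
		for (int j = 0; j < NR_NODES; j++) {
			int d = node_distance[i][j];

			if (d != LOCAL_DISTANCE && !seen[d]) {
				seen[d] = true;
				levels++;
			}
		}
	}
	return levels;
}

/* Hypothetical helper: nodes (as a bitmask) within @dist of @node. */
static unsigned int numa_span(int node, int dist)
{
	unsigned int span = 0;

	assert(node >= 0 && node < NR_NODES);
	for (int n = 0; n < NR_NODES; n++)
		if (node_distance[node][n] <= dist)
			span |= 1u << n;
	return span;
}
```

For node 2, numa_span(2, 20) is the same as numa_span(2, LOCAL_DISTANCE) (only node 2 itself), which is exactly the single-group NUMA domain that should be degenerated.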

v2 -> v3
--------

o Reworded comment for SD_OVERLAP (it's about the groups, not the domains)

o Added more flags to the SD degeneration mask
o Added generation of an SD flag mask for the degeneration functions (Peter)

RFC -> v2
---------

o Rebased on top of tip/sched/core
o Aligned wording of comments between flags
o Rectified some flag descriptions (Morten)
o Added removal of SD_SHARE_POWERDOMAIN (Morten)

Valentin Schneider (17):
ARM, sched/topology: Remove SD_SHARE_POWERDOMAIN
ARM: Revert back to default scheduler topology.
sched/topology: Split out SD_* flags declaration to its own file
sched/topology: Define and assign sched_domain flag metadata
sched/topology: Verify SD_* flags setup when sched_debug is on
sched/debug: Output SD flag names rather than their values
sched/topology: Introduce SD metaflag for flags needing > 1 groups
sched/topology: Use prebuilt SD flag degeneration mask
sched/topology: Remove SD_SERIALIZE degeneration special case
sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards
sched/topology: Mark SD_PREFER_SIBLING as SDF_NEEDS_GROUPS
sched/topology: Mark SD_BALANCE_WAKE as SDF_NEEDS_GROUPS
sched/topology: Mark SD_SERIALIZE as SDF_NEEDS_GROUPS
sched/topology: Mark SD_ASYM_PACKING as SDF_NEEDS_GROUPS
sched/topology: Mark SD_OVERLAP as SDF_NEEDS_GROUPS
sched/topology: Mark SD_NUMA as SDF_NEEDS_GROUPS
sched/topology: Expand use of SD_DEGENERATE_GROUPS_MASK to flags not
needing groups

arch/arm/kernel/topology.c | 26 ------
include/linux/sched/sd_flags.h | 156 +++++++++++++++++++++++++++++++++
include/linux/sched/topology.h | 45 +++++++---
kernel/sched/debug.c | 53 ++++++++++-
kernel/sched/topology.c | 54 ++++++------
5 files changed, 265 insertions(+), 69 deletions(-)
create mode 100644 include/linux/sched/sd_flags.h

--
2.27.0


2020-08-12 12:54:16

by Valentin Schneider

Subject: [PATCH v5 01/17] ARM, sched/topology: Remove SD_SHARE_POWERDOMAIN

This flag was introduced in 2014 by commit

d77b3ed5c9f8 ("sched: Add a new SD_SHARE_POWERDOMAIN for sched_domain")

but AFAIA it was never leveraged by the scheduler. The closest thing I can
think of is EAS caring about frequency domains, and it does that by
leveraging performance domains.

Remove the flag.

Cc: Russell King <[email protected]>
Suggested-by: Morten Rasmussen <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
arch/arm/kernel/topology.c | 2 +-
include/linux/sched/topology.h | 13 ++++++-------
kernel/sched/topology.c | 10 +++-------
3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index b5adaf744630..353f3ee660e4 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -243,7 +243,7 @@ void store_cpu_topology(unsigned int cpuid)

static inline int cpu_corepower_flags(void)
{
- return SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN;
+ return SD_SHARE_PKG_RESOURCES;
}

static struct sched_domain_topology_level arm_topology[] = {
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 820511289857..6ec7d7c1d1e3 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -18,13 +18,12 @@
#define SD_WAKE_AFFINE 0x0010 /* Wake task to waking CPU */
#define SD_ASYM_CPUCAPACITY 0x0020 /* Domain members have different CPU capacities */
#define SD_SHARE_CPUCAPACITY 0x0040 /* Domain members share CPU capacity */
-#define SD_SHARE_POWERDOMAIN 0x0080 /* Domain members share power domain */
-#define SD_SHARE_PKG_RESOURCES 0x0100 /* Domain members share CPU pkg resources */
-#define SD_SERIALIZE 0x0200 /* Only a single load balancing instance */
-#define SD_ASYM_PACKING 0x0400 /* Place busy groups earlier in the domain */
-#define SD_PREFER_SIBLING 0x0800 /* Prefer to place tasks in a sibling domain */
-#define SD_OVERLAP 0x1000 /* sched_domains of this level overlap */
-#define SD_NUMA 0x2000 /* cross-node balancing */
+#define SD_SHARE_PKG_RESOURCES 0x0080 /* Domain members share CPU pkg resources */
+#define SD_SERIALIZE 0x0100 /* Only a single load balancing instance */
+#define SD_ASYM_PACKING 0x0200 /* Place busy groups earlier in the domain */
+#define SD_PREFER_SIBLING 0x0400 /* Prefer to place tasks in a sibling domain */
+#define SD_OVERLAP 0x0800 /* sched_domains of this level overlap */
+#define SD_NUMA 0x1000 /* cross-node balancing */

#ifdef CONFIG_SCHED_SMT
static inline int cpu_smt_flags(void)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9079d865a935..865fff3ef20a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -148,8 +148,7 @@ static int sd_degenerate(struct sched_domain *sd)
SD_BALANCE_EXEC |
SD_SHARE_CPUCAPACITY |
SD_ASYM_CPUCAPACITY |
- SD_SHARE_PKG_RESOURCES |
- SD_SHARE_POWERDOMAIN)) {
+ SD_SHARE_PKG_RESOURCES)) {
if (sd->groups != sd->groups->next)
return 0;
}
@@ -180,8 +179,7 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
SD_ASYM_CPUCAPACITY |
SD_SHARE_CPUCAPACITY |
SD_SHARE_PKG_RESOURCES |
- SD_PREFER_SIBLING |
- SD_SHARE_POWERDOMAIN);
+ SD_PREFER_SIBLING);
if (nr_node_ids == 1)
pflags &= ~SD_SERIALIZE;
}
@@ -1292,7 +1290,6 @@ int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE;
* SD_SHARE_CPUCAPACITY - describes SMT topologies
* SD_SHARE_PKG_RESOURCES - describes shared caches
* SD_NUMA - describes NUMA topologies
- * SD_SHARE_POWERDOMAIN - describes shared power domain
*
* Odd one out, which beside describing the topology has a quirk also
* prescribes the desired behaviour that goes along with it:
@@ -1303,8 +1300,7 @@ int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE;
(SD_SHARE_CPUCAPACITY | \
SD_SHARE_PKG_RESOURCES | \
SD_NUMA | \
- SD_ASYM_PACKING | \
- SD_SHARE_POWERDOMAIN)
+ SD_ASYM_PACKING)

static struct sched_domain *
sd_init(struct sched_domain_topology_level *tl,
--
2.27.0

2020-08-12 12:54:30

by Valentin Schneider

Subject: [PATCH v5 03/17] sched/topology: Split out SD_* flags declaration to its own file

To associate the SD flags with some metadata, we need some more structure
in the way they are declared.

Rather than shove that in a free-standing macro list, move the declaration
into a separate file that can be re-imported with different SD_FLAG
definitions. This is inspired by what is done with the syscall
table (see uapi/asm/unistd.h and sys_call_table).

The value assigned to a given SD flag now depends on the order it appears
in sd_flags.h. No change in functionality.
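A minimal standalone sketch of the same two-pass "X-macro" technique. To stay self-contained it swaps the re-#included header for a list macro that takes the per-entry macro as a parameter (SD_FLAG_LIST, SD_FLAG_INDEX and SD_FLAG_BIT are hypothetical names), and only the first four flags are listed.

```c
#include <assert.h>

/* Stand-in for sd_flags.h: the flag list, parameterized by the entry macro. */
#define SD_FLAG_LIST(SD_FLAG)		\
	SD_FLAG(SD_BALANCE_NEWIDLE)	\
	SD_FLAG(SD_BALANCE_EXEC)	\
	SD_FLAG(SD_BALANCE_FORK)	\
	SD_FLAG(SD_BALANCE_WAKE)

/* First pass: indexes 0, 1, 2, ... in declaration order. */
#define SD_FLAG_INDEX(name) __##name,
enum {
	SD_FLAG_LIST(SD_FLAG_INDEX)
	__SD_FLAG_CNT,
};
#undef SD_FLAG_INDEX

/* Second pass: each index becomes a single-bit flag value. */
#define SD_FLAG_BIT(name) name = 1 << __##name,
enum {
	SD_FLAG_LIST(SD_FLAG_BIT)
};
#undef SD_FLAG_BIT

/* The generated values match the old #defines (0x0001 ... 0x0008). */
static_assert(SD_BALANCE_WAKE == 0x0008, "SD flag values must not change");
```

Reordering entries in the list silently renumbers the flags, which is why the flag values now depend on their order in sd_flags.h.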

Reviewed-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 35 ++++++++++++++++++++++++++++++++++
include/linux/sched/topology.h | 26 ++++++++++++-------------
2 files changed, 48 insertions(+), 13 deletions(-)
create mode 100644 include/linux/sched/sd_flags.h

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
new file mode 100644
index 000000000000..5a74751a1a83
--- /dev/null
+++ b/include/linux/sched/sd_flags.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * sched-domains (multiprocessor balancing) flag declarations.
+ */
+
+#ifndef SD_FLAG
+#error "Incorrect import of SD flags definitions"
+#endif
+
+/* Balance when about to become idle */
+SD_FLAG(SD_BALANCE_NEWIDLE)
+/* Balance on exec */
+SD_FLAG(SD_BALANCE_EXEC)
+/* Balance on fork, clone */
+SD_FLAG(SD_BALANCE_FORK)
+/* Balance on wakeup */
+SD_FLAG(SD_BALANCE_WAKE)
+/* Wake task to waking CPU */
+SD_FLAG(SD_WAKE_AFFINE)
+/* Domain members have different CPU capacities */
+SD_FLAG(SD_ASYM_CPUCAPACITY)
+/* Domain members share CPU capacity */
+SD_FLAG(SD_SHARE_CPUCAPACITY)
+/* Domain members share CPU pkg resources */
+SD_FLAG(SD_SHARE_PKG_RESOURCES)
+/* Only a single load balancing instance */
+SD_FLAG(SD_SERIALIZE)
+/* Place busy groups earlier in the domain */
+SD_FLAG(SD_ASYM_PACKING)
+/* Prefer to place tasks in a sibling domain */
+SD_FLAG(SD_PREFER_SIBLING)
+/* sched_domains of this level overlap */
+SD_FLAG(SD_OVERLAP)
+/* cross-node balancing */
+SD_FLAG(SD_NUMA)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 6ec7d7c1d1e3..3e41c0401b5f 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -11,19 +11,19 @@
*/
#ifdef CONFIG_SMP

-#define SD_BALANCE_NEWIDLE 0x0001 /* Balance when about to become idle */
-#define SD_BALANCE_EXEC 0x0002 /* Balance on exec */
-#define SD_BALANCE_FORK 0x0004 /* Balance on fork, clone */
-#define SD_BALANCE_WAKE 0x0008 /* Balance on wakeup */
-#define SD_WAKE_AFFINE 0x0010 /* Wake task to waking CPU */
-#define SD_ASYM_CPUCAPACITY 0x0020 /* Domain members have different CPU capacities */
-#define SD_SHARE_CPUCAPACITY 0x0040 /* Domain members share CPU capacity */
-#define SD_SHARE_PKG_RESOURCES 0x0080 /* Domain members share CPU pkg resources */
-#define SD_SERIALIZE 0x0100 /* Only a single load balancing instance */
-#define SD_ASYM_PACKING 0x0200 /* Place busy groups earlier in the domain */
-#define SD_PREFER_SIBLING 0x0400 /* Prefer to place tasks in a sibling domain */
-#define SD_OVERLAP 0x0800 /* sched_domains of this level overlap */
-#define SD_NUMA 0x1000 /* cross-node balancing */
+/* Generate SD flag indexes */
+#define SD_FLAG(name) __##name,
+enum {
+ #include <linux/sched/sd_flags.h>
+ __SD_FLAG_CNT,
+};
+#undef SD_FLAG
+/* Generate SD flag bits */
+#define SD_FLAG(name) name = 1 << __##name,
+enum {
+ #include <linux/sched/sd_flags.h>
+};
+#undef SD_FLAG

#ifdef CONFIG_SCHED_SMT
static inline int cpu_smt_flags(void)
--
2.27.0

2020-08-12 12:54:30

by Valentin Schneider

Subject: [PATCH v5 04/17] sched/topology: Define and assign sched_domain flag metadata

There are some expectations regarding how sched domain flags should be laid
out, but none of them are checked or asserted in
sched_domain_debug_one(). After staring at said flags for a while, I've
come to realize there are two repeating patterns:

- Shared with children: those flags are set from the base CPU domain
upwards. Any domain that has it set will have it set in its children. It
hints at "some property holds true / some behaviour is enabled until this
level".

- Shared with parents: those flags are set from the topmost domain
downwards. Any domain that has it set will have it set in its parents. It
hints at "some property isn't visible / some behaviour is disabled until
this level".
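The two patterns translate directly into per-domain assertions. The sketch below is an independent userspace illustration, not the checking code from this series: struct domain, FLAG_PKG / FLAG_NUMA and check_domain() are hypothetical stand-ins for struct sched_domain and two real SD flags.

```c
#include <assert.h>
#include <stddef.h>

#define SDF_SHARED_CHILD	0x1
#define SDF_SHARED_PARENT	0x2

/* Hypothetical two-flag subset: PKG behaves like SD_SHARE_PKG_RESOURCES,
 * NUMA like SD_NUMA. */
#define FLAG_PKG	0x1u
#define FLAG_NUMA	0x2u
#define NR_FLAGS	2

static const unsigned int meta_flags[NR_FLAGS] = {
	SDF_SHARED_CHILD,	/* FLAG_PKG */
	SDF_SHARED_PARENT,	/* FLAG_NUMA */
};

/* Minimal stand-in for struct sched_domain. */
struct domain {
	unsigned int flags;
	struct domain *parent;
	struct domain *child;
};

/* Count metaflag violations for a single domain. */
static int check_domain(const struct domain *sd)
{
	int errors = 0;

	assert(sd != NULL);
	for (int idx = 0; idx < NR_FLAGS; idx++) {
		unsigned int flag = 1u << idx;

		if (!(sd->flags & flag))
			continue;
		/* SHARED_CHILD: set here => must also be set in the child. */
		if ((meta_flags[idx] & SDF_SHARED_CHILD) && sd->child &&
		    !(sd->child->flags & flag))
			errors++;
		/* SHARED_PARENT: set here => must also be set in the parent. */
		if ((meta_flags[idx] & SDF_SHARED_PARENT) && sd->parent &&
		    !(sd->parent->flags & flag))
			errors++;
	}
	return errors;
}
```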

There are two outliers that (currently) do not map to either of these:

o SD_PREFER_SIBLING, which is cleared below levels with
SD_ASYM_CPUCAPACITY. The change was introduced by commit

9c63e84db29b ("sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains")

as it could break misfit migration on some systems. In light of this, we
might want to change it back to make it fit one of the two categories and
fix the issue another way.

o SD_ASYM_CPUCAPACITY, which gets set on a single level and is
propagated neither up nor down. From a topology description point of view, it
really wants to be SDF_SHARED_PARENT; this will be rectified in a later
patch.

Tweak the sched_domain flag declaration to assign each flag an expected
layout, and include the rationale for each flag "meta type" assignment as a
comment. Consolidate the flag metadata into an array; the index of a flag's
metadata can easily be found with log2(flag), IOW __ffs(flag).
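For a single-bit flag, log2(flag) is just the index of its lowest set bit. A userspace sketch, assuming GCC/Clang's __builtin_ctz() as a stand-in for the kernel's __ffs(); flag_names[] is a hypothetical, truncated analogue of the metadata array.

```c
#include <assert.h>

/* Stand-in for __ffs(): index of the lowest set bit, undefined for 0. */
static inline unsigned int flag_to_index(unsigned int flag)
{
	assert(flag != 0);
	return (unsigned int)__builtin_ctz(flag);
}

/* Hypothetical metadata table, indexed by flag index. */
static const char *const flag_names[] = {
	"SD_BALANCE_NEWIDLE",	/* 0x1 */
	"SD_BALANCE_EXEC",	/* 0x2 */
	"SD_BALANCE_FORK",	/* 0x4 */
	"SD_BALANCE_WAKE",	/* 0x8 */
};

static const char *flag_name(unsigned int flag)
{
	return flag_names[flag_to_index(flag)];
}
```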

Reviewed-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 147 +++++++++++++++++++++++++++------
include/linux/sched/topology.h | 15 +++-
2 files changed, 134 insertions(+), 28 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 5a74751a1a83..ea0ec1a33da4 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -7,29 +7,124 @@
#error "Incorrect import of SD flags definitions"
#endif

-/* Balance when about to become idle */
-SD_FLAG(SD_BALANCE_NEWIDLE)
-/* Balance on exec */
-SD_FLAG(SD_BALANCE_EXEC)
-/* Balance on fork, clone */
-SD_FLAG(SD_BALANCE_FORK)
-/* Balance on wakeup */
-SD_FLAG(SD_BALANCE_WAKE)
-/* Wake task to waking CPU */
-SD_FLAG(SD_WAKE_AFFINE)
-/* Domain members have different CPU capacities */
-SD_FLAG(SD_ASYM_CPUCAPACITY)
-/* Domain members share CPU capacity */
-SD_FLAG(SD_SHARE_CPUCAPACITY)
-/* Domain members share CPU pkg resources */
-SD_FLAG(SD_SHARE_PKG_RESOURCES)
-/* Only a single load balancing instance */
-SD_FLAG(SD_SERIALIZE)
-/* Place busy groups earlier in the domain */
-SD_FLAG(SD_ASYM_PACKING)
-/* Prefer to place tasks in a sibling domain */
-SD_FLAG(SD_PREFER_SIBLING)
-/* sched_domains of this level overlap */
-SD_FLAG(SD_OVERLAP)
-/* cross-node balancing */
-SD_FLAG(SD_NUMA)
+/*
+ * Expected flag uses
+ *
+ * SHARED_CHILD: These flags are meant to be set from the base domain upwards.
+ * If a domain has this flag set, all of its children should have it set. This
+ * is usually because the flag describes some shared resource (all CPUs in that
+ * domain share the same resource), or because they are tied to a scheduling
+ * behaviour that we want to disable at some point in the hierarchy for
+ * scalability reasons.
+ *
+ * In those cases it doesn't make sense to have the flag set for a domain but
+ * not have it in (some of) its children: sched domains ALWAYS span their child
+ * domains, so operations done with parent domains will cover CPUs in the lower
+ * child domains.
+ *
+ *
+ * SHARED_PARENT: These flags are meant to be set from the highest domain
+ * downwards. If a domain has this flag set, all of its parents should have it
+ * set. This is usually for topology properties that start to appear above a
+ * certain level (e.g. domain starts spanning CPUs outside of the base CPU's
+ * socket).
+ */
+#define SDF_SHARED_CHILD 0x1
+#define SDF_SHARED_PARENT 0x2
+
+/*
+ * Balance when about to become idle
+ *
+ * SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ */
+SD_FLAG(SD_BALANCE_NEWIDLE, SDF_SHARED_CHILD)
+
+/*
+ * Balance on exec
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_BALANCE_EXEC, SDF_SHARED_CHILD)
+
+/*
+ * Balance on fork, clone
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_BALANCE_FORK, SDF_SHARED_CHILD)
+
+/*
+ * Balance on wakeup
+ *
+ * SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ */
+SD_FLAG(SD_BALANCE_WAKE, SDF_SHARED_CHILD)
+
+/*
+ * Consider waking task on waking CPU.
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_WAKE_AFFINE, SDF_SHARED_CHILD)
+
+/*
+ * Domain members have different CPU capacities
+ */
+SD_FLAG(SD_ASYM_CPUCAPACITY, 0)
+
+/*
+ * Domain members share CPU capacity (i.e. SMT)
+ *
+ * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
+ * CPU capacity.
+ */
+SD_FLAG(SD_SHARE_CPUCAPACITY, SDF_SHARED_CHILD)
+
+/*
+ * Domain members share CPU package resources (i.e. caches)
+ *
+ * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
+ * the same cache(s).
+ */
+SD_FLAG(SD_SHARE_PKG_RESOURCES, SDF_SHARED_CHILD)
+
+/*
+ * Only a single load balancing instance
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE. Could be set from a
+ * different level upwards, but it doesn't change that if a domain has this flag
+ * set, then all of its parents need to have it too (otherwise the serialization
+ * doesn't make sense).
+ */
+SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT)
+
+/*
+ * Place busy tasks earlier in the domain
+ *
+ * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
+ * up, but currently assumed to be set from the base domain upwards (see
+ * update_top_cache_domain()).
+ */
+SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD)
+
+/*
+ * Prefer to place tasks in a sibling domain
+ *
+ * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
+ * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ */
+SD_FLAG(SD_PREFER_SIBLING, 0)
+
+/*
+ * sched_groups of this level overlap
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE.
+ */
+SD_FLAG(SD_OVERLAP, SDF_SHARED_PARENT)
+
+/*
+ * Cross-node balancing
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE.
+ */
+SD_FLAG(SD_NUMA, SDF_SHARED_PARENT)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 3e41c0401b5f..32f602ff37a0 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -12,19 +12,30 @@
#ifdef CONFIG_SMP

/* Generate SD flag indexes */
-#define SD_FLAG(name) __##name,
+#define SD_FLAG(name, mflags) __##name,
enum {
#include <linux/sched/sd_flags.h>
__SD_FLAG_CNT,
};
#undef SD_FLAG
/* Generate SD flag bits */
-#define SD_FLAG(name) name = 1 << __##name,
+#define SD_FLAG(name, mflags) name = 1 << __##name,
enum {
#include <linux/sched/sd_flags.h>
};
#undef SD_FLAG

+#ifdef CONFIG_SCHED_DEBUG
+#define SD_FLAG(_name, mflags) [__##_name] = { .meta_flags = mflags, .name = #_name },
+static const struct {
+ unsigned int meta_flags;
+ char *name;
+} sd_flag_debug[] = {
+#include <linux/sched/sd_flags.h>
+};
+#undef SD_FLAG
+#endif
+
#ifdef CONFIG_SCHED_SMT
static inline int cpu_smt_flags(void)
{
--
2.27.0

2020-08-12 12:54:30

by Valentin Schneider

Subject: [PATCH v5 02/17] ARM: Revert back to default scheduler topology.

The ARM-specific GMC level is meant to be built using the thread sibling
mask, but no devicetree in arch/arm/boot/dts uses the 'thread' cpu-map
binding. With SD_SHARE_POWERDOMAIN gone, this topology level can be
removed, at which point ARM no longer benefits from having a custom defined
topology table.

Delete the GMC topology level by making ARM use the default scheduler
topology table. This essentially reverts commit

fb2aa85564f4 ("sched, ARM: Create a dedicated scheduler topology table")

Cc: Russell King <[email protected]>
Suggested-by: Dietmar Eggemann <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
arch/arm/kernel/topology.c | 26 --------------------------
1 file changed, 26 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 353f3ee660e4..ef0058de432b 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -177,15 +177,6 @@ static inline void parse_dt_topology(void) {}
static inline void update_cpu_capacity(unsigned int cpuid) {}
#endif

-/*
- * The current assumption is that we can power gate each core independently.
- * This will be superseded by DT binding once available.
- */
-const struct cpumask *cpu_corepower_mask(int cpu)
-{
- return &cpu_topology[cpu].thread_sibling;
-}
-
/*
* store_cpu_topology is called at boot when only one cpu is running
* and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
@@ -241,20 +232,6 @@ void store_cpu_topology(unsigned int cpuid)
update_siblings_masks(cpuid);
}

-static inline int cpu_corepower_flags(void)
-{
- return SD_SHARE_PKG_RESOURCES;
-}
-
-static struct sched_domain_topology_level arm_topology[] = {
-#ifdef CONFIG_SCHED_MC
- { cpu_corepower_mask, cpu_corepower_flags, SD_INIT_NAME(GMC) },
- { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
-#endif
- { cpu_cpu_mask, SD_INIT_NAME(DIE) },
- { NULL, },
-};
-
/*
* init_cpu_topology is called at boot when only one cpu is running
* which prevent simultaneous write access to cpu_topology array
@@ -265,7 +242,4 @@ void __init init_cpu_topology(void)
smp_wmb();

parse_dt_topology();
-
- /* Set scheduler topology descriptor */
- set_sched_topology(arm_topology);
}
--
2.27.0

2020-08-12 12:54:35

by Valentin Schneider

Subject: [PATCH v5 08/17] sched/topology: Use prebuilt SD flag degeneration mask

Leverage SD_DEGENERATE_GROUPS_MASK in sd_degenerate() and
sd_parent_degenerate().
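A sketch of how such a mask can be built at compile time from the metaflags, followed by a simplified sd_degenerate() matching the shape of this patch. It assumes SDF_NEEDS_GROUPS = 0x4, uses a three-flag subset with the metaflags they end up with by the end of the series, and replaces the groups walk and cpumask_weight() with plain integer parameters (all hypothetical).

```c
#include <assert.h>

#define SDF_SHARED_CHILD	0x1
#define SDF_SHARED_PARENT	0x2
#define SDF_NEEDS_GROUPS	0x4	/* assumed value */

/* Subset of the flag list: SD_FLAG(name, metaflags). */
#define SD_FLAG_LIST(SD_FLAG)						\
	SD_FLAG(SD_BALANCE_NEWIDLE, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_WAKE_AFFINE,     SDF_SHARED_CHILD)			\
	SD_FLAG(SD_NUMA,            SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)

#define SD_FLAG_INDEX(name, mflags) __##name,
enum { SD_FLAG_LIST(SD_FLAG_INDEX) __SD_FLAG_CNT };
#undef SD_FLAG_INDEX

#define SD_FLAG_BIT(name, mflags) name = 1 << __##name,
enum { SD_FLAG_LIST(SD_FLAG_BIT) };
#undef SD_FLAG_BIT

/* OR together every flag whose metaflags include SDF_NEEDS_GROUPS. */
#define SD_FLAG_MASK(name, mflags) ((name) * !!((mflags) & SDF_NEEDS_GROUPS)) |
static const unsigned int SD_DEGENERATE_GROUPS_MASK = SD_FLAG_LIST(SD_FLAG_MASK) 0;
#undef SD_FLAG_MASK

/* Simplified sd_degenerate(): returns 1 if the domain can be dropped. */
static int sd_degenerate_sketch(unsigned int flags, int span_weight, int nr_groups)
{
	if (span_weight == 1)
		return 1;

	/* Following flags need at least 2 groups. */
	if ((flags & SD_DEGENERATE_GROUPS_MASK) && nr_groups > 1)
		return 0;

	/* Following flags don't use groups. */
	if (flags & SD_WAKE_AFFINE)
		return 0;

	return 1;
}
```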

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/topology.c | 20 ++++----------------
1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index f128fcf46a41..5f2bc99ff659 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -160,15 +160,9 @@ static int sd_degenerate(struct sched_domain *sd)
return 1;

/* Following flags need at least 2 groups */
- if (sd->flags & (SD_BALANCE_NEWIDLE |
- SD_BALANCE_FORK |
- SD_BALANCE_EXEC |
- SD_SHARE_CPUCAPACITY |
- SD_ASYM_CPUCAPACITY |
- SD_SHARE_PKG_RESOURCES)) {
- if (sd->groups != sd->groups->next)
- return 0;
- }
+ if ((sd->flags & SD_DEGENERATE_GROUPS_MASK) &&
+ (sd->groups != sd->groups->next))
+ return 0;

/* Following flags don't use groups */
if (sd->flags & (SD_WAKE_AFFINE))
@@ -190,13 +184,7 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)

/* Flags needing groups don't count if only 1 group in parent */
if (parent->groups == parent->groups->next) {
- pflags &= ~(SD_BALANCE_NEWIDLE |
- SD_BALANCE_FORK |
- SD_BALANCE_EXEC |
- SD_ASYM_CPUCAPACITY |
- SD_SHARE_CPUCAPACITY |
- SD_SHARE_PKG_RESOURCES |
- SD_PREFER_SIBLING);
+ pflags &= ~(SD_DEGENERATE_GROUPS_MASK | SD_PREFER_SIBLING);
if (nr_node_ids == 1)
pflags &= ~SD_SERIALIZE;
}
--
2.27.0

2020-08-12 12:54:39

by Valentin Schneider

Subject: [PATCH v5 10/17] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards

We currently set this flag *only* on domains whose topology level exactly
matches the level where we detect asymmetry (as returned by
asym_cpu_capacity_level()). This is rather problematic.

Say there are two clusters in the system, one with a lone big CPU and the
other with a mix of big and LITTLE CPUs (as is allowed by DynamIQ):

DIE [                ]
MC  [             ][ ]
     0   1   2   3   4
     L   L   B   B   B

asym_cpu_capacity_level() will figure out that the MC level is the one
where all CPUs can see a CPU of max capacity, and we will thus set
SD_ASYM_CPUCAPACITY at MC level for all CPUs.

That lone big CPU will degenerate its MC domain, since it would be alone in
there, and will end up with just a DIE domain. Since the flag was only set
at MC, this CPU ends up not seeing any SD with the flag set, which is
broken.

Rather than clearing dflags at every topology level, clear it before
entering the topology level loop. This will properly propagate upwards
flags that are set starting from a certain level.
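The difference between clearing dflags per level and clearing it once can be modelled for CPU 4 of the example above. This is a hypothetical userspace model, not build_sched_domains() itself: only the MC and DIE levels exist, tl_asym is MC, and CPU 4's MC domain is assumed to be degenerated because it spans a single CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define SD_ASYM_CPUCAPACITY 0x1u	/* illustrative value */

enum { TL_MC, TL_DIE, NR_TL };

/*
 * Hypothetical model of the per-CPU topology loop. @tl_asym is the level
 * returned by asym_cpu_capacity_level() (-1 for none). @reset_per_level
 * selects the old behaviour (dflags cleared at every level) versus the
 * new one (cleared once, before the loop).
 */
static void build_flags(int tl_asym, bool reset_per_level, unsigned int out[NR_TL])
{
	unsigned int dflags = 0;

	assert(tl_asym >= -1 && tl_asym < NR_TL);
	for (int tl = 0; tl < NR_TL; tl++) {
		if (reset_per_level)
			dflags = 0;
		if (tl == tl_asym)
			dflags |= SD_ASYM_CPUCAPACITY;
		out[tl] = dflags;
	}
}

/* Does any surviving (non-degenerated) domain carry the flag? */
static bool sees_asym(const unsigned int flags[NR_TL], const bool degenerated[NR_TL])
{
	for (int tl = 0; tl < NR_TL; tl++)
		if (!degenerated[tl] && (flags[tl] & SD_ASYM_CPUCAPACITY))
			return true;
	return false;
}
```

With the old scheme CPU 4 ends up with no domain carrying SD_ASYM_CPUCAPACITY; with the new one its DIE domain inherits the flag.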

Reviewed-by: Quentin Perret <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 4 +++-
kernel/sched/topology.c | 3 +--
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 21a43ad6f26a..4f07b405564e 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -83,9 +83,11 @@ SD_FLAG(SD_WAKE_AFFINE, SDF_SHARED_CHILD)
/*
* Domain members have different CPU capacities
*
+ * SHARED_PARENT: Set from the topmost domain down to the first domain where
+ * asymmetry is detected.
* NEEDS_GROUPS: Per-CPU capacity is asymmetric between groups.
*/
-SD_FLAG(SD_ASYM_CPUCAPACITY, SDF_NEEDS_GROUPS)
+SD_FLAG(SD_ASYM_CPUCAPACITY, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)

/*
* Domain members share CPU capacity (i.e. SMT)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 00ad7cef2ec1..02fd8db747b2 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1988,11 +1988,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
/* Set up domains for CPUs specified by the cpu_map: */
for_each_cpu(i, cpu_map) {
struct sched_domain_topology_level *tl;
+ int dflags = 0;

sd = NULL;
for_each_sd_topology(tl) {
- int dflags = 0;
-
if (tl == tl_asym) {
dflags |= SD_ASYM_CPUCAPACITY;
has_asym = true;
--
2.27.0

2020-08-12 12:54:54

by Valentin Schneider

Subject: [PATCH v5 12/17] sched/topology: Mark SD_BALANCE_WAKE as SDF_NEEDS_GROUPS

Even though no mainline topology uses this flag, it is a load balancing flag
just like SD_BALANCE_FORK and requires 2+ groups to have any effect.

Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 75c9749a6e2d..9043d18ce418 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -70,8 +70,9 @@ SD_FLAG(SD_BALANCE_FORK, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
* Balance on wakeup
*
* SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ * NEEDS_GROUPS: Load balancing flag.
*/
-SD_FLAG(SD_BALANCE_WAKE, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_WAKE, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)

/*
* Consider waking task on waking CPU.
--
2.27.0

2020-08-12 12:54:57

by Valentin Schneider

Subject: [PATCH v5 14/17] sched/topology: Mark SD_ASYM_PACKING as SDF_NEEDS_GROUPS

Being a load-balancing flag, it requires 2+ groups to have any effect.

Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 6eb302528659..f855f80f0052 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -123,10 +123,11 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
* Place busy tasks earlier in the domain
*
* SHARED_CHILD: Usually set on the SMT level. Technically could be set further
- * up, but currently assumed to be set from the base domain upwards (see
- * update_top_cache_domain()).
+ * up, but currently assumed to be set from the base domain
+ * upwards (see update_top_cache_domain()).
+ * NEEDS_GROUPS: Load balancing flag.
*/
-SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD)
+SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)

/*
* Prefer to place tasks in a sibling domain
--
2.27.0

2020-08-12 12:55:06

by Valentin Schneider

Subject: [PATCH v5 17/17] sched/topology: Expand use of SD_DEGENERATE_GROUPS_MASK to flags not needing groups

All SD flags requiring 2+ sched_groups to have any effect are now decorated
with the SDF_NEEDS_GROUPS metaflag. This means we can now use the bitwise
negation of SD_DEGENERATE_GROUPS_MASK in sd_degenerate() instead of explicitly
using SD_WAKE_AFFINE (IOW the only flag without SDF_NEEDS_GROUPS).

From now on, any flag without SDF_NEEDS_GROUPS will be correctly accounted
as a flag not requiring 2+ sched_groups.
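The claim that SD_WAKE_AFFINE is the only flag left outside the mask can be checked mechanically. The sketch below rebuilds the full flag list with the metaflags shown across this series (assuming SDF_NEEDS_GROUPS = 0x4; the helper macro names and SD_ALL_FLAGS are hypothetical) and verifies that complementing the mask leaves exactly that one flag.

```c
#include <assert.h>

#define SDF_SHARED_CHILD	0x1
#define SDF_SHARED_PARENT	0x2
#define SDF_NEEDS_GROUPS	0x4	/* assumed value */

/* Metaflags as of the end of this series (assumed from the patches). */
#define SD_FLAG_LIST(SD_FLAG)						\
	SD_FLAG(SD_BALANCE_NEWIDLE,     SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_BALANCE_EXEC,        SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_BALANCE_FORK,        SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_BALANCE_WAKE,        SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_WAKE_AFFINE,         SDF_SHARED_CHILD)			\
	SD_FLAG(SD_ASYM_CPUCAPACITY,    SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_SHARE_CPUCAPACITY,   SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_SHARE_PKG_RESOURCES, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_SERIALIZE,           SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_ASYM_PACKING,        SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_PREFER_SIBLING,      SDF_NEEDS_GROUPS)			\
	SD_FLAG(SD_OVERLAP,             SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)	\
	SD_FLAG(SD_NUMA,                SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)

#define SD_FLAG_INDEX(name, mflags) __##name,
enum { SD_FLAG_LIST(SD_FLAG_INDEX) __SD_FLAG_CNT };
#undef SD_FLAG_INDEX

#define SD_FLAG_BIT(name, mflags) name = 1 << __##name,
enum { SD_FLAG_LIST(SD_FLAG_BIT) };
#undef SD_FLAG_BIT

/* Flags that need 2+ groups to have any effect. */
#define SD_FLAG_MASK(name, mflags) ((name) * !!((mflags) & SDF_NEEDS_GROUPS)) |
static const unsigned int SD_DEGENERATE_GROUPS_MASK = SD_FLAG_LIST(SD_FLAG_MASK) 0;
#undef SD_FLAG_MASK

/* Every defined SD flag; those outside the mask don't need groups. */
static const unsigned int SD_ALL_FLAGS = (1u << __SD_FLAG_CNT) - 1;
```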

Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/topology.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8064f495641b..3bb145ef5abd 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -165,7 +165,7 @@ static int sd_degenerate(struct sched_domain *sd)
return 0;

/* Following flags don't use groups */
- if (sd->flags & (SD_WAKE_AFFINE))
+ if (sd->flags & ~SD_DEGENERATE_GROUPS_MASK)
return 0;

return 1;
--
2.27.0

2020-08-12 12:55:07

by Valentin Schneider

Subject: [PATCH v5 15/17] sched/topology: Mark SD_OVERLAP as SDF_NEEDS_GROUPS

A sched_domain can only have overlapping sched_groups if it has more than
one group.

Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index f855f80f0052..021b909d3941 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -143,8 +143,9 @@ SD_FLAG(SD_PREFER_SIBLING, SDF_NEEDS_GROUPS)
* sched_groups of this level overlap
*
* SHARED_PARENT: Set for all NUMA levels above NODE.
+ * NEEDS_GROUPS: Overlaps can only exist with more than one group.
*/
-SD_FLAG(SD_OVERLAP, SDF_SHARED_PARENT)
+SD_FLAG(SD_OVERLAP, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)

/*
* Cross-node balancing
--
2.27.0

2020-08-12 12:55:24

by Valentin Schneider

Subject: [PATCH v5 11/17] sched/topology: Mark SD_PREFER_SIBLING as SDF_NEEDS_GROUPS

SD_PREFER_SIBLING is currently considered in sd_parent_degenerate() but not
in sd_degenerate(). It too hinges on load balancing, and thus won't have
any effect when set on a domain with a single group. Add it to
SD_DEGENERATE_GROUPS_MASK.
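A simplified model of the parent check after this change, to show why SD_PREFER_SIBLING no longer needs its own special case. It is a sketch, not the kernel function: the sd_degenerate(parent) call is omitted, spans and group counts are passed in as parameters, and the mask is an assumed two-flag subset. Flag values follow the topology.h hunk of patch 01.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as in patch 01's topology.h hunk. */
#define SD_WAKE_AFFINE		0x0010u
#define SD_SHARE_PKG_RESOURCES	0x0080u
#define SD_PREFER_SIBLING	0x0400u

/* Assumed subset of SD_DEGENERATE_GROUPS_MASK once SD_PREFER_SIBLING is in it. */
#define SD_DEGENERATE_GROUPS_MASK (SD_SHARE_PKG_RESOURCES | SD_PREFER_SIBLING)

/* Simplified sd_parent_degenerate(): returns 1 if the parent is redundant. */
static int parent_degenerate_sketch(unsigned int cflags, unsigned int pflags,
				    bool same_span, int parent_nr_groups)
{
	if (!same_span)
		return 0;

	/* Flags needing groups don't count if only 1 group in parent. */
	if (parent_nr_groups == 1)
		pflags &= ~SD_DEGENERATE_GROUPS_MASK;

	/* The parent must not add any flag the child lacks. */
	if (~cflags & pflags)
		return 0;

	return 1;
}
```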

Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 4 +++-
kernel/sched/topology.c | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 4f07b405564e..75c9749a6e2d 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -131,8 +131,10 @@ SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD)
*
* Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
* flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ *
+ * NEEDS_GROUPS: Load balancing flag.
*/
-SD_FLAG(SD_PREFER_SIBLING, 0)
+SD_FLAG(SD_PREFER_SIBLING, SDF_NEEDS_GROUPS)

/*
* sched_groups of this level overlap
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 02fd8db747b2..8064f495641b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -184,7 +184,7 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)

/* Flags needing groups don't count if only 1 group in parent */
if (parent->groups == parent->groups->next)
- pflags &= ~(SD_DEGENERATE_GROUPS_MASK | SD_PREFER_SIBLING);
+ pflags &= ~SD_DEGENERATE_GROUPS_MASK;

if (~cflags & pflags)
return 0;
--
2.27.0

2020-08-12 12:55:36

by Valentin Schneider

[permalink] [raw]
Subject: [PATCH v5 16/17] sched/topology: Mark SD_NUMA as SDF_NEEDS_GROUPS

There would be no point in preserving a sched_domain with a single group
just because it has this flag set. Add it to SD_DEGENERATE_GROUPS_MASK.
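A rough model of how sd_degenerate() consumes the mask may help here. Only the final "flags that don't use groups" check is visible in this thread; the single-CPU-span check and the "at least two groups" check below are assumptions about the surrounding function, and the bit values, `domain_degenerates()` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only. */
#define SD_WAKE_AFFINE (1u << 0)  /* does not need groups */
#define SD_SERIALIZE   (1u << 1)  /* marked SDF_NEEDS_GROUPS */
#define SD_NUMA        (1u << 2)  /* marked SDF_NEEDS_GROUPS */
#define SD_DEGENERATE_GROUPS_MASK (SD_SERIALIZE | SD_NUMA)

/*
 * Hypothetical model: span_weight is the number of CPUs the domain spans,
 * nr_groups the number of sched_groups it has.
 */
static bool domain_degenerates(unsigned int flags, int span_weight,
                               int nr_groups)
{
    assert(span_weight >= 1 && nr_groups >= 1);

    /* Assumed: a domain spanning a single CPU is always degenerate. */
    if (span_weight == 1)
        return true;

    /* Assumed: flags needing groups only matter with two or more groups. */
    if ((flags & SD_DEGENERATE_GROUPS_MASK) && nr_groups > 1)
        return false;

    /* Any remaining flag that doesn't use groups keeps the domain. */
    if (flags & ~SD_DEGENERATE_GROUPS_MASK)
        return false;

    return true;
}
```

Under this model, a four-CPU domain with only SD_NUMA | SD_SERIALIZE and a single group is dropped, whereas adding SD_WAKE_AFFINE keeps it.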

Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 021b909d3941..71675afa55f0 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -151,5 +151,6 @@ SD_FLAG(SD_OVERLAP, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
* Cross-node balancing
*
* SHARED_PARENT: Set for all NUMA levels above NODE.
+ * NEEDS_GROUPS: No point in preserving domain if it has a single group.
*/
-SD_FLAG(SD_NUMA, SDF_SHARED_PARENT)
+SD_FLAG(SD_NUMA, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
--
2.27.0

2020-08-12 12:56:14

by Valentin Schneider

Subject: [PATCH v5 09/17] sched/topology: Remove SD_SERIALIZE degeneration special case

If there is only a single NUMA node in the system, the only NUMA topology
level that will be generated will be NODE (identity distance), which
doesn't have SD_SERIALIZE.

This means we don't need this special case in sd_parent_degenerate(), as
having the NODE level "naturally" covers it. Thus, remove it.
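The single-node argument can be sketched with a simplified model in which each distinct value in the node distance table yields one NUMA topology level, the identity distance giving NODE. With one node, the only distance is the identity one, so no level above NODE (and thus no domain with SD_SERIALIZE) exists. This is a hypothetical approximation of the level construction, not sched_init_numa() itself; `count_numa_levels()` and `MAX_NODES` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8  /* hypothetical bound to keep the sketch allocation-free */

/*
 * Count distinct distances in a row-major nr_nodes x nr_nodes table.
 * In this model, the result minus one is the number of levels above NODE.
 */
static size_t count_numa_levels(const int *dist, size_t nr_nodes)
{
    int seen[MAX_NODES * MAX_NODES];
    size_t nr_seen = 0;

    assert(nr_nodes >= 1 && nr_nodes <= MAX_NODES);

    for (size_t i = 0; i < nr_nodes * nr_nodes; i++) {
        size_t j;

        for (j = 0; j < nr_seen; j++)
            if (seen[j] == dist[i])
                break;
        if (j == nr_seen)
            seen[nr_seen++] = dist[i];
    }
    return nr_seen;
}
```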

Suggested-by: Peter Zijlstra <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/topology.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5f2bc99ff659..00ad7cef2ec1 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -183,11 +183,9 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
return 0;

/* Flags needing groups don't count if only 1 group in parent */
- if (parent->groups == parent->groups->next) {
+ if (parent->groups == parent->groups->next)
pflags &= ~(SD_DEGENERATE_GROUPS_MASK | SD_PREFER_SIBLING);
- if (nr_node_ids == 1)
- pflags &= ~SD_SERIALIZE;
- }
+
if (~cflags & pflags)
return 0;

--
2.27.0

2020-08-12 12:56:16

by Valentin Schneider

Subject: [PATCH v5 07/17] sched/topology: Introduce SD metaflag for flags needing > 1 groups

In preparation for cleaning up the sd_degenerate*() functions, mark flags
used in sd_degenerate() with the new SDF_NEEDS_GROUPS flag. With this,
build a compile-time mask of those SD flags.

Note that sd_parent_degenerate() uses an extra flag in its mask,
SD_PREFER_SIBLING, which remains singled out for now.
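The compile-time mask relies on expanding the flag list several times with different definitions of SD_FLAG(). The standalone sketch below reproduces the idea with an X-macro list instead of re-including include/linux/sched/sd_flags.h. The trimmed flag list and the `IDX_` prefix are assumptions (the kernel uses a `__` prefix); the metaflag values and the mask expression follow the patch.

```c
#include <assert.h>

/* Metaflag values as defined in the patch. */
#define SDF_SHARED_CHILD  0x1
#define SDF_SHARED_PARENT 0x2
#define SDF_NEEDS_GROUPS  0x4

/* Hypothetical, trimmed stand-in for the sd_flags.h list. */
#define SD_FLAG_LIST(X)                                        \
    X(SD_BALANCE_NEWIDLE, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) \
    X(SD_WAKE_AFFINE,     SDF_SHARED_CHILD)                    \
    X(SD_SERIALIZE,       SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)

/* First expansion: one bit index per flag, plus a count. */
#define X_INDEX(name, mflags) IDX_##name,
enum { SD_FLAG_LIST(X_INDEX) SD_FLAG_CNT };
#undef X_INDEX

/* Second expansion: flag values as powers of two. */
#define X_VALUE(name, mflags) name = 1 << IDX_##name,
enum { SD_FLAG_LIST(X_VALUE) };
#undef X_VALUE

/*
 * Third expansion: each flag contributes itself if its metaflags contain
 * SDF_NEEDS_GROUPS, else 0; the trailing 0 terminates the OR chain.
 */
#define X_MASK(name, mflags) (name * !!((mflags) & SDF_NEEDS_GROUPS)) |
static const unsigned int SD_DEGENERATE_GROUPS_MASK = SD_FLAG_LIST(X_MASK) 0;
#undef X_MASK
```

Because the mask is derived from the same list as the flags, tagging a flag with SDF_NEEDS_GROUPS is enough to add it to the mask, which is what the later per-flag patches rely on.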

Suggested-by: Peter Zijlstra <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 39 ++++++++++++++++++++++++----------
include/linux/sched/topology.h | 7 ++++++
2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index ea0ec1a33da4..21a43ad6f26a 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -8,7 +8,7 @@
#endif

/*
- * Expected flag uses
+ * Hierarchical metaflags
*
* SHARED_CHILD: These flags are meant to be set from the base domain upwards.
* If a domain has this flag set, all of its children should have it set. This
@@ -29,29 +29,42 @@
* certain level (e.g. domain starts spanning CPUs outside of the base CPU's
* socket).
*/
-#define SDF_SHARED_CHILD 0x1
-#define SDF_SHARED_PARENT 0x2
+#define SDF_SHARED_CHILD 0x1
+#define SDF_SHARED_PARENT 0x2
+
+/*
+ * Behavioural metaflags
+ *
+ * NEEDS_GROUPS: These flags are only relevant if the domain they are set on has
+ * more than one group. This is usually for balancing flags (load balancing
+ * involves equalizing a metric between groups), or for flags describing some
+ * shared resource (which would be shared between groups).
+ */
+#define SDF_NEEDS_GROUPS 0x4

/*
* Balance when about to become idle
*
* SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ * NEEDS_GROUPS: Load balancing flag.
*/
-SD_FLAG(SD_BALANCE_NEWIDLE, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_NEWIDLE, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)

/*
* Balance on exec
*
* SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ * NEEDS_GROUPS: Load balancing flag.
*/
-SD_FLAG(SD_BALANCE_EXEC, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_EXEC, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)

/*
* Balance on fork, clone
*
* SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ * NEEDS_GROUPS: Load balancing flag.
*/
-SD_FLAG(SD_BALANCE_FORK, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_FORK, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)

/*
* Balance on wakeup
@@ -69,24 +82,28 @@ SD_FLAG(SD_WAKE_AFFINE, SDF_SHARED_CHILD)

/*
* Domain members have different CPU capacities
+ *
+ * NEEDS_GROUPS: Per-CPU capacity is asymmetric between groups.
*/
-SD_FLAG(SD_ASYM_CPUCAPACITY, 0)
+SD_FLAG(SD_ASYM_CPUCAPACITY, SDF_NEEDS_GROUPS)

/*
* Domain members share CPU capacity (i.e. SMT)
*
* SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
- * CPU capacity.
+ * CPU capacity.
+ * NEEDS_GROUPS: Capacity is shared between groups.
*/
-SD_FLAG(SD_SHARE_CPUCAPACITY, SDF_SHARED_CHILD)
+SD_FLAG(SD_SHARE_CPUCAPACITY, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)

/*
* Domain members share CPU package resources (i.e. caches)
*
* SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
- * the same cache(s).
+ * the same cache(s).
+ * NEEDS_GROUPS: Caches are shared between groups.
*/
-SD_FLAG(SD_SHARE_PKG_RESOURCES, SDF_SHARED_CHILD)
+SD_FLAG(SD_SHARE_PKG_RESOURCES, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)

/*
* Only a single load balancing instance
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 32f602ff37a0..2d59ca77103e 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -25,6 +25,13 @@ enum {
};
#undef SD_FLAG

+/* Generate a mask of SD flags with the SDF_NEEDS_GROUPS metaflag */
+#define SD_FLAG(name, mflags) (name * !!((mflags) & SDF_NEEDS_GROUPS)) |
+static const unsigned int SD_DEGENERATE_GROUPS_MASK =
+#include <linux/sched/sd_flags.h>
+0;
+#undef SD_FLAG
+
#ifdef CONFIG_SCHED_DEBUG
#define SD_FLAG(_name, mflags) [__##_name] = { .meta_flags = mflags, .name = #_name },
static const struct {
--
2.27.0

2020-08-12 12:56:26

by Valentin Schneider

Subject: [PATCH v5 06/17] sched/debug: Output SD flag names rather than their values

Decoding the output of /proc/sys/kernel/sched_domain/cpu*/domain*/flags has
always been somewhat annoying, as one needs to go fetch the bit -> name
mapping from the source code itself. This encoding can be saved in a script
somewhere, but that isn't safe from flags being added, removed or even
shuffled around.

What matters for debugging purposes is to get *which* flags are set in a
given domain; their associated values are pretty much meaningless.

Make the sd flags debug file output flag names.
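The two-pass approach used by sd_ctl_doflags() (size the output, then format it) can be sketched in userspace as follows. It assumes a hypothetical, trimmed name table indexed by bit position, uses a plain loop instead of for_each_set_bit(), and uses calloc() in place of kcalloc(); unlike the patch as posted, it sizes elements with `sizeof(*buf)` and checks the allocation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed name table indexed by bit position. */
static const char *const sd_flag_names[] = {
    "SD_BALANCE_NEWIDLE", "SD_BALANCE_EXEC", "SD_BALANCE_FORK",
    "SD_BALANCE_WAKE", "SD_WAKE_AFFINE",
};
#define NR_SD_FLAGS (sizeof(sd_flag_names) / sizeof(sd_flag_names[0]))

/*
 * Render the set bits of @flags as space-separated names.
 * Returns a heap-allocated string (caller frees), or NULL on allocation failure.
 */
static char *sd_flags_to_names(unsigned long flags)
{
    size_t data_size = 0, len = 0;
    char *buf;

    /* First pass: each set flag needs its name plus one space. */
    for (size_t idx = 0; idx < NR_SD_FLAGS; idx++)
        if (flags & (1UL << idx))
            data_size += strlen(sd_flag_names[idx]) + 1;

    /* One byte per char, plus room for the terminating NUL. */
    buf = calloc(data_size + 1, sizeof(*buf));
    if (!buf)
        return NULL;

    /* Second pass: append "NAME " for each set flag. */
    for (size_t idx = 0; idx < NR_SD_FLAGS; idx++)
        if (flags & (1UL << idx))
            len += snprintf(buf + len, data_size + 1 - len, "%s ",
                            sd_flag_names[idx]);

    assert(len == data_size);
    return buf;
}
```

With this table, a value of 0x11 renders as "SD_BALANCE_NEWIDLE SD_WAKE_AFFINE " (note the trailing space, as in the patch).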

Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/debug.c | 53 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 36c54265bb2b..e9e036a41c55 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -245,6 +245,57 @@ set_table_entry(struct ctl_table *entry,
entry->proc_handler = proc_handler;
}

+static int sd_ctl_doflags(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ unsigned long flags = *(unsigned long *)table->data;
+ size_t data_size = 0;
+ size_t len = 0;
+ char *tmp;
+ int idx;
+
+ if (write)
+ return 0;
+
+ for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
+ char *name = sd_flag_debug[idx].name;
+
+ /* Name plus whitespace */
+ data_size += strlen(name) + 1;
+ }
+
+ if (*ppos > data_size) {
+ *lenp = 0;
+ return 0;
+ }
+
+ tmp = kcalloc(data_size + 1, sizeof(tmp), GFP_KERNEL);
+ for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
+ char *name = sd_flag_debug[idx].name;
+
+ len += snprintf(tmp + len, strlen(name) + 2, "%s ", name);
+ }
+
+ tmp += *ppos;
+ len -= *ppos;
+
+ if (len > *lenp)
+ len = *lenp;
+ if (len)
+ memcpy(buffer, tmp, len);
+ if (len < *lenp) {
+ ((char *)buffer)[len] = '\n';
+ len++;
+ }
+
+ *lenp = len;
+ *ppos += len;
+
+ kfree(tmp);
+
+ return 0;
+}
+
static struct ctl_table *
sd_alloc_ctl_domain_table(struct sched_domain *sd)
{
@@ -258,7 +309,7 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd)
set_table_entry(&table[2], "busy_factor", &sd->busy_factor, sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[3], "imbalance_pct", &sd->imbalance_pct, sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[4], "cache_nice_tries", &sd->cache_nice_tries, sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[5], "flags", &sd->flags, sizeof(int), 0444, proc_dointvec_minmax);
+ set_table_entry(&table[5], "flags", &sd->flags, sizeof(int), 0444, sd_ctl_doflags);
set_table_entry(&table[6], "max_newidle_lb_cost", &sd->max_newidle_lb_cost, sizeof(long), 0644, proc_doulongvec_minmax);
set_table_entry(&table[7], "name", sd->name, CORENAME_MAX_SIZE, 0444, proc_dostring);
/* &table[8] is terminator */
--
2.27.0

2020-08-12 12:57:57

by Valentin Schneider

Subject: [PATCH v5 13/17] sched/topology: Mark SD_SERIALIZE as SDF_NEEDS_GROUPS

There would be no point in preserving a sched_domain with a single group
just because it has this flag set. Add it to SD_DEGENERATE_GROUPS_MASK.

Signed-off-by: Valentin Schneider <[email protected]>
---
include/linux/sched/sd_flags.h | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 9043d18ce418..6eb302528659 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -112,11 +112,12 @@ SD_FLAG(SD_SHARE_PKG_RESOURCES, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
* Only a single load balancing instance
*
* SHARED_PARENT: Set for all NUMA levels above NODE. Could be set from a
- * different level upwards, but it doesn't change that if a domain has this flag
- * set, then all of its parents need to have it too (otherwise the serialization
- * doesn't make sense).
+ * different level upwards, but it doesn't change that if a
+ * domain has this flag set, then all of its parents need to have
+ * it too (otherwise the serialization doesn't make sense).
+ * NEEDS_GROUPS: No point in preserving domain if it has a single group.
*/
-SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT)
+SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)

/*
* Place busy tasks earlier in the domain
--
2.27.0

2020-08-12 12:58:16

by Valentin Schneider

Subject: [PATCH v5 05/17] sched/topology: Verify SD_* flags setup when sched_debug is on

Now that we have some description of what we expect the flags layout to
be, we can use that to assert at runtime that the actual layout is sane.
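The two assertions added to sched_domain_debug_one() can be modeled over a hierarchy represented as an array of flag words, index 0 being the base domain. This is a hypothetical sketch: the metadata table and `count_flag_violations()` are made up, and it counts violations instead of printing them.

```c
#include <assert.h>
#include <stddef.h>

/* Metaflag values as defined in the series. */
#define SDF_SHARED_CHILD  0x1
#define SDF_SHARED_PARENT 0x2

/* Hypothetical per-flag metadata, indexed by bit position. */
static const unsigned int sd_meta_flags[] = {
    SDF_SHARED_CHILD,   /* bit 0: e.g. a balancing flag */
    SDF_SHARED_PARENT,  /* bit 1: e.g. a NUMA-level flag */
    0,                  /* bit 2: no hierarchy constraint */
};
#define NR_FLAGS (sizeof(sd_meta_flags) / sizeof(sd_meta_flags[0]))

/* levels[0] is the base domain, levels[nr - 1] the topmost one. */
static int count_flag_violations(const unsigned int *levels, size_t nr)
{
    int errors = 0;

    assert(levels != NULL || nr == 0);

    for (size_t lvl = 0; lvl < nr; lvl++) {
        for (size_t idx = 0; idx < NR_FLAGS; idx++) {
            unsigned int flag = 1u << idx;

            if (!(levels[lvl] & flag))
                continue;
            /* SHARED_CHILD: the child (one level down) must have it too. */
            if ((sd_meta_flags[idx] & SDF_SHARED_CHILD) && lvl > 0 &&
                !(levels[lvl - 1] & flag))
                errors++;
            /* SHARED_PARENT: the parent (one level up) must have it too. */
            if ((sd_meta_flags[idx] & SDF_SHARED_PARENT) && lvl + 1 < nr &&
                !(levels[lvl + 1] & flag))
                errors++;
        }
    }
    return errors;
}
```

As in the patch, only the immediate child and parent are checked; a consistent hierarchy satisfies the rule transitively.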

Reviewed-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/topology.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 865fff3ef20a..f128fcf46a41 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -29,6 +29,8 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
struct cpumask *groupmask)
{
struct sched_group *group = sd->groups;
+ unsigned long flags = sd->flags;
+ unsigned int idx;

cpumask_clear(groupmask);

@@ -43,6 +45,21 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
printk(KERN_ERR "ERROR: domain->groups does not contain CPU%d\n", cpu);
}

+ for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
+ unsigned int flag = BIT(idx);
+ unsigned int meta_flags = sd_flag_debug[idx].meta_flags;
+
+ if ((meta_flags & SDF_SHARED_CHILD) && sd->child &&
+ !(sd->child->flags & flag))
+ printk(KERN_ERR "ERROR: flag %s set here but not in child\n",
+ sd_flag_debug[idx].name);
+
+ if ((meta_flags & SDF_SHARED_PARENT) && sd->parent &&
+ !(sd->parent->flags & flag))
+ printk(KERN_ERR "ERROR: flag %s set here but not in parent\n",
+ sd_flag_debug[idx].name);
+ }
+
printk(KERN_DEBUG "%*s groups:", level + 1, "");
do {
if (!group) {
--
2.27.0

2020-08-12 17:07:47

by kernel test robot

Subject: Re: [PATCH v5 06/17] sched/debug: Output SD flag names rather than their values

Hi Valentin,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on tip/auto-latest linux/master linus/master v5.8 next-20200812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Valentin-Schneider/sched-Instrument-sched-domain-flags/20200812-205638
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 949bcb8135a96a6923e676646bd29cbe69e8350f
config: i386-randconfig-s001-20200811 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.2-168-g9554805c-dirty
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


sparse warnings: (new ones prefixed by >>)

kernel/sched/debug.c:327:9: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct sched_domain *[assigned] sd @@ got struct sched_domain [noderef] __rcu *parent @@
kernel/sched/debug.c:327:9: sparse: expected struct sched_domain *[assigned] sd
kernel/sched/debug.c:327:9: sparse: got struct sched_domain [noderef] __rcu *parent
kernel/sched/debug.c:334:9: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct sched_domain *[assigned] sd @@ got struct sched_domain [noderef] __rcu *parent @@
kernel/sched/debug.c:334:9: sparse: expected struct sched_domain *[assigned] sd
kernel/sched/debug.c:334:9: sparse: got struct sched_domain [noderef] __rcu *parent
kernel/sched/debug.c:486:22: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/debug.c:486:22: sparse: struct task_struct [noderef] __rcu *
kernel/sched/debug.c:486:22: sparse: struct task_struct *
kernel/sched/debug.c:694:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct task_struct *tsk @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/debug.c:694:9: sparse: expected struct task_struct *tsk
kernel/sched/debug.c:694:9: sparse: got struct task_struct [noderef] __rcu *curr
kernel/sched/debug.c:694:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct task_struct *tsk @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/debug.c:694:9: sparse: expected struct task_struct *tsk
kernel/sched/debug.c:694:9: sparse: got struct task_struct [noderef] __rcu *curr
>> kernel/sched/debug.c:279:17: sparse: sparse: non size-preserving pointer to integer cast
>> kernel/sched/debug.c:279:17: sparse: sparse: non size-preserving integer to pointer cast

vim +279 kernel/sched/debug.c

247
248 static int sd_ctl_doflags(struct ctl_table *table, int write,
249 void *buffer, size_t *lenp, loff_t *ppos)
250 {
251 unsigned long flags = *(unsigned long *)table->data;
252 size_t data_size = 0;
253 size_t len = 0;
254 char *tmp;
255 int idx;
256
257 if (write)
258 return 0;
259
260 for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
261 char *name = sd_flag_debug[idx].name;
262
263 /* Name plus whitespace */
264 data_size += strlen(name) + 1;
265 }
266
267 if (*ppos > data_size) {
268 *lenp = 0;
269 return 0;
270 }
271
272 tmp = kcalloc(data_size + 1, sizeof(tmp), GFP_KERNEL);
273 for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
274 char *name = sd_flag_debug[idx].name;
275
276 len += snprintf(tmp + len, strlen(name) + 2, "%s ", name);
277 }
278
> 279 tmp += *ppos;
280 len -= *ppos;
281
282 if (len > *lenp)
283 len = *lenp;
284 if (len)
285 memcpy(buffer, tmp, len);
286 if (len < *lenp) {
287 ((char *)buffer)[len] = '\n';
288 len++;
289 }
290
291 *lenp = len;
292 *ppos += len;
293
294 kfree(tmp);
295
296 return 0;
297 }
298

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (4.46 kB)
.config.gz (34.56 kB)

2020-08-12 18:02:00

by kernel test robot

Subject: [PATCH] sched/debug: fix noderef.cocci warnings

From: kernel test robot <[email protected]>

kernel/sched/debug.c:272:30-36: ERROR: application of sizeof to pointer

sizeof when applied to a pointer typed expression gives the size of
the pointer
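In this case the mistake over-allocates rather than under-allocates: `sizeof(tmp)` is the size of a `char *` (typically 4 or 8 bytes), so the original call requests several times the needed `data_size + 1` bytes. A minimal userspace sketch of the corrected idiom, using calloc() in place of kcalloc() and a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate @n zeroed chars. sizeof(*buf) follows the
 * pointee type (char, 1 byte); sizeof(buf) would be the pointer's size.
 */
static char *alloc_char_buf(size_t n)
{
    char *buf = calloc(n, sizeof(*buf));

    return buf; /* may be NULL: callers must check */
}
```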

Generated by: scripts/coccinelle/misc/noderef.cocci

CC: Valentin Schneider <[email protected]>
Signed-off-by: kernel test robot <[email protected]>
---

url: https://github.com/0day-ci/linux/commits/Valentin-Schneider/sched-Instrument-sched-domain-flags/20200812-205638
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 949bcb8135a96a6923e676646bd29cbe69e8350f

debug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -269,7 +269,7 @@ static int sd_ctl_doflags(struct ctl_tab
return 0;
}

- tmp = kcalloc(data_size + 1, sizeof(tmp), GFP_KERNEL);
+ tmp = kcalloc(data_size + 1, sizeof(*tmp), GFP_KERNEL);
for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
char *name = sd_flag_debug[idx].name;

2020-08-12 18:05:35

by kernel test robot

Subject: Re: [PATCH v5 06/17] sched/debug: Output SD flag names rather than their values

Hi Valentin,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on tip/auto-latest linux/master linus/master v5.8 next-20200812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Valentin-Schneider/sched-Instrument-sched-domain-flags/20200812-205638
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 949bcb8135a96a6923e676646bd29cbe69e8350f
config: riscv-randconfig-c003-20200811 (attached as .config)
compiler: riscv64-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


coccinelle warnings: (new ones prefixed by >>)

>> kernel/sched/debug.c:272:30-36: ERROR: application of sizeof to pointer

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (1.12 kB)
.config.gz (22.03 kB)

2020-08-12 18:52:43

by Valentin Schneider

Subject: Re: [PATCH v5 06/17] sched/debug: Output SD flag names rather than their values


On 12/08/20 17:35, kernel test robot wrote:
> Hi Valentin,
>
> Thank you for the patch! Perhaps something to improve:
>

[...]


> url: https://github.com/0day-ci/linux/commits/Valentin-Schneider/sched-Instrument-sched-domain-flags/20200812-205638
> base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 949bcb8135a96a6923e676646bd29cbe69e8350f
> config: i386-randconfig-s001-20200811 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
> reproduce:
> # apt-get install sparse
> # sparse version: v0.6.2-168-g9554805c-dirty
> # save the attached .config to linux build tree
> make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386
>

>>> kernel/sched/debug.c:279:17: sparse: sparse: non size-preserving pointer to integer cast
>>> kernel/sched/debug.c:279:17: sparse: sparse: non size-preserving integer to pointer cast
>
> 271
> 272 tmp = kcalloc(data_size + 1, sizeof(tmp), GFP_KERNEL);
> 273 for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
> 274 char *name = sd_flag_debug[idx].name;
> 275
> 276 len += snprintf(tmp + len, strlen(name) + 2, "%s ", name);
> 277 }
> 278
> > 279 tmp += *ppos;

I pretty much copied kernel/sysctl.c::_proc_do_string() and I think that's
exactly the same types here: char* buffer incremented by loff_t offset. It
does look fine to me, but I can't really parse that warning.

> 280 len -= *ppos;
> 281
> 282 if (len > *lenp)
> 283 len = *lenp;
> 284 if (len)
> 285 memcpy(buffer, tmp, len);
> 286 if (len < *lenp) {
> 287 ((char *)buffer)[len] = '\n';
> 288 len++;
> 289 }
> 290
> 291 *lenp = len;
> 292 *ppos += len;
> 293
> 294 kfree(tmp);
> 295
> 296 return 0;
> 297 }
> 298
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/[email protected]

2020-08-12 18:53:18

by Valentin Schneider

Subject: Re: [PATCH] sched/debug: fix noderef.cocci warnings


On 12/08/20 18:59, kernel test robot wrote:
> From: kernel test robot <[email protected]>
>
> kernel/sched/debug.c:272:30-36: ERROR: application of sizeof to pointer
>
> sizeof when applied to a pointer typed expression gives the size of
> the pointer
>
> Generated by: scripts/coccinelle/misc/noderef.cocci
>
> CC: Valentin Schneider <[email protected]>
> Signed-off-by: kernel test robot <[email protected]>
> ---
>
> url: https://github.com/0day-ci/linux/commits/Valentin-Schneider/sched-Instrument-sched-domain-flags/20200812-205638
> base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 949bcb8135a96a6923e676646bd29cbe69e8350f
>
> debug.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -269,7 +269,7 @@ static int sd_ctl_doflags(struct ctl_tab
> return 0;
> }
>
> - tmp = kcalloc(data_size + 1, sizeof(tmp), GFP_KERNEL);
> + tmp = kcalloc(data_size + 1, sizeof(*tmp), GFP_KERNEL);

Praised be coccinelle for rubbing in my face that I can't write code; also
I'm not even checking if the allocation succeeded which is clearly daft,
even if this is debug stuff. I'll blame the heat and try to move on...

> for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
> char *name = sd_flag_debug[idx].name;
>

2020-08-13 12:06:41

by Luc Van Oostenryck

Subject: Re: [PATCH v5 06/17] sched/debug: Output SD flag names rather than their values

On Wed, Aug 12, 2020 at 07:51:08PM +0100, Valentin Schneider wrote:
> On 12/08/20 17:35, kernel test robot wrote:
>
> > config: i386-randconfig-s001-20200811 (attached as .config)
> > reproduce:
> > # sparse version: v0.6.2-168-g9554805c-dirty
> > make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386
> >
> >>> kernel/sched/debug.c:279:17: sparse: sparse: non size-preserving pointer to integer cast
> >>> kernel/sched/debug.c:279:17: sparse: sparse: non size-preserving integer to pointer cast
> >
> > > 279 tmp += *ppos;
>
> I pretty much copied kernel/sysctl.c::_proc_do_string() and I think that's
> exactly the same types here: char* buffer incremented by loff_t offset. It
> does look fine to me, but I can't really parse that warning.

The warnings mean that there is a cast from a pointer to an integer with
a size other than the size of a pointer and the other way around.

It's indeed the case here, on i386, where pointers are 32-bit and loff_t
is 64-bit. But yes, I agree:
1) these messages are far from clear
2) these casts are internal and are probably not appropriate here.

I'll look later what can be done at sparse level.
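In sd_ctl_doflags() the offset is already bounded (`*ppos > data_size` returns early), so the narrowing is lossless in practice. In userspace C, one way to make that bound and the 64-bit to pointer-width conversion explicit is a checked helper like the hypothetical one below. Whether an explicit cast of this kind affects this particular sparse diagnostic is not something the sketch establishes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return @buf advanced by @pos, or NULL if @pos lies
 * outside [0, @size]. Range-checking the 64-bit offset first guarantees the
 * cast to size_t is lossless, even where size_t is 32-bit (e.g. i386).
 */
static const char *advance_by_pos(const char *buf, size_t size, int64_t pos)
{
    assert(buf != NULL);

    if (pos < 0 || (uint64_t)pos > size)
        return NULL;
    return buf + (size_t)pos;
}
```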

Regards,
-- Luc

2020-08-13 13:16:12

by Valentin Schneider

Subject: Re: [PATCH v5 06/17] sched/debug: Output SD flag names rather than their values

On 13/08/2020 13:02, Luc Van Oostenryck wrote:
> On Wed, Aug 12, 2020 at 07:51:08PM +0100, Valentin Schneider wrote:
>> On 12/08/20 17:35, kernel test robot wrote:
>>
>>> config: i386-randconfig-s001-20200811 (attached as .config)
>>> reproduce:
>>> # sparse version: v0.6.2-168-g9554805c-dirty
>>> make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386
>>>
>>>>> kernel/sched/debug.c:279:17: sparse: sparse: non size-preserving pointer to integer cast
>>>>> kernel/sched/debug.c:279:17: sparse: sparse: non size-preserving integer to pointer cast
>>>
>>> > 279 tmp += *ppos;
>>
>> I pretty much copied kernel/sysctl.c::_proc_do_string() and I think that's
>> exactly the same types here: char* buffer incremented by loff_t offset. It
>> does look fine to me, but I can't really parse that warning.
>
> The warnings mean that there is a cast from a pointer to an integer with
> a size other than the size of a pointer and the other way around.
>
> It's indeed the case here, on i386, where pointers are 32-bit and loff_t
> is 64-bit. But yes, I agree:
> 1) these messages are far from clear
> 2) these casts are internal and are probably not appropriate here.
>
> I'll look later what can be done at sparse level.
>

Thanks!

> Regards,
> -- Luc
>

2020-08-13 19:17:16

by Ingo Molnar

Subject: Re: [PATCH v5 02/17] ARM: Revert back to default scheduler topology.


* Valentin Schneider <[email protected]> wrote:

> The ARM-specific GMC level is meant to be built using the thread sibling
> mask, but no devicetree in arch/arm/boot/dts uses the 'thread' cpu-map
> binding. With SD_SHARE_POWERDOMAIN gone, this topology level can be
> removed, at which point ARM no longer benefits from having a custom defined
> topology table.
>
> Delete the GMC topology level by making ARM use the default scheduler
> topology table. This essentially reverts commit
>
> fb2aa85564f4 ("sched, ARM: Create a dedicated scheduler topology table")
>
> Cc: Russell King <[email protected]>
> Suggested-by: Dietmar Eggemann <[email protected]>
> Reviewed-by: Dietmar Eggemann <[email protected]>
> Signed-off-by: Valentin Schneider <[email protected]>

Minor changelog nit, it's helpful to add this final sentence:

No change in functionality is expected.

( If indeed no change in functionality is expected. ;-)

Thanks,

Ingo

2020-08-13 22:30:17

by Valentin Schneider

Subject: Re: [PATCH v5 02/17] ARM: Revert back to default scheduler topology.


On 13/08/20 20:16, Ingo Molnar wrote:
> * Valentin Schneider <[email protected]> wrote:
>
>> The ARM-specific GMC level is meant to be built using the thread sibling
>> mask, but no devicetree in arch/arm/boot/dts uses the 'thread' cpu-map
>> binding. With SD_SHARE_POWERDOMAIN gone, this topology level can be
>> removed, at which point ARM no longer benefits from having a custom defined
>> topology table.
>>
>> Delete the GMC topology level by making ARM use the default scheduler
>> topology table. This essentially reverts commit
>>
>> fb2aa85564f4 ("sched, ARM: Create a dedicated scheduler topology table")
>>
>> Cc: Russell King <[email protected]>
>> Suggested-by: Dietmar Eggemann <[email protected]>
>> Reviewed-by: Dietmar Eggemann <[email protected]>
>> Signed-off-by: Valentin Schneider <[email protected]>
>
> Minor changelog nit, it's helpful to add this final sentence:
>
> No change in functionality is expected.
>
> ( If indeed no change in functionality is expected. ;-)
>

Right, that's indeed the case here given the GMC domain would always be
degenerated anyway.

> Thanks,
>
> Ingo