Hi,
I got a nice splat while testing out the toggling of
sched_asym_cpucapacity, so this is a cpuset fix plus a topology patch.
Details are in the logs.
v2 changes:
- Use static_branch_{inc,dec} rather than enable/disable
v3 changes:
- New patch: add fix for empty cpumap in sched domain rebuild
- Move static_branch_dec outside of RCU read-side section (Quentin)
v4 changes:
- Patch 1/2: Directly tweak the cpuset array (Dietmar)
- Patch 2/2: Add an example to the changelog (Dietmar)
Cheers,
Valentin
Valentin Schneider (2):
sched/topology: Don't try to build empty sched domains
sched/topology: Allow sched_asym_cpucapacity to be disabled
kernel/cgroup/cpuset.c | 3 ++-
kernel/sched/topology.c | 11 +++++++++--
2 files changed, 11 insertions(+), 3 deletions(-)
--
2.22.0
Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.
This leads to the following splat:
[ 30.618174] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 30.623697] Modules linked in:
[ 30.626731] CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
[ 30.635003] Hardware name: ARM Juno development board (r0) (DT)
[ 30.640877] Workqueue: events cpuset_hotplug_workfn
[ 30.645713] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 30.650464] pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
[ 30.655126] lr : build_sched_domains (kernel/sched/topology.c:1966)
[...]
[ 30.742047] Call trace:
[ 30.744474] build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
[ 30.748793] partition_sched_domains_locked (kernel/sched/topology.c:2250)
[ 30.753971] rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
[ 30.758977] rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
[ 30.763209] cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
[ 30.767613] process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
[ 30.771586] worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
[ 30.775217] kthread (kernel/kthread.c:255)
[ 30.778418] ret_from_fork (arch/arm64/kernel/entry.S:1167)
[ 30.781965] Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)
The faulty line in question is
cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));
and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.
Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.
The above splat was obtained on my Juno r0 with:
cgcreate -g cpuset:asym
cgset -r cpuset.cpus=0-3 asym
cgset -r cpuset.mems=0 asym
cgset -r cpuset.cpu_exclusive=1 asym
cgcreate -g cpuset:smp
cgset -r cpuset.cpus=4-5 smp
cgset -r cpuset.mems=0 smp
cgset -r cpuset.cpu_exclusive=1 smp
cgset -r cpuset.sched_load_balance=0 .
echo 0 > /sys/devices/system/cpu/cpu4/online
echo 0 > /sys/devices/system/cpu/cpu5/online
Cc: <[email protected]>
Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/cgroup/cpuset.c | 3 ++-
kernel/sched/topology.c | 5 ++++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c52bc91f882b..c87ee6412b36 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -798,7 +798,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
continue;
- if (is_sched_load_balance(cp))
+ if (is_sched_load_balance(cp) &&
+ !cpumask_empty(cp->effective_cpus))
csa[csn++] = cp;
/* skip @cp's subtree if not a partition root */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 3623ffe85d18..2e7af755e17a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1945,7 +1945,7 @@ static struct sched_domain_topology_level
static int
build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *attr)
{
- enum s_alloc alloc_state;
+ enum s_alloc alloc_state = sa_none;
struct sched_domain *sd;
struct s_data d;
struct rq *rq = NULL;
@@ -1953,6 +1953,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
struct sched_domain_topology_level *tl_asym;
bool has_asym = false;
+ if (WARN_ON(cpumask_empty(cpu_map)))
+ goto error;
+
alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
if (alloc_state != sa_rootdomain)
goto error;
--
2.22.0
While the static key is correctly initialized as being disabled, it will
remain forever enabled once turned on. This means that if we start with an
asymmetric system and hotplug out enough CPUs to end up with an SMP system,
the static key will remain set - which is obviously wrong. We should detect
this and turn off things like misfit migration and capacity aware wakeups.
As Quentin pointed out, having separate root domains makes this slightly
trickier. We could have exclusive cpusets that create an SMP island - IOW,
the domains within this root domain will not see any asymmetry. This means
we can't just disable the key on domain destruction, we need to count how
many asymmetric root domains we have.
Consider the following example using Juno r0 which is 2+4 big.LITTLE, where
two identical cpusets are created: they both span both big and LITTLE CPUs:
  asym0       asym1
[         ][         ]
  L  L  B    L  L  B
cgcreate -g cpuset:asym0
cgset -r cpuset.cpus=0,1,3 asym0
cgset -r cpuset.mems=0 asym0
cgset -r cpuset.cpu_exclusive=1 asym0
cgcreate -g cpuset:asym1
cgset -r cpuset.cpus=2,4,5 asym1
cgset -r cpuset.mems=0 asym1
cgset -r cpuset.cpu_exclusive=1 asym1
cgset -r cpuset.sched_load_balance=0 .
(the CPU numbering may look odd because, on the Juno, LITTLEs are CPUs 0,3-5
and bigs are CPUs 1-2)
If we make one of those SMP (IOW remove asymmetry) by e.g. hotplugging its
big core, we would end up with an SMP cpuset and an asymmetric cpuset - the
static key must remain set, because we still have one asymmetric root domain.
With the above example, this could be done with:
echo 0 > /sys/devices/system/cpu/cpu2/online
Which would result in:
  asym0       asym1
[         ][         ]
  L  L  B    L  L
When both SMP and asymmetric cpusets are present, all CPUs will observe
sched_asym_cpucapacity being set (it is system-wide), but not all CPUs
observe asymmetry in their sched domain hierarchy:
per_cpu(sd_asym_cpucapacity, <any CPU in asym0>) == <some SD at DIE level>
per_cpu(sd_asym_cpucapacity, <any CPU in asym1>) == NULL
Change the simple key enablement to an increment, and decrement the key
counter when destroying domains that cover asymmetric CPUs.
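The intended semantics, sketched with a plain integer standing in for the
static key (illustrative only; key_inc()/key_dec()/key_enabled() below are
stand-ins for the static_branch_*_cpuslocked() helpers, not kernel API):

#include <stdbool.h>
#include <stdio.h>

static int asym_key_count;	/* stand-in for sched_asym_cpucapacity */

static void key_inc(void) { asym_key_count++; }
static void key_dec(void) { asym_key_count--; }
static bool key_enabled(void) { return asym_key_count > 0; }

int main(void)
{
	key_inc();	/* asym0's root domain built, it is asymmetric: 0 -> 1 */
	key_inc();	/* asym1's root domain built, it is asymmetric: 1 -> 2 */

	/*
	 * cpu2 (asym1's big) goes offline: asym1's old domains are destroyed
	 * (decrement) and rebuilt as SMP (no increment), but asym0 still
	 * holds a reference so the key stays enabled system-wide.
	 */
	key_dec();
	printf("asym1 now SMP, key enabled: %d\n", key_enabled());

	/* Only when the last asymmetric root domain goes away does the key clear */
	key_dec();
	printf("asym0 now SMP too, key enabled: %d\n", key_enabled());
	return 0;
}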
Cc: <[email protected]>
Fixes: df054e8445a4 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
Reviewed-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/topology.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 2e7af755e17a..6ec1e595b1d4 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2026,7 +2026,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
rcu_read_unlock();
if (has_asym)
- static_branch_enable_cpuslocked(&sched_asym_cpucapacity);
+ static_branch_inc_cpuslocked(&sched_asym_cpucapacity);
if (rq && sched_debug_enabled) {
pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n",
@@ -2121,9 +2121,12 @@ int sched_init_domains(const struct cpumask *cpu_map)
*/
static void detach_destroy_domains(const struct cpumask *cpu_map)
{
+ unsigned int cpu = cpumask_any(cpu_map);
int i;
+ if (rcu_access_pointer(per_cpu(sd_asym_cpucapacity, cpu)))
+ static_branch_dec_cpuslocked(&sched_asym_cpucapacity);
+
rcu_read_lock();
for_each_cpu(i, cpu_map)
cpu_attach_domain(NULL, &def_root_domain, i);
--
2.22.0
On 23/10/2019 17:37, Valentin Schneider wrote:
> Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
> cpuset code feeding empty cpumasks to the sched domain rebuild machinery.
> This leads to the following splat:
[...]
> The faulty line in question is
>
> cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));
>
> and we're not checking the return value against nr_cpu_ids (we shouldn't
> have to!), which leads to the above.
>
> Prevent generate_sched_domains() from returning empty cpumasks, and add
> some assertion in build_sched_domains() to scream bloody murder if it
> happens again.
>
> The above splat was obtained on my Juno r0 with:
>
> cgcreate -g cpuset:asym
> cgset -r cpuset.cpus=0-3 asym
> cgset -r cpuset.mems=0 asym
> cgset -r cpuset.cpu_exclusive=1 asym
>
> cgcreate -g cpuset:smp
> cgset -r cpuset.cpus=4-5 smp
> cgset -r cpuset.mems=0 smp
> cgset -r cpuset.cpu_exclusive=1 smp
>
> cgset -r cpuset.sched_load_balance=0 .
>
> echo 0 > /sys/devices/system/cpu/cpu4/online
> echo 0 > /sys/devices/system/cpu/cpu5/online
>
> Cc: <[email protected]>
> Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Sorry for being picky but IMHO you should also mention that it fixes
f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting
information")
Tested it on a hikey620 (8 CPUs, SMP) with v5.4-rc4 and a local fix for
asym_cpu_capacity_level().
2 exclusive cpusets [0-3] and [4-7], hp'ing out [0-3] and then hp'ing in
[0] again.
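Roughly, reusing the cgcreate/cgset incantations from the cover letter (the
cpuset names cs0/cs1 below are made up for illustration):

cgcreate -g cpuset:cs0
cgset -r cpuset.cpus=0-3 cs0
cgset -r cpuset.mems=0 cs0
cgset -r cpuset.cpu_exclusive=1 cs0

cgcreate -g cpuset:cs1
cgset -r cpuset.cpus=4-7 cs1
cgset -r cpuset.mems=0 cs1
cgset -r cpuset.cpu_exclusive=1 cs1

cgset -r cpuset.sched_load_balance=0 .

for i in 0 1 2 3; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
echo 1 > /sys/devices/system/cpu/cpu0/online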
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5a174ae6ecf3..8f83e8e3ea9a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2203,8 +2203,19 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
for (i = 0; i < ndoms_cur; i++) {
for (j = 0; j < n && !new_topology; j++) {
if (cpumask_equal(doms_cur[i], doms_new[j]) &&
- dattrs_equal(dattr_cur, i, dattr_new, j))
+ dattrs_equal(dattr_cur, i, dattr_new, j)) {
+ struct root_domain *rd;
+
+ /*
+ * This domain won't be destroyed and as such
+ * its dl_bw->total_bw needs to be cleared. It
+ * will be recomputed in function
+ * update_tasks_root_domain().
+ */
+ rd = cpu_rq(cpumask_any(doms_cur[i]))->rd;
We have an issue here if doms_cur[i] is empty.
+ dl_clear_root_domain(rd);
goto match1;
There is yet another similar issue behind the first one
(asym_cpu_capacity_level()).
342 static bool build_perf_domains(const struct cpumask *cpu_map)
343 {
344 int i, nr_pd = 0, nr_cs = 0, nr_cpus = cpumask_weight(cpu_map);
345 struct perf_domain *pd = NULL, *tmp;
346 int cpu = cpumask_first(cpu_map); <--- !!!
347 struct root_domain *rd = cpu_rq(cpu)->rd; <--- !!!
348 struct cpufreq_policy *policy;
349 struct cpufreq_governor *gov;
...
406 tmp = rd->pd; <--- !!!
Caught when running on hikey620 (8 CPUs, SMP) with v5.4-rc4 and a local fix
for asym_cpu_capacity_level(), with CONFIG_ENERGY_MODEL=y.
There might be other places in build_sched_domains() suffering from the same
issue, so I assume it's wise not to call it with an empty cpu_map and to warn
if that ever happens.
[...]
On 24/10/2019 17:19, Dietmar Eggemann wrote:
> Sorry for being picky but IMHO you should also mention that it fixes
>
> f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting
> information")
>
I can append the following to the changelog, although I'd like some
feedback from the cgroup folks before doing a respin:
"""
Note that commit
f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")
introduced a similar issue. Since doms_new is assigned to doms_cur without
any filtering, we can end up with an empty cpumask in the doms_cur array.
The next time we go through a rebuild, this will break on:
rd = cpu_rq(cpumask_any(doms_cur[i]))->rd;
If there wasn't enough already, this is yet another argument for *not*
handing over empty cpumasks to the sched domain rebuild.
"""
I tagged the commit that introduces the static key with Fixes: because it
was introduced earlier - I don't think it would make sense to have two
"Fixes:" lines? In any case, it'll now be listed in the changelog.
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: cd1cb3350561d2bf544ddfef76fbf0b1c9c7178f
Gitweb: https://git.kernel.org/tip/cd1cb3350561d2bf544ddfef76fbf0b1c9c7178f
Author: Valentin Schneider <[email protected]>
AuthorDate: Wed, 23 Oct 2019 16:37:44 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 29 Oct 2019 09:58:45 +01:00
sched/topology: Don't try to build empty sched domains
Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.
This leads to the following splat:
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
Hardware name: ARM Juno development board (r0) (DT)
Workqueue: events cpuset_hotplug_workfn
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
lr : build_sched_domains (kernel/sched/topology.c:1966)
Call trace:
build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
partition_sched_domains_locked (kernel/sched/topology.c:2250)
rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
kthread (kernel/kthread.c:255)
ret_from_fork (arch/arm64/kernel/entry.S:1167)
Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)
The faulty line in question is:
cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));
and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.
Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.
The above splat was obtained on my Juno r0 with the following reproducer:
$ cgcreate -g cpuset:asym
$ cgset -r cpuset.cpus=0-3 asym
$ cgset -r cpuset.mems=0 asym
$ cgset -r cpuset.cpu_exclusive=1 asym
$ cgcreate -g cpuset:smp
$ cgset -r cpuset.cpus=4-5 smp
$ cgset -r cpuset.mems=0 smp
$ cgset -r cpuset.cpu_exclusive=1 smp
$ cgset -r cpuset.sched_load_balance=0 .
$ echo 0 > /sys/devices/system/cpu/cpu4/online
$ echo 0 > /sys/devices/system/cpu/cpu5/online
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/cgroup/cpuset.c | 3 ++-
kernel/sched/topology.c | 5 ++++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c52bc91..c87ee64 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -798,7 +798,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
continue;
- if (is_sched_load_balance(cp))
+ if (is_sched_load_balance(cp) &&
+ !cpumask_empty(cp->effective_cpus))
csa[csn++] = cp;
/* skip @cp's subtree if not a partition root */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b5667a2..9318acf 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1948,7 +1948,7 @@ next_level:
static int
build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *attr)
{
- enum s_alloc alloc_state;
+ enum s_alloc alloc_state = sa_none;
struct sched_domain *sd;
struct s_data d;
struct rq *rq = NULL;
@@ -1956,6 +1956,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
struct sched_domain_topology_level *tl_asym;
bool has_asym = false;
+ if (WARN_ON(cpumask_empty(cpu_map)))
+ goto error;
+
alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
if (alloc_state != sa_rootdomain)
goto error;
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: e284df705cf1eeedb5ec3a66ed82d17a64659150
Gitweb: https://git.kernel.org/tip/e284df705cf1eeedb5ec3a66ed82d17a64659150
Author: Valentin Schneider <[email protected]>
AuthorDate: Wed, 23 Oct 2019 16:37:45 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 29 Oct 2019 09:58:46 +01:00
sched/topology: Allow sched_asym_cpucapacity to be disabled
While the static key is correctly initialized as being disabled, it will
remain forever enabled once turned on. This means that if we start with an
asymmetric system and hotplug out enough CPUs to end up with an SMP system,
the static key will remain set - which is obviously wrong. We should detect
this and turn off things like misfit migration and capacity aware wakeups.
As Quentin pointed out, having separate root domains makes this slightly
trickier. We could have exclusive cpusets that create an SMP island - IOW,
the domains within this root domain will not see any asymmetry. This means
we can't just disable the key on domain destruction, we need to count how
many asymmetric root domains we have.
Consider the following example using Juno r0 which is 2+4 big.LITTLE, where
two identical cpusets are created: they both span both big and LITTLE CPUs:
  asym0       asym1
[         ][         ]
  L  L  B    L  L  B
$ cgcreate -g cpuset:asym0
$ cgset -r cpuset.cpus=0,1,3 asym0
$ cgset -r cpuset.mems=0 asym0
$ cgset -r cpuset.cpu_exclusive=1 asym0
$ cgcreate -g cpuset:asym1
$ cgset -r cpuset.cpus=2,4,5 asym1
$ cgset -r cpuset.mems=0 asym1
$ cgset -r cpuset.cpu_exclusive=1 asym1
$ cgset -r cpuset.sched_load_balance=0 .
(the CPU numbering may look odd because, on the Juno, LITTLEs are CPUs 0,3-5
and bigs are CPUs 1-2)
If we make one of those SMP (IOW remove asymmetry) by e.g. hotplugging its
big core, we would end up with an SMP cpuset and an asymmetric cpuset - the
static key must remain set, because we still have one asymmetric root domain.
With the above example, this could be done with:
$ echo 0 > /sys/devices/system/cpu/cpu2/online
Which would result in:
  asym0       asym1
[         ][         ]
  L  L  B    L  L
When both SMP and asymmetric cpusets are present, all CPUs will observe
sched_asym_cpucapacity being set (it is system-wide), but not all CPUs
observe asymmetry in their sched domain hierarchy:
per_cpu(sd_asym_cpucapacity, <any CPU in asym0>) == <some SD at DIE level>
per_cpu(sd_asym_cpucapacity, <any CPU in asym1>) == NULL
Change the simple key enablement to an increment, and decrement the key
counter when destroying domains that cover asymmetric CPUs.
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: df054e8445a4 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/topology.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9318acf..49b835f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2029,7 +2029,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
rcu_read_unlock();
if (has_asym)
- static_branch_enable_cpuslocked(&sched_asym_cpucapacity);
+ static_branch_inc_cpuslocked(&sched_asym_cpucapacity);
if (rq && sched_debug_enabled) {
pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n",
@@ -2124,8 +2124,12 @@ int sched_init_domains(const struct cpumask *cpu_map)
*/
static void detach_destroy_domains(const struct cpumask *cpu_map)
{
+ unsigned int cpu = cpumask_any(cpu_map);
int i;
+ if (rcu_access_pointer(per_cpu(sd_asym_cpucapacity, cpu)))
+ static_branch_dec_cpuslocked(&sched_asym_cpucapacity);
+
rcu_read_lock();
for_each_cpu(i, cpu_map)
cpu_attach_domain(NULL, &def_root_domain, i);
On Wed, Oct 23, 2019 at 04:37:44PM +0100, Valentin Schneider <[email protected]> wrote:
> Prevent generate_sched_domains() from returning empty cpumasks, and add
> some assertion in build_sched_domains() to scream bloody murder if it
> happens again.
Good catch. It makes sense to prune the empty domains in
generate_sched_domains already.
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index c52bc91f882b..c87ee6412b36 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -798,7 +798,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
> cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
> continue;
>
> - if (is_sched_load_balance(cp))
> + if (is_sched_load_balance(cp) &&
> + !cpumask_empty(cp->effective_cpus))
> csa[csn++] = cp;
If I didn't overlook anything, cp->effective_cpus can contain CPUs
excluded by housekeeping_cpumask(HK_FLAG_DOMAIN) later, i.e. possibly
still returning domains with empty cpusets.
I'd suggest moving the emptiness check down into the loop where domain
cpumasks are ultimately constructed.
Michal
Hi Michal,
On 31/10/2019 17:23, Michal Koutný wrote:
> On Wed, Oct 23, 2019 at 04:37:44PM +0100, Valentin Schneider <[email protected]> wrote:
>> Prevent generate_sched_domains() from returning empty cpumasks, and add
>> some assertion in build_sched_domains() to scream bloody murder if it
>> happens again.
> Good catch. It makes sense to prune the empty domains in
> generate_sched_domains already.
>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index c52bc91f882b..c87ee6412b36 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -798,7 +798,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
>> cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
>> continue;
>>
>> - if (is_sched_load_balance(cp))
>> + if (is_sched_load_balance(cp) &&
>> + !cpumask_empty(cp->effective_cpus))
>> csa[csn++] = cp;
> If I didn't overlook anything, cp->effective_cpus can contain CPUs
> excluded by housekeeping_cpumask(HK_FLAG_DOMAIN) later, i.e. possibly
> still returning domains with empty cpusets.
>
> I'd suggest moving the emptiness check down into the loop where domain
> cpumasks are ultimately constructed.
>
Ah, wasn't aware of this - thanks for having a look!
I think I need to keep the check before the final cpumasks get built: by the
time we're in that loop, the cpumask array has already been allocated and is
handed off directly to the sched domain rebuild.
Do you reckon the following would work?
----8<----
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c87ee6412b36..e4c10785dc7c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -798,8 +798,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
continue;
+ /*
+ * Skip cpusets that would lead to an empty sched domain.
+ * That could be because effective_cpus is empty, or because
+ * it's only spanning CPUs outside the housekeeping mask.
+ */
if (is_sched_load_balance(cp) &&
- !cpumask_empty(cp->effective_cpus))
+ cpumask_intersects(cp->effective_cpus,
+ housekeeping_cpumask(HK_FLAG_DOMAIN)))
csa[csn++] = cp;
/* skip @cp's subtree if not a partition root */
On Thu, Oct 31, 2019 at 06:23:12PM +0100, Valentin Schneider <[email protected]> wrote:
> Do you reckon the following would work?
LGTM (i.e. the cpuset will be skipped if no CPUs taking part in load
balancing remain in it after a hot(un)plug event).
Michal