So after the feedback on v1, I decided to take it one step further
and propose re-implementing isolcpus= on top of housekeeping. I expect
this to be controversial because it brings a behaviour change: isolcpus=
won't disable load balancing anymore, so you change the affinity of a
task at your own risk. OTOH it might make the implementation of isolcpus=
more extensible, and perhaps more acceptable as a base for a cpuset
interface. I leave you to judge.
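As a concrete illustration (assuming an 8-CPU box where CPUs 1-7 should
be isolated), the command-line usage is unchanged:

	isolcpus=1-7

With this series, boot-time tasks inherit init's affinity and thus stay
off CPUs 1-7, but the isolated CPUs keep their scheduler domains. So
something like "taskset -c 1-7 mytask" now opts a task back into load
balancing across the isolated range, instead of leaving it stuck on
whichever isolated CPU it lands on.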
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
core/isolation-v2
HEAD: 79ef9a911551a4f700b057ed794c223bd3a97d7b
Thanks,
Frederic
---
Frederic Weisbecker (12):
housekeeping: Move housekeeping related code to its own file
watchdog: Use housekeeping_cpumask() instead of ad-hoc version
housekeeping: Provide a dynamic off-case to housekeeping_any_cpu()
housekeeping: Make housekeeping cpumask private
housekeeping: Use its own static key
housekeeping: Rename is_housekeeping_cpu to housekeeping_cpu
housekeeping: Move it under its own config, independent from NO_HZ
housekeeping: Introduce housekeeping flags
workqueue: Affine unbound workqueues to housekeeping cpumask
housekeeping: Affine unbound kthreads
housekeeping: Handle nohz_full= parameter
housekeeping: Reimplement isolcpus on housekeeping
drivers/base/cpu.c | 10 ++-
drivers/net/ethernet/tile/tilegx.c | 6 +-
include/linux/housekeeping.h | 52 +++++++++++++++
include/linux/sched.h | 2 -
include/linux/tick.h | 38 +----------
init/Kconfig | 7 +++
init/main.c | 2 +
kernel/Makefile | 1 +
kernel/cgroup/cpuset.c | 13 +---
kernel/housekeeping.c | 126 +++++++++++++++++++++++++++++++++++++
kernel/kthread.c | 5 +-
kernel/rcu/tree_plugin.h | 3 +-
kernel/rcu/update.c | 3 +-
kernel/sched/core.c | 25 ++------
kernel/sched/fair.c | 3 +-
kernel/sched/topology.c | 19 +-----
kernel/time/tick-sched.c | 31 +--------
kernel/watchdog.c | 13 ++--
kernel/workqueue.c | 3 +-
19 files changed, 230 insertions(+), 132 deletions(-)
Rename is_housekeeping_cpu() to housekeeping_cpu() to keep a proper
housekeeping_*() namespace.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 2 +-
kernel/sched/core.c | 6 +++---
kernel/sched/fair.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index cbe8d63..320cc2b 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -29,7 +29,7 @@ static inline void housekeeping_affine(struct task_struct *t) { }
static inline void housekeeping_init(void) { }
#endif /* CONFIG_NO_HZ_FULL */
-static inline bool is_housekeeping_cpu(int cpu)
+static inline bool housekeeping_cpu(int cpu)
{
#ifdef CONFIG_NO_HZ_FULL
if (static_branch_unlikely(&housekeeping_overriden))
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 536d6a5..fd00ae3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -527,7 +527,7 @@ int get_nohz_timer_target(void)
int i, cpu = smp_processor_id();
struct sched_domain *sd;
- if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu))
+ if (!idle_cpu(cpu) && housekeeping_cpu(cpu))
return cpu;
rcu_read_lock();
@@ -536,14 +536,14 @@ int get_nohz_timer_target(void)
if (cpu == i)
continue;
- if (!idle_cpu(i) && is_housekeeping_cpu(i)) {
+ if (!idle_cpu(i) && housekeeping_cpu(i)) {
cpu = i;
goto unlock;
}
}
}
- if (!is_housekeeping_cpu(cpu))
+ if (!housekeeping_cpu(cpu))
cpu = housekeeping_any_cpu();
unlock:
rcu_read_unlock();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5455e98..0d8f7b1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8739,7 +8739,7 @@ void nohz_balance_enter_idle(int cpu)
return;
/* Spare idle load balancing on CPUs that don't want to be disturbed: */
- if (!is_housekeeping_cpu(cpu))
+ if (!housekeeping_cpu(cpu))
return;
if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
--
2.7.4
We want to be able to force the affinity of kthreads once we reimplement
isolcpus= on top of housekeeping. Unfortunately many kthreads also
override their own affinity, so we may need to revisit some of them.
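For illustration, a minimal hypothetical sketch of the kind of
self-affining kthread that would bypass the HK_FLAG_KTHREAD inheritance
below (example_thread_fn and its pinning target are made up):

	#include <linux/kthread.h>
	#include <linux/cpumask.h>
	#include <linux/sched.h>

	/* Hypothetical kthread that re-pins itself after creation */
	static int example_thread_fn(void *data)
	{
		int cpu = (long)data;

		/*
		 * Overrides the housekeeping affinity inherited from
		 * kthreadd, possibly landing on an isolated CPU.
		 */
		set_cpus_allowed_ptr(current, cpumask_of(cpu));

		while (!kthread_should_stop()) {
			set_current_state(TASK_INTERRUPTIBLE);
			schedule();
		}
		__set_current_state(TASK_RUNNING);
		return 0;
	}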
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 1 +
kernel/kthread.c | 5 +++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index 0959601..2a9b808 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -11,6 +11,7 @@ enum hk_flags {
HK_FLAG_MISC = (1 << 2),
HK_FLAG_SCHED = (1 << 3),
HK_FLAG_WORKQUEUE = (1 << 4),
+ HK_FLAG_KTHREAD = (1 << 5),
};
#ifdef CONFIG_CPU_ISOLATION
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 26db528..d4f1e63 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -21,6 +21,7 @@
#include <linux/ptrace.h>
#include <linux/uaccess.h>
#include <linux/cgroup.h>
+#include <linux/housekeeping.h>
#include <trace/events/sched.h>
static DEFINE_SPINLOCK(kthread_create_lock);
@@ -317,7 +318,7 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
* The kernel thread should not inherit these properties.
*/
sched_setscheduler_nocheck(task, SCHED_NORMAL, ¶m);
- set_cpus_allowed_ptr(task, cpu_all_mask);
+ set_cpus_allowed_ptr(task, housekeeping_cpumask(HK_FLAG_KTHREAD));
}
kfree(create);
return task;
@@ -536,7 +537,7 @@ int kthreadd(void *unused)
/* Setup a clean context for our children to inherit. */
set_task_comm(tsk, "kthreadd");
ignore_signals(tsk);
- set_cpus_allowed_ptr(tsk, cpu_all_mask);
+ set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_FLAG_KTHREAD));
set_mems_allowed(node_states[N_MEMORY]);
current->flags |= PF_NOFREEZE;
--
2.7.4
We want to centralize the isolation management in the housekeeping
subsystem. Therefore we need to handle the nohz_full= parameter from
there.
Since nohz_full= so far has involved unbound timers, watchdog, RCU
and tilegx NAPI isolation, we keep that default behaviour.
nohz_full= is intended to be deprecated in the future. We want to control
the isolation features through the isolcpus= parameter instead.
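For example, assuming an 8-CPU machine booted with:

	nohz_full=1-7

the parameter is now parsed in kernel/housekeeping.c, CPU 0 becomes the
housekeeping set, and the default flags inherited from the old behaviour
are:

	housekeeping_flags = HK_FLAG_TICK | HK_FLAG_TIMER |
			     HK_FLAG_RCU | HK_FLAG_MISC;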
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 1 +
include/linux/tick.h | 1 +
kernel/housekeeping.c | 42 +++++++++++++++++++++++++++++-------------
kernel/time/tick-sched.c | 13 +++----------
4 files changed, 34 insertions(+), 23 deletions(-)
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index 2a9b808..b53a2b2 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -12,6 +12,7 @@ enum hk_flags {
HK_FLAG_SCHED = (1 << 3),
HK_FLAG_WORKQUEUE = (1 << 4),
HK_FLAG_KTHREAD = (1 << 5),
+ HK_FLAG_TICK = (1 << 6),
};
#ifdef CONFIG_CPU_ISOLATION
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 68afc09..3c82cf5 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -228,6 +228,7 @@ static inline void tick_dep_clear_signal(struct signal_struct *signal,
extern void tick_nohz_full_kick_cpu(int cpu);
extern void __tick_nohz_task_switch(void);
+extern void __init tick_nohz_full_setup(cpumask_var_t cpumask);
#else
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
index e2196d1..633a0d9 100644
--- a/kernel/housekeeping.c
+++ b/kernel/housekeeping.c
@@ -49,23 +49,39 @@ bool housekeeping_test_cpu(int cpu, enum hk_flags flags)
void __init housekeeping_init(void)
{
- if (!tick_nohz_full_enabled())
+ if (!housekeeping_flags)
return;
- if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
- WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
- cpumask_clear(tick_nohz_full_mask);
- tick_nohz_full_running = false;
- return;
- }
-
- cpumask_andnot(housekeeping_mask,
- cpu_possible_mask, tick_nohz_full_mask);
-
- housekeeping_flags = HK_FLAG_TIMER | HK_FLAG_RCU | HK_FLAG_MISC;
-
static_branch_enable(&housekeeping_overriden);
/* We need at least one CPU to handle housekeeping work */
WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
}
+
+static int __init housekeeping_nohz_full_setup(char *str)
+{
+ cpumask_var_t non_housekeeping_mask;
+
+ alloc_bootmem_cpumask_var(&non_housekeeping_mask);
+ if (cpulist_parse(str, non_housekeeping_mask) < 0) {
+ pr_warn("Housekeeping: Incorrect nohz_full cpumask\n");
+ free_bootmem_cpumask_var(non_housekeeping_mask);
+ return 0;
+ }
+
+ alloc_bootmem_cpumask_var(&housekeeping_mask);
+ cpumask_andnot(housekeeping_mask, cpu_possible_mask, non_housekeeping_mask);
+
+ if (cpumask_empty(housekeeping_mask))
+ cpumask_set_cpu(smp_processor_id(), housekeeping_mask);
+
+ housekeeping_flags = HK_FLAG_TICK | HK_FLAG_TIMER |
+ HK_FLAG_RCU | HK_FLAG_MISC;
+
+ tick_nohz_full_setup(non_housekeeping_mask);
+
+ free_bootmem_cpumask_var(non_housekeeping_mask);
+
+ return 1;
+}
+__setup("nohz_full=", housekeeping_nohz_full_setup);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9d29dee..f09dd43 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -384,20 +384,13 @@ void __tick_nohz_task_switch(void)
local_irq_restore(flags);
}
-/* Parse the boot-time nohz CPU list from the kernel parameters. */
-static int __init tick_nohz_full_setup(char *str)
+/* Get the boot-time nohz CPU list from the kernel parameters. */
+void __init tick_nohz_full_setup(cpumask_var_t cpumask)
{
alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
- if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
- pr_warn("NO_HZ: Incorrect nohz_full cpumask\n");
- free_bootmem_cpumask_var(tick_nohz_full_mask);
- return 1;
- }
+ cpumask_copy(tick_nohz_full_mask, cpumask);
tick_nohz_full_running = true;
-
- return 1;
}
-__setup("nohz_full=", tick_nohz_full_setup);
static int tick_nohz_cpu_down(unsigned int cpu)
{
--
2.7.4
We want to centralize the isolation features in the housekeeping
subsystem, and scheduler isolation is a significant part of it.
While at it, this is a proposal for a reimplementation of isolcpus=
that doesn't involve scheduler domain isolation. Therefore this
brings a behaviour change: all user tasks inherit the affinity of
init/1, which avoids the isolcpus= range. But if a task later overrides
its affinity with one that intersects an isolated CPU, load balancing
may occur on it.
OTOH such a reimplementation, which doesn't shortcut scheduler internals,
makes a better candidate for an interface extension to cpuset.
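As a userspace illustration of the behaviour change (a made-up demo
program, assuming a boot with isolcpus=2-3):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>

	int main(void)
	{
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(2, &set);
		CPU_SET(3, &set);

		/*
		 * Before this patch the task would stick to one of the
		 * isolated CPUs (NULL domain); with it, the task may be
		 * load balanced between CPUs 2 and 3.
		 */
		if (sched_setaffinity(0, sizeof(set), &set))
			perror("sched_setaffinity");
		return 0;
	}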
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
drivers/base/cpu.c | 10 ++++++++-
include/linux/sched.h | 2 --
kernel/cgroup/cpuset.c | 13 ++---------
kernel/housekeeping.c | 57 +++++++++++++++++++++++++++++++++++++++++--------
kernel/sched/core.c | 16 +-------------
kernel/sched/topology.c | 19 ++---------------
6 files changed, 62 insertions(+), 55 deletions(-)
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 2c3b359..35b2b10 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -18,6 +18,7 @@
#include <linux/cpufeature.h>
#include <linux/tick.h>
#include <linux/pm_qos.h>
+#include <linux/housekeeping.h>
#include "base.h"
@@ -271,8 +272,15 @@ static ssize_t print_cpus_isolated(struct device *dev,
struct device_attribute *attr, char *buf)
{
int n = 0, len = PAGE_SIZE-2;
+ cpumask_var_t isolated;
- n = scnprintf(buf, len, "%*pbl\n", cpumask_pr_args(cpu_isolated_map));
+ if (!alloc_cpumask_var(&isolated, GFP_KERNEL))
+ return -ENOMEM;
+
+ cpumask_andnot(isolated, cpu_possible_mask, housekeeping_cpumask(HK_FLAG_SCHED));
+ n = scnprintf(buf, len, "%*pbl\n", cpumask_pr_args(isolated));
+
+ free_cpumask_var(isolated);
return n;
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c28b182..816ff52 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -166,8 +166,6 @@ struct task_group;
/* Task command name length: */
#define TASK_COMM_LEN 16
-extern cpumask_var_t cpu_isolated_map;
-
extern void scheduler_tick(void);
#define MAX_SCHEDULE_TIMEOUT LONG_MAX
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 8d51516..5d71020 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -639,7 +639,6 @@ static int generate_sched_domains(cpumask_var_t **domains,
int csn; /* how many cpuset ptrs in csa so far */
int i, j, k; /* indices for partition finding loops */
cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
- cpumask_var_t non_isolated_cpus; /* load balanced CPUs */
struct sched_domain_attr *dattr; /* attributes for custom domains */
int ndoms = 0; /* number of sched domains in result */
int nslot; /* next empty doms[] struct cpumask slot */
@@ -649,10 +648,6 @@ static int generate_sched_domains(cpumask_var_t **domains,
dattr = NULL;
csa = NULL;
- if (!alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL))
- goto done;
- cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map);
-
/* Special case for the 99% of systems with one, full, sched domain */
if (is_sched_load_balance(&top_cpuset)) {
ndoms = 1;
@@ -665,8 +660,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
*dattr = SD_ATTR_INIT;
update_domain_attr_tree(dattr, &top_cpuset);
}
- cpumask_and(doms[0], top_cpuset.effective_cpus,
- non_isolated_cpus);
+ cpumask_copy(doms[0], top_cpuset.effective_cpus);
goto done;
}
@@ -689,8 +683,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
* the corresponding sched domain.
*/
if (!cpumask_empty(cp->cpus_allowed) &&
- !(is_sched_load_balance(cp) &&
- cpumask_intersects(cp->cpus_allowed, non_isolated_cpus)))
+ !(is_sched_load_balance(cp)))
continue;
if (is_sched_load_balance(cp))
@@ -772,7 +765,6 @@ static int generate_sched_domains(cpumask_var_t **domains,
if (apn == b->pn) {
cpumask_or(dp, dp, b->effective_cpus);
- cpumask_and(dp, dp, non_isolated_cpus);
if (dattr)
update_domain_attr_tree(dattr + nslot, b);
@@ -785,7 +777,6 @@ static int generate_sched_domains(cpumask_var_t **domains,
BUG_ON(nslot != ndoms);
done:
- free_cpumask_var(non_isolated_cpus);
kfree(csa);
/*
diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
index 633a0d9..1fd9316 100644
--- a/kernel/housekeeping.c
+++ b/kernel/housekeeping.c
@@ -58,30 +58,69 @@ void __init housekeeping_init(void)
WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
}
-static int __init housekeeping_nohz_full_setup(char *str)
+static int __init housekeeping_setup(char *str, enum hk_flags flags)
{
cpumask_var_t non_housekeeping_mask;
alloc_bootmem_cpumask_var(&non_housekeeping_mask);
if (cpulist_parse(str, non_housekeeping_mask) < 0) {
- pr_warn("Housekeeping: Incorrect nohz_full cpumask\n");
free_bootmem_cpumask_var(non_housekeeping_mask);
return 0;
}
- alloc_bootmem_cpumask_var(&housekeeping_mask);
- cpumask_andnot(housekeeping_mask, cpu_possible_mask, non_housekeeping_mask);
+ if (!housekeeping_flags) {
+ alloc_bootmem_cpumask_var(&housekeeping_mask);
+ cpumask_andnot(housekeeping_mask,
+ cpu_possible_mask, non_housekeeping_mask);
+ if (cpumask_empty(housekeeping_mask))
+ cpumask_set_cpu(smp_processor_id(), housekeeping_mask);
+ } else {
+ cpumask_var_t tmp;
- if (cpumask_empty(housekeeping_mask))
- cpumask_set_cpu(smp_processor_id(), housekeeping_mask);
+ alloc_bootmem_cpumask_var(&tmp);
+ cpumask_andnot(tmp, cpu_possible_mask, non_housekeeping_mask);
+ if (!cpumask_equal(tmp, housekeeping_mask)) {
+ pr_warn("Housekeeping: nohz_full= must match isolcpus=\n");
+ free_bootmem_cpumask_var(tmp);
+ free_bootmem_cpumask_var(non_housekeeping_mask);
+ return 0;
+ }
+ free_bootmem_cpumask_var(tmp);
+ }
- housekeeping_flags = HK_FLAG_TICK | HK_FLAG_TIMER |
- HK_FLAG_RCU | HK_FLAG_MISC;
+ if ((flags & HK_FLAG_TICK) && !(housekeeping_flags & HK_FLAG_TICK))
+ tick_nohz_full_setup(non_housekeeping_mask);
- tick_nohz_full_setup(non_housekeeping_mask);
+ housekeeping_flags |= flags;
free_bootmem_cpumask_var(non_housekeeping_mask);
return 1;
}
+
+static int __init housekeeping_nohz_full_setup(char *str)
+{
+ unsigned int flags;
+ int ret;
+
+ flags = HK_FLAG_TICK | HK_FLAG_TIMER | HK_FLAG_RCU | HK_FLAG_MISC;
+ ret = housekeeping_setup(str, flags);
+ if (!ret)
+ pr_warn("Housekeeping: Incorrect nohz_full cpumask\n");
+ return ret;
+}
__setup("nohz_full=", housekeeping_nohz_full_setup);
+
+static int __init housekeeping_isolcpus_setup(char *str)
+{
+ unsigned int flags;
+ int ret;
+
+ flags = HK_FLAG_TIMER | HK_FLAG_RCU | HK_FLAG_MISC |
+ HK_FLAG_SCHED | HK_FLAG_WORKQUEUE | HK_FLAG_KTHREAD;
+ ret = housekeeping_setup(str, flags);
+ if (!ret)
+ pr_warn("Housekeeping: Error, all isolcpus= values must be between 0 and %d\n", nr_cpu_ids);
+ return ret;
+}
+__setup("isolcpus=", housekeeping_isolcpus_setup);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 877c85d..269f3ac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -84,9 +84,6 @@ __read_mostly int scheduler_running;
*/
int sysctl_sched_rt_runtime = 950000;
-/* CPUs with isolated domains */
-cpumask_var_t cpu_isolated_map;
-
/*
* __task_rq_lock - lock the rq @p resides on.
*/
@@ -5672,10 +5669,6 @@ static inline void sched_init_smt(void) { }
void __init sched_init_smp(void)
{
- cpumask_var_t non_isolated_cpus;
-
- alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
-
sched_init_numa();
/*
@@ -5685,16 +5678,12 @@ void __init sched_init_smp(void)
*/
mutex_lock(&sched_domains_mutex);
sched_init_domains(cpu_active_mask);
- cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map);
- if (cpumask_empty(non_isolated_cpus))
- cpumask_set_cpu(smp_processor_id(), non_isolated_cpus);
mutex_unlock(&sched_domains_mutex);
/* Move init over to a non-isolated CPU */
- if (set_cpus_allowed_ptr(current, non_isolated_cpus) < 0)
+ if (set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_SCHED)) < 0)
BUG();
sched_init_granularity();
- free_cpumask_var(non_isolated_cpus);
init_sched_rt_class();
init_sched_dl_class();
@@ -5898,9 +5887,6 @@ void __init sched_init(void)
calc_load_update = jiffies + LOAD_FREQ;
#ifdef CONFIG_SMP
- /* May be allocated at isolcpus cmdline parse time */
- if (cpu_isolated_map == NULL)
- zalloc_cpumask_var(&cpu_isolated_map, GFP_NOWAIT);
idle_thread_set_boot_cpu();
set_cpu_rq_start_time(smp_processor_id());
#endif
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index bd8b6d6..e060e28 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -466,21 +466,6 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
update_top_cache_domain(cpu);
}
-/* Setup the mask of CPUs configured for isolated domains */
-static int __init isolated_cpu_setup(char *str)
-{
- int ret;
-
- alloc_bootmem_cpumask_var(&cpu_isolated_map);
- ret = cpulist_parse(str, cpu_isolated_map);
- if (ret) {
- pr_err("sched: Error, all isolcpus= values must be between 0 and %d\n", nr_cpu_ids);
- return 0;
- }
- return 1;
-}
-__setup("isolcpus=", isolated_cpu_setup);
-
struct s_data {
struct sched_domain ** __percpu sd;
struct root_domain *rd;
@@ -1775,7 +1760,7 @@ int sched_init_domains(const struct cpumask *cpu_map)
doms_cur = alloc_sched_domains(ndoms_cur);
if (!doms_cur)
doms_cur = &fallback_doms;
- cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
+ cpumask_copy(doms_cur[0], cpu_map);
err = build_sched_domains(doms_cur[0], NULL);
register_sched_domain_sysctl();
@@ -1871,7 +1856,7 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
if (doms_new == NULL) {
n = 0;
doms_new = &fallback_doms;
- cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map);
+ cpumask_copy(doms_new[0], cpu_active_mask);
WARN_ON_ONCE(dattr_new);
}
--
2.7.4
Although the unbound workqueue cpumask can be overridden through sysfs,
we also want to affine unbound workqueues once isolcpus= is
reimplemented on top of housekeeping.
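For reference, the sysfs override mentioned above is the global unbound
workqueue cpumask; e.g. restricting unbound work to CPUs 0-1 at runtime
(hex mask):

	# echo 3 > /sys/devices/virtual/workqueue/cpumask

This patch only changes the boot-time default of that mask to the
housekeeping set.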
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 1 +
kernel/workqueue.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index b1a62544..0959601 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -10,6 +10,7 @@ enum hk_flags {
HK_FLAG_RCU = (1 << 1),
HK_FLAG_MISC = (1 << 2),
HK_FLAG_SCHED = (1 << 3),
+ HK_FLAG_WORKQUEUE = (1 << 4),
};
#ifdef CONFIG_CPU_ISOLATION
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ca937b0..256e3cb 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -48,6 +48,7 @@
#include <linux/nodemask.h>
#include <linux/moduleparam.h>
#include <linux/uaccess.h>
+#include <linux/housekeeping.h>
#include "workqueue_internal.h"
@@ -5546,7 +5547,7 @@ int __init workqueue_init_early(void)
WARN_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
- cpumask_copy(wq_unbound_cpumask, cpu_possible_mask);
+ cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_FLAG_WORKQUEUE));
pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
--
2.7.4
Complete the housekeeping split from CONFIG_NO_HZ_FULL by moving it
under its own config. This way we finally separate the isolation code
from nohz.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 6 +++---
init/Kconfig | 7 +++++++
kernel/Makefile | 2 +-
3 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index 320cc2b..dcbec47 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -5,7 +5,7 @@
#include <linux/init.h>
#include <linux/tick.h>
-#ifdef CONFIG_NO_HZ_FULL
+#ifdef CONFIG_CPU_ISOLATION
DECLARE_STATIC_KEY_FALSE(housekeeping_overriden);
extern int housekeeping_any_cpu(void);
extern const struct cpumask *housekeeping_cpumask(void);
@@ -27,11 +27,11 @@ static inline const struct cpumask *housekeeping_cpumask(void)
static inline void housekeeping_affine(struct task_struct *t) { }
static inline void housekeeping_init(void) { }
-#endif /* CONFIG_NO_HZ_FULL */
+#endif /* CONFIG_CPU_ISOLATION */
static inline bool housekeeping_cpu(int cpu)
{
-#ifdef CONFIG_NO_HZ_FULL
+#ifdef CONFIG_CPU_ISOLATION
if (static_branch_unlikely(&housekeeping_overriden))
return housekeeping_test_cpu(cpu);
#endif
diff --git a/init/Kconfig b/init/Kconfig
index 8514b25..f35bdae 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -472,6 +472,13 @@ config TASK_IO_ACCOUNTING
endmenu # "CPU/Task time and stats accounting"
+config CPU_ISOLATION
+ bool "CPU isolation"
+ help
+ Make sure that CPUs running critical tasks are not disturbed by
+ any source of "noise" such as unbound workqueues, timers, kthreads...
+ Unbound jobs get offloaded to housekeeping CPUs.
+
source "kernel/rcu/Kconfig"
config BUILD_BIN2C
diff --git a/kernel/Makefile b/kernel/Makefile
index 8a85c4b..445f876 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -109,7 +109,7 @@ obj-$(CONFIG_JUMP_LABEL) += jump_label.o
obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
obj-$(CONFIG_TORTURE_TEST) += torture.o
obj-$(CONFIG_MEMBARRIER) += membarrier.o
-obj-$(CONFIG_NO_HZ_FULL) += housekeeping.o
+obj-$(CONFIG_CPU_ISOLATION) += housekeeping.o
obj-$(CONFIG_HAS_IOMEM) += memremap.o
--
2.7.4
Before we implement isolcpus= under housekeeping, we need the isolation
features to be more fine-grained. For example some people want nohz_full
without the full scheduler isolation, others want full scheduler
isolation without nohz_full.
So let's split all these isolation features into pieces, at the risk of
over-splitting for now. We can still merge some flags later if they
always make sense together.
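The resulting per-feature API, sketched on a simplified variant of
get_nohz_timer_target() (see the scheduler hunk below); each call site
now passes the flag of the isolation feature it cares about:

	/* Simplified sketch of a per-feature housekeeping query */
	static int example_timer_target(void)
	{
		int cpu = smp_processor_id();

		if (housekeeping_cpu(cpu, HK_FLAG_TIMER))
			return cpu;

		/* Fall back to any online housekeeping CPU */
		return housekeeping_any_cpu(HK_FLAG_TIMER);
	}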
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
drivers/net/ethernet/tile/tilegx.c | 4 ++--
include/linux/housekeeping.h | 26 +++++++++++++++++---------
kernel/housekeeping.c | 26 +++++++++++++++-----------
kernel/rcu/tree_plugin.h | 2 +-
kernel/rcu/update.c | 2 +-
kernel/sched/core.c | 8 ++++----
kernel/sched/fair.c | 2 +-
kernel/watchdog.c | 3 ++-
8 files changed, 43 insertions(+), 30 deletions(-)
diff --git a/drivers/net/ethernet/tile/tilegx.c b/drivers/net/ethernet/tile/tilegx.c
index eb74e09..0bd765b 100644
--- a/drivers/net/ethernet/tile/tilegx.c
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -2270,8 +2270,8 @@ static int __init tile_net_init_module(void)
tile_net_dev_init(name, mac);
if (!network_cpus_init())
- cpumask_and(&network_cpus_map, housekeeping_cpumask(),
- cpu_online_mask);
+ cpumask_and(&network_cpus_map,
+ housekeeping_cpumask(HK_FLAG_MISC), cpu_online_mask);
return 0;
}
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index dcbec47..b1a62544 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -5,35 +5,43 @@
#include <linux/init.h>
#include <linux/tick.h>
+enum hk_flags {
+ HK_FLAG_TIMER = 1,
+ HK_FLAG_RCU = (1 << 1),
+ HK_FLAG_MISC = (1 << 2),
+ HK_FLAG_SCHED = (1 << 3),
+};
+
#ifdef CONFIG_CPU_ISOLATION
DECLARE_STATIC_KEY_FALSE(housekeeping_overriden);
-extern int housekeeping_any_cpu(void);
-extern const struct cpumask *housekeeping_cpumask(void);
-extern void housekeeping_affine(struct task_struct *t);
-extern bool housekeeping_test_cpu(int cpu);
+extern int housekeeping_any_cpu(enum hk_flags flags);
+extern const struct cpumask *housekeeping_cpumask(enum hk_flags flags);
+extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
+extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
extern void __init housekeeping_init(void);
#else
-static inline int housekeeping_any_cpu(void)
+static inline int housekeeping_any_cpu(enum hk_flags flags)
{
return smp_processor_id();
}
-static inline const struct cpumask *housekeeping_cpumask(void)
+static inline const struct cpumask *housekeeping_cpumask(enum hk_flags flags)
{
return cpu_possible_mask;
}
-static inline void housekeeping_affine(struct task_struct *t) { }
+static inline void housekeeping_affine(struct task_struct *t,
+ enum hk_flags flags) { }
static inline void housekeeping_init(void) { }
#endif /* CONFIG_CPU_ISOLATION */
-static inline bool housekeeping_cpu(int cpu)
+static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
{
#ifdef CONFIG_CPU_ISOLATION
if (static_branch_unlikely(&housekeeping_overriden))
- return housekeeping_test_cpu(cpu);
+ return housekeeping_test_cpu(cpu, flags);
#endif
return true;
}
diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
index f8be7e6..e2196d1 100644
--- a/kernel/housekeeping.c
+++ b/kernel/housekeeping.c
@@ -14,34 +14,36 @@
DEFINE_STATIC_KEY_FALSE(housekeeping_overriden);
EXPORT_SYMBOL_GPL(housekeeping_overriden);
static cpumask_var_t housekeeping_mask;
+static unsigned int housekeeping_flags;
-int housekeeping_any_cpu(void)
+int housekeeping_any_cpu(enum hk_flags flags)
{
if (static_branch_unlikely(&housekeeping_overriden))
- return cpumask_any_and(housekeeping_mask, cpu_online_mask);
-
+ if (housekeeping_flags & flags)
+ return cpumask_any_and(housekeeping_mask, cpu_online_mask);
return smp_processor_id();
}
-const struct cpumask *housekeeping_cpumask(void)
+const struct cpumask *housekeeping_cpumask(enum hk_flags flags)
{
if (static_branch_unlikely(&housekeeping_overriden))
- return housekeeping_mask;
-
+ if (housekeeping_flags & flags)
+ return housekeeping_mask;
return cpu_possible_mask;
}
-void housekeeping_affine(struct task_struct *t)
+void housekeeping_affine(struct task_struct *t, enum hk_flags flags)
{
if (static_branch_unlikely(&housekeeping_overriden))
- set_cpus_allowed_ptr(t, housekeeping_mask);
+ if (housekeeping_flags & flags)
+ set_cpus_allowed_ptr(t, housekeeping_mask);
}
-bool housekeeping_test_cpu(int cpu)
+bool housekeeping_test_cpu(int cpu, enum hk_flags flags)
{
if (static_branch_unlikely(&housekeeping_overriden))
- return cpumask_test_cpu(cpu, housekeeping_mask);
-
+ if (housekeeping_flags & flags)
+ return cpumask_test_cpu(cpu, housekeeping_mask);
return true;
}
@@ -60,6 +62,8 @@ void __init housekeeping_init(void)
cpumask_andnot(housekeeping_mask,
cpu_possible_mask, tick_nohz_full_mask);
+ housekeeping_flags = HK_FLAG_TIMER | HK_FLAG_RCU | HK_FLAG_MISC;
+
static_branch_enable(&housekeeping_overriden);
/* We need at least one CPU to handle housekeeping work */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c66d162..47f2865 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2542,7 +2542,7 @@ static void rcu_bind_gp_kthread(void)
if (!tick_nohz_full_enabled())
return;
- housekeeping_affine(current);
+ housekeeping_affine(current, HK_FLAG_RCU);
}
/* Record the current task on dyntick-idle entry. */
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index bfe973d..7acfd74 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -719,7 +719,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
LIST_HEAD(rcu_tasks_holdouts);
/* Run on housekeeping CPUs by default. Sysadm can move if desired. */
- housekeeping_affine(current);
+ housekeeping_affine(current, HK_FLAG_RCU);
/*
* Each pass through the following loop makes one check for
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fd00ae3..877c85d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -527,7 +527,7 @@ int get_nohz_timer_target(void)
int i, cpu = smp_processor_id();
struct sched_domain *sd;
- if (!idle_cpu(cpu) && housekeeping_cpu(cpu))
+ if (!idle_cpu(cpu) && housekeeping_cpu(cpu, HK_FLAG_TIMER))
return cpu;
rcu_read_lock();
@@ -536,15 +536,15 @@ int get_nohz_timer_target(void)
if (cpu == i)
continue;
- if (!idle_cpu(i) && housekeeping_cpu(i)) {
+ if (!idle_cpu(i) && housekeeping_cpu(i, HK_FLAG_TIMER)) {
cpu = i;
goto unlock;
}
}
}
- if (!housekeeping_cpu(cpu))
- cpu = housekeeping_any_cpu();
+ if (!housekeeping_cpu(cpu, HK_FLAG_TIMER))
+ cpu = housekeeping_any_cpu(HK_FLAG_TIMER);
unlock:
rcu_read_unlock();
return cpu;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0d8f7b1..74af955 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8739,7 +8739,7 @@ void nohz_balance_enter_idle(int cpu)
return;
/* Spare idle load balancing on CPUs that don't want to be disturbed: */
- if (!housekeeping_cpu(cpu))
+ if (!housekeeping_cpu(cpu, HK_FLAG_SCHED))
return;
if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index cdd0d11..631131c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -944,7 +944,8 @@ void __init lockup_detector_init(void)
if (tick_nohz_full_enabled())
pr_info("Disabling watchdog on nohz_full cores by default\n");
- cpumask_copy(&watchdog_cpumask, housekeeping_cpumask());
+ cpumask_copy(&watchdog_cpumask,
+ housekeeping_cpumask(HK_FLAG_TIMER));
if (watchdog_enabled)
watchdog_enable_all_cpus();
--
2.7.4
The housekeeping code is currently tied to the nohz code. As we are
planning to make housekeeping independent of it, start with moving
the relevant code to its own file.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
drivers/net/ethernet/tile/tilegx.c | 2 +-
include/linux/housekeeping.h | 56 ++++++++++++++++++++++++++++++++++++++
include/linux/tick.h | 37 -------------------------
init/main.c | 2 ++
kernel/Makefile | 1 +
kernel/housekeeping.c | 32 ++++++++++++++++++++++
kernel/rcu/tree_plugin.h | 1 +
kernel/rcu/update.c | 1 +
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 1 +
kernel/time/tick-sched.c | 18 ------------
kernel/watchdog.c | 1 +
12 files changed, 97 insertions(+), 56 deletions(-)
create mode 100644 include/linux/housekeeping.h
create mode 100644 kernel/housekeeping.c
diff --git a/drivers/net/ethernet/tile/tilegx.c b/drivers/net/ethernet/tile/tilegx.c
index aec9538..eb74e09 100644
--- a/drivers/net/ethernet/tile/tilegx.c
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -40,7 +40,7 @@
#include <linux/tcp.h>
#include <linux/net_tstamp.h>
#include <linux/ptp_clock_kernel.h>
-#include <linux/tick.h>
+#include <linux/housekeeping.h>
#include <asm/checksum.h>
#include <asm/homecache.h>
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
new file mode 100644
index 0000000..3d6a8e6
--- /dev/null
+++ b/include/linux/housekeeping.h
@@ -0,0 +1,56 @@
+#ifndef _LINUX_HOUSEKEEPING_H
+#define _LINUX_HOUSEKEEPING_H
+
+#include <linux/cpumask.h>
+#include <linux/init.h>
+#include <linux/tick.h>
+
+#ifdef CONFIG_NO_HZ_FULL
+extern cpumask_var_t housekeeping_mask;
+
+static inline int housekeeping_any_cpu(void)
+{
+ return cpumask_any_and(housekeeping_mask, cpu_online_mask);
+}
+
+extern void __init housekeeping_init(void);
+
+#else
+
+static inline int housekeeping_any_cpu(void)
+{
+ return smp_processor_id();
+}
+
+static inline void housekeeping_init(void) { }
+#endif /* CONFIG_NO_HZ_FULL */
+
+
+static inline const struct cpumask *housekeeping_cpumask(void)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ if (tick_nohz_full_enabled())
+ return housekeeping_mask;
+#endif
+ return cpu_possible_mask;
+}
+
+static inline bool is_housekeeping_cpu(int cpu)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ if (tick_nohz_full_enabled())
+ return cpumask_test_cpu(cpu, housekeeping_mask);
+#endif
+ return true;
+}
+
+static inline void housekeeping_affine(struct task_struct *t)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ if (tick_nohz_full_enabled())
+ set_cpus_allowed_ptr(t, housekeeping_mask);
+
+#endif
+}
+
+#endif /* _LINUX_HOUSEKEEPING_H */
diff --git a/include/linux/tick.h b/include/linux/tick.h
index fe01e68..68afc09 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -137,7 +137,6 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
#ifdef CONFIG_NO_HZ_FULL
extern bool tick_nohz_full_running;
extern cpumask_var_t tick_nohz_full_mask;
-extern cpumask_var_t housekeeping_mask;
static inline bool tick_nohz_full_enabled(void)
{
@@ -161,11 +160,6 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
cpumask_or(mask, mask, tick_nohz_full_mask);
}
-static inline int housekeeping_any_cpu(void)
-{
- return cpumask_any_and(housekeeping_mask, cpu_online_mask);
-}
-
extern void tick_nohz_dep_set(enum tick_dep_bits bit);
extern void tick_nohz_dep_clear(enum tick_dep_bits bit);
extern void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit);
@@ -235,10 +229,6 @@ static inline void tick_dep_clear_signal(struct signal_struct *signal,
extern void tick_nohz_full_kick_cpu(int cpu);
extern void __tick_nohz_task_switch(void);
#else
-static inline int housekeeping_any_cpu(void)
-{
- return smp_processor_id();
-}
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
@@ -260,33 +250,6 @@ static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void __tick_nohz_task_switch(void) { }
#endif
-static inline const struct cpumask *housekeeping_cpumask(void)
-{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- return housekeeping_mask;
-#endif
- return cpu_possible_mask;
-}
-
-static inline bool is_housekeeping_cpu(int cpu)
-{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- return cpumask_test_cpu(cpu, housekeeping_mask);
-#endif
- return true;
-}
-
-static inline void housekeeping_affine(struct task_struct *t)
-{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- set_cpus_allowed_ptr(t, housekeeping_mask);
-
-#endif
-}
-
static inline void tick_nohz_task_switch(void)
{
if (tick_nohz_full_enabled())
diff --git a/init/main.c b/init/main.c
index 881d624..4047c85 100644
--- a/init/main.c
+++ b/init/main.c
@@ -46,6 +46,7 @@
#include <linux/cgroup.h>
#include <linux/efi.h>
#include <linux/tick.h>
+#include <linux/housekeeping.h>
#include <linux/interrupt.h>
#include <linux/taskstats_kern.h>
#include <linux/delayacct.h>
@@ -604,6 +605,7 @@ asmlinkage __visible void __init start_kernel(void)
early_irq_init();
init_IRQ();
tick_init();
+ housekeeping_init();
rcu_init_nohz();
init_timers();
hrtimers_init();
diff --git a/kernel/Makefile b/kernel/Makefile
index 4cb8e8b..8a85c4b 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -109,6 +109,7 @@ obj-$(CONFIG_JUMP_LABEL) += jump_label.o
obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
obj-$(CONFIG_TORTURE_TEST) += torture.o
obj-$(CONFIG_MEMBARRIER) += membarrier.o
+obj-$(CONFIG_NO_HZ_FULL) += housekeeping.o
obj-$(CONFIG_HAS_IOMEM) += memremap.o
diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
new file mode 100644
index 0000000..6d8afd5
--- /dev/null
+++ b/kernel/housekeeping.c
@@ -0,0 +1,32 @@
+/*
+ * Housekeeping management. Manage the targets for routine code that can run on
+ * any CPU: unbound workqueues, timers, kthreads and any offloadable work.
+ *
+ * Copyright (C) 2017 Red Hat, Inc., Frederic Weisbecker
+ *
+ */
+
+#include <linux/housekeeping.h>
+#include <linux/tick.h>
+#include <linux/init.h>
+
+cpumask_var_t housekeeping_mask;
+
+void __init housekeeping_init(void)
+{
+ if (!tick_nohz_full_enabled())
+ return;
+
+ if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
+ WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
+ cpumask_clear(tick_nohz_full_mask);
+ tick_nohz_full_running = false;
+ return;
+ }
+
+ cpumask_andnot(housekeeping_mask,
+ cpu_possible_mask, tick_nohz_full_mask);
+
+ /* We need at least one CPU to handle housekeeping work */
+ WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
+}
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 908b309..c66d162 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -29,6 +29,7 @@
#include <linux/oom.h>
#include <linux/sched/debug.h>
#include <linux/smpboot.h>
+#include <linux/housekeeping.h>
#include <uapi/linux/sched/types.h>
#include "../time/tick-internal.h"
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 00e77c4..bfe973d 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -51,6 +51,7 @@
#include <linux/kthread.h>
#include <linux/tick.h>
#include <linux/rcupdate_wait.h>
+#include <linux/housekeeping.h>
#define CREATE_TRACE_POINTS
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9f9948..536d6a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -26,6 +26,7 @@
#include <linux/profile.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/housekeeping.h>
#include <asm/switch_to.h>
#include <asm/tlb.h>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8d58687..5455e98 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -32,6 +32,7 @@
#include <linux/mempolicy.h>
#include <linux/migrate.h>
#include <linux/task_work.h>
+#include <linux/housekeeping.h>
#include <trace/events/sched.h>
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c7a899c..9d29dee 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -165,7 +165,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
#ifdef CONFIG_NO_HZ_FULL
cpumask_var_t tick_nohz_full_mask;
-cpumask_var_t housekeeping_mask;
bool tick_nohz_full_running;
static atomic_t tick_dep_mask;
@@ -437,13 +436,6 @@ void __init tick_nohz_init(void)
return;
}
- if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
- WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
- cpumask_clear(tick_nohz_full_mask);
- tick_nohz_full_running = false;
- return;
- }
-
/*
* Full dynticks uses irq work to drive the tick rescheduling on safe
* locking contexts. But then we need irq work to raise its own
@@ -452,7 +444,6 @@ void __init tick_nohz_init(void)
if (!arch_irq_work_has_interrupt()) {
pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support irq work self-IPIs\n");
cpumask_clear(tick_nohz_full_mask);
- cpumask_copy(housekeeping_mask, cpu_possible_mask);
tick_nohz_full_running = false;
return;
}
@@ -465,9 +456,6 @@ void __init tick_nohz_init(void)
cpumask_clear_cpu(cpu, tick_nohz_full_mask);
}
- cpumask_andnot(housekeeping_mask,
- cpu_possible_mask, tick_nohz_full_mask);
-
for_each_cpu(cpu, tick_nohz_full_mask)
context_tracking_cpu_set(cpu);
@@ -477,12 +465,6 @@ void __init tick_nohz_init(void)
WARN_ON(ret < 0);
pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
cpumask_pr_args(tick_nohz_full_mask));
-
- /*
- * We need at least one CPU to handle housekeeping work such
- * as timekeeping, unbound timers, workqueues, ...
- */
- WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
}
#endif
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 06d3389..7a9df162 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -24,6 +24,7 @@
#include <linux/workqueue.h>
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
+#include <linux/housekeeping.h>
#include <asm/irq_regs.h>
#include <linux/kvm_para.h>
--
2.7.4
The housekeeping code still depends on the nohz_full static key. Since
we want to decouple housekeeping from nohz, let's give housekeeping its
own static key. This is mostly relevant for calls to
is_housekeeping_cpu() from the scheduler.
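The pattern, reduced to its core: with a default-off static key, the
common case (no housekeeping override) compiles down to a no-op jump,
and the cpumask test is only patched in once housekeeping is enabled at
boot. A minimal sketch, mirroring the hunks below:

	DEFINE_STATIC_KEY_FALSE(housekeeping_overriden);

	bool housekeeping_test_cpu(int cpu)
	{
		/* Out-of-line slow path, patched in at boot if needed */
		if (static_branch_unlikely(&housekeeping_overriden))
			return cpumask_test_cpu(cpu, housekeeping_mask);
		return true;
	}

	/* In housekeeping_init(), once housekeeping_mask is set up: */
	static_branch_enable(&housekeeping_overriden);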
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 3 ++-
kernel/housekeeping.c | 14 +++++++++-----
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index 31a1401..cbe8d63 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -6,6 +6,7 @@
#include <linux/tick.h>
#ifdef CONFIG_NO_HZ_FULL
+DECLARE_STATIC_KEY_FALSE(housekeeping_overriden);
extern int housekeeping_any_cpu(void);
extern const struct cpumask *housekeeping_cpumask(void);
extern void housekeeping_affine(struct task_struct *t);
@@ -31,7 +32,7 @@ static inline void housekeeping_init(void) { }
static inline bool is_housekeeping_cpu(int cpu)
{
#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
+ if (static_branch_unlikely(&housekeeping_overriden))
return housekeeping_test_cpu(cpu);
#endif
return true;
diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
index 0183e75..f8be7e6 100644
--- a/kernel/housekeeping.c
+++ b/kernel/housekeeping.c
@@ -9,12 +9,15 @@
#include <linux/housekeeping.h>
#include <linux/tick.h>
#include <linux/init.h>
+#include <linux/static_key.h>
+DEFINE_STATIC_KEY_FALSE(housekeeping_overriden);
+EXPORT_SYMBOL_GPL(housekeeping_overriden);
static cpumask_var_t housekeeping_mask;
int housekeeping_any_cpu(void)
{
- if (tick_nohz_full_enabled())
+ if (static_branch_unlikely(&housekeeping_overriden))
return cpumask_any_and(housekeeping_mask, cpu_online_mask);
return smp_processor_id();
@@ -22,7 +25,7 @@ int housekeeping_any_cpu(void)
const struct cpumask *housekeeping_cpumask(void)
{
- if (tick_nohz_full_enabled())
+ if (static_branch_unlikely(&housekeeping_overriden))
return housekeeping_mask;
return cpu_possible_mask;
@@ -30,19 +33,18 @@ const struct cpumask *housekeeping_cpumask(void)
void housekeeping_affine(struct task_struct *t)
{
- if (tick_nohz_full_enabled())
+ if (static_branch_unlikely(&housekeeping_overriden))
set_cpus_allowed_ptr(t, housekeeping_mask);
}
bool housekeeping_test_cpu(int cpu)
{
- if (tick_nohz_full_enabled())
+ if (static_branch_unlikely(&housekeeping_overriden))
return cpumask_test_cpu(cpu, housekeeping_mask);
return true;
}
-
void __init housekeeping_init(void)
{
if (!tick_nohz_full_enabled())
@@ -58,6 +60,8 @@ void __init housekeeping_init(void)
cpumask_andnot(housekeeping_mask,
cpu_possible_mask, tick_nohz_full_mask);
+ static_branch_enable(&housekeeping_overriden);
+
/* We need at least one CPU to handle housekeeping work */
WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
}
--
2.7.4
While trying to disable the watchdog on nohz_full CPUs, the watchdog
code implements an ad-hoc version of housekeeping_cpumask(). Let's
replace those reinvented lines.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
kernel/watchdog.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7a9df162..cdd0d11 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -941,15 +941,10 @@ void __init lockup_detector_init(void)
{
set_sample_period();
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled()) {
+ if (tick_nohz_full_enabled())
pr_info("Disabling watchdog on nohz_full cores by default\n");
- cpumask_copy(&watchdog_cpumask, housekeeping_mask);
- } else
- cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
-#else
- cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
-#endif
+
+ cpumask_copy(&watchdog_cpumask, housekeeping_cpumask());
if (watchdog_enabled)
watchdog_enable_all_cpus();
--
2.7.4
Nobody needs to access this detail. housekeeping_cpumask() already
takes care of it.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 31 ++++++++++---------------------
kernel/housekeeping.c | 33 ++++++++++++++++++++++++++++++++-
2 files changed, 42 insertions(+), 22 deletions(-)
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index 64d0ee5..31a1401 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -6,46 +6,35 @@
#include <linux/tick.h>
#ifdef CONFIG_NO_HZ_FULL
-extern cpumask_var_t housekeeping_mask;
+extern int housekeeping_any_cpu(void);
+extern const struct cpumask *housekeeping_cpumask(void);
+extern void housekeeping_affine(struct task_struct *t);
+extern bool housekeeping_test_cpu(int cpu);
extern void __init housekeeping_init(void);
+
#else
-static inline void housekeeping_init(void) { }
-#endif /* CONFIG_NO_HZ_FULL */
static inline int housekeeping_any_cpu(void)
{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- return cpumask_any_and(housekeeping_mask, cpu_online_mask);
-#endif
return smp_processor_id();
}
static inline const struct cpumask *housekeeping_cpumask(void)
{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- return housekeeping_mask;
-#endif
return cpu_possible_mask;
}
+static inline void housekeeping_affine(struct task_struct *t) { }
+static inline void housekeeping_init(void) { }
+#endif /* CONFIG_NO_HZ_FULL */
+
static inline bool is_housekeeping_cpu(int cpu)
{
#ifdef CONFIG_NO_HZ_FULL
if (tick_nohz_full_enabled())
- return cpumask_test_cpu(cpu, housekeeping_mask);
+ return housekeeping_test_cpu(cpu);
#endif
return true;
}
-static inline void housekeeping_affine(struct task_struct *t)
-{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- set_cpus_allowed_ptr(t, housekeeping_mask);
-
-#endif
-}
-
#endif /* _LINUX_HOUSEKEEPING_H */
diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
index 6d8afd5..0183e75 100644
--- a/kernel/housekeeping.c
+++ b/kernel/housekeeping.c
@@ -10,7 +10,38 @@
#include <linux/tick.h>
#include <linux/init.h>
-cpumask_var_t housekeeping_mask;
+static cpumask_var_t housekeeping_mask;
+
+int housekeeping_any_cpu(void)
+{
+ if (tick_nohz_full_enabled())
+ return cpumask_any_and(housekeeping_mask, cpu_online_mask);
+
+ return smp_processor_id();
+}
+
+const struct cpumask *housekeeping_cpumask(void)
+{
+ if (tick_nohz_full_enabled())
+ return housekeeping_mask;
+
+ return cpu_possible_mask;
+}
+
+void housekeeping_affine(struct task_struct *t)
+{
+ if (tick_nohz_full_enabled())
+ set_cpus_allowed_ptr(t, housekeeping_mask);
+}
+
+bool housekeeping_test_cpu(int cpu)
+{
+ if (tick_nohz_full_enabled())
+ return cpumask_test_cpu(cpu, housekeeping_mask);
+
+ return true;
+}
+
void __init housekeeping_init(void)
{
--
2.7.4
housekeeping_any_cpu() doesn't correctly handle the case where
CONFIG_NO_HZ_FULL=y but no CPU is in nohz_full mode. So far no caller
needs this, but let's prepare to avoid any future surprise.
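Concretely, before this patch the CONFIG_NO_HZ_FULL=y inline was (see
the hunk below):

	static inline int housekeeping_any_cpu(void)
	{
		return cpumask_any_and(housekeeping_mask, cpu_online_mask);
	}

which reads housekeeping_mask even when nohz_full= was never passed and
the mask was therefore never allocated. Depending on
CONFIG_CPUMASK_OFFSTACK, that presumably means either an empty-mask
result (an invalid CPU number) or a NULL dereference. The fix is to
check tick_nohz_full_enabled() before touching the mask.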
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index 3d6a8e6..64d0ee5 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -7,24 +7,19 @@
#ifdef CONFIG_NO_HZ_FULL
extern cpumask_var_t housekeeping_mask;
-
-static inline int housekeeping_any_cpu(void)
-{
- return cpumask_any_and(housekeeping_mask, cpu_online_mask);
-}
-
extern void __init housekeeping_init(void);
-
#else
-
-static inline int housekeeping_any_cpu(void)
-{
- return smp_processor_id();
-}
-
static inline void housekeeping_init(void) { }
#endif /* CONFIG_NO_HZ_FULL */
+static inline int housekeeping_any_cpu(void)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ if (tick_nohz_full_enabled())
+ return cpumask_any_and(housekeeping_mask, cpu_online_mask);
+#endif
+ return smp_processor_id();
+}
static inline const struct cpumask *housekeeping_cpumask(void)
{
--
2.7.4
On Wed, 23 Aug 2017, Frederic Weisbecker wrote:
> While at it, this is a proposition for a reimplementation of isolcpus=
> that doesn't involve scheduler domain isolation. Therefore this
> brings a behaviour change: all user tasks inherit init/1 affinity which
> avoid the isolcpus= range. But if a task later overrides its affinity
> which turns out to intersect an isolated CPU, load balancing may occur
> on it.
I think that change is good, maybe even a bugfix. I have had some people
be very surprised when they set affinities to multiple CPUs and the
processes kept sticking to one CPU because of isolcpus.
On Wed, Aug 23, 2017 at 09:55:51AM -0500, Christopher Lameter wrote:
> On Wed, 23 Aug 2017, Frederic Weisbecker wrote:
>
> > While at it, this is a proposition for a reimplementation of isolcpus=
> > that doesn't involve scheduler domain isolation. Therefore this
> > brings a behaviour change: all user tasks inherit init/1 affinity which
> > avoid the isolcpus= range. But if a task later overrides its affinity
> > which turns out to intersect an isolated CPU, load balancing may occur
> > on it.
>
> I think that change is good, maybe even a bugfix. I have had some people
> be very surprised when they set affinities to multiple CPUs and the
> processes kept sticking to one CPU because of isolcpus.
That's good to hear! I'll keep that direction then, unless someone complains.
On Wed, Aug 23, 2017 at 03:51:11AM +0200, Frederic Weisbecker wrote:
> We want to centralize the isolation features on the housekeeping
> subsystem and scheduler isolation is a significant part of it.
>
> While at it, this is a proposition for a reimplementation of isolcpus=
> that doesn't involve scheduler domain isolation. Therefore this
> brings a behaviour change: all user tasks inherit init/1 affinity which
> avoid the isolcpus= range. But if a task later overrides its affinity
> which turns out to intersect an isolated CPU, load balancing may occur
> on it.
>
> OTOH such a reimplementation that doesn't shortcut scheduler internals
> makes a better candidate for an interface extension to cpuset.
Not sure we can do this. It'll break users that rely on the no
scheduling thing, that's a well documented part of isolcpus.
On Wed, Aug 23, 2017 at 09:55:51AM -0500, Christopher Lameter wrote:
> On Wed, 23 Aug 2017, Frederic Weisbecker wrote:
>
> > While at it, this is a proposition for a reimplementation of isolcpus=
> > that doesn't involve scheduler domain isolation. Therefore this
> > brings a behaviour change: all user tasks inherit init/1 affinity which
> > avoid the isolcpus= range. But if a task later overrides its affinity
> > which turns out to intersect an isolated CPU, load balancing may occur
> > on it.
>
> I think that change is good, maybe even a bugfix. I have had some people
> be very surprised when they set affinities to multiple CPUs and the
> processes kept sticking to one CPU because of isolcpus.
Those people cannot read. And no, it's not a bug fix. It's documented and
intended behaviour.
On Mon, Aug 28, 2017 at 12:09:57PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 23, 2017 at 03:51:11AM +0200, Frederic Weisbecker wrote:
> > We want to centralize the isolation features on the housekeeping
> > subsystem and scheduler isolation is a significant part of it.
> >
> > While at it, this is a proposition for a reimplementation of isolcpus=
> > that doesn't involve scheduler domain isolation. Therefore this
> > brings a behaviour change: all user tasks inherit init/1 affinity which
> > avoid the isolcpus= range. But if a task later overrides its affinity
> > which turns out to intersect an isolated CPU, load balancing may occur
> > on it.
> >
> > OTOH such a reimplementation that doesn't shortcut scheduler internals
> > makes a better candidate for an interface extension to cpuset.
>
> Not sure we can do this. It'll break users that rely on the no
> scheduling thing, that's a well documented part of isolcpus.
That was my worry :-s That NULL domain was probably a design mistake and
I fear we now have to maintain it.
On Mon, Aug 28, 2017 at 03:23:06PM +0200, Frederic Weisbecker wrote:
> On Mon, Aug 28, 2017 at 12:09:57PM +0200, Peter Zijlstra wrote:
> > On Wed, Aug 23, 2017 at 03:51:11AM +0200, Frederic Weisbecker wrote:
> > > We want to centralize the isolation features on the housekeeping
> > > subsystem and scheduler isolation is a significant part of it.
> > >
> > > While at it, this is a proposition for a reimplementation of isolcpus=
> > > that doesn't involve scheduler domain isolation. Therefore this
> > > brings a behaviour change: all user tasks inherit init/1 affinity which
> > > avoids the isolcpus= range. But if a task later overrides its affinity
> > > which turns out to intersect an isolated CPU, load balancing may occur
> > > on it.
> > >
> > > OTOH such a reimplementation that doesn't shortcut scheduler internals
> > > makes a better candidate for an interface extension to cpuset.
> >
> > Not sure we can do this. It'll break users that rely on the no
> > scheduling thing, that's a well documented part of isolcpus.
>
> That was my worry :-s That NULL domain was probably a design mistake and
> I fear we now have to maintain it.
I'm fairly sure that was very intentional. If you want to isolate stuff
you don't want load-balancing. You get the same NULL domain with cpusets
if you disable balancing for a set of CPUs.
Now, I completely hate the isolcpus feature and wish it a speedy death,
but replacing it with something sensible is difficult because cgroups
:-(
On Mon, Aug 28, 2017 at 03:31:16PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 28, 2017 at 03:23:06PM +0200, Frederic Weisbecker wrote:
> > On Mon, Aug 28, 2017 at 12:09:57PM +0200, Peter Zijlstra wrote:
> > > On Wed, Aug 23, 2017 at 03:51:11AM +0200, Frederic Weisbecker wrote:
> > > > We want to centralize the isolation features on the housekeeping
> > > > subsystem and scheduler isolation is a significant part of it.
> > > >
> > > > While at it, this is a proposition for a reimplementation of isolcpus=
> > > > that doesn't involve scheduler domain isolation. Therefore this
> > > > brings a behaviour change: all user tasks inherit init/1 affinity which
> > > > avoids the isolcpus= range. But if a task later overrides its affinity
> > > > which turns out to intersect an isolated CPU, load balancing may occur
> > > > on it.
> > > >
> > > > OTOH such a reimplementation that doesn't shortcut scheduler internals
> > > > makes a better candidate for an interface extension to cpuset.
> > >
> > > Not sure we can do this. It'll break users that rely on the no
> > > scheduling thing, that's a well documented part of isolcpus.
> >
> > That was my worry :-s That NULL domain was probably a design mistake and
> > I fear we now have to maintain it.
>
> I'm fairly sure that was very intentional. If you want to isolate stuff
> you don't want load-balancing.
Yes I guess that was intentional. In fact having NULL domains is convenient
as it also isolates from many things: tasks, workqueues, timers.
Although for example I guess (IIUC) that if you create an unbound timer on a NULL
domain, it will be stuck on it for ever as we can't walk any hierarchy from the
current CPU domain. I'm not sure how much that can apply to unbound workqueues
as well. But the thing is with NULL domains: things cannot migrate in and neither
can they migrate out, which is not exactly what CPU isolation wants.
> You get the same NULL domain with cpusets if you disable balancing for a set of CPUs.
Ok, I didn't know that.
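A hedged sketch of that cpuset recipe, assuming the cgroup v1 cpuset
controller is mounted at /dev/cpuset; the CPU ranges and memory node
are made up:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    static void put(const char *path, const char *val)
    {
            FILE *f = fopen(path, "w");

            if (!f || fputs(val, f) == EOF)
                    exit(1);
            fclose(f);
    }

    int main(void)
    {
            /* Stop balancing across the root set as a whole... */
            put("/dev/cpuset/cpuset.sched_load_balance", "0");

            /* ...keep CPUs 0-1 balanced in a "system" child set... */
            mkdir("/dev/cpuset/system", 0755);
            put("/dev/cpuset/system/cpuset.cpus", "0-1");
            put("/dev/cpuset/system/cpuset.mems", "0");

            /*
             * ...and park CPUs 2-3 in a child set with balancing off.
             * They end up with a NULL domain, much like isolcpus=2,3.
             */
            mkdir("/dev/cpuset/isolated", 0755);
            put("/dev/cpuset/isolated/cpuset.cpus", "2-3");
            put("/dev/cpuset/isolated/cpuset.mems", "0");
            put("/dev/cpuset/isolated/cpuset.sched_load_balance", "0");

            return 0;
    }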
>
> Now, I completely hate the isolcpus feature and wish it a speedy death,
> but replacing it with something sensible is difficult because cgroups
> :-(
Ah, that would break cgroup somehow?
On Mon, Aug 28, 2017 at 05:27:15PM +0200, Frederic Weisbecker wrote:
> On Mon, Aug 28, 2017 at 03:31:16PM +0200, Peter Zijlstra wrote:
> > I'm fairly sure that was very intentional. If you want to isolate stuff
> > you don't want load-balancing.
>
> Yes I guess that was intentional. In fact having NULL domains is convenient
> as it also isolates from many things: tasks, workqueues, timers.
Huh, what? That's entirely unrelated to the NULL domain.
The reason people like isolcpus= is that it ensures _nothing_ runs on
those CPUs before you explicitly place something there.
_That_ is what ensures there are no timers etc. placed on those CPUs.
Once you run something on that CPU, it stays there.
It is also what I dislike about isolcpus: it's a boot-time feature; if
you want to reconfigure your system you need a reboot.
> Although for example I guess (IIUC) that if you create an unbound
> timer on a NULL domain, it will be stuck on it for ever as we can't
> walk any hierarchy from the current CPU domain.
Not sure what you're on about. Timers have their own hierarchy.
> I'm not sure how much that can apply to unbound workqueues
> as well.
Well, unbound workqueues will not immediately end up on those CPUs,
since they'll have an affinity exclusive of those CPUs by construction.
But IIRC there's an affinity setting for workqueues where you could
force it on if you wanted to.
> But the thing is with NULL domains: things cannot migrate in and neither
> can they migrate out, which is not exactly what CPU isolation wants.
No, it's exactly what they want. You get what you put in and nothing
more. If you want something else, use cpusets.
> > Now, I completely hate the isolcpus feature and wish it a speedy death,
> > but replacing it with something sensible is difficult because cgroups
> > :-(
>
> Ah, that would break cgroup somehow?
Well, ideally something like this would start the system with all the
'crap' threads in !root cgroup. But that means cgroupfs needs to be
populated with at least two directories on boot. And current cgroup
cruft doesn't expect that.
On Mon, 28 Aug 2017, Peter Zijlstra wrote:
> > I think that change is good maybe even a bugfix. I had some people be very
> > surprised when they set affinities to multiple cpus and the processes
> > kept sticking to one cpu because of isolcpus.
>
> Those people cannot read. And no, it's not a bug fix. It's documented and
> intended behaviour.
Well, the next step was to create a cgroup with those processors and
suddenly load balancing worked again.
This is all pretty confusing stuff. I would rather get rid of isolcpus and
rely on a process's affinity being set to a single processor, plus the removal
of that processor from all other processes' affinities, as sufficient.
I think this already does the right thing. What is mentioned in the isolcpus
documentation is a worry about "suboptimal scheduler performance".
Could we address that issue (if it is still there) and then get rid of
isolcpus?
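A hypothetical sketch of that alternative, with no isolcpus= involved:
evacuate one CPU from userspace by clearing it out of every other
process's affinity mask. The CPU number is made up, error handling is
minimal, and per-cpu kernel threads will simply reject the narrowed mask.

    #define _GNU_SOURCE
    #include <ctype.h>
    #include <dirent.h>
    #include <sched.h>
    #include <stdlib.h>
    #include <sys/types.h>

    int main(void)
    {
            const int isolated = 3;         /* made-up CPU to evacuate */
            DIR *proc = opendir("/proc");
            struct dirent *d;

            if (!proc)
                    return 1;

            while ((d = readdir(proc))) {
                    cpu_set_t set;
                    pid_t pid;

                    if (!isdigit((unsigned char)d->d_name[0]))
                            continue;
                    pid = (pid_t)atoi(d->d_name);

                    if (sched_getaffinity(pid, sizeof(set), &set))
                            continue;       /* task exited meanwhile */
                    CPU_CLR(isolated, &set);
                    if (CPU_COUNT(&set))    /* don't leave an empty mask */
                            sched_setaffinity(pid, sizeof(set), &set);
            }
            closedir(proc);
            return 0;
    }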
On Mon, 28 Aug 2017, Peter Zijlstra wrote:
> Well, ideally something like this would start the system with all the
> 'crap' threads in !root cgroup. But that means cgroupfs needs to be
> populated with at least two directories on boot. And current cgroup
> cruft doesn't expect that.
Maybe an affinity mask for bootup will take care of that? I once wrote an
init wrapper that restricted the number of cpus for the threads that init
spawns, but we can probably do much better.
On Mon, Aug 28, 2017 at 06:24:16PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 28, 2017 at 05:27:15PM +0200, Frederic Weisbecker wrote:
> > On Mon, Aug 28, 2017 at 03:31:16PM +0200, Peter Zijlstra wrote:
>
> > > I'm fairly sure that was very intentional. If you want to isolate stuff
> > > you don't want load-balancing.
> >
> > Yes I guess that was intentional. In fact having NULL domains is convenient
> > as it also isolates from many things: tasks, workqueues, timers.
>
> Huh, what? That's entirely unrelated to the NULL domain.
>
> The reason people like isolcpus= is that it ensures _nothing_ runs on
> those CPUs before you explicitly place something there.
>
> _That_ is what ensures there are no timers etc. placed on those CPUs.
Sure that's what I meant.
>
> Once you run something on that CPU, it stays there.
>
> It is also what I dislike about isolcpus: it's a boot-time feature; if
> you want to reconfigure your system you need a reboot.
Indeed.
>
> > Although for example I guess (IIUC) that if you create an unbound
> > timer on a NULL domain, it will be stuck on it for ever as we can't
> > walk any hierarchy from the current CPU domain.
>
> Not sure what you're on about. Timers have their own hierarchy.
Check out get_nohz_timer_target() which relies on scheduler hierarchies to
look up a CPU to enqueue an unpinned timer on.
>
> > I'm not sure how much that can apply to unbound workqueues
> > as well.
>
> Well, unbound workqueues will not immediately end up on those CPUs,
> since they'll have an affinity exclusive of those CPUs by construction.
Ah that's right.
> But IIRC there's an affinity setting for workqueues where you could
> force it on if you wanted to.
Yep: /sys/devices/virtual/workqueue/cpumask
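A minimal sketch of using that knob, assuming CPUs 0-1 are the intended
housekeeping set (mask 0x3); from a shell this is just an echo into the
same file.

    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/sys/devices/virtual/workqueue/cpumask", "w");

            if (!f)
                    return 1;
            /* Confine unbound workqueues to CPUs 0-1, keeping the
             * isolated CPUs out of the mask. */
            fputs("3\n", f);
            return fclose(f) ? 1 : 0;
    }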
>
> > But the thing is with NULL domains: things cannot migrate in and neither
> > can they migrate out, which is not exactly what CPU isolation wants.
>
> No, it's exactly what they want. You get what you put in and nothing
> more. If you want something else, use cpusets.
That's still a subtle behaviour that involves knowledge of scheduler
core details. I wish we hadn't exposed such a low-level scheduler control
as a general-purpose kernel parameter.
Anyway, at least that confirms one worry we had: kernel parameters are kernel
ABI that we can't break.
>
> > > Now, I completely hate the isolcpus feature and wish it a speedy death,
> > > but replacing it with something sensible is difficult because cgroups
> > > :-(
> >
> > Ah, that would break cgroup somehow?
>
> Well, ideally something like this would start the system with all the
> 'crap' threads in !root cgroup. But that means cgroupfs needs to be
> populated with at least two directories on boot. And current cgroup
> cruft doesn't expect that.
Ah I see.
Thanks!
On Mon, 28 Aug 2017, Frederic Weisbecker wrote:
> On Mon, Aug 28, 2017 at 06:24:16PM +0200, Peter Zijlstra wrote:
> > On Mon, Aug 28, 2017 at 05:27:15PM +0200, Frederic Weisbecker wrote:
> > > Although for example I guess (IIUC) that if you create an unbound
> > > timer on a NULL domain, it will be stuck on it for ever as we can't
> > > walk any hierarchy from the current CPU domain.
> >
> > Not sure what you're on about. Timers have their own hierarchy.
>
> Check out get_nohz_timer_target() which relies on scheduler hierarchies to
> look up a CPU to enqueue an unpinned timer on.
Which is one of the most idiotic things we have in that code
path. Anna-Maria has posted this series which gets rid of that nonsense, by
queueing the timer on the current cpu into a wheel, which gets pulled in by
others. That makes a lot of sense because most of these timers get canceled
before expiry anyway. But we still need to fix the fallout and the few
corner cases to make that work reliably. We'll do that hopefully sooner
than later.
Thanks,
tglx
On Wed, 2017-08-23 at 03:51 +0200, Frederic Weisbecker wrote:
> The housekeeping code is currently tied to the nohz code. As we are
> planning to make housekeeping independent from it, start with moving
> the relevant code to its own file.
>
Why are nohz full and housekeeping being
decoupled from each other?
Won't people want to use them together?
What use case am I missing?
--
All rights reversed
On Thu, Aug 31, 2017 at 04:16:16PM -0400, Rik van Riel wrote:
> On Wed, 2017-08-23 at 03:51 +0200, Frederic Weisbecker wrote:
> > The housekeeping code is currently tied to the nohz code. As we are
> > planning to make housekeeping independent from it, start with moving
> > the relevant code to its own file.
> >
> Why are nohz full and housekeeping being
> decoupled from each other?
>
> Won't people want to use them together?
>
> What use case am I missing?
So nohz is really just one feature, and it shouldn't decide about the other
isolation features. It should be the opposite: isolation picks up nohz, alongside
the other isolation things. I think we got the layering wrong. So it's mostly just
a code reorganisation.
While at it, isolcpus= is also part of the isolation toolset. So I thought we should
centralize all this isolation code in a common subsystem.
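A rough mock of that direction, with each isolation feature expressed as
a per-CPU housekeeping flag behind one query API. The flag names and the
userspace scaffolding are illustrative only, not the actual interface
from the series.

    #include <stdbool.h>
    #include <stdio.h>

    enum hk_flags {
            HK_FLAG_TICK   = 1 << 0,        /* nohz_full tick offloading */
            HK_FLAG_DOMAIN = 1 << 1,        /* isolcpus= scheduler isolation */
            HK_FLAG_WQ     = 1 << 2,        /* unbound workqueue placement */
    };

    /* Mock state: CPUs 0-1 do housekeeping, CPUs 2-3 are fully isolated. */
    static const unsigned int isolated_flags[4] = {
            0, 0,
            HK_FLAG_TICK | HK_FLAG_DOMAIN | HK_FLAG_WQ,
            HK_FLAG_TICK | HK_FLAG_DOMAIN | HK_FLAG_WQ,
    };

    /* One central question replaces per-feature cpumasks scattered around. */
    static bool housekeeping_cpu(int cpu, unsigned int flags)
    {
            return !(isolated_flags[cpu] & flags);
    }

    int main(void)
    {
            for (int cpu = 0; cpu < 4; cpu++)
                    printf("cpu%d: tick housekeeping? %s\n", cpu,
                           housekeeping_cpu(cpu, HK_FLAG_TICK) ? "yes" : "no");
            return 0;
    }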
On Thu, Aug 31, 2017 at 08:53:56PM +0200, Thomas Gleixner wrote:
> On Mon, 28 Aug 2017, Frederic Weisbecker wrote:
> > On Mon, Aug 28, 2017 at 06:24:16PM +0200, Peter Zijlstra wrote:
> > > On Mon, Aug 28, 2017 at 05:27:15PM +0200, Frederic Weisbecker wrote:
> > > > Although for example I guess (IIUC) that if you create an unbound
> > > > timer on a NULL domain, it will be stuck on it for ever as we can't
> > > > walk any hierarchy from the current CPU domain.
> > >
> > > Not sure what you're on about. Timers have their own hierarchy.
> >
> > Check out get_nohz_timer_target() which relies on scheduler hierarchies to
> > look up a CPU to enqueue an unpinned timer on.
>
> Which is one of the most idiotic things we have in that code
> path. Anna-Maria has posted this series which gets rid of that nonsense, by
> queueing the timer on the current cpu into a wheel, which gets pulled in by
> others. That makes a lot of sense because most of these timers get canceled
> before expiry anyway. But we still need to fix the fallout and the few
> corner cases to make that work reliably. We'll do that hopefully sooner
> than later.
Sure, I definitely agree with that change.