2017-07-21 13:21:38

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC PATCH 0/9] Introduce housekeeping subsystem

I'm leaving for two weeks so this is food for thoughts in the meantime :)

We have a design issue with nohz_full: it drives the isolation features
through the *housekeeping*() functions: kthreads, unpinned timers,
watchdog, ...

But things should work the other way around because the tick is just an
isolation feature among others.

So we need a housekeeping subsystem to drive all these isolation
features, including nohz full in a later iteration. For now this is a
basic draft. In the long run this subsystem should also drive the tick
offloading (remove residual 1Hz) and all unbound kthreads.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
nohz/0hz

HEAD: 68e3af1de5db228bf6c2a5e721bce59a02cfc4e1

Thanks,
Frederic
---

Frederic Weisbecker (9):
housekeeping: Move housekeeping related code to its own file
watchdog: Use housekeeping_cpumask() instead of ad-hoc version
housekeeping: Provide a dynamic off-case to housekeeping_any_cpu()
housekeeping: Make housekeeping cpumask private
housekeeping: Use its own static key
housekeeping: Rename is_housekeeping_cpu to housekeeping_cpu
housekeeping: Use own boot option, independant from nohz
housekeeping: Move it under own config, independant from NO_HZ
workqueue: Affine unbound workqueues to housekeeping cpumask


drivers/net/ethernet/tile/tilegx.c | 2 +-
include/linux/housekeeping.h | 39 +++++++++++++++++++++++
include/linux/tick.h | 37 ----------------------
init/Kconfig | 6 ++++
kernel/Makefile | 1 +
kernel/housekeeping.c | 65 ++++++++++++++++++++++++++++++++++++++
kernel/rcu/tree_plugin.h | 1 +
kernel/rcu/update.c | 1 +
kernel/sched/core.c | 7 ++--
kernel/sched/fair.c | 3 +-
kernel/time/tick-sched.c | 18 -----------
kernel/watchdog.c | 12 +++----
kernel/workqueue.c | 3 +-
13 files changed, 126 insertions(+), 69 deletions(-)


2017-07-21 13:21:46

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC PATCH 1/9] housekeeping: Move housekeeping related code to its own file

The housekeeping code is currently tied to the nohz code. As we are
planning to make housekeeping independant from it, start with moving
the relevant code to its own file.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
drivers/net/ethernet/tile/tilegx.c | 2 +-
include/linux/housekeeping.h | 56 ++++++++++++++++++++++++++++++++++++++
include/linux/tick.h | 37 -------------------------
init/main.c | 2 ++
kernel/Makefile | 1 +
kernel/housekeeping.c | 32 ++++++++++++++++++++++
kernel/rcu/tree_plugin.h | 1 +
kernel/rcu/update.c | 1 +
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 1 +
kernel/time/tick-sched.c | 18 ------------
kernel/watchdog.c | 1 +
12 files changed, 97 insertions(+), 56 deletions(-)
create mode 100644 include/linux/housekeeping.h
create mode 100644 kernel/housekeeping.c

diff --git a/drivers/net/ethernet/tile/tilegx.c b/drivers/net/ethernet/tile/tilegx.c
index aec9538..eb74e09 100644
--- a/drivers/net/ethernet/tile/tilegx.c
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -40,7 +40,7 @@
#include <linux/tcp.h>
#include <linux/net_tstamp.h>
#include <linux/ptp_clock_kernel.h>
-#include <linux/tick.h>
+#include <linux/housekeeping.h>

#include <asm/checksum.h>
#include <asm/homecache.h>
diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
new file mode 100644
index 0000000..3d6a8e6
--- /dev/null
+++ b/include/linux/housekeeping.h
@@ -0,0 +1,56 @@
+#ifndef _LINUX_HOUSEKEEPING_H
+#define _LINUX_HOUSEKEEPING_H
+
+#include <linux/cpumask.h>
+#include <linux/init.h>
+#include <linux/tick.h>
+
+#ifdef CONFIG_NO_HZ_FULL
+extern cpumask_var_t housekeeping_mask;
+
+static inline int housekeeping_any_cpu(void)
+{
+ return cpumask_any_and(housekeeping_mask, cpu_online_mask);
+}
+
+extern void __init housekeeping_init(void);
+
+#else
+
+static inline int housekeeping_any_cpu(void)
+{
+ return smp_processor_id();
+}
+
+static inline void housekeeping_init(void) { }
+#endif /* CONFIG_NO_HZ_FULL */
+
+
+static inline const struct cpumask *housekeeping_cpumask(void)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ if (tick_nohz_full_enabled())
+ return housekeeping_mask;
+#endif
+ return cpu_possible_mask;
+}
+
+static inline bool is_housekeeping_cpu(int cpu)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ if (tick_nohz_full_enabled())
+ return cpumask_test_cpu(cpu, housekeeping_mask);
+#endif
+ return true;
+}
+
+static inline void housekeeping_affine(struct task_struct *t)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ if (tick_nohz_full_enabled())
+ set_cpus_allowed_ptr(t, housekeeping_mask);
+
+#endif
+}
+
+#endif /* _LINUX_HOUSEKEEPING_H */
diff --git a/include/linux/tick.h b/include/linux/tick.h
index fe01e68..68afc09 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -137,7 +137,6 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
#ifdef CONFIG_NO_HZ_FULL
extern bool tick_nohz_full_running;
extern cpumask_var_t tick_nohz_full_mask;
-extern cpumask_var_t housekeeping_mask;

static inline bool tick_nohz_full_enabled(void)
{
@@ -161,11 +160,6 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
cpumask_or(mask, mask, tick_nohz_full_mask);
}

-static inline int housekeeping_any_cpu(void)
-{
- return cpumask_any_and(housekeeping_mask, cpu_online_mask);
-}
-
extern void tick_nohz_dep_set(enum tick_dep_bits bit);
extern void tick_nohz_dep_clear(enum tick_dep_bits bit);
extern void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit);
@@ -235,10 +229,6 @@ static inline void tick_dep_clear_signal(struct signal_struct *signal,
extern void tick_nohz_full_kick_cpu(int cpu);
extern void __tick_nohz_task_switch(void);
#else
-static inline int housekeeping_any_cpu(void)
-{
- return smp_processor_id();
-}
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
@@ -260,33 +250,6 @@ static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void __tick_nohz_task_switch(void) { }
#endif

-static inline const struct cpumask *housekeeping_cpumask(void)
-{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- return housekeeping_mask;
-#endif
- return cpu_possible_mask;
-}
-
-static inline bool is_housekeeping_cpu(int cpu)
-{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- return cpumask_test_cpu(cpu, housekeeping_mask);
-#endif
- return true;
-}
-
-static inline void housekeeping_affine(struct task_struct *t)
-{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- set_cpus_allowed_ptr(t, housekeeping_mask);
-
-#endif
-}
-
static inline void tick_nohz_task_switch(void)
{
if (tick_nohz_full_enabled())
diff --git a/init/main.c b/init/main.c
index 9789ab7..9904a1e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -46,6 +46,7 @@
#include <linux/cgroup.h>
#include <linux/efi.h>
#include <linux/tick.h>
+#include <linux/housekeeping.h>
#include <linux/interrupt.h>
#include <linux/taskstats_kern.h>
#include <linux/delayacct.h>
@@ -607,6 +608,7 @@ asmlinkage __visible void __init start_kernel(void)
early_irq_init();
init_IRQ();
tick_init();
+ housekeeping_init();
rcu_init_nohz();
init_timers();
hrtimers_init();
diff --git a/kernel/Makefile b/kernel/Makefile
index 4cb8e8b..8a85c4b 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -109,6 +109,7 @@ obj-$(CONFIG_JUMP_LABEL) += jump_label.o
obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
obj-$(CONFIG_TORTURE_TEST) += torture.o
obj-$(CONFIG_MEMBARRIER) += membarrier.o
+obj-$(CONFIG_NO_HZ_FULL) += housekeeping.o

obj-$(CONFIG_HAS_IOMEM) += memremap.o

diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
new file mode 100644
index 0000000..6d8afd5
--- /dev/null
+++ b/kernel/housekeeping.c
@@ -0,0 +1,32 @@
+/*
+ * Housekeeping management. Manage the targets for routine code that can run on
+ * any CPU: unbound workqueues, timers, kthreads and any offloadable work.
+ *
+ * Copyright (C) 2017 Red Hat, Inc., Frederic Weisbecker
+ *
+ */
+
+#include <linux/housekeeping.h>
+#include <linux/tick.h>
+#include <linux/init.h>
+
+cpumask_var_t housekeeping_mask;
+
+void __init housekeeping_init(void)
+{
+ if (!tick_nohz_full_enabled())
+ return;
+
+ if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
+ WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
+ cpumask_clear(tick_nohz_full_mask);
+ tick_nohz_full_running = false;
+ return;
+ }
+
+ cpumask_andnot(housekeeping_mask,
+ cpu_possible_mask, tick_nohz_full_mask);
+
+ /* We need at least one CPU to handle housekeeping work */
+ WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
+}
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 908b309..c66d162 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -29,6 +29,7 @@
#include <linux/oom.h>
#include <linux/sched/debug.h>
#include <linux/smpboot.h>
+#include <linux/housekeeping.h>
#include <uapi/linux/sched/types.h>
#include "../time/tick-internal.h"

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 00e77c4..bfe973d 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -51,6 +51,7 @@
#include <linux/kthread.h>
#include <linux/tick.h>
#include <linux/rcupdate_wait.h>
+#include <linux/housekeeping.h>

#define CREATE_TRACE_POINTS

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d3d39a2..751c040 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -26,6 +26,7 @@
#include <linux/profile.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/housekeeping.h>

#include <asm/switch_to.h>
#include <asm/tlb.h>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c95880e..05dfced 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -32,6 +32,7 @@
#include <linux/mempolicy.h>
#include <linux/migrate.h>
#include <linux/task_work.h>
+#include <linux/housekeeping.h>

#include <trace/events/sched.h>

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c7a899c..9d29dee 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -165,7 +165,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)

#ifdef CONFIG_NO_HZ_FULL
cpumask_var_t tick_nohz_full_mask;
-cpumask_var_t housekeeping_mask;
bool tick_nohz_full_running;
static atomic_t tick_dep_mask;

@@ -437,13 +436,6 @@ void __init tick_nohz_init(void)
return;
}

- if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
- WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
- cpumask_clear(tick_nohz_full_mask);
- tick_nohz_full_running = false;
- return;
- }
-
/*
* Full dynticks uses irq work to drive the tick rescheduling on safe
* locking contexts. But then we need irq work to raise its own
@@ -452,7 +444,6 @@ void __init tick_nohz_init(void)
if (!arch_irq_work_has_interrupt()) {
pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support irq work self-IPIs\n");
cpumask_clear(tick_nohz_full_mask);
- cpumask_copy(housekeeping_mask, cpu_possible_mask);
tick_nohz_full_running = false;
return;
}
@@ -465,9 +456,6 @@ void __init tick_nohz_init(void)
cpumask_clear_cpu(cpu, tick_nohz_full_mask);
}

- cpumask_andnot(housekeeping_mask,
- cpu_possible_mask, tick_nohz_full_mask);
-
for_each_cpu(cpu, tick_nohz_full_mask)
context_tracking_cpu_set(cpu);

@@ -477,12 +465,6 @@ void __init tick_nohz_init(void)
WARN_ON(ret < 0);
pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
cpumask_pr_args(tick_nohz_full_mask));
-
- /*
- * We need at least one CPU to handle housekeeping work such
- * as timekeeping, unbound timers, workqueues, ...
- */
- WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
}
#endif

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 06d3389..7a9df162 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -24,6 +24,7 @@
#include <linux/workqueue.h>
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
+#include <linux/housekeeping.h>

#include <asm/irq_regs.h>
#include <linux/kvm_para.h>
--
2.7.4

2017-07-21 13:21:52

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC PATCH 5/9] housekeeping: Use its own static key

Housekeeping code still depends on nohz_full static key. Since we want
to decouple housekeeping from nohz, lets create a housekeeping own static
key. It's mostly relevant for calls to is_housekeeping_cpu() from the
scheduler.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 3 ++-
kernel/housekeeping.c | 14 +++++++++-----
2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index 31a1401..cbe8d63 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -6,6 +6,7 @@
#include <linux/tick.h>

#ifdef CONFIG_NO_HZ_FULL
+DECLARE_STATIC_KEY_FALSE(housekeeping_overriden);
extern int housekeeping_any_cpu(void);
extern const struct cpumask *housekeeping_cpumask(void);
extern void housekeeping_affine(struct task_struct *t);
@@ -31,7 +32,7 @@ static inline void housekeeping_init(void) { }
static inline bool is_housekeeping_cpu(int cpu)
{
#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
+ if (static_branch_unlikely(&housekeeping_overriden))
return housekeeping_test_cpu(cpu);
#endif
return true;
diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
index 0183e75..f8be7e6 100644
--- a/kernel/housekeeping.c
+++ b/kernel/housekeeping.c
@@ -9,12 +9,15 @@
#include <linux/housekeeping.h>
#include <linux/tick.h>
#include <linux/init.h>
+#include <linux/static_key.h>

+DEFINE_STATIC_KEY_FALSE(housekeeping_overriden);
+EXPORT_SYMBOL_GPL(housekeeping_overriden);
static cpumask_var_t housekeeping_mask;

int housekeeping_any_cpu(void)
{
- if (tick_nohz_full_enabled())
+ if (static_branch_unlikely(&housekeeping_overriden))
return cpumask_any_and(housekeeping_mask, cpu_online_mask);

return smp_processor_id();
@@ -22,7 +25,7 @@ int housekeeping_any_cpu(void)

const struct cpumask *housekeeping_cpumask(void)
{
- if (tick_nohz_full_enabled())
+ if (static_branch_unlikely(&housekeeping_overriden))
return housekeeping_mask;

return cpu_possible_mask;
@@ -30,19 +33,18 @@ const struct cpumask *housekeeping_cpumask(void)

void housekeeping_affine(struct task_struct *t)
{
- if (tick_nohz_full_enabled())
+ if (static_branch_unlikely(&housekeeping_overriden))
set_cpus_allowed_ptr(t, housekeeping_mask);
}

bool housekeeping_test_cpu(int cpu)
{
- if (tick_nohz_full_enabled())
+ if (static_branch_unlikely(&housekeeping_overriden))
return cpumask_test_cpu(cpu, housekeeping_mask);

return true;
}

-
void __init housekeeping_init(void)
{
if (!tick_nohz_full_enabled())
@@ -58,6 +60,8 @@ void __init housekeeping_init(void)
cpumask_andnot(housekeeping_mask,
cpu_possible_mask, tick_nohz_full_mask);

+ static_branch_enable(&housekeeping_overriden);
+
/* We need at least one CPU to handle housekeeping work */
WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
}
--
2.7.4

2017-07-21 13:21:50

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC PATCH 3/9] housekeeping: Provide a dynamic off-case to housekeeping_any_cpu()

housekeeping_any_cpu() doesn't handle correctly the case where
CONFIG_NO_HZ_FULL=y and no CPU is in nohz_full mode. So far no caller
needs this but lets prepare to avoid any future surprise.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index 3d6a8e6..64d0ee5 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -7,24 +7,19 @@

#ifdef CONFIG_NO_HZ_FULL
extern cpumask_var_t housekeeping_mask;
-
-static inline int housekeeping_any_cpu(void)
-{
- return cpumask_any_and(housekeeping_mask, cpu_online_mask);
-}
-
extern void __init housekeeping_init(void);
-
#else
-
-static inline int housekeeping_any_cpu(void)
-{
- return smp_processor_id();
-}
-
static inline void housekeeping_init(void) { }
#endif /* CONFIG_NO_HZ_FULL */

+static inline int housekeeping_any_cpu(void)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ if (tick_nohz_full_enabled())
+ return cpumask_any_and(housekeeping_mask, cpu_online_mask);
+#endif
+ return smp_processor_id();
+}

static inline const struct cpumask *housekeeping_cpumask(void)
{
--
2.7.4

2017-07-21 13:22:16

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC PATCH 9/9] workqueue: Affine unbound workqueues to housekeeping cpumask

Although the unbound workqueue cpumask can be overriden through sysfs,
the housekeeping cpumask which drives the CPU isolation provides a
relevant default value.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
kernel/workqueue.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a86688f..4303c06 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -48,6 +48,7 @@
#include <linux/nodemask.h>
#include <linux/moduleparam.h>
#include <linux/uaccess.h>
+#include <linux/housekeeping.h>

#include "workqueue_internal.h"

@@ -5524,7 +5525,7 @@ int __init workqueue_init_early(void)
WARN_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));

BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
- cpumask_copy(wq_unbound_cpumask, cpu_possible_mask);
+ cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask());

pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);

--
2.7.4

2017-07-21 13:22:31

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

The housekeeping is currently driven by nohz_full where any CPU that
is not in the nohz_full range is considered as a housekeeper. This is
a design mistake because nohz is just a detail among all the existing
isolation features. Nohz shouldn't imply anything else than tick related
things.

We rather want to drive all the isolation features from the housekeeping
subsystem which is responsible for all the work that can be either
affined (unpinned workqueues, timers, kthreads, ...) or offloaded
(scheduler tick, ...).

Let's start with a boot option to define the housekeepers. We should be
able to further enhance that through cpusets.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 2 --
init/main.c | 2 --
kernel/housekeeping.c | 22 ++++++++++------------
3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index 320cc2b..ba769c8 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -11,7 +11,6 @@ extern int housekeeping_any_cpu(void);
extern const struct cpumask *housekeeping_cpumask(void);
extern void housekeeping_affine(struct task_struct *t);
extern bool housekeeping_test_cpu(int cpu);
-extern void __init housekeeping_init(void);

#else

@@ -26,7 +25,6 @@ static inline const struct cpumask *housekeeping_cpumask(void)
}

static inline void housekeeping_affine(struct task_struct *t) { }
-static inline void housekeeping_init(void) { }
#endif /* CONFIG_NO_HZ_FULL */

static inline bool housekeeping_cpu(int cpu)
diff --git a/init/main.c b/init/main.c
index 9904a1e..9789ab7 100644
--- a/init/main.c
+++ b/init/main.c
@@ -46,7 +46,6 @@
#include <linux/cgroup.h>
#include <linux/efi.h>
#include <linux/tick.h>
-#include <linux/housekeeping.h>
#include <linux/interrupt.h>
#include <linux/taskstats_kern.h>
#include <linux/delayacct.h>
@@ -608,7 +607,6 @@ asmlinkage __visible void __init start_kernel(void)
early_irq_init();
init_IRQ();
tick_init();
- housekeeping_init();
rcu_init_nohz();
init_timers();
hrtimers_init();
diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
index f8be7e6..a54765d 100644
--- a/kernel/housekeeping.c
+++ b/kernel/housekeeping.c
@@ -45,23 +45,21 @@ bool housekeeping_test_cpu(int cpu)
return true;
}

-void __init housekeeping_init(void)
+/* Parse the boot-time housekeeping CPU list from the kernel parameters. */
+static int __init housekeeping_setup(char *str)
{
- if (!tick_nohz_full_enabled())
- return;
-
- if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
- WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
- cpumask_clear(tick_nohz_full_mask);
- tick_nohz_full_running = false;
- return;
+ alloc_bootmem_cpumask_var(&housekeeping_mask);
+ if (cpulist_parse(str, housekeeping_mask) < 0) {
+ pr_warn("Housekeeping: Incorrect cpumask\n");
+ free_bootmem_cpumask_var(housekeeping_mask);
+ return 1;
}

- cpumask_andnot(housekeeping_mask,
- cpu_possible_mask, tick_nohz_full_mask);
-
static_branch_enable(&housekeeping_overriden);

/* We need at least one CPU to handle housekeeping work */
WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
+
+ return 1;
}
+__setup("housekeeping=", housekeeping_setup);
--
2.7.4

2017-07-21 13:22:29

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC PATCH 8/9] housekeeping: Move it under own config, independant from NO_HZ

Complete the housekeeping split from CONFIG_NO_HZ_FULL by moving it
under its own config. This way we finally separate the isolation code
from nohz.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 6 +++---
init/Kconfig | 6 ++++++
kernel/Makefile | 2 +-
3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index ba769c8..d79a665 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -5,7 +5,7 @@
#include <linux/init.h>
#include <linux/tick.h>

-#ifdef CONFIG_NO_HZ_FULL
+#ifdef CONFIG_HOUSEKEEPING
DECLARE_STATIC_KEY_FALSE(housekeeping_overriden);
extern int housekeeping_any_cpu(void);
extern const struct cpumask *housekeeping_cpumask(void);
@@ -25,11 +25,11 @@ static inline const struct cpumask *housekeeping_cpumask(void)
}

static inline void housekeeping_affine(struct task_struct *t) { }
-#endif /* CONFIG_NO_HZ_FULL */
+#endif /* CONFIG_HOUSEKEEPING */

static inline bool housekeeping_cpu(int cpu)
{
-#ifdef CONFIG_NO_HZ_FULL
+#ifdef CONFIG_HOUSEKEEPING
if (static_branch_unlikely(&housekeeping_overriden))
return housekeeping_test_cpu(cpu);
#endif
diff --git a/init/Kconfig b/init/Kconfig
index 8514b25..cc7af32 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -472,6 +472,12 @@ config TASK_IO_ACCOUNTING

endmenu # "CPU/Task time and stats accounting"

+config HOUSEKEEPING
+ bool "Housekeeping work tuning"
+ help
+ Allow to affine and offload kernel internal routine jobs that are
+ usually executed on any CPU.
+
source "kernel/rcu/Kconfig"

config BUILD_BIN2C
diff --git a/kernel/Makefile b/kernel/Makefile
index 8a85c4b..58be459 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -109,7 +109,7 @@ obj-$(CONFIG_JUMP_LABEL) += jump_label.o
obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
obj-$(CONFIG_TORTURE_TEST) += torture.o
obj-$(CONFIG_MEMBARRIER) += membarrier.o
-obj-$(CONFIG_NO_HZ_FULL) += housekeeping.o
+obj-$(CONFIG_HOUSEKEEPING) += housekeeping.o

obj-$(CONFIG_HAS_IOMEM) += memremap.o

--
2.7.4

2017-07-21 13:23:03

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC PATCH 6/9] housekeeping: Rename is_housekeeping_cpu to housekeeping_cpu

To keep a proper housekeeping namespace.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 2 +-
kernel/sched/core.c | 6 +++---
kernel/sched/fair.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index cbe8d63..320cc2b 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -29,7 +29,7 @@ static inline void housekeeping_affine(struct task_struct *t) { }
static inline void housekeeping_init(void) { }
#endif /* CONFIG_NO_HZ_FULL */

-static inline bool is_housekeeping_cpu(int cpu)
+static inline bool housekeeping_cpu(int cpu)
{
#ifdef CONFIG_NO_HZ_FULL
if (static_branch_unlikely(&housekeeping_overriden))
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 751c040..adfee7f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -527,7 +527,7 @@ int get_nohz_timer_target(void)
int i, cpu = smp_processor_id();
struct sched_domain *sd;

- if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu))
+ if (!idle_cpu(cpu) && housekeeping_cpu(cpu))
return cpu;

rcu_read_lock();
@@ -536,14 +536,14 @@ int get_nohz_timer_target(void)
if (cpu == i)
continue;

- if (!idle_cpu(i) && is_housekeeping_cpu(i)) {
+ if (!idle_cpu(i) && housekeeping_cpu(i)) {
cpu = i;
goto unlock;
}
}
}

- if (!is_housekeeping_cpu(cpu))
+ if (!housekeeping_cpu(cpu))
cpu = housekeeping_any_cpu();
unlock:
rcu_read_unlock();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 05dfced..8ecd204 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8600,7 +8600,7 @@ void nohz_balance_enter_idle(int cpu)
return;

/* Spare idle load balancing on CPUs that don't want to be disturbed: */
- if (!is_housekeeping_cpu(cpu))
+ if (!housekeeping_cpu(cpu))
return;

if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
--
2.7.4

2017-07-21 13:21:44

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC PATCH 2/9] watchdog: Use housekeeping_cpumask() instead of ad-hoc version

While trying to disable the watchog on nohz_full CPUs, the watchdog
implements an ad-hoc version of housekeeping_cpumask(). Lets replace
those re-invented lines.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
kernel/watchdog.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7a9df162..cdd0d11 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -941,15 +941,10 @@ void __init lockup_detector_init(void)
{
set_sample_period();

-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled()) {
+ if (tick_nohz_full_enabled())
pr_info("Disabling watchdog on nohz_full cores by default\n");
- cpumask_copy(&watchdog_cpumask, housekeeping_mask);
- } else
- cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
-#else
- cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
-#endif
+
+ cpumask_copy(&watchdog_cpumask, housekeeping_cpumask());

if (watchdog_enabled)
watchdog_enable_all_cpus();
--
2.7.4

2017-07-21 13:23:26

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC PATCH 4/9] housekeeping: Make housekeeping cpumask private

Nobody needs to access this detail. housekeeping_cpumask() already
takes care about it.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Luiz Capitulino <[email protected]>
---
include/linux/housekeeping.h | 31 ++++++++++---------------------
kernel/housekeeping.c | 33 ++++++++++++++++++++++++++++++++-
2 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
index 64d0ee5..31a1401 100644
--- a/include/linux/housekeeping.h
+++ b/include/linux/housekeeping.h
@@ -6,46 +6,35 @@
#include <linux/tick.h>

#ifdef CONFIG_NO_HZ_FULL
-extern cpumask_var_t housekeeping_mask;
+extern int housekeeping_any_cpu(void);
+extern const struct cpumask *housekeeping_cpumask(void);
+extern void housekeeping_affine(struct task_struct *t);
+extern bool housekeeping_test_cpu(int cpu);
extern void __init housekeeping_init(void);
+
#else
-static inline void housekeeping_init(void) { }
-#endif /* CONFIG_NO_HZ_FULL */

static inline int housekeeping_any_cpu(void)
{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- return cpumask_any_and(housekeeping_mask, cpu_online_mask);
-#endif
return smp_processor_id();
}

static inline const struct cpumask *housekeeping_cpumask(void)
{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- return housekeeping_mask;
-#endif
return cpu_possible_mask;
}

+static inline void housekeeping_affine(struct task_struct *t) { }
+static inline void housekeeping_init(void) { }
+#endif /* CONFIG_NO_HZ_FULL */
+
static inline bool is_housekeeping_cpu(int cpu)
{
#ifdef CONFIG_NO_HZ_FULL
if (tick_nohz_full_enabled())
- return cpumask_test_cpu(cpu, housekeeping_mask);
+ return housekeeping_test_cpu(cpu);
#endif
return true;
}

-static inline void housekeeping_affine(struct task_struct *t)
-{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_enabled())
- set_cpus_allowed_ptr(t, housekeeping_mask);
-
-#endif
-}
-
#endif /* _LINUX_HOUSEKEEPING_H */
diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
index 6d8afd5..0183e75 100644
--- a/kernel/housekeeping.c
+++ b/kernel/housekeeping.c
@@ -10,7 +10,38 @@
#include <linux/tick.h>
#include <linux/init.h>

-cpumask_var_t housekeeping_mask;
+static cpumask_var_t housekeeping_mask;
+
+int housekeeping_any_cpu(void)
+{
+ if (tick_nohz_full_enabled())
+ return cpumask_any_and(housekeeping_mask, cpu_online_mask);
+
+ return smp_processor_id();
+}
+
+const struct cpumask *housekeeping_cpumask(void)
+{
+ if (tick_nohz_full_enabled())
+ return housekeeping_mask;
+
+ return cpu_possible_mask;
+}
+
+void housekeeping_affine(struct task_struct *t)
+{
+ if (tick_nohz_full_enabled())
+ set_cpus_allowed_ptr(t, housekeeping_mask);
+}
+
+bool housekeeping_test_cpu(int cpu)
+{
+ if (tick_nohz_full_enabled())
+ return cpumask_test_cpu(cpu, housekeeping_mask);
+
+ return true;
+}
+

void __init housekeeping_init(void)
{
--
2.7.4

2017-07-21 19:48:33

by Chris Metcalf

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] Introduce housekeeping subsystem

On 7/21/2017 9:21 AM, Frederic Weisbecker wrote:
> I'm leaving for two weeks so this is food for thoughts in the meantime :)
>
> We have a design issue with nohz_full: it drives the isolation features
> through the *housekeeping*() functions: kthreads, unpinned timers,
> watchdog, ...
>
> But things should work the other way around because the tick is just an
> isolation feature among others.
>
> So we need a housekeeping subsystem to drive all these isolation
> features, including nohz full in a later iteration. For now this is a
> basic draft. In the long run this subsystem should also drive the tick
> offloading (remove residual 1Hz) and all unbound kthreads.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> nohz/0hz
>
> HEAD: 68e3af1de5db228bf6c2a5e721bce59a02cfc4e1

For the series:

Reviewed-by: Chris Metcalf <[email protected]>

I spotted a few typos that you should grep for and fix for your next
version:
"watchog", "Lets/lets" instead of "Let's/let's", "overriden" (should
have two d's).

The new housekeeping=MASK boot option seems like it might make it a little
irritating to specify nohz_full=MASK as well. I guess if setting
NO_HZ_FULL_ALL
implied "all but housekeeping", it becomes a reasonably tidy solution.
To make
this work right you might have to make the housekeeping option early_param
instead so its value is available early enough.

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2017-08-10 12:54:46

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] Introduce housekeeping subsystem

On Fri, Jul 21, 2017 at 03:48:16PM -0400, Chris Metcalf wrote:
> On 7/21/2017 9:21 AM, Frederic Weisbecker wrote:
> >I'm leaving for two weeks so this is food for thoughts in the meantime :)
> >
> >We have a design issue with nohz_full: it drives the isolation features
> >through the *housekeeping*() functions: kthreads, unpinned timers,
> >watchdog, ...
> >
> >But things should work the other way around because the tick is just an
> >isolation feature among others.
> >
> >So we need a housekeeping subsystem to drive all these isolation
> >features, including nohz full in a later iteration. For now this is a
> >basic draft. In the long run this subsystem should also drive the tick
> >offloading (remove residual 1Hz) and all unbound kthreads.
> >
> >git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > nohz/0hz
> >
> >HEAD: 68e3af1de5db228bf6c2a5e721bce59a02cfc4e1
>
> For the series:
>
> Reviewed-by: Chris Metcalf <[email protected]>
>
> I spotted a few typos that you should grep for and fix for your next
> version:
> "watchog", "Lets/lets" instead of "Let's/let's", "overriden" (should have
> two d's).

Thanks, I'll check those.

>
> The new housekeeping=MASK boot option seems like it might make it a little
> irritating to specify nohz_full=MASK as well. I guess if setting
> NO_HZ_FULL_ALL
> implied "all but housekeeping", it becomes a reasonably tidy solution. To
> make
> this work right you might have to make the housekeeping option early_param
> instead so its value is available early enough.

Good point. But perhaps I should add a new NO_HZ_FULL_BUT_HOUSEKEEPING option.
Otherwise we'll change the meaning of NO_HZ_FULL_ALL way too much, to the point
that its default behaviour will be the exact opposite of the current one: by default
every CPU is housekeeping, so NO_HZ_FULL_ALL would have no effect anymore if we
don't set housekeeping boot option.

Also I plan to add a housekeeping option to offload the residual 1Hz tick from
nohz_full CPUs. So having "housekeeping=0,tick_offload" would make CPU 0 the
housekeeper, make the other CPUs nohz_full and handle their 1hz tick from CPU 0.

Just a few thoughts.

2017-08-10 13:57:54

by Chris Metcalf

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] Introduce housekeeping subsystem

On 8/10/2017 8:54 AM, Frederic Weisbecker wrote:
> But perhaps I should add a new NO_HZ_FULL_BUT_HOUSEKEEPING option.
> Otherwise we'll change the meaning of NO_HZ_FULL_ALL way too much, to the point
> that its default behaviour will be the exact opposite of the current one: by default
> every CPU is housekeeping, so NO_HZ_FULL_ALL would have no effect anymore if we
> don't set housekeeping boot option.

Maybe a CONFIG_HOUSEKEEPING_BOOT_ONLY as a way to restrict housekeeping
by default to just the boot cpu. In conjunction with NOHZ_FULL_ALL you would
then get the expected semantics.

> Also I plan to add a housekeeping option to offload the residual 1Hz tick from
> nohz_full CPUs. So having "housekeeping=0,tick_offload" would make CPU 0 the
> housekeeper, make the other CPUs nohz_full and handle their 1hz tick from CPU 0.

It does seem like that might be implied by requesting NOHZ_FULL on the core...
or maybe it's just implied by TASK_ISOLATION. I've done a bad job of finding time
to work on that since last year's Plumbers, but September looks good :)

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2017-08-11 06:37:22

by Mike Galbraith

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] Introduce housekeeping subsystem

On Thu, 2017-08-10 at 09:57 -0400, Chris Metcalf wrote:
> On 8/10/2017 8:54 AM, Frederic Weisbecker wrote:
> > But perhaps I should add a new NO_HZ_FULL_BUT_HOUSEKEEPING option.
> > Otherwise we'll change the meaning of NO_HZ_FULL_ALL way too much, to the point
> > that its default behaviour will be the exact opposite of the current one: by default
> > every CPU is housekeeping, so NO_HZ_FULL_ALL would have no effect anymore if we
> > don't set housekeeping boot option.
>
> Maybe a CONFIG_HOUSEKEEPING_BOOT_ONLY as a way to restrict housekeeping
> by default to just the boot cpu. In conjunction with NOHZ_FULL_ALL you would
> then get the expected semantics.

A big box with only the boot cpu for housekeeping is likely screwed.

Personally, I think NOHZ_FULL_ALL should just die.

-Mike

2017-08-11 15:01:37

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] Introduce housekeeping subsystem

On Fri, Aug 11, 2017 at 08:36:28AM +0200, Mike Galbraith wrote:
> On Thu, 2017-08-10 at 09:57 -0400, Chris Metcalf wrote:
> > On 8/10/2017 8:54 AM, Frederic Weisbecker wrote:
> > > But perhaps I should add a new NO_HZ_FULL_BUT_HOUSEKEEPING option.
> > > Otherwise we'll change the meaning of NO_HZ_FULL_ALL way too much, to the point
> > > that its default behaviour will be the exact opposite of the current one: by default
> > > every CPU is housekeeping, so NO_HZ_FULL_ALL would have no effect anymore if we
> > > don't set housekeeping boot option.
> >
> > Maybe a CONFIG_HOUSEKEEPING_BOOT_ONLY as a way to restrict housekeeping
> > by default to just the boot cpu. In conjunction with NOHZ_FULL_ALL you would
> > then get the expected semantics.
>
> A big box with only the boot cpu for housekeeping is likely screwed.

Indeed we probably shouldn't introduce new config that affine housekeeping to a
single CPU.

> Personally, I think?NOHZ_FULL_ALL should just die.

Yeah, although it's still useful for automatic boot testing to detect issues
with nohz_full on.

Thanks.

2017-08-11 15:08:27

by Chris Metcalf

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] Introduce housekeeping subsystem

On 8/11/2017 2:36 AM, Mike Galbraith wrote:
> On Thu, 2017-08-10 at 09:57 -0400, Chris Metcalf wrote:
>> On 8/10/2017 8:54 AM, Frederic Weisbecker wrote:
>>> But perhaps I should add a new NO_HZ_FULL_BUT_HOUSEKEEPING option.
>>> Otherwise we'll change the meaning of NO_HZ_FULL_ALL way too much, to the point
>>> that its default behaviour will be the exact opposite of the current one: by default
>>> every CPU is housekeeping, so NO_HZ_FULL_ALL would have no effect anymore if we
>>> don't set housekeeping boot option.
>> Maybe a CONFIG_HOUSEKEEPING_BOOT_ONLY as a way to restrict housekeeping
>> by default to just the boot cpu. In conjunction with NOHZ_FULL_ALL you would
>> then get the expected semantics.
> A big box with only the boot cpu for housekeeping is likely screwed.

Fair point - this kind of configuration would be primarily useful for
dedicated systems that were running a high-traffic-rate networking
application on many cores, for example. In this mode you don't end up
putting a lot of burden on the housekeeping core. In any case,
probably not worth adding an additional kernel config for.

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

Subject: Re: [RFC PATCH 0/9] Introduce housekeeping subsystem

On Fri, 11 Aug 2017, Chris Metcalf wrote:

> > > Maybe a CONFIG_HOUSEKEEPING_BOOT_ONLY as a way to restrict housekeeping
> > > by default to just the boot cpu. In conjunction with NOHZ_FULL_ALL you
> > > would
> > > then get the expected semantics.
> > A big box with only the boot cpu for housekeeping is likely screwed.
>
> Fair point - this kind of configuration would be primarily useful for
> dedicated systems that were running a high-traffic-rate networking
> application on many cores, for example. In this mode you don't end up
> putting a lot of burden on the housekeeping core. In any case,
> probably not worth adding an additional kernel config for.

The standard server config at this point is a two NUMA node with lots of
cores on each. For such a thing a single housekeeping cpu is usually
sufficient. Having a rather large number of NUMA nodes is unusual.

The question is also what is considered a "large" system at this point?
Lots of cores? Lots of NUMA nodes?

Ah, Chris since you are here: What is happening with the dataplane
patches?


2017-08-11 15:50:22

by Chris Metcalf

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] Introduce housekeeping subsystem

On 8/11/2017 11:35 AM, Christopher Lameter wrote:
> Ah, Chris since you are here: What is happening with the dataplane
> patches?

Work has been crazy and I keep expecting to have a chunk of time to work
on it and it doesn't happen.

September is looking relatively good though for my having time to work
on it. I really would like to get out a new spin. Fingers crossed.

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2017-08-11 19:10:14

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Fri, 21 Jul 2017 15:21:28 +0200
Frederic Weisbecker <[email protected]> wrote:

> The housekeeping is currently driven by nohz_full where any CPU that
> is not in the nohz_full range is considered as a housekeeper. This is
> a design mistake because nohz is just a detail among all the existing
> isolation features. Nohz shouldn't imply anything else than tick related
> things.
>
> We rather want to drive all the isolation features from the housekeeping
> subsystem which is responsible for all the work that can be either
> affined (unpinned workqueues, timers, kthreads, ...) or offloaded
> (scheduler tick, ...).

That makes a lot of sense. I think this is moving in the right
direction. I have a comment below though.

>
> Let's start with a boot option to define the housekeepers. We should be
> able to further enhance that through cpusets.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Chris Metcalf <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> Cc: Wanpeng Li <[email protected]>
> Cc: Luiz Capitulino <[email protected]>
> ---
> include/linux/housekeeping.h | 2 --
> init/main.c | 2 --
> kernel/housekeeping.c | 22 ++++++++++------------
> 3 files changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
> index 320cc2b..ba769c8 100644
> --- a/include/linux/housekeeping.h
> +++ b/include/linux/housekeeping.h
> @@ -11,7 +11,6 @@ extern int housekeeping_any_cpu(void);
> extern const struct cpumask *housekeeping_cpumask(void);
> extern void housekeeping_affine(struct task_struct *t);
> extern bool housekeeping_test_cpu(int cpu);
> -extern void __init housekeeping_init(void);
>
> #else
>
> @@ -26,7 +25,6 @@ static inline const struct cpumask *housekeeping_cpumask(void)
> }
>
> static inline void housekeeping_affine(struct task_struct *t) { }
> -static inline void housekeeping_init(void) { }
> #endif /* CONFIG_NO_HZ_FULL */
>
> static inline bool housekeeping_cpu(int cpu)
> diff --git a/init/main.c b/init/main.c
> index 9904a1e..9789ab7 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -46,7 +46,6 @@
> #include <linux/cgroup.h>
> #include <linux/efi.h>
> #include <linux/tick.h>
> -#include <linux/housekeeping.h>
> #include <linux/interrupt.h>
> #include <linux/taskstats_kern.h>
> #include <linux/delayacct.h>
> @@ -608,7 +607,6 @@ asmlinkage __visible void __init start_kernel(void)
> early_irq_init();
> init_IRQ();
> tick_init();
> - housekeeping_init();
> rcu_init_nohz();
> init_timers();
> hrtimers_init();
> diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
> index f8be7e6..a54765d 100644
> --- a/kernel/housekeeping.c
> +++ b/kernel/housekeeping.c
> @@ -45,23 +45,21 @@ bool housekeeping_test_cpu(int cpu)
> return true;
> }
>
> -void __init housekeeping_init(void)
> +/* Parse the boot-time housekeeping CPU list from the kernel parameters. */
> +static int __init housekeeping_setup(char *str)
> {
> - if (!tick_nohz_full_enabled())
> - return;
> -
> - if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
> - WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
> - cpumask_clear(tick_nohz_full_mask);
> - tick_nohz_full_running = false;
> - return;
> + alloc_bootmem_cpumask_var(&housekeeping_mask);
> + if (cpulist_parse(str, housekeeping_mask) < 0) {
> + pr_warn("Housekeeping: Incorrect cpumask\n");
> + free_bootmem_cpumask_var(housekeeping_mask);
> + return 1;
> }
>
> - cpumask_andnot(housekeeping_mask,
> - cpu_possible_mask, tick_nohz_full_mask);
> -
> static_branch_enable(&housekeeping_overriden);
>
> /* We need at least one CPU to handle housekeeping work */
> WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
> +
> + return 1;
> }
> +__setup("housekeeping=", housekeeping_setup);

Am I right that from now on nohz_full= users will also have
to specify housekeeping= in order to get nohz_full working?
If that's correct, then won't this patch break nohz_full for
existing setups?

Also, I just give this series a try and got this:

[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.13.0-rc4+ root=/dev/mapper/rhel_virtlab508-root ro crashkernel=auto rd.lvm.lv=rhel_virtlab508/root rd.lvm.lv=rhel_virtlab508/swap console=ttyS1,115200 LANG=en_US.UTF-8 housekeeping=0,2,4,6,8,10,12,14,1 isolcpus=15 nohz_full=15 intel_pstate=disable
[ 0.000000] static_key_slow_inc used before call to jump_label_init
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:108 static_key_slow_inc+0x86/0xa0
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.13.0-rc4+ #2
[ 0.000000] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015
[ 0.000000] task: ffffffffb6010480 task.stack: ffffffffb6000000
[ 0.000000] RIP: 0010:static_key_slow_inc+0x86/0xa0
[ 0.000000] RSP: 0000:ffffffffb6003d98 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
[ 0.000000] RAX: 0000000000000037 RBX: ffffffffb66aa780 RCX: ffffffffb6061308
[ 0.000000] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000002
[ 0.000000] RBP: ffffffffb6003da0 R08: 6b5f636974617473 R09: 00000000000001e4
[ 0.000000] R10: 776f6c735f79656b R11: 0000000000000000 R12: ffff972c3ffd1cfe
[ 0.000000] R13: ffffffffffffffff R14: 0000000000000000 R15: 000000000000000d
[ 0.000000] FS: 0000000000000000(0000) GS:ffff97282ea00000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000] CR2: ffff972905974000 CR3: 0000000545209000 CR4: 00000000000406b0
[ 0.000000] Call Trace:
[ 0.000000] static_key_enable+0x1d/0x30
[ 0.000000] housekeeping_setup+0x5a/0x7e
[ 0.000000] unknown_bootoption+0x8b/0x19a
[ 0.000000] parse_args+0x224/0x3b0
[ 0.000000] ? set_init_arg+0x5a/0x5a
[ 0.000000] start_kernel+0x209/0x4cd
[ 0.000000] ? set_init_arg+0x5a/0x5a
[ 0.000000] ? early_idt_handler_array+0x120/0x120
[ 0.000000] x86_64_start_reservations+0x24/0x26
[ 0.000000] x86_64_start_kernel+0x14c/0x16f
[ 0.000000] secondary_startup_64+0x9f/0x9f

2017-08-11 19:22:51

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC PATCH 0/9] Introduce housekeeping subsystem

On Fri, 11 Aug 2017 17:01:30 +0200
Frederic Weisbecker <[email protected]> wrote:

> > Personally, I think NOHZ_FULL_ALL should just die.
>
> Yeah, although it's still useful for automatic boot testing to detect issues
> with nohz_full on.

Maybe we could rename/modify it to be a boot-time testing option for
nohz_full?

I think we may want to simplify config and kernel command-line options.
I don't think it's a good idea to break down every single isolation
feature into a kernel command-line option. What we want in the end is
having the ability to remove all the noise from a CPU right? Why not
have a single option for that?

2017-08-12 14:10:11

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Fri, Aug 11, 2017 at 03:09:57PM -0400, Luiz Capitulino wrote:
> On Fri, 21 Jul 2017 15:21:28 +0200
> Frederic Weisbecker <[email protected]> wrote:
> > -void __init housekeeping_init(void)
> > +/* Parse the boot-time housekeeping CPU list from the kernel parameters. */
> > +static int __init housekeeping_setup(char *str)
> > {
> > - if (!tick_nohz_full_enabled())
> > - return;
> > -
> > - if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
> > - WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
> > - cpumask_clear(tick_nohz_full_mask);
> > - tick_nohz_full_running = false;
> > - return;
> > + alloc_bootmem_cpumask_var(&housekeeping_mask);
> > + if (cpulist_parse(str, housekeeping_mask) < 0) {
> > + pr_warn("Housekeeping: Incorrect cpumask\n");
> > + free_bootmem_cpumask_var(housekeeping_mask);
> > + return 1;
> > }
> >
> > - cpumask_andnot(housekeeping_mask,
> > - cpu_possible_mask, tick_nohz_full_mask);
> > -
> > static_branch_enable(&housekeeping_overriden);
> >
> > /* We need at least one CPU to handle housekeeping work */
> > WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
> > +
> > + return 1;
> > }
> > +__setup("housekeeping=", housekeeping_setup);
>
> Am I right that from now on nohz_full= users will also have
> to specify housekeeping= in order to get nohz_full working?
> If that's correct, then won't this patch break nohz_full for
> existing setups?

nohz_full= will still work but will only imply tick stop. A few isolation
details that were enabled by nohz_full= won't be handled anymore such as:
unbound timers affinity, watchdog disablement, rcu threads affinity, sched idle
load balancing... Those are now handled by housekeeping=

So yes in a sense, this can break some setup that assume nohz_full= does more
than stopping the tick.

Perhaps I should remove the nohz_full= parameter altogether and let nohz_full controlled
by housekeeping= only. How much can kernel parameters be considered as kernel ABIs?

Also I'm wondering if "housekeeping=" is a clear name for users. "isolation=" or
"cpu_isolation=" would be better and more obvious. Housekeeping based naming would only be
internal implementation detail. And deactivating the tick through "cpu_isolation=" would
be clearer than if we did through "housekeeping=".

Of course the problem is that we already have "isolcpus=". But re-implementing isolcpus
on top of housekeeping might be a good idea. I believe that the current implementation on
top of NULL domains isn't much beloved. A less controversial implementation might even
allow us to control it though cpusets.

>
> Also, I just give this series a try and got this:
>
> [ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.13.0-rc4+ root=/dev/mapper/rhel_virtlab508-root ro crashkernel=auto rd.lvm.lv=rhel_virtlab508/root rd.lvm.lv=rhel_virtlab508/swap console=ttyS1,115200 LANG=en_US.UTF-8 housekeeping=0,2,4,6,8,10,12,14,1 isolcpus=15 nohz_full=15 intel_pstate=disable
> [ 0.000000] static_key_slow_inc used before call to jump_label_init
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:108 static_key_slow_inc+0x86/0xa0

Oops ^_^

Thanks.

2017-08-13 15:13:56

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Sat, 12 Aug 2017 16:10:06 +0200
Frederic Weisbecker <[email protected]> wrote:

> On Fri, Aug 11, 2017 at 03:09:57PM -0400, Luiz Capitulino wrote:
> > On Fri, 21 Jul 2017 15:21:28 +0200
> > Frederic Weisbecker <[email protected]> wrote:
> > > -void __init housekeeping_init(void)
> > > +/* Parse the boot-time housekeeping CPU list from the kernel parameters. */
> > > +static int __init housekeeping_setup(char *str)
> > > {
> > > - if (!tick_nohz_full_enabled())
> > > - return;
> > > -
> > > - if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
> > > - WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
> > > - cpumask_clear(tick_nohz_full_mask);
> > > - tick_nohz_full_running = false;
> > > - return;
> > > + alloc_bootmem_cpumask_var(&housekeeping_mask);
> > > + if (cpulist_parse(str, housekeeping_mask) < 0) {
> > > + pr_warn("Housekeeping: Incorrect cpumask\n");
> > > + free_bootmem_cpumask_var(housekeeping_mask);
> > > + return 1;
> > > }
> > >
> > > - cpumask_andnot(housekeeping_mask,
> > > - cpu_possible_mask, tick_nohz_full_mask);
> > > -
> > > static_branch_enable(&housekeeping_overriden);
> > >
> > > /* We need at least one CPU to handle housekeeping work */
> > > WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
> > > +
> > > + return 1;
> > > }
> > > +__setup("housekeeping=", housekeeping_setup);
> >
> > Am I right that from now on nohz_full= users will also have
> > to specify housekeeping= in order to get nohz_full working?
> > If that's correct, then won't this patch break nohz_full for
> > existing setups?
>
> nohz_full= will still work but will only imply tick stop. A few isolation
> details that were enabled by nohz_full= won't be handled anymore such as:
> unbound timers affinity, watchdog disablement, rcu threads affinity, sched idle
> load balancing... Those are now handled by housekeeping=
>
> So yes in a sense, this can break some setup that assume nohz_full= does more
> than stopping the tick.

Yes, the problem is that this is how it has always worked. Also,
the breakage will be very subtle and hard to debug.

> Perhaps I should remove the nohz_full= parameter altogether and let nohz_full controlled
> by housekeeping= only. How much can kernel parameters be considered as kernel ABIs?

That's a very good question, I don't have an answer for that.

> Also I'm wondering if "housekeeping=" is a clear name for users. "isolation=" or
> "cpu_isolation=" would be better and more obvious. Housekeeping based naming would only be
> internal implementation detail. And deactivating the tick through "cpu_isolation=" would
> be clearer than if we did through "housekeeping=".

That's exactly my thinking while I was reviewing the series!

> Of course the problem is that we already have "isolcpus=". But re-implementing isolcpus
> on top of housekeeping might be a good idea. I believe that the current implementation on
> top of NULL domains isn't much beloved. A less controversial implementation might even
> allow us to control it though cpusets.

You're completely right. Some people don't use isolcpus= because it
disables load balancing and that may be a problem for setups where
tasks are pinned to a set of CPUs where the number of tasks is greater
than the number of CPUs. However, for the cases where you have a
single task pinned to a CPU, having load balancing taking place adds
an extra latency (I won't remember how much, but I guess it was more
than 10us).

If there's a way to "disable" load balancing from user-space, say
with cpusets, then I think we should keep the isolated CPUs attached
to a domain as you suggest.

Another detail about isolcpus= is that it doesn't isolate the CPU
from kernel threads. That is, unpinned kernel threads are allowed
to run on CPUs not isolated with isolcpus=. We might consider changing
that for a new isolation option.

I know that there are many arguments against isolcpus= and some people
advice using cpusets. The problem with that advice is that isolcpus=
goes a bit beyond isolating a CPU from user-space tasks. One additional
thing is does for example, is pinning the kernel_init() thread to
housekeeping CPUs. This is key, because that thread will create timers
at early boot that will pin themselves to the CPU they run.

Finally, I'm wondering how all this will fit together with TASK_ISOLATION.
One of the questions I ask myself is: can/should the things TASK_ISOLATION
does be done by a kernel command-line parameter instead? Or should we
try to come up with a list of global things to control (eg. the tick,
kernel thread affinity, etc) and per-task controls?

2017-08-14 17:01:14

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Sun, Aug 13, 2017 at 11:13:40AM -0400, Luiz Capitulino wrote:
> On Sat, 12 Aug 2017 16:10:06 +0200
> Frederic Weisbecker <[email protected]> wrote:
> > > Am I right that from now on nohz_full= users will also have
> > > to specify housekeeping= in order to get nohz_full working?
> > > If that's correct, then won't this patch break nohz_full for
> > > existing setups?
> >
> > nohz_full= will still work but will only imply tick stop. A few isolation
> > details that were enabled by nohz_full= won't be handled anymore such as:
> > unbound timers affinity, watchdog disablement, rcu threads affinity, sched idle
> > load balancing... Those are now handled by housekeeping=
> >
> > So yes in a sense, this can break some setup that assume nohz_full= does more
> > than stopping the tick.
>
> Yes, the problem is that this is how it has always worked. Also,
> the breakage will be very subtle and hard to debug.

[...]

>
> > Perhaps I should remove the nohz_full= parameter altogether and let nohz_full controlled
> > by housekeeping= only. How much can kernel parameters be considered as kernel ABIs?
>
> That's a very good question, I don't have an answer for that.

That said, "nohz_full=" never implied too much isolation features so far, and those have
often changed over time, as in RCU. I think unbound timer affinity is the most important
one.

Perhaps we can keep "nohz_full=1-15" as an alias for a future "cpu_isolation=nohz,1-15"
and at least imply unbound timer affinity with it.

>
> > Also I'm wondering if "housekeeping=" is a clear name for users. "isolation=" or
> > "cpu_isolation=" would be better and more obvious. Housekeeping based naming would only be
> > internal implementation detail. And deactivating the tick through "cpu_isolation=" would
> > be clearer than if we did through "housekeeping=".
>
> That's exactly my thinking while I was reviewing the series!
>
> > Of course the problem is that we already have "isolcpus=". But re-implementing isolcpus
> > on top of housekeeping might be a good idea. I believe that the current implementation on
> > top of NULL domains isn't much beloved. A less controversial implementation might even
> > allow us to control it though cpusets.
>
> You're completely right. Some people don't use isolcpus= because it
> disables load balancing and that may be a problem for setups where
> tasks are pinned to a set of CPUs where the number of tasks is greater
> than the number of CPUs. However, for the cases where you have a
> single task pinned to a CPU, having load balancing taking place adds
> an extra latency (I won't remember how much, but I guess it was more
> than 10us).

What is the source of the load balancing inducing such latency when a single
task is affine to a CPU? If this is idle load balancing, it is now affine to
housekeepers. If this is task wakeup then it's suprising because select_task_rq()
is optimized toward single CPU affinity.

Is there another source I'm overlooking?

> If there's a way to "disable" load balancing from user-space, say
> with cpusets, then I think we should keep the isolated CPUs attached
> to a domain as you suggest.

I'm not sure such a solution would be accepted. The most sensible way
to disable load balancing is still to tune the affinity of tasks. If there
is an off-case overhead with load balancing (ie: when no more than one
task is affine to that CPU) then we should solve that with a fast path.

> Another detail about isolcpus= is that it doesn't isolate the CPU
> from kernel threads. That is, unpinned kernel threads are allowed
> to run on CPUs not isolated with isolcpus=. We might consider changing
> that for a new isolation option.

You mean unpinned kernel threads are allowed to run on isolcpus, right?
That definetly can be solved.

>
> I know that there are many arguments against isolcpus= and some people
> advice using cpusets. The problem with that advice is that isolcpus=
> goes a bit beyond isolating a CPU from user-space tasks. One additional
> thing is does for example, is pinning the kernel_init() thread to
> housekeeping CPUs. This is key, because that thread will create timers
> at early boot that will pin themselves to the CPU they run.

Right, but also unbound timers are affine to housekeepers, we needed that for
nohz_full.

> Finally, I'm wondering how all this will fit together with TASK_ISOLATION.
> One of the questions I ask myself is: can/should the things TASK_ISOLATION
> does be done by a kernel command-line parameter instead? Or should we
> try to come up with a list of global things to control (eg. the tick,
> kernel thread affinity, etc) and per-task controls?

So I've been thinking a lot about that lately. I told Chris that TASK_ISOLATION
shouldn't be a CPU feature but a task feature. Then I realized that it doesn't work
either, my bad :-) In the end I think that the most part of it must be a CPU
property: nohz, task isolation, timers and workqueue affinity, etc... Then what's
left for the per task thing is to tell it when it is unexpectingly interrupted by noise.

Therefore I think most of the isolation features should be controlled by
command line and cpusets (through a new cpuset subsystem maybe) then TASK_ISOLATION
through prtcl() for the noise monitoring.

Thanks.

2017-08-14 17:34:56

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Mon, 14 Aug 2017 19:01:09 +0200
Frederic Weisbecker <[email protected]> wrote:

> > > Perhaps I should remove the nohz_full= parameter altogether and let nohz_full controlled
> > > by housekeeping= only. How much can kernel parameters be considered as kernel ABIs?
> >
> > That's a very good question, I don't have an answer for that.
>
> That said, "nohz_full=" never implied too much isolation features so far, and those have
> often changed over time, as in RCU. I think unbound timer affinity is the most important
> one.
>
> Perhaps we can keep "nohz_full=1-15" as an alias for a future "cpu_isolation=nohz,1-15"
> and at least imply unbound timer affinity with it.

That would work for me.

> > > Also I'm wondering if "housekeeping=" is a clear name for users. "isolation=" or
> > > "cpu_isolation=" would be better and more obvious. Housekeeping based naming would only be
> > > internal implementation detail. And deactivating the tick through "cpu_isolation=" would
> > > be clearer than if we did through "housekeeping=".
> >
> > That's exactly my thinking while I was reviewing the series!
> >
> > > Of course the problem is that we already have "isolcpus=". But re-implementing isolcpus
> > > on top of housekeeping might be a good idea. I believe that the current implementation on
> > > top of NULL domains isn't much beloved. A less controversial implementation might even
> > > allow us to control it though cpusets.
> >
> > You're completely right. Some people don't use isolcpus= because it
> > disables load balancing and that may be a problem for setups where
> > tasks are pinned to a set of CPUs where the number of tasks is greater
> > than the number of CPUs. However, for the cases where you have a
> > single task pinned to a CPU, having load balancing taking place adds
> > an extra latency (I won't remember how much, but I guess it was more
> > than 10us).
>
> What is the source of the load balancing inducing such latency when a single
> task is affine to a CPU? If this is idle load balancing, it is now affine to
> housekeepers. If this is task wakeup then it's suprising because select_task_rq()
> is optimized toward single CPU affinity.

I guess it was idle load balancing, but I don't remember because this
was a few years ago. I think this might be reproducible without using
isolcpus=. I'll give it a try shortly and let you know.

> Is there another source I'm overlooking?
>
> > If there's a way to "disable" load balancing from user-space, say
> > with cpusets, then I think we should keep the isolated CPUs attached
> > to a domain as you suggest.
>
> I'm not sure such a solution would be accepted. The most sensible way
> to disable load balancing is still to tune the affinity of tasks. If there
> is an off-case overhead with load balancing (ie: when no more than one
> task is affine to that CPU) then we should solve that with a fast path.

OK, I'll take a look.

> > Another detail about isolcpus= is that it doesn't isolate the CPU
> > from kernel threads. That is, unpinned kernel threads are allowed
> > to run on CPUs not isolated with isolcpus=. We might consider changing
> > that for a new isolation option.
>
> You mean unpinned kernel threads are allowed to run on isolcpus, right?

Exactly.

> That definetly can be solved.
>
> >
> > I know that there are many arguments against isolcpus= and some people
> > advice using cpusets. The problem with that advice is that isolcpus=
> > goes a bit beyond isolating a CPU from user-space tasks. One additional
> > thing is does for example, is pinning the kernel_init() thread to
> > housekeeping CPUs. This is key, because that thread will create timers
> > at early boot that will pin themselves to the CPU they run.
>
> Right, but also unbound timers are affine to housekeepers, we needed that for
> nohz_full.
>
> > Finally, I'm wondering how all this will fit together with TASK_ISOLATION.
> > One of the questions I ask myself is: can/should the things TASK_ISOLATION
> > does be done by a kernel command-line parameter instead? Or should we
> > try to come up with a list of global things to control (eg. the tick,
> > kernel thread affinity, etc) and per-task controls?
>
> So I've been thinking a lot about that lately. I told Chris that TASK_ISOLATION
> shouldn't be a CPU feature but a task feature. Then I realized that it doesn't work
> either, my bad :-) In the end I think that the most part of it must be a CPU
> property: nohz, task isolation, timers and workqueue affinity, etc... Then what's
> left for the per task thing is to tell it when it is unexpectingly interrupted by noise.
>
> Therefore I think most of the isolation features should be controlled by
> command line and cpusets (through a new cpuset subsystem maybe) then TASK_ISOLATION
> through prtcl() for the noise monitoring.

I agree.

2017-08-14 18:30:36

by Mike Galbraith

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Mon, 2017-08-14 at 13:34 -0400, Luiz Capitulino wrote:
> On Mon, 14 Aug 2017 19:01:09 +0200
>
> > What is the source of the load balancing inducing such latency when a single
> > task is affine to a CPU? If this is idle load balancing, it is now affine to
> > housekeepers. If this is task wakeup then it's suprising because select_task_rq()
> > is optimized toward single CPU affinity.
>
> I guess it was idle load balancing, but I don't remember because this
> was a few years ago. I think this might be reproducible without using
> isolcpus=. I'll give it a try shortly and let you know.

idle_balance() can swamp other noise by a couple orders of magnitude,

-Mike

2017-08-15 13:07:18

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Mon, Aug 14, 2017 at 08:29:46PM +0200, Mike Galbraith wrote:
> On Mon, 2017-08-14 at 13:34 -0400, Luiz Capitulino wrote:
> > On Mon, 14 Aug 2017 19:01:09 +0200
> >
> > > What is the source of the load balancing inducing such latency when a single
> > > task is affine to a CPU? If this is idle load balancing, it is now affine to
> > > housekeepers. If this is task wakeup then it's suprising because select_task_rq()
> > > is optimized toward single CPU affinity.
> >
> > I guess it was idle load balancing, but I don't remember because this
> > was a few years ago. I think this might be reproducible without using
> > isolcpus=. I'll give it a try shortly and let you know.
>
> idle_balance() can swamp other noise by a couple orders of magnitude,

Ah I missed that one. Is there any way we can also lower the overhead there?
It looks unfortunately hard to tell if there is only one task affine to a given CPU,
assertion on top of which we could make a fast exit.

2017-08-15 15:16:20

by Mike Galbraith

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Tue, 2017-08-15 at 15:07 +0200, Frederic Weisbecker wrote:
> On Mon, Aug 14, 2017 at 08:29:46PM +0200, Mike Galbraith wrote:
> > On Mon, 2017-08-14 at 13:34 -0400, Luiz Capitulino wrote:
> > > On Mon, 14 Aug 2017 19:01:09 +0200
> > >
> > > > What is the source of the load balancing inducing such latency when a single
> > > > task is affine to a CPU? If this is idle load balancing, it is now affine to
> > > > housekeepers. If this is task wakeup then it's suprising because select_task_rq()
> > > > is optimized toward single CPU affinity.
> > >
> > > I guess it was idle load balancing, but I don't remember because this
> > > was a few years ago. I think this might be reproducible without using
> > > isolcpus=. I'll give it a try shortly and let you know.
> >
> > idle_balance() can swamp other noise by a couple orders of magnitude,
>
> Ah I missed that one. Is there any way we can also lower the overhead there?

Why?  HPC proggies won't benefit from a partially filled pothole any
more that a ~zero ground clearance formula 1 car would.  The pothole
intolerant either isolate, killing (most) LB, or they meet a wall.

-Mike

2017-08-15 15:30:11

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Tue, Aug 15, 2017 at 05:15:23PM +0200, Mike Galbraith wrote:
> On Tue, 2017-08-15 at 15:07 +0200, Frederic Weisbecker wrote:
> > On Mon, Aug 14, 2017 at 08:29:46PM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-08-14 at 13:34 -0400, Luiz Capitulino wrote:
> > > > On Mon, 14 Aug 2017 19:01:09 +0200
> > > >
> > > > > What is the source of the load balancing inducing such latency when a single
> > > > > task is affine to a CPU? If this is idle load balancing, it is now affine to
> > > > > housekeepers. If this is task wakeup then it's suprising because select_task_rq()
> > > > > is optimized toward single CPU affinity.
> > > >
> > > > I guess it was idle load balancing, but I don't remember because this
> > > > was a few years ago. I think this might be reproducible without using
> > > > isolcpus=. I'll give it a try shortly and let you know.
> > >
> > > idle_balance() can swamp other noise by a couple orders of magnitude,
> >
> > Ah I missed that one. Is there any way we can also lower the overhead there?
>
> Why? ?HPC proggies won't benefit from a partially filled pothole any
> more that a ~zero ground clearance formula 1 car would. ?The pothole
> intolerant either isolate, killing (most) LB, or they meet a wall.

Don't the HPC guys just disable idle_balance(), or am I out of date again?

Thanx, Paul

Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Tue, 15 Aug 2017, Paul E. McKenney wrote:

> Don't the HPC guys just disable idle_balance(), or am I out of date again?

Ummm.. Why does idle management matter when your goal is to keep all
processor busy working at maximum throughput?

2017-08-15 15:54:58

by Mike Galbraith

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Tue, 2017-08-15 at 08:30 -0700, Paul E. McKenney wrote:
> On Tue, Aug 15, 2017 at 05:15:23PM +0200, Mike Galbraith wrote:
> > On Tue, 2017-08-15 at 15:07 +0200, Frederic Weisbecker wrote:
> > > On Mon, Aug 14, 2017 at 08:29:46PM +0200, Mike Galbraith wrote:
> > > > On Mon, 2017-08-14 at 13:34 -0400, Luiz Capitulino wrote:
> > > > > On Mon, 14 Aug 2017 19:01:09 +0200
> > > > >
> > > > > > What is the source of the load balancing inducing such latency when a single
> > > > > > task is affine to a CPU? If this is idle load balancing, it is now affine to
> > > > > > housekeepers. If this is task wakeup then it's suprising because select_task_rq()
> > > > > > is optimized toward single CPU affinity.
> > > > >
> > > > > I guess it was idle load balancing, but I don't remember because this
> > > > > was a few years ago. I think this might be reproducible without using
> > > > > isolcpus=. I'll give it a try shortly and let you know.
> > > >
> > > > idle_balance() can swamp other noise by a couple orders of magnitude,
> > >
> > > Ah I missed that one. Is there any way we can also lower the overhead there?
> >
> > Why?  HPC proggies won't benefit from a partially filled pothole any
> > more that a ~zero ground clearance formula 1 car would.  The pothole
> > intolerant either isolate, killing (most) LB, or they meet a wall.
>
> Don't the HPC guys just disable idle_balance(), or am I out of date again?

They could do just that if what they're doing is not really critical.
I'm not an HPC guy, so can only speculate.  I don't see much difference
between HPC and RT though, the rules of the game seem to be about the
same (them both being HPC;).. what you can control, you do control.

-Mike

2017-08-15 15:58:35

by Mike Galbraith

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Tue, 2017-08-15 at 10:52 -0500, Christopher Lameter wrote:
> On Tue, 15 Aug 2017, Paul E. McKenney wrote:
>
> > Don't the HPC guys just disable idle_balance(), or am I out of date again?
>
> Ummm.. Why does idle management matter when your goal is to keep all
> processor busy working at maximum throughput?

If you _never_ idle, you never have to worry about it.  Is 100% CPU
until the end of time all there is to HPC?

-Mike

Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Tue, 15 Aug 2017, Mike Galbraith wrote:

> On Tue, 2017-08-15 at 10:52 -0500, Christopher Lameter wrote:
> > On Tue, 15 Aug 2017, Paul E. McKenney wrote:
> >
> > > Don't the HPC guys just disable idle_balance(), or am I out of date again?
> >
> > Ummm.. Why does idle management matter when your goal is to keep all
> > processor busy working at maximum throughput?
>
> If you _never_ idle, you never have to worry about it.  Is 100% CPU
> until the end of time all there is to HPC?

Most of the time that is true for HPC loads. They may also go through a
I/O throughput constrained processing phase or synchronization phase where
idle activity occurs.

However, there are also low latency loads that are often confused with
HPC. Those are usually waiting idle until an event happens and then have
to react to it in the fastest way possible. After that they go back to
idle. You could call this a RT load (since the term seems to be so
flexible...) but its more event based than the typical idea of realtime
(do something at this and thata time).

2017-08-16 18:01:44

by Mike Galbraith

[permalink] [raw]
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

On Tue, 2017-08-15 at 11:26 -0500, Christopher Lameter wrote:
> On Tue, 15 Aug 2017, Mike Galbraith wrote:
>
> > On Tue, 2017-08-15 at 10:52 -0500, Christopher Lameter wrote:
> > > On Tue, 15 Aug 2017, Paul E. McKenney wrote:
> > >
> > > > Don't the HPC guys just disable idle_balance(), or am I out of date again?
> > >
> > > Ummm.. Why does idle management matter when your goal is to keep all
> > > processor busy working at maximum throughput?
> >
> > If you _never_ idle, you never have to worry about it.  Is 100% CPU
> > until the end of time all there is to HPC?
>
> Most of the time that is true for HPC loads. They may also go through a
> I/O throughput constrained processing phase or synchronization phase where
> idle activity occurs.

That synchronization is what I tend to get all hung up on pondering HPC
vs RT terminology.  Care and feeding of parallel pipelines has got to
be loaded to the gills with synchronization (intermediate math results
etc), rendering the sum event driven.

Doesn't matter, both acronyms reduce to latency intolerant.

-Mike