2015-04-24 15:58:41

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 0/4] nohz: A few improvements v3

Since v2, this set only changes
"context_tracking: Inherit TIF_NOHZ through forks instead of context switches"
which now warns when tasklist is populated earlier than expected as per
Peterz suggestion.

Simpler.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
nohz/core

HEAD: 3c2b3b78a1cca2865a932d0265690997db9706b1

Thanks,
Frederic
---

Chris Metcalf (2):
nohz: Add tick_nohz_full_add_cpus_to() API
nohz: Set isolcpus when nohz_full is set

Frederic Weisbecker (2):
context_tracking: Protect against recursion
context_tracking: Inherit TIF_NOHZ through forks instead of context switches


include/linux/context_tracking.h | 10 -----
include/linux/context_tracking_state.h | 1 +
include/linux/sched.h | 3 ++
include/linux/tick.h | 7 ++++
kernel/context_tracking.c | 69 +++++++++++++++++++++++-----------
kernel/sched/core.c | 4 +-
6 files changed, 61 insertions(+), 33 deletions(-)


2015-04-24 16:07:59

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 1/4] context_tracking: Protect against recursion

Context tracking recursion can happen when an exception triggers in the
middle of a call to a context tracking probe.

This special case can be caused by vmalloc faults. If an access to a
memory area allocated by vmalloc happens in the middle of
context_tracking_enter(), we may run into an endless fault loop because
the exception in turn calls context_tracking_enter() which faults on
the same vmalloc'ed memory, triggering an exception again, etc...

Some rare crashes have been reported so lets protect against this with
a recursion counter.

Reported-by: Dave Jones <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Oleg Nesterov <[email protected]>
---
include/linux/context_tracking_state.h | 1 +
kernel/context_tracking.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 31 insertions(+)

diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
index 6b7b96a..678ecdf 100644
--- a/include/linux/context_tracking_state.h
+++ b/include/linux/context_tracking_state.h
@@ -12,6 +12,7 @@ struct context_tracking {
* may be further optimized using static keys.
*/
bool active;
+ int recursion;
enum ctx_state {
CONTEXT_KERNEL = 0,
CONTEXT_USER,
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 72d59a1..b9e0b4f 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -38,6 +38,25 @@ void context_tracking_cpu_set(int cpu)
}
}

+static bool context_tracking_recursion_enter(void)
+{
+ int recursion;
+
+ recursion = __this_cpu_inc_return(context_tracking.recursion);
+ if (recursion == 1)
+ return true;
+
+ WARN_ONCE((recursion < 1), "Invalid context tracking recursion value %d\n", recursion);
+ __this_cpu_dec(context_tracking.recursion);
+
+ return false;
+}
+
+static void context_tracking_recursion_exit(void)
+{
+ __this_cpu_dec(context_tracking.recursion);
+}
+
/**
* context_tracking_enter - Inform the context tracking that the CPU is going
* enter user or guest space mode.
@@ -75,6 +94,11 @@ void context_tracking_enter(enum ctx_state state)
WARN_ON_ONCE(!current->mm);

local_irq_save(flags);
+ if (!context_tracking_recursion_enter()) {
+ local_irq_restore(flags);
+ return;
+ }
+
if ( __this_cpu_read(context_tracking.state) != state) {
if (__this_cpu_read(context_tracking.active)) {
/*
@@ -105,6 +129,7 @@ void context_tracking_enter(enum ctx_state state)
*/
__this_cpu_write(context_tracking.state, state);
}
+ context_tracking_recursion_exit();
local_irq_restore(flags);
}
NOKPROBE_SYMBOL(context_tracking_enter);
@@ -139,6 +164,10 @@ void context_tracking_exit(enum ctx_state state)
return;

local_irq_save(flags);
+ if (!context_tracking_recursion_enter()) {
+ local_irq_restore(flags);
+ return;
+ }
if (__this_cpu_read(context_tracking.state) == state) {
if (__this_cpu_read(context_tracking.active)) {
/*
@@ -153,6 +182,7 @@ void context_tracking_exit(enum ctx_state state)
}
__this_cpu_write(context_tracking.state, CONTEXT_KERNEL);
}
+ context_tracking_recursion_exit();
local_irq_restore(flags);
}
NOKPROBE_SYMBOL(context_tracking_exit);
--
2.1.4

2015-04-24 15:58:47

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 2/4] context_tracking: Inherit TIF_NOHZ through forks instead of context switches

TIF_NOHZ is used by context_tracking to force syscall slow-path on every
task in order to track userspace roundtrips. As such, it must be set on
all running tasks.

It's currently explicitly inherited through context switches. There is
no need to do it on this fast-path though. The flag could be simply
set once for all on all tasks, whether they are running or not.

Lets do this by setting the flag to init task on early boot and let it
propagate through fork inheritance.

While at it mark context_tracking_cpu_set() as init code, we only need
it at early boot time.

Suggested-by: Oleg Nesterov <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Oleg Nesterov <[email protected]>
---
include/linux/context_tracking.h | 10 ---------
include/linux/sched.h | 3 +++
kernel/context_tracking.c | 45 ++++++++++++++++++----------------------
kernel/sched/core.c | 1 -
4 files changed, 23 insertions(+), 36 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 2821838..b96bd29 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -14,8 +14,6 @@ extern void context_tracking_enter(enum ctx_state state);
extern void context_tracking_exit(enum ctx_state state);
extern void context_tracking_user_enter(void);
extern void context_tracking_user_exit(void);
-extern void __context_tracking_task_switch(struct task_struct *prev,
- struct task_struct *next);

static inline void user_enter(void)
{
@@ -51,19 +49,11 @@ static inline void exception_exit(enum ctx_state prev_ctx)
}
}

-static inline void context_tracking_task_switch(struct task_struct *prev,
- struct task_struct *next)
-{
- if (context_tracking_is_enabled())
- __context_tracking_task_switch(prev, next);
-}
#else
static inline void user_enter(void) { }
static inline void user_exit(void) { }
static inline enum ctx_state exception_enter(void) { return 0; }
static inline void exception_exit(enum ctx_state prev_ctx) { }
-static inline void context_tracking_task_switch(struct task_struct *prev,
- struct task_struct *next) { }
#endif /* !CONFIG_CONTEXT_TRACKING */


diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8222ae4..f5d4f9e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2540,6 +2540,9 @@ static inline unsigned long wait_task_inactive(struct task_struct *p,
}
#endif

+#define tasklist_empty() \
+ list_empty(&init_task.tasks)
+
#define next_task(p) \
list_entry_rcu((p)->tasks.next, struct task_struct, tasks)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index b9e0b4f..c9d9309 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -30,14 +30,6 @@ EXPORT_SYMBOL_GPL(context_tracking_enabled);
DEFINE_PER_CPU(struct context_tracking, context_tracking);
EXPORT_SYMBOL_GPL(context_tracking);

-void context_tracking_cpu_set(int cpu)
-{
- if (!per_cpu(context_tracking.active, cpu)) {
- per_cpu(context_tracking.active, cpu) = true;
- static_key_slow_inc(&context_tracking_enabled);
- }
-}
-
static bool context_tracking_recursion_enter(void)
{
int recursion;
@@ -194,24 +186,27 @@ void context_tracking_user_exit(void)
}
NOKPROBE_SYMBOL(context_tracking_user_exit);

-/**
- * __context_tracking_task_switch - context switch the syscall callbacks
- * @prev: the task that is being switched out
- * @next: the task that is being switched in
- *
- * The context tracking uses the syscall slow path to implement its user-kernel
- * boundaries probes on syscalls. This way it doesn't impact the syscall fast
- * path on CPUs that don't do context tracking.
- *
- * But we need to clear the flag on the previous task because it may later
- * migrate to some CPU that doesn't do the context tracking. As such the TIF
- * flag may not be desired there.
- */
-void __context_tracking_task_switch(struct task_struct *prev,
- struct task_struct *next)
+void __init context_tracking_cpu_set(int cpu)
{
- clear_tsk_thread_flag(prev, TIF_NOHZ);
- set_tsk_thread_flag(next, TIF_NOHZ);
+ static __initdata bool initialized = false;
+ struct task_struct *p, *t;
+
+ if (!per_cpu(context_tracking.active, cpu)) {
+ per_cpu(context_tracking.active, cpu) = true;
+ static_key_slow_inc(&context_tracking_enabled);
+ }
+
+ if (initialized)
+ return;
+
+ /*
+ * Set TIF_NOHZ to init/0 and let it propagate to all tasks through fork
+ * This assumes that init is the only task at this early boot stage.
+ */
+ set_tsk_thread_flag(&init_task, TIF_NOHZ);
+ WARN_ON_ONCE(!tasklist_empty());
+
+ initialized = true;
}

#ifdef CONFIG_CONTEXT_TRACKING_FORCE
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d8a6196..6f149f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2338,7 +2338,6 @@ context_switch(struct rq *rq, struct task_struct *prev,
*/
spin_release(&rq->lock.dep_map, 1, _THIS_IP_);

- context_tracking_task_switch(prev, next);
/* Here we just switch the register state and the stack. */
switch_to(prev, next, prev);
barrier();
--
2.1.4

2015-04-24 16:00:30

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 3/4] nohz: Add tick_nohz_full_add_cpus_to() API

From: Chris Metcalf <[email protected]>

This API is useful to modify a cpumask indicating some special nohz-type
functionality so that the nohz cores are automatically added to that set.

Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Chris Metcalf <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Ingo Molnar <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/tick.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index f8492da5..4191b56 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -134,6 +134,12 @@ static inline bool tick_nohz_full_cpu(int cpu)
return cpumask_test_cpu(cpu, tick_nohz_full_mask);
}

+static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_or(mask, mask, tick_nohz_full_mask);
+}
+
extern void __tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_cpu(int cpu);
@@ -142,6 +148,7 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
#else
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
+static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
static inline void __tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void tick_nohz_full_kick(void) { }
--
2.1.4

2015-04-24 15:59:18

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 4/4] nohz: Set isolcpus when nohz_full is set

From: Chris Metcalf <[email protected]>

nohz_full is only useful with isolcpus also set, since otherwise the
scheduler has to run periodically to try to determine whether to steal
work from other cores.

Accordingly, when booting with nohz_full=xxx on the command line, we
should act as if isolcpus=xxx was also set, and set (or extend) the
isolcpus set to include the nohz_full cpus.

Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
Acked-by: Rik van Riel <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Chris Metcalf <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/sched/core.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6f149f8..e95b4d8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7042,6 +7042,9 @@ void __init sched_init_smp(void)
alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
alloc_cpumask_var(&fallback_doms, GFP_KERNEL);

+ /* nohz_full won't take effect without isolating the cpus. */
+ tick_nohz_full_add_cpus_to(cpu_isolated_map);
+
sched_init_numa();

/*
--
2.1.4

2015-04-24 20:15:22

by gene heskett

[permalink] [raw]
Subject: Re: [PATCH 4/4] nohz: Set isolcpus when nohz_full is set

On Friday 24 April 2015 11:58:31 Frederic Weisbecker wrote:
> From: Chris Metcalf <[email protected]>
>
> nohz_full is only useful with isolcpus also set, since otherwise the
> scheduler has to run periodically to try to determine whether to steal
> work from other cores.
>
> Accordingly, when booting with nohz_full=xxx on the command line, we
> should act as if isolcpus=xxx was also set, and set (or extend) the
> isolcpus set to include the nohz_full cpus.
>
> Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
> Acked-by: Rik van Riel <[email protected]>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Chris Metcalf <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Martin Schwidefsky <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>

As a user of LinuxCNC, we expect the core(s) so isolated by the isolcpus
argument at bootup time to remain undisturbed in order to preserve the
I/O updates heartbeat latency at the absolute minimum that board and cpu
combo can accomplish.

If this patch changes that behaviour such that the isolated core is
grabbed for another job while the RTAI bits and pieces are loaded and
running machinery, causing the machinery to lose this steady, possibly
as little as a 20 microsecond period repeating operation heartbeat, this
will quite effectively destroy our ability to run this software on linux
without farming that whole operation out to intelligent I/O cards.

Please keep this in mind. I don't read the patch well enough to
determine this myself.

> ---
> kernel/sched/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 6f149f8..e95b4d8 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7042,6 +7042,9 @@ void __init sched_init_smp(void)
> alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
> alloc_cpumask_var(&fallback_doms, GFP_KERNEL);
>
> + /* nohz_full won't take effect without isolating the cpus. */
> + tick_nohz_full_add_cpus_to(cpu_isolated_map);
> +
> sched_init_numa();
>
> /*

Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>

2015-04-25 23:13:17

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 4/4] nohz: Set isolcpus when nohz_full is set

On Fri, Apr 24, 2015 at 04:07:52PM -0400, Gene Heskett wrote:
> On Friday 24 April 2015 11:58:31 Frederic Weisbecker wrote:
> > From: Chris Metcalf <[email protected]>
> >
> > nohz_full is only useful with isolcpus also set, since otherwise the
> > scheduler has to run periodically to try to determine whether to steal
> > work from other cores.
> >
> > Accordingly, when booting with nohz_full=xxx on the command line, we
> > should act as if isolcpus=xxx was also set, and set (or extend) the
> > isolcpus set to include the nohz_full cpus.
> >
> > Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
> > Acked-by: Rik van Riel <[email protected]>
> > Acked-by: Peter Zijlstra (Intel) <[email protected]>
> > Signed-off-by: Chris Metcalf <[email protected]>
> > Cc: Peter Zijlstra (Intel) <[email protected]>
> > Cc: Paul E. McKenney <[email protected]>
> > Cc: Rafael J. Wysocki <[email protected]>
> > Cc: Martin Schwidefsky <[email protected]>
> > Cc: Mike Galbraith <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Rik van Riel <[email protected]>
> > Signed-off-by: Frederic Weisbecker <[email protected]>
>
> As a user of LinuxCNC, we expect the core(s) so isolated by the isolcpus
> argument at bootup time to remain undisturbed in order to preserve the
> I/O updates heartbeat latency at the absolute minimum that board and cpu
> combo can accomplish.
>
> If this patch changes that behaviour such that the isolated core is
> grabbed for another job while the RTAI bits and pieces are loaded and
> running machinery, causing the machinery to lose this steady, possibly
> as little as a 20 microsecond period repeating operation heartbeat, this
> will quite effectively destroy our ability to run this software on linux
> without farming that whole operation out to intelligent I/O cards.
>
> Please keep this in mind. I don't read the patch well enough to
> determine this myself.

I think it's not a problem. This patch isn't adding any work or noise to
isolcpus, it's actually setting CPUs that run tickless in userspace to be
part of the isolcpus set, because we need userspace-tickless CPUs to not
be disturbed at all. You shouldn't be concerned if you don't use full
dynticks.

If you happen to use full dynticks in your usecase one day, you'll use it
on your isolcpus anyway.

2015-04-25 23:50:15

by gene heskett

[permalink] [raw]
Subject: Re: [PATCH 4/4] nohz: Set isolcpus when nohz_full is set

On Saturday 25 April 2015 19:13:10 Frederic Weisbecker wrote:
> On Fri, Apr 24, 2015 at 04:07:52PM -0400, Gene Heskett wrote:
> > On Friday 24 April 2015 11:58:31 Frederic Weisbecker wrote:
> > > From: Chris Metcalf <[email protected]>
> > >
> > > nohz_full is only useful with isolcpus also set, since otherwise
> > > the scheduler has to run periodically to try to determine whether
> > > to steal work from other cores.
> > >
> > > Accordingly, when booting with nohz_full=xxx on the command line,
> > > we should act as if isolcpus=xxx was also set, and set (or extend)
> > > the isolcpus set to include the nohz_full cpus.
> > >
> > > Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
> > > Acked-by: Rik van Riel <[email protected]>
> > > Acked-by: Peter Zijlstra (Intel) <[email protected]>
> > > Signed-off-by: Chris Metcalf <[email protected]>
> > > Cc: Peter Zijlstra (Intel) <[email protected]>
> > > Cc: Paul E. McKenney <[email protected]>
> > > Cc: Rafael J. Wysocki <[email protected]>
> > > Cc: Martin Schwidefsky <[email protected]>
> > > Cc: Mike Galbraith <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: Rik van Riel <[email protected]>
> > > Signed-off-by: Frederic Weisbecker <[email protected]>
> >
> > As a user of LinuxCNC, we expect the core(s) so isolated by the
> > isolcpus argument at bootup time to remain undisturbed in order to
> > preserve the I/O updates heartbeat latency at the absolute minimum
> > that board and cpu combo can accomplish.
> >
> > If this patch changes that behaviour such that the isolated core is
> > grabbed for another job while the RTAI bits and pieces are loaded
> > and running machinery, causing the machinery to lose this steady,
> > possibly as little as a 20 microsecond period repeating operation
> > heartbeat, this will quite effectively destroy our ability to run
> > this software on linux without farming that whole operation out to
> > intelligent I/O cards.
> >
> > Please keep this in mind. I don't read the patch well enough to
> > determine this myself.
>
> I think it's not a problem. This patch isn't adding any work or noise
> to isolcpus, it's actually setting CPUs that run tickless in userspace
> to be part of the isolcpus set, because we need userspace-tickless
> CPUs to not be disturbed at all. You shouldn't be concerned if you
> don't use full dynticks.
>
> If you happen to use full dynticks in your usecase one day, you'll use
> it on your isolcpus anyway.

TBT I am probably too paranoid for my own good, but I did want to be
heard from the far corner of the cheap seats in the bleachers. I write
my stuff in bash, but there are, I expect, some guys in our group who
could given enough time, run down anything that breaks it for us and
patch it back out. We are the guys who generally run an RTAI patched
kernel anyway.

Some of the machinery driven by LinuxCNC is pretty good sized stuff, like
a 5 axis milling machine in Cincinnati that may be the biggest one
Cincinnati ever made, bed is 26 feet long, nearly 6 feet wide. The whole
thing is a bit over 200k lbs. Bad controller, controller company gone,
sold at scrap cheap. Adapted it to be run by LinuxCNC. After
commissioning it, he turned his shop machinists loose to use it. First
job was a locomotive axle bearing seat in a multi-ton truck casting,
needed .001" tolerance for a shrink fitted bearing. Measured when done,
it was off .0002" worst case. It has been detected by the seismographs
at the UNI across town. :) My biggest machine is less than 200 lbs.
Toy's IOW.

Thank you for taking the time to reply, Frederick. Its appreciated.

Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>