2013-07-17 16:44:42

by Frederic Weisbecker

Subject: [PATCH 00/18] nohz patches for 3.12 preview

Hi,

So patches 1 and 2 are fixes that I'd rather queue for 3.11.

The rest is 3.12 material. There is a strong focus on minimizing
the overhead for the full dynticks off case: nohz_full= parameter
not passed and CONFIG_NO_HZ_FULL_ALL=n.

I'm working on this because it seems that distros want to enable
CONFIG_NO_HZ_FULL, and of course they quickly hit the resulting
overhead and power-saving issues.

So I tried to optimize the off case with static keys. I hope this
will be sufficient. Otherwise the last resort is to create an
exception and irq slow path. I believe the x86 irq tracepoints
overwrite the IDT for that purpose? At least that was a plan; I'm
not sure we stuck with it, but it's an idea.
In any case, static keys don't require further arch backend support,
except of course support for static keys themselves, so I hope that
will be enough.

There is one little remaining thing to take care of: letting the boot CPU
go idle as well in the off case.

I also hope we'll get Paul's patches that allow the timekeeper to go
idle in 3.12, but that's another story.

---

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/nohz-3.12-preview

Frederic Weisbecker (16):
sched: Consolidate open coded preemptible() checks
context_tracking: Fix guest accounting with native vtime
vtime: Update a few comments
context_tracking: Fix runtime CPU off-case
nohz: Selectively enable context tracking on full dynticks CPUs
context_tracking: Ground setup for static key use
context_tracking: Optimize main APIs off case with static key
context_tracking: Optimize context switch off case with static keys
context_tracking: User/kernel boundary crossing trace events
vtime: Remove a few unneeded generic vtime state checks
vtime: Fix racy cputime delta update
context_tracking: Split low level state headers
vtime: Describe overridden functions in dedicated arch headers
vtime: Optimize full dynticks accounting off case with static keys
vtime: Always scale generic vtime accounting results
vtime: Always debug check snapshot source _before_ updating it

Li Zhong (1):
nohz: fix compile warning in tick_nohz_init()

Steven Rostedt (1):
nohz: Do not warn about unstable tsc unless user uses nohz_full

arch/ia64/include/asm/Kbuild | 1 +
arch/powerpc/include/asm/Kbuild | 1 +
arch/s390/include/asm/cputime.h | 3 -
arch/s390/include/asm/vtime.h | 7 ++
arch/s390/kernel/vtime.c | 1 +
include/linux/context_tracking.h | 118 ++++++++++++++--------------
include/linux/context_tracking_state.h | 39 ++++++++++
include/linux/vtime.h | 74 ++++++++++++++++--
include/trace/events/context_tracking.h | 58 ++++++++++++++
init/Kconfig | 2 +-
kernel/context_tracking.c | 126 ++++++++++++++++++-------------
kernel/sched/core.c | 4 +-
kernel/sched/cputime.c | 53 ++++---------
kernel/time/Kconfig | 1 -
kernel/time/tick-sched.c | 7 ++-
15 files changed, 327 insertions(+), 168 deletions(-)
create mode 100644 arch/s390/include/asm/vtime.h
create mode 100644 include/asm-generic/vtime.h
create mode 100644 include/linux/context_tracking_state.h
create mode 100644 include/trace/events/context_tracking.h

--
1.7.5.4


2013-07-17 16:44:51

by Frederic Weisbecker

Subject: [PATCH 04/18] context_tracking: Fix guest accounting with native vtime

1) If context tracking is enabled with native vtime accounting (a
combination that is useless except for dev testing), we call
vtime_guest_enter() and vtime_guest_exit() on host <-> guest switches.
But those are stubs in this configuration. As a result, cputime is not
correctly flushed on kvm context switches.

2) If context tracking runs but is disabled on some CPUs, those
CPUs end up calling __guest_enter/__guest_exit, which in turn
call vtime_account_system(). We don't want to call this because
these CPUs use tick based accounting.

Refactor the guest_enter/guest_exit code such that all combinations
finally work.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
include/linux/context_tracking.h | 52 ++++++++++++++++----------------------
kernel/context_tracking.c | 6 +++-
2 files changed, 26 insertions(+), 32 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index fc09d7b..5984f25 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -20,25 +20,6 @@ struct context_tracking {
} state;
};

-static inline void __guest_enter(void)
-{
- /*
- * This is running in ioctl context so we can avoid
- * the call to vtime_account() with its unnecessary idle check.
- */
- vtime_account_system(current);
- current->flags |= PF_VCPU;
-}
-
-static inline void __guest_exit(void)
-{
- /*
- * This is running in ioctl context so we can avoid
- * the call to vtime_account() with its unnecessary idle check.
- */
- vtime_account_system(current);
- current->flags &= ~PF_VCPU;
-}

#ifdef CONFIG_CONTEXT_TRACKING
DECLARE_PER_CPU(struct context_tracking, context_tracking);
@@ -56,9 +37,6 @@ static inline bool context_tracking_active(void)
extern void user_enter(void);
extern void user_exit(void);

-extern void guest_enter(void);
-extern void guest_exit(void);
-
static inline enum ctx_state exception_enter(void)
{
enum ctx_state prev_ctx;
@@ -81,21 +59,35 @@ extern void context_tracking_task_switch(struct task_struct *prev,
static inline bool context_tracking_in_user(void) { return false; }
static inline void user_enter(void) { }
static inline void user_exit(void) { }
+static inline enum ctx_state exception_enter(void) { return 0; }
+static inline void exception_exit(enum ctx_state prev_ctx) { }
+static inline void context_tracking_task_switch(struct task_struct *prev,
+ struct task_struct *next) { }
+#endif /* !CONFIG_CONTEXT_TRACKING */

+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+extern void guest_enter(void);
+extern void guest_exit(void);
+#else
static inline void guest_enter(void)
{
- __guest_enter();
+ /*
+ * This is running in ioctl context so we can avoid
+ * the call to vtime_account() with its unnecessary idle check.
+ */
+ vtime_account_system(current);
+ current->flags |= PF_VCPU;
}

static inline void guest_exit(void)
{
- __guest_exit();
+ /*
+ * This is running in ioctl context so we can avoid
+ * the call to vtime_account() with its unnecessary idle check.
+ */
+ vtime_account_system(current);
+ current->flags &= ~PF_VCPU;
}
-
-static inline enum ctx_state exception_enter(void) { return 0; }
-static inline void exception_exit(enum ctx_state prev_ctx) { }
-static inline void context_tracking_task_switch(struct task_struct *prev,
- struct task_struct *next) { }
-#endif /* !CONFIG_CONTEXT_TRACKING */
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */

#endif
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 942835c..1f47119 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -141,12 +141,13 @@ void user_exit(void)
local_irq_restore(flags);
}

+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
void guest_enter(void)
{
if (vtime_accounting_enabled())
vtime_guest_enter(current);
else
- __guest_enter();
+ current->flags |= PF_VCPU;
}
EXPORT_SYMBOL_GPL(guest_enter);

@@ -155,9 +156,10 @@ void guest_exit(void)
if (vtime_accounting_enabled())
vtime_guest_exit(current);
else
- __guest_exit();
+ current->flags &= ~PF_VCPU;
}
EXPORT_SYMBOL_GPL(guest_exit);
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */


/**
--
1.7.5.4

2013-07-17 16:44:54

by Frederic Weisbecker

Subject: [PATCH 06/18] context_tracking: Fix runtime CPU off-case

As long as context tracking is enabled on any CPU, even
a single one, all other CPUs need to keep track of their
user <-> kernel boundary crossings as well.

This is because a task can sleep while servicing an exception
that happened in the kernel or in userspace. When the task
eventually wakes up and returns from the exception, the CPU needs
to know whether we resume in userspace or in the kernel.
exception_exit() gets this information from exception_enter(),
which saved the previous state.

If the CPU where the exception happened didn't keep track of
this information, exception_exit() doesn't know which tracking
state to restore on the CPU where the task got migrated,
and we may return to userspace with the context tracking
subsystem thinking that we are in kernel mode.

This could be fixed in the long term by moving our context tracking
probes to the very low level arch fast path user <-> kernel boundaries,
although even that is worrisome, as an exception can still happen
in the few instructions between the probe and the actual iret.

Also we are not yet ready to set these probes in the fast path given
the potential overhead problem it induces.

So let's fix this by always enabling context tracking, even on CPUs
that are not in the full dynticks range. OTOH we can spare the
rcu_user_*() and vtime_user_*() calls there because the tick runs
on these CPUs, and we can handle the RCU state machine and cputime
accounting through it.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
kernel/context_tracking.c | 52 ++++++++++++++++++++++++++++----------------
1 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 1f47119..7b095de 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -54,17 +54,31 @@ void user_enter(void)
WARN_ON_ONCE(!current->mm);

local_irq_save(flags);
- if (__this_cpu_read(context_tracking.active) &&
- __this_cpu_read(context_tracking.state) != IN_USER) {
+ if ( __this_cpu_read(context_tracking.state) != IN_USER) {
+ if (__this_cpu_read(context_tracking.active)) {
+ /*
+ * At this stage, only low level arch entry code remains and
+ * then we'll run in userspace. We can assume there won't be
+ * any RCU read-side critical section until the next call to
+ * user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
+ * on the tick.
+ */
+ vtime_user_enter(current);
+ rcu_user_enter();
+ }
/*
- * At this stage, only low level arch entry code remains and
- * then we'll run in userspace. We can assume there won't be
- * any RCU read-side critical section until the next call to
- * user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
- * on the tick.
+ * Even if context tracking is disabled on this CPU, because it's outside
+ * the full dynticks mask for example, we still have to keep track of the
+ * context transitions and states to prevent inconsistency on those of
+ * other CPUs.
+ * If a task triggers an exception in userspace, sleep on the exception
+ * handler and then migrate to another CPU, that new CPU must know where
+ * the exception returns by the time we call exception_exit().
+ * This information can only be provided by the previous CPU when it called
+ * exception_enter().
+ * OTOH we can spare the calls to vtime and RCU when context_tracking.active
+ * is false because we know that CPU is not tickless.
*/
- vtime_user_enter(current);
- rcu_user_enter();
__this_cpu_write(context_tracking.state, IN_USER);
}
local_irq_restore(flags);
@@ -130,12 +144,14 @@ void user_exit(void)

local_irq_save(flags);
if (__this_cpu_read(context_tracking.state) == IN_USER) {
- /*
- * We are going to run code that may use RCU. Inform
- * RCU core about that (ie: we may need the tick again).
- */
- rcu_user_exit();
- vtime_user_exit(current);
+ if (__this_cpu_read(context_tracking.active)) {
+ /*
+ * We are going to run code that may use RCU. Inform
+ * RCU core about that (ie: we may need the tick again).
+ */
+ rcu_user_exit();
+ vtime_user_exit(current);
+ }
__this_cpu_write(context_tracking.state, IN_KERNEL);
}
local_irq_restore(flags);
@@ -178,8 +194,6 @@ EXPORT_SYMBOL_GPL(guest_exit);
void context_tracking_task_switch(struct task_struct *prev,
struct task_struct *next)
{
- if (__this_cpu_read(context_tracking.active)) {
- clear_tsk_thread_flag(prev, TIF_NOHZ);
- set_tsk_thread_flag(next, TIF_NOHZ);
- }
+ clear_tsk_thread_flag(prev, TIF_NOHZ);
+ set_tsk_thread_flag(next, TIF_NOHZ);
}
--
1.7.5.4

2013-07-17 16:45:02

by Frederic Weisbecker

Subject: [PATCH 11/18] context_tracking: User/kernel boundary crossing trace events

These can be useful to track all kernel/user round trips.
They are also helpful for debugging the context tracking subsystem.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
include/trace/events/context_tracking.h | 58 +++++++++++++++++++++++++++++++
kernel/context_tracking.c | 5 +++
2 files changed, 63 insertions(+), 0 deletions(-)
create mode 100644 include/trace/events/context_tracking.h

diff --git a/include/trace/events/context_tracking.h b/include/trace/events/context_tracking.h
new file mode 100644
index 0000000..ce8007c
--- /dev/null
+++ b/include/trace/events/context_tracking.h
@@ -0,0 +1,58 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM context_tracking
+
+#if !defined(_TRACE_CONTEXT_TRACKING_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_CONTEXT_TRACKING_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(context_tracking_user,
+
+ TP_PROTO(int dummy),
+
+ TP_ARGS(dummy),
+
+ TP_STRUCT__entry(
+ __field( int, dummy )
+ ),
+
+ TP_fast_assign(
+ __entry->dummy = dummy;
+ ),
+
+ TP_printk("%s", "")
+);
+
+/**
+ * user_enter - called when the kernel resumes to userspace
+ * @dummy: dummy arg to make trace event macro happy
+ *
+ * This event occurs when the kernel resumes to userspace after
+ * an exception or a syscall.
+ */
+DEFINE_EVENT(context_tracking_user, user_enter,
+
+ TP_PROTO(int dummy),
+
+ TP_ARGS(dummy)
+);
+
+/**
+ * user_exit - called when userspace enters the kernel
+ * @dummy: dummy arg to make trace event macro happy
+ *
+ * This event occurs when userspace enters the kernel through
+ * an exception or a syscall.
+ */
+DEFINE_EVENT(context_tracking_user, user_exit,
+
+ TP_PROTO(int dummy),
+
+ TP_ARGS(dummy)
+);
+
+
+#endif /* _TRACE_CONTEXT_TRACKING_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 5323a5c..64195f2 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -20,6 +20,9 @@
#include <linux/hardirq.h>
#include <linux/export.h>

+#define CREATE_TRACE_POINTS
+#include <trace/events/context_tracking.h>
+
struct static_key context_tracking_enabled = STATIC_KEY_INIT_FALSE;

DEFINE_PER_CPU(struct context_tracking, context_tracking);
@@ -62,6 +65,7 @@ void context_tracking_user_enter(void)
local_irq_save(flags);
if ( __this_cpu_read(context_tracking.state) != IN_USER) {
if (__this_cpu_read(context_tracking.active)) {
+ trace_user_enter(0);
/*
* At this stage, only low level arch entry code remains and
* then we'll run in userspace. We can assume there won't be
@@ -157,6 +161,7 @@ void context_tracking_user_exit(void)
*/
rcu_user_exit();
vtime_user_exit(current);
+ trace_user_exit(0);
}
__this_cpu_write(context_tracking.state, IN_KERNEL);
}
--
1.7.5.4

2013-07-17 16:44:58

by Frederic Weisbecker

Subject: [PATCH 08/18] context_tracking: Ground setup for static key use

Prepare for using a static key in the context tracking subsystem.
This will help optimize the off case for its many users:

* user_enter, user_exit, exception_enter, exception_exit, guest_enter,
guest_exit, vtime_*()

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
include/linux/context_tracking.h | 2 ++
kernel/context_tracking.c | 26 ++++++++++++++++++++------
2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 2c2b73aa..0007ab4 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -4,6 +4,7 @@
#include <linux/sched.h>
#include <linux/percpu.h>
#include <linux/vtime.h>
+#include <linux/static_key.h>
#include <asm/ptrace.h>

struct context_tracking {
@@ -22,6 +23,7 @@ struct context_tracking {


#ifdef CONFIG_CONTEXT_TRACKING
+extern struct static_key context_tracking_enabled;
DECLARE_PER_CPU(struct context_tracking, context_tracking);

static inline bool context_tracking_in_user(void)
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 72bcb25..f07505c 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -20,15 +20,16 @@
#include <linux/hardirq.h>
#include <linux/export.h>

-DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
-#ifdef CONFIG_CONTEXT_TRACKING_FORCE
- .active = true,
-#endif
-};
+struct static_key context_tracking_enabled = STATIC_KEY_INIT_FALSE;
+
+DEFINE_PER_CPU(struct context_tracking, context_tracking);

void context_tracking_cpu_set(int cpu)
{
- per_cpu(context_tracking.active, cpu) = true;
+ if (!per_cpu(context_tracking.active, cpu)) {
+ per_cpu(context_tracking.active, cpu) = true;
+ static_key_slow_inc(&context_tracking_enabled);
+ }
}

/**
@@ -202,3 +203,16 @@ void context_tracking_task_switch(struct task_struct *prev,
clear_tsk_thread_flag(prev, TIF_NOHZ);
set_tsk_thread_flag(next, TIF_NOHZ);
}
+
+#ifdef CONFIG_CONTEXT_TRACKING_FORCE
+static int __init context_tracking_init(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ context_tracking_cpu_set(cpu);
+
+ return 0;
+}
+early_initcall(context_tracking_init);
+#endif
--
1.7.5.4

2013-07-17 16:45:09

by Frederic Weisbecker

Subject: [PATCH 13/18] vtime: Fix racy cputime delta update

get_vtime_delta() must be called under the task vtime_seqlock
with the code that does the cputime accounting flush.

Otherwise the cputime reader can be fooled and run into
a race where it sees the snapshot update but misses the
cputime flush. As a result it can report a cputime that is
way too short.

Fix vtime_account_user(), which wasn't complying with that rule.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
kernel/sched/cputime.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5f273b47..b62d5c0 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -683,9 +683,10 @@ void vtime_account_irq_exit(struct task_struct *tsk)

void vtime_account_user(struct task_struct *tsk)
{
- cputime_t delta_cpu = get_vtime_delta(tsk);
+ cputime_t delta_cpu;

write_seqlock(&tsk->vtime_seqlock);
+ delta_cpu = get_vtime_delta(tsk);
tsk->vtime_snap_whence = VTIME_SYS;
account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
write_sequnlock(&tsk->vtime_seqlock);
--
1.7.5.4

2013-07-17 16:45:19

by Frederic Weisbecker

Subject: [PATCH 17/18] vtime: Always scale generic vtime accounting results

Under full dynticks, cputime accounting can be a subtle
mix of CPUs using tick based accounting and others using
generic vtime.

As long as the tick can have a share in producing these stats, we
want to scale the result against CFS's precise accounting, as the tick
can miss some tasks hiding between the periodic interrupts.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
kernel/sched/cputime.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 0831b06..e9e742e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -553,12 +553,6 @@ static void cputime_adjust(struct task_cputime *curr,
{
cputime_t rtime, stime, utime, total;

- if (vtime_accounting_enabled()) {
- *ut = curr->utime;
- *st = curr->stime;
- return;
- }
-
stime = curr->stime;
total = stime + curr->utime;

--
1.7.5.4

2013-07-17 16:45:15

by Frederic Weisbecker

Subject: [PATCH 18/18] vtime: Always debug check snapshot source _before_ updating it

The vtime delta update performed by get_vtime_delta() always checks
that the source of the snapshot is valid.

Meanwhile, the snapshot updaters that rely on get_vtime_delta() also
set the new snapshot origin. But some of them do this right before
the call to get_vtime_delta(), making its debug check useless.

This is easily fixable by moving the snapshot origin update after
the call to get_vtime_delta(). The order doesn't matter there.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
kernel/sched/cputime.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e9e742e..c1d7493 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -660,9 +660,9 @@ void vtime_account_system(struct task_struct *tsk)
void vtime_gen_account_irq_exit(struct task_struct *tsk)
{
write_seqlock(&tsk->vtime_seqlock);
+ __vtime_account_system(tsk);
if (context_tracking_in_user())
tsk->vtime_snap_whence = VTIME_USER;
- __vtime_account_system(tsk);
write_sequnlock(&tsk->vtime_seqlock);
}

@@ -680,8 +680,8 @@ void vtime_account_user(struct task_struct *tsk)
void vtime_user_enter(struct task_struct *tsk)
{
write_seqlock(&tsk->vtime_seqlock);
- tsk->vtime_snap_whence = VTIME_USER;
__vtime_account_system(tsk);
+ tsk->vtime_snap_whence = VTIME_USER;
write_sequnlock(&tsk->vtime_seqlock);
}

--
1.7.5.4

2013-07-17 16:45:59

by Frederic Weisbecker

Subject: [PATCH 16/18] vtime: Optimize full dynticks accounting off case with static keys

If no CPU is in the full dynticks range, we can avoid the full
dynticks cputime accounting through generic vtime along with its
overhead and use the traditional tick based accounting instead.

Let's do this and nop the off case with static keys.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
include/linux/context_tracking.h | 6 +--
include/linux/vtime.h | 70 +++++++++++++++++++++++++++++++++-----
kernel/sched/cputime.c | 22 ++----------
3 files changed, 67 insertions(+), 31 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 8635dbb..07cdf93 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -66,8 +66,7 @@ static inline void context_tracking_task_switch(struct task_struct *prev,
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
static inline void guest_enter(void)
{
- if (static_key_false(&context_tracking_enabled) &&
- vtime_accounting_enabled())
+ if (vtime_accounting_enabled())
vtime_guest_enter(current);
else
current->flags |= PF_VCPU;
@@ -75,8 +74,7 @@ static inline void guest_enter(void)

static inline void guest_exit(void)
{
- if (static_key_false(&context_tracking_enabled) &&
- vtime_accounting_enabled())
+ if (vtime_accounting_enabled())
vtime_guest_exit(current);
else
current->flags &= ~PF_VCPU;
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 2ad0739..f5b72b3 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -1,22 +1,68 @@
#ifndef _LINUX_KERNEL_VTIME_H
#define _LINUX_KERNEL_VTIME_H

+#include <linux/context_tracking_state.h>
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
#include <asm/vtime.h>
#endif

+
struct task_struct;

+/*
+ * vtime_accounting_enabled() definitions/declarations
+ */
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+static inline bool vtime_accounting_enabled(void) { return true; }
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+static inline bool vtime_accounting_enabled(void)
+{
+ if (static_key_false(&context_tracking_enabled)) {
+ if (context_tracking_active())
+ return true;
+ }
+
+ return false;
+}
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
+
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+static inline bool vtime_accounting_enabled(void) { return false; }
+#endif /* !CONFIG_VIRT_CPU_ACCOUNTING */
+
+
+/*
+ * Common vtime APIs
+ */
#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+
+#ifdef __ARCH_HAS_VTIME_TASK_SWITCH
extern void vtime_task_switch(struct task_struct *prev);
+#else
+extern void vtime_common_task_switch(struct task_struct *prev);
+static inline void vtime_task_switch(struct task_struct *prev)
+{
+ if (vtime_accounting_enabled())
+ vtime_common_task_switch(prev);
+}
+#endif /* __ARCH_HAS_VTIME_TASK_SWITCH */
+
extern void vtime_account_system(struct task_struct *tsk);
extern void vtime_account_idle(struct task_struct *tsk);
extern void vtime_account_user(struct task_struct *tsk);
-extern void vtime_account_irq_enter(struct task_struct *tsk);

-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-static inline bool vtime_accounting_enabled(void) { return true; }
-#endif
+#ifdef __ARCH_HAS_VTIME_ACCOUNT
+extern void vtime_account_irq_enter(struct task_struct *tsk);
+#else
+extern void vtime_common_account_irq_enter(struct task_struct *tsk);
+static inline void vtime_account_irq_enter(struct task_struct *tsk)
+{
+ if (vtime_accounting_enabled())
+ vtime_common_account_irq_enter(tsk);
+}
+#endif /* __ARCH_HAS_VTIME_ACCOUNT */

#else /* !CONFIG_VIRT_CPU_ACCOUNTING */

@@ -24,14 +70,20 @@ static inline void vtime_task_switch(struct task_struct *prev) { }
static inline void vtime_account_system(struct task_struct *tsk) { }
static inline void vtime_account_user(struct task_struct *tsk) { }
static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
-static inline bool vtime_accounting_enabled(void) { return false; }
-#endif
+#endif /* !CONFIG_VIRT_CPU_ACCOUNTING */

#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
extern void arch_vtime_task_switch(struct task_struct *tsk);
-extern void vtime_account_irq_exit(struct task_struct *tsk);
-extern bool vtime_accounting_enabled(void);
+extern void vtime_gen_account_irq_exit(struct task_struct *tsk);
+
+static inline void vtime_account_irq_exit(struct task_struct *tsk)
+{
+ if (vtime_accounting_enabled())
+ vtime_gen_account_irq_exit(tsk);
+}
+
extern void vtime_user_enter(struct task_struct *tsk);
+
static inline void vtime_user_exit(struct task_struct *tsk)
{
vtime_account_user(tsk);
@@ -39,7 +91,7 @@ static inline void vtime_user_exit(struct task_struct *tsk)
extern void vtime_guest_enter(struct task_struct *tsk);
extern void vtime_guest_exit(struct task_struct *tsk);
extern void vtime_init_idle(struct task_struct *tsk, int cpu);
-#else
+#else /* !CONFIG_VIRT_CPU_ACCOUNTING_GEN */
static inline void vtime_account_irq_exit(struct task_struct *tsk)
{
/* On hard|softirq exit we always account to hard|softirq cputime */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index b62d5c0..0831b06 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -378,11 +378,8 @@ static inline void irqtime_account_process_tick(struct task_struct *p, int user_
#ifdef CONFIG_VIRT_CPU_ACCOUNTING

#ifndef __ARCH_HAS_VTIME_TASK_SWITCH
-void vtime_task_switch(struct task_struct *prev)
+void vtime_common_task_switch(struct task_struct *prev)
{
- if (!vtime_accounting_enabled())
- return;
-
if (is_idle_task(prev))
vtime_account_idle(prev);
else
@@ -404,11 +401,8 @@ void vtime_task_switch(struct task_struct *prev)
* vtime_account().
*/
#ifndef __ARCH_HAS_VTIME_ACCOUNT
-void vtime_account_irq_enter(struct task_struct *tsk)
+void vtime_common_account_irq_enter(struct task_struct *tsk)
{
- if (!vtime_accounting_enabled())
- return;
-
if (!in_interrupt()) {
/*
* If we interrupted user, context_tracking_in_user()
@@ -428,7 +422,7 @@ void vtime_account_irq_enter(struct task_struct *tsk)
}
vtime_account_system(tsk);
}
-EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
+EXPORT_SYMBOL_GPL(vtime_common_account_irq_enter);
#endif /* __ARCH_HAS_VTIME_ACCOUNT */
#endif /* CONFIG_VIRT_CPU_ACCOUNTING */

@@ -669,11 +663,8 @@ void vtime_account_system(struct task_struct *tsk)
write_sequnlock(&tsk->vtime_seqlock);
}

-void vtime_account_irq_exit(struct task_struct *tsk)
+void vtime_gen_account_irq_exit(struct task_struct *tsk)
{
- if (!vtime_accounting_enabled())
- return;
-
write_seqlock(&tsk->vtime_seqlock);
if (context_tracking_in_user())
tsk->vtime_snap_whence = VTIME_USER;
@@ -732,11 +723,6 @@ void vtime_account_idle(struct task_struct *tsk)
account_idle_time(delta_cpu);
}

-bool vtime_accounting_enabled(void)
-{
- return context_tracking_active();
-}
-
void arch_vtime_task_switch(struct task_struct *prev)
{
write_seqlock(&prev->vtime_seqlock);
--
1.7.5.4

2013-07-17 16:49:58

by Frederic Weisbecker

Subject: [PATCH 15/18] vtime: Describe overridden functions in dedicated arch headers

If the arch overrides some generic vtime APIs, let it declare them
in a dedicated, standalone header. This way it becomes convenient
to include it from the generic vtime headers without pulling
irrelevant stuff into such a low level header.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
---
arch/ia64/include/asm/Kbuild | 1 +
arch/powerpc/include/asm/Kbuild | 1 +
arch/s390/include/asm/cputime.h | 3 ---
arch/s390/include/asm/vtime.h | 7 +++++++
arch/s390/kernel/vtime.c | 1 +
include/linux/vtime.h | 4 ++++
6 files changed, 14 insertions(+), 3 deletions(-)
create mode 100644 arch/s390/include/asm/vtime.h
create mode 100644 include/asm-generic/vtime.h

diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 05b03ec..a3456f3 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
generic-y += exec.h
generic-y += kvm_para.h
generic-y += trace_clock.h
+generic-y += vtime.h
\ No newline at end of file
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 650757c..704e6f1 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -2,3 +2,4 @@
generic-y += clkdev.h
generic-y += rwsem.h
generic-y += trace_clock.h
+generic-y += vtime.h
\ No newline at end of file
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index d2ff4137..f65bd36 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -13,9 +13,6 @@
#include <asm/div64.h>


-#define __ARCH_HAS_VTIME_ACCOUNT
-#define __ARCH_HAS_VTIME_TASK_SWITCH
-
/* We want to use full resolution of the CPU timer: 2**-12 micro-seconds. */

typedef unsigned long long __nocast cputime_t;
diff --git a/arch/s390/include/asm/vtime.h b/arch/s390/include/asm/vtime.h
new file mode 100644
index 0000000..af9896c
--- /dev/null
+++ b/arch/s390/include/asm/vtime.h
@@ -0,0 +1,7 @@
+#ifndef _S390_VTIME_H
+#define _S390_VTIME_H
+
+#define __ARCH_HAS_VTIME_ACCOUNT
+#define __ARCH_HAS_VTIME_TASK_SWITCH
+
+#endif /* _S390_VTIME_H */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 3fb0935..785c73f 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -19,6 +19,7 @@
#include <asm/irq_regs.h>
#include <asm/cputime.h>
#include <asm/vtimer.h>
+#include <asm/vtime.h>
#include <asm/irq.h>
#include "entry.h"

diff --git a/include/asm-generic/vtime.h b/include/asm-generic/vtime.h
new file mode 100644
index 0000000..e69de29
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index b1dd2db..2ad0739 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -1,6 +1,10 @@
#ifndef _LINUX_KERNEL_VTIME_H
#define _LINUX_KERNEL_VTIME_H

+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+#include <asm/vtime.h>
+#endif
+
struct task_struct;

#ifdef CONFIG_VIRT_CPU_ACCOUNTING
--
1.7.5.4

2013-07-17 16:49:59

by Frederic Weisbecker

Subject: [PATCH 14/18] context_tracking: Split low level state headers

We plan to use the context tracking static key in inline
vtime APIs. For this we need to include the context tracking
headers from the vtime headers.

However the vtime headers need to stay low level because they are
included in hardirq.h, which mostly contains standalone
definitions. But context_tracking.h includes sched.h for
a few task_struct references, so it wouldn't be sensible
to include it from vtime.h.

To solve this, let's split the context tracking headers and move
out the pure state definitions, which only require a few low level
headers. That small part can then be safely included in vtime.h.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
include/linux/context_tracking.h | 31 +------------------------
include/linux/context_tracking_state.h | 39 ++++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+), 30 deletions(-)
create mode 100644 include/linux/context_tracking_state.h

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 77f7306..8635dbb 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -2,40 +2,12 @@
#define _LINUX_CONTEXT_TRACKING_H

#include <linux/sched.h>
-#include <linux/percpu.h>
#include <linux/vtime.h>
-#include <linux/static_key.h>
+#include <linux/context_tracking_state.h>
#include <asm/ptrace.h>

-struct context_tracking {
- /*
- * When active is false, probes are unset in order
- * to minimize overhead: TIF flags are cleared
- * and calls to user_enter/exit are ignored. This
- * may be further optimized using static keys.
- */
- bool active;
- enum ctx_state {
- IN_KERNEL = 0,
- IN_USER,
- } state;
-};
-

#ifdef CONFIG_CONTEXT_TRACKING
-extern struct static_key context_tracking_enabled;
-DECLARE_PER_CPU(struct context_tracking, context_tracking);
-
-static inline bool context_tracking_in_user(void)
-{
- return __this_cpu_read(context_tracking.state) == IN_USER;
-}
-
-static inline bool context_tracking_active(void)
-{
- return __this_cpu_read(context_tracking.active);
-}
-
extern void context_tracking_cpu_set(int cpu);

extern void context_tracking_user_enter(void);
@@ -83,7 +55,6 @@ static inline void context_tracking_task_switch(struct task_struct *prev,
__context_tracking_task_switch(prev, next);
}
#else
-static inline bool context_tracking_in_user(void) { return false; }
static inline void user_enter(void) { }
static inline void user_exit(void) { }
static inline enum ctx_state exception_enter(void) { return 0; }
diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
new file mode 100644
index 0000000..0f1979d
--- /dev/null
+++ b/include/linux/context_tracking_state.h
@@ -0,0 +1,39 @@
+#ifndef _LINUX_CONTEXT_TRACKING_STATE_H
+#define _LINUX_CONTEXT_TRACKING_STATE_H
+
+#include <linux/percpu.h>
+#include <linux/static_key.h>
+
+struct context_tracking {
+ /*
+ * When active is false, probes are unset in order
+ * to minimize overhead: TIF flags are cleared
+ * and calls to user_enter/exit are ignored. This
+ * may be further optimized using static keys.
+ */
+ bool active;
+ enum ctx_state {
+ IN_KERNEL = 0,
+ IN_USER,
+ } state;
+};
+
+#ifdef CONFIG_CONTEXT_TRACKING
+extern struct static_key context_tracking_enabled;
+DECLARE_PER_CPU(struct context_tracking, context_tracking);
+
+static inline bool context_tracking_in_user(void)
+{
+ return __this_cpu_read(context_tracking.state) == IN_USER;
+}
+
+static inline bool context_tracking_active(void)
+{
+ return __this_cpu_read(context_tracking.active);
+}
+#else
+static inline bool context_tracking_in_user(void) { return false; }
+static inline bool context_tracking_active(void) { return false; }
+#endif /* CONFIG_CONTEXT_TRACKING */
+
+#endif
--
1.7.5.4

2013-07-17 16:50:47

by Frederic Weisbecker

Subject: [PATCH 10/18] context_tracking: Optimize context switch off case with static keys

No need for the syscall slow path if no CPU is full dynticks;
use static keys to nop out the context switch hook in that case.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
include/linux/context_tracking.h | 11 +++++++++--
kernel/context_tracking.c | 6 +++---
2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 8eba58c..77f7306 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -40,6 +40,8 @@ extern void context_tracking_cpu_set(int cpu);

extern void context_tracking_user_enter(void);
extern void context_tracking_user_exit(void);
+extern void __context_tracking_task_switch(struct task_struct *prev,
+ struct task_struct *next);

static inline void user_enter(void)
{
@@ -74,8 +76,12 @@ static inline void exception_exit(enum ctx_state prev_ctx)
}
}

-extern void context_tracking_task_switch(struct task_struct *prev,
- struct task_struct *next);
+static inline void context_tracking_task_switch(struct task_struct *prev,
+ struct task_struct *next)
+{
+ if (static_key_false(&context_tracking_enabled))
+ __context_tracking_task_switch(prev, next);
+}
#else
static inline bool context_tracking_in_user(void) { return false; }
static inline void user_enter(void) { }
@@ -104,6 +110,7 @@ static inline void guest_exit(void)
else
current->flags &= ~PF_VCPU;
}
+
#else
static inline void guest_enter(void)
{
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 7b0a22d..5323a5c 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -164,7 +164,7 @@ void context_tracking_user_exit(void)
}

/**
- * context_tracking_task_switch - context switch the syscall callbacks
+ * __context_tracking_task_switch - context switch the syscall callbacks
* @prev: the task that is being switched out
* @next: the task that is being switched in
*
@@ -176,8 +176,8 @@ void context_tracking_user_exit(void)
* migrate to some CPU that doesn't do the context tracking. As such the TIF
* flag may not be desired there.
*/
-void context_tracking_task_switch(struct task_struct *prev,
- struct task_struct *next)
+void __context_tracking_task_switch(struct task_struct *prev,
+ struct task_struct *next)
{
clear_tsk_thread_flag(prev, TIF_NOHZ);
set_tsk_thread_flag(next, TIF_NOHZ);
--
1.7.5.4

2013-07-17 16:50:45

by Frederic Weisbecker

Subject: [PATCH 12/18] vtime: Remove a few unneeded generic vtime state checks

Some generic vtime APIs check whether vtime accounting
is enabled on the local CPU before doing their work.

Some of these checks are unnecessary because all their callers
already take care of that. Let's remove them.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
kernel/sched/cputime.c | 13 +------------
1 files changed, 1 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index bb6b29a..5f273b47 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -664,9 +664,6 @@ static void __vtime_account_system(struct task_struct *tsk)

void vtime_account_system(struct task_struct *tsk)
{
- if (!vtime_accounting_enabled())
- return;
-
write_seqlock(&tsk->vtime_seqlock);
__vtime_account_system(tsk);
write_sequnlock(&tsk->vtime_seqlock);
@@ -686,12 +683,7 @@ void vtime_account_irq_exit(struct task_struct *tsk)

void vtime_account_user(struct task_struct *tsk)
{
- cputime_t delta_cpu;
-
- if (!vtime_accounting_enabled())
- return;
-
- delta_cpu = get_vtime_delta(tsk);
+ cputime_t delta_cpu = get_vtime_delta(tsk);

write_seqlock(&tsk->vtime_seqlock);
tsk->vtime_snap_whence = VTIME_SYS;
@@ -701,9 +693,6 @@ void vtime_account_user(struct task_struct *tsk)

void vtime_user_enter(struct task_struct *tsk)
{
- if (!vtime_accounting_enabled())
- return;
-
write_seqlock(&tsk->vtime_seqlock);
tsk->vtime_snap_whence = VTIME_USER;
__vtime_account_system(tsk);
--
1.7.5.4

2013-07-17 16:51:38

by Frederic Weisbecker

Subject: [PATCH 09/18] context_tracking: Optimize main APIs off case with static key

Optimize the user, exception and guest entry/exit APIs with static
keys. This minimizes the overhead for those who enable
CONFIG_NO_HZ_FULL without always using it. When no range is
passed to nohz_full=, the probes should be nopped out
(at least we hope so...).

If this proves not to be enough in the long term, we'll need
to introduce an exception slow path by re-routing the exception
handlers.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
include/linux/context_tracking.h | 46 ++++++++++++++++++++++++++++++++-----
kernel/context_tracking.c | 33 +++++----------------------
kernel/sched/cputime.c | 2 +
3 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 0007ab4..8eba58c 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -38,23 +38,40 @@ static inline bool context_tracking_active(void)

extern void context_tracking_cpu_set(int cpu);

-extern void user_enter(void);
-extern void user_exit(void);
+extern void context_tracking_user_enter(void);
+extern void context_tracking_user_exit(void);
+
+static inline void user_enter(void)
+{
+ if (static_key_false(&context_tracking_enabled))
+ context_tracking_user_enter();
+
+}
+static inline void user_exit(void)
+{
+ if (static_key_false(&context_tracking_enabled))
+ context_tracking_user_exit();
+}

static inline enum ctx_state exception_enter(void)
{
enum ctx_state prev_ctx;

+ if (!static_key_false(&context_tracking_enabled))
+ return 0;
+
prev_ctx = this_cpu_read(context_tracking.state);
- user_exit();
+ context_tracking_user_exit();

return prev_ctx;
}

static inline void exception_exit(enum ctx_state prev_ctx)
{
- if (prev_ctx == IN_USER)
- user_enter();
+ if (static_key_false(&context_tracking_enabled)) {
+ if (prev_ctx == IN_USER)
+ context_tracking_user_enter();
+ }
}

extern void context_tracking_task_switch(struct task_struct *prev,
@@ -70,8 +87,23 @@ static inline void context_tracking_task_switch(struct task_struct *prev,
#endif /* !CONFIG_CONTEXT_TRACKING */

#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-extern void guest_enter(void);
-extern void guest_exit(void);
+static inline void guest_enter(void)
+{
+ if (static_key_false(&context_tracking_enabled) &&
+ vtime_accounting_enabled())
+ vtime_guest_enter(current);
+ else
+ current->flags |= PF_VCPU;
+}
+
+static inline void guest_exit(void)
+{
+ if (static_key_false(&context_tracking_enabled) &&
+ vtime_accounting_enabled())
+ vtime_guest_exit(current);
+ else
+ current->flags &= ~PF_VCPU;
+}
#else
static inline void guest_enter(void)
{
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index f07505c..7b0a22d 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -33,15 +33,15 @@ void context_tracking_cpu_set(int cpu)
}

/**
- * user_enter - Inform the context tracking that the CPU is going to
- * enter userspace mode.
+ * context_tracking_user_enter - Inform the context tracking that the CPU is going to
+ * enter userspace mode.
*
* This function must be called right before we switch from the kernel
* to userspace, when it's guaranteed the remaining kernel instructions
* to execute won't use any RCU read side critical section because this
* function sets RCU in extended quiescent state.
*/
-void user_enter(void)
+void context_tracking_user_enter(void)
{
unsigned long flags;

@@ -131,8 +131,8 @@ EXPORT_SYMBOL_GPL(preempt_schedule_context);
#endif /* CONFIG_PREEMPT */

/**
- * user_exit - Inform the context tracking that the CPU is
- * exiting userspace mode and entering the kernel.
+ * context_tracking_user_exit - Inform the context tracking that the CPU is
+ * exiting userspace mode and entering the kernel.
*
* This function must be called after we entered the kernel from userspace
* before any use of RCU read side critical section. This potentially include
@@ -141,7 +141,7 @@ EXPORT_SYMBOL_GPL(preempt_schedule_context);
* This call supports re-entrancy. This way it can be called from any exception
* handler without needing to know if we came from userspace or not.
*/
-void user_exit(void)
+void context_tracking_user_exit(void)
{
unsigned long flags;

@@ -163,27 +163,6 @@ void user_exit(void)
local_irq_restore(flags);
}

-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-void guest_enter(void)
-{
- if (vtime_accounting_enabled())
- vtime_guest_enter(current);
- else
- current->flags |= PF_VCPU;
-}
-EXPORT_SYMBOL_GPL(guest_enter);
-
-void guest_exit(void)
-{
- if (vtime_accounting_enabled())
- vtime_guest_exit(current);
- else
- current->flags &= ~PF_VCPU;
-}
-EXPORT_SYMBOL_GPL(guest_exit);
-#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
-
-
/**
* context_tracking_task_switch - context switch the syscall callbacks
* @prev: the task that is being switched out
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 223a35e..bb6b29a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -724,6 +724,7 @@ void vtime_guest_enter(struct task_struct *tsk)
current->flags |= PF_VCPU;
write_sequnlock(&tsk->vtime_seqlock);
}
+EXPORT_SYMBOL_GPL(vtime_guest_enter);

void vtime_guest_exit(struct task_struct *tsk)
{
@@ -732,6 +733,7 @@ void vtime_guest_exit(struct task_struct *tsk)
current->flags &= ~PF_VCPU;
write_sequnlock(&tsk->vtime_seqlock);
}
+EXPORT_SYMBOL_GPL(vtime_guest_exit);

void vtime_account_idle(struct task_struct *tsk)
{
--
1.7.5.4

2013-07-17 16:51:58

by Frederic Weisbecker

Subject: [PATCH 07/18] nohz: Selectively enable context tracking on full dynticks CPUs

The code is ready to do so in the context tracking subsystem, now
we just need to pass our cpu range selection to it from the
full dynticks subsystem.

This way we can spare the overhead of RCU user extended quiescent
state and vtime maintenance on these CPUs. Just keep in mind the
context tracking itself is still necessary everywhere.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
include/linux/context_tracking.h | 2 ++
init/Kconfig | 2 +-
kernel/context_tracking.c | 5 +++++
kernel/time/Kconfig | 1 -
kernel/time/tick-sched.c | 6 ++++++
5 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 12045ce..2c2b73aa 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -34,6 +34,8 @@ static inline bool context_tracking_active(void)
return __this_cpu_read(context_tracking.active);
}

+extern void context_tracking_cpu_set(int cpu);
+
extern void user_enter(void);
extern void user_exit(void);

diff --git a/init/Kconfig b/init/Kconfig
index 247084b..914da3f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -527,7 +527,7 @@ config RCU_USER_QS
config CONTEXT_TRACKING_FORCE
bool "Force context tracking"
depends on CONTEXT_TRACKING
- default CONTEXT_TRACKING
+ default y if !NO_HZ_FULL
help
Probe on user/kernel boundaries by default in order to
test the features that rely on it such as userspace RCU extended
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 7b095de..72bcb25 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -26,6 +26,11 @@ DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
#endif
};

+void context_tracking_cpu_set(int cpu)
+{
+ per_cpu(context_tracking.active, cpu) = true;
+}
+
/**
* user_enter - Inform the context tracking that the CPU is going to
* enter userspace mode.
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 70f27e8..747bbc7 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -105,7 +105,6 @@ config NO_HZ_FULL
select RCU_USER_QS
select RCU_NOCB_CPU
select VIRT_CPU_ACCOUNTING_GEN
- select CONTEXT_TRACKING_FORCE
select IRQ_WORK
help
Adaptively try to shutdown the tick whenever possible, even when
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 088c411..062759e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -23,6 +23,7 @@
#include <linux/irq_work.h>
#include <linux/posix-timers.h>
#include <linux/perf_event.h>
+#include <linux/context_tracking.h>

#include <asm/irq_regs.h>

@@ -344,11 +345,16 @@ static int tick_nohz_init_all(void)

void __init tick_nohz_init(void)
{
+ int cpu;
+
if (!have_nohz_full_mask) {
if (tick_nohz_init_all() < 0)
return;
}

+ for_each_cpu(cpu, nohz_full_mask)
+ context_tracking_cpu_set(cpu);
+
cpu_notifier(tick_nohz_cpu_down_callback, 0);
cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), nohz_full_mask);
pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
--
1.7.5.4

2013-07-17 16:52:25

by Frederic Weisbecker

Subject: [PATCH 05/18] vtime: Update a few comments

Update a stale comment from the old vtime era and document some
locking that might be non-obvious.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
include/linux/context_tracking.h | 8 ++------
kernel/sched/cputime.c | 7 +++++++
2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 5984f25..12045ce 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -72,8 +72,8 @@ extern void guest_exit(void);
static inline void guest_enter(void)
{
/*
- * This is running in ioctl context so we can avoid
- * the call to vtime_account() with its unnecessary idle check.
+ * This is running in ioctl context so its safe
+ * to assume the pending cputime to flush is stime.
*/
vtime_account_system(current);
current->flags |= PF_VCPU;
@@ -81,10 +81,6 @@ static inline void guest_enter(void)

static inline void guest_exit(void)
{
- /*
- * This is running in ioctl context so we can avoid
- * the call to vtime_account() with its unnecessary idle check.
- */
vtime_account_system(current);
current->flags &= ~PF_VCPU;
}
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index a7959e0..223a35e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -712,6 +712,13 @@ void vtime_user_enter(struct task_struct *tsk)

void vtime_guest_enter(struct task_struct *tsk)
{
+ /*
+ * The flags must be updated under the lock with
+ * the vtime_snap flush and update.
+ * That enforces a right ordering and update sequence
+ * synchronization against the reader (task_gtime())
+ * that can thus safely catch up with a tickless delta.
+ */
write_seqlock(&tsk->vtime_seqlock);
__vtime_account_system(tsk);
current->flags |= PF_VCPU;
--
1.7.5.4

2013-07-17 16:44:48

by Frederic Weisbecker

Subject: [PATCH 02/18] nohz: fix compile warning in tick_nohz_init()

From: Li Zhong <[email protected]>

cpu is not used after commit 5b8621a68fdcd2baf1d3b413726f913a5254d46a

Signed-off-by: Li Zhong <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/tick-sched.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6f47049..088c411 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -344,8 +344,6 @@ static int tick_nohz_init_all(void)

void __init tick_nohz_init(void)
{
- int cpu;
-
if (!have_nohz_full_mask) {
if (tick_nohz_init_all() < 0)
return;
--
1.7.5.4

2013-07-17 16:52:51

by Frederic Weisbecker

Subject: [PATCH 03/18] sched: Consolidate open coded preemptible() checks

preempt_schedule() and preempt_schedule_context() open
code their preemptibility checks.

Use the standard preemptible() API instead for consolidation.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Vincent Guittot <[email protected]>
---
kernel/context_tracking.c | 3 +--
kernel/sched/core.c | 4 +---
2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 383f823..942835c 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -87,10 +87,9 @@ void user_enter(void)
*/
void __sched notrace preempt_schedule_context(void)
{
- struct thread_info *ti = current_thread_info();
enum ctx_state prev_ctx;

- if (likely(ti->preempt_count || irqs_disabled()))
+ if (likely(!preemptible()))
return;

/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0d8eb45..0e2936d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2510,13 +2510,11 @@ void __sched schedule_preempt_disabled(void)
*/
asmlinkage void __sched notrace preempt_schedule(void)
{
- struct thread_info *ti = current_thread_info();
-
/*
* If there is a non-zero preempt_count or interrupts are disabled,
* we do not want to preempt the current task. Just return..
*/
- if (likely(ti->preempt_count || irqs_disabled()))
+ if (likely(!preemptible()))
return;

do {
--
1.7.5.4

2013-07-17 16:53:14

by Frederic Weisbecker

Subject: [PATCH 01/18] nohz: Do not warn about unstable tsc unless user uses nohz_full

From: Steven Rostedt <[email protected]>

If the user enables CONFIG_NO_HZ_FULL and runs the kernel on a machine
with an unstable TSC, it will produce a WARN_ON dump as well as taint
the kernel. This is a bit extreme for a kernel that just enables a
feature but doesn't use it.

The warning should only happen if the user tries to use the feature by
either adding nohz_full to the kernel command line, or by enabling
CONFIG_NO_HZ_FULL_ALL that makes nohz used on all CPUs at boot up. Note,
this second feature should not (yet) be used by distros or anyone that
doesn't care if NO_HZ is used or not.

Signed-off-by: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/tick-sched.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6960172..6f47049 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -182,7 +182,8 @@ static bool can_stop_full_tick(void)
* Don't allow the user to think they can get
* full NO_HZ with this machine.
*/
- WARN_ONCE(1, "NO_HZ FULL will not work with unstable sched clock");
+ WARN_ONCE(have_nohz_full_mask,
+ "NO_HZ FULL will not work with unstable sched clock");
return false;
}
#endif
--
1.7.5.4

2013-07-17 17:57:57

by Steven Rostedt

Subject: Re: [PATCH 05/18] vtime: Update a few comments

On Wed, 2013-07-17 at 18:44 +0200, Frederic Weisbecker wrote:
> Update a stale comment from the old vtime era and document some
> locking that might be non-obvious.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Li Zhong <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Kevin Hilman <[email protected]>
> ---
> include/linux/context_tracking.h | 8 ++------
> kernel/sched/cputime.c | 7 +++++++
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index 5984f25..12045ce 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -72,8 +72,8 @@ extern void guest_exit(void);
> static inline void guest_enter(void)
> {
> /*
> - * This is running in ioctl context so we can avoid
> - * the call to vtime_account() with its unnecessary idle check.
> + * This is running in ioctl context so its safe
> + * to assume the pending cputime to flush is stime.

The above is worded funny. What about:

"This is running in ioctl context so its safe to assume that its the
stime pending cputime to flush"

I don't know. But "is stime" is what made me have to read that three
times to figure out what you meant.


> */
> vtime_account_system(current);
> current->flags |= PF_VCPU;
> @@ -81,10 +81,6 @@ static inline void guest_enter(void)
>
> static inline void guest_exit(void)
> {
> - /*
> - * This is running in ioctl context so we can avoid
> - * the call to vtime_account() with its unnecessary idle check.
> - */

Should we copy the comment here too?

-- Steve

> vtime_account_system(current);
> current->flags &= ~PF_VCPU;
> }
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index a7959e0..223a35e 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -712,6 +712,13 @@ void vtime_user_enter(struct task_struct *tsk)
>
> void vtime_guest_enter(struct task_struct *tsk)
> {
> + /*
> + * The flags must be updated under the lock with
> + * the vtime_snap flush and update.
> + * That enforces a right ordering and update sequence
> + * synchronization against the reader (task_gtime())
> + * that can thus safely catch up with a tickless delta.
> + */
> write_seqlock(&tsk->vtime_seqlock);
> __vtime_account_system(tsk);
> current->flags |= PF_VCPU;

2013-07-17 18:27:20

by Steven Rostedt

Subject: Re: [PATCH 07/18] nohz: Selectively enable context tracking on full dynticks CPUs

On Wed, 2013-07-17 at 18:44 +0200, Frederic Weisbecker wrote:
> The code is ready to do so in the context tracking subsystem, now

"do so"? Do what?

> we just need to pass our cpu range selection to it from the

Pass cpu range selection to what?

Pronouns are evil in technical documentation.

> full dynticks subsystem.
>
> This way we can spare the overhead of RCU user extended quiescent
> state and vtime maintenance on these CPUs. Just keep in mind the
> context tracking itself is still necessary everywhere.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Li Zhong <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Kevin Hilman <[email protected]>
> ---
> include/linux/context_tracking.h | 2 ++
> init/Kconfig | 2 +-
> kernel/context_tracking.c | 5 +++++
> kernel/time/Kconfig | 1 -
> kernel/time/tick-sched.c | 6 ++++++
> 5 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index 12045ce..2c2b73aa 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -34,6 +34,8 @@ static inline bool context_tracking_active(void)
> return __this_cpu_read(context_tracking.active);
> }
>
> +extern void context_tracking_cpu_set(int cpu);
> +
> extern void user_enter(void);
> extern void user_exit(void);
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 247084b..914da3f 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -527,7 +527,7 @@ config RCU_USER_QS
> config CONTEXT_TRACKING_FORCE
> bool "Force context tracking"
> depends on CONTEXT_TRACKING
> - default CONTEXT_TRACKING
> + default y if !NO_HZ_FULL

Why the if !NO_HZ_FULL?

That selects this anyway. Oh wait, you changed this.

> help
> Probe on user/kernel boundaries by default in order to
> test the features that rely on it such as userspace RCU extended
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index 7b095de..72bcb25 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -26,6 +26,11 @@ DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
> #endif
> };
>
> +void context_tracking_cpu_set(int cpu)
> +{
> + per_cpu(context_tracking.active, cpu) = true;
> +}
> +
> /**
> * user_enter - Inform the context tracking that the CPU is going to
> * enter userspace mode.
> diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
> index 70f27e8..747bbc7 100644
> --- a/kernel/time/Kconfig
> +++ b/kernel/time/Kconfig
> @@ -105,7 +105,6 @@ config NO_HZ_FULL
> select RCU_USER_QS
> select RCU_NOCB_CPU
> select VIRT_CPU_ACCOUNTING_GEN
> - select CONTEXT_TRACKING_FORCE

OK, I'm confused.

-- Steve

> select IRQ_WORK
> help
> Adaptively try to shutdown the tick whenever possible, even when
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 088c411..062759e 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -23,6 +23,7 @@
> #include <linux/irq_work.h>
> #include <linux/posix-timers.h>
> #include <linux/perf_event.h>
> +#include <linux/context_tracking.h>
>
> #include <asm/irq_regs.h>
>
> @@ -344,11 +345,16 @@ static int tick_nohz_init_all(void)
>
> void __init tick_nohz_init(void)
> {
> + int cpu;
> +
> if (!have_nohz_full_mask) {
> if (tick_nohz_init_all() < 0)
> return;
> }
>
> + for_each_cpu(cpu, nohz_full_mask)
> + context_tracking_cpu_set(cpu);
> +
> cpu_notifier(tick_nohz_cpu_down_callback, 0);
> cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), nohz_full_mask);
> pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);

2013-07-18 21:31:01

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 05/18] vtime: Update a few comments

On Wed, Jul 17, 2013 at 01:57:51PM -0400, Steven Rostedt wrote:
> On Wed, 2013-07-17 at 18:44 +0200, Frederic Weisbecker wrote:
> > Update a stale comment from the old vtime era and document some
> > locking that might be non obvious.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Paul E. McKenney <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Li Zhong <[email protected]>
> > Cc: Mike Galbraith <[email protected]>
> > Cc: Kevin Hilman <[email protected]>
> > ---
> > include/linux/context_tracking.h | 8 ++------
> > kernel/sched/cputime.c | 7 +++++++
> > 2 files changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> > index 5984f25..12045ce 100644
> > --- a/include/linux/context_tracking.h
> > +++ b/include/linux/context_tracking.h
> > @@ -72,8 +72,8 @@ extern void guest_exit(void);
> > static inline void guest_enter(void)
> > {
> > /*
> > - * This is running in ioctl context so we can avoid
> > - * the call to vtime_account() with its unnecessary idle check.
> > + * This is running in ioctl context so its safe
> > + * to assume the pending cputime to flush is stime.
>
> The above is worded funny. What about:
>
> "This is running in ioctl context so its safe to assume that its the
> stime pending cputime to flush"

Maybe. What I'm trying to express is that we have some tickless
cputime that elapsed since the last snapshot, and we need to know the nature
of that cputime in order to account it to the right stats: was it userspace,
system or idle cputime? We are in the kernel serving an ioctl syscall, so
we know it's system, hence the vtime_account_system().

You are the native speaker and I wasn't listening in my english courses,
so you definitely have the last word on this. I just want to be sure you
correctly understand what I'm trying to say :)

>
> I don't know. But "is stime" is what made me have to read that three
> times to figure out what you meant.
>
>
> > */
> > vtime_account_system(current);
> > current->flags |= PF_VCPU;
> > @@ -81,10 +81,6 @@ static inline void guest_enter(void)
> >
> > static inline void guest_exit(void)
> > {
> > - /*
> > - * This is running in ioctl context so we can avoid
> > - * the call to vtime_account() with its unnecessary idle check.
> > - */
>
> Should we copy the comment here too?

Yeah or a small reminder.

Thanks.

>
> -- Steve
>
> > vtime_account_system(current);
> > current->flags &= ~PF_VCPU;
> > }
> > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> > index a7959e0..223a35e 100644
> > --- a/kernel/sched/cputime.c
> > +++ b/kernel/sched/cputime.c
> > @@ -712,6 +712,13 @@ void vtime_user_enter(struct task_struct *tsk)
> >
> > void vtime_guest_enter(struct task_struct *tsk)
> > {
> > + /*
> > + * The flags must be updated under the lock with
> > + * the vtime_snap flush and update.
> > + * That enforces a right ordering and update sequence
> > + * synchronization against the reader (task_gtime())
> > + * that can thus safely catch up with a tickless delta.
> > + */
> > write_seqlock(&tsk->vtime_seqlock);
> > __vtime_account_system(tsk);
> > current->flags |= PF_VCPU;
>
>

2013-07-18 22:13:53

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 07/18] nohz: Selectively enable context tracking on full dynticks CPUs

On Wed, Jul 17, 2013 at 02:27:17PM -0400, Steven Rostedt wrote:
> On Wed, 2013-07-17 at 18:44 +0200, Frederic Weisbecker wrote:
> > The code is ready to do so in the context tracking subsystem, now
>
> "do so"? Do what?

It's referring to the patch title. The code is ready to selectively
enable context tracking on the CPUs.

I see many changelogs that use that kind of style where the title
of the patch is considered as the 1st line of the changelog. That's
convenient because it avoids the need to rephrase the title in the
changelog.

But maybe the reference to the title is not obvious. If you prefer,
I can expand the "do so" here.

>
> > we just need to pass our cpu range selection to it from the
>
> Pass cpu range selection to what?
>
> Pronouns are evil in technical documentation.

How about:

"""
The code in the context tracking subsystem is ready to selectively
enable its tracking on specificied CPU ranges instead of inconditionally
force it on all CPUs.

What we need to do now is to pass the desired CPU ranges to track from
the full dynticks subsystem, according to the ranges specified in the
"nohz_full=" boot option.
"""

> > diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> > index 12045ce..2c2b73aa 100644
> > --- a/include/linux/context_tracking.h
> > +++ b/include/linux/context_tracking.h
> > @@ -34,6 +34,8 @@ static inline bool context_tracking_active(void)
> > return __this_cpu_read(context_tracking.active);
> > }
> >
> > +extern void context_tracking_cpu_set(int cpu);
> > +
> > extern void user_enter(void);
> > extern void user_exit(void);
> >
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 247084b..914da3f 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -527,7 +527,7 @@ config RCU_USER_QS
> > config CONTEXT_TRACKING_FORCE
> > bool "Force context tracking"
> > depends on CONTEXT_TRACKING
> > - default CONTEXT_TRACKING
> > + default y if !NO_HZ_FULL
>
> Why the if !NO_HZ_FULL?
>
> That selects this anyway. Oh wait, you changed this.

Yeah that's probably confusing. Ok lets consider a system with:

CONTEXT_TRACKING=y

By default it doesn't track any CPU; it's inactive unless you set:

CONTEXT_TRACKING_FORCE=y

In this case, all CPUs are tracked.

The full dynticks subsystem was supposed to pass its CPU range to context
tracking such that it activates the tracking only on the relevant CPUs.

But the context tracking code was merged before full dynticks. So nothing
was there to enable CPUs for context tracking initially. So we needed
CONTEXT_TRACKING_FORCE for testing.

Then later we merged full dynticks. But we got lazy and rushed, and instead of
selecting the CPUs to track at runtime from the full dynticks subsystem to
the context tracking subsystem, we forced CONTEXT_TRACKING_FORCE=y when
NO_HZ_FULL=y. Then using runtime selection became a TODO.

Now these patches handle that TODO and full dynticks passes its range to
context tracking.

Now one could ask why we keep CONTEXT_TRACKING_FORCE around at all, since we
have full dynticks and NO_HZ_FULL_ALL for wide automated testing.

This is because CONTEXT_TRACKING alone is not sufficient for NO_HZ_FULL.
Especially because of the 64 bits requirement that I need to drop after
careful review of any use of cputime_t. But anyway, CONTEXT_TRACKING_FORCE
is still handy to keep around for archs that want support for nohz full
but don't yet meet all its dependencies.

Thanks.

2013-07-18 22:52:03

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 07/18] nohz: Selectively enable context tracking on full dynticks CPUs

On Fri, 2013-07-19 at 00:13 +0200, Frederic Weisbecker wrote:
> On Wed, Jul 17, 2013 at 02:27:17PM -0400, Steven Rostedt wrote:
> > On Wed, 2013-07-17 at 18:44 +0200, Frederic Weisbecker wrote:
> > > The code is ready to do so in the context tracking subsystem, now
> >
> > "do so"? Do what?
>
> It's referring to the patch title. The code is ready to selectively
> enable context tracking on the CPUs.
>
> I see many changelogs that use that kind of style where the title
> of the patch is considered as the 1st line of the changelog. That's
> convenient because it avoids the need to rephrase the title in the
> changelog.
>
> But may be the reference to the title is not obvious. if you prefer
> I can expand the "do so" here.

Yeah, I've seen that too. But this was a bit too subtle to get. The
subject is a bit vague as well, which doesn't help the matter. What does
"selectively enable context tracking" mean? How is it selective?

>
> >
> > > we just need to pass our cpu range selection to it from the
> >
> > Pass cpu range selection to what?
> >
> > Pronouns are evil in technical documentation.
>
> How about:
>
> """
> The code in the context tracking subsystem is ready to selectively
> enable its tracking on specificied CPU ranges instead of inconditionally

"specified" "unconditionally"

> force it on all CPUs.
>
> What we need to do now is to pass the desired CPU ranges to track from
> the full dynticks subsystem, according to the ranges specified in the
> "nohz_full=" boot option.
> """
>
> > > diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> > > index 12045ce..2c2b73aa 100644
> > > --- a/include/linux/context_tracking.h
> > > +++ b/include/linux/context_tracking.h
> > > @@ -34,6 +34,8 @@ static inline bool context_tracking_active(void)
> > > return __this_cpu_read(context_tracking.active);
> > > }
> > >
> > > +extern void context_tracking_cpu_set(int cpu);
> > > +
> > > extern void user_enter(void);
> > > extern void user_exit(void);
> > >
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index 247084b..914da3f 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -527,7 +527,7 @@ config RCU_USER_QS
> > > config CONTEXT_TRACKING_FORCE
> > > bool "Force context tracking"
> > > depends on CONTEXT_TRACKING
> > > - default CONTEXT_TRACKING
> > > + default y if !NO_HZ_FULL
> >
> > Why the if !NO_HZ_FULL?
> >
> > That selects this anyway. Oh wait, you changed this.
>
> Yeah that's probably confusing. Ok lets consider a system with:
>
> CONTEXT_TRACKING=y
>
> By default it doesn't track any CPU, it's inactive unless you set:
>
> CONTEXT_TRACKING_FORCE=y
>
> In this case, all CPUs are tracked.
>
> The full dynticks subsystem was supposed to pass its CPU range to context
> tracking such that it activates the tracking only on the relevant CPUs.
>
> But the context tracking code was merged before full dynticks. So nothing
> was there to enabled CPUs on context tracking initially. So we needed
> CONTEXT_TRACKING_FORCE for testing.
>
> Then later we merged full dynticks. But we got lazy and rushed and instead of
> selecting the CPUs to track on runtime from the full dynticks subsystem to
> the context tracking subsystem, we forced CONTEXT_TRACKING_FORCE=y when
> NO_HZ_FULL=y. Then using runtime selection became a TODO.
>
> Now these patches handle that TODO and full dynticks passes its range to
> contex tracking.
>
> Now one could argue why we keep CONTEXT_TRACKING_FORCE around, since we
> have full dynticks and NO_HZ_FULL_ALL for wide automated testing.
>
> This is because CONTEXT_TRACKING is not sufficient for NO_HZ_FULL alone.
> Especially because of the 64bits requirement that I need to drop after
> careful review of any use of cputime_t. But anyway, CONTEXT_TRACKING_FORCE
> is still handy to keep around for archs that want support for nohz full
> but don't yet meet all dependencies.

OK, that needs a comment in the Kconfig. Perhaps something like:

"CONTEXT_TRACKING is only needed by NO_HZ_FULL, but the user may want to
test CONTEXT_TRACKING on systems that do not yet support NO_HZ_FULL, in
which case we must keep the FORCE to enable it."

Or something to that nature. That way people have a clue to why that's
like that.
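
Folded into the entry, a comment along those lines might look like this (a sketch, not a final wording):

```kconfig
config CONTEXT_TRACKING_FORCE
	bool "Force context tracking"
	depends on CONTEXT_TRACKING
	# NO_HZ_FULL enables tracking on its own CPUs at runtime, so
	# forcing it everywhere is only needed for testing context
	# tracking on systems that do not yet support NO_HZ_FULL.
	default y if !NO_HZ_FULL
	help
	  Probe on user/kernel boundaries by default in order to
	  test the features that rely on it such as userspace RCU extended
	  quiescent states.
```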

-- Steve

2013-07-19 14:19:30

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 07/18] nohz: Selectively enable context tracking on full dynticks CPUs

On Thu, Jul 18, 2013 at 06:51:57PM -0400, Steven Rostedt wrote:
> On Fri, 2013-07-19 at 00:13 +0200, Frederic Weisbecker wrote:
> > On Wed, Jul 17, 2013 at 02:27:17PM -0400, Steven Rostedt wrote:
> > > On Wed, 2013-07-17 at 18:44 +0200, Frederic Weisbecker wrote:
> > > > The code is ready to do so in the context tracking subsystem, now
> > >
> > > "do so"? Do what?
> >
> > It's referring to the patch title. The code is ready to selectively
> > enable context tracking on the CPUs.
> >
> > I see many changelogs that use that kind of style where the title
> > of the patch is considered as the 1st line of the changelog. That's
> > convenient because it avoids the need to rephrase the title in the
> > changelog.
> >
> > But may be the reference to the title is not obvious. if you prefer
> > I can expand the "do so" here.
>
> Yeah, I've seen that too. But this was a bit too subtle to get it. The
> subject is a bit vague as well, which doesn't help the matter. What does
> "selectively enable context tracking" mean? How is it selective?

Yeah, "selectively" here means that it's enabled only on some CPUs, not
all of them.

>
> >
> > >
> > > > we just need to pass our cpu range selection to it from the
> > >
> > > Pass cpu range selection to what?
> > >
> > > Pronouns are evil in technical documentation.
> >
> > How about:
> >
> > """
> > The code in the context tracking subsystem is ready to selectively
> > enable its tracking on specificied CPU ranges instead of inconditionally
>
> "specified" "unconditionally"

Oops :)

>
> > force it on all CPUs.
> >
> > What we need to do now is to pass the desired CPU ranges to track from
> > the full dynticks subsystem, according to the ranges specified in the
> > "nohz_full=" boot option.
> > """
> >
> > > > diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> > > > index 12045ce..2c2b73aa 100644
> > > > --- a/include/linux/context_tracking.h
> > > > +++ b/include/linux/context_tracking.h
> > > > @@ -34,6 +34,8 @@ static inline bool context_tracking_active(void)
> > > > return __this_cpu_read(context_tracking.active);
> > > > }
> > > >
> > > > +extern void context_tracking_cpu_set(int cpu);
> > > > +
> > > > extern void user_enter(void);
> > > > extern void user_exit(void);
> > > >
> > > > diff --git a/init/Kconfig b/init/Kconfig
> > > > index 247084b..914da3f 100644
> > > > --- a/init/Kconfig
> > > > +++ b/init/Kconfig
> > > > @@ -527,7 +527,7 @@ config RCU_USER_QS
> > > > config CONTEXT_TRACKING_FORCE
> > > > bool "Force context tracking"
> > > > depends on CONTEXT_TRACKING
> > > > - default CONTEXT_TRACKING
> > > > + default y if !NO_HZ_FULL
> > >
> > > Why the if !NO_HZ_FULL?
> > >
> > > That selects this anyway. Oh wait, you changed this.
> >
> > Yeah that's probably confusing. Ok lets consider a system with:
> >
> > CONTEXT_TRACKING=y
> >
> > By default it doesn't track any CPU, it's inactive unless you set:
> >
> > CONTEXT_TRACKING_FORCE=y
> >
> > In this case, all CPUs are tracked.
> >
> > The full dynticks subsystem was supposed to pass its CPU range to context
> > tracking such that it activates the tracking only on the relevant CPUs.
> >
> > But the context tracking code was merged before full dynticks. So nothing
> > was there to enabled CPUs on context tracking initially. So we needed
> > CONTEXT_TRACKING_FORCE for testing.
> >
> > Then later we merged full dynticks. But we got lazy and rushed and instead of
> > selecting the CPUs to track on runtime from the full dynticks subsystem to
> > the context tracking subsystem, we forced CONTEXT_TRACKING_FORCE=y when
> > NO_HZ_FULL=y. Then using runtime selection became a TODO.
> >
> > Now these patches handle that TODO and full dynticks passes its range to
> > contex tracking.
> >
> > Now one could argue why we keep CONTEXT_TRACKING_FORCE around, since we
> > have full dynticks and NO_HZ_FULL_ALL for wide automated testing.
> >
> > This is because CONTEXT_TRACKING is not sufficient for NO_HZ_FULL alone.
> > Especially because of the 64bits requirement that I need to drop after
> > careful review of any use of cputime_t. But anyway, CONTEXT_TRACKING_FORCE
> > is still handy to keep around for archs that want support for nohz full
> > but don't yet meet all dependencies.
>
> OK, that needs a comment in the Kconfig. Perhaps something like:
>
> "CONTEXT_TRACKING is only needed by NO_HZ_FULL, but the user may want to
> test CONTEXT_TRACKING on systems that do not yet support NO_HZ_FULL, in
> which case we must keep the FORCE to enable it."
>
> Or something to that nature. That way people have a clue to why that's
> like that.

Ok, I'll add that.

Thanks!

>
> -- Steve
>
>