2013-04-22 18:59:56

by Frederic Weisbecker

Subject: [GIT PULL] nohz: Adaptively stop the tick, finally

Ingo,

Please pull the latest full dynticks branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/nohz

HEAD: 67826eae8c16dbf00c262be6ec15021bb42f69c4

This handles perf and CPUs that get more than one task, and fixes posix cpu
timers handling.

This can finally stop the tick. It boots and doesn't crash, as far as I tested.
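
To give the overall picture: the tick is only stopped on a full dynticks CPU
once every interested subsystem agrees it isn't needed. Condensed from the
patches in this series (trimmed for illustration, the real thing also checks
sched clock stability):

static bool can_stop_full_tick(void)
{
        /* Any subsystem can veto the tick stop */
        if (!sched_can_stop_tick())                     /* more than one runnable task */
                return false;
        if (!posix_cpu_timers_can_stop_tick(current))   /* posix cpu timer armed */
                return false;
        if (!perf_event_can_stop_tick())                /* events pending rotation */
                return false;
        return true;
}

When a subsystem later gains a new tick dependency, it kicks the target CPU
with an IPI so that this check is re-run and the tick restarted if needed.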

Now what's left:

* Kick CPUs' tick when the clock is marked unstable

* Kick CPUs when they extend the RCU grace periods too much by staying in
the kernel for too long (we are discussing this with Paul).

* sched_class::task_tick(). There are gazillions of statistics maintained there.
It's probably mostly about local and global fairness. Maybe for other stuff
too (cgroups, etc.).

* update_cpu_load_active(): again, various stats maintained there

* load balancing (see trigger_load_balance() usually called from the tick).

I hope we can handle these things progressively in the long run.

Thanks.

---
Frederic Weisbecker (10):
posix_timers: Fix pre-condition to stop the tick on full dynticks
perf: Kick full dynticks CPU if events rotation is needed
perf: New helper to prevent full dynticks CPUs from stopping the tick
sched: Kick full dynticks CPUs that have more than one task enqueued
sched: New helper to prevent stopping the tick in full dynticks
nohz: Re-evaluate the tick from the scheduler IPI
nohz: Implement full dynticks kick
nohz: Prepare to stop the tick on irq exit
nohz: Re-evaluate the tick for the new task after a context switch
nohz: Disable the tick when irqs resume in full dynticks CPUs

include/linux/perf_event.h | 6 +++
include/linux/sched.h | 6 +++
include/linux/tick.h | 4 ++
kernel/events/core.c | 17 +++++++-
kernel/posix-cpu-timers.c | 6 +-
kernel/sched/core.c | 24 +++++++++++-
kernel/sched/sched.h | 11 +++++
kernel/softirq.c | 19 ++++++--
kernel/time/tick-sched.c | 95 ++++++++++++++++++++++++++++++++++++++-----
9 files changed, 167 insertions(+), 21 deletions(-)

--
1.7.5.4


2013-04-22 19:00:03

by Frederic Weisbecker

Subject: [PATCH 03/10] perf: New helper to prevent full dynticks CPUs from stopping the tick

Provide a new helper that helps full dynticks CPUs avoid stopping
their tick when there are events in the local rotation list.

This way we make sure that perf_event_task_tick() is serviced
on demand.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Jiri Olsa <[email protected]>
---
include/linux/perf_event.h | 6 ++++++
kernel/events/core.c | 10 ++++++++++
2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e47ee46..0140830 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -799,6 +799,12 @@ static inline int __perf_event_disable(void *info) { return -1; }
static inline void perf_event_task_tick(void) { }
#endif

+#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL)
+extern bool perf_event_can_stop_tick(void);
+#else
+static inline bool perf_event_can_stop_tick(void) { return true; }
+#endif
+
#define perf_output_put(handle, x) perf_output_copy((handle), &(x), sizeof(x))

/*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 75b58bb..ddb993b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2560,6 +2560,16 @@ done:
list_del_init(&cpuctx->rotation_list);
}

+#ifdef CONFIG_NO_HZ_FULL
+bool perf_event_can_stop_tick(void)
+{
+        if (list_empty(&__get_cpu_var(rotation_list)))
+                return true;
+        else
+                return false;
+}
+#endif
+
void perf_event_task_tick(void)
{
struct list_head *head = &__get_cpu_var(rotation_list);
--
1.7.5.4

2013-04-22 19:00:18

by Frederic Weisbecker

Subject: [PATCH 06/10] nohz: Re-evaluate the tick from the scheduler IPI

The scheduler IPI is used by the scheduler to kick
full dynticks CPUs asynchronously when more than one
task is running or when a new timer list timer is
enqueued. This way the destination CPU can decide
to restart the tick to handle this new situation.

Now let's handle that kick from the scheduler IPI.

(Reusing the scheduler IPI rather than implementing
a new IPI was suggested by Peter Zijlstra a while ago)

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
include/linux/tick.h | 2 ++
kernel/sched/core.c | 4 +++-
kernel/time/tick-sched.c | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index d290168..e31e676 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -160,11 +160,13 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
#ifdef CONFIG_NO_HZ_FULL
extern void tick_nohz_init(void);
extern int tick_nohz_full_cpu(int cpu);
+extern void tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_all(void);
#else
static inline void tick_nohz_init(void) { }
static inline int tick_nohz_full_cpu(int cpu) { return 0; }
+static inline void tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick(void) { }
static inline void tick_nohz_full_kick_all(void) { }
#endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 69f7133..9ad3500 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1398,7 +1398,8 @@ static void sched_ttwu_pending(void)

void scheduler_ipi(void)
{
-        if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
+        if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick()
+            && !tick_nohz_full_cpu(smp_processor_id()))
                return;

        /*
@@ -1415,6 +1416,7 @@ void scheduler_ipi(void)
         * somewhat pessimize the simple resched case.
         */
        irq_enter();
+        tick_nohz_full_check();
        sched_ttwu_pending();

/*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 884a9f3..4d74a68 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -151,7 +151,7 @@ bool have_nohz_full_mask;
* Re-evaluate the need for the tick on the current CPU
* and restart it if necessary.
*/
-static void tick_nohz_full_check(void)
+void tick_nohz_full_check(void)
{
/*
* STUB for now, will be filled with the full tick stop/restart
--
1.7.5.4

2013-04-22 19:00:16

by Frederic Weisbecker

Subject: [PATCH 09/10] nohz: Re-evaluate the tick for the new task after a context switch

When a task is scheduled in, it may have some properties
of its own that could make the CPU reconsider the need for
the tick: posix cpu timers, perf events, ...

So notify the full dynticks subsystem when a task gets
scheduled in and re-check the tick dependency at this
stage. This is done through a self IPI to avoid interfering
with any locks currently held.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
include/linux/tick.h | 2 ++
kernel/sched/core.c | 2 ++
kernel/time/tick-sched.c | 20 ++++++++++++++++++++
3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index e31e676..9180f4b 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -163,12 +163,14 @@ extern int tick_nohz_full_cpu(int cpu);
extern void tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_all(void);
+extern void tick_nohz_task_switch(struct task_struct *tsk);
#else
static inline void tick_nohz_init(void) { }
static inline int tick_nohz_full_cpu(int cpu) { return 0; }
static inline void tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick(void) { }
static inline void tick_nohz_full_kick_all(void) { }
+static inline void tick_nohz_task_switch(struct task_struct *tsk) { }
#endif


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9ad3500..dd09def 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1896,6 +1896,8 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
                kprobe_flush_task(prev);
                put_task_struct(prev);
        }
+
+        tick_nohz_task_switch(current);
}

#ifdef CONFIG_SMP
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d0ed190..12a900d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -232,6 +232,26 @@ void tick_nohz_full_kick_all(void)
preempt_enable();
}

+/*
+ * Re-evaluate the need for the tick as we switch the current task.
+ * It might need the tick due to per task/process properties:
+ * perf events, posix cpu timers, ...
+ */
+void tick_nohz_task_switch(struct task_struct *tsk)
+{
+        unsigned long flags;
+
+        if (!tick_nohz_full_cpu(smp_processor_id()))
+                return;
+
+        local_irq_save(flags);
+
+        if (tick_nohz_tick_stopped() && !can_stop_full_tick())
+                tick_nohz_full_kick();
+
+        local_irq_restore(flags);
+}
+
int tick_nohz_full_cpu(int cpu)
{
if (!have_nohz_full_mask)
--
1.7.5.4

2013-04-22 19:00:10

by Frederic Weisbecker

Subject: [PATCH 07/10] nohz: Implement full dynticks kick

Implement the full dynticks kick that is performed from
IPIs sent by various subsystems (scheduler, posix timers, ...)
when they want to notify about a new event that may
change the dependency on the tick.

Most of the time, such an event ends up restarting the tick.

(The part of the design where subsystems provide *_can_stop_tick()
helpers was suggested by Peter Zijlstra a while ago.)

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/time/tick-sched.c | 42 ++++++++++++++++++++++++++++++++++++++----
1 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 4d74a68..95d79ae 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -21,6 +21,8 @@
#include <linux/sched.h>
#include <linux/module.h>
#include <linux/irq_work.h>
+#include <linux/posix-timers.h>
+#include <linux/perf_event.h>

#include <asm/irq_regs.h>

@@ -147,16 +149,48 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
static cpumask_var_t nohz_full_mask;
bool have_nohz_full_mask;

+static bool can_stop_full_tick(void)
+{
+        WARN_ON_ONCE(!irqs_disabled());
+
+        if (!sched_can_stop_tick())
+                return false;
+
+        if (!posix_cpu_timers_can_stop_tick(current))
+                return false;
+
+        if (!perf_event_can_stop_tick())
+                return false;
+
+        /* sched_clock_tick() needs us? */
+#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
+        /*
+         * TODO: kick full dynticks CPUs when
+         * sched_clock_stable is set.
+         */
+        if (!sched_clock_stable)
+                return false;
+#endif
+
+        return true;
+}
+
+static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now);
+
/*
* Re-evaluate the need for the tick on the current CPU
* and restart it if necessary.
*/
void tick_nohz_full_check(void)
{
-        /*
-         * STUB for now, will be filled with the full tick stop/restart
-         * infrastructure patches
-         */
+        struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+        if (tick_nohz_full_cpu(smp_processor_id())) {
+                if (ts->tick_stopped && !is_idle_task(current)) {
+                        if (!can_stop_full_tick())
+                                tick_nohz_restart_sched_tick(ts, ktime_get());
+                }
+        }
}

static void nohz_full_kick_work_func(struct irq_work *work)
--
1.7.5.4

2013-04-22 19:00:55

by Frederic Weisbecker

Subject: [PATCH 10/10] nohz: Disable the tick when irqs resume in full dynticks CPUs

Eventually try to disable the tick on irq exit, now that the
fundamental infrastructure is in place.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/softirq.c | 19 ++++++++++++++-----
1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index de15813..8b1446d 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -337,6 +337,19 @@ static inline void invoke_softirq(void)
}
}

+static inline void tick_irq_exit(void)
+{
+#ifdef CONFIG_NO_HZ_COMMON
+        int cpu = smp_processor_id();
+
+        /* Make sure that timer wheel updates are propagated */
+        if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
+                if (!in_interrupt())
+                        tick_nohz_irq_exit();
+        }
+#endif
+}
+
/*
* Exit an interrupt context. Process softirqs if needed and possible:
*/
@@ -348,11 +361,7 @@ void irq_exit(void)
        if (!in_interrupt() && local_softirq_pending())
                invoke_softirq();

-#ifdef CONFIG_NO_HZ_COMMON
-        /* Make sure that timer wheel updates are propagated */
-        if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
-                tick_nohz_irq_exit();
-#endif
+        tick_irq_exit();
        rcu_irq_exit();
        sched_preempt_enable_no_resched();
}
--
1.7.5.4

2013-04-22 19:01:12

by Frederic Weisbecker

Subject: [PATCH 08/10] nohz: Prepare to stop the tick on irq exit

Interrupt exit is a natural place to stop the tick: by then, all the
events that happened before and during the irq and that are liable to
update the dependency on the tick have occurred. It also makes sure
that any check on the tick dependency is well ordered against dynticks
kick IPIs.

Bring in the infrastructure that performs the tick dependency
checks on irq exit and shuts the tick down if these checks show
that we can do so safely.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/time/tick-sched.c | 31 +++++++++++++++++++++++++------
1 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 95d79ae..d0ed190 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -647,6 +647,24 @@ out:
return ret;
}

+static void tick_nohz_full_stop_tick(struct tick_sched *ts)
+{
+#ifdef CONFIG_NO_HZ_FULL
+        int cpu = smp_processor_id();
+
+        if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
+                return;
+
+        if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+                return;
+
+        if (!can_stop_full_tick())
+                return;
+
+        tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+#endif
+}
+
static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
{
/*
@@ -773,12 +791,13 @@ void tick_nohz_irq_exit(void)
{
        struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

-        if (!ts->inidle)
-                return;
-
-        /* Cancel the timer because CPU already waken up from the C-states*/
-        menu_hrtimer_cancel();
-        __tick_nohz_idle_enter(ts);
+        if (ts->inidle) {
+                /* Cancel the timer because CPU already waken up from the C-states*/
+                menu_hrtimer_cancel();
+                __tick_nohz_idle_enter(ts);
+        } else {
+                tick_nohz_full_stop_tick(ts);
+        }
}

/**
--
1.7.5.4

2013-04-22 19:00:07

by Frederic Weisbecker

Subject: [PATCH 04/10] sched: Kick full dynticks CPUs that have more than one task enqueued

Kick the tick on full dynticks CPUs when they get more
than one task running on their queue. This makes sure that
local fairness is maintained by the tick on the destination.

This is done regardless of these tasks' class. We should
be able to be more clever in the future depending on the class, e.g.:
a CPU that runs a SCHED_FIFO task doesn't need to maintain
fairness against local pending tasks of the fair class.

But keep things simple for now.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/sched/sched.h | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 889904d..eb363aa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -5,6 +5,7 @@
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/stop_machine.h>
+#include <linux/tick.h>

#include "cpupri.h"

@@ -1106,6 +1107,16 @@ static inline u64 steal_ticks(u64 steal)
static inline void inc_nr_running(struct rq *rq)
{
        rq->nr_running++;
+
+#ifdef CONFIG_NO_HZ_FULL
+        if (rq->nr_running == 2) {
+                if (tick_nohz_full_cpu(rq->cpu)) {
+                        /* Order rq->nr_running write against the IPI */
+                        smp_wmb();
+                        smp_send_reschedule(rq->cpu);
+                }
+        }
+#endif
}

static inline void dec_nr_running(struct rq *rq)
--
1.7.5.4

2013-04-22 19:01:53

by Frederic Weisbecker

Subject: [PATCH 05/10] sched: New helper to prevent stopping the tick in full dynticks

Provide a new helper to be called from the full dynticks engine
before stopping the tick in order to make sure we don't stop
it when there is more than one task running on the CPU.

This way we make sure that the tick stays alive to maintain
fairness.
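
For clarity, this check is meant to pair with the kick from the previous
patch. A condensed sketch (not the literal code) of the two sides:

/* Enqueue side, any CPU (see inc_nr_running() in the previous patch) */
        rq->nr_running++;
        smp_wmb();                      /* publish nr_running before the IPI */
        smp_send_reschedule(rq->cpu);   /* kick the full dynticks target */

/* Target CPU, from the tick re-evaluation triggered by that IPI */
        smp_rmb();                      /* pairs with the smp_wmb() above */
        if (rq->nr_running > 1)         /* the new task is now visible */
                return false;           /* keep the tick for preemption */

So by the time the kicked CPU runs sched_can_stop_tick(), it is guaranteed
to observe the updated nr_running and keeps its tick running.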

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
include/linux/sched.h | 6 ++++++
kernel/sched/core.c | 18 ++++++++++++++++++
2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1ff9e0a..a74aded 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1856,6 +1856,12 @@ extern void wake_up_nohz_cpu(int cpu);
static inline void wake_up_nohz_cpu(int cpu) { }
#endif

+#ifdef CONFIG_NO_HZ_FULL
+extern bool sched_can_stop_tick(void);
+#else
+static inline bool sched_can_stop_tick(void) { return false; }
+#endif
+
#ifdef CONFIG_SCHED_AUTOGROUP
extern void sched_autogroup_create_attach(struct task_struct *p);
extern void sched_autogroup_detach(struct task_struct *p);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0f0a5b3..69f7133 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -650,6 +650,24 @@ static inline bool got_nohz_idle_kick(void)

#endif /* CONFIG_NO_HZ_COMMON */

+#ifdef CONFIG_NO_HZ_FULL
+bool sched_can_stop_tick(void)
+{
+        struct rq *rq;
+
+        rq = this_rq();
+
+        /* Make sure rq->nr_running update is visible after the IPI */
+        smp_rmb();
+
+        /* More than one running task need preemption */
+        if (rq->nr_running > 1)
+                return false;
+
+        return true;
+}
+#endif /* CONFIG_NO_HZ_FULL */
+
void sched_avg_update(struct rq *rq)
{
s64 period = sched_avg_period();
--
1.7.5.4

2013-04-22 19:00:00

by Frederic Weisbecker

Subject: [PATCH 02/10] perf: Kick full dynticks CPU if events rotation is needed

Kick the current CPU's tick by sending it a self IPI when
an event is queued on the rotation list and it is the first
element inserted. This makes sure that perf_event_task_tick()
works on full dynticks CPUs.
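
For reference, the kick itself comes from the nohz infrastructure already
queued in this branch and looks roughly like this (approximate sketch, not
part of this patch): it queues an irq_work on the local CPU whose handler
re-evaluates the tick from a real interrupt context (the re-evaluation
itself is filled in by a later patch in this series).

static void nohz_full_kick_work_func(struct irq_work *work)
{
        tick_nohz_full_check();
}

static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
        .func = nohz_full_kick_work_func,
};

void tick_nohz_full_kick(void)
{
        if (tick_nohz_full_cpu(smp_processor_id()))
                irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
}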

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Jiri Olsa <[email protected]>
---
kernel/events/core.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b0cd865..75b58bb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -18,6 +18,7 @@
#include <linux/poll.h>
#include <linux/slab.h>
#include <linux/hash.h>
+#include <linux/tick.h>
#include <linux/sysfs.h>
#include <linux/dcache.h>
#include <linux/percpu.h>
@@ -655,8 +656,12 @@ static void perf_pmu_rotate_start(struct pmu *pmu)

        WARN_ON(!irqs_disabled());

-        if (list_empty(&cpuctx->rotation_list))
+        if (list_empty(&cpuctx->rotation_list)) {
+                int was_empty = list_empty(head);
                list_add(&cpuctx->rotation_list, head);
+                if (was_empty)
+                        tick_nohz_full_kick();
+        }
}

static void get_ctx(struct perf_event_context *ctx)
--
1.7.5.4

2013-04-22 19:02:41

by Frederic Weisbecker

Subject: [PATCH 01/10] posix_timers: Fix pre-condition to stop the tick on full dynticks

The test that checks if a CPU can stop its tick from the posix CPU
timers angle was mistakenly inverted.

What we want is to prevent the tick from being stopped as long
as the current CPU's task runs a posix CPU timer.

Fix this.
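
For clarity, the helper after the fix reads (this is just the end result
of the hunk below):

bool posix_cpu_timers_can_stop_tick(struct task_struct *tsk)
{
        if (!task_cputime_zero(&tsk->cputime_expires))
                return false;

        if (tsk->signal->cputimer.running)
                return false;

        return true;
}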

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/posix-cpu-timers.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 84d5cb3..42670e9 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -673,12 +673,12 @@ static void posix_cpu_timer_kick_nohz(void)
bool posix_cpu_timers_can_stop_tick(struct task_struct *tsk)
{
        if (!task_cputime_zero(&tsk->cputime_expires))
-                return true;
+                return false;

        if (tsk->signal->cputimer.running)
-                return true;
+                return false;

-        return false;
+        return true;
}
#else
static inline void posix_cpu_timer_kick_nohz(void) { }
--
1.7.5.4

2013-04-24 07:33:06

by Ingo Molnar

Subject: Re: [GIT PULL] nohz: Adaptively stop the tick, finally


* Frederic Weisbecker <[email protected]> wrote:

> Ingo,
>
> Please pull the latest full dynticks branch that can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> timers/nohz
>
> HEAD: 67826eae8c16dbf00c262be6ec15021bb42f69c4
>
> This handles perf and CPUs that get more than one task, and fixes posix cpu
> timers handling.
>
> This can finally stop the tick. It boots and doesn't crash, as far as I tested.
>
> Now what's left:
>
> * Kick CPUs' tick when the clock is marked unstable
>
> * Kick CPUs when they extend the RCU grace periods too much by staying in
> the kernel for too long (we are discussing this with Paul).
>
> * sched_class::task_tick(). There are gazillions of statistics maintained there.
> It's probably mostly about local and global fairness. Maybe for other stuff
> too (cgroups, etc.).
>
> * update_cpu_load_active(): again, various stats maintained there
>
> * load balancing (see trigger_load_balance() usually called from the tick).
>
> I hope we can handle these things progressively in the long run.
>
> Thanks.
>
> ---
> Frederic Weisbecker (10):
> posix_timers: Fix pre-condition to stop the tick on full dynticks
> perf: Kick full dynticks CPU if events rotation is needed
> perf: New helper to prevent full dynticks CPUs from stopping the tick
> sched: Kick full dynticks CPUs that have more than one task enqueued
> sched: New helper to prevent stopping the tick in full dynticks
> nohz: Re-evaluate the tick from the scheduler IPI
> nohz: Implement full dynticks kick
> nohz: Prepare to stop the tick on irq exit
> nohz: Re-evaluate the tick for the new task after a context switch
> nohz: Disable the tick when irqs resume in full dynticks CPUs
>
> include/linux/perf_event.h | 6 +++
> include/linux/sched.h | 6 +++
> include/linux/tick.h | 4 ++
> kernel/events/core.c | 17 +++++++-
> kernel/posix-cpu-timers.c | 6 +-
> kernel/sched/core.c | 24 +++++++++++-
> kernel/sched/sched.h | 11 +++++
> kernel/softirq.c | 19 ++++++--
> kernel/time/tick-sched.c | 95 ++++++++++++++++++++++++++++++++++++++-----
> 9 files changed, 167 insertions(+), 21 deletions(-)

Pulled, thanks Frederic!

One detail: 'make oldconfig' gave me:

Timer tick handling
1. Periodic timer ticks (constant rate, no dynticks) (HZ_PERIODIC) (NEW)
> 2. Idle dynticks system (tickless idle) (NO_HZ_IDLE) (NEW)

I.e. CONFIG_NO_HZ_IDLE is picked by default. The default should really be
CONFIG_HZ_PERIODIC - so that people can easily enable full dynticks but
are not defaulted into it unknowingly.

Thanks,

Ingo

2013-04-24 07:38:58

by Ingo Molnar

Subject: Re: [GIT PULL] nohz: Adaptively stop the tick, finally


* Ingo Molnar <[email protected]> wrote:

> One detail: 'make oldconfig' gave me:
>
> Timer tick handling
> 1. Periodic timer ticks (constant rate, no dynticks) (HZ_PERIODIC) (NEW)
> > 2. Idle dynticks system (tickless idle) (NO_HZ_IDLE) (NEW)
>
> I.e. CONFIG_NO_HZ_IDLE is picked by default. The default should really be
> CONFIG_HZ_PERIODIC - so that people can easily enable full dynticks but
> are not defaulted into it unknowingly.

Oh, I got confused by the artificial hiding of NO_HZ_FULL again. Why is it
still hidden? I have a fairly generic config, yet it was not offered. I
bet most people won't ever see it!

Sigh, it's due to the dependency mess that I pointed out twice already:

depends on TREE_RCU || TREE_PREEMPT_RCU
depends on VIRT_CPU_ACCOUNTING_GEN

It should _really_ select both the RCU and the CPU time accounting model
automatically!

The selection of the dynticks mode certainly overrides RCU selection, and
it should for sure override some arcane, low level detail like the CPU
accounting model ...

Thanks,

Ingo

2013-04-24 14:50:44

by Frederic Weisbecker

Subject: Re: [GIT PULL] nohz: Adaptively stop the tick, finally

On Wed, Apr 24, 2013 at 09:38:52AM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
> > One detail: 'make oldconfig' gave me:
> >
> > Timer tick handling
> > 1. Periodic timer ticks (constant rate, no dynticks) (HZ_PERIODIC) (NEW)
> > > 2. Idle dynticks system (tickless idle) (NO_HZ_IDLE) (NEW)
> >
> > I.e. CONFIG_NO_HZ_IDLE is picked by default. The default should really be
> > CONFIG_HZ_PERIODIC - so that people can easily enable full dynticks but
> > are not defaulted into it unknowingly.
>
> Oh, I got confused by the artificial hiding of NO_HZ_FULL again. Why is it
> still hidden? I have a fairly generic config, yet it was not offered. I
> bet most people won't ever see it!
>
> Sigh, it's due to the dependency mess that I pointed out twice already:
>
> depends on TREE_RCU || TREE_PREEMPT_RCU

Ok, I just removed this one. It was the same as "depends on SMP", which we already
have.

> depends on VIRT_CPU_ACCOUNTING_GEN
>
> It should _really_ select both the RCU and the CPU time accounting model
> automatically!

Yeah I know. I have yet to fix that in Kconfig (it's a Kconfig limitation).
It's high on my TODO list.

>
> The selection of the dynticks mode certainly overrides RCU selection, and
> it should for sure override some arcane, low level detail like the CPU
> accounting model ...

Agreed, that was not intended to stay as is.

Thanks.

2013-04-25 06:28:36

by Ingo Molnar

Subject: Re: [GIT PULL] nohz: Adaptively stop the tick, finally


* Frederic Weisbecker <[email protected]> wrote:

> > depends on VIRT_CPU_ACCOUNTING_GEN
> >
> > It should _really_ select both the RCU and the CPU time accounting model
> > automatically!
>
> Yeah I know. I have yet to fix that in Kconfig (it's a Kconfig limitation).

Why cannot we simply select it and its dependencies, explicitly, for the
time being? Something like:

depends on 64BIT
select VIRT_CPU_ACCOUNTING
select VIRT_CPU_ACCOUNTING_GEN

90% of the .configs have VIRT_CPU_ACCOUNTING_GEN turned off, because it's
a default-off feature - so dynticks-full is effectively hidden from the
large majority of testers...

Thanks,

Ingo