2015-06-11 17:36:16

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 0/8] tick/nohz: Tick dependency quick check + cleanups

Thomas suggested a few months ago to make the tick dependency check
more synchronous. Instead of checking asynchronously from each interrupt
if the tick can be stopped, every subsystem that may have a tick
dependency should set itself a flag specifying the state of that
dependency. This way we can verify if we can stop the tick with a single
lightweight mask check.

This patchset introduces that. There is still room for optimizations.

Also, in a further series I would like to piggyback on the scheduler
interrupt to restart the tick in case of remote wakeups. The tick
restart from irq exit prepares for that.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/core-v2

HEAD: e5470c6cd483edb77d6c8398b3f1479aa0eba215

Thanks,
Frederic
---

Frederic Weisbecker (8):
jiffies: Remove HZ > USEC_PER_SEC special case
apm32: Fix cputime == jiffies assumption
alpha: Fix jiffies based cputime assumption
nohz: Remove idle task special case
nohz: Restart the tick from irq exit
nohz: Move tick_nohz_restart_sched_tick() above its users
nohz: Evaluate tick dependency once on context switch
nohz: Remove useless argument on tick_nohz_task_switch()


 arch/alpha/kernel/osf_sys.c |  13 +++--
 arch/x86/kernel/apm_32.c    |   2 +-
 include/linux/tick.h        |  16 ++----
 kernel/sched/core.c         |   2 +-
 kernel/time/tick-sched.c    | 127 ++++++++++++++++++++++----------------------
 kernel/time/tick-sched.h    |   6 +++
 kernel/time/time.c          |  14 ++---
 7 files changed, 93 insertions(+), 87 deletions(-)


2015-06-11 17:36:22

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 1/8] jiffies: Remove HZ > USEC_PER_SEC special case

HZ never goes much further than 1000 and a bit. And if we ever reach one
tick per microsecond, we might have a problem.

Let's stop maintaining this special case and just leave a paranoid check.

Cc: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/time.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/time/time.c b/kernel/time/time.c
index 972e3bb..6a22980 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -268,10 +268,14 @@ EXPORT_SYMBOL(jiffies_to_msecs);

unsigned int jiffies_to_usecs(const unsigned long j)
{
-#if HZ <= USEC_PER_SEC && !(USEC_PER_SEC % HZ)
+ /*
+ * HZ usually doesn't go much further than MSEC_PER_SEC.
+ * jiffies_to_usecs() and usecs_to_jiffies() depend on that.
+ */
+ BUILD_BUG_ON(HZ > USEC_PER_SEC);
+
+#if !(USEC_PER_SEC % HZ)
return (USEC_PER_SEC / HZ) * j;
-#elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
- return (j + (HZ / USEC_PER_SEC) - 1)/(HZ / USEC_PER_SEC);
#else
# if BITS_PER_LONG == 32
return (HZ_TO_USEC_MUL32 * j) >> HZ_TO_USEC_SHR32;
@@ -526,10 +530,8 @@ unsigned long usecs_to_jiffies(const unsigned int u)
{
if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
return MAX_JIFFY_OFFSET;
-#if HZ <= USEC_PER_SEC && !(USEC_PER_SEC % HZ)
+#if !(USEC_PER_SEC % HZ)
return (u + (USEC_PER_SEC / HZ) - 1) / (USEC_PER_SEC / HZ);
-#elif HZ > USEC_PER_SEC && !(HZ % USEC_PER_SEC)
- return u * (HZ / USEC_PER_SEC);
#else
return (USEC_TO_HZ_MUL32 * u + USEC_TO_HZ_ADJ32)
>> USEC_TO_HZ_SHR32;
--
2.1.4

2015-06-11 17:36:28

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 2/8] apm32: Fix cputime == jiffies assumption

That code wrongly assumes that cputime_t wraps jiffies_t. Let's use
the correct accessors/mutators.

No real harm now as that code can't be used with full dynticks.

Cc: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/apm_32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index 927ec92..052c9c3 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -919,7 +919,7 @@ recalc:
} else if (jiffies_since_last_check > idle_period) {
unsigned int idle_percentage;

- idle_percentage = stime - last_stime;
+ idle_percentage = cputime_to_jiffies(stime - last_stime);
idle_percentage *= 100;
idle_percentage /= jiffies_since_last_check;
use_apm_idle = (idle_percentage > idle_threshold);
--
2.1.4

2015-06-11 17:48:52

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 3/8] alpha: Fix jiffies based cputime assumption

That code wrongly assumes that cputime_t wraps jiffies_t. Let's use
the correct accessors/mutators.

In practice there should be no harm yet because alpha currently
only supports tick based cputime accounting, which is always
jiffies based.

Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/alpha/kernel/osf_sys.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index e51f578..5f6da80 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1139,6 +1139,7 @@ SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
{
struct rusage32 r;
cputime_t utime, stime;
+ unsigned long utime_jiffies, stime_jiffies;

if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN)
return -EINVAL;
@@ -1147,14 +1148,18 @@ SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
switch (who) {
case RUSAGE_SELF:
task_cputime(current, &utime, &stime);
- jiffies_to_timeval32(utime, &r.ru_utime);
- jiffies_to_timeval32(stime, &r.ru_stime);
+ utime_jiffies = cputime_to_jiffies(utime);
+ stime_jiffies = cputime_to_jiffies(stime);
+ jiffies_to_timeval32(utime_jiffies, &r.ru_utime);
+ jiffies_to_timeval32(stime_jiffies, &r.ru_stime);
r.ru_minflt = current->min_flt;
r.ru_majflt = current->maj_flt;
break;
case RUSAGE_CHILDREN:
- jiffies_to_timeval32(current->signal->cutime, &r.ru_utime);
- jiffies_to_timeval32(current->signal->cstime, &r.ru_stime);
+ utime_jiffies = cputime_to_jiffies(current->signal->cutime);
+ stime_jiffies = cputime_to_jiffies(current->signal->cstime);
+ jiffies_to_timeval32(utime_jiffies, &r.ru_utime);
+ jiffies_to_timeval32(stime_jiffies, &r.ru_stime);
r.ru_minflt = current->signal->cmin_flt;
r.ru_majflt = current->signal->cmaj_flt;
break;
--
2.1.4

2015-06-11 17:36:32

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 4/8] nohz: Remove idle task special case

This is a leftover from the old days to avoid conflicts with dynticks idle
code. Now full dynticks and idle dynticks are better integrated and
interact without known issues.

So let's remove it.

Cc: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/tick-sched.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 812f7a3..324482f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -208,10 +208,8 @@ void __tick_nohz_full_check(void)
struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);

if (tick_nohz_full_cpu(smp_processor_id())) {
- if (ts->tick_stopped && !is_idle_task(current)) {
- if (!can_stop_full_tick())
- tick_nohz_restart_sched_tick(ts, ktime_get());
- }
+ if (ts->tick_stopped && !can_stop_full_tick())
+ tick_nohz_restart_sched_tick(ts, ktime_get());
}
}

@@ -710,7 +708,7 @@ static void tick_nohz_full_stop_tick(struct tick_sched *ts)
#ifdef CONFIG_NO_HZ_FULL
int cpu = smp_processor_id();

- if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
+ if (!tick_nohz_full_cpu(cpu))
return;

if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
--
2.1.4

2015-06-11 17:48:35

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 5/8] nohz: Restart the tick from irq exit

Restart the tick when necessary from the irq exit path. It makes nohz
full more flexible and allows it to piggyback the tick restart on the
scheduler IPI in the future, instead of sending a dedicated IPI that
often doubles the scheduler IPI on task wakeup. This will require
careful review of resched_curr() callers.

Cc: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/tick.h | 8 --------
kernel/time/tick-sched.c | 34 ++++++++++------------------------
2 files changed, 10 insertions(+), 32 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 4191b56..390559e 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -140,7 +140,6 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
cpumask_or(mask, mask, tick_nohz_full_mask);
}

-extern void __tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_cpu(int cpu);
extern void tick_nohz_full_kick_all(void);
@@ -149,7 +148,6 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
-static inline void __tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void tick_nohz_full_kick(void) { }
static inline void tick_nohz_full_kick_all(void) { }
@@ -174,12 +172,6 @@ static inline void housekeeping_affine(struct task_struct *t)
#endif
}

-static inline void tick_nohz_full_check(void)
-{
- if (tick_nohz_full_enabled())
- __tick_nohz_full_check();
-}
-
static inline void tick_nohz_task_switch(struct task_struct *tsk)
{
if (tick_nohz_full_enabled())
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 324482f..7cf37eb 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -197,25 +197,9 @@ static bool can_stop_full_tick(void)
return true;
}

-static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now);
-
-/*
- * Re-evaluate the need for the tick on the current CPU
- * and restart it if necessary.
- */
-void __tick_nohz_full_check(void)
-{
- struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
-
- if (tick_nohz_full_cpu(smp_processor_id())) {
- if (ts->tick_stopped && !can_stop_full_tick())
- tick_nohz_restart_sched_tick(ts, ktime_get());
- }
-}
-
static void nohz_full_kick_work_func(struct irq_work *work)
{
- __tick_nohz_full_check();
+ /* Empty, the tick restart happens on tick_nohz_irq_exit() */
}

static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
@@ -250,7 +234,7 @@ void tick_nohz_full_kick_cpu(int cpu)

static void nohz_full_kick_ipi(void *info)
{
- __tick_nohz_full_check();
+ /* Empty, the tick restart happens on tick_nohz_irq_exit() */
}

/*
@@ -703,7 +687,9 @@ out:
return tick;
}

-static void tick_nohz_full_stop_tick(struct tick_sched *ts)
+static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now);
+
+static void tick_nohz_full_update_tick(struct tick_sched *ts)
{
#ifdef CONFIG_NO_HZ_FULL
int cpu = smp_processor_id();
@@ -714,10 +700,10 @@ static void tick_nohz_full_stop_tick(struct tick_sched *ts)
if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
return;

- if (!can_stop_full_tick())
- return;
-
- tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+ if (can_stop_full_tick())
+ tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+ else if (ts->tick_stopped)
+ tick_nohz_restart_sched_tick(ts, ktime_get());
#endif
}

@@ -847,7 +833,7 @@ void tick_nohz_irq_exit(void)
if (ts->inidle)
__tick_nohz_idle_enter(ts);
else
- tick_nohz_full_stop_tick(ts);
+ tick_nohz_full_update_tick(ts);
}

/**
--
2.1.4

2015-06-11 17:48:17

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 6/8] nohz: Move tick_nohz_restart_sched_tick() above its users

Fix the function declaration/definition dance.

Cc: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/tick-sched.c | 34 ++++++++++++++++------------------
1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 7cf37eb..8acaab5 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -687,7 +687,22 @@ out:
return tick;
}

-static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now);
+static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
+{
+ /* Update jiffies first */
+ tick_do_update_jiffies64(now);
+ update_cpu_load_nohz();
+
+ calc_load_exit_idle();
+ touch_softlockup_watchdog();
+ /*
+ * Cancel the scheduled timer and restore the tick
+ */
+ ts->tick_stopped = 0;
+ ts->idle_exittime = now;
+
+ tick_nohz_restart(ts, now);
+}

static void tick_nohz_full_update_tick(struct tick_sched *ts)
{
@@ -848,23 +863,6 @@ ktime_t tick_nohz_get_sleep_length(void)
return ts->sleep_length;
}

-static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
-{
- /* Update jiffies first */
- tick_do_update_jiffies64(now);
- update_cpu_load_nohz();
-
- calc_load_exit_idle();
- touch_softlockup_watchdog();
- /*
- * Cancel the scheduled timer and restore the tick
- */
- ts->tick_stopped = 0;
- ts->idle_exittime = now;
-
- tick_nohz_restart(ts, now);
-}
-
static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
{
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
--
2.1.4

2015-06-11 17:36:35

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 7/8] nohz: Evaluate tick dependency once on context switch

The tick dependency is evaluated on every irq with a batch of checks
which determine whether it is safe to stop the tick or not. These checks
span several subsystems: posix cpu timers, scheduler, sched clock,
perf events. Each of them breaks down into smaller details: posix cpu
timers involve checking process wide timers then thread wide timers,
perf involves checking freq events then more per cpu details.

Checking all these details every time we update the full dynticks state
brings avoidable overhead.

So let's evaluate these dependencies once on context switch. Further
dependency checks are then performed through a single state check.

This is a first step that can later be optimized by dividing task level,
CPU level and global dependencies and updating each at the right time.

Suggested-by: Thomas Gleixner <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/tick-sched.c | 63 +++++++++++++++++++++++++++++++-----------------
kernel/time/tick-sched.h | 6 +++++
2 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8acaab5..5fea798 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -157,49 +157,61 @@ cpumask_var_t tick_nohz_full_mask;
cpumask_var_t housekeeping_mask;
bool tick_nohz_full_running;

-static bool can_stop_full_tick(void)
+static bool can_stop_full_tick(struct tick_sched *ts)
{
WARN_ON_ONCE(!irqs_disabled());

- if (!sched_can_stop_tick()) {
- trace_tick_stop(0, "more than 1 task in runqueue\n");
+ if (ts->tick_needed) {
+ if (ts->tick_needed & TICK_NEEDED_POSIX_CPU_TIMER)
+ trace_tick_stop(0, "posix timers running\n");
+ if (ts->tick_needed & TICK_NEEDED_PERF_EVENT)
+ trace_tick_stop(0, "perf events running\n");
+ if (ts->tick_needed & TICK_NEEDED_SCHED)
+ trace_tick_stop(0, "more than 1 task in runqueue\n");
+ if (ts->tick_needed & TICK_NEEDED_CLOCK_UNSTABLE)
+ trace_tick_stop(0, "unstable sched clock\n");
return false;
}

- if (!posix_cpu_timers_can_stop_tick(current)) {
- trace_tick_stop(0, "posix timers running\n");
- return false;
- }
+ return true;
+}

- if (!perf_event_can_stop_tick()) {
- trace_tick_stop(0, "perf events running\n");
- return false;
- }
+static void tick_nohz_full_update_dependencies(void)
+{
+ struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+
+ if (!posix_cpu_timers_can_stop_tick(current))
+ ts->tick_needed |= TICK_NEEDED_POSIX_CPU_TIMER;
+
+ if (!perf_event_can_stop_tick())
+ ts->tick_needed |= TICK_NEEDED_PERF_EVENT;
+
+ if (!sched_can_stop_tick())
+ ts->tick_needed |= TICK_NEEDED_SCHED;

- /* sched_clock_tick() needs us? */
#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
/*
+ * sched_clock_tick() needs us?
+ *
* TODO: kick full dynticks CPUs when
* sched_clock_stable is set.
*/
if (!sched_clock_stable()) {
- trace_tick_stop(0, "unstable sched clock\n");
+ ts->tick_needed |= TICK_NEEDED_CLOCK_UNSTABLE;
/*
* Don't allow the user to think they can get
* full NO_HZ with this machine.
*/
WARN_ONCE(tick_nohz_full_running,
"NO_HZ FULL will not work with unstable sched clock");
- return false;
}
#endif
-
- return true;
}

static void nohz_full_kick_work_func(struct irq_work *work)
{
- /* Empty, the tick restart happens on tick_nohz_irq_exit() */
+ /* tick restart happens on tick_nohz_irq_exit() */
+ tick_nohz_full_update_dependencies();
}

static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
@@ -234,7 +246,8 @@ void tick_nohz_full_kick_cpu(int cpu)

static void nohz_full_kick_ipi(void *info)
{
- /* Empty, the tick restart happens on tick_nohz_irq_exit() */
+ /* tick restart happens on tick_nohz_irq_exit() */
+ tick_nohz_full_update_dependencies();
}

/*
@@ -258,18 +271,24 @@ void tick_nohz_full_kick_all(void)
* It might need the tick due to per task/process properties:
* perf events, posix cpu timers, ...
*/
-void __tick_nohz_task_switch(struct task_struct *tsk)
+void __tick_nohz_task_switch(struct task_struct *next)
{
unsigned long flags;
+ struct tick_sched *ts;

local_irq_save(flags);

+ ts = this_cpu_ptr(&tick_cpu_sched);
+ /* Reset tick dependency evaluation */
+ ts->tick_needed = 0;
+
if (!tick_nohz_full_cpu(smp_processor_id()))
goto out;

- if (tick_nohz_tick_stopped() && !can_stop_full_tick())
+ tick_nohz_full_update_dependencies();
+
+ if (ts->tick_stopped && !can_stop_full_tick(ts))
tick_nohz_full_kick();
-
out:
local_irq_restore(flags);
}
@@ -715,7 +734,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
return;

- if (can_stop_full_tick())
+ if (can_stop_full_tick(ts))
tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
else if (ts->tick_stopped)
tick_nohz_restart_sched_tick(ts, ktime_get());
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index 42fdf49..03c283e 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -19,6 +19,11 @@ enum tick_nohz_mode {
NOHZ_MODE_HIGHRES,
};

+#define TICK_NEEDED_POSIX_CPU_TIMER 0x1
+#define TICK_NEEDED_PERF_EVENT 0x2
+#define TICK_NEEDED_SCHED 0x4
+#define TICK_NEEDED_CLOCK_UNSTABLE 0x8
+
/**
* struct tick_sched - sched tick emulation and no idle tick control/stats
* @sched_timer: hrtimer to schedule the periodic tick in high
@@ -60,6 +65,7 @@ struct tick_sched {
u64 next_timer;
ktime_t idle_expires;
int do_timer_last;
+ int tick_needed;
};

extern struct tick_sched *tick_get_tick_sched(int cpu);
--
2.1.4

2015-06-11 17:36:38

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 8/8] nohz: Remove useless argument on tick_nohz_task_switch()

Leftover from early code.

Cc: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/tick.h | 8 ++++----
kernel/sched/core.c | 2 +-
kernel/time/tick-sched.c | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 390559e..5fc78a9 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -143,7 +143,7 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_cpu(int cpu);
extern void tick_nohz_full_kick_all(void);
-extern void __tick_nohz_task_switch(struct task_struct *tsk);
+extern void __tick_nohz_task_switch(void);
#else
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
@@ -151,7 +151,7 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void tick_nohz_full_kick(void) { }
static inline void tick_nohz_full_kick_all(void) { }
-static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
+static inline void __tick_nohz_task_switch(void) { }
#endif

static inline bool is_housekeeping_cpu(int cpu)
@@ -172,10 +172,10 @@ static inline void housekeeping_affine(struct task_struct *t)
#endif
}

-static inline void tick_nohz_task_switch(struct task_struct *tsk)
+static inline void tick_nohz_task_switch(void)
{
if (tick_nohz_full_enabled())
- __tick_nohz_task_switch(tsk);
+ __tick_nohz_task_switch();
}

#endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3db7c9b..0d81faa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2224,7 +2224,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
put_task_struct(prev);
}

- tick_nohz_task_switch(current);
+ tick_nohz_task_switch();
return rq;
}

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5fea798..1f3732b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -271,7 +271,7 @@ void tick_nohz_full_kick_all(void)
* It might need the tick due to per task/process properties:
* perf events, posix cpu timers, ...
*/
-void __tick_nohz_task_switch(struct task_struct *next)
+void __tick_nohz_task_switch(void)
{
unsigned long flags;
struct tick_sched *ts;
--
2.1.4

2015-06-11 20:46:33

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 7/8] nohz: Evaluate tick dependency once on context switch

On 06/11/2015 01:36 PM, Frederic Weisbecker wrote:
> The tick dependency is evaluated on every irq with a batch of checks
> which determine whether it is safe to stop the tick or not. These checks
> span several subsystems: posix cpu timers, scheduler, sched clock,
> perf events. Each of them breaks down into smaller details: posix cpu
> timers involve checking process wide timers then thread wide timers,
> perf involves checking freq events then more per cpu details.
>
> Checking all these details every time we update the full dynticks state
> brings avoidable overhead.
>
> So let's evaluate these dependencies once on context switch. Further
> dependency checks are then performed through a single state check.
>
> This is a first step that can later be optimized by dividing task level,
> CPU level and global dependencies and updating each at the right time.

> +static void tick_nohz_full_update_dependencies(void)
> +{
> + struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> +
> + if (!posix_cpu_timers_can_stop_tick(current))
> + ts->tick_needed |= TICK_NEEDED_POSIX_CPU_TIMER;
> +
> + if (!perf_event_can_stop_tick())
> + ts->tick_needed |= TICK_NEEDED_PERF_EVENT;
> +
> + if (!sched_can_stop_tick())
> + ts->tick_needed |= TICK_NEEDED_SCHED;
>

I see this getting kicked from task work and from IPI
context, but does it get kicked on task wakeup, when
we get a second runnable task on a CPU but decide not
to preempt the currently running task yet, although we
will want to preempt it at a later point in time?

2015-06-11 20:47:03

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 1/8] jiffies: Remove HZ > USEC_PER_SEC special case

On 06/11/2015 01:36 PM, Frederic Weisbecker wrote:
> HZ never goes much further than 1000 and a bit. And if we ever reach one
> tick per microsecond, we might have a problem.
>
> Let's stop maintaining this special case and just leave a paranoid check.
>
> Cc: Christoph Lameter <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Preeti U Murthy <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>

Reviewed-by: Rik van Riel <[email protected]>

2015-06-11 20:47:16

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 2/8] apm32: Fix cputime == jiffies assumption

On 06/11/2015 01:36 PM, Frederic Weisbecker wrote:
> That code wrongly assumes that cputime_t wraps jiffies_t. Let's use
> the correct accessors/mutators.
>
> No real harm now as that code can't be used with full dynticks.
>
> Cc: Christoph Lameter <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Preeti U Murthy <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>

Reviewed-by: Rik van Riel <[email protected]>

2015-06-11 20:47:35

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 3/8] alpha: Fix jiffies based cputime assumption

On 06/11/2015 01:36 PM, Frederic Weisbecker wrote:
> That code wrongly assumes that cputime_t wraps jiffies_t. Let's use
> the correct accessors/mutators.
>
> In practice there should be no harm yet because alpha currently
> only supports tick based cputime accounting, which is always
> jiffies based.
>
> Cc: Richard Henderson <[email protected]>
> Cc: Ivan Kokshaysky <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Preeti U Murthy <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>

Reviewed-by: Rik van Riel <[email protected]>

2015-06-11 20:47:48

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 4/8] nohz: Remove idle task special case

On 06/11/2015 01:36 PM, Frederic Weisbecker wrote:
> This is a leftover from the old days to avoid conflicts with dynticks idle
> code. Now full dynticks and idle dynticks are better integrated and
> interact without known issues.
>
> So let's remove it.
>
> Cc: Christoph Lameter <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Preeti U Murthy <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>

Reviewed-by: Rik van Riel <[email protected]>

2015-06-11 20:48:04

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 5/8] nohz: Restart the tick from irq exit

On 06/11/2015 01:36 PM, Frederic Weisbecker wrote:
> Restart the tick when necessary from the irq exit path. It makes nohz
> full more flexible and allows it to piggyback the tick restart on the
> scheduler IPI in the future, instead of sending a dedicated IPI that
> often doubles the scheduler IPI on task wakeup. This will require
> careful review of resched_curr() callers.
>
> Cc: Christoph Lameter <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Preeti U Murthy <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>

Reviewed-by: Rik van Riel <[email protected]>

2015-06-11 20:48:23

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 6/8] nohz: Move tick_nohz_restart_sched_tick() above its users

On 06/11/2015 01:36 PM, Frederic Weisbecker wrote:
> Fix the function declaration/definition dance.
>
> Cc: Christoph Lameter <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Preeti U Murthy <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>

Reviewed-by: Rik van Riel <[email protected]>

2015-06-12 07:32:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/8] nohz: Restart the tick from irq exit

On Thu, Jun 11, 2015 at 07:36:05PM +0200, Frederic Weisbecker wrote:
> Restart the tick when necessary from the irq exit path. It makes nohz
> full more flexible and allows it to piggyback the tick restart on the
> scheduler IPI in the future, instead of sending a dedicated IPI that
> often doubles the scheduler IPI on task wakeup. This will require
> careful review of resched_curr() callers.

This seems to assume scheduler_ipi() callers use irq_exit(), which is
false.

2015-06-12 07:36:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 7/8] nohz: Evaluate tick dependency once on context switch

On Thu, Jun 11, 2015 at 07:36:07PM +0200, Frederic Weisbecker wrote:
> +static void tick_nohz_full_update_dependencies(void)
> +{
> + struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> +
> + if (!posix_cpu_timers_can_stop_tick(current))
> + ts->tick_needed |= TICK_NEEDED_POSIX_CPU_TIMER;
> +
> + if (!perf_event_can_stop_tick())
> + ts->tick_needed |= TICK_NEEDED_PERF_EVENT;
> +
> + if (!sched_can_stop_tick())
> + ts->tick_needed |= TICK_NEEDED_SCHED;
>
> #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> /*
> + * sched_clock_tick() needs us?
> + *
> * TODO: kick full dynticks CPUs when
> * sched_clock_stable is set.
> */
> if (!sched_clock_stable()) {
> + ts->tick_needed |= TICK_NEEDED_CLOCK_UNSTABLE;
> /*
> * Don't allow the user to think they can get
> * full NO_HZ with this machine.
> */
> WARN_ONCE(tick_nohz_full_running,
> "NO_HZ FULL will not work with unstable sched clock");
> }
> #endif
> }

Colour me confused; why does this function exist at all? Should not
these bits be managed by those respective subsystems?

2015-06-12 12:38:43

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 5/8] nohz: Restart the tick from irq exit

On Fri, Jun 12, 2015 at 09:32:45AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 11, 2015 at 07:36:05PM +0200, Frederic Weisbecker wrote:
> > Restart the tick when necessary from the irq exit path. It makes nohz
> > full more flexible and allow it to piggyback the tick restart on the
> > scheduler IPI in the future instead of sending a dedicated IPI that
> > often doubles the scheduler IPI on task wakeup. This will require
> > careful review of resched_curr() callers.
>
> This seems to assume schedule_ipi() callers use irq_exit(), this is
> false.

Indeed there will be that too. Note the current patch doesn't yet rely on
schedule_ipi(), we are still using the nohz ipis. But introducing the
tick restart on irq exit prepares for later piggybacking on scheduler_ipi().

I think this will involve changes to got_nohz_idle_kick(): renaming it to
got_nohz_kick() and including nohz full related checks to trigger the
irq_enter()/exit() pair.

2015-06-12 12:59:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/8] nohz: Restart the tick from irq exit

On Fri, Jun 12, 2015 at 02:38:36PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 12, 2015 at 09:32:45AM +0200, Peter Zijlstra wrote:
> > On Thu, Jun 11, 2015 at 07:36:05PM +0200, Frederic Weisbecker wrote:
> > > Restart the tick when necessary from the irq exit path. It makes nohz
> > > full more flexible and allow it to piggyback the tick restart on the
> > > scheduler IPI in the future instead of sending a dedicated IPI that
> > > often doubles the scheduler IPI on task wakeup. This will require
> > > careful review of resched_curr() callers.
> >
> > This seems to assume schedule_ipi() callers use irq_exit(), this is
> > false.
>
> Indeed there will be that too. Note the current patch doesn't yet rely on
> schedule_ipi(), we are still using the nohz ipis.

Ah, then I just didn't understand your changelog right.

2015-06-12 13:06:21

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 5/8] nohz: Restart the tick from irq exit

On Fri, Jun 12, 2015 at 02:59:41PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 12, 2015 at 02:38:36PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jun 12, 2015 at 09:32:45AM +0200, Peter Zijlstra wrote:
> > > On Thu, Jun 11, 2015 at 07:36:05PM +0200, Frederic Weisbecker wrote:
> > > > Restart the tick when necessary from the irq exit path. It makes nohz
> > > > full more flexible and allow it to piggyback the tick restart on the
> > > > scheduler IPI in the future instead of sending a dedicated IPI that
> > > > often doubles the scheduler IPI on task wakeup. This will require
> > > > careful review of resched_curr() callers.
> > >
> > > This seems to assume schedule_ipi() callers use irq_exit(), this is
> > > false.
> >
> > Indeed there will be that too. Note the current patch doesn't yet rely on
> > schedule_ipi(), we are still using the nohz ipis.
>
> Ah, then I just didn't understand your changelog right.

I think I should make it clearer, I must admit it's a bit confusing :)

2015-06-14 01:44:48

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH 4/8] nohz: Remove idle task special case

On 06/11/2015 11:06 PM, Frederic Weisbecker wrote:
> This is a leftover from old days to avoid conflicts with dynticks idle
> code. Now full dynticks and idle dynticks are better integrated and
> interact without known issue.

I am sorry but I fail to understand why the check on idle task was there
in the first place in the below code paths. It would help if you could
clarify this in the changelog as well.

>
> So lets remove it.
>
> Cc: Christoph Lameter <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc; John Stultz <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Preeti U Murthy <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> kernel/time/tick-sched.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 812f7a3..324482f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -208,10 +208,8 @@ void __tick_nohz_full_check(void)
> struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
>
> if (tick_nohz_full_cpu(smp_processor_id())) {
> - if (ts->tick_stopped && !is_idle_task(current)) {
> - if (!can_stop_full_tick())

can_stop_full_tick() would have bailed out if the current task was idle,
since it checks for the number of tasks being greater than 1 to restart
the tick. So why was the check is_idle_task() introduced earlier ?

> - tick_nohz_restart_sched_tick(ts, ktime_get());
> - }
> + if (ts->tick_stopped && !can_stop_full_tick())
> + tick_nohz_restart_sched_tick(ts, ktime_get());
> }
> }
>
> @@ -710,7 +708,7 @@ static void tick_nohz_full_stop_tick(struct tick_sched *ts)
> #ifdef CONFIG_NO_HZ_FULL
> int cpu = smp_processor_id();
>
> - if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
> + if (!tick_nohz_full_cpu(cpu))

If the current task was indeed idle, the check on ts->inidle would have
succeeded in tick_irq_exit() and we would not have reached this function
at all, isn't it? So here too I am unable to understand why we had it in
the first place.

Regards
Preeti U Murthy
> return;
>
> if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
>

2015-06-14 09:19:43

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH 5/8] nohz: Restart the tick from irq exit

On 06/11/2015 11:06 PM, Frederic Weisbecker wrote:
> Restart the tick when necessary from the irq exit path. It makes nohz
> full more flexible and allow it to piggyback the tick restart on the
> scheduler IPI in the future instead of sending a dedicated IPI that
> often doubles the scheduler IPI on task wakeup. This will require

You can piggy back on the scheduler ipi when you add a timer/hrtimer and
add a new task to the runqueue of the nohz_full cpus, since we call
resched_curr() in these code paths. But what about the calls to kick
nohz_full cpus by perf events and posix cpu timers ? These call sites
seem to be concerned about specifically waking up nohz_full cpus as far
as I can see. IOW there is no scheduling ipi that we can fall back on in
these paths.

> careful review of resched_curr() callers.
>

Regards
Preeti U Murthy

2015-06-14 09:30:30

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH 5/8] nohz: Restart the tick from irq exit

On 06/12/2015 06:08 PM, Frederic Weisbecker wrote:
> On Fri, Jun 12, 2015 at 09:32:45AM +0200, Peter Zijlstra wrote:
>> On Thu, Jun 11, 2015 at 07:36:05PM +0200, Frederic Weisbecker wrote:
>>> Restart the tick when necessary from the irq exit path. It makes nohz
>>> full more flexible and allow it to piggyback the tick restart on the
>>> scheduler IPI in the future instead of sending a dedicated IPI that
>>> often doubles the scheduler IPI on task wakeup. This will require
>>> careful review of resched_curr() callers.
>>
>> This seems to assume schedule_ipi() callers use irq_exit(), this is
>> false.
>
> Indeed there will be that too. Note the current patch doesn't yet rely on
> schedule_ipi(), we are still using the nohz ipis. But introducing the
> tick restart on irq exit prepares for later piggybacking on scheduler_ipi().
>
> I think this will involve changes on got_nohz_idle_kick(), renamed to
> got_nohz_kick() and include nohz full related checks to trigger the
> irq_enter()/exit() pair.

I may be saying something obvious here, nevertheless: I am not sure about
other archs, but at least on powerpc, after handling an interrupt we will
call irq_exit() and reevaluate restarting the tick. So in our case, even if
scheduler_ipi() callers do not call irq_exit(), it will be called after
handling the reschedule interrupt.

Regards
Preeti U Murthy
>

2015-06-17 05:59:23

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH 7/8] nohz: Evaluate tick dependency once on context switch

On 06/12/2015 02:16 AM, Rik van Riel wrote:
> On 06/11/2015 01:36 PM, Frederic Weisbecker wrote:
>> The tick dependency is evaluated on every irq. This is a batch of checks
>> which determine whether it is safe to stop the tick or not. These checks
>> are often split in many details: posix cpu timers, scheduler, sched clock,
>> perf events. Each of which are made of smaller details: posix cpu
>> timer involves checking process wide timers then thread wide timers. Perf
>> involves checking freq events then more per cpu details.
>>
>> Checking these details every time we update the full dynticks state
>> bring avoidable overhead.
>>
>> So lets evaluate these dependencies once on context switch. Then the
>> further dependency checks will be performed through a single state check.
>>
>> This is a first step that can be later optimized by dividing task level
>> dependency, CPU level dependency and global dependency and update
>> each at the right time.
>
>> +static void tick_nohz_full_update_dependencies(void)
>> +{
>> + struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
>> +
>> + if (!posix_cpu_timers_can_stop_tick(current))
>> + ts->tick_needed |= TICK_NEEDED_POSIX_CPU_TIMER;
>> +
>> + if (!perf_event_can_stop_tick())
>> + ts->tick_needed |= TICK_NEEDED_PERF_EVENT;
>> +
>> + if (!sched_can_stop_tick())
>> + ts->tick_needed |= TICK_NEEDED_SCHED;
>>
>
> I see this getting kicked from task work and from ipi
> context, but does it get kicked on task wakeup, when
> we have a second runnable task on a CPU, but we decide
> not to preempt the currently running task to switch to
> it yet, but we will want to preempt the currently running
> task at a later point in time?

+1. This is not taken care of as far as I can see too.

Regards
Preeti U Murthy
>

2015-07-06 16:14:21

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 7/8] nohz: Evaluate tick dependency once on context switch

On Fri, Jun 12, 2015 at 09:36:50AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 11, 2015 at 07:36:07PM +0200, Frederic Weisbecker wrote:
> > +static void tick_nohz_full_update_dependencies(void)
> > +{
> > + struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> > +
> > + if (!posix_cpu_timers_can_stop_tick(current))
> > + ts->tick_needed |= TICK_NEEDED_POSIX_CPU_TIMER;
> > +
> > + if (!perf_event_can_stop_tick())
> > + ts->tick_needed |= TICK_NEEDED_PERF_EVENT;
> > +
> > + if (!sched_can_stop_tick())
> > + ts->tick_needed |= TICK_NEEDED_SCHED;
> >
> > #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> > /*
> > + * sched_clock_tick() needs us?
> > + *
> > * TODO: kick full dynticks CPUs when
> > * sched_clock_stable is set.
> > */
> > if (!sched_clock_stable()) {
> > + ts->tick_needed |= TICK_NEEDED_CLOCK_UNSTABLE;
> > /*
> > * Don't allow the user to think they can get
> > * full NO_HZ with this machine.
> > */
> > WARN_ONCE(tick_nohz_full_running,
> > "NO_HZ FULL will not work with unstable sched clock");
> > }
> > #endif
> > }
>
> Colour me confused; why does this function exist at all? Should not
> these bits be managed by those respective subsystems?

So we have two choices here:

1) Something changes in a subsystem which needs the tick and that subsystem
sends an IPI to the CPU that is concerned such that it changes the tick
dependency state.

pros: The dependency bits are always modified and read locally
cons: We need to also check the subsystems from task switch because the next
task may have different dependencies than prev. So that's context switch
overhead

2) Whenever a subsystem changes its dependency to the tick (needs or doesn't need
anymore), that subsystem remotely changes the dependency bits then sends an IPI
in case we switched from "tick needed" to "tick not needed".

pros: Less context switch overhead
cons: Works for subsystems whose dependency is per CPU (scheduler).
Others whose dependency is exclusively per task or system wide need
more complicated treatment: posix cpu timers would then need to switch to
a separate global flag.
perf depends on both a global state and a per cpu state.
The flags are read remotely. This involves some ordering but no full barrier
since we have the IPI.

This patchset takes the simple way 1), which definitely can be improved.

Perhaps we should do 2) with one global mask and one per cpu mask and all flags
atomically and remotely set and clear by the relevant subsystems.

2015-07-07 14:20:12

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 5/8] nohz: Restart the tick from irq exit

On Sun, Jun 14, 2015 at 02:49:16PM +0530, Preeti U Murthy wrote:
> On 06/11/2015 11:06 PM, Frederic Weisbecker wrote:
> > Restart the tick when necessary from the irq exit path. It makes nohz
> > full more flexible and allow it to piggyback the tick restart on the
> > scheduler IPI in the future instead of sending a dedicated IPI that
> > often doubles the scheduler IPI on task wakeup. This will require
>
> You can piggy back on the scheduler ipi when you add a timer/hrtimer and
> add a new task to the runqueue of the nohz_full cpus, since we call
> resched_curr() in these code paths. But what about the calls to kick
> nohz_full cpus by perf events and posix cpu timers ? These call sites
> seem to be concerned about specifically waking up nohz_full cpus as far
> as I can see. IOW there is no scheduling ipi that we can fall back on in
> these paths.

Sure, those will need to keep the current IPI. They are less of a worry
because they should be rare events compared to the scheduler.

Thanks.

2015-07-07 14:25:23

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 5/8] nohz: Restart the tick from irq exit

On Sun, Jun 14, 2015 at 03:00:12PM +0530, Preeti U Murthy wrote:
> On 06/12/2015 06:08 PM, Frederic Weisbecker wrote:
> > On Fri, Jun 12, 2015 at 09:32:45AM +0200, Peter Zijlstra wrote:
> >> On Thu, Jun 11, 2015 at 07:36:05PM +0200, Frederic Weisbecker wrote:
> >>> Restart the tick when necessary from the irq exit path. It makes nohz
> >>> full more flexible and allow it to piggyback the tick restart on the
> >>> scheduler IPI in the future instead of sending a dedicated IPI that
> >>> often doubles the scheduler IPI on task wakeup. This will require
> >>> careful review of resched_curr() callers.
> >>
> >> This seems to assume schedule_ipi() callers use irq_exit(), this is
> >> false.
> >
> > Indeed there will be that too. Note the current patch doesn't yet rely on
> > schedule_ipi(), we are still using the nohz ipis. But introducing the
> > tick restart on irq exit prepares for later piggybacking on scheduler_ipi().
> >
> > I think this will involve changes on got_nohz_idle_kick(), renamed to
> > got_nohz_kick() and include nohz full related checks to trigger the
> > irq_enter()/exit() pair.
>
> I may be saying something obvious here, nevertheless: I am not sure about
> other archs, but at least on powerpc, after handling an interrupt we will
> call irq_exit() and reevaluate restarting the tick. So in our case, even if
> scheduler_ipi() callers do not call irq_exit(), it will be called after
> handling the reschedule interrupt.

scheduler_ipi() takes care of the call to irq_enter() and irq_exit() when
necessary. Which means that the arch low level handler for scheduler_ipi()
shouldn't call these functions. If it does then it's buggy.

2015-07-07 14:31:01

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 7/8] nohz: Evaluate tick dependency once on context switch

On Thu, Jun 11, 2015 at 04:46:29PM -0400, Rik van Riel wrote:
> On 06/11/2015 01:36 PM, Frederic Weisbecker wrote:
> > The tick dependency is evaluated on every irq. This is a batch of checks
> > which determine whether it is safe to stop the tick or not. These checks
> > are often split in many details: posix cpu timers, scheduler, sched clock,
> > perf events. Each of which are made of smaller details: posix cpu
> > timer involves checking process wide timers then thread wide timers. Perf
> > involves checking freq events then more per cpu details.
> >
> > Checking these details every time we update the full dynticks state
> > bring avoidable overhead.
> >
> > So lets evaluate these dependencies once on context switch. Then the
> > further dependency checks will be performed through a single state check.
> >
> > This is a first step that can be later optimized by dividing task level
> > dependency, CPU level dependency and global dependency and update
> > each at the right time.
>
> > +static void tick_nohz_full_update_dependencies(void)
> > +{
> > + struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> > +
> > + if (!posix_cpu_timers_can_stop_tick(current))
> > + ts->tick_needed |= TICK_NEEDED_POSIX_CPU_TIMER;
> > +
> > + if (!perf_event_can_stop_tick())
> > + ts->tick_needed |= TICK_NEEDED_PERF_EVENT;
> > +
> > + if (!sched_can_stop_tick())
> > + ts->tick_needed |= TICK_NEEDED_SCHED;
> >
>
> I see this getting kicked from task work and from ipi
> context, but does it get kicked on task wakeup, when
> we have a second runnable task on a CPU, but we decide
> not to preempt the currently running task to switch to
> it yet, but we will want to preempt the currently running
> task at a later point in time?

Sure, it's kicked unconditionally as long as the runqueue has more
than one task running. So it includes task wakeup, load balancing, etc...
Everything that calls add_nr_running().

We'll certainly refine that in the long run, perhaps by hooking
on check_preempt_curr() and resched_curr() to minimize the IPIs
and piggyback on the scheduler ipi when possible.

2015-07-07 14:46:08

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 4/8] nohz: Remove idle task special case

On Sun, Jun 14, 2015 at 07:14:34AM +0530, Preeti U Murthy wrote:
> On 06/11/2015 11:06 PM, Frederic Weisbecker wrote:
> > This is a leftover from old days to avoid conflicts with dynticks idle
> > code. Now full dynticks and idle dynticks are better integrated and
> > interact without known issue.
>
> I am sorry but I fail to understand why the check on idle task was there
> in the first place in the below code paths. It would help if you could
> clarify this in the changelog as well.

Because dynticks-idle maintains various stats that were ignored by nohz
full. The two machineries were so badly integrated with each other that if
the nohz full code started or stopped the tick in idle on behalf of the
dynticks-idle code, the relevant stats got ignored or corrupted. But now the
code has been consolidated to handle that.

I'll try to improve the changelog.

>
> >
> > So lets remove it.
> >
> > Cc: Christoph Lameter <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc; John Stultz <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Preeti U Murthy <[email protected]>
> > Cc: Rik van Riel <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Viresh Kumar <[email protected]>
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > ---
> > kernel/time/tick-sched.c | 8 +++-----
> > 1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 812f7a3..324482f 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -208,10 +208,8 @@ void __tick_nohz_full_check(void)
> > struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> >
> > if (tick_nohz_full_cpu(smp_processor_id())) {
> > - if (ts->tick_stopped && !is_idle_task(current)) {
> > - if (!can_stop_full_tick())
>
> can_stop_full_tick() would have bailed out if the current task was idle,
> since it checks for the number of tasks being greater than 1 to restart
> the tick. So why was the check is_idle_task() introduced earlier ?

If the tick was stopped by dynticks idle and we received that IPI before
the tick got a chance to be restarted by the dynticks-idle code, the tick
could be restarted by nohz full and break some idle-related statistics.

So we had to introduce that condition before the code got consolidated.

>
> > - tick_nohz_restart_sched_tick(ts, ktime_get());
> > - }
> > + if (ts->tick_stopped && !can_stop_full_tick())
> > + tick_nohz_restart_sched_tick(ts, ktime_get());
> > }
> > }
> >
> > @@ -710,7 +708,7 @@ static void tick_nohz_full_stop_tick(struct tick_sched *ts)
> > #ifdef CONFIG_NO_HZ_FULL
> > int cpu = smp_processor_id();
> >
> > - if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
> > + if (!tick_nohz_full_cpu(cpu))
>
> If the current task was indeed idle, the check on ts->inidle would have
> succeeded in tick_irq_exit() and we would not have reached this function
> at all, isn't it? So here too I am unable to understand why we had it in
> the first place.

is_idle_task() means that we run the idle task. But ts->inidle means that we
run the idle task between tick_nohz_idle_enter() and tick_nohz_idle_exit().

So there was a chance that we stopped the tick with full dynticks and then
tick_nohz_idle_enter() got called with the tick disabled, breaking some
stats along the way.

Again that's all leftover junk from early code.

> Regards
> Preeti U Murthy
> > return;
> >
> > if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
> >
>