2013-04-03 12:46:38

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC GIT PULL] nohz: Kconfig layout improvements

Ingo,

This set addresses your review concerning the Kconfig layout.
Please note two things here that derive from what we agreed
on due to technical limitations:

* Now the full dynticks Kconfig is not hidden anymore behind its
high level dependencies. (ie: passive dependencies are now active).
There is an exception though with CONFIG_VIRT_CPU_ACCOUNTING_GEN
(Full dynticks cputime accounting) that is part of a choice menu
like PREEMPT_*. It seems such kconfig layout prevent from doing a remote
select on its choices. So it stays a passive dependency for now, until
Kconfig/Kbuild supports that (Cc'ing Michel Marek) or somebody shows
me what I did wrong ;)

* Ideally we want to reuse CONFIG_NO_HZ as a Kconfig that consolidate
the common code between CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED.
But we also want CONFIG_NO_HZ from old config files to map to CONFIG_NO_HZ_IDLE.
Both at the same time is not possible or we have a Kconfig circular
dependency. So I introduced a new CONFIG_NO_HZ_COMMON for common nohz code
and CONFIG_NO_HZ stays for backward compatibility by enabling CONFIG_NO_HZ_IDLE
by default.

If you're ok, please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/nohz-v2

Thanks.

---
Frederic Weisbecker (4):
nohz: Unhide full dynticks feature from its dependencies
nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON
nohz: Pack nohz Kconfig option in a menu of choices
nohz: Print final full dynticks CPUs range on boot

Documentation/RCU/stallwarn.txt | 2 +-
Documentation/cpu-freq/governors.txt | 4 +-
arch/um/include/shared/common-offsets.h | 4 +-
arch/um/os-Linux/time.c | 2 +-
include/linux/sched.h | 8 ++--
include/linux/tick.h | 8 ++--
init/Kconfig | 2 +-
kernel/hrtimer.c | 4 +-
kernel/sched/core.c | 18 +++++-----
kernel/sched/fair.c | 10 +++---
kernel/sched/sched.h | 4 +-
kernel/softirq.c | 2 +-
kernel/time/Kconfig | 54 ++++++++++++++++++++++++------
kernel/time/tick-sched.c | 22 +++++++++---
kernel/timer.c | 4 +-
15 files changed, 95 insertions(+), 53 deletions(-)

--
1.7.5.4


2013-04-03 12:46:41

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 1/4] nohz: Unhide full dynticks feature from its dependencies

The full dynticks feature only shows up when all its
Kconfig dependencies are met (RCU nocbs, RCU user mode, ...)

This is far from being user friendly as those who want to
activate this feature need to look into the Kconfig files
and iterate through each dependency then activate these
by hand in order to show and select the full dynticks
Kconfig option.

So process the other way around: show up the Kconfig option
if the minimal low level dependencies are met and activate
the high level ones when we enable the feature.

Note there is one exception in the picture:
CONFIG_VIRT_CPU_ACCOUNTING_GEN is part of a Kconfig choice
menu and it appears we can't select it from another Kconfig
selection when it's under such layout. So for now this
particular item stays as a passive dependency.

Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/time/Kconfig | 19 ++++++++++++++-----
1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 5a87c03..726c33e 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -80,11 +80,20 @@ config NO_HZ
busy and when the system is idle.

config NO_HZ_EXTENDED
- bool "Full dynticks system"
- depends on NO_HZ && RCU_USER_QS && VIRT_CPU_ACCOUNTING_GEN && RCU_NOCB_CPU && SMP
- select CONTEXT_TRACKING_FORCE
- help
- Adaptively try to shutdown the tick whenever possible, even when
+ bool "Full dynticks system"
+ # NO_HZ dependency
+ depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
+ # RCU_USER_QS
+ depends on HAVE_CONTEXT_TRACKING && SMP
+ # RCU_NOCB_CPU dependency
+ depends on TREE_RCU || TREE_PREEMPT_RCU
+ depends on VIRT_CPU_ACCOUNTING_GEN
+ select NO_HZ
+ select RCU_USER_QS
+ select RCU_NOCB_CPU
+ select CONTEXT_TRACKING_FORCE
+ help
+ Adaptively try to shutdown the tick whenever possible, even when
the CPU is running tasks. Typically this requires running a single
task on the CPU. Chances for running tickless are maximized when
the task mostly runs in userspace and has few kernel activity.
--
1.7.5.4

2013-04-03 12:46:44

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 2/4] nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON

We are planning to convert the dynticks Kconfig options layout
into a choice menu. The user must be able to easily pick
any of the following implementations: constant periodic tick,
idle dynticks, full dynticks.

As this implies a mutual exclusion, the two dynticks implementions
need to converge on the selection of a common Kconfig option in order
to ease the sharing of a common infrastructure.

It would thus seem pretty natural to reuse CONFIG_NO_HZ to
that end. It already implements all the idle dynticks code
and the full dynticks depends on all that code for now.
So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.

On the other hand we want to stay backward compatible: if
CONFIG_NO_HZ is set in an older config file, we want to
enable CONFIG_NO_HZ_IDLE by default.

But we can't afford both at the same time or we run into
a circular dependency:

1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
CONFIG_NO_HZ
2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE

We might be able to support that from Kconfig/Kbuild but it
may not be wise to introduce such a confusing behaviour.

So to solve this, create a new CONFIG_NO_HZ_COMMON option
which gathers the common code between idle and full dynticks
(that common code for now is simply the idle dynticks code)
and select it from their referring Kconfig.

Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
to it for backward compatibility.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
Documentation/RCU/stallwarn.txt | 2 +-
Documentation/cpu-freq/governors.txt | 4 ++--
arch/um/include/shared/common-offsets.h | 4 ++--
arch/um/os-Linux/time.c | 2 +-
include/linux/sched.h | 8 ++++----
include/linux/tick.h | 8 ++++----
init/Kconfig | 2 +-
kernel/hrtimer.c | 4 ++--
kernel/sched/core.c | 18 +++++++++---------
kernel/sched/fair.c | 10 +++++-----
kernel/sched/sched.h | 4 ++--
kernel/softirq.c | 2 +-
kernel/time/Kconfig | 13 +++++++++----
kernel/time/tick-sched.c | 12 ++++++------
kernel/timer.c | 4 ++--
15 files changed, 51 insertions(+), 46 deletions(-)

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 1927151..b336755 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -176,7 +176,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
o A hardware or software issue shuts off the scheduler-clock
interrupt on a CPU that is not in dyntick-idle mode. This
problem really has happened, and seems to be most likely to
- result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
+ result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.

o A bug in the RCU implementation.

diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
index c7a2eb8..e3e5d9a 100644
--- a/Documentation/cpu-freq/governors.txt
+++ b/Documentation/cpu-freq/governors.txt
@@ -131,8 +131,8 @@ sampling_rate_min:
The sampling rate is limited by the HW transition latency:
transition_latency * 100
Or by kernel restrictions:
-If CONFIG_NO_HZ is set, the limit is 10ms fixed.
-If CONFIG_NO_HZ is not set or nohz=off boot parameter is used, the
+If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
+If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is used, the
limits depend on the CONFIG_HZ option:
HZ=1000: min=20000us (20ms)
HZ=250: min=80000us (80ms)
diff --git a/arch/um/include/shared/common-offsets.h b/arch/um/include/shared/common-offsets.h
index 2df313b..c923068 100644
--- a/arch/um/include/shared/common-offsets.h
+++ b/arch/um/include/shared/common-offsets.h
@@ -30,8 +30,8 @@ DEFINE(UM_NSEC_PER_USEC, NSEC_PER_USEC);
#ifdef CONFIG_PRINTK
DEFINE(UML_CONFIG_PRINTK, CONFIG_PRINTK);
#endif
-#ifdef CONFIG_NO_HZ
-DEFINE(UML_CONFIG_NO_HZ, CONFIG_NO_HZ);
+#ifdef CONFIG_NO_HZ_COMMON
+DEFINE(UML_CONFIG_NO_HZ_COMMON, CONFIG_NO_HZ_COMMON);
#endif
#ifdef CONFIG_UML_X86
DEFINE(UML_CONFIG_UML_X86, CONFIG_UML_X86);
diff --git a/arch/um/os-Linux/time.c b/arch/um/os-Linux/time.c
index fac388c..e9824d5 100644
--- a/arch/um/os-Linux/time.c
+++ b/arch/um/os-Linux/time.c
@@ -79,7 +79,7 @@ long long os_nsecs(void)
return timeval_to_ns(&tv);
}

-#ifdef UML_CONFIG_NO_HZ
+#ifdef UML_CONFIG_NO_HZ_COMMON
static int after_sleep_interval(struct timespec *ts)
{
return 0;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 10626e2e..1ff9e0a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -230,7 +230,7 @@ extern void init_idle_bootup_task(struct task_struct *idle);

extern int runqueue_is_locked(int cpu);

-#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
extern void nohz_balance_enter_idle(int cpu);
extern void set_cpu_sd_state_idle(void);
extern int get_nohz_timer_target(void);
@@ -1758,13 +1758,13 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p,
}
#endif

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
void calc_load_enter_idle(void);
void calc_load_exit_idle(void);
#else
static inline void calc_load_enter_idle(void) { }
static inline void calc_load_exit_idle(void) { }
-#endif /* CONFIG_NO_HZ */
+#endif /* CONFIG_NO_HZ_COMMON */

#ifndef CONFIG_CPUMASK_OFFSTACK
static inline int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
@@ -1850,7 +1850,7 @@ extern void idle_task_exit(void);
static inline void idle_task_exit(void) {}
#endif

-#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+#if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP)
extern void wake_up_nohz_cpu(int cpu);
#else
static inline void wake_up_nohz_cpu(int cpu) { }
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 44bfa8a..5e40333 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -82,7 +82,7 @@ extern int tick_program_event(ktime_t expires, int force);
extern void tick_setup_sched_timer(void);
# endif

-# if defined CONFIG_NO_HZ || defined CONFIG_HIGH_RES_TIMERS
+# if defined CONFIG_NO_HZ_COMMON || defined CONFIG_HIGH_RES_TIMERS
extern void tick_cancel_sched_timer(int cpu);
# else
static inline void tick_cancel_sched_timer(int cpu) { }
@@ -123,7 +123,7 @@ static inline void tick_check_idle(int cpu) { }
static inline int tick_oneshot_mode_active(void) { return 0; }
#endif /* !CONFIG_GENERIC_CLOCKEVENTS */

-# ifdef CONFIG_NO_HZ
+# ifdef CONFIG_NO_HZ_COMMON
DECLARE_PER_CPU(struct tick_sched, tick_cpu_sched);

static inline int tick_nohz_tick_stopped(void)
@@ -138,7 +138,7 @@ extern ktime_t tick_nohz_get_sleep_length(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);

-# else /* !CONFIG_NO_HZ */
+# else /* !CONFIG_NO_HZ_COMMON */
static inline int tick_nohz_tick_stopped(void)
{
return 0;
@@ -155,7 +155,7 @@ static inline ktime_t tick_nohz_get_sleep_length(void)
}
static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
-# endif /* !NO_HZ */
+# endif /* !CONFIG_NO_HZ_COMMON */

#ifdef CONFIG_NO_HZ_EXTENDED
extern int tick_nohz_extended_cpu(int cpu);
diff --git a/init/Kconfig b/init/Kconfig
index 8a1dac2..edc8132 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -580,7 +580,7 @@ config RCU_FANOUT_EXACT

config RCU_FAST_NO_HZ
bool "Accelerate last non-dyntick-idle CPU's grace periods"
- depends on NO_HZ && SMP
+ depends on NO_HZ_COMMON && SMP
default n
help
This option causes RCU to attempt to accelerate grace periods in
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cc47812..ec60482 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -160,7 +160,7 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
*/
static int hrtimer_get_target(int this_cpu, int pinned)
{
-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu))
return get_nohz_timer_target();
#endif
@@ -1106,7 +1106,7 @@ ktime_t hrtimer_get_remaining(const struct hrtimer *timer)
}
EXPORT_SYMBOL_GPL(hrtimer_get_remaining);

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/**
* hrtimer_get_next_event - get the time until next expiry event
*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e91ee58..9bb397d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -549,7 +549,7 @@ void resched_cpu(int cpu)
raw_spin_unlock_irqrestore(&rq->lock, flags);
}

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/*
* In the semi idle case, use the nearest busy cpu for migrating timers
* from an idle cpu. This is good for power-savings.
@@ -641,14 +641,14 @@ static inline bool got_nohz_idle_kick(void)
return idle_cpu(cpu) && test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
}

-#else /* CONFIG_NO_HZ */
+#else /* CONFIG_NO_HZ_COMMON */

static inline bool got_nohz_idle_kick(void)
{
return false;
}

-#endif /* CONFIG_NO_HZ */
+#endif /* CONFIG_NO_HZ_COMMON */

void sched_avg_update(struct rq *rq)
{
@@ -2139,7 +2139,7 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
return load >> FSHIFT;
}

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/*
* Handle NO_HZ for the global load-average.
*
@@ -2365,12 +2365,12 @@ static void calc_global_nohz(void)
smp_wmb();
calc_load_idx++;
}
-#else /* !CONFIG_NO_HZ */
+#else /* !CONFIG_NO_HZ_COMMON */

static inline long calc_load_fold_idle(void) { return 0; }
static inline void calc_global_nohz(void) { }

-#endif /* CONFIG_NO_HZ */
+#endif /* CONFIG_NO_HZ_COMMON */

/*
* calc_load - update the avenrun load estimates 10 ticks after the
@@ -2530,7 +2530,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
sched_avg_update(this_rq);
}

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/*
* There is no sane way to deal with nohz on smp when using jiffies because the
* cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
@@ -2590,7 +2590,7 @@ void update_cpu_load_nohz(void)
}
raw_spin_unlock(&this_rq->lock);
}
-#endif /* CONFIG_NO_HZ */
+#endif /* CONFIG_NO_HZ_COMMON */

/*
* Called from scheduler_tick()
@@ -7023,7 +7023,7 @@ void __init sched_init(void)
INIT_LIST_HEAD(&rq->cfs_tasks);

rq_attach_root(rq, &def_root_domain);
-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
rq->nohz_flags = 0;
#endif
#endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 539760e..5c97fca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5331,7 +5331,7 @@ out_unlock:
return 0;
}

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/*
* idle load balancing details
* - When one of the busy CPUs notice that there may be an idle rebalancing
@@ -5541,9 +5541,9 @@ out:
rq->next_balance = next_balance;
}

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/*
- * In CONFIG_NO_HZ case, the idle balance kickee will do the
+ * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
* rebalancing for all the cpus for whom scheduler ticks are stopped.
*/
static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
@@ -5686,7 +5686,7 @@ void trigger_load_balance(struct rq *rq, int cpu)
if (time_after_eq(jiffies, rq->next_balance) &&
likely(!on_null_domain(cpu)))
raise_softirq(SCHED_SOFTIRQ);
-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu)))
nohz_balancer_kick(cpu);
#endif
@@ -6156,7 +6156,7 @@ __init void init_sched_fair_class(void)
#ifdef CONFIG_SMP
open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
nohz.next_balance = jiffies;
zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
cpu_notifier(sched_ilb_notifier, 0);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3bd15a4..889904d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -404,7 +404,7 @@ struct rq {
#define CPU_LOAD_IDX_MAX 5
unsigned long cpu_load[CPU_LOAD_IDX_MAX];
unsigned long last_load_update_tick;
-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
u64 nohz_stamp;
unsigned long nohz_flags;
#endif
@@ -1333,7 +1333,7 @@ extern void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq);

extern void account_cfs_bandwidth_used(int enabled, int was_enabled);

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
enum rq_nohz_flag_bits {
NOHZ_TICK_STOPPED,
NOHZ_BALANCE_KICK,
diff --git a/kernel/softirq.c b/kernel/softirq.c
index b4d252f..de15813 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -348,7 +348,7 @@ void irq_exit(void)
if (!in_interrupt() && local_softirq_pending())
invoke_softirq();

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/* Make sure that timer wheel updates are propagated */
if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
tick_nohz_irq_exit();
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 726c33e..c88fc43 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -64,16 +64,21 @@ config GENERIC_CMOS_UPDATE
if GENERIC_CLOCKEVENTS
menu "Timers subsystem"

-# Core internal switch. Selected by NO_HZ / HIGH_RES_TIMERS. This is
+# Core internal switch. Selected by NO_HZ_COMMON / HIGH_RES_TIMERS. This is
# only related to the tick functionality. Oneshot clockevent devices
# are supported independ of this.
config TICK_ONESHOT
bool

+config NO_HZ_COMMON
+ bool
+ depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
+ select TICK_ONESHOT
+
config NO_HZ
bool "Tickless System (Dynamic Ticks)"
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
- select TICK_ONESHOT
+ select NO_HZ_COMMON
help
This option enables a tickless system: timer interrupts will
only trigger on an as-needed basis both when the system is
@@ -81,14 +86,14 @@ config NO_HZ

config NO_HZ_EXTENDED
bool "Full dynticks system"
- # NO_HZ dependency
+ # NO_HZ_COMMON dependency
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
# RCU_USER_QS
depends on HAVE_CONTEXT_TRACKING && SMP
# RCU_NOCB_CPU dependency
depends on TREE_RCU || TREE_PREEMPT_RCU
depends on VIRT_CPU_ACCOUNTING_GEN
- select NO_HZ
+ select NO_HZ_COMMON
select RCU_USER_QS
select RCU_NOCB_CPU
select CONTEXT_TRACKING_FORCE
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 57bb3fe..ccfc208 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -104,7 +104,7 @@ static void tick_sched_do_timer(ktime_t now)
{
int cpu = smp_processor_id();

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/*
* Check if the do_timer duty was dropped. We don't care about
* concurrency: This happens only when the cpu in charge went
@@ -124,7 +124,7 @@ static void tick_sched_do_timer(ktime_t now)

static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
{
-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/*
* When we are idle and the tick is stopped, we have to touch
* the watchdog as we might not schedule for a really long
@@ -235,7 +235,7 @@ core_initcall(init_tick_nohz_extended);
/*
* NOHZ - aka dynamic tick functionality
*/
-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/*
* NO HZ enabled ?
*/
@@ -907,7 +907,7 @@ static inline void tick_check_nohz(int cpu)
static inline void tick_nohz_switch_to_nohz(void) { }
static inline void tick_check_nohz(int cpu) { }

-#endif /* NO_HZ */
+#endif /* CONFIG_NO_HZ_COMMON */

/*
* Called from irq_enter to notify about the possible interruption of idle()
@@ -992,14 +992,14 @@ void tick_setup_sched_timer(void)
now = ktime_get();
}

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
if (tick_nohz_enabled)
ts->nohz_mode = NOHZ_MODE_HIGHRES;
#endif
}
#endif /* HIGH_RES_TIMERS */

-#if defined CONFIG_NO_HZ || defined CONFIG_HIGH_RES_TIMERS
+#if defined CONFIG_NO_HZ_COMMON || defined CONFIG_HIGH_RES_TIMERS
void tick_cancel_sched_timer(int cpu)
{
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
diff --git a/kernel/timer.c b/kernel/timer.c
index 4e3040b..1b7489f 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -738,7 +738,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,

cpu = smp_processor_id();

-#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+#if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP)
if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu))
cpu = get_nohz_timer_target();
#endif
@@ -1188,7 +1188,7 @@ static inline void __run_timers(struct tvec_base *base)
spin_unlock_irq(&base->lock);
}

-#ifdef CONFIG_NO_HZ
+#ifdef CONFIG_NO_HZ_COMMON
/*
* Find out when the next timer event is due to happen. This
* is used on S/390 to stop all activity when a CPU is idle.
--
1.7.5.4

2013-04-03 12:46:52

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 4/4] nohz: Print final full dynticks CPUs range on boot

Given that we apply a few restrictions on the full dynticks
CPUs range (keep an online timekeeper oustide the range,
then in the future have the range be an RCU nocb CPUs subset),
let's print the final resulting range of full dynticks CPUs to
the user so that he knows what's really going to run.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/time/tick-sched.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index ccfc208..e057d33 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -186,6 +186,13 @@ static int __cpuinit tick_nohz_cpu_down_callback(struct notifier_block *nfb,
return NOTIFY_OK;
}

+/*
+ * Worst case string length in chunks of CPU range seems 2 steps
+ * separations: 0,2,4,6,...
+ * This is NR_CPUS + sizeof('\0')
+ */
+static char __initdata nohz_ext_buf[NR_CPUS + 1];
+
static int __init init_tick_nohz_extended(void)
{
cpumask_var_t online_nohz;
@@ -225,6 +232,9 @@ static int __init init_tick_nohz_extended(void)
put_online_cpus();
free_cpumask_var(online_nohz);

+ cpulist_scnprintf(nohz_ext_buf, sizeof(nohz_ext_buf), nohz_extended_mask);
+ pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_ext_buf);
+
return 0;
}
core_initcall(init_tick_nohz_extended);
--
1.7.5.4

2013-04-03 12:47:18

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 3/4] nohz: Pack nohz Kconfig option in a menu of choices

Now the user has the choice between three implementations of
the timer tick:

* Static periodic tick
* Idle dynticks
* Full dynticks

At least for now, these are mutually exclusive choices, so
let's rely on the proper Kconfig feature to display these
to the user.

A new entry CONFIG_NO_HZ_IDLE is created and the old
CONFIG_NO_HZ maps to it for config file backward compatibility.
The old name was too general now that we have more
granular dynticks implementations.

While at it, add some explanation to help the user on
his decision between the 3 entries.

Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/time/Kconfig | 28 +++++++++++++++++++++++-----
1 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index c88fc43..27cc404 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -75,17 +75,33 @@ config NO_HZ_COMMON
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
select TICK_ONESHOT

+# Kept around for compatibility, maps to NO_HZ_IDLE
config NO_HZ
- bool "Tickless System (Dynamic Ticks)"
+ bool
+
+choice
+ prompt "Timer tick handling"
+ default NO_HZ_IDLE if NO_HZ
+
+config PERIODIC_HZ
+ bool "Periodic timer ticks (constant rate, no dynticks)"
+ help
+ This option keeps the tick running periodically at a constant
+ rate, even when the CPU doesn't need it.
+
+config NO_HZ_IDLE
+ bool "Idle dynticks system (tickless idle)"
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
select NO_HZ_COMMON
help
- This option enables a tickless system: timer interrupts will
- only trigger on an as-needed basis both when the system is
- busy and when the system is idle.
+ This option enables a tickless idle system: timer interrupts
+ will only trigger on an as-needed basis when the system is idle.
+ This is usually interesting for energy saving.
+
+ Most of the time you want to say Y here.

config NO_HZ_EXTENDED
- bool "Full dynticks system"
+ bool "Full dynticks system (tickless single task)"
# NO_HZ_COMMON dependency
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
# RCU_USER_QS
@@ -112,6 +128,8 @@ config NO_HZ_EXTENDED

Say N.

+endchoice
+
config HIGH_RES_TIMERS
bool "High Resolution Timer Support"
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
--
1.7.5.4

2013-04-04 18:21:46

by Christopher Lameter

[permalink] [raw]
Subject: Re: [RFC GIT PULL] nohz: Kconfig layout improvements

It seems that nohz still has no effect.

3.9-rc5 + patches. Affinity of init set to 0,1 so no
tasks are running on 9. The "latencytest" used here is part of my
lldiag-0.15 toolkit.

First test without any special kernel parameters. nohz off right?

$ nice -5 taskset -c 9 latencytest

CPUs: Freq=2.90Ghz Processors=32 Cores=8 cacheline_size=64 Intel(R)
Xeon(R) CPU E5-2690 0 @ 2.90GHz
16775106 samples below 1000 nsec
13 involuntary context switches
1019 (0.00607411%) variances in 10.00 seconds: minimum 1.07us maximum 12.32us average 3.30us stddev 0.63us

HZ=100 so the 1019 variances are likely timer interrupts.




After nohz setup

/proc/cmdline:

BOOT_IMAGE=/vmlinuz-3.9.0-rc5+ root=/dev/mapper/vg01-root ro console=tty0 console=ttyS0,115200 idle=mwait rcu_nocb_poll rcu_nocbs=2-31 nohz_extended=2-31

$ nice -5 taskset -c 9 latencytest
CPUs: Freq=2.90Ghz Processors=32 Cores=8 cacheline_size=64 Intel(R)
Xeon(R) CPU E5-2690 0 @ 2.90GHz
16779362 samples below 1000 nsec
13 involuntary context switches
1037 (0.00617983%) variances in 10.00 seconds: minimum 1.00us maximum 10.61us average 3.30us stddev 0.98us



If I move the RCU threads off the cpu then I get a slightly better result:

$ nice -5 taskset -c 9 latencytest
CPUs: Freq=2.90Ghz Processors=32 Cores=8 cacheline_size=64 Intel(R)
Xeon(R) CPU E5-2690 0 @ 2.90GHz
16796039 samples below 1000 nsec
12 involuntary context switches
1020 (0.00607249%) variances in 10.00 seconds: minimum 1.00us maximum 11.58us average 2.77us stddev 0.55us



Why is the tick not stopping? How do I diagnose this? (I can start
patching the kernel again like last time but isnt there a better way?)

2013-04-04 19:53:48

by Gilad Ben-Yossef

[permalink] [raw]
Subject: Re: [RFC GIT PULL] nohz: Kconfig layout improvements

Hi,

[ Sorry for dropping LKML on my previous email. It was caused by
replying from my smartphone. Adding everyone back now ]

On Thu, Apr 4, 2013 at 9:23 PM, Christoph Lameter <[email protected]> wrote:
>
> On Thu, 4 Apr 2013, Gilad Ben-Yossef wrote:
>
> > The vmstat work item leaving a timer running on the CPU, perhaps?
>
> Of course there is always a timer running for VM work.
>
> > Check /proc/timers
>
> Not compiled in.
>
> > I've sent a patch previously to make vmstat disable the timer if no VM work
> > has been detected since the last run. Maybe you can try it? It was against
> > 3.4 though. Probably won't apply...
>
> Can you repost it? CC me on future postings. I wrote the vmstat handling.
>

Here is the last version I posted over a year ago. You were CCed and
provided very useful feedback:

http://lkml.indiana.edu/hypermail/linux/kernel/1205.0/01291.html

Based on your feedback I re-spun them but never gotten around to port
them to recent kernel and post them. The latest (yet unpublished
versions) are here:

https://github.com/gby/linux/commit/04e041327036772383c14ebdc450f522d782f264
https://github.com/gby/linux/commit/f5b8ae815670a289af9ff22ad83da3295472e63c

I'll try to resurrect them, but maybe they'll prove useful as a
reference as they are in the meantime.

Hope it helps,
Gilad


--
Gilad Ben-Yossef
Chief Coffee Drinker
[email protected]
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
-- Jean-Baptiste Queru

2013-04-04 20:09:40

by Christopher Lameter

[permalink] [raw]
Subject: Re: [RFC GIT PULL] nohz: Kconfig layout improvements

On Thu, 4 Apr 2013, Gilad Ben-Yossef wrote:

> Here is the last version I posted over a year ago. You were CCed and
> provided very useful feedback:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1205.0/01291.html

Ah. yes I remember now.

> Based on your feedback I re-spun them but never gotten around to port
> them to recent kernel and post them. The latest (yet unpublished
> versions) are here:
>
> https://github.com/gby/linux/commit/04e041327036772383c14ebdc450f522d782f264
> https://github.com/gby/linux/commit/f5b8ae815670a289af9ff22ad83da3295472e63c
>
> I'll try to resurrect them, but maybe they'll prove useful as a
> reference as they are in the meantime.

I think that would be very useful.

2013-04-08 11:19:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC GIT PULL] nohz: Kconfig layout improvements


* Frederic Weisbecker <[email protected]> wrote:

> Ingo,
>
> This set addresses your review concerning the Kconfig layout.
> Please note two things here that derive from what we agreed
> on due to technical limitations:
>
> * Now the full dynticks Kconfig is not hidden anymore behind its
> high level dependencies. (ie: passive dependencies are now active).
> There is an exception though with CONFIG_VIRT_CPU_ACCOUNTING_GEN
> (Full dynticks cputime accounting) that is part of a choice menu
> like PREEMPT_*. It seems such kconfig layout prevent from doing a remote
> select on its choices. So it stays a passive dependency for now, until
> Kconfig/Kbuild supports that (Cc'ing Michel Marek) or somebody shows
> me what I did wrong ;)
>
> * Ideally we want to reuse CONFIG_NO_HZ as a Kconfig that consolidate
> the common code between CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED.
> But we also want CONFIG_NO_HZ from old config files to map to CONFIG_NO_HZ_IDLE.
> Both at the same time is not possible or we have a Kconfig circular
> dependency. So I introduced a new CONFIG_NO_HZ_COMMON for common nohz code
> and CONFIG_NO_HZ stays for backward compatibility by enabling CONFIG_NO_HZ_IDLE
> by default.
>
> If you're ok, please pull from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> timers/nohz-v2
>
> Thanks.
>
> ---
> Frederic Weisbecker (4):
> nohz: Unhide full dynticks feature from its dependencies
> nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON
> nohz: Pack nohz Kconfig option in a menu of choices
> nohz: Print final full dynticks CPUs range on boot
>
> Documentation/RCU/stallwarn.txt | 2 +-
> Documentation/cpu-freq/governors.txt | 4 +-
> arch/um/include/shared/common-offsets.h | 4 +-
> arch/um/os-Linux/time.c | 2 +-
> include/linux/sched.h | 8 ++--
> include/linux/tick.h | 8 ++--
> init/Kconfig | 2 +-
> kernel/hrtimer.c | 4 +-
> kernel/sched/core.c | 18 +++++-----
> kernel/sched/fair.c | 10 +++---
> kernel/sched/sched.h | 4 +-
> kernel/softirq.c | 2 +-
> kernel/time/Kconfig | 54 ++++++++++++++++++++++++------
> kernel/time/tick-sched.c | 22 +++++++++---
> kernel/timer.c | 4 +-
> 15 files changed, 95 insertions(+), 53 deletions(-)

I pulled it into tip:timers/nohz and will push it out if it passes testing because
I like the improvements - but there's still a few things missing I think.

Firstly, I performed this "how are users exposed to this new feature" test:

git checkout v3.9-rc6
make defconfig
git checkout tip/master
make oldconfig

the x86 (64-bit) defconfig has NO_HZ enabled. When I did the 'make oldconfig', I
was given:

*
* Timers subsystem
*
Timer tick handling
> 1. Periodic timer ticks (constant rate, no dynticks) (PERIODIC_HZ) (NEW)
2. Idle dynticks system (tickless idle) (NO_HZ_IDLE) (NEW)
choice[1-2]:

[ Firstly, while at it: that should be 'Timer subsystem' or 'Timers'. ]

More importantly, the new Kconfig behavior is still not quite right for two
reasons:

1)

the default is not set to NO_HZ_IDLE. The oldconfig .config had
CONFIG_NO_HZ set - this should be grandfathered over into the new config's
default choice. That can probably be done by still keeping CONFIG_NO_HZ as
a migration helper entry.

2)

there's still no extended idle tick option offered - due to it not meeting
the CONFIG_VIRT_CPU_ACCOUNTING_GEN dependency.

Even if I read the Kconfig rules and figure out the dependency, I have to
perform _two_ steps to get extended ticks:

I had to manually find CONFIG_VIRT_CPU_ACCOUNTING_GEN in the .config and
delete it, so that I'm given the choice on the next 'make oldconfig'.

Then NO_HZ_EXTENDED was set to disabled in the .config silently, because
NO_HZ_IDLE was already set. So I had to delete that and re-configure it
again.

Highly inconvenient.

-----------------------

Once the dependencies are right I like it how then we get offered the 3 variants:

*
* Timers subsystem
*
Timer tick handling
> 1. Periodic timer ticks (constant rate, no dynticks) (PERIODIC_HZ) (NEW)
2. Idle dynticks system (tickless idle) (NO_HZ_IDLE) (NEW)
3. Full dynticks system (tickless single task) (NO_HZ_EXTENDED) (NEW)

and I think that is how it should always look like, for standard .config's, pretty
much regardless of how other things are configured - as long as the architecture
supports extended dynticks.

So I'd suggest the following changes to fix the remaining usability problems:

- .config compatibility fix: the default selection should pick up existing
CONFIG_NO_HZ settings, for a kernel release cycle, so that people whole just go
through 'make oldconfig' and rely on the defaults don't lose CONFIG_NO_HZ_IDLE.

This can probably be done by adding a CONFIG_NO_HZ Kconfig entry, and
documenting it as a migration helper. This can then be removed in v3.11. The
multiple choices entry can then use CONFIG_NO_HZ to set its default offering to
CONFIG_NO_HZ_IDLE or CONFIG_PERIODIC_HZ?

- CONFIG_VIRT_CPU_ACCOUNTING_GEN should be selected as well. (Maybe even the RCU
model.)

- Nit: the 'depends on SMP' part looks a bit weird. Is that a quirk?

- Plus a minor help text nit: I'd not call it 'tickless single task' - but
'tickless'. When there are multiple tasks on a CPU then it's natural that
there's a scheduler tick arbitrating between them - this can be mentioned in
the help text, but otherwise should not distract from the 'full dynticks'
notion.

It's not even always about a single task: when there's multiple SCHED_FIFO
tasks running, then the scheduler tick is expected to be off, even if there are
2 or more SCHED_FIFO tasks on that runqueue, right?

- Could we rename NO_HZ_EXTENDED to NO_HZ_FULL? :-) I think it's important to
stress that in this mode the kernel does whatever it can to keep the tick off:


CONFIG_HZ_PERIODIC # (no dynticks, periodic ticks)
CONFIG_NO_HZ_IDLE # (partial dynticks, tick is off in idle only)
CONFIG_NO_HZ_FULL # (full dynticks, tick is off whenever possible)

while 'extended' is too vague, it really does not tell us anything about how
it's meant to be 'extended'.

( And I'd also suggest renaming CONFIG_PERIODIC_HZ -> CONFIG_HZ_PERIODIC, to be
consistent with the NO_HZ_IDLE, NO_HZ_FULL Polish notation naming pattern. )

I'm so nitpicky because the Kconfig logic of major kernel features has the
potential to cause quite a bit of user and tester confusion, so we have to try to
do our utmost best to structure it in the most logical and approachable fashion.

Thanks,

Ingo

2013-04-10 13:47:36

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC GIT PULL] nohz: Kconfig layout improvements

2013/4/4 Christoph Lameter <[email protected]>:
> It seems that nohz still has no effect.
>
> 3.9-rc5 + patches. Affinity of init set to 0,1 so no
> tasks are running on 9. The "latencytest" used here is part of my
> lldiag-0.15 toolkit.
>
> First test without any special kernel parameters. nohz off right?
>
> $ nice -5 taskset -c 9 latencytest
>
> CPUs: Freq=2.90Ghz Processors=32 Cores=8 cacheline_size=64 Intel(R)
> Xeon(R) CPU E5-2690 0 @ 2.90GHz
> 16775106 samples below 1000 nsec
> 13 involuntary context switches
> 1019 (0.00607411%) variances in 10.00 seconds: minimum 1.07us maximum 12.32us average 3.30us stddev 0.63us
>
> HZ=100 so the 1019 variances are likely timer interrupts.
>
>
>
>
> After nohz setup
>
> /proc/cmdline:
>
> BOOT_IMAGE=/vmlinuz-3.9.0-rc5+ root=/dev/mapper/vg01-root ro console=tty0 console=ttyS0,115200 idle=mwait rcu_nocb_poll rcu_nocbs=2-31 nohz_extended=2-31
>
> $ nice -5 taskset -c 9 latencytest
> CPUs: Freq=2.90Ghz Processors=32 Cores=8 cacheline_size=64 Intel(R)
> Xeon(R) CPU E5-2690 0 @ 2.90GHz
> 16779362 samples below 1000 nsec
> 13 involuntary context switches
> 1037 (0.00617983%) variances in 10.00 seconds: minimum 1.00us maximum 10.61us average 3.30us stddev 0.98us
>
>
>
> If I move the RCU threads off the cpu then I get a slightly better result:
>
> $ nice -5 taskset -c 9 latencytest
> CPUs: Freq=2.90Ghz Processors=32 Cores=8 cacheline_size=64 Intel(R)
> Xeon(R) CPU E5-2690 0 @ 2.90GHz
> 16796039 samples below 1000 nsec
> 12 involuntary context switches
> 1020 (0.00607249%) variances in 10.00 seconds: minimum 1.00us maximum 11.58us average 2.77us stddev 0.55us
>
>
>
> Why is the tick not stopping? How do I diagnose this? (I can start
> patching the kernel again like last time but isnt there a better way?)

I don't know which tree you are using. But if you have that patch in:
http://git.kernel.org/cgit/linux/kernel/git/frederic/linux-dynticks.git/commit/?h=3.9-rc1-nohz1&id=451128553e5e827dccc6cbcd24238470ec693d90
looking at the traces on that CPU may give you a few hints. Then you
can dig deeper by looking at the sched_switch, timers, irq, ... events

2013-04-10 14:08:12

by Christopher Lameter

[permalink] [raw]
Subject: Re: [RFC GIT PULL] nohz: Kconfig layout improvements

On Wed, 10 Apr 2013, Frederic Weisbecker wrote:

> I don't know which tree you are using. But if you have that patch in:

This is tip timers/nohz

> http://git.kernel.org/cgit/linux/kernel/git/frederic/linux-dynticks.git/commit/?h=3.9-rc1-nohz1&id=451128553e5e827dccc6cbcd24238470ec693d90
> looking at the traces on that CPU may give you a few hints. Then you can
> dig deeper by looking at the sched_switch, timers, irq, ... events

The last time when I checked this before testing this rev it turned out
that nr_running was always > 1. Traces did only show that the timer
interrupt was still on. I added printks to figure out what was up at that
time which was a bit awkward since the kernel is a bit sensitive about
doing printks in these functions.

Sadly, I am suffering from manageritis these days and probably wont have
the time to get into this again for the next two weeks.


2013-04-10 16:01:07

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC GIT PULL] nohz: Kconfig layout improvements

2013/4/8 Ingo Molnar <[email protected]>:
> I pulled it into tip:timers/nohz and will push it out if it passes testing because
> I like the improvements - but there's still a few things missing I think.
>
> Firstly, I performed this "how are users exposed to this new feature" test:
>
> git checkout v3.9-rc6
> make defconfig
> git checkout tip/master
> make oldconfig
>
> the x86 (64-bit) defconfig has NO_HZ enabled. When I did the 'make oldconfig', I
> was given:
>
> *
> * Timers subsystem
> *
> Timer tick handling
> > 1. Periodic timer ticks (constant rate, no dynticks) (PERIODIC_HZ) (NEW)
> 2. Idle dynticks system (tickless idle) (NO_HZ_IDLE) (NEW)
> choice[1-2]:
>
> [ Firstly, while at it: that should be 'Timer subsystem' or 'Timers'. ]

Isn't "Timers" too general? I really don't mind changing that though.

>
> More importantly, the new Kconfig behavior is still not quite right for two
> reasons:
>
> 1)
>
> the default is not set to NO_HZ_IDLE. The oldconfig .config had
> CONFIG_NO_HZ set - this should be grandfathered over into the new config's
> default choice. That can probably be done by still keeping CONFIG_NO_HZ as
> a migration helper entry.

Ah I did keep it for backward compatibility. We default to
CONFIG_NO_HZ_IDLE if CONFIG_NO_HZ is set. This is just not working
because CONFIG_NO_HZ isn't visible. It's an arbirtrary Kconfig
limitation. I'll just make it visible by adding it a title text and
this will work.

> 2)
>
> there's still no extended idle tick option offered - due to it not meeting
> the CONFIG_VIRT_CPU_ACCOUNTING_GEN dependency.
>
> Even if I read the Kconfig rules and figure out the dependency, I have to
> perform _two_ steps to get extended ticks:
>
> I had to manually find CONFIG_VIRT_CPU_ACCOUNTING_GEN in the .config and
> delete it, so that I'm given the choice on the next 'make oldconfig'.
>
> Then NO_HZ_EXTENDED was set to disabled in the .config silently, because
> NO_HZ_IDLE was already set. So I had to delete that and re-configure it
> again.

Agreed. I mentioned that in the pull request. It's again due to an
arbitrary Kconfig limitation. The following:

config X
select Y

doesn't work if Y is part of a Kconfig "choice".
I have a patch that fixes in Kconfig, will submit it to Michal and fix
the nohz passive dependency once we get that sorted.


>
> Highly inconvenient.
>
> -----------------------
>
> Once the dependencies are right I like it how then we get offered the 3 variants:
>
> *
> * Timers subsystem
> *
> Timer tick handling
> > 1. Periodic timer ticks (constant rate, no dynticks) (PERIODIC_HZ) (NEW)
> 2. Idle dynticks system (tickless idle) (NO_HZ_IDLE) (NEW)
> 3. Full dynticks system (tickless single task) (NO_HZ_EXTENDED) (NEW)
>
> and I think that is how it should always look like, for standard .config's, pretty
> much regardless of how other things are configured - as long as the architecture
> supports extended dynticks.
>
> So I'd suggest the following changes to fix the remaining usability problems:
>
> - .config compatibility fix: the default selection should pick up existing
> CONFIG_NO_HZ settings, for a kernel release cycle, so that people whole just go
> through 'make oldconfig' and rely on the defaults don't lose CONFIG_NO_HZ_IDLE.
>
> This can probably be done by adding a CONFIG_NO_HZ Kconfig entry, and
> documenting it as a migration helper. This can then be removed in v3.11. The
> multiple choices entry can then use CONFIG_NO_HZ to set its default offering to
> CONFIG_NO_HZ_IDLE or CONFIG_PERIODIC_HZ?

Ok I just need to make that Kconfig symbol visible. I guess we can
indeed remove it in the future.

>
> - CONFIG_VIRT_CPU_ACCOUNTING_GEN should be selected as well. (Maybe even the RCU
> model.)

So, will deal with that in the Kconfig side :)

>
> - Nit: the 'depends on SMP' part looks a bit weird. Is that a quirk?

It's a dependency inherited from CONFIG_RCU_USER_QS. But it's also
necessary because we need a timekeeping CPU. I'll add a comment on
that.

>
> - Plus a minor help text nit: I'd not call it 'tickless single task' - but
> 'tickless'. When there are multiple tasks on a CPU then it's natural that
> there's a scheduler tick arbitrating between them - this can be mentioned in
> the help text, but otherwise should not distract from the 'full dynticks'
> notion.
>
> It's not even always about a single task: when there's multiple SCHED_FIFO
> tasks running, then the scheduler tick is expected to be off, even if there are
> 2 or more SCHED_FIFO tasks on that runqueue, right?

Yeah agreed.

>
> - Could we rename NO_HZ_EXTENDED to NO_HZ_FULL? :-) I think it's important to
> stress that in this mode the kernel does whatever it can to keep the tick off:
>
>
> CONFIG_HZ_PERIODIC # (no dynticks, periodic ticks)
> CONFIG_NO_HZ_IDLE # (partial dynticks, tick is off in idle only)
> CONFIG_NO_HZ_FULL # (full dynticks, tick is off whenever possible)
>
> while 'extended' is too vague, it really does not tell us anything about how
> it's meant to be 'extended'.

Sounds good :) I too prefer the "_FULL" based naming.

>
> ( And I'd also suggest renaming CONFIG_PERIODIC_HZ -> CONFIG_HZ_PERIODIC, to be
> consistent with the NO_HZ_IDLE, NO_HZ_FULL Polish notation naming pattern. )

Ok.

>
> I'm so nitpicky because the Kconfig logic of major kernel features has the
> potential to cause quite a bit of user and tester confusion, so we have to try to
> do our utmost best to structure it in the most logical and approachable fashion.

Agreed, I'm looking at these issues, thanks!

2013-04-10 17:24:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC GIT PULL] nohz: Kconfig layout improvements


* Frederic Weisbecker <[email protected]> wrote:

> 2013/4/8 Ingo Molnar <[email protected]>:
> > I pulled it into tip:timers/nohz and will push it out if it passes testing because
> > I like the improvements - but there's still a few things missing I think.
> >
> > Firstly, I performed this "how are users exposed to this new feature" test:
> >
> > git checkout v3.9-rc6
> > make defconfig
> > git checkout tip/master
> > make oldconfig
> >
> > the x86 (64-bit) defconfig has NO_HZ enabled. When I did the 'make oldconfig', I
> > was given:
> >
> > *
> > * Timers subsystem
> > *
> > Timer tick handling
> > > 1. Periodic timer ticks (constant rate, no dynticks) (PERIODIC_HZ) (NEW)
> > 2. Idle dynticks system (tickless idle) (NO_HZ_IDLE) (NEW)
> > choice[1-2]:
> >
> > [ Firstly, while at it: that should be 'Timer subsystem' or 'Timers'. ]
>
> Isn't "Timers" too general? I really don't mind changing that though.

I'd suggest consulting Thomas.

> > More importantly, the new Kconfig behavior is still not quite right for two
> > reasons:
> >
> > 1)
> >
> > the default is not set to NO_HZ_IDLE. The oldconfig .config had
> > CONFIG_NO_HZ set - this should be grandfathered over into the new config's
> > default choice. That can probably be done by still keeping CONFIG_NO_HZ as
> > a migration helper entry.
>
> Ah I did keep it for backward compatibility. We default to
> CONFIG_NO_HZ_IDLE if CONFIG_NO_HZ is set. This is just not working
> because CONFIG_NO_HZ isn't visible. It's an arbirtrary Kconfig
> limitation. I'll just make it visible by adding it a title text and
> this will work.

Okay, cool!

> > 2)
> >
> > there's still no extended idle tick option offered - due to it not meeting
> > the CONFIG_VIRT_CPU_ACCOUNTING_GEN dependency.
> >
> > Even if I read the Kconfig rules and figure out the dependency, I have to
> > perform _two_ steps to get extended ticks:
> >
> > I had to manually find CONFIG_VIRT_CPU_ACCOUNTING_GEN in the .config and
> > delete it, so that I'm given the choice on the next 'make oldconfig'.
> >
> > Then NO_HZ_EXTENDED was set to disabled in the .config silently, because
> > NO_HZ_IDLE was already set. So I had to delete that and re-configure it
> > again.
>
> Agreed. I mentioned that in the pull request. It's again due to an
> arbitrary Kconfig limitation. The following:
>
> config X
> select Y
>
> doesn't work if Y is part of a Kconfig "choice".
> I have a patch that fixes in Kconfig, will submit it to Michal and fix
> the nohz passive dependency once we get that sorted.

Wow, you are fixing kconfig.

99% of the people work it around in some fashion.

Kudos!

Thanks,

Ingo

2013-04-12 14:23:37

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC GIT PULL] nohz: Kconfig layout improvements

2013/4/10 Ingo Molnar <[email protected]>:
>> Agreed. I mentioned that in the pull request. It's again due to an
>> arbitrary Kconfig limitation. The following:
>>
>> config X
>> select Y
>>
>> doesn't work if Y is part of a Kconfig "choice".
>> I have a patch that fixes in Kconfig, will submit it to Michal and fix
>> the nohz passive dependency once we get that sorted.
>
> Wow, you are fixing kconfig.
>
> 99% of the people work it around in some fashion.
>
> Kudos!

Wait, I'm actually _trying_ to! I found a few bugs in my patch
already. Looks like it's not going to be a piece of cake.