2015-08-14 01:46:22

by Frederic Weisbecker

Subject: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

From: Vatika Harlalka <[email protected]>

The problem addressed in this patch is about affining unpinned timers.
Adaptive or Full Dynticks CPUs are currently disturbed by unnecessary
jitter due to firing of such timers on them.

This patch will affine timers to online CPUs which are not full dynticks
in NOHZ_FULL configured systems. It should not introduce overhead in
nohz full off case due to static keys.

Reviewed-by: Preeti U Murthy <[email protected]>
Signed-off-by: Vatika Harlalka <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Chris Metcalf <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/tick.h | 9 ++++++++-
kernel/sched/core.c | 7 +++++--
2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 3741ba1..51e6493 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -143,13 +143,20 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
 	if (tick_nohz_full_enabled())
 		cpumask_or(mask, mask, tick_nohz_full_mask);
 }
-
+static inline int housekeeping_any_cpu(void)
+{
+	return cpumask_any_and(housekeeping_mask, cpu_online_mask);
+}
 extern void __tick_nohz_full_check(void);
 extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_cpu(int cpu);
 extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(struct task_struct *tsk);
 #else
+static inline int housekeeping_any_cpu(void)
+{
+	return smp_processor_id();
+}
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9917c96..4fd42e4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -623,18 +623,21 @@ int get_nohz_timer_target(void)
 	int i, cpu = smp_processor_id();
 	struct sched_domain *sd;

-	if (!idle_cpu(cpu))
+	if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu))
 		return cpu;

 	rcu_read_lock();
 	for_each_domain(cpu, sd) {
 		for_each_cpu(i, sched_domain_span(sd)) {
-			if (!idle_cpu(i)) {
+			if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) {
 				cpu = i;
 				goto unlock;
 			}
 		}
 	}
+
+	if (!is_housekeeping_cpu(cpu))
+		cpu = housekeeping_any_cpu();
 unlock:
 	rcu_read_unlock();
 	return cpu;
--
2.1.4
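
For readers following the selection order in the patched get_nohz_timer_target(), here is a minimal user-space sketch of the three steps: stay local if the current CPU is a busy housekeeper, otherwise take the first busy housekeeper in the domain span, otherwise fall back to any online housekeeper. Everything in it is an assumption made for illustration: CPU sets are plain bitmasks, a single span stands in for the sched-domain walk, and the names pick_timer_target(), test_cpu() and first_cpu() are hypothetical, not kernel API. Note that the patch as posted passes cpu rather than the loop variable i to is_housekeeping_cpu() inside the loop; the sketch tests the candidate i, which is assumed to be the intent.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8	/* hypothetical small machine */

/* Bit n set means CPU n is in the set. */
static bool test_cpu(int cpu, unsigned int mask)
{
	return (mask >> cpu) & 1u;
}

/* Lowest CPU in the set, or NR_CPUS when the set is empty. */
static int first_cpu(unsigned int mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (test_cpu(cpu, mask))
			return cpu;
	return NR_CPUS;
}

/* Model of the target choice for an unpinned timer queued on this_cpu. */
static int pick_timer_target(int this_cpu, unsigned int span,
			     unsigned int idle, unsigned int housekeeping,
			     unsigned int online)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);

	/* Step 1: a busy housekeeper keeps the timer local. */
	if (!test_cpu(this_cpu, idle) && test_cpu(this_cpu, housekeeping))
		return this_cpu;

	/* Step 2: first busy housekeeper within the span. */
	for (int i = 0; i < NR_CPUS; i++) {
		if (!test_cpu(i, span))
			continue;
		if (!test_cpu(i, idle) && test_cpu(i, housekeeping))
			return i;
	}

	/* Step 3: never leave the timer on a non-housekeeper. */
	if (!test_cpu(this_cpu, housekeeping))
		return first_cpu(housekeeping & online);
	return this_cpu;
}
```

With CPUs 0 and 1 as housekeepers, a timer queued from the full dynticks CPU 5 lands on CPU 0 whether or not the housekeepers are idle, while an idle housekeeper with no busy peer keeps its own timer.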


2015-08-22 21:09:42

by Frederic Weisbecker

Subject: Ping: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers


Ping!

On Fri, Aug 14, 2015 at 03:46:14AM +0200, Frederic Weisbecker wrote:
> From: Vatika Harlalka <[email protected]>
>
> The problem addressed in this patch is about affining unpinned timers.
> Adaptive or Full Dynticks CPUs are currently disturbed by unnecessary
> jitter due to firing of such timers on them.
>
> This patch will affine timers to online CPUs which are not full dynticks
> in NOHZ_FULL configured systems. It should not introduce overhead in
> nohz full off case due to static keys.
>
> Reviewed-by: Preeti U Murthy <[email protected]>
> Signed-off-by: Vatika Harlalka <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> Cc: Chris Metcalf <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> include/linux/tick.h | 9 ++++++++-
> kernel/sched/core.c | 7 +++++--
> 2 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 3741ba1..51e6493 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -143,13 +143,20 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
> if (tick_nohz_full_enabled())
> cpumask_or(mask, mask, tick_nohz_full_mask);
> }
> -
> +static inline int housekeeping_any_cpu(void)
> +{
> + return cpumask_any_and(housekeeping_mask, cpu_online_mask);
> +}
> extern void __tick_nohz_full_check(void);
> extern void tick_nohz_full_kick(void);
> extern void tick_nohz_full_kick_cpu(int cpu);
> extern void tick_nohz_full_kick_all(void);
> extern void __tick_nohz_task_switch(struct task_struct *tsk);
> #else
> +static inline int housekeeping_any_cpu(void)
> +{
> + return smp_processor_id();
> +}
> static inline bool tick_nohz_full_enabled(void) { return false; }
> static inline bool tick_nohz_full_cpu(int cpu) { return false; }
> static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9917c96..4fd42e4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -623,18 +623,21 @@ int get_nohz_timer_target(void)
> int i, cpu = smp_processor_id();
> struct sched_domain *sd;
>
> - if (!idle_cpu(cpu))
> + if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu))
> return cpu;
>
> rcu_read_lock();
> for_each_domain(cpu, sd) {
> for_each_cpu(i, sched_domain_span(sd)) {
> - if (!idle_cpu(i)) {
> + if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) {
> cpu = i;
> goto unlock;
> }
> }
> }
> +
> + if (!is_housekeeping_cpu(cpu))
> + cpu = housekeeping_any_cpu();
> unlock:
> rcu_read_unlock();
> return cpu;
> --
> 2.1.4
>

Subject: Re: Ping: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Sat, 22 Aug 2015, Frederic Weisbecker wrote:

> Ping!

Excellent!!!

2015-08-23 05:40:39

by Ingo Molnar

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers


* Frederic Weisbecker <[email protected]> wrote:

> From: Vatika Harlalka <[email protected]>
>
> The problem addressed in this patch is about affining unpinned timers.
> Adaptive or Full Dynticks CPUs are currently disturbed by unnecessary
> jitter due to firing of such timers on them.
>
> This patch will affine timers to online CPUs which are not full dynticks
> in NOHZ_FULL configured systems. It should not introduce overhead in
> nohz full off case due to static keys.
>
> Reviewed-by: Preeti U Murthy <[email protected]>
> Signed-off-by: Vatika Harlalka <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> Cc: Chris Metcalf <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> include/linux/tick.h | 9 ++++++++-
> kernel/sched/core.c | 7 +++++--
> 2 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 3741ba1..51e6493 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -143,13 +143,20 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
> if (tick_nohz_full_enabled())
> cpumask_or(mask, mask, tick_nohz_full_mask);
> }
> -
> +static inline int housekeeping_any_cpu(void)
> +{
> + return cpumask_any_and(housekeeping_mask, cpu_online_mask);
> +}
> extern void __tick_nohz_full_check(void);
> extern void tick_nohz_full_kick(void);
> extern void tick_nohz_full_kick_cpu(int cpu);
> extern void tick_nohz_full_kick_all(void);
> extern void __tick_nohz_task_switch(struct task_struct *tsk);
> #else
> +static inline int housekeeping_any_cpu(void)
> +{
> + return smp_processor_id();
> +}
> static inline bool tick_nohz_full_enabled(void) { return false; }
> static inline bool tick_nohz_full_cpu(int cpu) { return false; }
> static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9917c96..4fd42e4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -623,18 +623,21 @@ int get_nohz_timer_target(void)
> int i, cpu = smp_processor_id();
> struct sched_domain *sd;
>
> - if (!idle_cpu(cpu))
> + if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu))
> return cpu;
>
> rcu_read_lock();
> for_each_domain(cpu, sd) {
> for_each_cpu(i, sched_domain_span(sd)) {
> - if (!idle_cpu(i)) {
> + if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) {
> cpu = i;
> goto unlock;
> }
> }
> }
> +
> + if (!is_housekeeping_cpu(cpu))
> + cpu = housekeeping_any_cpu();
> unlock:
> rcu_read_unlock();
> return cpu;

So I almost applied this yesterday, but had the following question: what ensures
that housekeeping_mask isn't empty? If it's empty then housekeeping_any_cpu()
returns cpumask_any_and() of an empty cpumask - which returns an out of range
index AFAICS - which will crash and burn in:

kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target());
kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target());

housekeeping_mask itself is derived from tick_nohz_full_mask (it's the inverse of
it in essence), and tick_nohz_full_mask is set via two methods, either via a boot
parameter:

if (cpulist_parse(str, tick_nohz_full_mask) < 0) {

in tick_nohz_full_setup(). What ensures here that tick_nohz_full_mask is not
completely full - making housekeeping_mask empty?

The other method is via CONFIG_NO_HZ_FULL_ALL:

cpumask_setall(tick_nohz_full_mask);

here it's fully set - triggering the bug I'm worried about. So what am I missing,
what prevents CONFIG_NO_HZ_FULL_ALL from crashing?

Thanks,

Ingo
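
The hazard raised above can be reproduced in a small user-space model. This is only a sketch under stated assumptions: masks are plain bitmasks, any_and() is a hypothetical stand-in that follows the kernel convention of returning a value >= the number of CPUs when the intersection is empty, and timer_base[] stands in for a per-CPU array. The assert in base_for() marks the spot where an unchecked out-of-range index would otherwise be used.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count */

/* Stand-in for a per-CPU array such as the timer bases. */
static int timer_base[NR_CPUS];

/* Lowest CPU present in both sets, or NR_CPUS when there is none. */
static int any_and(unsigned int a, unsigned int b)
{
	unsigned int both = a & b;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (both & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/* True only for indices that may be used on a per-CPU array. */
static bool cpu_index_valid(int cpu)
{
	return cpu >= 0 && cpu < NR_CPUS;
}

/* Per-CPU lookup that refuses an out-of-range index. */
static int *base_for(int cpu)
{
	assert(cpu_index_valid(cpu));
	return &timer_base[cpu];
}
```

An empty housekeeping set makes any_and() return NR_CPUS, which cpu_index_valid() rejects; that is the case the question is about.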

2015-08-23 16:01:12

by Paul E. McKenney

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Sun, Aug 23, 2015 at 07:40:32AM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
> > From: Vatika Harlalka <[email protected]>
> >
> > The problem addressed in this patch is about affining unpinned timers.
> > Adaptive or Full Dynticks CPUs are currently disturbed by unnecessary
> > jitter due to firing of such timers on them.
> >
> > This patch will affine timers to online CPUs which are not full dynticks
> > in NOHZ_FULL configured systems. It should not introduce overhead in
> > nohz full off case due to static keys.
> >
> > Reviewed-by: Preeti U Murthy <[email protected]>
> > Signed-off-by: Vatika Harlalka <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Christoph Lameter <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Paul E. McKenney <[email protected]>
> > Cc: Chris Metcalf <[email protected]>
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > ---
> > include/linux/tick.h | 9 ++++++++-
> > kernel/sched/core.c | 7 +++++--
> > 2 files changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index 3741ba1..51e6493 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -143,13 +143,20 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
> > if (tick_nohz_full_enabled())
> > cpumask_or(mask, mask, tick_nohz_full_mask);
> > }
> > -
> > +static inline int housekeeping_any_cpu(void)
> > +{
> > + return cpumask_any_and(housekeeping_mask, cpu_online_mask);
> > +}
> > extern void __tick_nohz_full_check(void);
> > extern void tick_nohz_full_kick(void);
> > extern void tick_nohz_full_kick_cpu(int cpu);
> > extern void tick_nohz_full_kick_all(void);
> > extern void __tick_nohz_task_switch(struct task_struct *tsk);
> > #else
> > +static inline int housekeeping_any_cpu(void)
> > +{
> > + return smp_processor_id();
> > +}
> > static inline bool tick_nohz_full_enabled(void) { return false; }
> > static inline bool tick_nohz_full_cpu(int cpu) { return false; }
> > static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 9917c96..4fd42e4 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -623,18 +623,21 @@ int get_nohz_timer_target(void)
> > int i, cpu = smp_processor_id();
> > struct sched_domain *sd;
> >
> > - if (!idle_cpu(cpu))
> > + if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu))
> > return cpu;
> >
> > rcu_read_lock();
> > for_each_domain(cpu, sd) {
> > for_each_cpu(i, sched_domain_span(sd)) {
> > - if (!idle_cpu(i)) {
> > + if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) {
> > cpu = i;
> > goto unlock;
> > }
> > }
> > }
> > +
> > + if (!is_housekeeping_cpu(cpu))
> > + cpu = housekeeping_any_cpu();
> > unlock:
> > rcu_read_unlock();
> > return cpu;
>
> So I almost applied this yesterday, but had the following question: what ensures
> that housekeeping_mask isn't empty? If it's empty then housekeeping_any_cpu()
> returns cpumask_any_and() of an empty cpumask - which returns an out of range
> index AFAICS - which will crash and burn in:
>
> kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target());
> kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target());
>
> housekeeping_mask itself is derived from tick_nohz_full_mask (it's the inverse of
> it in essence), and tick_nohz_full_mask is set via two methods, either via a boot
> parameter:
>
> if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
>
> in tick_nohz_full_setup(). What ensures here that tick_nohz_full_mask is not
> completely full - making housekeeping_mask empty?
>
> The other method is via CONFIG_NO_HZ_FULL_ALL:
>
> cpumask_setall(tick_nohz_full_mask);
>
> here it's fully set - triggering the bug I'm worried about. So what am I missing,
> what prevents CONFIG_NO_HZ_FULL_ALL from crashing?

The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(),
which is called from tick_init() which is called from start_kernel()
shortly after rcu_init():

	cpu = smp_processor_id();

	if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
		pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
	}

This happens after the call to tick_nohz_init_all() that does the
cpumask_setall() that you called out above.

Or is a recent patch that I missed changing this?

Thanx, Paul
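
The ordering described above (fill the full dynticks mask, then clear the boot CPU from it) is what keeps the housekeeping set non-empty. A minimal sketch, assuming housekeeping is simply the complement of the full dynticks set within a fixed set of CPUs; struct nohz_masks and init_masks() are hypothetical names, not the kernel's:

```c
#include <assert.h>

#define NR_CPUS 8			/* hypothetical CPU count */
#define ALL_CPUS ((1u << NR_CPUS) - 1u)

struct nohz_masks {
	unsigned int full;		/* full dynticks CPUs */
	unsigned int housekeeping;	/* everyone else */
};

/* Build both masks from the requested full dynticks set. */
static struct nohz_masks init_masks(unsigned int requested_full, int boot_cpu)
{
	struct nohz_masks m;

	assert(boot_cpu >= 0 && boot_cpu < NR_CPUS);

	m.full = requested_full & ALL_CPUS;
	/* The boot CPU is always pulled out, even if every CPU was requested. */
	m.full &= ~(1u << boot_cpu);
	/* Housekeepers are the CPUs left over. */
	m.housekeeping = ALL_CPUS & ~m.full;
	return m;
}
```

Even when all CPUs are requested, as in the NO_HZ_FULL_ALL case, the boot CPU bit survives in housekeeping.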

2015-08-24 01:29:00

by Frederic Weisbecker

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Sun, Aug 23, 2015 at 09:01:01AM -0700, Paul E. McKenney wrote:
> The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(),
> which is called from tick_init() which is called from start_kernel()
> shortly after rcu_init():
>
> cpu = smp_processor_id();
>
> if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
> pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
> cpumask_clear_cpu(cpu, tick_nohz_full_mask);
> }
>
> This happens after the call to tick_nohz_init_all() that does the
> cpumask_setall() that you called out above.
>
> Or is a recent patch that I missed changing this?

Exactly, this happens right after tick_nohz_full_mask is filled and
makes sure that at least the boot CPU runs as a housekeeper. We
also make sure that it can't go offline later.

Thanks.

2015-08-24 01:45:37

by Frederic Weisbecker

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Sun, Aug 23, 2015 at 07:40:32AM +0200, Ingo Molnar wrote:
> So I almost applied this yesterday, but had the following question: what ensures
> that housekeeping_mask isn't empty? If it's empty then housekeeping_any_cpu()
> returns cpumask_any_and() of an empty cpumask - which returns an out of range
> index AFAICS - which will crash and burn in:
>
> kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target());
> kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target());
>
> housekeeping_mask itself is derived from tick_nohz_full_mask (it's the inverse of
> it in essence), and tick_nohz_full_mask is set via two methods, either via a boot
> parameter:
>
> if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
>
> in tick_nohz_full_setup(). What ensures here that tick_nohz_full_mask is not
> completely full - making housekeeping_mask empty?
>
> The other method is via CONFIG_NO_HZ_FULL_ALL:
>
> cpumask_setall(tick_nohz_full_mask);
>
> here it's fully set - triggering the bug I'm worried about. So what am I missing,
> what prevents CONFIG_NO_HZ_FULL_ALL from crashing?

Legitimate worry and I should have explained that in the changelog.

Like Paul replied, we make sure that at least the boot CPU is excluded
from tick_nohz_full_mask in tick_nohz_init(). Then housekeeping_mask,
by reverse effect, contains that boot CPU at least.

And we also make sure that the boot CPU can't go offline
(tick_nohz_cpu_down_callback()).

Now we should really document and check that assumption so here is a
second patch below. The sched patch depends on tip:sched/core (to avoid
conflicts with sched changes) and the following one is based on
tip:timer/nohz but should be applicable to sched/core without conflict.
Both are standalone anyway.

Thanks!

---
From: Frederic Weisbecker <[email protected]>
Date: Sun, 23 Aug 2015 19:34:31 +0200
Subject: [PATCH] nohz: Assert existing housekeepers when nohz full enabled

The code ensures that at least the boot CPU serves as a housekeeper.

Let's assert this assumption to make sure that we have CPUs to handle
unbound jobs like workqueues and timers while nohz full CPUs run
undisturbed.

Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/tick-sched.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3319e16..cc9884f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -370,6 +370,12 @@ void __init tick_nohz_init(void)
 	cpu_notifier(tick_nohz_cpu_down_callback, 0);
 	pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
 		cpumask_pr_args(tick_nohz_full_mask));
+
+	/*
+	 * We need at least one CPU to handle housekeeping work such
+	 * as timekeeping, unbound timers, workqueues, ...
+	 */
+	WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
 }
 #endif

--
2.1.4

2015-08-24 06:44:18

by Ingo Molnar

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers


* Paul E. McKenney <[email protected]> wrote:

> > here it's fully set - triggering the bug I'm worried about. So what am I
> > missing, what prevents CONFIG_NO_HZ_FULL_ALL from crashing?
>
> The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(), which is
> called from tick_init() which is called from start_kernel() shortly after
> rcu_init():
>
> cpu = smp_processor_id();
>
> if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
> pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
> cpumask_clear_cpu(cpu, tick_nohz_full_mask);
> }
>
> This happens after the call to tick_nohz_init_all() that does the
> cpumask_setall() that you called out above.

Ah, indeed - I somehow missed that.

This brings up two other questions:

1)

the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to
it (housekeeping_mask - a variable mask to begin with) and recover the
housekeeping CPU via:

+ return cpumask_any_and(housekeeping_mask, cpu_online_mask);

which can be pretty expensive, and which gets executed in two hotpaths:

kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target());
kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target());

... why not just use a single housekeeping_cpu which would be way faster to pass
down to the timer code?

2)

What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y)

I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU
is offlined.

Thanks,

Ingo

2015-08-24 07:23:25

by Mike Galbraith

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Mon, 2015-08-24 at 08:44 +0200, Ingo Molnar wrote:

> the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to
> it (housekeeping_mask - a variable mask to begin with) and recover the
> housekeeping CPU via:
>
> + return cpumask_any_and(housekeeping_mask, cpu_online_mask);

There can be, and had better be if box is big, multiple housekeepers.

Imagine a NO_HZ_FULL_ALL kernel on an SGI beast from hell. Offloading
8191 CPUs onto poor little CPU0 probably wouldn't work out well :)

-Mike

2015-08-24 07:41:47

by Ingo Molnar

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers


* Mike Galbraith <[email protected]> wrote:

> On Mon, 2015-08-24 at 08:44 +0200, Ingo Molnar wrote:
>
> > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to
> > it (housekeeping_mask - a variable mask to begin with) and recover the
> > housekeeping CPU via:
> >
> > + return cpumask_any_and(housekeeping_mask, cpu_online_mask);
>
> There can be, and had better be if box is big, multiple housekeepers.

Yes - but that does not seem to be possible via the code right now AFAICS, so at
minimum it's incomplete.

Thanks,

Ingo

2015-08-24 07:54:53

by Mike Galbraith

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Mon, 2015-08-24 at 09:41 +0200, Ingo Molnar wrote:
> * Mike Galbraith <[email protected]> wrote:
>
> > On Mon, 2015-08-24 at 08:44 +0200, Ingo Molnar wrote:
> >
> > > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to
> > > it (housekeeping_mask - a variable mask to begin with) and recover the
> > > housekeeping CPU via:
> > >
> > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask);
> >
> > There can be, and had better be if box is big, multiple housekeepers.
>
> Yes - but that does not seem to be possible via the code right now AFAICS, so at
> minimum it's incomplete.

In master housekeepers are set up in tick_nohz_init(). Everybody who's
not a nohz_full CPU is a housekeeper.

-Mike

2015-08-24 08:00:20

by Ingo Molnar

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers


* Mike Galbraith <[email protected]> wrote:

> On Mon, 2015-08-24 at 09:41 +0200, Ingo Molnar wrote:
> > * Mike Galbraith <[email protected]> wrote:
> >
> > > On Mon, 2015-08-24 at 08:44 +0200, Ingo Molnar wrote:
> > >
> > > > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to
> > > > it (housekeeping_mask - a variable mask to begin with) and recover the
> > > > housekeeping CPU via:
> > > >
> > > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask);
> > >
> > > There can be, and had better be if box is big, multiple housekeepers.
> >
> > Yes - but that does not seem to be possible via the code right now AFAICS, so
> > at minimum it's incomplete.
>
> In master housekeepers are set up in tick_nohz_init(). Everybody who's not a
> nohz_full CPU is a housekeeper.

Ah, indeed, I missed the nohz_full= boot option:

	static int __init tick_nohz_full_setup(char *str)
	{
		alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
		if (cpulist_parse(str, tick_nohz_full_mask) < 0) {


Ok, first question is resolved - but the second one, behavior on boot CPU unplug
still holds.

Thanks,

Ingo

2015-08-24 13:36:08

by Frederic Weisbecker

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote:
> 2)
>
> What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y)
>
> I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU
> is offlined.

We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which
is the boot CPU in nohz full, never gets offlined.

>
> Thanks,
>
> Ingo

2015-08-24 13:50:29

by Paul E. McKenney

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <[email protected]> wrote:
>
> > > here it's fully set - triggering the bug I'm worried about. So what am I
> > > missing, what prevents CONFIG_NO_HZ_FULL_ALL from crashing?
> >
> > The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(), which is
> > called from tick_init() which is called from start_kernel() shortly after
> > rcu_init():
> >
> > cpu = smp_processor_id();
> >
> > if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
> > pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
> > cpumask_clear_cpu(cpu, tick_nohz_full_mask);
> > }
> >
> > This happens after the call to tick_nohz_init_all() that does the
> > cpumask_setall() that you called out above.
>
> Ah, indeed - I somehow missed that.
>
> This brings up two other questions:
>
> 1)
>
> the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to
> it (housekeeping_mask - a variable mask to begin with) and recover the
> housekeeping CPU via:
>
> + return cpumask_any_and(housekeeping_mask, cpu_online_mask);
>
> which can be pretty expensive, and which gets executed in two hotpaths:
>
> kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target());
> kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target());
>
> ... why not just use a single housekeeping_cpu which would be way faster to pass
> down to the timer code?

The housekeeping_cpu came later, but that does seem like a good optimization.

> 2)
>
> What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y)
>
> I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU
> is offlined.

The tick_nohz_cpu_down_callback() function does this, though in a less
than obvious way. The tick_do_timer_cpu variable is the housekeeping
CPU that is currently handling timing, and it is not permitted to go
offline.

Thanx, Paul
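
The veto described above can be pictured with a small model: an offline request is refused when it targets the CPU currently holding the timekeeping duty and honoured otherwise. This is a sketch only; offline_cpu() is a hypothetical pure function over a bitmask of online CPUs and does not reflect the real notifier interface.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count */

/*
 * Return the online set after a request to offline 'cpu'.
 * The set comes back unchanged when the request is vetoed
 * because 'cpu' is the current timekeeper.
 */
static unsigned int offline_cpu(unsigned int online, int timekeeper, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (cpu == timekeeper)
		return online;		/* veto: the timekeeper must stay up */
	return online & ~(1u << cpu);	/* any other CPU may go offline */
}
```

In this model the online set always retains the timekeeper bit, which is why the housekeeping set never loses its last member.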

2015-08-24 14:01:24

by Mike Galbraith

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Mon, 2015-08-24 at 15:36 +0200, Frederic Weisbecker wrote:
> On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote:
> > 2)
> >
> > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y)
> >
> > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU
> > is offlined.
>
> We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which
> is the boot CPU in nohz full, never gets offlined.

Aha.. and it works fine too. (CPU0 told me to go away earlier;)

-Mike

2015-08-24 14:04:41

by Frederic Weisbecker

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Mon, Aug 24, 2015 at 06:50:18AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote:
> >
> > * Paul E. McKenney <[email protected]> wrote:
> >
> > > > here it's fully set - triggering the bug I'm worried about. So what am I
> > > > missing, what prevents CONFIG_NO_HZ_FULL_ALL from crashing?
> > >
> > > The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(), which is
> > > called from tick_init() which is called from start_kernel() shortly after
> > > rcu_init():
> > >
> > > cpu = smp_processor_id();
> > >
> > > if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
> > > pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
> > > cpumask_clear_cpu(cpu, tick_nohz_full_mask);
> > > }
> > >
> > > This happens after the call to tick_nohz_init_all() that does the
> > > cpumask_setall() that you called out above.
> >
> > Ah, indeed - I somehow missed that.
> >
> > This brings up two other questions:
> >
> > 1)
> >
> > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to
> > it (housekeeping_mask - a variable mask to begin with) and recover the
> > housekeeping CPU via:
> >
> > + return cpumask_any_and(housekeeping_mask, cpu_online_mask);
> >
> > which can be pretty expensive, and which gets executed in two hotpaths:
> >
> > kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target());
> > kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target());
> >
> > ... why not just use a single housekeeping_cpu which would be way faster to pass
> > down to the timer code?
>
> The housekeeping_cpu came later, but that does seem like a good optimization.

Well nohz full is likely to be used for HPC and that can involve big machines.
Having the housekeeping duty spread per node is a likely future evolution there,
if it isn't already used that way.

So we need to keep it a cpumask.
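
To illustrate why a mask is more flexible than a single CPU number here, the following is a purely hypothetical sketch of a per-node lookup: prefer an online housekeeper on the caller's node and fall back to any online housekeeper. The fixed CPUS_PER_NODE topology and the names node_mask() and housekeeper_near() are assumptions for the example; nothing like this is part of the patch.

```c
#include <assert.h>

#define NR_CPUS 8		/* hypothetical CPU count */
#define CPUS_PER_NODE 4		/* hypothetical topology: 2 nodes of 4 CPUs */

/* Lowest CPU in the set, or NR_CPUS when the set is empty. */
static int first_cpu(unsigned int mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/* Set of CPUs sharing a node with 'cpu'. */
static unsigned int node_mask(int cpu)
{
	int node = cpu / CPUS_PER_NODE;

	return ((1u << CPUS_PER_NODE) - 1u) << (node * CPUS_PER_NODE);
}

/* Prefer a housekeeper on the same node, else take any online one. */
static int housekeeper_near(int cpu, unsigned int housekeeping,
			    unsigned int online)
{
	unsigned int candidates = housekeeping & online;
	int local;

	assert(cpu >= 0 && cpu < NR_CPUS);

	local = first_cpu(candidates & node_mask(cpu));
	return local < NR_CPUS ? local : first_cpu(candidates);
}
```

With one housekeeper per node (CPUs 0 and 4), CPU 6 is served by CPU 4 and only falls back to CPU 0 if CPU 4 is offline; a single housekeeping_cpu variable could not express that.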

>
> > 2)
> >
> > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y)
> >
> > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU
> > is offlined.
>
> The tick_nohz_cpu_down_callback() function does this, though in a less
> than obvious way. The tick_do_timer_cpu variable is the housekeeping
> CPU that is currently handling timing, and it is not permitted to go
> offline.

Indeed, more specifically tick-common.c makes sure to set the timekeeping
duty to a housekeeper and that housekeeper is always the boot CPU due to
early device initialization.

But I should find a way to simplify that code and make it obvious it's always
set to the boot CPU.

2015-08-24 15:45:20

by Paul E. McKenney

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Mon, Aug 24, 2015 at 04:04:37PM +0200, Frederic Weisbecker wrote:
> On Mon, Aug 24, 2015 at 06:50:18AM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote:
> > >
> > > * Paul E. McKenney <[email protected]> wrote:
> > >
> > > > > here it's fully set - triggering the bug I'm worried about. So what am I
> > > > > missing, what prevents CONFIG_NO_HZ_FULL_ALL from crashing?
> > > >
> > > > The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(), which is
> > > > called from tick_init() which is called from start_kernel() shortly after
> > > > rcu_init():
> > > >
> > > > cpu = smp_processor_id();
> > > >
> > > > if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
> > > > pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
> > > > cpumask_clear_cpu(cpu, tick_nohz_full_mask);
> > > > }
> > > >
> > > > This happens after the call to tick_nohz_init_all() that does the
> > > > cpumask_setall() that you called out above.
> > >
> > > Ah, indeed - I somehow missed that.
> > >
> > > This brings up two other questions:
> > >
> > > 1)
> > >
> > > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to
> > > it (housekeeping_mask - a variable mask to begin with) and recover the
> > > housekeeping CPU via:
> > >
> > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask);
> > >
> > > which can be pretty expensive, and which gets executed in two hotpaths:
> > >
> > > kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target());
> > > kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target());
> > >
> > > ... why not just use a single housekeeping_cpu which would be way faster to pass
> > > down to the timer code?
> >
> > The housekeeping_cpu came later, but that does seem like a good optimization.
>
> Well nohz full is likely to be used for HPC and that can involve big machines.
> Having the housekeeping duty spread per node is a likely future evolution there,
> if it isn't already used that way.
>
> So we need to keep it a cpumask.

Fair point!

Thanx, Paul

> > > 2)
> > >
> > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y)
> > >
> > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU
> > > is offlined.
> >
> > The tick_nohz_cpu_down_callback() function does this, though in a less
> > than obvious way. The tick_do_timer_cpu variable is the housekeeping
> > CPU that is currently handling timing, and it is not permitted to go
> > offline.
>
> Indeed, more specifically tick-common.c makes sure to set the timekeeping
> duty to a housekeeper and that housekeeper is always the boot CPU due to
> early device initialization.
>
> But I should find a way to simplify that code and make it obvious it's always
> set to the boot CPU.
>
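
To make the cpumask argument concrete, here is a minimal userspace sketch of what housekeeping_any_cpu() computes. It is not kernel code: the model_* names, the 64-bit mask representation and MODEL_NR_CPUS are assumptions made for illustration, and model_cpumask_any_and() only mimics the kernel helper's "return a value >= the CPU count when the intersection is empty" convention.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64   /* assumption: one bit per CPU in a uint64_t */

/* Hypothetical helper: the mask bit for one CPU number. */
static uint64_t model_cpu_bit(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return UINT64_C(1) << cpu;
}

/*
 * Stand-in for cpumask_any_and(): lowest CPU present in both masks,
 * or MODEL_NR_CPUS when the two masks do not intersect.
 */
static int model_cpumask_any_and(uint64_t a, uint64_t b)
{
    uint64_t both = a & b;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (both & model_cpu_bit(cpu))
            return cpu;
    return MODEL_NR_CPUS;
}

/* Mirrors the patch: any housekeeper that is currently online. */
static int model_housekeeping_any_cpu(uint64_t housekeeping, uint64_t online)
{
    return model_cpumask_any_and(housekeeping, online);
}
```

With housekeepers on CPUs 0 and 8 (say, one per node), the lookup falls back to CPU 8 as soon as CPU 0 leaves the online mask, which a single housekeeping_cpu variable could not express.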

2015-08-25 08:29:12

by Ingo Molnar

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers


* Frederic Weisbecker <[email protected]> wrote:

> On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote:
> > 2)
> >
> > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y)
> >
> > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU
> > is offlined.
>
> We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which
> is the boot CPU in nohz full, never gets offlined.

That solution really sucks - it essentially regresses a feature the user
explicitly asked for! I also see no way for the user to migrate the timekeeping
functionality over to another CPU without rebooting.

If this is the last timekeeping CPU then it should migrate the timekeeping
functionality to another CPU, and perhaps printk a warning if all other CPUs are
nohz-full and we have to mark one of them as the timekeeper.

Also, the nohz-full and timekeeper functionality should not be a boot parameter
only thing, but should be runtime configurable.

Thanks,

Ingo
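
For readers following along, the behaviour being criticised can be modelled roughly as below. This is a userspace sketch under stated assumptions, not the kernel's tick_nohz_cpu_down_callback(): the struct, the MODEL_NOTIFY_* values and the function name are hypothetical, and the only logic taken from the thread is that the timekeeper is refused offlining while nohz full is active.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical verdicts, loosely modelled on notifier return codes. */
#define MODEL_NOTIFY_OK  0
#define MODEL_NOTIFY_BAD 1

struct model_tick_state {
    bool nohz_full_running;  /* assumption: nohz full was enabled at boot */
    int  timekeeper_cpu;     /* models tick_do_timer_cpu */
};

/*
 * Called when 'cpu' is about to go offline. The request is vetoed only
 * when nohz full is active and 'cpu' currently holds the timekeeping duty.
 */
static int model_cpu_down_prepare(const struct model_tick_state *st, int cpu)
{
    assert(cpu >= 0);
    if (st->nohz_full_running && st->timekeeper_cpu == cpu)
        return MODEL_NOTIFY_BAD;
    return MODEL_NOTIFY_OK;
}
```

In this model an offline request for CPU 0 simply fails whenever CPU 0 is the timekeeper, which is the user-visible regression described above.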

2015-08-25 13:46:04

by Frederic Weisbecker

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Tue, Aug 25, 2015 at 10:29:04AM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
> > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote:
> > > 2)
> > >
> > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y)
> > >
> > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU
> > > is offlined.
> >
> > We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which
> > is the boot CPU in nohz full, never gets offlined.
>
> That solution really sucks - it essentially regresses a feature the user
> explicitly asked for! I also see no way for the user to migrate the timekeeping
> functionality over to another CPU without rebooting.
>
> If this is the last timekeeping CPU then it should migrate the timekeeping
> functionality to another CPU, and perhaps printk a warning if all other CPUs are
> nohz-full and we have to mark one of them as the timekeeper.
>
> Also, the nohz-full and timekeeper functionality should not be a boot parameter
> only thing, but should be runtime configurable.

When I tried to allow moving the timekeeping duty over all housekeeping CPUs, Thomas got
angry because it broke the KISS current nohz full code. Indeed, there must be at least
one running all the time on behalf of nohz full CPUs that can run anytime. Thus balancing
the timekeeping duty over housekeepers is a bit more complicated than in normal
configurations.

Now surely we can do that using an IPI from CPU_DOWN_PREPARE to a housekeeper if
any remains or to a nohz full one. Then we must make sure the new timekeeper never
goes to idle.

But nohz_full is a corner usecase and I'm not sure it's worth the complexity. If a
nohz full user came and complained about CPU0 hotplug not working, I would definitely
retry it but I haven't heard about that yet. Besides, hotplug is very isolation-unfriendly
in general due to stop machine.
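
The selection policy behind such a handoff could look roughly like the sketch below. It is a userspace model only: every model_* name and the 64-bit mask layout are assumptions, and it covers just the choice of the new timekeeper (a housekeeper if any remains, else a nohz full CPU). The IPI itself and keeping the new timekeeper out of idle are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64   /* assumption: one bit per CPU in a uint64_t */

struct model_handoff {
    int  new_timekeeper;     /* -1 when no other online CPU is left */
    bool demoted_nohz_full;  /* true when a nohz full CPU had to take over */
};

/* Lowest CPU set in 'mask', or -1 for an empty mask. */
static int model_first_cpu(uint64_t mask)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

/*
 * Pick a replacement for the timekeeper 'dying'. CPUs outside the
 * housekeeping mask are treated as nohz full in this model.
 */
static struct model_handoff
model_pick_timekeeper(uint64_t housekeeping, uint64_t online, int dying)
{
    struct model_handoff h = { -1, false };
    uint64_t remaining;

    assert(dying >= 0 && dying < MODEL_NR_CPUS);
    remaining = online & ~(UINT64_C(1) << dying);

    /* First choice: another housekeeper that stays online. */
    h.new_timekeeper = model_first_cpu(remaining & housekeeping);
    if (h.new_timekeeper < 0) {
        /* Fallback: a nohz full CPU; the caller would warn about this. */
        h.new_timekeeper = model_first_cpu(remaining & ~housekeeping);
        h.demoted_nohz_full = h.new_timekeeper >= 0;
    }
    return h;
}
```

The demoted_nohz_full flag marks the case where a warning would be printed because a nohz full CPU loses its isolation to become the timekeeper.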

2015-08-28 08:33:08

by Ingo Molnar

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers


* Frederic Weisbecker <[email protected]> wrote:

> On Tue, Aug 25, 2015 at 10:29:04AM +0200, Ingo Molnar wrote:
> >
> > * Frederic Weisbecker <[email protected]> wrote:
> >
> > > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote:
> > > > 2)
> > > >
> > > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y)
> > > >
> > > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU
> > > > is offlined.
> > >
> > > We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which
> > > is the boot CPU in nohz full, never gets offlined.
> >
> > That solution really sucks - it essentially regresses a feature the user
> > explicitly asked for! I also see no way for the user to migrate the timekeeping
> > functionality over to another CPU without rebooting.
> >
> > If this is the last timekeeping CPU then it should migrate the timekeeping
> > functionality to another CPU, and perhaps printk a warning if all other CPUs are
> > nohz-full and we have to mark one of them as the timekeeper.
> >
> > Also, the nohz-full and timekeeper functionality should not be a boot parameter
> > only thing, but should be runtime configurable.
>
> When I tried to allow moving the timekeeping duty over all housekeeping CPUs,
> Thomas got angry because it broke the KISS current nohz full code. Indeed, there
> must be at least one running all the time on behalf of nohz full CPUs that can
> run anytime. Thus balancing the timekeeping duty over housekeepers is a bit more
> complicated than in normal configurations.
>
> Now surely we can do that using an IPI from CPU_DOWN_PREPARE to a housekeeper if
> any remains or to a nohz full one. Then we must make sure the new timekeeper
> never goes to idle.
>
> But nohz_full is a corner usecase and I'm not sure it's worth the complexity. If
> a nohz full user came and complained about CPU0 hotplug not working, I would
> definitely retry it but I haven't heard about that yet. Besides, hotplug is very
> isolation-unfriendly in general due to stop machine.

Ok, I guess we can live with this.

Mind sending an updated series with all patches?

Thanks,

Ingo

2015-08-28 12:31:04

by Frederic Weisbecker

Subject: Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers

On Fri, Aug 28, 2015 at 10:32:59AM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
> > On Tue, Aug 25, 2015 at 10:29:04AM +0200, Ingo Molnar wrote:
> > >
> > > * Frederic Weisbecker <[email protected]> wrote:
> > >
> > > > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote:
> > > > > 2)
> > > > >
> > > > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y)
> > > > >
> > > > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU
> > > > > is offlined.
> > > >
> > > > We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which
> > > > is the boot CPU in nohz full, never gets offlined.
> > >
> > > That solution really sucks - it essentially regresses a feature the user
> > > explicitly asked for! I also see no way for the user to migrate the timekeeping
> > > functionality over to another CPU without rebooting.
> > >
> > > If this is the last timekeeping CPU then it should migrate the timekeeping
> > > functionality to another CPU, and perhaps printk a warning if all other CPUs are
> > > nohz-full and we have to mark one of them as the timekeeper.
> > >
> > > Also, the nohz-full and timekeeper functionality should not be a boot parameter
> > > only thing, but should be runtime configurable.
> >
> > When I tried to allow moving the timekeeping duty over all housekeeping CPUs,
> > Thomas got angry because it broke the KISS current nohz full code. Indeed, there
> > must be at least one running all the time on behalf of nohz full CPUs that can
> > run anytime. Thus balancing the timekeeping duty over housekeepers is a bit more
> > complicated than in normal configurations.
> >
> > Now surely we can do that using an IPI from CPU_DOWN_PREPARE to a housekeeper if
> > any remains or to a nohz full one. Then we must make sure the new timekeeper
> > never goes to idle.
> >
> > But nohz_full is a corner usecase and I'm not sure it's worth the complexity. If
> > a nohz full user came and complained about CPU0 hotplug not working, I would
> > definitely retry it but I haven't heard about that yet. Besides, hotplug is very
> > isolation-unfriendly in general due to stop machine.
>
> Ok, I guess we can live with this.

Now this will likely evolve in the future. I can easily imagine timekeeping being
balanced among housekeepers once we have one per node. It's not a priority yet, but
we may come to that one day.

>
> Mind sending an updated series with all patches?

Sure, I'm cooking that.

Thanks!