2014-06-21 23:29:25

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 0/5] timer: Fix missing dynticks kick v2

Hi,

It's a 2nd set that fixes some missing dyntick kicks in the timer's code.
This new version also handles missing kicks in the hrtimers subsystem.

The patches are also available at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/missing-kick-v2

Thanks,
Frederic
---

Viresh Kumar (5):
timer: Store cpu-number in 'struct tvec_base'
timer: Kick dynticks targets on mod_timer*() calls
hrtimer: Store cpu-number in 'struct hrtimer_cpu_base'
hrtimer: Kick lowres dynticks targets on timer enqueue
hrtimer: Remove hrtimer_enqueue_reprogram()


include/linux/hrtimer.h | 2 ++
kernel/hrtimer.c | 49 +++++++++++++++++++++++++++----------------------
kernel/timer.c | 34 ++++++++++++++++++----------------
3 files changed, 47 insertions(+), 38 deletions(-)


2014-06-21 23:29:32

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 2/5] timer: Kick dynticks targets on mod_timer*() calls

From: Viresh Kumar <[email protected]>

When a timer is enqueued or modified on a dynticks target, that CPU
must re-evaluate the next tick to service that timer.

The tick re-evaluation is performed by an IPI kick on the target.
Now while we correctly call wake_up_nohz_cpu() from add_timer_on(), the
mod_timer*() API family doesn't support so well dynticks targets.

The reason for this is likely that __mod_timer() isn't supposed to
select an idle target for a timer, unless that target is the current
CPU, in which case a dynticks idle kick isn't actually needed.

But there is a small race window lurking behind that assumption: the
elected target has all the time to turn dynticks idle between the call
to get_nohz_timer_target() and the locking of its base. Hence a risk
that we enqueue a timer on a dynticks idle destination without kicking
it. As a result, the timer might be serviced too late in the future.

Also a target elected by __mod_timer() can be in full dynticks mode
and thus require to be kicked as well. And unlike idle dynticks, this
concern both local and remote targets.

To fix this whole issue, lets centralize the dynticks kick to
internal_add_timer() so that it is well handled for all sort of timer
enqueue. Even timer migration is concerned so that a full dynticks target
is correctly kicked as needed when timers are migrating to it.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/timer.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 9e5f4f2..aca5dfe 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -410,6 +410,22 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
base->next_timer = timer->expires;
}
base->all_timers++;
+
+ /*
+ * Check whether the other CPU is in dynticks mode and needs
+ * to be triggered to reevaluate the timer wheel.
+ * We are protected against the other CPU fiddling
+ * with the timer by holding the timer base lock. This also
+ * makes sure that a CPU on the way to stop its tick can not
+ * evaluate the timer wheel.
+ *
+ * Spare the IPI for deferrable timers on idle targets though.
+ * The next busy ticks will take care of it. Except full dynticks
+ * require special care against races with idle_cpu(), lets deal
+ * with that later.
+ */
+ if (!tbase_get_deferrable(base) || tick_nohz_full_cpu(base->cpu))
+ wake_up_nohz_cpu(base->cpu);
}

#ifdef CONFIG_TIMER_STATS
@@ -949,22 +965,6 @@ void add_timer_on(struct timer_list *timer, int cpu)
timer_set_base(timer, base);
debug_activate(timer, timer->expires);
internal_add_timer(base, timer);
- /*
- * Check whether the other CPU is in dynticks mode and needs
- * to be triggered to reevaluate the timer wheel.
- * We are protected against the other CPU fiddling
- * with the timer by holding the timer base lock. This also
- * makes sure that a CPU on the way to stop its tick can not
- * evaluate the timer wheel.
- *
- * Spare the IPI for deferrable timers on idle targets though.
- * The next busy ticks will take care of it. Except full dynticks
- * require special care against races with idle_cpu(), lets deal
- * with that later.
- */
- if (!tbase_get_deferrable(timer->base) || tick_nohz_full_cpu(cpu))
- wake_up_nohz_cpu(cpu);
-
spin_unlock_irqrestore(&base->lock, flags);
}
EXPORT_SYMBOL_GPL(add_timer_on);
--
1.8.3.1

2014-06-21 23:29:44

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 5/5] hrtimer: Remove hrtimer_enqueue_reprogram()

From: Viresh Kumar <[email protected]>

We call hrtimer_enqueue_reprogram() only when we are in high resolution
mode now so we don't need to check that again in hrtimer_enqueue_reprogram().

Once the check is removed, hrtimer_enqueue_reprogram() turns to be an
useless wrapper over hrtimer_reprogram() and can be dropped.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/hrtimer.c | 23 ++++++++---------------
1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 5f30917..8b3ea17 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -602,6 +602,11 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
* timers, we have to check, whether it expires earlier than the timer for
* which the clock event device was armed.
*
+ * Note, that in case the state has HRTIMER_STATE_CALLBACK set, no reprogramming
+ * and no expiry check happens. The timer gets enqueued into the rbtree. The
+ * reprogramming and expiry check is done in the hrtimer_interrupt or in the
+ * softirq.
+ *
* Called with interrupts disabled and base->cpu_base.lock held
*/
static int hrtimer_reprogram(struct hrtimer *timer,
@@ -662,18 +667,6 @@ static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base)
base->hres_active = 0;
}

-/*
- * When High resolution timers are active, try to reprogram. Note, that in case
- * the state has HRTIMER_STATE_CALLBACK set, no reprogramming and no expiry
- * check happens. The timer gets enqueued into the rbtree. The reprogramming
- * and expiry check is done in the hrtimer_interrupt or in the softirq.
- */
-static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
- struct hrtimer_clock_base *base)
-{
- return base->cpu_base->hres_active && hrtimer_reprogram(timer, base);
-}
-
static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
{
ktime_t *offs_real = &base->clock_base[HRTIMER_BASE_REALTIME].offset;
@@ -755,8 +748,8 @@ static inline int hrtimer_is_hres_enabled(void) { return 0; }
static inline int hrtimer_switch_to_hres(void) { return 0; }
static inline void
hrtimer_force_reprogram(struct hrtimer_cpu_base *base, int skip_equal) { }
-static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
- struct hrtimer_clock_base *base)
+static inline int hrtimer_reprogram(struct hrtimer *timer,
+ struct hrtimer_clock_base *base)
{
return 0;
}
@@ -1025,7 +1018,7 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
*/
wake_up_nohz_cpu(base->cpu_base->cpu);
} else if (new_base->cpu_base == &__get_cpu_var(hrtimer_bases) &&
- hrtimer_enqueue_reprogram(timer, new_base)) {
+ hrtimer_reprogram(timer, new_base)) {
/*
* Only allow reprogramming if the new base is on this CPU.
* (it might still be on another CPU if the timer was pending)
--
1.8.3.1

2014-06-21 23:29:43

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 4/5] hrtimer: Kick lowres dynticks targets on timer enqueue

From: Viresh Kumar <[email protected]>

In lowres mode, hrtimers are serviced by the tick instead of a clock
event. It works well as long as the tick stays periodic but we must also
make sure that the hrtimers are serviced in dynticks mode targets,
pretty much like timer list timers do.

Note that all dynticks modes are concerned: get_nohz_timer_target()
tries not to return remote idle CPUs but there is nothing to prevent
the elected target from entering dynticks idle mode until we lock its
base. It's also prefectly legal to enqueue hrtimers on full dynticks CPU.

So there are two requirements to correctly handle dynticks:

1) On target's tick stop time, we must not delay the next tick further
the next hrtimer.

2) On hrtimer queue time. If the tick of the target is stopped, we must
wake up that CPU such that it sees the new hrtimer and recalculate
the next tick accordingly.

The point 1 is well handled currently through get_nohz_timer_interrupt() and
cmp_next_hrtimer_event().

But the point 2 isn't handled at all.

Fixing this is easy though as we have the necessary API ready for that.
All we need is to call wake_up_nohz_cpu() on a target when a newly
enqueued hrtimer requires tick rescheduling, like timer list timer do.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/hrtimer.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 0e32d4e..5f30917 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1013,14 +1013,25 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,

leftmost = enqueue_hrtimer(timer, new_base);

- /*
- * Only allow reprogramming if the new base is on this CPU.
- * (it might still be on another CPU if the timer was pending)
- *
- * XXX send_remote_softirq() ?
- */
- if (leftmost && new_base->cpu_base == &__get_cpu_var(hrtimer_bases)
- && hrtimer_enqueue_reprogram(timer, new_base)) {
+ if (!leftmost) {
+ unlock_hrtimer_base(timer, &flags);
+ return ret;
+ }
+
+ if (!hrtimer_is_hres_active(timer)) {
+ /*
+ * Kick to reschedule the next tick to handle the new timer
+ * on dynticks target.
+ */
+ wake_up_nohz_cpu(base->cpu_base->cpu);
+ } else if (new_base->cpu_base == &__get_cpu_var(hrtimer_bases) &&
+ hrtimer_enqueue_reprogram(timer, new_base)) {
+ /*
+ * Only allow reprogramming if the new base is on this CPU.
+ * (it might still be on another CPU if the timer was pending)
+ *
+ * XXX send_remote_softirq() ?
+ */
if (wakeup) {
/*
* We need to drop cpu_base->lock to avoid a
--
1.8.3.1

2014-06-21 23:29:41

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 1/5] timer: Store cpu-number in 'struct tvec_base'

From: Viresh Kumar <[email protected]>

Timers are serviced by the tick. But when a timer is enqueued on a
dynticks target, we need to kick it in order to make it reconsider the
next tick to schedule to correctly handle the timer's expiring time.

Now while this kick is correctly performed for add_timer_on(), the
mod_timer*() family has been a bit neglected.

To prepare for fixing this, we need internal_add_timer() to be able to
resolve the CPU target associated to a timer's object 'base' so that the
kick can be centralized there.

This can't be passed as an argument as not all the callers know the CPU
number of a timer's base. So lets store it in the struct tvec_base to
resolve the CPU without much overhead. It is set once for good at every
CPU's first boot.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/timer.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/timer.c b/kernel/timer.c
index 3bb01a3..9e5f4f2 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -82,6 +82,7 @@ struct tvec_base {
unsigned long next_timer;
unsigned long active_timers;
unsigned long all_timers;
+ int cpu;
struct tvec_root tv1;
struct tvec tv2;
struct tvec tv3;
@@ -1568,6 +1569,7 @@ static int init_timers_cpu(int cpu)
}
spin_lock_init(&base->lock);
tvec_base_done[cpu] = 1;
+ base->cpu = cpu;
} else {
base = per_cpu(tvec_bases, cpu);
}
--
1.8.3.1

2014-06-21 23:29:39

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 3/5] hrtimer: Store cpu-number in 'struct hrtimer_cpu_base'

From: Viresh Kumar <[email protected]>

In lowres mode, hrtimers are serviced by the tick instead of a clock
event. Now it works well as long as the tick stays periodic but we
must also make sure that the hrtimers are serviced in dynticks mode.

Part of that job consist in kicking a dynticks hrtimer target in order
to make it reconsider the next tick to schedule to correctly handle the
hrtimer's expiring time. And that part isn't handled by the hrtimers
subsystem.

To prepare for fixing this, we need __hrtimer_start_range_ns() to be
able to resolve the CPU target associated to a hrtimer's object
'cpu_base' so that the kick can be centralized there.

So lets store it in the 'struct hrtimer_cpu_base' to resolve the CPU
without overhead. It is set once at CPU's online notification.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/hrtimer.h | 2 ++
kernel/hrtimer.c | 1 +
2 files changed, 3 insertions(+)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index e7a8d3f..bb4ffff 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -165,6 +165,7 @@ enum hrtimer_base_type {
* struct hrtimer_cpu_base - the per cpu clock bases
* @lock: lock protecting the base and associated clock bases
* and timers
+ * @cpu: cpu number
* @active_bases: Bitfield to mark bases with active timers
* @clock_was_set: Indicates that clock was set from irq context.
* @expires_next: absolute time of the next event which was scheduled
@@ -179,6 +180,7 @@ enum hrtimer_base_type {
*/
struct hrtimer_cpu_base {
raw_spinlock_t lock;
+ unsigned int cpu;
unsigned int active_bases;
unsigned int clock_was_set;
#ifdef CONFIG_HIGH_RES_TIMERS
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 3ab2899..0e32d4e 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1680,6 +1680,7 @@ static void init_hrtimers_cpu(int cpu)
timerqueue_init_head(&cpu_base->clock_base[i].active);
}

+ cpu_base->cpu = cpu;
hrtimer_init_hres(cpu_base);
}

--
1.8.3.1

2014-06-22 13:36:58

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 4/5] hrtimer: Kick lowres dynticks targets on timer enqueue

On Sun, 22 Jun 2014, Frederic Weisbecker wrote:
> From: Viresh Kumar <[email protected]>
>
> In lowres mode, hrtimers are serviced by the tick instead of a clock
> event. It works well as long as the tick stays periodic but we must also
> make sure that the hrtimers are serviced in dynticks mode targets,
> pretty much like timer list timers do.
>
> Note that all dynticks modes are concerned: get_nohz_timer_target()
> tries not to return remote idle CPUs but there is nothing to prevent
> the elected target from entering dynticks idle mode until we lock its
> base. It's also prefectly legal to enqueue hrtimers on full dynticks CPU.
>
> So there are two requirements to correctly handle dynticks:
>
> 1) On target's tick stop time, we must not delay the next tick further
> the next hrtimer.
>
> 2) On hrtimer queue time. If the tick of the target is stopped, we must
> wake up that CPU such that it sees the new hrtimer and recalculate
> the next tick accordingly.
>
> The point 1 is well handled currently through get_nohz_timer_interrupt() and
> cmp_next_hrtimer_event().
>
> But the point 2 isn't handled at all.
>
> Fixing this is easy though as we have the necessary API ready for that.
> All we need is to call wake_up_nohz_cpu() on a target when a newly
> enqueued hrtimer requires tick rescheduling, like timer list timer do.
>
> Signed-off-by: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> kernel/hrtimer.c | 27 +++++++++++++++++++--------
> 1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 0e32d4e..5f30917 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1013,14 +1013,25 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
>
> leftmost = enqueue_hrtimer(timer, new_base);
>
> - /*
> - * Only allow reprogramming if the new base is on this CPU.
> - * (it might still be on another CPU if the timer was pending)
> - *
> - * XXX send_remote_softirq() ?
> - */
> - if (leftmost && new_base->cpu_base == &__get_cpu_var(hrtimer_bases)
> - && hrtimer_enqueue_reprogram(timer, new_base)) {
> + if (!leftmost) {
> + unlock_hrtimer_base(timer, &flags);
> + return ret;
> + }
> +
> + if (!hrtimer_is_hres_active(timer)) {
> + /*
> + * Kick to reschedule the next tick to handle the new timer
> + * on dynticks target.
> + */
> + wake_up_nohz_cpu(base->cpu_base->cpu);

The timer is queued on new_base not on base.

Thanks,

tglx

2014-06-23 04:44:38

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH 4/5] hrtimer: Kick lowres dynticks targets on timer enqueue

On 22 June 2014 19:06, Thomas Gleixner <[email protected]> wrote:
> On Sun, 22 Jun 2014, Frederic Weisbecker wrote:

>> + wake_up_nohz_cpu(base->cpu_base->cpu);
>
> The timer is queued on new_base not on base.

Oops, thanks for spotting this bug. Will be fixed in next version.

2014-06-23 08:09:46

by Viresh Kumar

[permalink] [raw]
Subject: [PATCH V2 4/5] hrtimer: Kick lowres dynticks targets on timer enqueue

In lowres mode, hrtimers are serviced by the tick instead of a clock
event. It works well as long as the tick stays periodic but we must also
make sure that the hrtimers are serviced in dynticks mode targets,
pretty much like timer list timers do.

Note that all dynticks modes are concerned: get_nohz_timer_target()
tries not to return remote idle CPUs but there is nothing to prevent
the elected target from entering dynticks idle mode until we lock its
base. It's also prefectly legal to enqueue hrtimers on full dynticks CPU.

So there are two requirements to correctly handle dynticks:

1) On target's tick stop time, we must not delay the next tick further
the next hrtimer.

2) On hrtimer queue time. If the tick of the target is stopped, we must
wake up that CPU such that it sees the new hrtimer and recalculate
the next tick accordingly.

The point 1 is well handled currently through get_nohz_timer_interrupt() and
cmp_next_hrtimer_event().

But the point 2 isn't handled at all.

Fixing this is easy though as we have the necessary API ready for that.
All we need is to call wake_up_nohz_cpu() on a target when a newly
enqueued hrtimer requires tick rescheduling, like timer list timer do.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
Hi Thomas,

Frederic is on vacations and so sending it directly.

V1->V2: s/base/new_base as pointed out by Thomas.

Pushed here as well:
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux.git timers/missing-kick-v2

kernel/hrtimer.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 0e32d4e..f900747 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1013,14 +1013,25 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,

leftmost = enqueue_hrtimer(timer, new_base);

- /*
- * Only allow reprogramming if the new base is on this CPU.
- * (it might still be on another CPU if the timer was pending)
- *
- * XXX send_remote_softirq() ?
- */
- if (leftmost && new_base->cpu_base == &__get_cpu_var(hrtimer_bases)
- && hrtimer_enqueue_reprogram(timer, new_base)) {
+ if (!leftmost) {
+ unlock_hrtimer_base(timer, &flags);
+ return ret;
+ }
+
+ if (!hrtimer_is_hres_active(timer)) {
+ /*
+ * Kick to reschedule the next tick to handle the new timer
+ * on dynticks target.
+ */
+ wake_up_nohz_cpu(new_base->cpu_base->cpu);
+ } else if (new_base->cpu_base == &__get_cpu_var(hrtimer_bases) &&
+ hrtimer_enqueue_reprogram(timer, new_base)) {
+ /*
+ * Only allow reprogramming if the new base is on this CPU.
+ * (it might still be on another CPU if the timer was pending)
+ *
+ * XXX send_remote_softirq() ?
+ */
if (wakeup) {
/*
* We need to drop cpu_base->lock to avoid a
--
2.0.0.rc2

Subject: [tip:timers/core] timer: Store cpu-number in struct tvec_base

Commit-ID: d6f93829811a3e74f58e3c3823d507411eed651a
Gitweb: http://git.kernel.org/tip/d6f93829811a3e74f58e3c3823d507411eed651a
Author: Viresh Kumar <[email protected]>
AuthorDate: Sun, 22 Jun 2014 01:29:13 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 23 Jun 2014 11:23:46 +0200

timer: Store cpu-number in struct tvec_base

Timers are serviced by the tick. But when a timer is enqueued on a
dynticks target, we need to kick it in order to make it reconsider the
next tick to schedule to correctly handle the timer's expiring time.

Now while this kick is correctly performed for add_timer_on(), the
mod_timer*() family has been a bit neglected.

To prepare for fixing this, we need internal_add_timer() to be able to
resolve the CPU target associated to a timer's object 'base' so that the
kick can be centralized there.

This can't be passed as an argument as not all the callers know the CPU
number of a timer's base. So lets store it in the struct tvec_base to
resolve the CPU without much overhead. It is set once for good at every
CPU's first boot.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/time/timer.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 3bb01a3..9e5f4f2 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -82,6 +82,7 @@ struct tvec_base {
unsigned long next_timer;
unsigned long active_timers;
unsigned long all_timers;
+ int cpu;
struct tvec_root tv1;
struct tvec tv2;
struct tvec tv3;
@@ -1568,6 +1569,7 @@ static int init_timers_cpu(int cpu)
}
spin_lock_init(&base->lock);
tvec_base_done[cpu] = 1;
+ base->cpu = cpu;
} else {
base = per_cpu(tvec_bases, cpu);
}

Subject: [tip:timers/core] timer: Kick dynticks targets on mod_timer*() calls

Commit-ID: 9f6d9baaa8ca94b48aea495261cadaf2967c7784
Gitweb: http://git.kernel.org/tip/9f6d9baaa8ca94b48aea495261cadaf2967c7784
Author: Viresh Kumar <[email protected]>
AuthorDate: Sun, 22 Jun 2014 01:29:14 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 23 Jun 2014 11:23:47 +0200

timer: Kick dynticks targets on mod_timer*() calls

When a timer is enqueued or modified on a dynticks target, that CPU
must re-evaluate the next tick to service that timer.

The tick re-evaluation is performed by an IPI kick on the target.
Now while we correctly call wake_up_nohz_cpu() from add_timer_on(), the
mod_timer*() API family doesn't support so well dynticks targets.

The reason for this is likely that __mod_timer() isn't supposed to
select an idle target for a timer, unless that target is the current
CPU, in which case a dynticks idle kick isn't actually needed.

But there is a small race window lurking behind that assumption: the
elected target has all the time to turn dynticks idle between the call
to get_nohz_timer_target() and the locking of its base. Hence a risk
that we enqueue a timer on a dynticks idle destination without kicking
it. As a result, the timer might be serviced too late in the future.

Also a target elected by __mod_timer() can be in full dynticks mode
and thus require to be kicked as well. And unlike idle dynticks, this
concern both local and remote targets.

To fix this whole issue, lets centralize the dynticks kick to
internal_add_timer() so that it is well handled for all sort of timer
enqueue. Even timer migration is concerned so that a full dynticks target
is correctly kicked as needed when timers are migrating to it.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/time/timer.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 9e5f4f2..aca5dfe 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -410,6 +410,22 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
base->next_timer = timer->expires;
}
base->all_timers++;
+
+ /*
+ * Check whether the other CPU is in dynticks mode and needs
+ * to be triggered to reevaluate the timer wheel.
+ * We are protected against the other CPU fiddling
+ * with the timer by holding the timer base lock. This also
+ * makes sure that a CPU on the way to stop its tick can not
+ * evaluate the timer wheel.
+ *
+ * Spare the IPI for deferrable timers on idle targets though.
+ * The next busy ticks will take care of it. Except full dynticks
+ * require special care against races with idle_cpu(), lets deal
+ * with that later.
+ */
+ if (!tbase_get_deferrable(base) || tick_nohz_full_cpu(base->cpu))
+ wake_up_nohz_cpu(base->cpu);
}

#ifdef CONFIG_TIMER_STATS
@@ -949,22 +965,6 @@ void add_timer_on(struct timer_list *timer, int cpu)
timer_set_base(timer, base);
debug_activate(timer, timer->expires);
internal_add_timer(base, timer);
- /*
- * Check whether the other CPU is in dynticks mode and needs
- * to be triggered to reevaluate the timer wheel.
- * We are protected against the other CPU fiddling
- * with the timer by holding the timer base lock. This also
- * makes sure that a CPU on the way to stop its tick can not
- * evaluate the timer wheel.
- *
- * Spare the IPI for deferrable timers on idle targets though.
- * The next busy ticks will take care of it. Except full dynticks
- * require special care against races with idle_cpu(), lets deal
- * with that later.
- */
- if (!tbase_get_deferrable(timer->base) || tick_nohz_full_cpu(cpu))
- wake_up_nohz_cpu(cpu);
-
spin_unlock_irqrestore(&base->lock, flags);
}
EXPORT_SYMBOL_GPL(add_timer_on);

Subject: [tip:timers/core] hrtimer: Store cpu-number in struct hrtimer_cpu_base

Commit-ID: cddd02489f52ccf635ed65931214729a23b93cd6
Gitweb: http://git.kernel.org/tip/cddd02489f52ccf635ed65931214729a23b93cd6
Author: Viresh Kumar <[email protected]>
AuthorDate: Sun, 22 Jun 2014 01:29:15 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 23 Jun 2014 11:23:47 +0200

hrtimer: Store cpu-number in struct hrtimer_cpu_base

In lowres mode, hrtimers are serviced by the tick instead of a clock
event. Now it works well as long as the tick stays periodic but we
must also make sure that the hrtimers are serviced in dynticks mode.

Part of that job consist in kicking a dynticks hrtimer target in order
to make it reconsider the next tick to schedule to correctly handle the
hrtimer's expiring time. And that part isn't handled by the hrtimers
subsystem.

To prepare for fixing this, we need __hrtimer_start_range_ns() to be
able to resolve the CPU target associated to a hrtimer's object
'cpu_base' so that the kick can be centralized there.

So lets store it in the 'struct hrtimer_cpu_base' to resolve the CPU
without overhead. It is set once at CPU's online notification.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
include/linux/hrtimer.h | 2 ++
kernel/time/hrtimer.c | 1 +
2 files changed, 3 insertions(+)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index e7a8d3f..bb4ffff 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -165,6 +165,7 @@ enum hrtimer_base_type {
* struct hrtimer_cpu_base - the per cpu clock bases
* @lock: lock protecting the base and associated clock bases
* and timers
+ * @cpu: cpu number
* @active_bases: Bitfield to mark bases with active timers
* @clock_was_set: Indicates that clock was set from irq context.
* @expires_next: absolute time of the next event which was scheduled
@@ -179,6 +180,7 @@ enum hrtimer_base_type {
*/
struct hrtimer_cpu_base {
raw_spinlock_t lock;
+ unsigned int cpu;
unsigned int active_bases;
unsigned int clock_was_set;
#ifdef CONFIG_HIGH_RES_TIMERS
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 3ab2899..0e32d4e 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1680,6 +1680,7 @@ static void init_hrtimers_cpu(int cpu)
timerqueue_init_head(&cpu_base->clock_base[i].active);
}

+ cpu_base->cpu = cpu;
hrtimer_init_hres(cpu_base);
}

Subject: [tip:timers/core] hrtimer: Kick lowres dynticks targets on timer enqueue

Commit-ID: 49a2a07514a3a2ea4a02482fa60575e106d960f9
Gitweb: http://git.kernel.org/tip/49a2a07514a3a2ea4a02482fa60575e106d960f9
Author: Viresh Kumar <[email protected]>
AuthorDate: Mon, 23 Jun 2014 13:39:37 +0530
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 23 Jun 2014 11:23:47 +0200

hrtimer: Kick lowres dynticks targets on timer enqueue

In lowres mode, hrtimers are serviced by the tick instead of a clock
event. It works well as long as the tick stays periodic but we must also
make sure that the hrtimers are serviced in dynticks mode targets,
pretty much like timer list timers do.

Note that all dynticks modes are concerned: get_nohz_timer_target()
tries not to return remote idle CPUs but there is nothing to prevent
the elected target from entering dynticks idle mode until we lock its
base. It's also prefectly legal to enqueue hrtimers on full dynticks CPU.

So there are two requirements to correctly handle dynticks:

1) On target's tick stop time, we must not delay the next tick further
the next hrtimer.

2) On hrtimer queue time. If the tick of the target is stopped, we must
wake up that CPU such that it sees the new hrtimer and recalculate
the next tick accordingly.

The point 1 is well handled currently through get_nohz_timer_interrupt() and
cmp_next_hrtimer_event().

But the point 2 isn't handled at all.

Fixing this is easy though as we have the necessary API ready for that.
All we need is to call wake_up_nohz_cpu() on a target when a newly
enqueued hrtimer requires tick rescheduling, like timer list timer do.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Link: http://lkml.kernel.org/r/3d7ea08ce008698e26bd39fe10f55949391073ab.1403507178.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/time/hrtimer.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 0e32d4e..f900747 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1013,14 +1013,25 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,

leftmost = enqueue_hrtimer(timer, new_base);

- /*
- * Only allow reprogramming if the new base is on this CPU.
- * (it might still be on another CPU if the timer was pending)
- *
- * XXX send_remote_softirq() ?
- */
- if (leftmost && new_base->cpu_base == &__get_cpu_var(hrtimer_bases)
- && hrtimer_enqueue_reprogram(timer, new_base)) {
+ if (!leftmost) {
+ unlock_hrtimer_base(timer, &flags);
+ return ret;
+ }
+
+ if (!hrtimer_is_hres_active(timer)) {
+ /*
+ * Kick to reschedule the next tick to handle the new timer
+ * on dynticks target.
+ */
+ wake_up_nohz_cpu(new_base->cpu_base->cpu);
+ } else if (new_base->cpu_base == &__get_cpu_var(hrtimer_bases) &&
+ hrtimer_enqueue_reprogram(timer, new_base)) {
+ /*
+ * Only allow reprogramming if the new base is on this CPU.
+ * (it might still be on another CPU if the timer was pending)
+ *
+ * XXX send_remote_softirq() ?
+ */
if (wakeup) {
/*
* We need to drop cpu_base->lock to avoid a

Subject: [tip:timers/core] hrtimer: Remove hrtimer_enqueue_reprogram()

Commit-ID: 9e1e01dd79ac4cf936623399abe57dfba4528ae6
Gitweb: http://git.kernel.org/tip/9e1e01dd79ac4cf936623399abe57dfba4528ae6
Author: Viresh Kumar <[email protected]>
AuthorDate: Sun, 22 Jun 2014 01:29:17 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 23 Jun 2014 11:23:47 +0200

hrtimer: Remove hrtimer_enqueue_reprogram()

We call hrtimer_enqueue_reprogram() only when we are in high resolution
mode now so we don't need to check that again in hrtimer_enqueue_reprogram().

Once the check is removed, hrtimer_enqueue_reprogram() turns to be an
useless wrapper over hrtimer_reprogram() and can be dropped.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/time/hrtimer.c | 23 ++++++++---------------
1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index f900747..66a6dc1 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -602,6 +602,11 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
* timers, we have to check, whether it expires earlier than the timer for
* which the clock event device was armed.
*
+ * Note, that in case the state has HRTIMER_STATE_CALLBACK set, no reprogramming
+ * and no expiry check happens. The timer gets enqueued into the rbtree. The
+ * reprogramming and expiry check is done in the hrtimer_interrupt or in the
+ * softirq.
+ *
* Called with interrupts disabled and base->cpu_base.lock held
*/
static int hrtimer_reprogram(struct hrtimer *timer,
@@ -662,18 +667,6 @@ static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base)
base->hres_active = 0;
}

-/*
- * When High resolution timers are active, try to reprogram. Note, that in case
- * the state has HRTIMER_STATE_CALLBACK set, no reprogramming and no expiry
- * check happens. The timer gets enqueued into the rbtree. The reprogramming
- * and expiry check is done in the hrtimer_interrupt or in the softirq.
- */
-static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
- struct hrtimer_clock_base *base)
-{
- return base->cpu_base->hres_active && hrtimer_reprogram(timer, base);
-}
-
static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
{
ktime_t *offs_real = &base->clock_base[HRTIMER_BASE_REALTIME].offset;
@@ -755,8 +748,8 @@ static inline int hrtimer_is_hres_enabled(void) { return 0; }
static inline int hrtimer_switch_to_hres(void) { return 0; }
static inline void
hrtimer_force_reprogram(struct hrtimer_cpu_base *base, int skip_equal) { }
-static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
- struct hrtimer_clock_base *base)
+static inline int hrtimer_reprogram(struct hrtimer *timer,
+ struct hrtimer_clock_base *base)
{
return 0;
}
@@ -1025,7 +1018,7 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
*/
wake_up_nohz_cpu(new_base->cpu_base->cpu);
} else if (new_base->cpu_base == &__get_cpu_var(hrtimer_bases) &&
- hrtimer_enqueue_reprogram(timer, new_base)) {
+ hrtimer_reprogram(timer, new_base)) {
/*
* Only allow reprogramming if the new base is on this CPU.
* (it might still be on another CPU if the timer was pending)