2015-04-06 23:16:17

by Marcelo Tosatti

[permalink] [raw]
Subject: kernel/timer: avoid spurious ksoftirqd wakeups (v2)



It is only necessary to raise timer softirq
in case there are active timers.

Limit the ksoftirqd wakeup to that case.

Fixes a latency spike with isolated CPUs and
nohz full mode.

v2: fix variable initialization
do not raise timer softirq due to pending irqwork

Reported-and-tested-by: Luiz Capitulino <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197..0c065f9 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -192,7 +192,7 @@ extern void set_timer_slack(struct timer_list *time, int slack_hz);
* locks the timer base and does the comparison against the given
* jiffie.
*/
-extern unsigned long get_next_timer_interrupt(unsigned long now);
+extern unsigned long get_next_timer_interrupt(unsigned long now, bool *raise_softirq);

/*
* Timer-statistics info:
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a4c4eda..1ee688c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
unsigned long rcu_delta_jiffies;
struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
u64 time_delta;
+ bool raise_softirq = false;

time_delta = timekeeping_max_deferment();

@@ -584,7 +585,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
delta_jiffies = 1;
} else {
/* Get the next timer wheel timer */
- next_jiffies = get_next_timer_interrupt(last_jiffies);
+ next_jiffies = get_next_timer_interrupt(last_jiffies,
+ &raise_softirq);
delta_jiffies = next_jiffies - last_jiffies;
if (rcu_delta_jiffies < delta_jiffies) {
next_jiffies = last_jiffies + rcu_delta_jiffies;
@@ -703,7 +705,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
*/
tick_do_update_jiffies64(ktime_get());
}
- raise_softirq_irqoff(TIMER_SOFTIRQ);
+ if (raise_softirq)
+ raise_softirq_irqoff(TIMER_SOFTIRQ);
out:
ts->next_jiffies = next_jiffies;
ts->last_jiffies = last_jiffies;
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c5..771f811 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1343,7 +1343,7 @@ static unsigned long cmp_next_hrtimer_event(unsigned long now,
* get_next_timer_interrupt - return the jiffy of the next pending timer
* @now: current time (in jiffies)
*/
-unsigned long get_next_timer_interrupt(unsigned long now)
+unsigned long get_next_timer_interrupt(unsigned long now, bool *raise_softirq)
{
struct tvec_base *base = __this_cpu_read(tvec_bases);
unsigned long expires = now + NEXT_TIMER_MAX_DELTA;
@@ -1357,6 +1357,7 @@ unsigned long get_next_timer_interrupt(unsigned long now)

spin_lock(&base->lock);
if (base->active_timers) {
+ *raise_softirq = true;
if (time_before_eq(base->next_timer, base->timer_jiffies))
base->next_timer = __next_timer_interrupt(base);
expires = base->next_timer;


2015-04-06 23:20:15

by Rik van Riel

[permalink] [raw]
Subject: Re: kernel/timer: avoid spurious ksoftirqd wakeups (v2)

On 04/06/2015 07:15 PM, Marcelo Tosatti wrote:
>
>
> It is only necessary to raise timer softirq
> in case there are active timers.
>
> Limit the ksoftirqd wakeup to that case.
>
> Fixes a latency spike with isolated CPUs and
> nohz full mode.
>
> v2: fix variable initialization
> do not raise timer softirq due to pending irqwork
>
> Reported-and-tested-by: Luiz Capitulino <[email protected]>
> Signed-off-by: Marcelo Tosatti <[email protected]>

Reviewed-by: Rik van Riel <[email protected]>

--
All rights reversed

2015-04-07 21:10:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: kernel/timer: avoid spurious ksoftirqd wakeups (v2)

On Mon, 6 Apr 2015, Marcelo Tosatti wrote:
> It is only necessary to raise timer softirq
> in case there are active timers.

Depends. See below.

> Limit the ksoftirqd wakeup to that case.
>
> Fixes a latency spike with isolated CPUs and
> nohz full mode.

This lacks a proper explanation of the observed issue.

> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> unsigned long rcu_delta_jiffies;
> struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> u64 time_delta;
> + bool raise_softirq = false;

This shadows the function name raise_softirq(). Not pretty.

> time_delta = timekeeping_max_deferment();
>
> @@ -584,7 +585,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> delta_jiffies = 1;
> } else {
> /* Get the next timer wheel timer */
> - next_jiffies = get_next_timer_interrupt(last_jiffies);
> + next_jiffies = get_next_timer_interrupt(last_jiffies,
> + &raise_softirq);
> delta_jiffies = next_jiffies - last_jiffies;
> if (rcu_delta_jiffies < delta_jiffies) {
> next_jiffies = last_jiffies + rcu_delta_jiffies;
> @@ -703,7 +705,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> */
> tick_do_update_jiffies64(ktime_get());
> }
> - raise_softirq_irqoff(TIMER_SOFTIRQ);
> + if (raise_softirq)
> + raise_softirq_irqoff(TIMER_SOFTIRQ);

This breaks when high resolution timers are disabled (compile or
runtime) because then the hrtimer queues are run from the timer
softirq.

Now assume the following situation:

Tick is stopped completely with no timers and no hrtimers pending.

Interrupt happens and schedules a hrtimer.

nohz_stop_sched_tick()
get_next_timer_interrupt(..., &raise_softirq);

---> base->active_timers = 0, so raise_softirq is false

tick_program_event(expires)
clockevents_program_event(expires)

---> Assume expires is already in the past

if (expires <= ktime_get())
return -ETIME;

if (raise_softirq)
raise_softirq_irqoff(TIMER_SOFTIRQ);

So because the tick device was not armed you wont get a tick
interrupt up to the point where tick_nohz_stop_sched_tick() is called
again which might be far off.

I can see that the unconditional raise_softirq_irqoff() is suboptimal,
but it was a rather simple solution to get stuff rolling again because
it forces the cpu out of the inner idle loop which in turn restarts
the tick.

We certainly can do better, but that needs more thought than the
proposed solution.

Thanks,

tglx

2015-04-07 22:12:50

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: kernel/timer: avoid spurious ksoftirqd wakeups (v2)

On Tue, Apr 07, 2015 at 11:10:49PM +0200, Thomas Gleixner wrote:
> On Mon, 6 Apr 2015, Marcelo Tosatti wrote:
> > It is only necessary to raise timer softirq
> > in case there are active timers.
>
> Depends. See below.
>
> > Limit the ksoftirqd wakeup to that case.
> >
> > Fixes a latency spike with isolated CPUs and
> > nohz full mode.
>
> This lacks a proper explanation of the observed issue.
>
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > unsigned long rcu_delta_jiffies;
> > struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> > u64 time_delta;
> > + bool raise_softirq = false;
>
> This shadows the function name raise_softirq(). Not pretty.
>
> > time_delta = timekeeping_max_deferment();
> >
> > @@ -584,7 +585,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > delta_jiffies = 1;
> > } else {
> > /* Get the next timer wheel timer */
> > - next_jiffies = get_next_timer_interrupt(last_jiffies);
> > + next_jiffies = get_next_timer_interrupt(last_jiffies,
> > + &raise_softirq);
> > delta_jiffies = next_jiffies - last_jiffies;
> > if (rcu_delta_jiffies < delta_jiffies) {
> > next_jiffies = last_jiffies + rcu_delta_jiffies;
> > @@ -703,7 +705,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > */
> > tick_do_update_jiffies64(ktime_get());
> > }
> > - raise_softirq_irqoff(TIMER_SOFTIRQ);
> > + if (raise_softirq)
> > + raise_softirq_irqoff(TIMER_SOFTIRQ);
>
> This breaks when high resolution timers are disabled (compile or
> runtime) because then the hrtimer queues are run from the timer
> softirq.
>
> Now assume the following situation:
>
> Tick is stopped completely with no timers and no hrtimers pending.
>
> Interrupt happens and schedules a hrtimer.
>
> nohz_stop_sched_tick()
> get_next_timer_interrupt(..., &raise_softirq);
>
> ---> base->active_timers = 0, so raise_softirq is false
>
> tick_program_event(expires)
> clockevents_program_event(expires)
>
> ---> Assume expires is already in the past
>
> if (expires <= ktime_get())
> return -ETIME;
>
> if (raise_softirq)
> raise_softirq_irqoff(TIMER_SOFTIRQ);
>
> So because the tick device was not armed you wont get a tick
> interrupt up to the point where tick_nohz_stop_sched_tick() is called
> again which might be far off.
>
> I can see that the unconditional raise_softirq_irqoff() is suboptimal,
> but it was a rather simple solution to get stuff rolling again because
> it forces the cpu out of the inner idle loop which in turn restarts
> the tick.

Doh, that's the kind of side effect I was worried about, thanks for the
explanation. The necessary exit out of the idle loop implied by this
softirq when the timer fails to be programmed really deserves a comment.

And note how it relies on the magic !in_interrupt() in this piece of
hardirq code, otherwise that would be softirq from hardirq without
reschedule() and thus no exit from idle loop, and thus no tick
reprogramming.

Let's see if I can come up with some solution to clean this up, if
Marcelo doesn't beat me at it.

>
> We certainly can do better, but that needs more thought than the
> proposed solution.
>
> Thanks,
>
> tglx

2015-04-10 18:09:30

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: kernel/timer: avoid spurious ksoftirqd wakeups (v2)

On Wed, Apr 08, 2015 at 12:12:45AM +0200, Frederic Weisbecker wrote:
> On Tue, Apr 07, 2015 at 11:10:49PM +0200, Thomas Gleixner wrote:
> > On Mon, 6 Apr 2015, Marcelo Tosatti wrote:
> > > It is only necessary to raise timer softirq
> > > in case there are active timers.
> >
> > Depends. See below.
> >
> > > Limit the ksoftirqd wakeup to that case.
> > >
> > > Fixes a latency spike with isolated CPUs and
> > > nohz full mode.
> >
> > This lacks a proper explanation of the observed issue.
> >
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > unsigned long rcu_delta_jiffies;
> > > struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> > > u64 time_delta;
> > > + bool raise_softirq = false;
> >
> > This shadows the function name raise_softirq(). Not pretty.
> >
> > > time_delta = timekeeping_max_deferment();
> > >
> > > @@ -584,7 +585,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > delta_jiffies = 1;
> > > } else {
> > > /* Get the next timer wheel timer */
> > > - next_jiffies = get_next_timer_interrupt(last_jiffies);
> > > + next_jiffies = get_next_timer_interrupt(last_jiffies,
> > > + &raise_softirq);
> > > delta_jiffies = next_jiffies - last_jiffies;
> > > if (rcu_delta_jiffies < delta_jiffies) {
> > > next_jiffies = last_jiffies + rcu_delta_jiffies;
> > > @@ -703,7 +705,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > */
> > > tick_do_update_jiffies64(ktime_get());
> > > }
> > > - raise_softirq_irqoff(TIMER_SOFTIRQ);
> > > + if (raise_softirq)
> > > + raise_softirq_irqoff(TIMER_SOFTIRQ);
> >
> > This breaks when high resolution timers are disabled (compile or
> > runtime) because then the hrtimer queues are run from the timer
> > softirq.
> >
> > Now assume the following situation:
> >
> > Tick is stopped completely with no timers and no hrtimers pending.
> >
> > Interrupt happens and schedules a hrtimer.
> >
> > nohz_stop_sched_tick()
> > get_next_timer_interrupt(..., &raise_softirq);
> >
> > ---> base->active_timers = 0, so raise_softirq is false
> >
> > tick_program_event(expires)
> > clockevents_program_event(expires)
> >
> > ---> Assume expires is already in the past
> >
> > if (expires <= ktime_get())
> > return -ETIME;
> >
> > if (raise_softirq)
> > raise_softirq_irqoff(TIMER_SOFTIRQ);
> >
> > So because the tick device was not armed you wont get a tick
> > interrupt up to the point where tick_nohz_stop_sched_tick() is called
> > again which might be far off.
> >
> > I can see that the unconditional raise_softirq_irqoff() is suboptimal,
> > but it was a rather simple solution to get stuff rolling again because
> > it forces the cpu out of the inner idle loop which in turn restarts
> > the tick.
>
> Doh, that's the kind of side effect I was worried about, thanks for the
> explanation. The necessary exit out of the idle loop implied by this
> softirq when the timer fails to be programmed really deserves a comment.
>
> And note how it relies on the magic !in_interrupt() in this piece of
> hardirq code, otherwise that would be softirq from hardirq without
> reschedule() and thus no exit from idle loop, and thus no tick
> reprogramming.
>
> Let's see if I can come up with some solution to clean this up, if
> Marcelo doesn't beat me at it.

The problem is the following from -RT:

#ifdef CONFIG_PREEMPT_RT_BASE
if (!hrtimer_rt_defer(timer))
return -ETIME;
#endif

It seems a valid solution for this interrupt is to program
sched_timer to the nearest future possible.

if (expires < now)
expires = now + safe_margin;

program_timer(expires);

(perhaps a for loop increasing safe_margin if program_timer fails...)

Is that what you mean by clean up, Frederic?

2015-04-11 01:30:44

by Luiz Capitulino

[permalink] [raw]
Subject: Re: kernel/timer: avoid spurious ksoftirqd wakeups (v2)

On Fri, 10 Apr 2015 15:09:07 -0300
Marcelo Tosatti <[email protected]> wrote:

> On Wed, Apr 08, 2015 at 12:12:45AM +0200, Frederic Weisbecker wrote:
> > On Tue, Apr 07, 2015 at 11:10:49PM +0200, Thomas Gleixner wrote:
> > > On Mon, 6 Apr 2015, Marcelo Tosatti wrote:
> > > > It is only necessary to raise timer softirq
> > > > in case there are active timers.
> > >
> > > Depends. See below.
> > >
> > > > Limit the ksoftirqd wakeup to that case.
> > > >
> > > > Fixes a latency spike with isolated CPUs and
> > > > nohz full mode.
> > >
> > > This lacks a proper explanation of the observed issue.
> > >
> > > > --- a/kernel/time/tick-sched.c
> > > > +++ b/kernel/time/tick-sched.c
> > > > @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > > unsigned long rcu_delta_jiffies;
> > > > struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> > > > u64 time_delta;
> > > > + bool raise_softirq = false;
> > >
> > > This shadows the function name raise_softirq(). Not pretty.
> > >
> > > > time_delta = timekeeping_max_deferment();
> > > >
> > > > @@ -584,7 +585,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > > delta_jiffies = 1;
> > > > } else {
> > > > /* Get the next timer wheel timer */
> > > > - next_jiffies = get_next_timer_interrupt(last_jiffies);
> > > > + next_jiffies = get_next_timer_interrupt(last_jiffies,
> > > > + &raise_softirq);
> > > > delta_jiffies = next_jiffies - last_jiffies;
> > > > if (rcu_delta_jiffies < delta_jiffies) {
> > > > next_jiffies = last_jiffies + rcu_delta_jiffies;
> > > > @@ -703,7 +705,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > > */
> > > > tick_do_update_jiffies64(ktime_get());
> > > > }
> > > > - raise_softirq_irqoff(TIMER_SOFTIRQ);
> > > > + if (raise_softirq)
> > > > + raise_softirq_irqoff(TIMER_SOFTIRQ);
> > >
> > > This breaks when high resolution timers are disabled (compile or
> > > runtime) because then the hrtimer queues are run from the timer
> > > softirq.
> > >
> > > Now assume the following situation:
> > >
> > > Tick is stopped completely with no timers and no hrtimers pending.
> > >
> > > Interrupt happens and schedules a hrtimer.
> > >
> > > nohz_stop_sched_tick()
> > > get_next_timer_interrupt(..., &raise_softirq);
> > >
> > > ---> base->active_timers = 0, so raise_softirq is false
> > >
> > > tick_program_event(expires)
> > > clockevents_program_event(expires)
> > >
> > > ---> Assume expires is already in the past
> > >
> > > if (expires <= ktime_get())
> > > return -ETIME;
> > >
> > > if (raise_softirq)
> > > raise_softirq_irqoff(TIMER_SOFTIRQ);
> > >
> > > So because the tick device was not armed you wont get a tick
> > > interrupt up to the point where tick_nohz_stop_sched_tick() is called
> > > again which might be far off.
> > >
> > > I can see that the unconditional raise_softirq_irqoff() is suboptimal,
> > > but it was a rather simple solution to get stuff rolling again because
> > > it forces the cpu out of the inner idle loop which in turn restarts
> > > the tick.
> >
> > Doh, that's the kind of side effect I was worried about, thanks for the
> > explanation. The necessary exit out of the idle loop implied by this
> > softirq when the timer fails to be programmed really deserves a comment.
> >
> > And note how it relies on the magic !in_interrupt() in this piece of
> > hardirq code, otherwise that would be softirq from hardirq without
> > reschedule() and thus no exit from idle loop, and thus no tick
> > reprogramming.
> >
> > Let's see if I can come up with some solution to clean this up, if
> > Marcelo doesn't beat me at it.
>
> The problem is the following from -RT:
>
> #ifdef CONFIG_PREEMPT_RT_BASE
> if (!hrtimer_rt_defer(timer))
> return -ETIME;
> #endif

Just to clarify, Marcelo and I have found that this is the code
that fails in clockevents_program_event() returning -ETIME:

delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
if (delta <= 0)
return force ? clockevents_program_min_delta(dev) : -ETIME;

It fails in this call trace:

tick_nohz_stop_sched_tick()
hrtimer_start()
__hrtimer_start_range_ns()
hrtimer_enqueue_reprogram()
hrtimer_reprogram() /* timer expires in 38259845000000 */
tick_program_event(38259845000000, 0) /* returns -ETIME */
tick_program_event()
clockevents_program_event() /* returns -ETIME */

> It seems a valid solution for this interrupt is to program
> sched_timer to the nearest future possible.

What about calling the timer function right there, like
hrtimer_interrupt() does?

if (!hrtimer_rt_defer(timer))
__run_hrtimer(timer, &basenow);

>
> if (expires < now)
> expires = now + safe_margin;
>
> program_timer(expires);
>
> (perhaps a for loop increasing safe_margin if program_timer fails...)
>
> Is that what you mean by clean up, Frederic?
>

2015-04-11 09:25:27

by Thomas Gleixner

[permalink] [raw]
Subject: Re: kernel/timer: avoid spurious ksoftirqd wakeups (v2)

On Fri, 10 Apr 2015, Luiz Capitulino wrote:
> On Fri, 10 Apr 2015 15:09:07 -0300
> > It seems a valid solution for this interrupt is to program
> > sched_timer to the nearest future possible.
>
> What about calling the timer function right there, like
> hrtimer_interrupt() does?

No, you cannot call the timer function from the enqueue path. The
caller might hold a lock which is taken in the callback as well.....

Thanks,

tglx

2015-04-13 15:06:49

by Luiz Capitulino

[permalink] [raw]
Subject: Re: kernel/timer: avoid spurious ksoftirqd wakeups (v2)

On Sat, 11 Apr 2015 11:25:49 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Fri, 10 Apr 2015, Luiz Capitulino wrote:
> > On Fri, 10 Apr 2015 15:09:07 -0300
> > > It seems a valid solution for this interrupt is to program
> > > sched_timer to the nearest future possible.
> >
> > What about calling the timer function right there, like
> > hrtimer_interrupt() does?
>
> No, you cannot call the timer function from the enqueue path. The
> caller might hold a lock which is taken in the callback as well.....

Fair enough. We're going to think about a different solution.

>
> Thanks,
>
> tglx
>