2009-07-21 18:26:37

by Martin Schwidefsky

Subject: [RFC][PATCH] cache __next_timer_interrupt result

From: Martin Schwidefsky <[email protected]>

Each time a CPU goes to sleep on a NOHZ=y system, the timer wheel is
searched for the next timer interrupt. It can take quite a few cycles
to find the next pending timer. This patch adds a field to tvec_base
that caches the result of __next_timer_interrupt. The hit ratio is
around 80% on my ThinkPad under normal use; on a server I've seen
hit ratios from 5% to 95%, depending on the workload.

Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: john stultz <[email protected]>
Cc: Venki Pallipadi <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
---
 kernel/timer.c |   24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)
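
The caching scheme described in the changelog can be modelled in user space. The following is a minimal sketch, not the kernel code: `toy_base`, `toy_scan`, `toy_add`, `toy_del`, `toy_next` and `TOY_NO_TIMER_DELTA` are hypothetical names, a flat array stands in for the timer wheel, and deferrable timers and jiffies wraparound are ignored. It only illustrates the three rules the patch relies on: lower the cached value when an earlier timer is added, invalidate it when the timer it refers to is removed, and rescan only when the cached value is not in the future.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TIMERS 8
/* Hypothetical "no timer pending" horizon; not taken from the patch. */
#define TOY_NO_TIMER_DELTA 1000000UL

/* Hypothetical stand-in for tvec_base: a flat array instead of the wheel. */
struct toy_base {
	unsigned long timer_jiffies;	/* "now" as seen by this base */
	unsigned long next_timer;	/* cached earliest expiry */
	unsigned long expires[MAX_TIMERS];
	bool pending[MAX_TIMERS];
	unsigned int scans;		/* how often the slow path ran */
};

static void toy_init(struct toy_base *b, unsigned long now)
{
	/* next_timer == timer_jiffies means "cache invalid". */
	*b = (struct toy_base){ .timer_jiffies = now, .next_timer = now };
}

/* Slow path: plays the role of __next_timer_interrupt(). */
static unsigned long toy_scan(struct toy_base *b)
{
	unsigned long best = b->timer_jiffies + TOY_NO_TIMER_DELTA;

	b->scans++;
	for (size_t i = 0; i < MAX_TIMERS; i++)
		if (b->pending[i] && b->expires[i] < best)
			best = b->expires[i];
	return best;
}

static void toy_add(struct toy_base *b, size_t i, unsigned long expires)
{
	assert(i < MAX_TIMERS);
	b->expires[i] = expires;
	b->pending[i] = true;
	/* An earlier timer lowers the cached value. */
	if (expires < b->next_timer)
		b->next_timer = expires;
}

static void toy_del(struct toy_base *b, size_t i)
{
	assert(i < MAX_TIMERS);
	if (!b->pending[i])
		return;
	b->pending[i] = false;
	/* Removing the timer the cache refers to invalidates the cache. */
	if (b->expires[i] == b->next_timer)
		b->next_timer = b->timer_jiffies;
}

/* Fast path: rescan only when the cached value is not in the future. */
static unsigned long toy_next(struct toy_base *b)
{
	if (b->next_timer <= b->timer_jiffies)
		b->next_timer = toy_scan(b);
	return b->next_timer;
}
```

In this model, repeated calls to `toy_next()` with no intervening change to the earliest timer leave `scans` unchanged, which corresponds to a cache hit.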

Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -72,6 +72,7 @@ struct tvec_base {
 	spinlock_t lock;
 	struct timer_list *running_timer;
 	unsigned long timer_jiffies;
+	unsigned long next_timer;
 	struct tvec_root tv1;
 	struct tvec tv2;
 	struct tvec tv3;
@@ -622,6 +623,9 @@ __mod_timer(struct timer_list *timer, un

 	if (timer_pending(timer)) {
 		detach_timer(timer, 0);
+		if (timer->expires == base->next_timer &&
+		    !tbase_get_deferrable(timer->base))
+			base->next_timer = base->timer_jiffies;
 		ret = 1;
 	} else {
 		if (pending_only)
@@ -663,6 +667,9 @@ __mod_timer(struct timer_list *timer, un
 	}

 	timer->expires = expires;
+	if (timer->expires < base->next_timer &&
+	    !tbase_get_deferrable(timer->base))
+		base->next_timer = timer->expires;
 	internal_add_timer(base, timer);

 out_unlock:
@@ -781,6 +788,9 @@ void add_timer_on(struct timer_list *tim
 	spin_lock_irqsave(&base->lock, flags);
 	timer_set_base(timer, base);
 	debug_timer_activate(timer);
+	if (timer->expires < base->next_timer &&
+	    !tbase_get_deferrable(timer->base))
+		base->next_timer = timer->expires;
 	internal_add_timer(base, timer);
 	/*
 	 * Check whether the other CPU is idle and needs to be
@@ -817,6 +827,9 @@ int del_timer(struct timer_list *timer)
 		base = lock_timer_base(timer, &flags);
 		if (timer_pending(timer)) {
 			detach_timer(timer, 1);
+			if (timer->expires == base->next_timer &&
+			    !tbase_get_deferrable(timer->base))
+				base->next_timer = base->timer_jiffies;
 			ret = 1;
 		}
 		spin_unlock_irqrestore(&base->lock, flags);
@@ -850,6 +863,9 @@ int try_to_del_timer_sync(struct timer_l
 	ret = 0;
 	if (timer_pending(timer)) {
 		detach_timer(timer, 1);
+		if (timer->expires == base->next_timer &&
+		    !tbase_get_deferrable(timer->base))
+			base->next_timer = base->timer_jiffies;
 		ret = 1;
 	}
 out:
@@ -1134,7 +1150,9 @@ unsigned long get_next_timer_interrupt(u
 	unsigned long expires;

 	spin_lock(&base->lock);
-	expires = __next_timer_interrupt(base);
+	if (base->next_timer <= base->timer_jiffies)
+		base->next_timer = __next_timer_interrupt(base);
+	expires = base->next_timer;
 	spin_unlock(&base->lock);

 	if (time_before_eq(expires, now))
@@ -1523,6 +1541,7 @@ static int __cpuinit init_timers_cpu(int
 		INIT_LIST_HEAD(base->tv1.vec + j);

 	base->timer_jiffies = jiffies;
+	base->next_timer = base->timer_jiffies;
 	return 0;
 }

@@ -1535,6 +1554,9 @@ static void migrate_timer_list(struct tv
 		timer = list_first_entry(head, struct timer_list, entry);
 		detach_timer(timer, 0);
 		timer_set_base(timer, new_base);
+		if (timer->expires < new_base->next_timer &&
+		    !tbase_get_deferrable(timer->base))
+			new_base->next_timer = timer->expires;
 		internal_add_timer(new_base, timer);
 	}
 }
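
One point that may be worth checking in the hunks above: the new tests compare jiffies values with plain `<` and `<=`, while the existing code in get_next_timer_interrupt() uses time_before_eq(). jiffies is an unsigned counter that wraps, so a plain comparison can give the wrong answer when one value lies before the wrap and the other after it. The sketch below is a user-space illustration of the signed-difference idiom behind the kernel's time_before() and time_before_eq() helpers. The names `toy_time_before` and `toy_time_before_eq` are hypothetical, and whether the patch should switch to the kernel helpers is a suggestion, not something stated in the posting.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Wraparound-tolerant "a is earlier than b" for a free-running counter.
 * Assumes the two values are less than half the counter range apart and
 * that converting the unsigned difference to long wraps in the usual
 * two's-complement way (implementation-defined in ISO C, but the common
 * behaviour).
 */
static bool toy_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Same idea for "a is earlier than or equal to b". */
static bool toy_time_before_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) <= 0;
}
```

For example, with a cached expiry of 5 (just after the wrap) and a new timer expiring at `ULONG_MAX - 2` (just before it), the plain test `expires < next_timer` is false, so the cached value would not be lowered, whereas `toy_time_before(ULONG_MAX - 2, 5)` is true.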


2009-07-22 14:38:45

by Thomas Gleixner

Subject: Re: [RFC][PATCH] cache __next_timer_interrupt result

On Tue, 21 Jul 2009, Martin Schwidefsky wrote:

> From: Martin Schwidefsky <[email protected]>
>
> Each time a cpu goes to sleep on a NOHZ=y system the timer wheel is
> searched for the next timer interrupt. It can take quite a few cycles
> to find the next pending timer. This patch adds a field to tvec_base
> that caches the result of __next_timer_interrupt. The hit ratio is
> around 80% on my thinkpad under normal use, on a server I've seen

Nice, I like it.

> hit ratios from 5% to 95% dependent on the workload.

Which workloads result in lower hit ratios? Heavy networking?

Thanks,

tglx

2009-07-22 16:02:53

by Martin Schwidefsky

Subject: Re: [RFC][PATCH] cache __next_timer_interrupt result

On Wed, 22 Jul 2009 16:38:18 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Tue, 21 Jul 2009, Martin Schwidefsky wrote:
>
> > From: Martin Schwidefsky <[email protected]>
> >
> > Each time a cpu goes to sleep on a NOHZ=y system the timer wheel is
> > searched for the next timer interrupt. It can take quite a few cycles
> > to find the next pending timer. This patch adds a field to tvec_base
> > that caches the result of __next_timer_interrupt. The hit ratio is
> > around 80% on my thinkpad under normal use, on a server I've seen
>
> Nice, I like it.

Thanks :-)

> > hit ratios from 5% to 95% dependent on the workload.
>
> Which workloads result in lower hit ratios? Heavy networking?

The 5% case was a ping-pong packet test over loopback between two CPUs.
So yes, networking.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.