2005-11-29 01:34:09

by Roman Zippel

[permalink] [raw]
Subject: [PATCH 1/9] timer locking optimization


This simplifies the work done lock_timer_base() and moves it to the less
common path at __mod_timer(). Instead of setting the base to NULL we
already set it to the new base there. If another __mod_timer() gets the
lock it can only move the base to another cpu, so it's enough to check
that base is still the same.
Only add_timer_on() could add timer on the current cpu, but this is not
legal, so we don't have to recheck that the timer has been reactivated
on the current cpu.

Signed-off-by: Roman Zippel <[email protected]>

---

kernel/timer.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)

Index: linux-2.6-mm/kernel/timer.c
===================================================================
--- linux-2.6-mm.orig/kernel/timer.c 2005-11-28 22:32:47.000000000 +0100
+++ linux-2.6-mm/kernel/timer.c 2005-11-28 23:04:46.000000000 +0100
@@ -178,27 +178,22 @@ static inline void detach_timer(struct t
*
* So __run_timers/migrate_timers can safely modify all timers which could
* be found on ->tvX lists.
- *
- * When the timer's base is locked, and the timer removed from list, it is
- * possible to set timer->base = NULL and drop the lock: the timer remains
- * locked.
*/
static timer_base_t *lock_timer_base(struct timer_list *timer,
unsigned long *flags)
{
timer_base_t *base;

- for (;;) {
- base = timer->base;
- if (likely(base != NULL)) {
- spin_lock_irqsave(&base->lock, *flags);
- if (likely(base == timer->base))
- return base;
- /* The timer has migrated to another CPU */
- spin_unlock_irqrestore(&base->lock, *flags);
- }
+ base = timer->base;
+ spin_lock_irqsave(&base->lock, *flags);
+ while (unlikely(base != timer->base)) {
+ /* The timer has migrated to another CPU */
+ spin_unlock(&base->lock);
cpu_relax();
+ base = timer->base;
+ spin_lock(&base->lock);
}
+ return base;
}

int __mod_timer(struct timer_list *timer, unsigned long expires)
@@ -212,6 +207,7 @@ int __mod_timer(struct timer_list *timer

base = lock_timer_base(timer, &flags);

+restart:
if (timer_pending(timer)) {
detach_timer(timer, 0);
ret = 1;
@@ -231,11 +227,16 @@ int __mod_timer(struct timer_list *timer
/* The timer remains on a former base */
new_base = container_of(base, tvec_base_t, t_base);
} else {
- /* See the comment in lock_timer_base() */
- timer->base = NULL;
+ /*
+ * We shortly release the timer and the timer can
+ * migrate to another cpu, so recheck the base after
+ * getting the lock.
+ */
+ timer->base = &new_base->t_base;
spin_unlock(&base->lock);
spin_lock(&new_base->t_base.lock);
- timer->base = &new_base->t_base;
+ if (unlikely(timer->base != &new_base->t_base))
+ goto restart;
}
}


2005-11-29 11:43:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 1/9] timer locking optimization

Roman Zippel wrote:
>
> + base = timer->base;
> + spin_lock_irqsave(&base->lock, *flags);
> + while (unlikely(base != timer->base)) {
> + /* The timer has migrated to another CPU */
> + spin_unlock(&base->lock);
> cpu_relax();
> + base = timer->base;
> + spin_lock(&base->lock);

This spins with interrupts disabled, not good, imho.

> int __mod_timer(struct timer_list *timer, unsigned long expires)
> @@ -212,6 +207,7 @@ int __mod_timer(struct timer_list *timer
>
> base = lock_timer_base(timer, &flags);
>
> +restart:
> if (timer_pending(timer)) {
> detach_timer(timer, 0);
> ret = 1;
> @@ -231,11 +227,16 @@ int __mod_timer(struct timer_list *timer
> /* The timer remains on a former base */
> new_base = container_of(base, tvec_base_t, t_base);
> } else {
> - /* See the comment in lock_timer_base() */
> - timer->base = NULL;
> + /*
> + * We shortly release the timer and the timer can
> + * migrate to another cpu, so recheck the base after
> + * getting the lock.
> + */
> + timer->base = &new_base->t_base;
> spin_unlock(&base->lock);
> spin_lock(&new_base->t_base.lock);
> - timer->base = &new_base->t_base;
> + if (unlikely(timer->base != &new_base->t_base))
> + goto restart;

This is not right.

This way you can delete the timer (ret == 1), notice that timer's base
was changed after re-locking, goto restart, and get ret == 0.

Also, you have wrong value of 'base' after 'goto restart'.

Oleg.

2005-11-29 12:17:29

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 1/9] timer locking optimization

Hi,

On Tue, 29 Nov 2005, Oleg Nesterov wrote:

> > + base = timer->base;
> > + spin_lock_irqsave(&base->lock, *flags);
> > + while (unlikely(base != timer->base)) {
> > + /* The timer has migrated to another CPU */
> > + spin_unlock(&base->lock);
> > cpu_relax();
> > + base = timer->base;
> > + spin_lock(&base->lock);
>
> This spins with interrupts disabled, not good, imho.

It's the slow path anyway, so restoring flags should indeed be fine.

> This way you can delete the timer (ret == 1), notice that timer's base
> was changed after re-locking, goto restart, and get ret == 0.

ret is only set, but not reset, so if __mod_timer() deleted the timer it
will return 1.

> Also, you have wrong value of 'base' after 'goto restart'.

Indeed, thanks for spotting this.

bye, Roman

2005-11-30 02:32:47

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 1/9] timer locking optimization

Hi,

On Tue, 29 Nov 2005, Oleg Nesterov wrote:

> Also, you have wrong value of 'base' after 'goto restart'.

Here is the updated patch, which fixes this.

bye, Roman

Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c 2005-11-29 13:29:35.000000000 +0100
+++ linux-2.6/kernel/timer.c 2005-11-29 14:01:42.000000000 +0100
@@ -178,27 +178,20 @@ static inline void detach_timer(struct t
*
* So __run_timers/migrate_timers can safely modify all timers which could
* be found on ->tvX lists.
- *
- * When the timer's base is locked, and the timer removed from list, it is
- * possible to set timer->base = NULL and drop the lock: the timer remains
- * locked.
*/
static timer_base_t *lock_timer_base(struct timer_list *timer,
unsigned long *flags)
{
timer_base_t *base;

- for (;;) {
- base = timer->base;
- if (likely(base != NULL)) {
- spin_lock_irqsave(&base->lock, *flags);
- if (likely(base == timer->base))
- return base;
- /* The timer has migrated to another CPU */
- spin_unlock_irqrestore(&base->lock, *flags);
- }
- cpu_relax();
- }
+again:
+ base = timer->base;
+ spin_lock_irqsave(&base->lock, *flags);
+ if (likely(base == timer->base))
+ return base;
+ /* The timer has migrated to another CPU */
+ spin_unlock_irqrestore(&base->lock, *flags);
+ goto again;
}

int __mod_timer(struct timer_list *timer, unsigned long expires)
@@ -210,6 +203,7 @@ int __mod_timer(struct timer_list *timer

BUG_ON(!timer->function);

+restart:
base = lock_timer_base(timer, &flags);

if (timer_pending(timer)) {
@@ -231,11 +225,18 @@ int __mod_timer(struct timer_list *timer
/* The timer remains on a former base */
new_base = container_of(base, tvec_base_t, t_base);
} else {
- /* See the comment in lock_timer_base() */
- timer->base = NULL;
+ /*
+ * We shortly release the timer and the timer can
+ * migrate to another cpu, so recheck the base after
+ * getting the lock.
+ */
+ timer->base = &new_base->t_base;
spin_unlock(&base->lock);
spin_lock(&new_base->t_base.lock);
- timer->base = &new_base->t_base;
+ if (unlikely(timer->base != &new_base->t_base)) {
+ spin_unlock_irqrestore(&new_base->t_base.lock, flags);
+ goto restart;
+ }
}
}

2005-11-30 08:44:24

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 1/9] timer locking optimization

Roman Zippel wrote:
>
> @@ -210,6 +203,7 @@ int __mod_timer(struct timer_list *timer
>
> BUG_ON(!timer->function);
>
> +restart:
> base = lock_timer_base(timer, &flags);
>
> if (timer_pending(timer)) {
> @@ -231,11 +225,18 @@ int __mod_timer(struct timer_list *timer
> /* The timer remains on a former base */
> new_base = container_of(base, tvec_base_t, t_base);
> } else {
> - /* See the comment in lock_timer_base() */
> - timer->base = NULL;
> + /*
> + * We shortly release the timer and the timer can
> + * migrate to another cpu, so recheck the base after
> + * getting the lock.
> + */
> + timer->base = &new_base->t_base;
> spin_unlock(&base->lock);
> spin_lock(&new_base->t_base.lock);

Still not correct, I beleive.

The problem is that you are changing timer->base = &new_base->t_base
without holding new_base->t_base.lock, this is racy vs timer_del().
Suppose we are calling __mod_timer(pending_timer):

__mod_timer() locks old base, deletes the timer, changes timer's base,
unlocks old base.

Another cpu calls del_timer(). It is possible that this cpu will
see the new value of ->base == new_base before it sees changes in the
timer->entry. It locks new_base, but this is not enough, because the
timer was removed from list under the old base's lock and we don't
have a proper serialization. So it is possible that del_timer() sees
that the timer is still pending, and will try to delete it again.

In other words, in this scenario __mod_timer() and del_timer() will
take 2 different locks trying to serialize access to common data.

You can solve this with memory barriers, but this will be pessimization,
not optimization (you will also need smp_rmb in lock_timer_base()).

Honestly, personally I don'like this patch even if it was correct.
It complicates the code, and the only win is that it removes
'if (likely(base != NULL))' from the fast path, I doubt this is
noticeable.

Also, __mod_timer() becomes "non atomic", but probably this is ok.

Btw, I think you have the same problems in "[PATCH 2/9] ptimer core".

Oleg.

2005-11-30 10:09:03

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 1/9] timer locking optimization

Roman Zippel wrote:
>
> int __mod_timer(struct timer_list *timer, unsigned long expires)
> @@ -210,6 +203,7 @@ int __mod_timer(struct timer_list *timer
>
> BUG_ON(!timer->function);
>
> +restart:
> base = lock_timer_base(timer, &flags);
>
> if (timer_pending(timer)) {

Another problem. __mode_timer() does:

if (timer_pending(timer)) {
detach_timer(timer, 0);

Note that 'clear_pending' parameter == 0. This means that detach_timer()
will remove the timer from list, but won't clear 'pending' status. So
this will crash after 'goto restart' (or in case of concurrent del_timer()).

Oleg.

2005-11-30 11:06:18

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 1/9] timer locking optimization

Hi,

On Wed, 30 Nov 2005, Oleg Nesterov wrote:

> Another problem. __mode_timer() does:
>
> if (timer_pending(timer)) {
> detach_timer(timer, 0);
>
> Note that 'clear_pending' parameter == 0. This means that detach_timer()
> will remove the timer from list, but won't clear 'pending' status. So
> this will crash after 'goto restart' (or in case of concurrent del_timer()).

I just noticed this too. I'll drop the patch and I'll also change the
second patch.
Thanks for looking into this.

bye, Roman

2005-11-30 15:07:32

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 1/9] timer locking optimization

Hi,

On Wed, 30 Nov 2005, Oleg Nesterov wrote:

> Still not correct, I beleive.

Here is a new idea, what do you think about using spin_trylock(), e.g.
something like:

if (spin_trylock(&new_base->t_base.lock)) {
timer->base = &new_base->t_base;
spin_unlock(&base->lock);
} else
new_base = container_of(base, tvec_base_t, t_base);

It's not like we must start the timer on the current cpu and this might
even be faster. If the new base is busy on another cpu, it's possible we
have to pull dirty cache lines from the other cpu, where we might already
have the data from the current base already in the cache from the detach.

bye, Roman

2005-11-30 17:02:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 1/9] timer locking optimization

Roman Zippel wrote:
>
> Hi,
>
> On Wed, 30 Nov 2005, Oleg Nesterov wrote:
>
> > Still not correct, I beleive.
>
> Here is a new idea, what do you think about using spin_trylock(), e.g.
> something like:
>
> if (spin_trylock(&new_base->t_base.lock)) {
> timer->base = &new_base->t_base;
> spin_unlock(&base->lock);
> } else
> new_base = container_of(base, tvec_base_t, t_base);
>
> It's not like we must start the timer on the current cpu and this might
> even be faster. If the new base is busy on another cpu, it's possible we
> have to pull dirty cache lines from the other cpu, where we might already
> have the data from the current base already in the cache from the detach.

... and this will simplify the code! I think this is a nice idea.
Something like this:

if (base != &new_base->t_base) {

if (unlikely(base->running_timer == timer) ||
!spin_trylock(&new_base->t_base.lock)) {
/* The timer remains on a former base */
new_base = container_of(base, tvec_base_t, t_base);
} else {
timer->base = &new_base->t_base;
spin_unlock(&base->lock);
}
}

But please update comments too, and don't forget to cc Ingo and Andrew
at least. In my opinion this patch should go separately from other ptimer
patches.

Oleg.