2005-05-01 07:48:36

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] fix __mod_timer vs __run_timers deadlock.

The bug was identified by Maneesh Soni.

When __mod_timer() changes timer's base it waits for the completion
of timer->function. It is just stupid: the caller of __mod_timer()
can held locks which would prevent completion of the timer's handler.

Solution: do not change the base of the currently running timer.

Side effect: __mod_timer() doesn't garantees anymore that timer will
run on the local cpu.

Signed-off-by: Oleg Nesterov <[email protected]>

--- rc2-mm3/kernel/timer.c~ 2005-04-30 18:43:56.000000000 +0400
+++ rc2-mm3/kernel/timer.c 2005-05-01 14:29:02.000000000 +0400
@@ -212,41 +212,39 @@ int __mod_timer(struct timer_list *timer
timer_base_t *base;
tvec_base_t *new_base;
unsigned long flags;
- int ret = -1;
+ int ret;

BUG_ON(!timer->function);
check_timer(timer);
-
- do {
- base = lock_timer_base(timer, &flags);
- new_base = &__get_cpu_var(tvec_bases);
-
- /* Ensure the timer is serialized. */
- if (base != &new_base->t_base
- && base->running_timer == timer)
- goto unlock;
-
- ret = 0;
- if (timer_pending(timer)) {
- detach_timer(timer, 0);
- ret = 1;
- }
-
- if (base != &new_base->t_base) {
- timer->base = NULL;
- /* Safe: the timer can't be seen via ->entry,
- * and lock_timer_base checks ->base != 0. */
- spin_unlock(&base->lock);
- base = &new_base->t_base;
- spin_lock(&base->lock);
- timer->base = base;
- }
-
- timer->expires = expires;
- internal_add_timer(new_base, timer);
-unlock:
- spin_unlock_irqrestore(&base->lock, flags);
- } while (ret < 0);
+
+ base = lock_timer_base(timer, &flags);
+
+ ret = 0;
+ if (timer_pending(timer)) {
+ detach_timer(timer, 0);
+ ret = 1;
+ }
+
+ new_base = &__get_cpu_var(tvec_bases);
+
+ if (base != &new_base->t_base) {
+ if (unlikely(base->running_timer == timer))
+ /* Don't change timer's base while it is running.
+ * Needed for serialization of timer wrt itself. */
+ new_base = container_of(base, tvec_base_t, t_base);
+ else {
+ timer->base = NULL;
+ /* Safe: the timer can't be seen via ->entry,
+ * and lock_timer_base checks ->base != 0. */
+ spin_unlock(&base->lock);
+ spin_lock(&new_base->t_base.lock);
+ timer->base = &new_base->t_base;
+ }
+ }
+
+ timer->expires = expires;
+ internal_add_timer(new_base, timer);
+ spin_unlock_irqrestore(&new_base->t_base.lock, flags);

return ret;
}


2005-05-01 09:32:32

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

Oleg Nesterov <[email protected]> wrote:
>
> When __mod_timer() changes timer's base it waits for the completion
> of timer->function. It is just stupid: the caller of __mod_timer()
> can held locks which would prevent completion of the timer's handler.
>
> Solution: do not change the base of the currently running timer.

OK, fingers crossed. Juergen, it would be great if you could test Oleg's
patch sometime.

Oleg, could you please review the changelog in
timers-fixes-improvements.patch sometime, make sure that it's all accurate
and complete?

Also, please review the comments in timer.c - some of the info which you
placed in that changelog really should be brought into the code itself so
people can understand the data structures and their interactions without
having to trawl the changelogs and mailing list archives.

For example, in __mod_timer:

if (timer_pending(timer)) {
detach_timer(timer, 0);
ret = 1;
}

new_base = &__get_cpu_var(tvec_bases);

if (base != &new_base->t_base) {
/*
* We really need a comment here explaining what
* (base != &new_base->t_base) *means*
*/
if (unlikely(base->running_timer == timer)) {
/*

2005-05-02 04:10:36

by Maneesh Soni

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

On Sun, May 01, 2005 at 11:55:33AM +0400, Oleg Nesterov wrote:
> The bug was identified by Maneesh Soni.
>

Thanks, to Sharyathi too as he is the first to see the problem and
provided kdump to us. I just did initial kdump analysis.

Sharyathi, it would be great if you can test the following patch in your
setup.

Thanks
Maneesh


> When __mod_timer() changes timer's base it waits for the completion
> of timer->function. It is just stupid: the caller of __mod_timer()
> can held locks which would prevent completion of the timer's handler.
>
> Solution: do not change the base of the currently running timer.
>
> Side effect: __mod_timer() doesn't garantees anymore that timer will
> run on the local cpu.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- rc2-mm3/kernel/timer.c~ 2005-04-30 18:43:56.000000000 +0400
> +++ rc2-mm3/kernel/timer.c 2005-05-01 14:29:02.000000000 +0400
> @@ -212,41 +212,39 @@ int __mod_timer(struct timer_list *timer
> timer_base_t *base;
> tvec_base_t *new_base;
> unsigned long flags;
> - int ret = -1;
> + int ret;
>
> BUG_ON(!timer->function);
> check_timer(timer);
> -
> - do {
> - base = lock_timer_base(timer, &flags);
> - new_base = &__get_cpu_var(tvec_bases);
> -
> - /* Ensure the timer is serialized. */
> - if (base != &new_base->t_base
> - && base->running_timer == timer)
> - goto unlock;
> -
> - ret = 0;
> - if (timer_pending(timer)) {
> - detach_timer(timer, 0);
> - ret = 1;
> - }
> -
> - if (base != &new_base->t_base) {
> - timer->base = NULL;
> - /* Safe: the timer can't be seen via ->entry,
> - * and lock_timer_base checks ->base != 0. */
> - spin_unlock(&base->lock);
> - base = &new_base->t_base;
> - spin_lock(&base->lock);
> - timer->base = base;
> - }
> -
> - timer->expires = expires;
> - internal_add_timer(new_base, timer);
> -unlock:
> - spin_unlock_irqrestore(&base->lock, flags);
> - } while (ret < 0);
> +
> + base = lock_timer_base(timer, &flags);
> +
> + ret = 0;
> + if (timer_pending(timer)) {
> + detach_timer(timer, 0);
> + ret = 1;
> + }
> +
> + new_base = &__get_cpu_var(tvec_bases);
> +
> + if (base != &new_base->t_base) {
> + if (unlikely(base->running_timer == timer))
> + /* Don't change timer's base while it is running.
> + * Needed for serialization of timer wrt itself. */
> + new_base = container_of(base, tvec_base_t, t_base);
> + else {
> + timer->base = NULL;
> + /* Safe: the timer can't be seen via ->entry,
> + * and lock_timer_base checks ->base != 0. */
> + spin_unlock(&base->lock);
> + spin_lock(&new_base->t_base.lock);
> + timer->base = &new_base->t_base;
> + }
> + }
> +
> + timer->expires = expires;
> + internal_add_timer(new_base, timer);
> + spin_unlock_irqrestore(&new_base->t_base.lock, flags);
>
> return ret;
> }

--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990

2005-05-02 22:50:37

by Juergen Kreileder

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

Andrew Morton <[email protected]> writes:

> Oleg Nesterov <[email protected]> wrote:
>>
>> When __mod_timer() changes timer's base it waits for the completion
>> of timer->function. It is just stupid: the caller of __mod_timer()
>> can held locks which would prevent completion of the timer's
>> handler.
>>
>> Solution: do not change the base of the currently running timer.
>
> OK, fingers crossed. Juergen, it would be great if you could test
> Oleg's patch sometime.

I had one more lockup yesterday but that probably was caused by
something else because Azureus is running fine for 24 hours now.

Thanks Oleg!


Juergen

--
Juergen Kreileder, Blackdown Java-Linux Team
http://blog.blackdown.de/

2005-05-03 00:16:18

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

On Tue, 2005-05-03 at 00:50 +0200, Juergen Kreileder wrote:
> Andrew Morton <[email protected]> writes:
>
> > Oleg Nesterov <[email protected]> wrote:
> >>
> >> When __mod_timer() changes timer's base it waits for the completion
> >> of timer->function. It is just stupid: the caller of __mod_timer()
> >> can held locks which would prevent completion of the timer's
> >> handler.
> >>
> >> Solution: do not change the base of the currently running timer.
> >
> > OK, fingers crossed. Juergen, it would be great if you could test
> > Oleg's patch sometime.
>
> I had one more lockup yesterday but that probably was caused by
> something else because Azureus is running fine for 24 hours now.

Well, there may be other issues brought by this new timer code though.
I'm running G5s regulary without a lockup or anything for weeks, so it
would be interesting if you could try to find out what's involved in
that other lockup you had.

Ben


2005-05-03 01:24:20

by Lee Revell

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

On Tue, 2005-05-03 at 10:13 +1000, Benjamin Herrenschmidt wrote:
> Well, there may be other issues brought by this new timer code though.
> I'm running G5s regulary without a lockup or anything for weeks, so it
> would be interesting if you could try to find out what's involved in
> that other lockup you had.

It seems like it would not be to hard to create a timer test suite that
just hammers the timer subsystem, creating and deleting and modifying
zillions of timers, changing the system time, etc. Combined with
running with HZ=10000 or something it seems like you could shake out
bugs a lot faster than just running & waiting for a race to show up.

I've seen timer related issues (the set_rtc_mmss issue that George
Anzinger fixed) while testing the RT patchset that I could only ever
reproduce once.

Lee


2005-05-03 01:31:53

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

On Mon, 2005-05-02 at 21:24 -0400, Lee Revell wrote:
> On Tue, 2005-05-03 at 10:13 +1000, Benjamin Herrenschmidt wrote:
> > Well, there may be other issues brought by this new timer code though.
> > I'm running G5s regulary without a lockup or anything for weeks, so it
> > would be interesting if you could try to find out what's involved in
> > that other lockup you had.
>
> It seems like it would not be to hard to create a timer test suite that
> just hammers the timer subsystem, creating and deleting and modifying
> zillions of timers, changing the system time, etc. Combined with
> running with HZ=10000 or something it seems like you could shake out
> bugs a lot faster than just running & waiting for a race to show up.

Well, yes and know... the timer subsytem is very senstivie to things
like memory barriers issues, and thus changes in the "barrier" semantics
of a timer call may have an impact on the caller code regardless of the
validity of the timer code itself, that sort of thing ...

> I've seen timer related issues (the set_rtc_mmss issue that George
> Anzinger fixed) while testing the RT patchset that I could only ever
> reproduce once.


2005-05-03 01:32:50

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

On Mon, 2 May 2005, Lee Revell wrote:

> On Tue, 2005-05-03 at 10:13 +1000, Benjamin Herrenschmidt wrote:
> > Well, there may be other issues brought by this new timer code though.
> > I'm running G5s regulary without a lockup or anything for weeks, so it
> > would be interesting if you could try to find out what's involved in
> > that other lockup you had.
>
> It seems like it would not be to hard to create a timer test suite that
> just hammers the timer subsystem, creating and deleting and modifying
> zillions of timers, changing the system time, etc. Combined with
> running with HZ=10000 or something it seems like you could shake out
> bugs a lot faster than just running & waiting for a race to show up.
>
> I've seen timer related issues (the set_rtc_mmss issue that George
> Anzinger fixed) while testing the RT patchset that I could only ever
> reproduce once.

Heavy networking load should do it, something like volanomark actually
does quite a decent job of stressing timers.

2005-05-03 01:37:10

by Juergen Kreileder

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

Benjamin Herrenschmidt <[email protected]> writes:

> On Tue, 2005-05-03 at 00:50 +0200, Juergen Kreileder wrote:
>> Andrew Morton <[email protected]> writes:
>>
>>> Oleg Nesterov <[email protected]> wrote:
>>>>
>>>> When __mod_timer() changes timer's base it waits for the
>>>> completion of timer->function. It is just stupid: the caller of
>>>> __mod_timer() can held locks which would prevent completion of
>>>> the timer's handler.
>>>>
>>>> Solution: do not change the base of the currently running timer.
>>>
>>> OK, fingers crossed. Juergen, it would be great if you could test
>>> Oleg's patch sometime.
>>
>> I had one more lockup yesterday but that probably was caused by
>> something else because Azureus is running fine for 24 hours now.
>
> Well, there may be other issues brought by this new timer code
> though. I'm running G5s regulary without a lockup or anything for
> weeks, so it would be interesting if you could try to find out
> what's involved in that other lockup you had.

Sure, if I find a way to reproduce it. It happened only once so far.


BTW, xmon doesn't work for me. 'echo x > /proc/sysrq-trigger' gives
me a :mon> prompt but I can't enter any commands.


Juergen

--
Juergen Kreileder, Blackdown Java-Linux Team
http://blog.blackdown.de/

2005-05-03 02:43:06

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

Juergen Kreileder writes:

> BTW, xmon doesn't work for me. 'echo x > /proc/sysrq-trigger' gives
> me a :mon> prompt but I can't enter any commands.

We don't have polled interrupts-off input methods for USB keyboards.
If you get a "stealth" serial port for your G5 you can run xmon over
that.

Paul.

2005-05-03 19:02:31

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

On Tue, 3 May 2005 12:48:40 +1000
Paul Mackerras <[email protected]> wrote:

> Juergen Kreileder writes:
>
> > BTW, xmon doesn't work for me. 'echo x > /proc/sysrq-trigger' gives
> > me a :mon> prompt but I can't enter any commands.
>
> We don't have polled interrupts-off input methods for USB keyboards.

There is nothing in the USB nor INPUT layers really blocking an
implementation of this BTW.

I need this on Sparc64 as well, so that PROM command line input
works after booting with USB keyboards.

2005-05-03 23:51:03

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

On Wed, 04 May 2005 09:44:53 +1000
Benjamin Herrenschmidt <[email protected]> wrote:

> Nothing prevents it ? well, I wouldn't be that optimistic :) The USB
> stuff is a bit complex, it inlcudes doing DMAs, so manipulating the
> iommu, dealing with URB queues (and thus allocating/releasing them)
> etc... and especially in the context of xmon, that mean letting the
> driver do a lot of these at any time whatever state the system is...

I think doing calls to the USB interrupt handler would work.
I'm not being crazy :)

But yeah we could do a micro-stack as well, but as you noted the
transfer to/from the real USB HCI driver would be non-trivial.

I truly believe just calling the real USB HCI driver interrupt
handler in a polling fashion is the way to go.

2005-05-03 23:47:41

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

On Tue, 2005-05-03 at 11:51 -0700, David S. Miller wrote:
> On Tue, 3 May 2005 12:48:40 +1000
> Paul Mackerras <[email protected]> wrote:
>
> > Juergen Kreileder writes:
> >
> > > BTW, xmon doesn't work for me. 'echo x > /proc/sysrq-trigger' gives
> > > me a :mon> prompt but I can't enter any commands.
> >
> > We don't have polled interrupts-off input methods for USB keyboards.
>
> There is nothing in the USB nor INPUT layers really blocking an
> implementation of this BTW.
>
> I need this on Sparc64 as well, so that PROM command line input
> works after booting with USB keyboards.

Nothing prevents it ? well, I wouldn't be that optimistic :) The USB
stuff is a bit complex, it inlcudes doing DMAs, so manipulating the
iommu, dealing with URB queues (and thus allocating/releasing them)
etc... and especially in the context of xmon, that mean letting the
driver do a lot of these at any time whatever state the system is...

So it's possible, but will probably be difficult and not very reliable.

An alternative is to take over the OHCI chip completely and implement a
micro-stack that only does interrupt pipe polling and hard decodes kbd
data based on a standard boot protocol layout... Handing the controller
back to normal USB operations may prove a bit difficult here though.

Ben.


2005-05-04 00:07:48

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] fix __mod_timer vs __run_timers deadlock.

On Tue, 2005-05-03 at 16:39 -0700, David S. Miller wrote:
> On Wed, 04 May 2005 09:44:53 +1000
> Benjamin Herrenschmidt <[email protected]> wrote:
>
> > Nothing prevents it ? well, I wouldn't be that optimistic :) The USB
> > stuff is a bit complex, it inlcudes doing DMAs, so manipulating the
> > iommu, dealing with URB queues (and thus allocating/releasing them)
> > etc... and especially in the context of xmon, that mean letting the
> > driver do a lot of these at any time whatever state the system is...
>
> I think doing calls to the USB interrupt handler would work.
> I'm not being crazy :)

Maybe, but it will trigger all other USB drivers around and really
weird things might happen.. Oh well, I may give it a try one of
these days anyway. I could probably even do some hack to force the
OHCI to only service incoming interrupt URBs for input devices or
such things...

> But yeah we could do a micro-stack as well, but as you noted the
> transfer to/from the real USB HCI driver would be non-trivial.
>
> I truly believe just calling the real USB HCI driver interrupt
> handler in a polling fashion is the way to go.

But it feels very fragile... Oh well, I will experiment with that one of
these days (no time right now though).

Ben.