2015-05-05 13:31:04

by Jiri Bohac

Subject: running hrtimer_start on an already active hrtimer?

Hi,


I came across a strange bug (in a very old kernel) that triggers the
BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
in __run_hrtimer().

The code runs hrtimer_start() on an already started hrtimer.
Looking at the description of hrtimer_start() it looks
like something that is allowed:
/**
* hrtimer_start - (re)start an hrtimer on the current CPU
...
* Returns:
* 0 on success
* 1 when the timer was active

Is this really supposed to work?

I think it's not immune to this race condition:

CPU0                                          CPU1
__run_hrtimer()
  __remove_hrtimer(...HRTIMER_STATE_CALLBACK)
    // clears HRTIMER_STATE_ENQUEUED
  ...
  raw_spin_unlock(&cpu_base->lock);
  restart = fn(timer);
                                              hrtimer_start()
                                                __hrtimer_start_range_ns()
                                                  // remove_hrtimer() does nothing because
                                                  // HRTIMER_STATE_ENQUEUED is not set
                                                  enqueue_hrtimer()
  raw_spin_lock(&cpu_base->lock);
  ...
  BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
  // state has HRTIMER_STATE_ENQUEUED set



Should __hrtimer_start_range_ns() do something like
hrtimer_cancel(), i.e. explicitly check for ...
HRTIMER_STATE_CALLBACK?


Thanks,

--
Jiri Bohac <[email protected]>
SUSE Labs, SUSE CZ


2015-05-05 16:09:07

by Thomas Gleixner

Subject: Re: running hrtimer_start on an already active hrtimer?

On Tue, 5 May 2015, Jiri Bohac wrote:
> Hi,
>
>
> I came across a strange bug (in a very old kernel) that triggers the
> BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
> in __run_hrtimer().
>
> The code runs hrtimer_start() on an already started hrtimer.
> Looking at the description of hrtimer_start() it looks
> like something that is allowed:
> /**
> * hrtimer_start - (re)start an hrtimer on the current CPU
> ...
> * Returns:
> * 0 on success
> * 1 when the timer was active
>
> Is this really supposed to work?
>
> I think it's not immune to this race condition:
>
> CPU0                                          CPU1
> __run_hrtimer()
>   __remove_hrtimer(...HRTIMER_STATE_CALLBACK)
>     // clears HRTIMER_STATE_ENQUEUED
>   ...
>   raw_spin_unlock(&cpu_base->lock);
>   restart = fn(timer);
>                                               hrtimer_start()
>                                                 __hrtimer_start_range_ns()
>                                                   // remove_hrtimer() does nothing because
>                                                   // HRTIMER_STATE_ENQUEUED is not set
>                                                   enqueue_hrtimer()
>   raw_spin_lock(&cpu_base->lock);
>   ...
>   BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
>   // state has HRTIMER_STATE_ENQUEUED set
>

That's in the conditional path:

	if (restart != HRTIMER_NORESTART) {
		BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
		....

That was intentional when we implemented hrtimers in the very
beginning: we wanted to enforce that a restart from the callback is not
mixed with a start from some other place.


We removed that restriction recently (queued for 4.2 in
tip/timers/core).

> Should __hrtimer_start_range_ns() do something like
> hrtimer_cancel(), i.e. explicitly check for ...
> HRTIMER_STATE_CALLBACK?

No, you cannot do anything about it other than lifting the restriction
or preventing the site which handles the hrtimer from starting it.

Thanks,

tglx