Andrew Morton wrote:
>
> If we're prepared to rule that a timer handler is not allowed to do
> add_timer_on() then a recurring timer is permanently pinned to a CPU, isn't
> it?
>
> That should make things simpler?
In that case I think that both problems (race and scalability)
can be solved without increasing sizeof(timer_list).
What if timer_list had ->pending field? Then we can do:
timer_pending:
return timer->pending;
__mod_timer:
internal_add_timer(new_base, timer);
timer->base = new_base;
timer->pending = 1;
__run_timers:
list_del(&timer->entry);
set_running_timer(base, timer);
/* do not change timer->base */
timer->pending = 0;
spin_unlock(base->lock);
timer->function();
del_timer:
if (!timer->pending)
return 0;
base = timer->base;
...
del_timer_sync:
base = timer->base;
if (!base)
return 0;
spin_lock(base->lock);
if (base != timer->base)
goto del_again;
if (base->running_timer == timer)
goto del_again;
if (timer->pending)
list_del(&timer->entry);
timer->pending = 0;
timer->base = NULL;
The ->pending flag could live in the least significant bit of
timer->base, this way we:
1. do not waste the space
2. can read/write base+pending atomically
These patches are incomplete/suboptimal, just a proof of concept.
Oleg.
I like the idea and it would solve the concerns that we had. The encoding
of a bit in a pointer is weird but we have done that before in the
page_struct.
However, this also means that __run_timers will not free up the timer and
it has to be explicitly freed with del_timer_??. There may be code
that relies on not having to delete a single shot timer after it has been
fired.
Christoph Lameter wrote:
>
> However, this also means that __run_timers will not free up the timer and
> it has to be explicitly freed with del_timer_??.
I am not sure I understand you but no, del_timer{,_sync} is not needed.
__run_timer deletes timer from base->tv? list and clears 'pending flag'.
__del_timer_sync sets ->_base = NULL, but it is merely optimization.
It could set ->_base = base, but in that case next del_timer_sync()
call will need spin_lock(base->lock) again.
Oleg.
On Tue, 15 Mar 2005, Oleg Nesterov wrote:
> Christoph Lameter wrote:
> >
> > However, this also means that __run_timers will not free up the timer and
> > it has to be explicitly freed with del_timer_??.
>
> I am not sure I understand you but no, del_timer{,_sync} is not needed.
>
> __run_timer deletes timer from base->tv? list and clears 'pending flag'.
>
> __del_timer_sync sets ->_base = NULL, but it is merely optimization.
> It could set ->_base = base, but in that case next del_timer_sync()
> call will need spin_lock(base->lock) again.
For some reason I thought that ->base == NULL would have special
significance outside of the function you discussed. Looks fine to me now.
Andrew Morton wrote:
>
> If we're prepared to rule that a timer handler is not allowed to do
> add_timer_on() then a recurring timer is permanently pinned to a CPU, isn't
> it?
>
> That should make things simpler?
I think that current inplementation of del_timer_sync() don't like
add_timer_on() too.
Consider the timer running on CPU_0. It sets timer->expires = jiffies,
and calls add_timer_on(1). Now it is possible that local timer interrupt
on CPU_1 happens and starts that timer before timer->function returns on
CPU_0.
del_timer_sync() detects that timer is running on CPU_0, waits while
->running_timer == timer, and returns. The timer still runs on CPU_1.
Oleg.