2005-03-19 17:25:07

by Oleg Nesterov

Subject: [PATCH 0/5] timers: description

Hello.

These patches are an updated version of the two 'del_timer_sync: proof
of concept' patches.

1/5:
unchanged.

2/5:
del_timer_sync() is simplified. It is not necessary to unlock and
retry if __TIMER_PENDING has changed; it is only necessary if the
timer's base, (timer->_base & ~1), has changed. Also, the comments
are updated.
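
The retry rule in 2/5 can be illustrated with a small userspace model.
This is only a sketch, not the kernel code: it assumes that bit 0 of
timer->_base is the pending flag (as the "& ~1" above suggests), and the
names struct timer_model, timer_set() and must_retry() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit 0 of _base carries the pending flag. */
#define TIMER_PENDING_BIT ((uintptr_t)1)

struct tvec_base { int dummy; };	/* stand-in for the per-CPU base */

struct timer_model {
	uintptr_t _base;		/* base pointer, pending flag in bit 0 */
};

/* Store a base pointer together with the pending flag. */
static void timer_set(struct timer_model *t, struct tvec_base *base, int pending)
{
	/* The pointer must leave bit 0 free for the flag. */
	assert(((uintptr_t)base & TIMER_PENDING_BIT) == 0);
	t->_base = (uintptr_t)base | (pending ? TIMER_PENDING_BIT : 0);
}

/* The base is _base with the flag bit masked off. */
static struct tvec_base *timer_base(const struct timer_model *t)
{
	return (struct tvec_base *)(t->_base & ~TIMER_PENDING_BIT);
}

static int timer_pending(const struct timer_model *t)
{
	return (t->_base & TIMER_PENDING_BIT) != 0;
}

/*
 * After taking the lock of 'locked', a retry is needed only if the
 * timer has moved to another base; a change of the pending flag alone
 * does not require one.
 */
static int must_retry(const struct timer_model *t, const struct tvec_base *locked)
{
	return timer_base(t) != locked;
}
```

In this model, clearing or setting the pending bit leaves must_retry()
false, while moving the timer to a different base makes it true.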

3/5:
The reworked del_timer_sync() can't work unless each timer is
serialized with respect to itself. Timers are not: I missed the
fact that __mod_timer() can change a timer's base while the timer
is running.

4/5:
Remove the memory barrier in __run_timers() and del_timer().

5/5:
Kill the ugly __get_base(); it was temporary.


The del_singleshot_timer_sync() function is now unneeded, but it serves
as an additional test for del_timer_sync(), so it will be removed later.

Btw, add_timer_on() is racy against __mod_timer(), is it worth fixing?

Oleg.


2005-03-27 10:02:24

by Oleg Nesterov

Subject: Re: [PATCH 0/5] timers: description

"Chen, Kenneth W" wrote:
>
> Oleg Nesterov wrote on March 19, 2005 17:28:48
> > These patches are an updated version of the two 'del_timer_sync:
> > proof of concept' patches.
>
> I changed schedule_timeout() to call the new del_timer_sync instead of
> the current del_singleshot_timer_sync in an attempt to stress this set
> of patches a bit more, and I just observed a kernel hang.
>
> The symptom starts with lost network connectivity. It looks like the
> entire ethernet connections were gone, followed by blank screen on the
> console. I'm not sure whether it is a hard or soft hang, but system
> is inaccessible (blank screen and no network connection). I'm forced
> to do a reboot when that happens.

Very strange. I am running 2.6.11 + timer patches +
#define del_singleshot_timer_sync(t) del_timer_sync(t)
without any problems.

This timer is private to schedule_timeout(), it can't change
base, so del_timer_sync() should be "obviously correct" in
that case.

What kernel version? Could you try this stupid patch?

Oleg.

--- TST/kernel/timer.c~ 2005-03-27 16:47:20.000000000 +0400
+++ TST/kernel/timer.c 2005-03-27 17:16:32.000000000 +0400
@@ -352,27 +352,46 @@ EXPORT_SYMBOL(del_timer);
  */
 int del_timer_sync(struct timer_list *timer)
 {
+	unsigned long tout;
+	int running = 0, migrated = 0, done = 0;
 	int ret;

 	check_timer(timer);

+	preempt_disable();
+	tout = jiffies + 10;
+
 	ret = 0;
 	for (;;) {
 		unsigned long flags;
 		tvec_base_t *base;

 		base = timer_base(timer);
-		if (!base)
+		if (!base) {
+			preempt_enable();
 			return ret;
+		}
+		if (time_after(jiffies, tout)) {
+			preempt_enable();
+			printk(KERN_ERR "del_timer_sync hang: %d %d %d %d\n",
+				running, migrated, done, ret);
+			dump_stack();
+			return 0;
+		}

 		spin_lock_irqsave(&base->lock, flags);

-		if (base->running_timer == timer)
+		if (base->running_timer == timer) {
+			++running;
 			goto unlock;
+		}

-		if (timer_base(timer) != base)
+		if (timer_base(timer) != base) {
+			++migrated;
 			goto unlock;
+		}

+		++done;
 		if (timer_pending(timer)) {
 			list_del(&timer->entry);
 			ret = 1;
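
The debug patch above bounds the loop with time_after(jiffies, tout).
As a rough userspace sketch of why that comparison stays correct when
the tick counter wraps, the model below uses the same signed-difference
idea as the kernel macro, with the type checks omitted. The names
time_after_model() and deadline_after() are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/*
 * Wrap-safe "a is after b": the unsigned difference is reinterpreted
 * as signed, so the result is right as long as the two values are
 * less than half the counter range apart.
 */
static int time_after_model(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Compute a deadline 'delta' ticks after 'now'; the sum may wrap. */
static unsigned long deadline_after(unsigned long now, unsigned long delta)
{
	/* The comparison above is only meaningful for such deltas. */
	assert(delta <= (unsigned long)LONG_MAX);
	return now + delta;
}
```

With now close to ULONG_MAX, the deadline wraps to a small number, so a
plain "now > deadline" test would report a timeout immediately, while
time_after_model(now, deadline) still reports that it has not passed.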

2005-03-29 11:21:12

by Oleg Nesterov

Subject: Re: [PATCH 0/5] timers: description

Christoph Lameter wrote:
>
> On Sat, 26 Mar 2005, Chen, Kenneth W wrote:
>
> > I changed schedule_timeout() to call the new del_timer_sync instead of
> > the current del_singleshot_timer_sync in an attempt to stress this set
> > of patches a bit more, and I just observed a kernel hang.
> >
> > The symptom starts with lost network connectivity. It looks like the
> > entire ethernet connections were gone, followed by blank screen on the
> > console. I'm not sure whether it is a hard or soft hang, but system
> > is inaccessible (blank screen and no network connection). I'm forced
> > to do a reboot when that happens.
>
> Same problems here with occasional hangs w/o changes to schedule_timeout.

Bad. Are you running 2.6.12-rc1-mm1?

Oleg.

2005-03-29 14:47:16

by Christoph Lameter

Subject: Re: [PATCH 0/5] timers: description

On Tue, 29 Mar 2005, Oleg Nesterov wrote:

> > Same problems here with occasional hangs w/o changes to schedule_timeout.
>
> Bad. Are you running 2.6.12-rc1-mm1?

Not sure if this is really related to your patches. It's 2.6.11 with your
patches extracted from mm.