2001-04-13 10:42:19

by Mircea Damian

Subject: No one wants to help me :-(


Hello,


I was expecting to receive some replies to my last desperate messages:
http://www.mail-archive.com/[email protected]/msg35446.html
http://www.mail-archive.com/[email protected]/msg36591.html


My machine is dying in add_timer(). It seems to happen only on SMP
machines and is something related to the network driver. For some reason
one of the timer lists gets broken so we (we are two people trying to
solve this issue) wrote a "safe" timer.c which tries to rebuild the chain
in case it hits a NULL pointer.

The machine is (of course) slower with this patch but at least it works.

Maybe someone can see which is the real bug and fix it.

Please help!

--
Mircea Damian
E-mails: [email protected], [email protected]
WebPage: http://taz.mania.k.ro/~dmircea/


Attachments:
(No filename) (794.00 B)
timer.c (23.86 kB)

2001-04-13 18:14:56

by Brian Gerst

Subject: Re: No one wants to help me :-(

diff -urN linux-2.4.3/kernel/timer.c linux/kernel/timer.c
--- linux-2.4.3/kernel/timer.c Thu Dec 14 20:52:22 2000
+++ linux/kernel/timer.c Fri Apr 13 13:26:08 2001
@@ -194,6 +194,7 @@
 	if (!timer_pending(timer))
 		return 0;
 	list_del(&timer->list);
+	timer->list.next = timer->list.prev = NULL;
 	return 1;
 }

@@ -217,7 +218,6 @@

 	spin_lock_irqsave(&timerlist_lock, flags);
 	ret = detach_timer(timer);
-	timer->list.next = timer->list.prev = NULL;
 	spin_unlock_irqrestore(&timerlist_lock, flags);
 	return ret;
 }
@@ -246,7 +246,6 @@

 		spin_lock_irqsave(&timerlist_lock, flags);
 		ret += detach_timer(timer);
-		timer->list.next = timer->list.prev = 0;
 		running = timer_is_running(timer);
 		spin_unlock_irqrestore(&timerlist_lock, flags);

@@ -309,7 +308,6 @@
 			data= timer->data;

 			detach_timer(timer);
-			timer->list.next = timer->list.prev = NULL;
 			timer_enter(timer);
 			spin_unlock_irq(&timerlist_lock);
 			fn(data);


Attachments:
detach_timer.diff (953.00 B)

2001-04-13 22:07:26

by George Anzinger

Subject: Re: No one wants to help me :-(

Brian Gerst wrote:
>
> Mircea Damian wrote:
> >
> > Hello,
> >
> > I was expecting to receive some replies to my last desperate messages:
> > http://www.mail-archive.com/[email protected]/msg35446.html
> > http://www.mail-archive.com/[email protected]/msg36591.html
> >
> > My machine is dying in add_timer(). It seems to happen only on SMP
> > machines and is something related to the network driver. For some reason
> > one of the timer lists gets broken so we (we are two people trying to
> > solve this issue) wrote a "safe" timer.c which tries to rebuild the chain
> > in case it hits a NULL pointer.
> >
> > The machine is (of course) slower with this patch but at least it works.
> >
> > Maybe someone can see which is the real bug and fix it.
> >
> > Please help!
>
> I found (at least part of) the problem. In detach_timer() we test if
> the timer is pending. If it is not the function does not remove the
> timer from the list and returns 0. The functions that call
> detach_timer() do not check the return value and unconditionally set the
> list pointers to NULL, even though the timer is still on the list.
> Patch against 2.4.3 attached, but there may be a better solution.
>
Uh, but the pending test is itself a check for NULL pointers, so while
your change consolidates some code, I don't think it changes any behavior,
unless you can find a place that calls detach_timer() and doesn't clear
the pointers...
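
A minimal userspace sketch of the point above, under stated assumptions:
struct list_head, list_add(), list_del() and timer_pending() are simplified
stand-ins for the 2.4 kernel versions, locking is left out, and the names
detach_then_clear() and detach_and_clear() are hypothetical, introduced only
for this illustration. It suggests why, if timer_pending() is just a NULL
test on list.next, clearing the pointers in every caller or inside
detach_timer() should leave the timer in the same state.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel types (assumption:
 * trimmed to the fields this example needs, no locking). */
struct list_head { struct list_head *next, *prev; };
struct timer_list { struct list_head list; unsigned long expires; };

/* Insert entry right after head in a circular doubly linked list. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from its list; entry's own pointers are left stale. */
static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* A timer counts as pending exactly when its next pointer is non-NULL. */
static int timer_pending(const struct timer_list *timer)
{
	return timer->list.next != NULL;
}

/* Unpatched shape: detach if pending, then clear unconditionally.
 * (Hypothetical name; models detach_timer() plus the caller's clear.) */
static int detach_then_clear(struct timer_list *timer)
{
	int ret = 0;

	if (timer_pending(timer)) {
		list_del(&timer->list);
		ret = 1;
	}
	/* If the timer was not pending, next is already NULL here. */
	timer->list.next = timer->list.prev = NULL;
	return ret;
}

/* Patched shape: clear only on the path that actually unlinked.
 * (Hypothetical name; models detach_timer() after the patch.) */
static int detach_and_clear(struct timer_list *timer)
{
	if (!timer_pending(timer))
		return 0;
	list_del(&timer->list);
	timer->list.next = timer->list.prev = NULL;
	return 1;
}
```

In this model the two variants can only differ for a timer whose next
pointer is NULL while its prev pointer is not, which none of the paths
shown here produce.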

For what it's worth, my look at this problem seems to indicate that the
new timer pointer is zero. This would be a problem in the network code
somewhere. I would guess that the whole structure is being released by
CPU X while CPU Y is trying to set up a timer. But then I don't really
know the network code (at all). I am just going by the error, a deref of
zero in add_timer(), which only looks at the timer to verify that its
list pointers are zero.
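
A rough userspace model of that reasoning, not the kernel code itself:
checked_add_timer() and the add_result values are hypothetical names made up
for this sketch, the types are trimmed stand-ins, and the tail insert is an
assumption about where the timer would be queued. The only thing the add
path reads from the timer before queueing it is list.next, so a zero timer
pointer would fault on that very first read.

```c
#include <stddef.h>

/* Trimmed userspace stand-ins for the kernel types (assumption). */
struct list_head { struct list_head *next, *prev; };
struct timer_list {
	struct list_head list;	/* first member: list.next sits at offset 0 */
	unsigned long expires;
};

/* Hypothetical result codes, for illustration only. */
enum add_result { ADD_OK, ADD_ALREADY_PENDING, ADD_NULL_TIMER };

/* Pending means "linked into some list", i.e. next is non-NULL. */
static int timer_pending(const struct timer_list *timer)
{
	return timer->list.next != NULL;
}

/* Model of the add path with an explicit NULL check added. Without
 * that check, timer_pending() would dereference address zero. */
static enum add_result checked_add_timer(struct timer_list *timer,
					 struct list_head *vec)
{
	if (timer == NULL)
		return ADD_NULL_TIMER;	/* an unchecked read would fault here */
	if (timer_pending(timer))
		return ADD_ALREADY_PENDING;	/* timer added twice */

	/* Queue at the tail of the circular list headed by vec. */
	timer->list.next = vec;
	timer->list.prev = vec->prev;
	vec->prev->next = &timer->list;
	vec->prev = &timer->list;
	return ADD_OK;
}
```

A check like this only turns the crash into a reportable error; it would not
fix a caller that frees the structure on one CPU while another CPU is still
arming its timer.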

George




> --
>
> Brian Gerst
>
> ------------------------------------------------------------------------
> diff -urN linux-2.4.3/kernel/timer.c linux/kernel/timer.c
> --- linux-2.4.3/kernel/timer.c Thu Dec 14 20:52:22 2000
> +++ linux/kernel/timer.c Fri Apr 13 13:26:08 2001
> @@ -194,6 +194,7 @@
>  	if (!timer_pending(timer))
>  		return 0;
>  	list_del(&timer->list);
> +	timer->list.next = timer->list.prev = NULL;
>  	return 1;
>  }
>
> @@ -217,7 +218,6 @@
>
>  	spin_lock_irqsave(&timerlist_lock, flags);
>  	ret = detach_timer(timer);
> -	timer->list.next = timer->list.prev = NULL;
>  	spin_unlock_irqrestore(&timerlist_lock, flags);
>  	return ret;
>  }
> @@ -246,7 +246,6 @@
>
>  		spin_lock_irqsave(&timerlist_lock, flags);
>  		ret += detach_timer(timer);
> -		timer->list.next = timer->list.prev = 0;
>  		running = timer_is_running(timer);
>  		spin_unlock_irqrestore(&timerlist_lock, flags);
>
> @@ -309,7 +308,6 @@
>  			data= timer->data;
>
>  			detach_timer(timer);
> -			timer->list.next = timer->list.prev = NULL;
>  			timer_enter(timer);
>  			spin_unlock_irq(&timerlist_lock);
>  			fn(data);