2003-03-29 21:30:57

by Shawn Starr

[permalink] [raw]
Subject: Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings

How can I go about debugging this? How can I find the path causing the
problem?

Shawn.

----- Original Message -----
From: "Robert Love" <[email protected]>
To: "Shawn Starr" <[email protected]>
Cc: <[email protected]>
Sent: Saturday, March 29, 2003 4:15 PM
Subject: Re: [PANIC][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings


> On Sat, 2003-03-29 at 15:45, Shawn Starr wrote:
>
> > In both panics below c012e9b4 does not exist as a kernel symbol in
> > System.map:
>
> The EIP need not exist itself in System.map. System.map has the symbol
> to initial address mapping. For example,
>
> 100 functionA
> 200 functionB
>
> If the EIP was "150" you would be 50 bytes into functionA().
>
> > Code: 89 50 04 89 02 c7 41 30 00 00 00 00 81 3d 60 98 41 c0 3c 4b
> > kernel/timer.c:258: spin_lock(kernel/timer.c:c0419860) already locked
by
> > kernel/timer.c/398
> > Kernel panic: Aiee, killing interrupt handler!
> > In interrupt handler - not syncing
>
> This is not a panic, just an oops. And it was just a debugging check
> from spin lock debugging, but unfortunately you were in an interrupt
> handler so the machine went bye bye.
>
> It is probably a simple double-lock deadlock, detected by spin lock
> debugging. Knowing the EIP would help... but timer_interrupt() is a
> good first guess.
>
> Robert Love
>
>


2003-03-29 22:00:26

by Robert Love

[permalink] [raw]
Subject: Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings

On Sat, 2003-03-29 at 16:42, Shawn Starr wrote:

> How can I go about debugging this? How can I find the path causing
> the problem?

Begin by finding out where the EIP is. It should be a spin_lock(). The
oops says it is kernel/timer.c:258.

This line is a double locking of an already-locked lock. So find where
the initial lock was. The oops said that is kernel/timer.c:398.

Look at the call chain (from the oops) from the first to the second
lock. Someone assumed it could not happen. Obviously they were wrong.

Robert Love

2003-03-29 22:16:09

by Andrew Morton

[permalink] [raw]
Subject: Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings

"Shawn Starr" <[email protected]> wrote:
>
> How can I go about debugging this? How can I find the path causing the
> problem?
>

Someone did

kfree(foo);

insead of

del_timer_sync(&foo->timer);
kfree(foo);

so you have a freed-but-pending timer.


This is a horrid bug. One way to find it would be to change
cache_free_debugcheck() to walk the just-freed-up memory looking for an
instance of TIMER_MAGIC. If that is found and timer_pending() is true then
drop a backtrace and print out timer->function.

err.... Here you go, this may find it.



fs/open.c | 33 +++++++++++++++++++++++++++++++++
include/linux/timer.h | 3 ++-
mm/slab.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 65 insertions(+), 1 deletion(-)

diff -puN include/linux/timer.h~freed-timer-finder include/linux/timer.h
--- 25/include/linux/timer.h~freed-timer-finder 2003-03-29 14:10:15.000000000 -0800
+++ 25-akpm/include/linux/timer.h 2003-03-29 14:10:26.000000000 -0800
@@ -8,11 +8,12 @@
struct tvec_t_base_s;

struct timer_list {
+ unsigned long magic;
+
struct list_head entry;
unsigned long expires;

spinlock_t lock;
- unsigned long magic;

void (*function)(unsigned long);
unsigned long data;
diff -puN mm/slab.c~freed-timer-finder mm/slab.c
--- 25/mm/slab.c~freed-timer-finder 2003-03-29 14:10:30.000000000 -0800
+++ 25-akpm/mm/slab.c 2003-03-29 14:16:23.000000000 -0800
@@ -800,6 +800,35 @@ static void poison_obj(kmem_cache_t *cac
*(unsigned char *)(addr+size-1) = POISON_END;
}

+static void timer_hunt(kmem_cache_t *cachep, void *addr)
+{
+ int size = cachep->objsize;
+ void *p;
+
+ if (cachep->flags & SLAB_RED_ZONE) {
+ addr += BYTES_PER_WORD;
+ size -= 2*BYTES_PER_WORD;
+ }
+ if (cachep->flags & SLAB_STORE_USER) {
+ size -= BYTES_PER_WORD;
+ }
+
+ for (p = addr; p < addr + size; p += sizeof(unsigned long)) {
+ unsigned long *laddr = p;
+
+ if (*laddr == TIMER_MAGIC) {
+ struct timer_list *timer;
+
+ timer = (struct timer_list *)laddr;
+ if (timer_pending(timer)) {
+ printk("free of pending timer at %p\n", timer);
+ printk("function=%p\n", timer->function);
+ dump_stack();
+ }
+ }
+ }
+}
+
static void *fprob(unsigned char* addr, unsigned int size)
{
unsigned char *end;
@@ -1603,6 +1632,7 @@ static inline void *cache_free_debugchec
else
cachep->dtor(objp, cachep, 0);
}
+ timer_hunt(cachep, objp);
if (cachep->flags & SLAB_POISON)
poison_obj(cachep, objp, POISON_AFTER);
#endif
diff -puN fs/open.c~freed-timer-finder fs/open.c
--- 25/fs/open.c~freed-timer-finder 2003-03-29 14:17:32.000000000 -0800
+++ 25-akpm/fs/open.c 2003-03-29 14:21:20.000000000 -0800
@@ -793,11 +793,44 @@ void fd_install(unsigned int fd, struct
write_unlock(&files->file_lock);
}

+#include <linux/timer.h>
+
+struct foo_thing {
+ int a;
+ struct timer_list t;
+ int b;
+};
+
+static void my_foo(unsigned long data)
+{
+ printk("the handler\n");
+}
+
+static void timer_thing(void)
+{
+ static int did_it;
+ struct foo_thing *f;
+
+ if (did_it)
+ return;
+ did_it = 1;
+
+ f = kmalloc(sizeof(*f), GFP_KERNEL);
+ init_timer(&f->t);
+ f->t.expires = jiffies + HZ;
+ f->t.function = my_foo;
+ add_timer(&f->t);
+ kfree(f);
+}
+
asmlinkage long sys_open(const char * filename, int flags, int mode)
{
char * tmp;
int fd, error;

+ if (current->uid == 9999)
+ timer_thing();
+
#if BITS_PER_LONG != 32
flags |= O_LARGEFILE;
#endif

_