2003-03-30 21:17:30

by Shawn Starr

Subject: Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer

Of those, only drivers/char/tty_io.c.

I bet it's this function: there's only a kfree(), no destruction of any
timers.

Added this and rebuilt the kernel; waiting :-)

Shawn.

----- Original Message -----
From: "Roland Dreier" <[email protected]>
To: "Shawn Starr" <[email protected]>
Cc: "Andrew Morton" <[email protected]>; <[email protected]>;
<[email protected]>
Sent: Sunday, March 30, 2003 4:02 PM
Subject: Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer


> Shawn> Function found was: delayed_work_timer_fn
> Shawn> (kernel/workqueue.c)
>
> It looks to me like something is calling schedule_delayed_work()
> (which calls queue_delayed_work(), which starts a timer) and then
> freeing the work_struct before it's executed.
>
> Here's a list of places that use schedule_delayed_work() where the
> work_struct might be kmalloc()ed. Are you using any of these drivers?
> (Obviously you're using tty_io, so that bears some looking at)
>
> drivers/char/cyclades.c
> drivers/char/mxser.c
> drivers/char/tty_io.c
> drivers/isdn/i4l/isdn_tty.c
> drivers/message/fusion/mptlan.c
> drivers/net/hamradio/baycom_epp.c
> drivers/net/plip.c
> drivers/scsi/imm.c
> drivers/scsi/ppa.c
>
> If tty_io.c is the problem, then maybe something like the patch below
> will find the culprit.
>
> - Roland
>
> ===== drivers/char/tty_io.c 1.68 vs edited =====
> --- 1.68/drivers/char/tty_io.c Thu Mar 27 21:15:44 2003
> +++ edited/drivers/char/tty_io.c Sun Mar 30 12:51:00 2003
> @@ -169,6 +169,10 @@
>
> static inline void free_tty_struct(struct tty_struct *tty)
> {
> + if (timer_pending(&tty->flip.work.timer)) {
> + printk(KERN_WARNING "freeing tty with pending flip work timer from [<%p>]\n",
> + __builtin_return_address(0));
> + }
> kfree(tty);
> }
>
>
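A minimal userspace model of the failure Roland describes, for illustration only: model_timer, model_tty, model_schedule_delayed() and model_free_tty() are hypothetical stand-ins, not kernel APIs. The model only tracks whether the timer is still armed when the object is freed, which is the condition the debug check in free_tty_struct() tests with timer_pending().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's struct timer_list or
 * struct tty_struct. */
struct model_timer {
	bool pending;		/* armed but not yet fired */
};

struct model_tty {
	struct model_timer flip_timer;
};

/* Models schedule_delayed_work(): the armed timer still refers to tty. */
static void model_schedule_delayed(struct model_tty *tty)
{
	tty->flip_timer.pending = true;
}

/* Models the debug free_tty_struct(): warn if the timer is still armed
 * when the object is freed. Returns true if it warned. */
static bool model_free_tty(struct model_tty *tty)
{
	bool warned = false;

	assert(tty != NULL);	/* the check dereferences tty */
	if (tty->flip_timer.pending) {
		fprintf(stderr, "freeing tty with pending flip work timer\n");
		warned = true;
	}
	free(tty);
	return warned;
}
```

Freeing an idle model tty is silent; freeing one after model_schedule_delayed() triggers the warning, which in the real kernel is the point where the later timer callback would touch freed memory.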


2003-03-30 23:06:31

by Andrew Morton

Subject: Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer

"Shawn Starr" <[email protected]> wrote:
>
> drivers/char/tty_io.c - Only
>
> I bet it's this function, there's only a kfree, not destruction of any
> timers.
>

This is fairly foul.

--- 25/drivers/char/tty_io.c~a 2003-03-30 15:12:37.000000000 -0800
+++ 25-akpm/drivers/char/tty_io.c 2003-03-30 15:16:59.000000000 -0800
@@ -1288,6 +1288,8 @@ static void release_dev(struct file * fi
/*
* Make sure that the tty's task queue isn't activated.
*/
+ clear_bit(TTY_DONT_FLIP, &tty->flags);
+ del_timer_sync(&tty->flip.work.timer);
flush_scheduled_work();

/*

_
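The ordering in the patch above can be sketched with the same kind of model. This sketch assumes, as we read the 2.5 flush_to_ldisc(), that the flip handler re-arms its own delayed work while TTY_DONT_FLIP is set; MODEL_DONT_FLIP, model_tty and the model_* functions are hypothetical. Clearing the flag first means a handler that the flush still runs cannot put the timer back after del_timer_sync() has removed it.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_DONT_FLIP	(1u << 0)	/* hypothetical stand-in for TTY_DONT_FLIP */

struct model_tty {
	unsigned int flags;
	bool timer_pending;
};

/* Assumed flip-handler behaviour: while DONT_FLIP is set, re-arm the
 * delayed work instead of flipping. */
static void model_flip_handler(struct model_tty *tty)
{
	if (tty->flags & MODEL_DONT_FLIP)
		tty->timer_pending = true;
}

/* Models del_timer_sync(): returns true if a pending timer was removed. */
static bool model_del_timer_sync(struct model_tty *tty)
{
	bool was_pending = tty->timer_pending;

	tty->timer_pending = false;
	return was_pending;
}

/* Teardown in the order of the patch. work_queued says whether the work
 * was already on the queue, so the flush still runs the handler once.
 * Returns true if nothing is left armed, i.e. freeing is safe. */
static bool model_release(struct model_tty *tty, bool work_queued)
{
	tty->flags &= ~MODEL_DONT_FLIP;		/* clear_bit(TTY_DONT_FLIP, ...) */
	model_del_timer_sync(tty);		/* del_timer_sync(&tty->flip.work.timer) */
	assert(!tty->timer_pending);		/* timer is inactive here */
	if (work_queued)
		model_flip_handler(tty);	/* flush_scheduled_work() runs it */
	return !tty->timer_pending;
}

/* Same teardown, but without clearing the flag first. */
static bool model_release_without_clear(struct model_tty *tty, bool work_queued)
{
	model_del_timer_sync(tty);
	if (work_queued)
		model_flip_handler(tty);
	return !tty->timer_pending;
}
```

Under these assumptions, skipping the clear_bit() lets an already-queued handler re-arm the timer during the flush, so the tty would again be freed with a live timer.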

2003-03-30 23:49:36

by Roland Dreier

Subject: Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer

> --- 25/drivers/char/tty_io.c~a 2003-03-30 15:12:37.000000000 -0800
> +++ 25-akpm/drivers/char/tty_io.c 2003-03-30 15:16:59.000000000 -0800
> @@ -1288,6 +1288,8 @@ static void release_dev(struct file * fi
> /*
> * Make sure that the tty's task queue isn't activated.
> */
> + clear_bit(TTY_DONT_FLIP, &tty->flags);
> + del_timer_sync(&tty->flip.work.timer);
> flush_scheduled_work();

I'm confused by this for two reasons:

First, from looking at workqueue.c (especially the comment in
queue_delayed_work() that says "Increase nr_queued so that the flush
function knows that there's something pending."), it seems like
flush_scheduled_work() should wait until even delayed work is done.
Given that, I don't think the del_timer_sync() should be there --
wouldn't flush_scheduled_work() block forever, since nr_queued can
never reach 0 now?

(I guess I'm assuming the real race is that tty_io.c calls
schedule_delayed_work() between flush_scheduled_work() and
release_mem() in release_dev())
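The first objection can be checked with a small counter model. It assumes, as Roland reads workqueue.c, that queue_delayed_work() bumps nr_queued when it arms the timer, that the count drops only when the work function runs, and that the flush waits for it to reach zero. The model_* names are hypothetical, and the flush is bounded to max_ticks so the demo terminates instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

struct model_wq {
	int nr_queued;		/* bumped when delayed work is armed */
};

struct model_dwork {
	bool timer_pending;
};

/* Models queue_delayed_work(): count the work, then arm the timer. */
static void model_queue_delayed(struct model_wq *wq, struct model_dwork *w)
{
	wq->nr_queued++;
	w->timer_pending = true;
}

/* Timer fires, the work runs, and only then does the count drop. */
static void model_timer_fires(struct model_wq *wq, struct model_dwork *w)
{
	if (!w->timer_pending)
		return;
	assert(wq->nr_queued > 0);	/* an armed timer is always counted */
	w->timer_pending = false;
	wq->nr_queued--;
}

/* Plain timer removal: nothing adjusts nr_queued. */
static void model_del_timer_sync(struct model_dwork *w)
{
	w->timer_pending = false;
}

/* Would the flush return? Let pending timers fire for up to max_ticks. */
static bool model_flush_completes(struct model_wq *wq, struct model_dwork *w,
				  int max_ticks)
{
	for (int tick = 0; tick < max_ticks; tick++) {
		if (wq->nr_queued == 0)
			return true;
		model_timer_fires(wq, w);
	}
	return wq->nr_queued == 0;
}
```

Letting the timer fire drains the count; deleting it first leaves nr_queued at 1 for good, which is the "block forever" case Roland is asking about.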

Second, I don't see how it's _ever_ safe to call
flush_scheduled_work(). The comment in workqueue.c before
flush_workqueue() says "NOTE: if work is being added to the queue
constantly by some other context then this function might block
indefinitely." But flush_scheduled_work() is flushing the keventd_wq,
which other code will definitely add work to. If we're unlucky,
flush_scheduled_work() could block forever. Am I just being paranoid?
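The second point is the general hazard of a flush that waits for the queue to become empty. In the model below (hypothetical names; one tick means another context queues one item and the worker completes one), such a flush never finishes while producers keep up. For contrast it also shows a swapped-in technique, not what this kernel's flush does: snapshot an insertion count when the flush starts and wait only for work queued before that point.

```c
#include <assert.h>
#include <stdbool.h>

struct model_queue {
	unsigned long inserted;	/* total items ever queued */
	unsigned long removed;	/* total items ever completed */
};

/* One tick of a busy system: someone queues an item, the worker
 * completes one. */
static void model_tick(struct model_queue *q)
{
	q->inserted++;
	if (q->removed < q->inserted)
		q->removed++;
	assert(q->removed <= q->inserted);
}

/* "Wait until empty" flush, bounded to max_ticks for the demo. */
static bool model_flush_until_empty(struct model_queue *q, int max_ticks)
{
	for (int t = 0; t < max_ticks; t++) {
		if (q->removed == q->inserted)
			return true;
		model_tick(q);
	}
	return q->removed == q->inserted;
}

/* Snapshot flush: wait only for items queued before the flush began. */
static bool model_flush_until_snapshot(struct model_queue *q, int max_ticks)
{
	unsigned long target = q->inserted;

	for (int t = 0; t < max_ticks; t++) {
		if (q->removed >= target)
			return true;
		model_tick(q);
	}
	return q->removed >= target;
}
```

With one item outstanding and a steady producer, the empty-queue flush stays one item behind forever, while the snapshot flush returns as soon as the item that was pending at flush time completes.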

- Roland

2003-03-31 13:53:55

by Shawn Starr

Subject: Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer

I have applied this to my current tree and am testing it out.

On Sun, 30 Mar 2003, Andrew Morton wrote:

> "Shawn Starr" <[email protected]> wrote:
> >
> > drivers/char/tty_io.c - Only
> >
> > I bet it's this function, there's only a kfree, not destruction of any
> > timers.
> >
>
> This is fairly foul.
>
> --- 25/drivers/char/tty_io.c~a 2003-03-30 15:12:37.000000000 -0800
> +++ 25-akpm/drivers/char/tty_io.c 2003-03-30 15:16:59.000000000 -0800
> @@ -1288,6 +1288,8 @@ static void release_dev(struct file * fi
> /*
> * Make sure that the tty's task queue isn't activated.
> */
> + clear_bit(TTY_DONT_FLIP, &tty->flags);
> + del_timer_sync(&tty->flip.work.timer);
> flush_scheduled_work();
>
> /*
>
> _
>
>
>