2006-02-26 10:05:28

by Russell King

[permalink] [raw]
Subject: [PATCH] Convert serial_core oopses to BUG_ON

Hi,

With reference to these two bugs:

http://bugzilla.kernel.org/show_bug.cgi?id=5958
http://bugzilla.kernel.org/show_bug.cgi?id=6131

it seems that folk are under the impression that serial_core is
responsible for these bugs. It isn't.

Calling serial functions to flush buffers, or try to send more data
after the port has been closed or hung up is a bug in the code doing
the calling, not in the serial_core driver.

Make this explicitly obvious by adding BUG_ON()'s.

I don't particularly want to add these BUG_ON()'s since they have a
performance impact, but it seems that they're necessary to convey
sufficient understanding about where the bug lies. The Bluetooth
problem has been around for _ages_ (longer than the entry in bugzilla)
and no one seems the least bit interested in fixing the fscking thing.

I can only hope that adding these BUG_ON()'s provides sufficient
clarity to cause people to look elsewhere.

diff --git a/drivers/serial/serial_core.c b/drivers/serial/serial_core.c
--- a/drivers/serial/serial_core.c
+++ b/drivers/serial/serial_core.c
@@ -71,6 +71,11 @@ static void uart_change_pm(struct uart_s
void uart_write_wakeup(struct uart_port *port)
{
struct uart_info *info = port->info;
+ /*
+ * This means you called this function _after_ the port was
+ * closed. No cookie for you.
+ */
+ BUG_ON(!info);
tasklet_schedule(&info->tlet);
}

@@ -479,6 +484,12 @@ uart_write(struct tty_struct *tty, const
unsigned long flags;
int c, ret = 0;

+ /*
+ * This means you called this function _after_ the port was
+ * closed. No cookie for you.
+ */
+ BUG_ON(!state);
+
if (!circ->buf)
return 0;

@@ -521,6 +532,12 @@ static void uart_flush_buffer(struct tty
struct uart_port *port = state->port;
unsigned long flags;

+ /*
+ * This means you called this function _after_ the port was
+ * closed. No cookie for you.
+ */
+ BUG_ON(!state);
+
DPRINTK("uart_flush_buffer(%d) called\n", tty->index);

spin_lock_irqsave(&port->lock, flags);

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core


2006-02-26 10:15:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

Russell King <[email protected]> wrote:
>
> Calling serial functions to flush buffers, or try to send more data
> after the port has been closed or hung up is a bug in the code doing
> the calling, not in the serial_core driver.
>
> Make this explicitly obvious by adding BUG_ON()'s.

If we make it

if (!info) {
WARN_ON(1);
return;
}

will that allow people's kernels to limp along until it gets fixed?

2006-02-26 10:18:49

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

Andrew Morton wrote:
> Russell King <[email protected]> wrote:
>
>>Calling serial functions to flush buffers, or try to send more data
>> after the port has been closed or hung up is a bug in the code doing
>> the calling, not in the serial_core driver.
>>
>> Make this explicitly obvious by adding BUG_ON()'s.
>
>
> If we make it
>
> if (!info) {
> WARN_ON(1);
> return;
> }
>
> will that allow people's kernels to limp along until it gets fixed?

Any reason we don't have a WARN(), btw?

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

2006-02-26 18:17:54

by Russell King

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

On Sun, Feb 26, 2006 at 02:14:14AM -0800, Andrew Morton wrote:
> Russell King <[email protected]> wrote:
> >
> > Calling serial functions to flush buffers, or try to send more data
> > after the port has been closed or hung up is a bug in the code doing
> > the calling, not in the serial_core driver.
> >
> > Make this explicitly obvious by adding BUG_ON()'s.
>
> If we make it
>
> if (!info) {
> WARN_ON(1);
> return;
> }
>
> will that allow people's kernels to limp along until it gets fixed?

"until" - I think you mean "if anyone ever bothers" so no I don't agree.
The bluetooth folk seem to have absolutely no interest in bug fixing.
Can we mark bluetooth broken please?

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-02-26 20:00:38

by Russell King

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

On Sun, Feb 26, 2006 at 06:17:47PM +0000, Russell King wrote:
> On Sun, Feb 26, 2006 at 02:14:14AM -0800, Andrew Morton wrote:
> > If we make it
> >
> > if (!info) {
> > WARN_ON(1);
> > return;
> > }
> >
> > will that allow people's kernels to limp along until it gets fixed?
>
> "until" - I think you mean "if anyone ever bothers" so no I don't agree.
> The bluetooth folk seem to have absolutely no interest in bug fixing.
> Can we mark bluetooth broken please?

Sorry, I mean just mark BT_HCIUART broken, not the whole of bluetooth.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-02-26 22:35:06

by Karol Kozimor

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

Thus wrote Russell King:
> > > Calling serial functions to flush buffers, or try to send more data
> > > after the port has been closed or hung up is a bug in the code doing
> > > the calling, not in the serial_core driver.
> > >
> > > Make this explicitly obvious by adding BUG_ON()'s.
> >
> > If we make it
> >
> > if (!info) {
> > WARN_ON(1);
> > return;
> > }
> >
> > will that allow people's kernels to limp along until it gets fixed?
> "until" - I think you mean "if anyone ever bothers" so no I don't agree.

Russel, I agree this should be clearly marked and an oops seems okay here,
but we're talking dead systems here (dead as in all interrupts masked type
of dead). Most users won't even be aware an oops occured, let alone be able
to read the messages on the console.

I was just lucky because after the first one I got (#5958, a regular oops)
I tried to nail it down in text mode, with the console loglevel upped a
bit, so I was able to see what the panic (#6131) was all about.

I think we really need those *_ON()s at least in uart_write().

Best regards,

--
Karol 'sziwan' Kozimor
[email protected]

2006-02-26 22:41:52

by Russell King

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

On Sun, Feb 26, 2006 at 11:34:48PM +0100, Karol Kozimor wrote:
> Thus wrote Russell King:
> > > > Calling serial functions to flush buffers, or try to send more data
> > > > after the port has been closed or hung up is a bug in the code doing
> > > > the calling, not in the serial_core driver.
> > > >
> > > > Make this explicitly obvious by adding BUG_ON()'s.
> > >
> > > If we make it
> > >
> > > if (!info) {
> > > WARN_ON(1);
> > > return;
> > > }
> > >
> > > will that allow people's kernels to limp along until it gets fixed?
> > "until" - I think you mean "if anyone ever bothers" so no I don't agree.
>
> Russel, I agree this should be clearly marked and an oops seems okay here,
> but we're talking dead systems here (dead as in all interrupts masked type
> of dead). Most users won't even be aware an oops occured, let alone be able
> to read the messages on the console.
>
> I was just lucky because after the first one I got (#5958, a regular oops)
> I tried to nail it down in text mode, with the console loglevel upped a
> bit, so I was able to see what the panic (#6131) was all about.
>
> I think we really need those *_ON()s at least in uart_write().

I don't think it'll make a blind bit of difference. The oopses have
been known about (to me and others) since about October 2005... Pavel
Machek was the first to report the hci_uart problems.

I diagnosed the uart_flush_buffer() oops being caused by hci_uart back
in October and where did that get us - nowhere at the time, but later
another bug report in bugzilla against serial and no progress.

Excuse me if I'm despondent on this issue - I'm personally looking for
a way out of being assigned blame for the crap known as hci_uart.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-02-28 14:56:47

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON


On Sun 26-02-06 02:14:14, Andrew Morton wrote:
> Russell King <[email protected]> wrote:
> >
> > Calling serial functions to flush buffers, or try to send more data
> > after the port has been closed or hung up is a bug in the code doing
> > the calling, not in the serial_core driver.
> >
> > Make this explicitly obvious by adding BUG_ON()'s.
>
> If we make it
>
> if (!info) {
> WARN_ON(1);
> return;
> }
>
> will that allow people's kernels to limp along until it gets fixed?

It will oops in hard-to-guess, place, anyway. BUG_ON is right.
--
Thanks, Sharp!

2006-02-28 18:18:45

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

Pavel Machek <[email protected]> wrote:
>
>
> On Sun 26-02-06 02:14:14, Andrew Morton wrote:
> > Russell King <[email protected]> wrote:
> > >
> > > Calling serial functions to flush buffers, or try to send more data
> > > after the port has been closed or hung up is a bug in the code doing
> > > the calling, not in the serial_core driver.
> > >
> > > Make this explicitly obvious by adding BUG_ON()'s.
> >
> > If we make it
> >
> > if (!info) {
> > WARN_ON(1);
> > return;
> > }
> >
> > will that allow people's kernels to limp along until it gets fixed?
>
> It will oops in hard-to-guess, place, anyway.

Will it? Where? Unfixably?

> BUG_ON is right.

WARN_ON+return is far better. It keeps people's machines working until the
bug gets fixed.

2006-02-28 22:01:41

by Martin Michlmayr

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

* Andrew Morton <[email protected]> [2006-02-28 10:17]:
> > It will oops in hard-to-guess, place, anyway.
> Will it? Where? Unfixably?

http://www.linux-mips.org/archives/linux-mips/2006-02/msg00241.html is
one example we just had on MIPS. On SGI IP22, using the serial
console, you'd get the following on shutdown:

The system is going down for reboot NOW!
INIT: Sending processes the TERM signal
INIT: Sending proces

and then nothing at all. I'd never have suspected the serial driver,
had not users reported that the machine shutdowns properly when using
the framebuffer.

For the record, I don't mind whether it's BUG_ON or WARN_ON, but I
just wanted to give this as an example of an "oops in hard-to-guess,
place".
--
Martin Michlmayr
[email protected]

2006-02-28 23:34:29

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

Martin Michlmayr <[email protected]> wrote:
>
> * Andrew Morton <[email protected]> [2006-02-28 10:17]:
> > > It will oops in hard-to-guess, place, anyway.
> > Will it? Where? Unfixably?
>
> http://www.linux-mips.org/archives/linux-mips/2006-02/msg00241.html is
> one example we just had on MIPS. On SGI IP22, using the serial
> console, you'd get the following on shutdown:
>
> The system is going down for reboot NOW!
> INIT: Sending processes the TERM signal
> INIT: Sending proces
>
> and then nothing at all. I'd never have suspected the serial driver,
> had not users reported that the machine shutdowns properly when using
> the framebuffer.
>
> For the record, I don't mind whether it's BUG_ON or WARN_ON, but I
> just wanted to give this as an example of an "oops in hard-to-guess,
> place".

>From my reading of the above thread, putting the proposed workaround into
serial core will indeed allow people's machines to keep running while
reminding us about the driver bugs.

2006-03-01 17:11:11

by Russell King

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

On Tue, Feb 28, 2006 at 03:32:56PM -0800, Andrew Morton wrote:
> Martin Michlmayr <[email protected]> wrote:
> >
> > * Andrew Morton <[email protected]> [2006-02-28 10:17]:
> > > > It will oops in hard-to-guess, place, anyway.
> > > Will it? Where? Unfixably?
> >
> > http://www.linux-mips.org/archives/linux-mips/2006-02/msg00241.html is
> > one example we just had on MIPS. On SGI IP22, using the serial
> > console, you'd get the following on shutdown:
> >
> > The system is going down for reboot NOW!
> > INIT: Sending processes the TERM signal
> > INIT: Sending proces
> >
> > and then nothing at all. I'd never have suspected the serial driver,
> > had not users reported that the machine shutdowns properly when using
> > the framebuffer.
> >
> > For the record, I don't mind whether it's BUG_ON or WARN_ON, but I
> > just wanted to give this as an example of an "oops in hard-to-guess,
> > place".
>
> >From my reading of the above thread, putting the proposed workaround into
> serial core will indeed allow people's machines to keep running while
> reminding us about the driver bugs.

I would much rather the buggy drivers were actually fixed - is there a
reason why the drivers can't actually be fixed (other than lazyness)?

Once they're fixed, adding a BUG_ON then becomes practical IMHO - it'll
stop new driver writers being confused.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-03-01 19:48:15

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

On St 01-03-06 17:10:46, Russell King wrote:
> On Tue, Feb 28, 2006 at 03:32:56PM -0800, Andrew Morton wrote:
> > Martin Michlmayr <[email protected]> wrote:
> > >
> > > * Andrew Morton <[email protected]> [2006-02-28 10:17]:
> > > > > It will oops in hard-to-guess, place, anyway.
> > > > Will it? Where? Unfixably?
> > >
> > > http://www.linux-mips.org/archives/linux-mips/2006-02/msg00241.html is
> > > one example we just had on MIPS. On SGI IP22, using the serial
> > > console, you'd get the following on shutdown:
> > >
> > > The system is going down for reboot NOW!
> > > INIT: Sending processes the TERM signal
> > > INIT: Sending proces
> > >
> > > and then nothing at all. I'd never have suspected the serial driver,
> > > had not users reported that the machine shutdowns properly when using
> > > the framebuffer.
> > >
> > > For the record, I don't mind whether it's BUG_ON or WARN_ON, but I
> > > just wanted to give this as an example of an "oops in hard-to-guess,
> > > place".
> >
> > >From my reading of the above thread, putting the proposed workaround into
> > serial core will indeed allow people's machines to keep running while
> > reminding us about the driver bugs.
>
> I would much rather the buggy drivers were actually fixed - is there a
> reason why the drivers can't actually be fixed (other than lazyness)?
>
> Once they're fixed, adding a BUG_ON then becomes practical IMHO - it'll
> stop new driver writers being confused.

Okay, but lets add BUG_ONs that actually work. BUG_ON in second hunk
could never trigger, and last hunk did not help in specific case of
bluetooth problem. This should be better: it warns at right places,
but allows system to survive. I'll now try to fix bluetooth problem.

Signed-off-by: Pavel Machek <[email protected]>

diff --git a/drivers/serial/serial_core.c b/drivers/serial/serial_core.c
index 95fb493..cc1faa3 100644
--- a/drivers/serial/serial_core.c
+++ b/drivers/serial/serial_core.c
@@ -71,6 +71,11 @@ static void uart_change_pm(struct uart_s
void uart_write_wakeup(struct uart_port *port)
{
struct uart_info *info = port->info;
+ /*
+ * This means you called this function _after_ the port was
+ * closed. No cookie for you.
+ */
+ BUG_ON(!info);
tasklet_schedule(&info->tlet);
}

@@ -471,14 +476,26 @@ static void uart_flush_chars(struct tty_
}

static int
-uart_write(struct tty_struct *tty, const unsigned char * buf, int count)
+uart_write(struct tty_struct *tty, const unsigned char *buf, int count)
{
struct uart_state *state = tty->driver_data;
- struct uart_port *port = state->port;
- struct circ_buf *circ = &state->info->xmit;
+ struct uart_port *port;
+ struct circ_buf *circ;
unsigned long flags;
int c, ret = 0;

+ /*
+ * This means you called this function _after_ the port was
+ * closed. No cookie for you.
+ */
+ if (!state || !state->info) {
+ WARN_ON(1);
+ return -EL3HLT;
+ }
+
+ port = state->port;
+ circ = &state->info->xmit;
+
if (!circ->buf)
return 0;

@@ -521,6 +538,15 @@ static void uart_flush_buffer(struct tty
struct uart_port *port = state->port;
unsigned long flags;

+ /*
+ * This means you called this function _after_ the port was
+ * closed. No cookie for you.
+ */
+ if (!state || !state->info) {
+ WARN_ON(1);
+ return;
+ }
+
DPRINTK("uart_flush_buffer(%d) called\n", tty->index);

spin_lock_irqsave(&port->lock, flags);


--
Web maintainer for suspend.sf.net (http://www.sf.net/projects/suspend) wanted...

2006-03-01 20:02:41

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

Hi!

> With reference to these two bugs:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=5958
> http://bugzilla.kernel.org/show_bug.cgi?id=6131
>
> it seems that folk are under the impression that serial_core is
> responsible for these bugs. It isn't.
>
> Calling serial functions to flush buffers, or try to send more data
> after the port has been closed or hung up is a bug in the code doing
> the calling, not in the serial_core driver.
>
> Make this explicitly obvious by adding BUG_ON()'s.

They did not trigger for me, altrough my own traps do.

> diff --git a/drivers/serial/serial_core.c b/drivers/serial/serial_core.c
> --- a/drivers/serial/serial_core.c
> +++ b/drivers/serial/serial_core.c
> @@ -521,6 +532,12 @@ static void uart_flush_buffer(struct tty
> struct uart_port *port = state->port;
> unsigned long flags;
>
> + /*
> + * This means you called this function _after_ the port was
> + * closed. No cookie for you.
> + */
> + BUG_ON(!state);
> +
> DPRINTK("uart_flush_buffer(%d) called\n", tty->index);
>
> spin_lock_irqsave(&port->lock, flags);
>

This is one I'm hitting. Actually I'm hitting my own check few lines
down:

if (!state->info)
printk(KERN_CRIT "no state->info\n");
else uart_circ_clear(&state->info->xmit);
spin_unlock_irqrestore(&port->lock, flags);
tty_wakeup(tty);

... simply not doing uart_circ_clear() at all makes system
survive... so this could be made into WARN()...

Pavel
--
Web maintainer for suspend.sf.net (http://www.sf.net/projects/suspend) wanted...

2006-03-01 22:30:32

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Convert serial_core oopses to BUG_ON

Pavel Machek <[email protected]> wrote:
>
> > >
> > > >From my reading of the above thread, putting the proposed workaround into
> > > serial core will indeed allow people's machines to keep running while
> > > reminding us about the driver bugs.
> >
> > I would much rather the buggy drivers were actually fixed - is there a
> > reason why the drivers can't actually be fixed (other than lazyness)?
> >
> > Once they're fixed, adding a BUG_ON then becomes practical IMHO - it'll
> > stop new driver writers being confused.
>
> Okay, but lets add BUG_ONs that actually work. BUG_ON in second hunk
> could never trigger, and last hunk did not help in specific case of
> bluetooth problem. This should be better: it warns at right places,
> but allows system to survive.

I like that approach, thanks.

> I'll now try to fix bluetooth problem.

Please keep Marcel Cc'ed.

> + return -EL3HLT;

ooh, I always wanted to use that ;)