2005-01-26 13:20:58

by Martin Kögler

[permalink] [raw]
Subject: Deadlock in serial driver 2.6.x

I noticed with different kernel versions (a 2.6.5 FC2 Kernel, a 2.6.7 Knoppix Kernel
and 2.6.10 FC2 and FC3 Kernels (which have no patches for the serial driver)), that it
is possible for a normal user, which has rw access to /dev/ttySx, to hang a computer.
To exploit it, there must be a device on the other end on the serial line, which sends
some data.

I tested it on a i686 machine.

At http://www.auto.tuwien.ac.at/~mkoegler/linux/serial_oops.c , I have an example programm,
which exploits the problem (/dev/ttyS0 is hardcoded as serial device).

To trigger the problem, connect two computers with a null modem cable and send some
characters to the programm (The baud rate at the other computer seems to be not important).

With SMP-Kernels, the computer stops responding.
Kernels without SMP print a panic message before they stop working, eg:
Kernel panic - not syncing: drivers/serial/serial_core.c:103: spin_lock(drivers/serial/serial_core.c:c04055e0) already locked by drivers/serial/8250.c/1170

Photos of a panic messages of a FC3 2.6.10-1.741_FC3 Kernel are available at
http://www.auto.tuwien.ac.at/~mkoegler/linux .

What the programm does:
It sets the low latency mode, then waiting, until a certain state of the handshake
lines is reached, then it sends a bytes and waits for a byte. Then it changes the
handshake lines again and the process starts again.

Martin K?gler
[email protected]
PS: Please CC me on replies.


2005-01-27 07:20:29

by Andrew Morton

[permalink] [raw]
Subject: Re: Deadlock in serial driver 2.6.x

Martin K?gler <[email protected]> wrote:
>
> I noticed with different kernel versions (a 2.6.5 FC2 Kernel, a 2.6.7 Knoppix Kernel
> and 2.6.10 FC2 and FC3 Kernels (which have no patches for the serial driver)), that it
> is possible for a normal user, which has rw access to /dev/ttySx, to hang a computer.
> To exploit it, there must be a device on the other end on the serial line, which sends
> some data.
>
> I tested it on a i686 machine.
>
> At http://www.auto.tuwien.ac.at/~mkoegler/linux/serial_oops.c , I have an example programm,
> which exploits the problem (/dev/ttyS0 is hardcoded as serial device).
>
> To trigger the problem, connect two computers with a null modem cable and send some
> characters to the programm (The baud rate at the other computer seems to be not important).
>
> With SMP-Kernels, the computer stops responding.
> Kernels without SMP print a panic message before they stop working, eg:
> Kernel panic - not syncing: drivers/serial/serial_core.c:103: spin_lock(drivers/serial/serial_core.c:c04055e0) already locked by drivers/serial/8250.c/1170
>
> Photos of a panic messages of a FC3 2.6.10-1.741_FC3 Kernel are available at
> http://www.auto.tuwien.ac.at/~mkoegler/linux .
>
> What the programm does:
> It sets the low latency mode, then waiting, until a certain state of the handshake
> lines is reached, then it sends a bytes and waits for a byte. Then it changes the
> handshake lines again and the process starts again.

Yes, I get a similar deadlock:

[<c0101327>] show_regs+0x11f/0x12c
[<c0273cc8>] sysrq_handle_showregs+0x10/0x18
[<c0273e72>] __handle_sysrq+0x76/0x120
[<c0273f39>] handle_sysrq+0x1d/0x24
[<c026eefd>] kbd_keycode+0x105/0x2c8
[<c026f144>] kbd_event+0x84/0xbc
[<c03038d8>] input_event+0x398/0x3bc
[<c0305e0a>] atkbd_report_key+0x3e/0x64
[<c030629b>] atkbd_interrupt+0x46b/0x4e8
[<c0276059>] serio_interrupt+0x39/0x69
[<c0276bff>] i8042_interrupt+0x1f7/0x20c
[<c013a215>] handle_IRQ_event+0x2d/0x64
[<c013a343>] __do_IRQ+0xf7/0x154
[<c0104eee>] do_IRQ+0x1e/0x34
[<c010376a>] common_interrupt+0x1a/0x20
[<c0277c35>] uart_start+0x19/0x34
[<c0278088>] uart_flush_chars+0xc/0x10
[<c02670d5>] n_tty_receive_buf+0x104d/0x10f8
[<c0264bb9>] flush_to_ldisc+0xe1/0xf4
[<c0264c75>] tty_flip_buffer_push+0x15/0x34
[<c027b4d8>] receive_chars+0x1fc/0x210
[<c027b703>] serial8250_interrupt+0x63/0xe0
[<c013a215>] handle_IRQ_event+0x2d/0x64
[<c013a343>] __do_IRQ+0xf7/0x154
[<c0104eee>] do_IRQ+0x1e/0x34
[<c010376a>] common_interrupt+0x1a/0x20

(For some reason the NMI watchdog isn't triggering here, and it's still
taking interrupts).

serial8250_interrupt() takes uart_8250_port.port.lock
->serial8250_handle_port
->receive_chars
->tty_flip_buffer_push (->low_latency is true)
->flush_to_ldisc
->n_tty_receive_buf

(this takes tty->read_lock inside uart_8250_port.port.lock. Is
this ranking correct?)

->uart_flush_chars
->uart_start

Does spin_lock_irqsave(&port->lock, flags);


Looks like low-latency mode is busted.

2005-01-30 16:45:24

by Alan

[permalink] [raw]
Subject: Re: Deadlock in serial driver 2.6.x

On Iau, 2005-01-27 at 07:13, Andrew Morton wrote:
> Martin Kögler <[email protected]> wrote:
> (For some reason the NMI watchdog isn't triggering here, and it's still
> taking interrupts).

> Looks like low-latency mode is busted.

low latency mode is fine, the drivers/serial layer is busted. It workd
fine with non drivers/serial using hardware still, and it worked fine in
2.4


2005-01-30 16:49:19

by Russell King

[permalink] [raw]
Subject: Re: Deadlock in serial driver 2.6.x

On Sun, Jan 30, 2005 at 03:39:32PM +0000, Alan Cox wrote:
> On Iau, 2005-01-27 at 07:13, Andrew Morton wrote:
> > Martin Kögler <[email protected]> wrote:
> > (For some reason the NMI watchdog isn't triggering here, and it's still
> > taking interrupts).
>
> > Looks like low-latency mode is busted.
>
> low latency mode is fine, the drivers/serial layer is busted. It workd
> fine with non drivers/serial using hardware still, and it worked fine in
> 2.4

Unfortunately it creates a major locking problem. We need to hold the
spinlock because we're using the port structures and relying on things
like the tty structure staying around during the interrupt processing.

We call into the tty layer, and the tty layer calls us back via the
write method, (for echo purposes) and that's indistinguishable from
user-level or other kernel-level writes.

Unsolvable as the tty layer currently stands. tty needs to not call back
into serial drivers when they supply read characters from their interrupt
functions.

tty layer's busted. 8)

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2005-01-31 08:42:40

by Alan

[permalink] [raw]
Subject: Re: Deadlock in serial driver 2.6.x

On Sul, 2005-01-30 at 16:48, Russell King wrote:
> Unsolvable as the tty layer currently stands. tty needs to not call back
> into serial drivers when they supply read characters from their interrupt
> functions.

The tty layer cannot fix this for now, and I don't intend to fix it. Fix
the serial driver: the fix is quite simple since you can keep a field in
the driver for now to detect recursive calling into the echo case and
don't relock.

Alan

2005-01-31 08:49:33

by Andrew Morton

[permalink] [raw]
Subject: Re: Deadlock in serial driver 2.6.x

Alan Cox <[email protected]> wrote:
>
> On Sul, 2005-01-30 at 16:48, Russell King wrote:
> > Unsolvable as the tty layer currently stands. tty needs to not call back
> > into serial drivers when they supply read characters from their interrupt
> > functions.
>
> The tty layer cannot fix this for now, and I don't intend to fix it. Fix
> the serial driver: the fix is quite simple since you can keep a field in
> the driver for now to detect recursive calling into the echo case and
> don't relock.

Are we sure that the serial driver is the only one which will hit this
deadlock?

2005-02-03 11:23:41

by Alan

[permalink] [raw]
Subject: Re: Deadlock in serial driver 2.6.x

On Llu, 2005-01-31 at 08:48, Andrew Morton wrote:
> > The tty layer cannot fix this for now, and I don't intend to fix it. Fix
> > the serial driver: the fix is quite simple since you can keep a field in
> > the driver for now to detect recursive calling into the echo case and
> > don't relock.
>
> Are we sure that the serial driver is the only one which will hit this
> deadlock?

Yes fairly sure. The feature has been a well known but non-documented
property of the tty layer since about 1.0. There are two ways I see to
clean it up - we
can have the serial driver behave like other drivers and if need be
known about
recursive entries or we could extend the driver interface with an "echo"
method used by line disciplines when calling back to the tty driver from
a data
receive event.

Alan

2005-02-03 18:25:41

by Andrew Morton

[permalink] [raw]
Subject: Re: Deadlock in serial driver 2.6.x

Alan Cox <[email protected]> wrote:
>
> On Llu, 2005-01-31 at 08:48, Andrew Morton wrote:
> > > The tty layer cannot fix this for now, and I don't intend to fix it. Fix
> > > the serial driver: the fix is quite simple since you can keep a field in
> > > the driver for now to detect recursive calling into the echo case and
> > > don't relock.
> >
> > Are we sure that the serial driver is the only one which will hit this
> > deadlock?
>
> Yes fairly sure. The feature has been a well known but non-documented
> property of the tty layer since about 1.0. There are two ways I see to
> clean it up - we
> can have the serial driver behave like other drivers and if need be
> known about
> recursive entries or we could extend the driver interface with an "echo"
> method used by line disciplines when calling back to the tty driver from
> a data
> receive event.

The "echo method" method sounds good. Do we think that's feasible for
2.6.11, or would it be safer to disable low-latency mode for that driver?

2005-02-04 11:08:31

by Martin Kögler

[permalink] [raw]
Subject: Re: Deadlock in serial driver 2.6.x

On Thu, Feb 03, 2005 at 10:21:12AM -0800, Andrew Morton wrote:
> The "echo method" method sounds good. Do we think that's feasible for
> 2.6.11, or would it be safer to disable low-latency mode for that driver?

As a temporary workaround, dropping the lock should also work:

--- linux-2.6.10/drivers/serial/8250.old.c 2005-02-03 14:07:48.557609200 +0100
+++ linux-2.6.10/drivers/serial/8250.c 2005-02-03 14:09:00.687887525 +0100
@@ -987,7 +987,11 @@
unsafe. It should be fixed ASAP */
if (unlikely(tty->flip.count >= TTY_FLIPBUF_SIZE)) {
if(tty->low_latency)
- tty_flip_buffer_push(tty);
+ {
+ spin_unlock(&up->port.lock);
+ tty_flip_buffer_push(tty);
+ spin_lock(&up->port.lock);
+ }
/* If this failed then we will throw away the
bytes but must do so to clear interrupts */
}
@@ -1058,7 +1062,9 @@
ignore_char:
lsr = serial_inp(up, UART_LSR);
} while ((lsr & UART_LSR_DR) && (max_count-- > 0));
+ spin_unlock(&up->port.lock);
tty_flip_buffer_push(tty);
+ spin_lock(&up->port.lock);
*status = lsr;
}

An other serial interrupt can not disturbed, because it will be blocked by the interrupt lock.
All other direct access to the UART seems to be protected by the port lock, so they may be only
execectued, while tty_flip_buffer_push is running and is not accessing the port.

I have this patch running on a 2.6.10 Kernel (Fedora Core 2 Version) and low latency mode
works without any problems (with and without SMP).

If that patch works under all conditons, I can't say. It may cause degrade the performance,
but for my work, I can't see any effect.

mfg Martin K?gler
[email protected]
PS: Please CC me on replies.

2005-02-04 13:56:13

by Paul Fulghum

[permalink] [raw]
Subject: Re: Deadlock in serial driver 2.6.x

Martin K?gler wrote:
> As a temporary workaround, dropping the lock should also work:

This looks good to me, and seems much more reasonable
that changing driver interfaces.

Treat tty_flip_buffer_push(tty) as something that
can call back into your driver (which *is* the case for low_latency),
so don't do anything that can cause a deadlock
when you call it.

--
Paul Fulghum
Microgate Systems, Ltd.