2009-07-20 17:52:19

by Daniel Mack

[permalink] [raw]
Subject: [PATCH] [usb-serial] fix Ooops on uplug

When an USB serial adapter is disconnected and <Enter> is pressed on a
connection still open on it (cu, minicom, ...), the kernel crashes.

The reason is that the serial port's resources are freed upon disconnect
(in serial_hangup()) but the tty device layer is not made aware of that.
Hence, the close callback will later access the resources again.

Fix that by postponing the resources freeing to the close callback and
remove it from the hangup callback.

Signed-off-by: Daniel Mack <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Oliver Neukum <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: [email protected]
---
drivers/usb/serial/usb-serial.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c
index bd7581b..77cb3cd 100644
--- a/drivers/usb/serial/usb-serial.c
+++ b/drivers/usb/serial/usb-serial.c
@@ -355,7 +355,6 @@ static void serial_hangup(struct tty_struct *tty)
struct usb_serial_port *port = tty->driver_data;
serial_do_down(port);
tty_port_hangup(&port->port);
- serial_do_free(port);
}

static int serial_write(struct tty_struct *tty, const unsigned char *buf,
--
1.6.3.1


2009-07-20 21:50:35

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

> The reason is that the serial port's resources are freed upon
> disconnect (in serial_hangup()) but the tty device layer is not made
> aware of that. Hence, the close callback will later access the
> resources again.

tty_port_hangup() is part of the hangup path and the physical resources
should not be touched after the hangup completes. It's a good
indication of where the bug might be but its not I suspect the fix as
we now leak the resources.

2009-07-20 21:58:47

by Daniel Mack

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Mon, Jul 20, 2009 at 11:48:29PM +0100, Alan Cox wrote:
> > The reason is that the serial port's resources are freed upon
> > disconnect (in serial_hangup()) but the tty device layer is not made
> > aware of that. Hence, the close callback will later access the
> > resources again.
>
> tty_port_hangup() is part of the hangup path and the physical resources
> should not be touched after the hangup completes. It's a good
> indication of where the bug might be but its not I suspect the fix as
> we now leak the resources.

Depends on how you define the time of removal. The user space connection
stays open after the device was removed. And it is closed once the tty
layer finds out that it can't write to the hardware anymore. Once you
try to bring the tty to life, all resources are freed.

At least, this patch fixes a serious oops.

Daniel

2009-07-20 23:32:09

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Mon, 20 Jul 2009 19:51:53 +0200
Daniel Mack <[email protected]> wrote:

> When an USB serial adapter is disconnected and <Enter> is pressed on a
> connection still open on it (cu, minicom, ...), the kernel crashes.
>
> The reason is that the serial port's resources are freed upon disconnect
> (in serial_hangup()) but the tty device layer is not made aware of that.
> Hence, the close callback will later access the resources again.

I don't think that is the case. The hangup terminates access to the
resources and replaces the file operations at tty level. The tty
level close called will be the close for the hung up tty ops not the usb
device ops.

hangup() is a termination of access to the bus resources for that tty
handle.

Alan

2009-07-20 23:44:49

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

> Depends on how you define the time of removal. The user space connection
> stays open after the device was removed.

If that is occuring then the bug is elsewhere. The hang up sequence
reconnects the user space to the hung up tty ops and no longer references
the hardware.

2009-07-21 15:53:38

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Tue, 21 Jul 2009, Alan Cox wrote:

> > Depends on how you define the time of removal. The user space connection
> > stays open after the device was removed.
>
> If that is occuring then the bug is elsewhere. The hang up sequence
> reconnects the user space to the hung up tty ops and no longer references
> the hardware.

I got something similar with a pl2303 device, though not a crash. I
plugged in the device, opened /dev/ttyUSB0, unplugged the device, then
tried to read from the open file descriptor. The read provoked this:

[ 961.902428] WARNING: at kernel/lockdep.c:2621 __lock_acquire+0x395/0xaf5()
[ 961.902523] Hardware name:
[ 961.902608] Modules linked in: pl2303 usbserial sd_mod sg usb_storage scsi_mod evdev pcspkr e100 mii ohci_hcd uhci_hcd ehci_hcd floppy processor button thermal_sys usbcore [last unloaded: sd_mod]
[ 961.903538] Pid: 2536, comm: cat Not tainted 2.6.31-rc3 #1
[ 961.903630] Call Trace:
[ 961.903720] [<c1021718>] warn_slowpath_common+0x60/0x90
[ 961.903814] [<c1021755>] warn_slowpath_null+0xd/0x10
[ 961.903907] [<c103ed98>] __lock_acquire+0x395/0xaf5
[ 961.903999] [<c103ddb9>] ? mark_lock+0x1e/0x1e4
[ 961.904020] [<c103f540>] lock_acquire+0x48/0x64
[ 961.904020] [<c1126810>] ? tty_port_close_start+0x1a/0x118
[ 961.904020] [<c11c21ef>] _spin_lock_irqsave+0x2e/0x3e
[ 961.904020] [<c1126810>] ? tty_port_close_start+0x1a/0x118
[ 961.904020] [<c1126810>] tty_port_close_start+0x1a/0x118
[ 961.904020] [<f09649b5>] serial_close+0x4f/0x7b [usbserial]
[ 961.904020] [<c11215e3>] tty_release_dev+0x17c/0x400
[ 961.904020] [<c103d28e>] ? register_lock_class+0x17/0x272
[ 961.904020] [<c1121879>] tty_release+0x12/0x1c
[ 961.904020] [<c107201f>] __fput+0xe9/0x172
[ 961.904020] [<c10720c1>] fput+0x19/0x1c
[ 961.904020] [<c106f93c>] filp_close+0x51/0x5b
[ 961.904020] [<c106f9b0>] sys_close+0x6a/0xa4
[ 961.904020] [<c1002a08>] sysenter_do_call+0x12/0x36
[ 961.904020] ---[ end trace ed6ce19124f40616 ]---

This is only a lockdep warning, and I don't understand its
significance. Even worse, when I plugged in a USB flash drive
afterward this appeared:

[ 1093.156767] =============================================================================
[ 1093.156913] BUG kmalloc-1024: Poison overwritten
[ 1093.157003] -----------------------------------------------------------------------------
[ 1093.157006]
[ 1093.157223] INFO: 0xeea78c9c-0xeea78cab. First byte 0x6c instead of 0x6b
[ 1093.157335] INFO: Allocated in kzalloc+0xb/0xd [usbserial] age=41170 cpu=0 pid=483
[ 1093.157480] INFO: Freed in port_free+0x75/0x78 [usbserial] age=34856 cpu=0 pid=6
[ 1093.157619] INFO: Slab 0xc21c9060 objects=15 used=11 fp=0xeea78c90 flags=0x400040c3
[ 1093.157757] INFO: Object 0xeea78c90 @offset=3216 fp=0xeea7baa0

So it looks like something really is wrong, some sort of
use-after-free. Maybe a refcounting imbalance.

Alan Stern

2009-07-21 15:56:25

by Daniel Mack

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Tue, Jul 21, 2009 at 11:53:35AM -0400, Alan Stern wrote:
> [ 961.903814] [<c1021755>] warn_slowpath_null+0xd/0x10
> [ 961.903907] [<c103ed98>] __lock_acquire+0x395/0xaf5
> [ 961.903999] [<c103ddb9>] ? mark_lock+0x1e/0x1e4
> [ 961.904020] [<c103f540>] lock_acquire+0x48/0x64
> [ 961.904020] [<c1126810>] ? tty_port_close_start+0x1a/0x118
> [ 961.904020] [<c11c21ef>] _spin_lock_irqsave+0x2e/0x3e
> [ 961.904020] [<c1126810>] ? tty_port_close_start+0x1a/0x118
> [ 961.904020] [<c1126810>] tty_port_close_start+0x1a/0x118
> [ 961.904020] [<f09649b5>] serial_close+0x4f/0x7b [usbserial]
> [ 961.904020] [<c11215e3>] tty_release_dev+0x17c/0x400
> [ 961.904020] [<c103d28e>] ? register_lock_class+0x17/0x272
> [ 961.904020] [<c1121879>] tty_release+0x12/0x1c
> [ 961.904020] [<c107201f>] __fput+0xe9/0x172
> [ 961.904020] [<c10720c1>] fput+0x19/0x1c
> [ 961.904020] [<c106f93c>] filp_close+0x51/0x5b
> [ 961.904020] [<c106f9b0>] sys_close+0x6a/0xa4
> [ 961.904020] [<c1002a08>] sysenter_do_call+0x12/0x36
> [ 961.904020] ---[ end trace ed6ce19124f40616 ]---
>
> This is only a lockdep warning, and I don't understand its
> significance. Even worse, when I plugged in a USB flash drive
> afterward this appeared:

The problem is that freed resources are accessed, and that can lead to
any kind of strange kernel faults from crashing to other subsequent
errors.

Did you try my patch?

2009-07-21 16:00:00

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug


> The problem is that freed resources are accessed, and that can lead to
> any kind of strange kernel faults from crashing to other subsequent
> errors.
>
> Did you try my patch?

We've already established your patch is wrong. So trying it isn't at all
helpful.

2009-07-21 16:11:42

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Tue, 21 Jul 2009, Alan Cox wrote:

> > The problem is that freed resources are accessed, and that can lead to
> > any kind of strange kernel faults from crashing to other subsequent
> > errors.
> >
> > Did you try my patch?

No.

> We've already established your patch is wrong. So trying it isn't at all
> helpful.

I'll take your word for it. It will take some time to figure out
exactly which buffers are getting used after they are freed, although
the most likely is the usb_serial_port structure itself.

Let you know what I find...

Alan Stern

2009-07-21 16:15:21

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

> > If that is occuring then the bug is elsewhere. The hang up sequence
> > reconnects the user space to the hung up tty ops and no longer references
> > the hardware.
>
> I got something similar with a pl2303 device, though not a crash. I
> plugged in the device, opened /dev/ttyUSB0, unplugged the device, then
> tried to read from the open file descriptor. The read provoked this:

That looks like it occurs after the read, however that trace shows the
close() method being called off sys_close() which in turn means a hang up
didn't occur when it was unplugged.

> This is only a lockdep warning, and I don't understand its
> significance. Even worse, when I plugged in a USB flash drive
> afterward this appeared:

Looks like something freed the resources but didn't hang up when the
disconnect occurred

> So it looks like something really is wrong, some sort of
> use-after-free. Maybe a refcounting imbalance.

A tty getting destructed before it should have been would perhaps do some
of it but it doesn't explain how the close path got where it did.

What should be occurring is

USB disconnect
usb_serial_disconnect(interface)
tty_hangup(tty) [this is buggy and commented as such in
the USB code as it should do it synchronously]

tty_hangup()
do_tty_hangup()

file->f_ops = &hung_up_tty_ops;

(so the USB close will never be called)

ldisc hangup
tty->ops->hangup (no-op on USB serial)

so the trace to me implies that the usb_serial_disconnect is not
happening. That in turn leads the close method to be called on a
disconnected port which in turn crashes stuff.

(and it also explains why Daniel's change although bogus stops the crash)

2009-07-21 16:17:08

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Tue, 21 Jul 2009 12:11:39 -0400 (EDT)
Alan Stern <[email protected]> wrote:

> On Tue, 21 Jul 2009, Alan Cox wrote:
>
> > > The problem is that freed resources are accessed, and that can lead to
> > > any kind of strange kernel faults from crashing to other subsequent
> > > errors.
> > >
> > > Did you try my patch?
>
> No.
>
> > We've already established your patch is wrong. So trying it isn't at all
> > helpful.
>
> I'll take your word for it. It will take some time to figure out
> exactly which buffers are getting used after they are freed, although
> the most likely is the usb_serial_port structure itself.
>
> Let you know what I find...

On the tty side if you look in tty_io.c there is a define

TTY_DEBUG_HANGUP

which exists especially for these sorts of events..

2009-07-21 16:19:35

by Daniel Mack

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Tue, Jul 21, 2009 at 05:16:25PM +0100, Alan Cox wrote:
> > I got something similar with a pl2303 device, though not a crash. I
> > plugged in the device, opened /dev/ttyUSB0, unplugged the device, then
> > tried to read from the open file descriptor. The read provoked this:
>
> That looks like it occurs after the read, however that trace shows the
> close() method being called off sys_close() which in turn means a hang up
> didn't occur when it was unplugged.

It was, at least in my case. Did some printk debugging and it showed
that the hangup callback was entered what freed the serial device.

> > This is only a lockdep warning, and I don't understand its
> > significance. Even worse, when I plugged in a USB flash drive
> > afterward this appeared:
>
> Looks like something freed the resources but didn't hang up when the
> disconnect occurred

Probably the real fix would be to let the tty layer know the device died
from the hangup handler.

2009-07-21 16:27:37

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

> It was, at least in my case. Did some printk debugging and it showed
> that the hangup callback was entered what freed the serial device.

Ok
>
> > > This is only a lockdep warning, and I don't understand its
> > > significance. Even worse, when I plugged in a USB flash drive
> > > afterward this appeared:
> >
> > Looks like something freed the resources but didn't hang up when the
> > disconnect occurred
>
> Probably the real fix would be to let the tty layer know the device died
> from the hangup handler.

The hangup event means the physical hardware must no longer be touched so
that's already the case and we use it for a lot of things (from ISAPnP to
PCMCIA to USB).

The USB code does slightly misimplement it - it calls tty_hangup() in
usb_serial_disconnect when it should call tty_vhangup() as it needs the
event to be synchronous. That is the hangup of the tty should be done
before the USB layer can sneak off and free stuff.

I don't at this point see that as explaining the bug however

2009-07-21 18:36:44

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Tue, 21 Jul 2009, Alan Cox wrote:

> A tty getting destructed before it should have been would perhaps do some
> of it but it doesn't explain how the close path got where it did.
>
> What should be occurring is
>
> USB disconnect
> usb_serial_disconnect(interface)
> tty_hangup(tty) [this is buggy and commented as such in
> the USB code as it should do it synchronously]

I'll change it to tty_vhangup(tty) eventually.

> tty_hangup()
> do_tty_hangup()
> file->f_ops = &hung_up_tty_ops;
>
> (so the USB close will never be called)

That doesn't make sense. The driver's open method has been called, so
of course the driver expects its close method to be called.

> ldisc hangup
> tty->ops->hangup (no-op on USB serial)

What do you mean? There is a serial_hangup method in usb-serial.c
and it does get called; see below.

> so the trace to me implies that the usb_serial_disconnect is not
> happening. That in turn leads the close method to be called on a
> disconnected port which in turn crashes stuff.

No, you're wrong. Here is a log with some extra debugging lines added:

Open /dev/ttyUSB0:
[ 278.680845] opening ttyUSB0...
[ 278.689268] uhci_hcd 0000:00:1d.1: reserve dev 2 ep81-INT, period 1, phase 0, 19 us
[ 278.690246] pl2303 ttyUSB0: serial_open port 0 (ef7fd920), refcount 6
[ 278.696682] opening tty2...
[ 278.696766] tty_release_dev of tty2 (tty count=2)...
[ 278.699716] opening tty2...

Unplug the pl2303 device (a bunch of uninteresting USB messages have
been left out):
[ 283.618244] usb 2-2: unregistering interface 2-2:1.0
[ 283.618864] pl2303 ttyUSB0: serial_disconnect port 0, refcount 6

The line above was inserted in serial_disconnect.

[ 283.618958] ttyUSB0 hangup...
[ 283.619722] pl2303 ttyUSB0: pl2303 converter now disconnected from ttyUSB0
[ 283.619974] pl2303 2-2:1.0: device disconnected
[ 283.620244] usb 2-2:1.0: uevent
[ 283.621302] usb 2-2: uevent
[ 283.623660] usb-serial ttyUSB0: serial_hangup port 0, refcount 2

The line above was inserted in serial_hangup, as was the following
stack dump.

[ 283.623765] Pid: 6, comm: events/0 Not tainted 2.6.31-rc3 #2
[ 283.623859] Call Trace:
[ 283.623954] [<c11c0111>] ? printk+0xf/0x11
[ 283.624088] [<f08b09d7>] serial_hangup+0x45/0x66 [usbserial]
[ 283.624187] [<c112018c>] do_tty_hangup+0x28c/0x2b9
[ 283.624282] [<c102e6fe>] worker_thread+0x168/0x247
[ 283.624374] [<c102e6b9>] ? worker_thread+0x123/0x247
[ 283.624466] [<c111ff00>] ? do_tty_hangup+0x0/0x2b9
[ 283.624560] [<c1031b57>] ? autoremove_wake_function+0x0/0x33
[ 283.624653] [<c102e596>] ? worker_thread+0x0/0x247
[ 283.624744] [<c10318f1>] kthread+0x69/0x6e
[ 283.624834] [<c1031888>] ? kthread+0x0/0x6e
[ 283.624926] [<c10034ef>] kernel_thread_helper+0x7/0x10
[ 283.625144] usb-serial ttyUSB0: serial_do_free port 0, refcount 2
[ 283.625314] usb-serial ttyUSB0: destroy_serial port 0, refcount 1
[ 283.625408] port_free port 0, refcount 0

Close the open file descriptor:
[ 291.227977] tty_release_dev of tty2 (tty count=2)...
[ 291.230492] tty_release_dev of ttyUSB0 (tty count=1)...
[ 291.230630] serial_close port 107 (ef7fd920)

That line was inserted in serial_close. As you can see, the port
number is wrong because the port structure has already been
deallocated by port_free. And that leads to the following corruption.

[ 291.230772] ------------[ cut here ]------------
[ 291.230943] WARNING: at kernel/lockdep.c:2621 __lock_acquire+0x395/0xaf5()
[ 291.231037] Hardware name:
[ 291.231123] Modules linked in: pl2303 usbserial pcspkr evdev e100 mii ohci_hcd ehci_hcd uhci_hcd floppy processor button thermal_sys usbcore [last unloaded: scsi_wait_scan]
[ 291.231901] Pid: 2037, comm: cat Not tainted 2.6.31-rc3 #2
[ 291.231990] Call Trace:
[ 291.232042] [<c1021718>] warn_slowpath_common+0x60/0x90
[ 291.232042] [<c1021755>] warn_slowpath_null+0xd/0x10
[ 291.232042] [<c103ed98>] __lock_acquire+0x395/0xaf5
[ 291.232042] [<c1021ca4>] ? release_console_sem+0x197/0x1c4
[ 291.232042] [<c103f540>] lock_acquire+0x48/0x64
[ 291.232042] [<c112689c>] ? tty_port_close_start+0x1a/0x118
[ 291.232042] [<c11c227f>] _spin_lock_irqsave+0x2e/0x3e
[ 291.232042] [<c112689c>] ? tty_port_close_start+0x1a/0x118
[ 291.232042] [<c112689c>] tty_port_close_start+0x1a/0x118
[ 291.232042] [<f08b0a5a>] serial_close+0x62/0x91 [usbserial]
[ 291.232042] [<c1120c2b>] tty_release_dev+0x191/0x41f
[ 291.232042] [<c103d28e>] ? register_lock_class+0x17/0x272
[ 291.232042] [<c1120ecb>] tty_release+0x12/0x1c
[ 291.232042] [<c107201f>] __fput+0xe9/0x172
[ 291.232042] [<c10720c1>] fput+0x19/0x1c
[ 291.232042] [<c106f93c>] filp_close+0x51/0x5b
[ 291.232042] [<c106f9b0>] sys_close+0x6a/0xa4
[ 291.232042] [<c1002a08>] sysenter_do_call+0x12/0x36
[ 291.232042] ---[ end trace 5e364c88669fab14 ]---
[ 291.233897] freeing tty structure...

Does this help pin down the source of the problem?

Alan Stern

2009-07-21 19:20:25

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

> > file->f_ops = &hung_up_tty_ops;
> >
> > (so the USB close will never be called)
>
> That doesn't make sense. The driver's open method has been called, so
> of course the driver expects its close method to be called.

The driver close method yes - but that uses tty_port_close_start() which
sees the port was hung up and leaves well alone.

> > ldisc hangup
> > tty->ops->hangup (no-op on USB serial)
>
> What do you mean? There is a serial_hangup method in usb-serial.c
> and it does get called; see below.

Thats me not reading carefully. It isn't just a resource free it does
stuff. That calls drv->close() so in fact the USB layer for want of a
better word "fakes" the close.

> [ 283.624088] [<f08b09d7>] serial_hangup+0x45/0x66 [usbserial]
> [ 283.624187] [<c112018c>] do_tty_hangup+0x28c/0x2b9

So we passed

/* This breaks for file handles being sent over AF_UNIX sockets ?
*/ list_for_each_entry(filp, &tty->tty_files, f_u.fu_list) {
if (filp->f_op->write == redirected_tty_write)
cons_filp = filp;
if (filp->f_op->write != tty_write)
continue;
closecount++;
tty_fasync(-1, filp, 0); /* can't block */
filp->f_op = &hung_up_tty_fops;
}

and changed the fops. As you say my theory was completely wrong

> Close the open file descriptor:
> [ 291.227977] tty_release_dev of tty2 (tty count=2)...
> [ 291.230492] tty_release_dev of ttyUSB0 (tty count=1)...
> [ 291.230630] serial_close port 107 (ef7fd920)
>
> That line was inserted in serial_close. As you can see, the port
> number is wrong because the port structure has already been
> deallocated by port_free. And that leads to the following corruption.

Bingo - and that in turn means that the tty layer doesn't realise the
port has been hung up which makes tty_port_close_start do random things
which causes us to double free.

So in fact we need to delay the resource free until the tty layer has
really finished with it as the port resource contains the tty layer port.

We can't just skip freeing the resources in the hangup method as
tty_port_close_start() will return 0 and leak them on a hangup.

Alan: does this make sense

Take an extra tty layer reference to the usb_serial at open time

Put that reference in the tty shutdown() hook which is called
when the tty struct gets its final kref_put (ie after the close,
and if there is any outstanding other use eg in an IRQ handler on
another processor).

Am I understanding the usb_serial_port lifetime correctly ?

2009-07-21 21:38:51

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Tue, 21 Jul 2009, Alan Cox wrote:

> > > ldisc hangup
> > > tty->ops->hangup (no-op on USB serial)
> >
> > What do you mean? There is a serial_hangup method in usb-serial.c
> > and it does get called; see below.
>
> Thats me not reading carefully. It isn't just a resource free it does
> stuff. That calls drv->close() so in fact the USB layer for want of a
> better word "fakes" the close.

I see. Both serial_hangup() and serial_close() call serial_do_down()
and thus drv->close. So we obviously don't want both of them to be
called if the device is unplugged before the file is closed. But that
is just what tty_release_dev does...

> > [ 283.624088] [<f08b09d7>] serial_hangup+0x45/0x66 [usbserial]
> > [ 283.624187] [<c112018c>] do_tty_hangup+0x28c/0x2b9
>
> So we passed
>
> /* This breaks for file handles being sent over AF_UNIX sockets ?
> */ list_for_each_entry(filp, &tty->tty_files, f_u.fu_list) {
> if (filp->f_op->write == redirected_tty_write)
> cons_filp = filp;
> if (filp->f_op->write != tty_write)
> continue;
> closecount++;
> tty_fasync(-1, filp, 0); /* can't block */
> filp->f_op = &hung_up_tty_fops;
> }
>
> and changed the fops. As you say my theory was completely wrong
>
> > Close the open file descriptor:
> > [ 291.227977] tty_release_dev of tty2 (tty count=2)...
> > [ 291.230492] tty_release_dev of ttyUSB0 (tty count=1)...
> > [ 291.230630] serial_close port 107 (ef7fd920)
> >
> > That line was inserted in serial_close. As you can see, the port
> > number is wrong because the port structure has already been
> > deallocated by port_free. And that leads to the following corruption.
>
> Bingo - and that in turn means that the tty layer doesn't realise the
> port has been hung up which makes tty_port_close_start do random things
> which causes us to double free.
>
> So in fact we need to delay the resource free until the tty layer has
> really finished with it as the port resource contains the tty layer port.

By "port resource" you mean the usb_serial_port structure, right?

> We can't just skip freeing the resources in the hangup method as
> tty_port_close_start() will return 0 and leak them on a hangup.

Wait a second. Does the same serial_hangup() routine get called when
an RS-232 (for example) hangup event occurs, like DCD turning off, and
when the USB device is unplugged? Sure, the second implies the first,
but they need to be treated differently. After the first, the hardware
is still present and so the port resources shouldn't be released.

> Alan: does this make sense
>
> Take an extra tty layer reference to the usb_serial at open time
>
> Put that reference in the tty shutdown() hook which is called
> when the tty struct gets its final kref_put (ie after the close,
> and if there is any outstanding other use eg in an IRQ handler on
> another processor).

I don't think that will work; we mustn't deallocate the usb_serial_port
structures before calling serial->type->release(), which happens in
destroy_serial(). Yes, this is stupid, but some of the subdrivers
depend on it.

But the real problem isn't the references to the usb_serial. It seems
like a mistake to call serial_do_free() during serial_hangup() -- you
can fake a close but you can't fake a release. It's probably also
wrong to call serial_do_down().

> Am I understanding the usb_serial_port lifetime correctly ?

Perhaps not; I'll explain. It's very simplistic, because when I wrote
it I didn't know what was going on with the tty layer internals. (I
still don't, come to that.)

So usb_serial_port gets treated like any other character device. The
refcount is initialized to 1, it gets incremented during serial_open(),
decremented during serial_do_free() -- which used to be during
serial_close(), and then decremented a final time in destroy_serial().

Meanwhile, the higher usb_serial structure has a refcount also. It
gets incremented during serial_open(), decremented during
serial_do_free(), and decremented finally during
usb_serial_disconnect().

Logically, the usb_serial structure should exist only as long as any of
the usb_serial_ports need it. So logically, each of them should take a
reference to it as they are created and drop their reference as they
are deallocated, as you suggested. But since they can't get
deallocated until after the usb_serial's refcount has dropped to 0, I
had to use a more roundabout method.

The usb_serial_port structures should exist as long as they are needed,
which means as long as the USB device is connected or the tty device
file is open. That's how my original design was meant to work, but it
is now messed up by the fact that we get two "close" events -- a fake
one during disconnect and then the real one later.

Eliminating the fake calls seems like the cleanest solution.
Alternatively, we could keep the fake close (but not the fake release!)
and change serial_close() so that it calls serial_do_free() even if
tty_port_close_start returns 0.

Alan Stern

2009-07-21 22:54:16

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

> > We can't just skip freeing the resources in the hangup method as
> > tty_port_close_start() will return 0 and leak them on a hangup.
>
> Wait a second. Does the same serial_hangup() routine get called when
> an RS-232 (for example) hangup event occurs, like DCD turning off, and
> when the USB device is unplugged? Sure, the second implies the first,
> but they need to be treated differently. After the first, the hardware
> is still present and so the port resources shouldn't be released.

The sequences of behaviour for the tty interface are usually

open
allocate resources
start uart

do stuff

close
stop uart
free resources

and if a hang up occurs:

open
allocate resources
start uart
do stuff

hangup event
stop uart
free resources
wake any pending openers

[pending open completes]
allocate resources
start uart

[original opener closes]
close
was hung up
do nothing much


> Perhaps not; I'll explain. It's very simplistic, because when I wrote
> it I didn't know what was going on with the tty layer internals. (I
> still don't, come to that.)

Ok so between us we know the answer I hope 8)

> The usb_serial_port structures should exist as long as they are needed,
> which means as long as the USB device is connected or the tty device
> file is open. That's how my original design was meant to work, but it
> is now messed up by the fact that we get two "close" events -- a fake
> one during disconnect and then the real one later.

The tty notion of "open" is really open->hangup or open->close. Once the
hangup occurs you may have a file handle to a tty object but it doesn't
talk to hardware. You need to open it again to get access again. Between
hangup and close you are sitting on a dead file handle that returns
errors if you use it. The hardware is owned by whoever opened it next.

The historical model for this is from dial in. A carrier drop is
effectively a secure attention key, it has to kill off anything left on
the port so an evil user cannot leave a key logger active on the port.

> Eliminating the fake calls seems like the cleanest solution.
> Alternatively, we could keep the fake close (but not the fake release!)
> and change serial_close() so that it calls serial_do_free() even if
> tty_port_close_start returns 0.

There are several other cases where tty_port_close_start returns zero
(such as multiple openers and not being the last one)

Alan

2009-07-22 14:44:31

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Tue, 21 Jul 2009, Alan Cox wrote:

> The sequences of behaviour for the tty interface are usually
>
> open
> allocate resources
> start uart
>
> do stuff
>
> close
> stop uart
> free resources
>
> and if a hang up occurs:
>
> open
> allocate resources
> start uart
> do stuff
>
> hangup event
> stop uart
> free resources
> wake any pending openers

Where exactly is the code that wakes the other openers?

>
> [pending open completes]

Does this completion involve calling tty->ops->open()? Or was it
called earlier, before the open was forced to block?

> allocate resources
> start uart

To rephrase the question above, how is the device driver made aware
that it needs to reallocate resources and restart the uart?

> [original opener closes]
> close
> was hung up
> do nothing much

There's an obvious race here between hangup and close. The assignment
of hung_up_tty_fops to filp->f_op is protected by the BKL and not much
else. Does the code in tty_release_dev() check to see whether this
assignment has been made before calling tty->ops->close()? It doesn't
like like it to me. With the wrong timing, you could end up telling
the device driver to stop the uart twice.


> The tty notion of "open" is really open->hangup or open->close. Once the
> hangup occurs you may have a file handle to a tty object but it doesn't
> talk to hardware.

But it still talks to the device driver via tty_release_dev's call to
tty->ops->close. How is the driver supposed to know that this method
call shouldn't talk to the hardware?

In fact, what point is there in making the call at all? Once the
hangup has occurred, the driver shouldn't do _anything_ when the
corresponding release happens. As you say, the notion is open->hangup
or open->close, not open->(hangup followed by close).

> You need to open it again to get access again. Between
> hangup and close you are sitting on a dead file handle that returns
> errors if you use it. The hardware is owned by whoever opened it next.
>
> The historical model for this is from dial in. A carrier drop is
> effectively a secure attention key, it has to kill off anything left on
> the port so an evil user cannot leave a key logger active on the port.

What about the other historical model, a user sitting at a terminal and
telling his modem to dial-out? One wouldn't think the modem's device
file would need to be reopened between calls.

> > Eliminating the fake calls seems like the cleanest solution.
> > Alternatively, we could keep the fake close (but not the fake release!)
> > and change serial_close() so that it calls serial_do_free() even if
> > tty_port_close_start returns 0.
>
> There are several other cases where tty_port_close_start returns zero
> (such as multiple openers and not being the last one)

I've seen the patch you just posted, and I agree -- it isn't the right
fix for the long run. By all means, call tty->ops->shutdown in process
context and that's where we will release the port resources.

But the overall intent still isn't clear, not without the answers to
the questions above.

Alan Stern

2009-07-22 16:16:23

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

> > free resources
> > wake any pending openers
>
> Where exactly is the code that wakes the other openers?

tty_port_close_end wakes port->open_wait

if another opener was blocked during the hangup they then exit
tty_blocked_until_ready and error

if another open occurs after the hangup completes (but while the other
use has the port open bug hung up then they will execute

serial_open()
->type->open()

> To rephrase the question above, how is the device driver made aware
> that it needs to reallocate resources and restart the uart?

For the case where it needs to do that the hangup will have finished and
it will get another call to serial_open()

> There's an obvious race here between hangup and close. The assignment
> of hung_up_tty_fops to filp->f_op is protected by the BKL and not much
> else. Does the code in tty_release_dev() check to see whether this
> assignment has been made before calling tty->ops->close()? It doesn't
> like like it to me. With the wrong timing, you could end up telling
> the device driver to stop the uart twice.

The core hangup and close code are interlocked (now - didn't use to be).

> > The tty notion of "open" is really open->hangup or open->close. Once the
> > hangup occurs you may have a file handle to a tty object but it doesn't
> > talk to hardware.
>
> But it still talks to the device driver via tty_release_dev's call to
> tty->ops->close. How is the driver supposed to know that this method
> call shouldn't talk to the hardware?

tty_hung_up_p() will be true

> In fact, what point is there in making the call at all? Once the
> hangup has occurred, the driver shouldn't do _anything_ when the
> corresponding release happens. As you say, the notion is open->hangup
> or open->close, not open->(hangup followed by close).

Beats me - not something I designed. However the driver would always need
to be aware of it because the following can occur

CPU1 CPU2
close begins
hangup
update ops
close handler runs

The tty_port code handles that internally, but has to handle it anyway.

There are similar issues with all the other calls if they are pending and
I've not even begun to tackle them yet as they are basically
inconveniences only. Also because I'm still hoping someone will implement
revoke() on Linux and do all the hard work for me.

> What about the other historical model, a user sitting at a terminal and
> telling his modem to dial-out? One wouldn't think the modem's device
> file would need to be reopened between calls.

The termios flags control this behaviour.

Alan

2009-07-22 18:21:27

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Wed, 22 Jul 2009, Alan Cox wrote:

> > > free resources
> > > wake any pending openers
> >
> > Where exactly is the code that wakes the other openers?
>
> tty_port_close_end wakes port->open_wait
>
> if another opener was blocked during the hangup they then exit
> tty_blocked_until_ready and error

Hmmm. serial_open() doesn't call tty_port_block_til_ready() until just
before it returns. Shouldn't it do this before locking port->mutex and
incrementing port->port.count?

In fact, should usb-serial.c be touching port->port.count at all? Is
it reserved for use by the tty core?

> > There's an obvious race here between hangup and close. The assignment
> > of hung_up_tty_fops to filp->f_op is protected by the BKL and not much
> > else. Does the code in tty_release_dev() check to see whether this
> > assignment has been made before calling tty->ops->close()? It doesn't
> > like like it to me. With the wrong timing, you could end up telling
> > the device driver to stop the uart twice.
>
> The core hangup and close code are interlocked (now - didn't use to be).

But are they interlocked enough?

> > > The tty notion of "open" is really open->hangup or open->close. Once the
> > > hangup occurs you may have a file handle to a tty object but it doesn't
> > > talk to hardware.
> >
> > But it still talks to the device driver via tty_release_dev's call to
> > tty->ops->close. How is the driver supposed to know that this method
> > call shouldn't talk to the hardware?
>
> tty_hung_up_p() will be true

With no synchronization. So there's still a race.

Why doesn't tty_release_dev() test tty_hung_up_p() before calling
tty->ops->close() instead of making the driver do it?

> > In fact, what point is there in making the call at all? Once the
> > hangup has occurred, the driver shouldn't do _anything_ when the
> > corresponding release happens. As you say, the notion is open->hangup
> > or open->close, not open->(hangup followed by close).
>
> Beats me - not something I designed. However the driver would always need
> to be aware of it because the following can occur
>
> CPU1 CPU2
> close begins
> hangup
> update ops
> close handler runs
>
> The tty_port code handles that internally, but has to handle it anyway.

This is the race I have been talking about.

> There are similar issues with all the other calls if they are pending and
> I've not even begun to tackle them yet as they are basically
> inconveniences only. Also because I'm still hoping someone will implement
> revoke() on Linux and do all the hard work for me.

:-)

Alan Stern


2009-07-22 22:34:40

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

> Hmmm. serial_open() doesn't call tty_port_block_til_ready() until just
> before it returns. Shouldn't it do this before locking port->mutex and
> incrementing port->port.count?

There shouldn't be a serial_open or serial_close as such IMHO, but I can't
simply rewrite everything in one go. Instead commonality is getting
extracted piece by piece.

> In fact, should usb-serial.c be touching port->port.count at all? Is
> it reserved for use by the tty core?

Ultimately I am pretty sure that for open and close usb serial should in
fact have to provide

- power management hooks
- hardware init/shutdown hooks
- modem control
- modem status

and nothing else. The same model will fit all the other drivers but they
all currently contain different code for this and do stuff differently.
Power to some means PCI D3 to USB means the usb autopm etc.
Hibernate/Resume has to fit there somewhere and I'll freely admit I've
not quite figured out how that should go together with this model.

So far I've extract the modem hooks needed for open/close (DTR control,
CD state, wakeups) and some of the code duplicated everywhere in
open/close/hangup (and which in USB serial were basically a work of
fiction with no resemblance to the standard). The open is only half done
so far (hence the wait_until helper) and the close path is not all there
- hence close_start/close_end.

At the moment I have two main goals

- Get to the point where every serial port in the system contains a tty
port
- Remove the 10 million alternative mis-implementations of half the tty
code found in each driver or in "glue" that does what the tty layer
should be doing (half of usb-serial, most of the uart layer)

All without being able to just break it for six months and drop support
for most of the hardware.

> > > There's an obvious race here between hangup and close. The assignment
> > > of hung_up_tty_fops to filp->f_op is protected by the BKL and not much
> > > else. Does the code in tty_release_dev() check to see whether this
> > > assignment has been made before calling tty->ops->close()? It doesn't
> > > like like it to me. With the wrong timing, you could end up telling
> > > the device driver to stop the uart twice.
> >
> > The core hangup and close code are interlocked (now - didn't use to be).
>
> But are they interlocked enough?

I think so at this point. I may be wrong. I was wrong about it earlier
this -rc series ;)

> Why doesn't tty_release_dev() test tty_hung_up_p() before calling
> tty->ops->close() instead of making the driver do it?

Ask Ted, see if he can remember why he did it that way 15 years ago on a
non SMP non lock using (except for kernel mode non-pre-emption) kernel.
Basically the tty layer was designed for a different world and then
patched up badly. The original design was wrong in several areas and the
result is a can of worms that eventually had to be opened.

Alan

2009-07-22 22:45:09

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug

On Wed, 22 Jul 2009, Alan Cox wrote:

> At the moment I have two main goals
>
> - Get to the point where every serial port in the system contains a tty
> port
> - Remove the 10 million alternative mis-implementations of half the tty
> code found in each driver or in "glue" that does what the tty layer
> should be doing (half of usb-serial, most of the uart layer)
>
> All without being able to just break it for six months and drop support
> for most of the hardware.

Okay. If there's anything more I can do to help with updating
usb-serial, let me know.

Alan Stern

2009-07-22 22:46:55

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [usb-serial] fix Ooops on uplug


> Okay. If there's anything more I can do to help with updating
> usb-serial, let me know.

Scream at me when I break it 8)