2011-05-26 00:24:31

by Andreas Bombe

[permalink] [raw]
Subject: tty_lock held during transmit wait in close: still unresolved

Short reminder/summary: Unlike the BKL, the tty_lock mechanism that
replaced it does not release the lock while sleeping. This causes the
tty_lock to be held for extended periods of time when uart_close()
(running with tty_lock held) tries to flush the transmit buffer but is
unable to do so.

The result is that until the transmit flush completes by the remote end
finally accepting the data or the driver timing out and giving up,
pretty much all tty operations freeze. No programs can be started
and for some reason X freezes for me. From a user viewpoint, the
computer effectively locks up for that time.

After I tracked the issue down to the tty_lock mechanism, I found some
discussion on LKML about that from last November (thread starting here:
http://lkml.kernel.org/r/[email protected] ). However
nothing has come of it, it appears, at least not in the official kernel.

A minimalist way to trigger this issue is with a serial port that has
nothing attached:

stty -F /dev/ttyS0 crtscts
echo >/dev/ttyS0

The echo triggers a 30 second timeout on close while the driver is
trying to flush the newline. Another way is developing an USB device
with a virtual serial port and having it stop (by debugger or
crash/lockup)...

Any ideas on how to progress?

--
Andreas Bombe


2011-05-26 07:14:27

by Greg KH

[permalink] [raw]
Subject: Re: tty_lock held during transmit wait in close: still unresolved

On Thu, May 26, 2011 at 01:59:22AM +0200, Andreas Bombe wrote:
> Short reminder/summary: Unlike the BKL, the tty_lock mechanism that
> replaced it does not release the lock while sleeping. This causes the
> tty_lock to be held for extended periods of time when uart_close()
> (running with tty_lock held) tries to flush the transmit buffer but is
> unable to do so.
>
> The result is that until the transmit flush completes by the remote end
> finally accepting the data or the driver timing out and giving up,
> pretty much all tty operations freeze. No programs can be started
> and for some reason X freezes for me. From a user viewpoint, the
> computer effectively locks up for that time.
>
> After I tracked the issue down to the tty_lock mechanism, I found some
> discussion on LKML about that from last November (thread starting here:
> http://lkml.kernel.org/r/[email protected] ). However
> nothing has come of it, it appears, at least not in the official kernel.
>
> A minimalist way to trigger this issue is with a serial port that has
> nothing attached:
>
> stty -F /dev/ttyS0 crtscts
> echo >/dev/ttyS0
>
> The echo triggers a 30 second timeout on close while the driver is
> trying to flush the newline. Another way is developing an USB device
> with a virtual serial port and having it stop (by debugger or
> crash/lockup)...
>
> Any ideas on how to progress?

Can you try Linus's tree right now? A number of changes went in during
this merge window that might help out here I think.

thanks,

greg k-h

2011-05-26 08:15:52

by Alan

[permalink] [raw]
Subject: Re: tty_lock held during transmit wait in close: still unresolved

> A minimalist way to trigger this issue is with a serial port that has
> nothing attached:
>
> stty -F /dev/ttyS0 crtscts
> echo >/dev/ttyS0
>
> The echo triggers a 30 second timeout on close while the driver is
> trying to flush the newline. Another way is developing an USB device
> with a virtual serial port and having it stop (by debugger or
> crash/lockup)...
>
> Any ideas on how to progress?

Send patches.

At any point you can show the code sleeps you can drop and retake the
tty mutex either side of it, so you should be able to do that in the
close timeout case. You may need to think about the order of locking with
the port mutex but I suspect you can drop and retake both there.

Most use of the mutex (but not quite all) could also be moved to a tty
specific mutex fairly easily once someone has time, but that alone won't
entirely fix your case.

Alan

2011-05-27 00:29:33

by Andreas Bombe

[permalink] [raw]
Subject: Re: tty_lock held during transmit wait in close: still unresolved

On Thu, May 26, 2011 at 12:11:04AM -0700, Greg KH wrote:
> On Thu, May 26, 2011 at 01:59:22AM +0200, Andreas Bombe wrote:
> > A minimalist way to trigger this issue is with a serial port that has
> > nothing attached:
> >
> > stty -F /dev/ttyS0 crtscts
> > echo >/dev/ttyS0
> >
> > The echo triggers a 30 second timeout on close while the driver is
> > trying to flush the newline. Another way is developing an USB device
> > with a virtual serial port and having it stop (by debugger or
> > crash/lockup)...
> >
> > Any ideas on how to progress?
>
> Can you try Linus's tree right now? A number of changes went in during
> this merge window that might help out here I think.

I think I had those changes already in the last try (Wednesday's git)
but I tried again with the very latest. No noticeable change.

--
Andreas Bombe

2011-05-27 00:41:10

by Andreas Bombe

[permalink] [raw]
Subject: Re: tty_lock held during transmit wait in close: still unresolved

On Thu, May 26, 2011 at 09:17:26AM +0100, Alan Cox wrote:
> > A minimalist way to trigger this issue is with a serial port that has
> > nothing attached:
> >
> > stty -F /dev/ttyS0 crtscts
> > echo >/dev/ttyS0
> >
> > The echo triggers a 30 second timeout on close while the driver is
> > trying to flush the newline. Another way is developing an USB device
> > with a virtual serial port and having it stop (by debugger or
> > crash/lockup)...
> >
> > Any ideas on how to progress?
>
> Send patches.

I would have, but I'm not exactly familiar with the tty code and I
thought messing around with the locking is probably not the most
promising way to start.

However…

> At any point you can show the code sleeps you can drop and retake the
> tty mutex either side of it, so you should be able to do that in the
> close timeout case. You may need to think about the order of locking with
> the port mutex but I suspect you can drop and retake both there.

…basically emulating the BKL semantics? Sounds more doable. I'll look
into it.

Of course that means it has to be done individually in all drivers.

> Most use of the mutex (but not quite all) could also be moved to a tty
> specific mutex fairly easily once someone has time, but that alone won't
> entirely fix your case.
>
> Alan

--
Andreas Bombe

2011-05-27 11:39:15

by Arnd Bergmann

[permalink] [raw]
Subject: Re: tty_lock held during transmit wait in close: still unresolved

On Friday 27 May 2011, Andreas Bombe wrote:
> > At any point you can show the code sleeps you can drop and retake the
> > tty mutex either side of it, so you should be able to do that in the
> > close timeout case. You may need to think about the order of locking with
> > the port mutex but I suspect you can drop and retake both there.
>
> …basically emulating the BKL semantics? Sounds more doable. I'll look
> into it.

If I understand it correctly, the problem is the msleep_interruptible()
in __uart_wait_until_sent(), right?

Note that this function may be called with or without the port mutex
held, depending on whether the caller is uart_close or uart_wait_until_sent.
The tricky part here will be making sure that you hold neither the
port mutex nor the tty_mutex while sleeping, and to always retake them
in the correct order (tty_mutex before port mutex).

My mistake here must have been that I assumed the timeout was relatively
short to not hurt when holding a mutex, since we already hold the port
mutex. I expected the wait time to be a fraction of a second as in the time
that it takes to send a few remaining characters, which would be acceptable,
unlike the 30 second sleep that you are seeing.

> Of course that means it has to be done individually in all drivers.

Right. Fortunately, we have now reduced the number of drivers a bit, by
moving some of them to staging or completely out of the kernel.

Some drivers call their wait_until_sent function directly from their
close function, some call it through tty_wait_until_sent, and some
actually do both.

Further, some of the drivers have a rather ugly part in them where we take
tty_lock() conditionally in wait_until_sent(), depending on whether the
current thread already holds it (i.e. when coming from ->close, not when
coming from ioctl).

Arnd

2011-05-27 12:18:29

by Jiri Slaby

[permalink] [raw]
Subject: Re: tty_lock held during transmit wait in close: still unresolved

> > At any point you can show the code sleeps you can drop and retake the
> > tty mutex either side of it, so you should be able to do that in the
> > close timeout case. You may need to think about the order of locking with
> > the port mutex but I suspect you can drop and retake both there.
>
> #basically emulating the BKL semantics? Sounds more doable. I'll look
> into it.

Actually the best would be to get rid of tty_lock completely. Here I
have a patch removing tty_lock from wait_until_sent path. I've been
using a kernel with the patch applied for about month without any
problems. But I have only 8250-based serial.

Why I ahven't sent it is that I need to re-investigate users whether
this will hurt. (I already did, but stopped in the middle to do other
things.) Mostly wait_until_sent hooks in uart drivers do simple reads
from device.

The patch is only the first step. When this patch is OK and works, we
can move the whole uart thing to use tty_port_open/close helpers.

Does the patch below fix the issue?

---
drivers/tty/serial/serial_core.c | 30 +++++++-----------------------
1 files changed, 7 insertions(+), 23 deletions(-)

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index db7912c..2cbf1bd 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -57,7 +57,7 @@ static struct lock_class_key port_lock_key;

static void uart_change_speed(struct tty_struct *tty, struct uart_state *state,
struct ktermios *old_termios);
-static void __uart_wait_until_sent(struct uart_port *port, int timeout);
+static void uart_wait_until_sent(struct tty_struct *tty, int timeout);
static void uart_change_pm(struct uart_state *state, int pm_state);

/*
@@ -1304,16 +1304,8 @@ static void uart_close(struct tty_struct *tty, struct file *filp)
tty->closing = 1;
spin_unlock_irqrestore(&port->lock, flags);

- if (port->closing_wait != ASYNC_CLOSING_WAIT_NONE) {
- /*
- * hack: open-coded tty_wait_until_sent to avoid
- * recursive tty_lock
- */
- long timeout = msecs_to_jiffies(port->closing_wait);
- if (wait_event_interruptible_timeout(tty->write_wait,
- !tty_chars_in_buffer(tty), timeout) >= 0)
- __uart_wait_until_sent(uport, timeout);
- }
+ if (port->closing_wait != ASYNC_CLOSING_WAIT_NONE)
+ tty_wait_until_sent(tty, msecs_to_jiffies(port->closing_wait));

/*
* At this point, we stop accepting input. To do this, we
@@ -1329,7 +1321,7 @@ static void uart_close(struct tty_struct *tty, struct file *filp)
* has completely drained; this is especially
* important if there is a transmit FIFO!
*/
- __uart_wait_until_sent(uport, uport->timeout);
+ uart_wait_until_sent(tty, uport->timeout);
}

uart_shutdown(tty, state);
@@ -1363,8 +1355,10 @@ done:
mutex_unlock(&port->mutex);
}

-static void __uart_wait_until_sent(struct uart_port *port, int timeout)
+static void uart_wait_until_sent(struct tty_struct *tty, int timeout)
{
+ struct uart_state *state = tty->driver_data;
+ struct uart_port *port = state->uart_port;
unsigned long char_time, expire;

if (port->type == PORT_UNKNOWN || port->fifosize == 0)
@@ -1416,16 +1410,6 @@ static void __uart_wait_until_sent(struct uart_port *port, int timeout)
}
}

-static void uart_wait_until_sent(struct tty_struct *tty, int timeout)
-{
- struct uart_state *state = tty->driver_data;
- struct uart_port *port = state->uart_port;
-
- tty_lock();
- __uart_wait_until_sent(port, timeout);
- tty_unlock();
-}
-
/*
* This is called with the BKL held in
* linux/drivers/char/tty_io.c:do_tty_hangup()
--
1.7.4.2

2011-05-27 13:53:26

by Arnd Bergmann

[permalink] [raw]
Subject: Re: tty_lock held during transmit wait in close: still unresolved

On Friday 27 May 2011, Jiri Slaby wrote:
> Actually the best would be to get rid of tty_lock completely. Here I
> have a patch removing tty_lock from wait_until_sent path. I've been
> using a kernel with the patch applied for about month without any
> problems. But I have only 8250-based serial.

I agree that this would be the best option. The only reason I had
for holding the tty_lock in wait_until_sent is that it the BKL
has been held there tradiditonally. I could not find anything that
would break without holding it, but I also didn't look very hard.

Arnd