2009-07-15 13:30:09

by Aneesh Kumar K.V

[permalink] [raw]
Subject: tty related hangs with 2.6.31-rc3

Hi,

I am finding tty related hangs with 2.6.31-rc3. This didn't happen
before. This happen when i close the emacs session. The /proc/<pid>/stack
content is below

[<c0362c5b>] process_echoes+0x2b/0x2e0
[<c03638cb>] n_tty_write_wakeup+0xb/0x40
[<c0360a88>] tty_wakeup+0x58/0x70
[<c0368347>] pty_write+0x67/0x70
[<c035f1cb>] tty_put_char+0x2b/0x40
[<c036291f>] do_output_char+0xef/0x200
[<c0362d4e>] process_echoes+0x11e/0x2e0
[<c0364292>] n_tty_receive_char+0x102/0x710
[<c0364ac0>] n_tty_receive_buf+0x220/0x410
[<c036059c>] tiocsti+0x8c/0xa0
[<c0361aca>] tty_ioctl+0x25a/0x310
[<c01e52c8>] vfs_ioctl+0x28/0x80
[<c01e54f4>] do_vfs_ioctl+0x64/0x1c0
[<c01e56a3>] sys_ioctl+0x53/0x70
[<c0102e3c>] sysenter_do_call+0x12/0x28
[<ffffffff>] 0xffffffff

-aneesh


2009-07-15 14:02:21

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: tty related hangs with 2.6.31-rc3

On Wed, Jul 15, 2009 at 06:59:56PM +0530, Aneesh Kumar K.V wrote:
> Hi,
>
> I am finding tty related hangs with 2.6.31-rc3. This didn't happen
> before. This happen when i close the emacs session. The /proc/<pid>/stack
> content is below
>
> [<c0362c5b>] process_echoes+0x2b/0x2e0
> [<c03638cb>] n_tty_write_wakeup+0xb/0x40
> [<c0360a88>] tty_wakeup+0x58/0x70
> [<c0368347>] pty_write+0x67/0x70
> [<c035f1cb>] tty_put_char+0x2b/0x40
> [<c036291f>] do_output_char+0xef/0x200
> [<c0362d4e>] process_echoes+0x11e/0x2e0
> [<c0364292>] n_tty_receive_char+0x102/0x710
> [<c0364ac0>] n_tty_receive_buf+0x220/0x410
> [<c036059c>] tiocsti+0x8c/0xa0
> [<c0361aca>] tty_ioctl+0x25a/0x310
> [<c01e52c8>] vfs_ioctl+0x28/0x80
> [<c01e54f4>] do_vfs_ioctl+0x64/0x1c0
> [<c01e56a3>] sys_ioctl+0x53/0x70
> [<c0102e3c>] sysenter_do_call+0x12/0x28
> [<ffffffff>] 0xffffffff
>

Another hang

[<c0362c5b>] process_echoes+0x2b/0x2e0
[<c03639ff>] n_tty_write+0x6f/0x2b0
[<c0361cb2>] tty_write+0x132/0x1e0
[<c01d8966>] vfs_write+0x96/0x190
[<c01d8b1d>] sys_write+0x3d/0x70
[<c0102e3c>] sysenter_do_call+0x12/0x28
[<ffffffff>] 0xffffffff

-aneesh

2009-07-15 15:10:44

by Alan

[permalink] [raw]
Subject: Re: tty related hangs with 2.6.31-rc3

On Wed, 15 Jul 2009 18:59:56 +0530
"Aneesh Kumar K.V" <[email protected]> wrote:

> Hi,
>
> I am finding tty related hangs with 2.6.31-rc3. This didn't happen
> before. This happen when i close the emacs session. The /proc/<pid>/stack
> content is below

Thanks - nice clear trace. Looks like a bug in the n_tty locking changes
from a few releases back that the pty changes are triggering.

Basically process_echoes calls tty_put_char which if it thinks a device
queue was full and now has a bit of space will call tty_wakeup which can
call process_echoes and thus deadlock. With a physical serial device we
will even sometimes call tty_wakeup() from the serial transmit path
which is an irq path (which makes this doubly wrong as it then takes
mutexes)

Emacs presumably uses fasync which is the trigger for this. You need the
right timing combined with the new pty behaviour combined with FASYNC to
trigger it.


> [<c0362c5b>] process_echoes+0x2b/0x2e0
[which tries to take the lock we already hold (end A)]
> [<c03638cb>] n_tty_write_wakeup+0xb/0x40
[which processes our ldisc wakeup (end A)]
> [<c0360a88>] tty_wakeup+0x58/0x70
[which wakes up our tty (end A)]
> [<c0368347>] pty_write+0x67/0x70
[our write method is for tty/pty pairs end A output, queued to end B]
> [<c035f1cb>] tty_put_char+0x2b/0x40
[calls tty_put_char to write the echoed byte to end A output]
> [<c036291f>] do_output_char+0xef/0x200
[the fake typed character is echoed back towards end B]
> [<c0362d4e>] process_echoes+0x11e/0x2e0
[tries to process echo characters on end A]
> [<c0364292>] n_tty_receive_char+0x102/0x710
[ receives a byte that we've faked typing to end A input]
> [<c0364ac0>] n_tty_receive_buf+0x220/0x410
[ioctl method calls the ld->ops->receive_buf for n_tty (unsafely but that
bug is old]
> [<c036059c>] tiocsti+0x8c/0xa0
> [<c0361aca>] tty_ioctl+0x25a/0x310
> [<c01e52c8>] vfs_ioctl+0x28/0x80
> [<c01e54f4>] do_vfs_ioctl+0x64/0x1c0
> [<c01e56a3>] sys_ioctl+0x53/0x70
> [<c0102e3c>] sysenter_do_call+0x12/0x28
> [<ffffffff>] 0xffffffff

Do you have the lock validator enabled and if so did it have anything
useful to report ?

Please try the following. I suspect this is the real fix:

n_tty: Fix echo race

From: Alan Cox <[email protected]>

If a tty in N_TTY mode with echo enabled manages to get itself into a state
where
- echo characters are pending
- FASYNC is enabled
- tty_write_wakeup is called from either
- a device write path (pty)
- an IRQ (serial)

then it either deadlocks or explodes taking a mutex in the IRQ path.

On the serial side it is almost impossible to reproduce because you have to
go from a full serial port to a near empty one with echo characters
pending. The pty case happens to have become possible to trigger using
emacs and ptys, the pty changes having created a scenario which shows up
this bug.

The code path is

n_tty:process_echoes() (takes mutex)
tty_io:tty_put_char()
pty:pty_write (or serial paths)
tty_wakeup (from pty_write or serial IRQ)
n_tty_write_wakeup()
process_echoes()
*KABOOM*

Signed-off-by: Alan Cox <[email protected]>
---

drivers/char/n_tty.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)


diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
index 94a5d50..ff47907 100644
--- a/drivers/char/n_tty.c
+++ b/drivers/char/n_tty.c
@@ -1331,9 +1331,6 @@ handle_newline:

static void n_tty_write_wakeup(struct tty_struct *tty)
{
- /* Write out any echoed characters that are still pending */
- process_echoes(tty);
-
if (tty->fasync && test_and_clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags))
kill_fasync(&tty->fasync, SIGIO, POLL_OUT);
}

2009-07-15 16:53:17

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: tty related hangs with 2.6.31-rc3

On Wed, Jul 15, 2009 at 04:11:42PM +0100, Alan Cox wrote:
> On Wed, 15 Jul 2009 18:59:56 +0530
> "Aneesh Kumar K.V" <[email protected]> wrote:
>
> > Hi,
> >
> > I am finding tty related hangs with 2.6.31-rc3. This didn't happen
> > before. This happen when i close the emacs session. The /proc/<pid>/stack
> > content is below
>
> Thanks - nice clear trace. Looks like a bug in the n_tty locking changes
> from a few releases back that the pty changes are triggering.
>
> Basically process_echoes calls tty_put_char which if it thinks a device
> queue was full and now has a bit of space will call tty_wakeup which can
> call process_echoes and thus deadlock. With a physical serial device we
> will even sometimes call tty_wakeup() from the serial transmit path
> which is an irq path (which makes this doubly wrong as it then takes
> mutexes)
>
> Emacs presumably uses fasync which is the trigger for this. You need the
> right timing combined with the new pty behaviour combined with FASYNC to
> trigger it.
>
>
> > [<c0362c5b>] process_echoes+0x2b/0x2e0
> [which tries to take the lock we already hold (end A)]
> > [<c03638cb>] n_tty_write_wakeup+0xb/0x40
> [which processes our ldisc wakeup (end A)]
> > [<c0360a88>] tty_wakeup+0x58/0x70
> [which wakes up our tty (end A)]
> > [<c0368347>] pty_write+0x67/0x70
> [our write method is for tty/pty pairs end A output, queued to end B]
> > [<c035f1cb>] tty_put_char+0x2b/0x40
> [calls tty_put_char to write the echoed byte to end A output]
> > [<c036291f>] do_output_char+0xef/0x200
> [the fake typed character is echoed back towards end B]
> > [<c0362d4e>] process_echoes+0x11e/0x2e0
> [tries to process echo characters on end A]
> > [<c0364292>] n_tty_receive_char+0x102/0x710
> [ receives a byte that we've faked typing to end A input]
> > [<c0364ac0>] n_tty_receive_buf+0x220/0x410
> [ioctl method calls the ld->ops->receive_buf for n_tty (unsafely but that
> bug is old]
> > [<c036059c>] tiocsti+0x8c/0xa0
> > [<c0361aca>] tty_ioctl+0x25a/0x310
> > [<c01e52c8>] vfs_ioctl+0x28/0x80
> > [<c01e54f4>] do_vfs_ioctl+0x64/0x1c0
> > [<c01e56a3>] sys_ioctl+0x53/0x70
> > [<c0102e3c>] sysenter_do_call+0x12/0x28
> > [<ffffffff>] 0xffffffff
>
> Do you have the lock validator enabled and if so did it have anything
> useful to report ?
>
> Please try the following. I suspect this is the real fix:
>

With limited testing i am not seeing the hang any more. I will keep running
with the patch and report back if i hit the hang again.

-aneesh

2009-07-16 17:57:24

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: tty related hangs with 2.6.31-rc3

On Thu, Jul 16, 2009 at 02:19:02AM +0530, Aneesh Kumar K.V wrote:
> Hi Alan,
>
> On Wed, Jul 15, 2009 at 10:23:08PM +0530, Aneesh Kumar K.V wrote:
> > On Wed, Jul 15, 2009 at 04:11:42PM +0100, Alan Cox wrote:
> > > On Wed, 15 Jul 2009 18:59:56 +0530
> > > "Aneesh Kumar K.V" <[email protected]> wrote:
> > >
> > > > Hi,
> >
>
> I am finding a new error with emacs now. Now i am not sure how to explain the problem
> as a nice bug report. But anyhow i will try. emacs support a mode to compile different
> prg. So to reproduce write a small program something like below
>
> main()
> {
> printf("%d\n", i);
> }
>
> Now in emacs say ALT-x compile
>
> use cc a.c command to compile. It is supposed to show the error messages in the compile
> window. With 2.6.30 it works fine. Where as 2.6.31-rc3 it doesn't list me any error.
> My limitted understanding about emacs compile window is, it looks for error messages
> using pattern matching from the compile ouput. I guess it is missing some of the
> compile message output.
>
>
> sorry that the report is sparse. If you need any specific inforation let me know.

reverting the patch d945cb9cce20ac7143c2de8d88b187f62db99bdc fix the bug for me.
So i guess that would imply it is pty changes related

-aneesh

2009-07-17 11:06:49

by Takashi Iwai

[permalink] [raw]
Subject: Re: tty related hangs with 2.6.31-rc3

At Wed, 15 Jul 2009 16:11:42 +0100,
Alan Cox wrote:
>
> On Wed, 15 Jul 2009 18:59:56 +0530
> "Aneesh Kumar K.V" <[email protected]> wrote:
>
> > Hi,
> >
> > I am finding tty related hangs with 2.6.31-rc3. This didn't happen
> > before. This happen when i close the emacs session. The /proc/<pid>/stack
> > content is below
>
> Thanks - nice clear trace. Looks like a bug in the n_tty locking changes
> from a few releases back that the pty changes are triggering.
>
> Basically process_echoes calls tty_put_char which if it thinks a device
> queue was full and now has a bit of space will call tty_wakeup which can
> call process_echoes and thus deadlock. With a physical serial device we
> will even sometimes call tty_wakeup() from the serial transmit path
> which is an irq path (which makes this doubly wrong as it then takes
> mutexes)
>
> Emacs presumably uses fasync which is the trigger for this. You need the
> right timing combined with the new pty behaviour combined with FASYNC to
> trigger it.
>
>
> > [<c0362c5b>] process_echoes+0x2b/0x2e0
> [which tries to take the lock we already hold (end A)]
> > [<c03638cb>] n_tty_write_wakeup+0xb/0x40
> [which processes our ldisc wakeup (end A)]
> > [<c0360a88>] tty_wakeup+0x58/0x70
> [which wakes up our tty (end A)]
> > [<c0368347>] pty_write+0x67/0x70
> [our write method is for tty/pty pairs end A output, queued to end B]
> > [<c035f1cb>] tty_put_char+0x2b/0x40
> [calls tty_put_char to write the echoed byte to end A output]
> > [<c036291f>] do_output_char+0xef/0x200
> [the fake typed character is echoed back towards end B]
> > [<c0362d4e>] process_echoes+0x11e/0x2e0
> [tries to process echo characters on end A]
> > [<c0364292>] n_tty_receive_char+0x102/0x710
> [ receives a byte that we've faked typing to end A input]
> > [<c0364ac0>] n_tty_receive_buf+0x220/0x410
> [ioctl method calls the ld->ops->receive_buf for n_tty (unsafely but that
> bug is old]
> > [<c036059c>] tiocsti+0x8c/0xa0
> > [<c0361aca>] tty_ioctl+0x25a/0x310
> > [<c01e52c8>] vfs_ioctl+0x28/0x80
> > [<c01e54f4>] do_vfs_ioctl+0x64/0x1c0
> > [<c01e56a3>] sys_ioctl+0x53/0x70
> > [<c0102e3c>] sysenter_do_call+0x12/0x28
> > [<ffffffff>] 0xffffffff
>
> Do you have the lock validator enabled and if so did it have anything
> useful to report ?

I hit the same bug. The log is attached below.


> Please try the following. I suspect this is the real fix:
>
> n_tty: Fix echo race
>
> From: Alan Cox <[email protected]>
>
> If a tty in N_TTY mode with echo enabled manages to get itself into a state
> where
> - echo characters are pending
> - FASYNC is enabled
> - tty_write_wakeup is called from either
> - a device write path (pty)
> - an IRQ (serial)
>
> then it either deadlocks or explodes taking a mutex in the IRQ path.
>
> On the serial side it is almost impossible to reproduce because you have to
> go from a full serial port to a near empty one with echo characters
> pending. The pty case happens to have become possible to trigger using
> emacs and ptys, the pty changes having created a scenario which shows up
> this bug.
>
> The code path is
>
> n_tty:process_echoes() (takes mutex)
> tty_io:tty_put_char()
> pty:pty_write (or serial paths)
> tty_wakeup (from pty_write or serial IRQ)
> n_tty_write_wakeup()
> process_echoes()
> *KABOOM*
>
> Signed-off-by: Alan Cox <[email protected]>

The patch seems to fix the problem indeed.
Thanks!

Tested-by: Takashi Iwai <[email protected]>


Takashi

=============================================
[ INFO: possible recursive locking detected ]
2.6.31-rc3-test #43
---------------------------------------------
events/3/18 is trying to acquire lock:
(&tty->output_lock){+.+...}, at: [<ffffffff81245af7>] process_echoes+0x45/0x2bf

but task is already holding lock:
(&tty->output_lock){+.+...}, at: [<ffffffff81245af7>] process_echoes+0x45/0x2bf

other info that might help us debug this:
4 locks held by events/3/18:
#0: (events){+.+.+.}, at: [<ffffffff8107000f>] worker_thread+0x1c5/0x330
#1: (&(&tty->buf.work)->work){+.+...}, at: [<ffffffff8107000f>] worker_thread+0x1c5/0x330
#2: (&tty->output_lock){+.+...}, at: [<ffffffff81245af7>] process_echoes+0x45/0x2bf
#3: (&tty->echo_lock){+.+...}, at: [<ffffffff81245b0c>] process_echoes+0x5a/0x2bf

stack backtrace:
Pid: 18, comm: events/3 Not tainted 2.6.31-rc3-test #43
Call Trace:
[<ffffffff8108b1fb>] __lock_acquire+0x14d6/0x157d
[<ffffffff810879f7>] ? save_trace+0x4e/0xc0
[<ffffffff81087af9>] ? add_lock_to_list+0x90/0xeb
[<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
[<ffffffff8108b390>] lock_acquire+0xee/0x12e
[<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
[<ffffffff81249a4f>] ? tty_ldisc_try+0x2b/0x6f
[<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
[<ffffffff81334b6f>] mutex_lock_nested+0x66/0x2f2
[<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
[<ffffffff81088b6c>] ? mark_held_locks+0x65/0x9b
[<ffffffff81336926>] ? _spin_unlock_irqrestore+0x55/0x7a
[<ffffffff81245af7>] process_echoes+0x45/0x2bf
[<ffffffff81245d91>] n_tty_write_wakeup+0x20/0x6a
[<ffffffff8124402e>] tty_wakeup+0x44/0x84
[<ffffffff8124bfcb>] pty_write+0x62/0x82
[<ffffffff81245b0c>] ? process_echoes+0x5a/0x2bf
[<ffffffff81241ba8>] tty_put_char+0x3c/0x52
[<ffffffff81245a84>] do_output_char+0x1ce/0x1fc
[<ffffffff81245cbd>] process_echoes+0x20b/0x2bf
[<ffffffff813345ec>] ? mutex_unlock+0x1c/0x32
[<ffffffff8124783d>] n_tty_receive_buf+0x33e/0xf1e
[<ffffffff81088b6c>] ? mark_held_locks+0x65/0x9b
[<ffffffff81336926>] ? _spin_unlock_irqrestore+0x55/0x7a
[<ffffffff81088e78>] ? trace_hardirqs_on_caller+0x124/0x15e
[<ffffffff81088ed2>] ? trace_hardirqs_on+0x20/0x36
[<ffffffff8124ade7>] flush_to_ldisc+0x119/0x1c0
[<ffffffff8124acce>] ? flush_to_ldisc+0x0/0x1c0
[<ffffffff81070061>] worker_thread+0x217/0x330
[<ffffffff8107000f>] ? worker_thread+0x1c5/0x330
[<ffffffff81088e78>] ? trace_hardirqs_on_caller+0x124/0x15e
[<ffffffff81075d40>] ? autoremove_wake_function+0x0/0x5a
[<ffffffff8106fe4a>] ? worker_thread+0x0/0x330
[<ffffffff81075900>] kthread+0x94/0x9c
[<ffffffff8101402a>] child_rip+0xa/0x20
[<ffffffff81013990>] ? restore_args+0x0/0x30
[<ffffffff8107586c>] ? kthread+0x0/0x9c
[<ffffffff81014020>] ? child_rip+0x0/0x20