Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S934426AbZGQLGt (ORCPT );
        Fri, 17 Jul 2009 07:06:49 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S934410AbZGQLGs (ORCPT );
        Fri, 17 Jul 2009 07:06:48 -0400
Received: from cantor2.suse.de ([195.135.220.15]:46109 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S934393AbZGQLGr (ORCPT );
        Fri, 17 Jul 2009 07:06:47 -0400
Date: Fri, 17 Jul 2009 13:06:45 +0200
Message-ID:
From: Takashi Iwai
To: Alan Cox
Cc: "Aneesh Kumar K.V" , linux-kernel@vger.kernel.org
Subject: Re: tty related hangs with 2.6.31-rc3
In-Reply-To: <20090715161142.55083da7@lxorguk.ukuu.org.uk>
References: <20090715132956.GA10004@skywalker>
        <20090715161142.55083da7@lxorguk.ukuu.org.uk>
User-Agent: Wanderlust/2.12.0 (Your Wildest Dreams) SEMI/1.14.6 (Maruoka)
        FLIM/1.14.7 (=?ISO-8859-4?Q?Sanj=F2?=) APEL/10.6 Emacs/22.3
        (x86_64-suse-linux-gnu) MULE/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6509
Lines: 165

At Wed, 15 Jul 2009 16:11:42 +0100,
Alan Cox wrote:
>
> On Wed, 15 Jul 2009 18:59:56 +0530
> "Aneesh Kumar K.V" wrote:
>
> > Hi,
> >
> > I am finding tty related hangs with 2.6.31-rc3. This didn't happen
> > before. This happen when i close the emacs session. The /proc//stack
> > content is below
>
> Thanks - nice clear trace. Looks like a bug in the n_tty locking changes
> from a few releases back that the pty changes are triggering.
>
> Basically process_echoes calls tty_put_char which if it thinks a device
> queue was full and now has a bit of space will call tty_wakeup which can
> call process_echoes and thus deadlock.
> With a physical serial device we
> will even sometimes call tty_wakeup() from the serial transmit path
> which is an irq path (which makes this doubly wrong as it then takes
> mutexes)
>
> Emacs presumably uses fasync which is the trigger for this. You need the
> right timing combined with the new pty behaviour combined with FASYNC to
> trigger it.
>
>
> > [] process_echoes+0x2b/0x2e0
> [which tries to take the lock we already hold (end A)]
> > [] n_tty_write_wakeup+0xb/0x40
> [which processes our ldisc wakeup (end A)]
> > [] tty_wakeup+0x58/0x70
> [which wakes up our tty (end A)]
> > [] pty_write+0x67/0x70
> [our write method is for tty/pty pairs end A output, queued to end B]
> > [] tty_put_char+0x2b/0x40
> [calls tty_put_char to write the echoed byte to end A output]
> > [] do_output_char+0xef/0x200
> [the fake typed character is echoed back towards end B]
> > [] process_echoes+0x11e/0x2e0
> [tries to process echo characters on end A]
> > [] n_tty_receive_char+0x102/0x710
> [ receives a byte that we've faked typing to end A input]
> > [] n_tty_receive_buf+0x220/0x410
> [ioctl method calls the ld->ops->receive_buf for n_tty (unsafely but that
> bug is old]
> > [] tiocsti+0x8c/0xa0
> > [] tty_ioctl+0x25a/0x310
> > [] vfs_ioctl+0x28/0x80
> > [] do_vfs_ioctl+0x64/0x1c0
> > [] sys_ioctl+0x53/0x70
> > [] sysenter_do_call+0x12/0x28
> > [] 0xffffffff
>
> Do you have the lock validator enabled and if so did it have anything
> useful to report ?

I hit the same bug. The log is attached below.

> Please try the following. I suspect this is the real fix:
>
> n_tty: Fix echo race
>
> From: Alan Cox
>
> If a tty in N_TTY mode with echo enabled manages to get itself into a state
> where
>     - echo characters are pending
>     - FASYNC is enabled
>     - tty_write_wakeup is called from either
>         - a device write path (pty)
>         - an IRQ (serial)
>
> then it either deadlocks or explodes taking a mutex in the IRQ path.
>
> On the serial side it is almost impossible to reproduce because you have to
> go from a full serial port to a near empty one with echo characters
> pending. The pty case happens to have become possible to trigger using
> emacs and ptys, the pty changes having created a scenario which shows up
> this bug.
>
> The code path is
>
> n_tty:process_echoes() (takes mutex)
> tty_io:tty_put_char()
> pty:pty_write (or serial paths)
> tty_wakeup (from pty_write or serial IRQ)
> n_tty_write_wakeup()
> process_echoes()
> *KABOOM*
>
> Signed-off-by: Alan Cox

The patch seems to fix the problem indeed. Thanks!

Tested-by: Takashi Iwai


Takashi

=============================================
[ INFO: possible recursive locking detected ]
2.6.31-rc3-test #43
---------------------------------------------
events/3/18 is trying to acquire lock:
 (&tty->output_lock){+.+...}, at: [] process_echoes+0x45/0x2bf

but task is already holding lock:
 (&tty->output_lock){+.+...}, at: [] process_echoes+0x45/0x2bf

other info that might help us debug this:
4 locks held by events/3/18:
 #0: (events){+.+.+.}, at: [] worker_thread+0x1c5/0x330
 #1: (&(&tty->buf.work)->work){+.+...}, at: [] worker_thread+0x1c5/0x330
 #2: (&tty->output_lock){+.+...}, at: [] process_echoes+0x45/0x2bf
 #3: (&tty->echo_lock){+.+...}, at: [] process_echoes+0x5a/0x2bf

stack backtrace:
Pid: 18, comm: events/3 Not tainted 2.6.31-rc3-test #43
Call Trace:
 [] __lock_acquire+0x14d6/0x157d
 [] ? save_trace+0x4e/0xc0
 [] ? add_lock_to_list+0x90/0xeb
 [] ? process_echoes+0x45/0x2bf
 [] lock_acquire+0xee/0x12e
 [] ? process_echoes+0x45/0x2bf
 [] ? tty_ldisc_try+0x2b/0x6f
 [] ? process_echoes+0x45/0x2bf
 [] mutex_lock_nested+0x66/0x2f2
 [] ? process_echoes+0x45/0x2bf
 [] ? mark_held_locks+0x65/0x9b
 [] ? _spin_unlock_irqrestore+0x55/0x7a
 [] process_echoes+0x45/0x2bf
 [] n_tty_write_wakeup+0x20/0x6a
 [] tty_wakeup+0x44/0x84
 [] pty_write+0x62/0x82
 [] ? process_echoes+0x5a/0x2bf
 [] tty_put_char+0x3c/0x52
 [] do_output_char+0x1ce/0x1fc
 [] process_echoes+0x20b/0x2bf
 [] ? mutex_unlock+0x1c/0x32
 [] n_tty_receive_buf+0x33e/0xf1e
 [] ? mark_held_locks+0x65/0x9b
 [] ? _spin_unlock_irqrestore+0x55/0x7a
 [] ? trace_hardirqs_on_caller+0x124/0x15e
 [] ? trace_hardirqs_on+0x20/0x36
 [] flush_to_ldisc+0x119/0x1c0
 [] ? flush_to_ldisc+0x0/0x1c0
 [] worker_thread+0x217/0x330
 [] ? worker_thread+0x1c5/0x330
 [] ? trace_hardirqs_on_caller+0x124/0x15e
 [] ? autoremove_wake_function+0x0/0x5a
 [] ? worker_thread+0x0/0x330
 [] kthread+0x94/0x9c
 [] child_rip+0xa/0x20
 [] ? restore_args+0x0/0x30
 [] ? kthread+0x0/0x9c
 [] ? child_rip+0x0/0x20
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/