Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751853AbcDOSVj (ORCPT ); Fri, 15 Apr 2016 14:21:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41291 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751761AbcDOSV3 (ORCPT ); Fri, 15 Apr 2016 14:21:29 -0400 Date: Fri, 15 Apr 2016 19:19:50 +0200 From: Oleg Nesterov To: Peter Hurley , Greg Kroah-Hartman , Jiri Slaby Cc: Milos Vyletel , Stanislav Kozina , linux-kernel@vger.kernel.org Subject: Q: tty_open() && hangup Message-ID: <20160415171950.GA17586@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3138 Lines: 100 Hi, I am fingting with obscure pty bug in rhel6 which _might be_ explained by some hangup/reopen races, and this motivated me to look at upstream code which I can't understand too ;) Lets look at tty_open(), 2112 tty = tty_open_current_tty(device, filp); 2113 if (!tty) 2114 tty = tty_open_by_driver(device, inode, filp); 2115 2116 if (IS_ERR(tty)) { 2117 tty_free_file(filp); 2118 retval = PTR_ERR(tty); 2119 if (retval != -EAGAIN || signal_pending(current)) 2120 return retval; 2121 schedule(); 2122 goto retry_open; And why do we need this schedule() above? It is nop in TASK_RUNNING. If we need it for non-preempt kernels to avoid some sort of livelock this can't work if rt_task(current) == T. If we just want to add a preemption point then perhaps we should use cond_resched/might_resched ? I am not saying this is wrong, but looks confusing. 2130 if (tty->ops->open) 2131 retval = tty->ops->open(tty, filp); 2132 else 2133 retval = -ENODEV; 2134 filp->f_flags = saved_flags; 2135 2136 if (retval) { 2137 tty_debug_hangup(tty, "open error %d, releasing\n", retval); 2138 2139 tty_unlock(tty); /* need to call tty_release without BTM */ 2140 tty_release(inode, filp); 2141 if (retval != -ERESTARTSYS) 2142 return retval; 2143 2144 if (signal_pending(current)) 2145 return retval; So we can only get here if tty->ops->open returns -ERESTARTSYS without signal_pending(). This doesn't look good. But OK, lets forget this. 2147 schedule(); Again, this looks confusing. 2148 /* 2149 * Need to reset f_op in case a hangup happened. 2150 */ 2151 if (tty_hung_up_p(filp)) 2152 filp->f_op = &tty_fops; And this is called after tty_add_file() which makes this filp visible to __tty_hangup(), and without tty_lock(). How can we trust this check? __tty_hangup() can set hung_up_tty_fops() right after tty_hung_up_p() returns F. Don't we need something like below? Otherwise afaics tty_open() can actually succeed (and clear TTY_HUPPED) after restart, but return with tty_hung_up_p(filp) == T. Oleg. --- x/drivers/tty/tty_io.c +++ x/drivers/tty/tty_io.c @@ -2122,6 +2122,13 @@ retry_open: goto retry_open; } + /* + * This is only possible after retry_open, we drop tty_lock() + * without tty_del_file(). + */ + if (tty_hung_up_p(filp)) + filp->f_op = &tty_fops; + tty_add_file(tty, filp); check_tty_count(tty, __func__); @@ -2145,11 +2152,6 @@ retry_open: return retval; schedule(); - /* - * Need to reset f_op in case a hangup happened. - */ - if (tty_hung_up_p(filp)) - filp->f_op = &tty_fops; goto retry_open; } clear_bit(TTY_HUPPED, &tty->flags);