From: Junichi Nomura
To: "peter@hurleysoftware.com"
CC: "bhe@redhat.com", "gregkh@linuxfoundation.org", "jslaby@suse.com", "linux-kernel@vger.kernel.org"
Subject: v4.4-rc1: /dev/console open fails with -EIO
Date: Wed, 16 Dec 2015 06:32:08 +0000
Message-ID: <20151216063206.GA9866@xzibit.linux.bs1.fc.nec.co.jp>

Since kernel v4.4-rc1, the kdump capture service with Fedora23 / RHEL7.2
almost always fails on my test system, which uses a serial console.
It worked fine up to kernel v4.3.

Kdump fails with an error like this:

  kdump.sh[1040]: /bin/kdump.sh: line 8: /dev/console: Input/output error

Line 8 of kdump.sh does this:

  exec &> /dev/console

(http://pkgs.fedoraproject.org/cgit/kexec-tools.git/tree/dracut-kdump.sh)

and the EIO is returned by this code in tty_reopen():

  if (!tty->count)
          return -EIO;

Bisection shows that commit 79c1faa4511e ("tty: Remove
tty_wait_until_sent_from_close()") is the first bad commit.
Indeed, after reverting that commit, kdump capture starts working again.

Open of /dev/console used to return -EIO when it races with close.
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/554172/comments/245)
But the commit seems to widen the race window.

Before the commit:

  tty_release()
    tty_lock(tty)
    tty->ops->close(tty, filp)
    tty_unlock(tty)
    tty_wait_until_sent()
    // the window starts from here
    tty_lock(tty)
    decrement tty->count
    tty_unlock(tty)
    (releasing tty if count became zero)

After the commit:

  tty_release()
    // the window starts from here
    tty_lock(tty)
    tty->ops->close(tty, filp)
    tty_wait_until_sent()
    decrement tty->count
    tty_unlock(tty)
    (releasing tty if count became zero)

While it might be possible for user space to cope with the problem by
retrying open(), there is no indication of whether, or for how long, it
should retry. Also, the current situation makes shell scripts like the
kdump.sh above fragile against this sort of timing change.

How about retrying tty_open in the kernel instead, like the attached patch?
If !tty->count in tty_reopen() means the race has happened, that seems
reasonable.

---
Jun'ichi Nomura, NEC Corporation

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index bcc8e1e..070ea66 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1462,8 +1462,9 @@ static int tty_reopen(struct tty_struct *tty)
 {
 	struct tty_driver *driver = tty->driver;
 
+	/* We cannot re-open tty which is being released. */
 	if (!tty->count)
-		return -EIO;
+		return -ERESTARTSYS;
 
 	if (driver->type == TTY_DRIVER_TYPE_PTY &&
 	    driver->subtype == PTY_TYPE_MASTER)
@@ -2087,6 +2088,11 @@ retry_open:
 
 	if (IS_ERR(tty)) {
 		retval = PTR_ERR(tty);
+		if (retval == -ERESTARTSYS && !signal_pending(current)) {
+			tty_free_file(filp);
+			schedule();
+			goto retry_open;
+		}
 		goto err_file;
 	}
 
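
For comparison, the user-space workaround mentioned above (retrying open()
on EIO) could look roughly like the sketch below. This is only an
illustration: open_retry_eio(), max_tries and delay_ms are hypothetical
names, not part of kexec-tools or any library, and the retry count and
delay are arbitrary guesses, since nothing tells user space how long the
race window lasts.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>

/*
 * Hypothetical helper: open `path`, retrying only while open() fails
 * with EIO (assumed here to mean "raced with the final close").
 * Any other result, success or failure, is returned immediately.
 * max_tries and delay_ms are arbitrary, caller-chosen bounds.
 */
int open_retry_eio(const char *path, int flags, int max_tries, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};

	if (max_tries <= 0 || delay_ms < 0) {
		errno = EINVAL;
		return -1;
	}

	for (int i = 0; i < max_tries; i++) {
		int fd = open(path, flags);

		if (fd >= 0 || errno != EIO)
			return fd;

		/* Back off before the next attempt, but not after the last. */
		if (i + 1 < max_tries)
			nanosleep(&delay, NULL);
	}

	/* Every attempt failed with EIO; report that to the caller. */
	errno = EIO;
	return -1;
}
```

A caller would use something like open_retry_eio("/dev/console", O_WRONLY,
10, 50), but any such bound is a guess, and a plain shell redirection like
the one in kdump.sh cannot do this without extra scripting, which is why
retrying in the kernel looks preferable to me.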