Message-ID: <1491462281.2815.47.camel@neuling.org>
Subject: tty crash in tty_ldisc_receive_buf()
From: Michael Neuling <mikey@neuling.org>
To: Al Viro <viro@ZenIV.linux.org.uk>, Johan Hovold <johan@kernel.org>,
        Peter Hurley <peter@hurleysoftware.com>,
        Wang YanQing <udknight@gmail.com>,
        Alexander Popov <alex.popov@linux.com>, Rob Herring <robh@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
        Dmitry Vyukov <dvyukov@google.com>, benh <benh@kernel.crashing.org>,
        LKML <linux-kernel@vger.kernel.org>
Date: Thu, 06 Apr 2017 17:04:41 +1000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2560
Lines: 65

Hi all,

We are seeing the following crash in linux-next; it has been around since at
least v4.10.

[  417.514499] Unable to handle kernel paging request for data at address 0x00002260
[  417.515361] Faulting instruction address: 0xc0000000006fad80
cpu 0x15: Vector: 300 (Data Access) at [c00000799411f890]
    pc: c0000000006fad80: n_tty_receive_buf_common+0xc0/0xbd0
    lr: c0000000006fad5c: n_tty_receive_buf_common+0x9c/0xbd0
    sp: c00000799411fb10
   msr: 900000000280b033
   dar: 2260
 dsisr: 40000000
  current = 0xc0000079675d1e00
  paca    = 0xc00000000fb0d200	 softe: 0	 irq_happened: 0x01
    pid   = 5, comm = kworker/u56:0
Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 2017
enter ? for help
[c00000799411fbe0] c0000000006ff968 tty_ldisc_receive_buf+0x48/0xe0
[c00000799411fc10] c0000000007009d8 tty_port_default_receive_buf+0x68/0xe0
[c00000799411fc50] c0000000006ffce4 flush_to_ldisc+0x114/0x130
[c00000799411fca0] c00000000010a0fc process_one_work+0x1ec/0x580
[c00000799411fd30] c00000000010a528 worker_thread+0x98/0x5d0
[c00000799411fdc0] c00000000011343c kthread+0x16c/0x1b0
[c00000799411fe30] c00000000000b4e8 ret_from_kernel_thread+0x5c/0x74

It seems the NULL pointer dereference is in n_tty_receive_buf_common() where we do:

		size_t tail = smp_load_acquire(&ldata->read_tail);

ldata is NULL.
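
The fault address is consistent with that: a load of ldata->read_tail through
a NULL ldata touches address 0 plus the offset of read_tail, and the dar in the
oops is 0x2260. A minimal sketch of that arithmetic follows. The struct is a
made-up stand-in whose padding is chosen to match the oops; it is an assumption,
not the verified layout of struct n_tty_data, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in, NOT the real struct n_tty_data: the padding is
 * chosen so that read_tail lands at the fault address from the oops.
 */
struct fake_ldata {
	char earlier_fields[0x2260];
	size_t read_tail;
};

/* Layout assumption for this sketch only. */
static_assert(offsetof(struct fake_ldata, read_tail) == 0x2260,
	      "read_tail is assumed to sit at offset 0x2260");

/* Address that &ldata->read_tail evaluates to for a given base pointer value. */
static uintptr_t read_tail_addr(uintptr_t ldata_base)
{
	return ldata_base + offsetof(struct fake_ldata, read_tail);
}

/* Nonzero if a fault address matches a read_tail access through a NULL ldata. */
static int matches_null_ldata(uintptr_t dar)
{
	return dar == read_tail_addr(0);
}
```

Under that assumption, matches_null_ldata(0x2260) is true, which is what a NULL
ldata (rather than a wild pointer) would be expected to produce.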

We usually see this on boot, but can also trigger it by killing a getty attached
to a tty (which systemd then respawns).  It looks like we are flushing data to
a tty at the same time as it's being torn down and restarted.

I tried the patch below, which avoids the crash but locks up one of the CPUs. I
guess the data never gets flushed if we report that nothing was processed.
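
One thing visible in the patch itself: the early return sits after
down_read(&tty->termios_rwsem), so that path returns with the read side still
held. Whether that is related to the lockup is not established here. A toy
model of the imbalance, where the counter-only toy_rwsem and both receive_*
functions are hypothetical illustrations and not kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reader/writer semaphore: it only counts readers. */
struct toy_rwsem {
	int readers;
};

static void toy_down_read(struct toy_rwsem *sem) { sem->readers++; }
static void toy_up_read(struct toy_rwsem *sem)   { sem->readers--; }

/* Same shape as the patch: the NULL check returns with the lock held. */
static int receive_early_return(struct toy_rwsem *sem, const void *ldata)
{
	toy_down_read(sem);
	if (!ldata)
		return 0;	/* read side is never released on this path */
	toy_up_read(sem);
	return 1;
}

/* Alternative placement: check before taking the lock, so nothing leaks. */
static int receive_check_first(struct toy_rwsem *sem, const void *ldata)
{
	if (!ldata)
		return 0;
	toy_down_read(sem);
	toy_up_read(sem);
	return 1;
}
```

In the toy, receive_early_return() with a NULL ldata leaves readers at 1, while
receive_check_first() leaves it at 0. Moving the check only balances the lock;
it does nothing about the underlying teardown race.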

This is on powerpc but has also been reported on parisc.

I'm not at all familiar with the tty layer and looking at the locks, mutexes,
semaphores and reference counting in there scares the hell out of me. 

If anyone has an idea, I'm happy to try a patch.

Regards,
Mikey

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index bdf0e6e899..99dd757aa4 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1673,6 +1673,10 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 
 	down_read(&tty->termios_rwsem);
 
+	/* This probably shouldn't happen, but return 0 data processed */
+	if (!ldata)
+		return 0;
+
 	while (1) {
 		/*
 		 * When PARMRK is set, each input char may take up to 3 chars