Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756670AbdDFHEv convert rfc822-to-8bit (ORCPT );
        Thu, 6 Apr 2017 03:04:51 -0400
Received: from ozlabs.org ([103.22.144.67]:55179 "EHLO ozlabs.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1756352AbdDFHEn (ORCPT );
        Thu, 6 Apr 2017 03:04:43 -0400
Message-ID: <1491462281.2815.47.camel@neuling.org>
Subject: tty crash in tty_ldisc_receive_buf()
From: Michael Neuling
To: Al Viro , johan Hovold , Peter Hurley , Wang YanQing , Alexander Popov , Rob Herring
Cc: Mikulas Patocka , Dmitry Vyukov , benh , LKML
Date: Thu, 06 Apr 2017 17:04:41 +1000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
X-Mailer: Evolution 3.22.3-0ubuntu0.1
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2560
Lines: 65

Hi all,

We are seeing the following crash (in linux-next but has been around
since at least v4.10).

[  417.514499] Unable to handle kernel paging request for data at address 0x00002260
[  417.515361] Faulting instruction address: 0xc0000000006fad80
cpu 0x15: Vector: 300 (Data Access) at [c00000799411f890]
    pc: c0000000006fad80: n_tty_receive_buf_common+0xc0/0xbd0
    lr: c0000000006fad5c: n_tty_receive_buf_common+0x9c/0xbd0
    sp: c00000799411fb10
   msr: 900000000280b033
   dar: 2260
 dsisr: 40000000
  current = 0xc0000079675d1e00
  paca    = 0xc00000000fb0d200  softe: 0  irq_happened: 0x01
    pid   = 5, comm = kworker/u56:0
Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 2017
enter ? for help
[c00000799411fbe0] c0000000006ff968 tty_ldisc_receive_buf+0x48/0xe0
[c00000799411fc10] c0000000007009d8 tty_port_default_receive_buf+0x68/0xe0
[c00000799411fc50] c0000000006ffce4 flush_to_ldisc+0x114/0x130
[c00000799411fca0] c00000000010a0fc process_one_work+0x1ec/0x580
[c00000799411fd30] c00000000010a528 worker_thread+0x98/0x5d0
[c00000799411fdc0] c00000000011343c kthread+0x16c/0x1b0
[c00000799411fe30] c00000000000b4e8 ret_from_kernel_thread+0x5c/0x74

It seems the null ptr deref is in n_tty_receive_buf_common() where we do:

	size_t tail = smp_load_acquire(&ldata->read_tail);

ldata is NULL.

We see this usually on boot but can also see it if we kill a getty
attached to tty (which is then respawned by systemd). It seems like we
are flushing data to a tty at the same time as it's being torn down and
restarted.

I did try the below patch which avoids the crash but locks up one of
the CPUs. I guess the data never gets flushed if we say nothing is
processed.

This is on powerpc but has also been reported by parisc.

I'm not at all familiar with the tty layer and looking at the locks,
mutexes, semaphores and reference counting in there scares the hell out
of me.

If anyone has an idea, I'm happy to try a patch.

Regards,
Mikey

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index bdf0e6e899..99dd757aa4 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1673,6 +1673,10 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 
 	down_read(&tty->termios_rwsem);
 
+	/* This probably shouldn't happen, but return 0 data processed */
+	if (!ldata)
+		return 0;
+
 	while (1) {
 		/*
 		 * When PARMRK is set, each input char may take up to 3 chars