2017-04-07 03:51:14

by Michael Neuling

Subject: [PATCH] tty: Fix crash with flush_to_ldisc()

When reiniting a tty we can end up with:

[ 417.514499] Unable to handle kernel paging request for data at address 0x00002260
[ 417.515361] Faulting instruction address: 0xc0000000006fad80
cpu 0x15: Vector: 300 (Data Access) at [c00000799411f890]
pc: c0000000006fad80: n_tty_receive_buf_common+0xc0/0xbd0
lr: c0000000006fad5c: n_tty_receive_buf_common+0x9c/0xbd0
sp: c00000799411fb10
msr: 900000000280b033
dar: 2260
dsisr: 40000000
current = 0xc0000079675d1e00
paca = 0xc00000000fb0d200 softe: 0 irq_happened: 0x01
pid = 5, comm = kworker/u56:0
Linux version 4.11.0-rc5-next-20170405 (mikey@bml86) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Thu Apr 6 00:36:46 CDT 2017
enter ? for help
[c00000799411fbe0] c0000000006ff968 tty_ldisc_receive_buf+0x48/0xe0
[c00000799411fc10] c0000000007009d8 tty_port_default_receive_buf+0x68/0xe0
[c00000799411fc50] c0000000006ffce4 flush_to_ldisc+0x114/0x130
[c00000799411fca0] c00000000010a0fc process_one_work+0x1ec/0x580
[c00000799411fd30] c00000000010a528 worker_thread+0x98/0x5d0
[c00000799411fdc0] c00000000011343c kthread+0x16c/0x1b0
[c00000799411fe30] c00000000000b4e8 ret_from_kernel_thread+0x5c/0x74

This is due to a NULL pointer dereference of tty->disc_data in
n_tty_receive_buf_common(), reached via tty_ldisc_receive_buf() from
flush_to_ldisc().

This fixes the issue by moving the disc_data read to after we take the
semaphore. If disc_data is NULL, we return 0 (no data processed) rather
than dereferencing it.

Cc: <[email protected]> [4.10+]
Signed-off-by: Michael Neuling <[email protected]>
---
drivers/tty/n_tty.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index bdf0e6e899..a2a9832a42 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1668,11 +1668,17 @@ static int
 n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 			 char *fp, int count, int flow)
 {
-	struct n_tty_data *ldata = tty->disc_data;
+	struct n_tty_data *ldata;
 	int room, n, rcvd = 0, overflow;

 	down_read(&tty->termios_rwsem);

+	ldata = tty->disc_data;
+	if (!ldata) {
+		up_read(&tty->termios_rwsem);
+		return 0;
+	}
+
 	while (1) {
 		/*
 		 * When PARMRK is set, each input char may take up to 3 chars
--
2.9.3


2017-04-07 04:12:27

by Al Viro

Subject: Re: [PATCH] tty: Fix crash with flush_to_ldisc()

On Fri, Apr 07, 2017 at 01:50:53PM +1000, Michael Neuling wrote:

> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index bdf0e6e899..a2a9832a42 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1668,11 +1668,17 @@ static int
>  n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>  			 char *fp, int count, int flow)
>  {
> -	struct n_tty_data *ldata = tty->disc_data;
> +	struct n_tty_data *ldata;
>  	int room, n, rcvd = 0, overflow;
>
>  	down_read(&tty->termios_rwsem);
>
> +	ldata = tty->disc_data;
> +	if (!ldata) {
> +		up_read(&tty->termios_rwsem);

I very much doubt that it's correct. It shouldn't have been called after
the n_tty_close(); apparently it has been. ->termios_rwsem won't serialize
against it, and something apparently has gone wrong with the exclusion there.
At the very least I would like to see what's to prevent n_tty_close() from
overlapping the execution of this function - if *that* is what broke, your
patch will only paper over the problem.

2017-04-07 04:58:20

by Michael Neuling

Subject: Re: [PATCH] tty: Fix crash with flush_to_ldisc()

Al,

On Fri, 2017-04-07 at 05:12 +0100, Al Viro wrote:
> On Fri, Apr 07, 2017 at 01:50:53PM +1000, Michael Neuling wrote:
>
> > diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> > index bdf0e6e899..a2a9832a42 100644
> > --- a/drivers/tty/n_tty.c
> > +++ b/drivers/tty/n_tty.c
> > @@ -1668,11 +1668,17 @@ static int
> >  n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
> >  			 char *fp, int count, int flow)
> >  {
> > -	struct n_tty_data *ldata = tty->disc_data;
> > +	struct n_tty_data *ldata;
> >  	int room, n, rcvd = 0, overflow;
> >
> >  	down_read(&tty->termios_rwsem);
> >
> > +	ldata = tty->disc_data;
> > +	if (!ldata) {
> > +		up_read(&tty->termios_rwsem);
>
> I very much doubt that it's correct.  It shouldn't have been called after
> the n_tty_close(); apparently it has been.  ->termios_rwsem won't serialize
> against it, and something apparently has gone wrong with the exclusion there.
> At the very least I would like to see what's to prevent n_tty_close() from
> overlapping the execution of this function - if *that* is what broke, your
> patch will only paper over the problem.

It does seem like I'm papering over a problem. Would you be happy with the patch
if we add a WARN_ON_ONCE()?
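
For illustration only, the WARN_ON_ONCE() variant could look roughly like the
userspace model below. It is a sketch, not kernel code: the struct layouts,
the fake_rwsem counter standing in for termios_rwsem, warn_on_once() and
receive_buf_common() are all hypothetical stand-ins chosen to mirror the names
in the patch. It only shows the control flow: read disc_data under the read
lock, warn the first time it is NULL, drop the lock and report 0 bytes
processed.

```c
/* Userspace model only; names mirror the patch but nothing here is kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct n_tty_data { int unused; };

/* Hypothetical stand-in for the kernel rw_semaphore: just counts readers. */
struct fake_rwsem { int readers; };

struct tty_struct {
	struct fake_rwsem termios_rwsem;
	struct n_tty_data *disc_data;	/* NULL once the ldisc data is gone */
};

static void down_read(struct fake_rwsem *sem) { sem->readers++; }
static void up_read(struct fake_rwsem *sem) { sem->readers--; }

static int warn_count;	/* number of warnings actually printed */

/*
 * Hypothetical stand-in for WARN_ON_ONCE(): prints on the first hit only
 * and returns the condition. The real macro is once per call site; this
 * model has a single call site, so one static flag is enough.
 */
static int warn_on_once(int cond, const char *what)
{
	static int warned;

	if (cond && !warned) {
		warned = 1;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Models the patched entry path of n_tty_receive_buf_common(). */
static int receive_buf_common(struct tty_struct *tty, int count)
{
	struct n_tty_data *ldata;

	assert(tty != NULL);
	down_read(&tty->termios_rwsem);

	/* Read disc_data only after taking the lock, as in the patch. */
	ldata = tty->disc_data;
	if (warn_on_once(ldata == NULL, "disc_data is NULL")) {
		up_read(&tty->termios_rwsem);
		return 0;	/* no data processed */
	}

	/* The real function would consume up to count bytes here. */
	up_read(&tty->termios_rwsem);
	return count;
}
```

The warning makes the unexpected state visible in the log while still avoiding
the crash; it does not address whatever allowed the function to run with
disc_data already cleared.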

I think disc_data being NULL is a permanent condition rather than a transient
race: if we read it again later, it's still NULL.

Benh and I looked at this a bunch and we did notice tty_ldisc_reinit() was being
called without the tty lock in one location. We tried the below patch
but it didn't help (not an upstreamable patch, just a test).

There have been a few attempts at fixing this but none have worked for
me:
https://lkml.org/lkml/2017/3/23/569
and
https://patchwork.kernel.org/patch/9114561/

I'm not that familiar with the tty layer (and I value my sanity) so I'm
struggling to root cause it by myself.

Mikey


diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 734a635e73..121402ff25 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1454,6 +1454,9 @@ static void tty_driver_remove_tty(struct tty_driver *driver, struct tty_struct *
 		driver->ttys[tty->index] = NULL;
 }

+extern int tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout);
+extern void tty_ldisc_unlock(struct tty_struct *tty);
+
 /*
  * tty_reopen() - fast re-open of an open tty
  * @tty - the tty to open
@@ -1466,6 +1469,7 @@ static void tty_driver_remove_tty(struct tty_driver *driver, struct tty_struct *
 static int tty_reopen(struct tty_struct *tty)
 {
 	struct tty_driver *driver = tty->driver;
+	int rc = 0;

 	if (driver->type == TTY_DRIVER_TYPE_PTY &&
 	    driver->subtype == PTY_TYPE_MASTER)
@@ -1479,10 +1483,12 @@ static int tty_reopen(struct tty_struct *tty)

 	tty->count++;

+	tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT);
 	if (!tty->ldisc)
-		return tty_ldisc_reinit(tty, tty->termios.c_line);
+		rc = tty_ldisc_reinit(tty, tty->termios.c_line);
+	tty_ldisc_unlock(tty);

-	return 0;
+	return rc;
 }

 /**
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index d0e84b6226..3b13ff11c5 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -334,7 +334,7 @@ static inline void __tty_ldisc_unlock(struct tty_struct *tty)
 	ldsem_up_write(&tty->ldisc_sem);
 }

-static int tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout)
+int tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout)
 {
 	int ret;

@@ -345,7 +345,7 @@ static int tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout)
 	return 0;
 }

-static void tty_ldisc_unlock(struct tty_struct *tty)
+void tty_ldisc_unlock(struct tty_struct *tty)
 {
 	clear_bit(TTY_LDISC_HALTED, &tty->flags);
 	__tty_ldisc_unlock(tty);