2010-01-15 05:27:06

by Matthias Urlichs

Subject: [PATCH] ldisc switching on a HUPped pty hangs the caller

Calling vhangup() and then switching line disciplines results in
incomplete clean-up and a wedged process if it then calls ioctl()
on the pty.

Signed-Off-By: Matthias Urlichs <[email protected]>

---
I recently upgraded my gateway machine to 2.6.31. This caused
ppp-over-ssh to stop working. Indeed, the PPP process got wedged into
noninterruptible sleep, which this patch fixes.

(The comment, by the way, seems to be wrong. There was no race.)

The underlying problem, however, turns out to be the vhangup() syscall
which the SSH server emits before exec'ing pppd. This causes the tty's
HUPPED bit to get set, which in turn causes the above error.

Browbeating sshd into not issuing vhangup() (why does it do that,
anyway? Presumably, using PTMX we can be sure that nobody is hogging the
pty in the first place, right?) means that I now have a working ssh
tunnel.

… until somebody forces me to switch to a 'real' VPN. :-P

---

diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
index aafdbae..bb92f5e 100644
--- a/drivers/char/tty_ldisc.c
+++ b/drivers/char/tty_ldisc.c
@@ -621,9 +621,8 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
 		/* We were raced by the hangup method. It will have stomped
 		   the ldisc data and closed the ldisc down */
 		clear_bit(TTY_LDISC_CHANGING, &tty->flags);
-		mutex_unlock(&tty->ldisc_mutex);
-		tty_ldisc_put(new_ldisc);
-		return -EIO;
+		retval = -EIO;
+		goto out;
 	}

 	/* Shutdown the current discipline. */
@@ -652,7 +651,7 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
 	/*
 	 * Allow ldisc referencing to occur again
 	 */
-
+out:
 	tty_ldisc_enable(tty);
 	if (o_tty)
 		tty_ldisc_enable(o_tty);



2010-01-15 10:45:27

by Alan

Subject: Re: [PATCH] ldisc switching on a HUPped pty hangs the caller

> I recently upgraded my gateway machine to 2.6.31. This caused
> ppp-over-ssh to stop working. Indeed, the PPP process got wedged into
> noninterruptible sleep, which this patch fixes.
>
> (The comment, by the way, seems to be wrong. There was no race.)

Really ?

Think about

set_ldisc
hangup
close
open
set to N_TTY etc
Now what ???


> The underlying problem, however, turns out to be the vhangup() syscall
> which the SSH server emits before exec'ing pppd. This causes the tty's
> HUPPED bit to get set, which in turn causes the above error.

vhangup sets the HUPPED bit to make sure that anything touching the tty
beyond that point goes away and dies.

> diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
> index aafdbae..bb92f5e 100644
> --- a/drivers/char/tty_ldisc.c
> +++ b/drivers/char/tty_ldisc.c
> @@ -621,9 +621,8 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
>  		/* We were raced by the hangup method. It will have stomped
>  		   the ldisc data and closed the ldisc down */
>  		clear_bit(TTY_LDISC_CHANGING, &tty->flags);
> -		mutex_unlock(&tty->ldisc_mutex);
> -		tty_ldisc_put(new_ldisc);
> -		return -EIO;
> +		retval = -EIO;
> +		goto out;
>  	}
>
>  	/* Shutdown the current discipline. */
> @@ -652,7 +651,7 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
>  	/*
>  	 * Allow ldisc referencing to occur again
>  	 */
> -
> +out:
>  	tty_ldisc_enable(tty);
>  	if (o_tty)
>  		tty_ldisc_enable(o_tty);

And this falls through to restart the work queues and other things that
may not be safe to do at that point.


So: NAK

I agree there is a bug but you've swapped one bug for several nastier bugs.

As far as I can see from a quick look the real problem in your case is
that we don't do enough work in the case where tty->driver->flags doesn't
contain TTY_DRIVER_RESET_TERMIOS. We need to reinit the ldisc either way.
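
The fall-through objection above can be sketched in user space. This is a
minimal model, not kernel code: it only records which cleanup steps each
exit path of a tty_set_ldisc()-like function would run. The function
set_ldisc_model() and the STEP_* flags are hypothetical names, and the
final unlock at the shared exit is an assumption; the thread itself only
shows that the label sits before tty_ldisc_enable() and that the work-queue
restart follows it.

```c
#include <errno.h>

/* Hypothetical step flags: each bit records one cleanup action taken. */
enum {
	STEP_PUT_NEW_LDISC = 1 << 0,	/* drop the reference on the new ldisc */
	STEP_ENABLE_LDISC  = 1 << 1,	/* tty_ldisc_enable()-like step */
	STEP_RESTART_WORK  = 1 << 2,	/* work-queue restart after the label */
	STEP_UNLOCK        = 1 << 3,	/* release of the ldisc mutex (assumed) */
};

/*
 * Model of the hung-up branch only. With use_goto == 0 the function
 * leaves through the original early return; with use_goto != 0 it jumps
 * to the shared exit label as in the proposed patch. The steps that ran
 * are OR-ed into *steps.
 */
static int set_ldisc_model(int hupped, int use_goto, unsigned int *steps)
{
	int retval = 0;

	*steps = 0;
	if (hupped) {
		if (!use_goto) {
			*steps |= STEP_UNLOCK;
			*steps |= STEP_PUT_NEW_LDISC;
			return -EIO;
		}
		retval = -EIO;
		goto out;
	}
	/* The actual discipline switch would happen here. */
out:
	*steps |= STEP_ENABLE_LDISC;
	*steps |= STEP_RESTART_WORK;	/* also reached via the goto */
	*steps |= STEP_UNLOCK;
	return retval;
}
```

Calling set_ldisc_model(1, 1, &steps) sets STEP_RESTART_WORK, while
set_ldisc_model(1, 0, &steps) does not; both return -EIO.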

2010-01-15 11:13:22

by Alan

Subject: Re: [PATCH] ldisc switching on a HUPped pty hangs the caller

Give this a spin:

tty: Fix the ldisc hangup race

From: Alan Cox <[email protected]>

This was noticed by Matthias Urlichs and he proposed a fix. This patch
fixes it a different way to avoid introducing several new race
conditions into the code.

The problem case is TTY_DRIVER_RESET_TERMIOS = 0. In that case, while we
abort the ldisc change, the hangup processing has not cleaned up and
restarted the ldisc either.

We can't restart the ldisc in set_ldisc because we don't know what the
hangup did. We might touch things we shouldn't: we are no longer supposed
to influence the tty at that point, in case it has been re-opened before
we get rescheduled.

Instead do it the simple way. Always re-init the ldisc on the hangup, but
use TTY_DRIVER_RESET_TERMIOS to indicate that we should force N_TTY.

Signed-off-by: Alan Cox <[email protected]>
---

drivers/char/tty_ldisc.c | 50 ++++++++++++++++++++++++++++------------------
1 files changed, 30 insertions(+), 20 deletions(-)


diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
index 3f653f7..500e740 100644
--- a/drivers/char/tty_ldisc.c
+++ b/drivers/char/tty_ldisc.c
@@ -706,12 +706,13 @@ static void tty_reset_termios(struct tty_struct *tty)
 /**
  * tty_ldisc_reinit - reinitialise the tty ldisc
  * @tty: tty to reinit
+ * @ldisc: line discipline to reinitialize
  *
- * Switch the tty back to N_TTY line discipline and leave the
- * ldisc state closed
+ * Switch the tty to a line discipline and leave the ldisc
+ * state closed
  */

-static void tty_ldisc_reinit(struct tty_struct *tty)
+static void tty_ldisc_reinit(struct tty_struct *tty, int ldisc)
 {
 	struct tty_ldisc *ld;

@@ -721,10 +722,10 @@ static void tty_ldisc_reinit(struct tty_struct *tty)
 	/*
 	 * Switch the line discipline back
 	 */
-	ld = tty_ldisc_get(N_TTY);
+	ld = tty_ldisc_get(ldisc);
 	BUG_ON(IS_ERR(ld));
 	tty_ldisc_assign(tty, ld);
-	tty_set_termios_ldisc(tty, N_TTY);
+	tty_set_termios_ldisc(tty, ldisc);
 }

 /**
@@ -745,6 +746,8 @@ static void tty_ldisc_reinit(struct tty_struct *tty)
 void tty_ldisc_hangup(struct tty_struct *tty)
 {
 	struct tty_ldisc *ld;
+	int reset = tty->driver->flags & TTY_DRIVER_RESET_TERMIOS;
+	int err = 0;

 	/*
 	 * FIXME! What are the locking issues here? This may me overdoing
@@ -772,25 +775,32 @@ void tty_ldisc_hangup(struct tty_struct *tty)
 	wake_up_interruptible_poll(&tty->read_wait, POLLIN);
 	/*
 	 * Shutdown the current line discipline, and reset it to
-	 * N_TTY.
+	 * N_TTY if need be.
+	 *
+	 * Avoid racing set_ldisc or tty_ldisc_release
 	 */
-	if (tty->driver->flags & TTY_DRIVER_RESET_TERMIOS) {
-		/* Avoid racing set_ldisc or tty_ldisc_release */
-		mutex_lock(&tty->ldisc_mutex);
-		tty_ldisc_halt(tty);
-		if (tty->ldisc) { /* Not yet closed */
-			/* Switch back to N_TTY */
-			tty_ldisc_reinit(tty);
-			/* At this point we have a closed ldisc and we want to
-			   reopen it. We could defer this to the next open but
-			   it means auditing a lot of other paths so this is
-			   a FIXME */
+	mutex_lock(&tty->ldisc_mutex);
+	tty_ldisc_halt(tty);
+	/* At this point we have a closed ldisc and we want to
+	   reopen it. We could defer this to the next open but
+	   it means auditing a lot of other paths so this is
+	   a FIXME */
+	if (tty->ldisc) { /* Not yet closed */
+		if (reset == 0) {
+			tty_ldisc_reinit(tty, tty->termios->c_line);
+			err = tty_ldisc_open(tty, tty->ldisc);
+		}
+		/* If the re-open fails or we reset then go to N_TTY. The
+		   N_TTY open cannot fail */
+		if (reset || err) {
+			tty_ldisc_reinit(tty, N_TTY);
 			WARN_ON(tty_ldisc_open(tty, tty->ldisc));
-			tty_ldisc_enable(tty);
 		}
-		mutex_unlock(&tty->ldisc_mutex);
-		tty_reset_termios(tty);
+		tty_ldisc_enable(tty);
 	}
+	mutex_unlock(&tty->ldisc_mutex);
+	if (reset)
+		tty_reset_termios(tty);
 }

 /**
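
The decision logic of the reworked tty_ldisc_hangup() above can be sketched
in user space as follows. This is a simplified model under stated
assumptions, not the kernel code: hangup_model(), struct hangup_result and
the reopen_fails parameter are hypothetical names, locking and the halt
step are left out, and the MODEL_N_* constants simply mirror the N_TTY and
N_PPP values from the Linux headers.

```c
#include <stdbool.h>

#define MODEL_N_TTY 0	/* same value as N_TTY in the Linux headers */
#define MODEL_N_PPP 3	/* same value as N_PPP in the Linux headers */

/* Hypothetical summary of the tty state after the hangup has run. */
struct hangup_result {
	int ldisc;		/* discipline attached after the hangup */
	bool termios_reset;	/* whether the termios settings were reset */
	bool enabled;		/* whether ldisc referencing is allowed again */
};

static struct hangup_result hangup_model(bool driver_resets_termios,
					 bool has_ldisc, int cur_ldisc,
					 bool reopen_fails)
{
	struct hangup_result r = { cur_ldisc, false, false };
	int err = 0;

	if (has_ldisc) {	/* "Not yet closed" in the patch */
		if (!driver_resets_termios) {
			/* Re-init and reopen the discipline named in termios. */
			r.ldisc = cur_ldisc;
			err = reopen_fails ? -1 : 0;
		}
		/* Fall back to N_TTY on reset or when the reopen failed. */
		if (driver_resets_termios || err)
			r.ldisc = MODEL_N_TTY;
		r.enabled = true;
	}
	if (driver_resets_termios)
		r.termios_reset = true;
	return r;
}
```

With driver_resets_termios false and a successful reopen, the model keeps
the current discipline and re-enables it, which is the case the previous
code skipped entirely.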

2010-01-15 12:04:49

by Matthias Urlichs

Subject: Re: [PATCH] ldisc switching on a HUPped pty hangs the caller

On Fri, 2010-01-15 at 10:48 +0000, Alan Cox wrote:
> > (The comment, by the way, seems to be wrong. There was no race.)
>
> Really ?
>
Yes -- I didn't mean to imply that there was no possibility of a race,
but that no race is needed to create a deadlock.
A call to vhangup() is sufficient.

I'll test your fix later today. Rebooting a production gateway in the
middle of the work day isn't a good idea. :-P