2016-04-15 18:21:39

by Oleg Nesterov

[permalink] [raw]
Subject: Q: tty_open() && hangup

Hi,

I am fingting with obscure pty bug in rhel6 which _might be_ explained
by some hangup/reopen races, and this motivated me to look at upstream
code which I can't understand too ;)

Lets look at tty_open(),

2112 tty = tty_open_current_tty(device, filp);
2113 if (!tty)
2114 tty = tty_open_by_driver(device, inode, filp);
2115
2116 if (IS_ERR(tty)) {
2117 tty_free_file(filp);
2118 retval = PTR_ERR(tty);
2119 if (retval != -EAGAIN || signal_pending(current))
2120 return retval;
2121 schedule();
2122 goto retry_open;

And why do we need this schedule() above? It is nop in TASK_RUNNING.

If we need it for non-preempt kernels to avoid some sort of livelock
this can't work if rt_task(current) == T.

If we just want to add a preemption point then perhaps we should use
cond_resched/might_resched ?

I am not saying this is wrong, but looks confusing.

2130 if (tty->ops->open)
2131 retval = tty->ops->open(tty, filp);
2132 else
2133 retval = -ENODEV;
2134 filp->f_flags = saved_flags;
2135
2136 if (retval) {
2137 tty_debug_hangup(tty, "open error %d, releasing\n", retval);
2138
2139 tty_unlock(tty); /* need to call tty_release without BTM */
2140 tty_release(inode, filp);
2141 if (retval != -ERESTARTSYS)
2142 return retval;
2143
2144 if (signal_pending(current))
2145 return retval;

So we can only get here if tty->ops->open returns -ERESTARTSYS without
signal_pending(). This doesn't look good. But OK, lets forget this.

2147 schedule();

Again, this looks confusing.

2148 /*
2149 * Need to reset f_op in case a hangup happened.
2150 */
2151 if (tty_hung_up_p(filp))
2152 filp->f_op = &tty_fops;

And this is called after tty_add_file() which makes this filp visible to
__tty_hangup(), and without tty_lock().

How can we trust this check? __tty_hangup() can set hung_up_tty_fops()
right after tty_hung_up_p() returns F.

Don't we need something like below? Otherwise afaics tty_open() can
actually succeed (and clear TTY_HUPPED) after restart, but return with
tty_hung_up_p(filp) == T.

Oleg.

--- x/drivers/tty/tty_io.c
+++ x/drivers/tty/tty_io.c
@@ -2122,6 +2122,13 @@ retry_open:
goto retry_open;
}

+ /*
+ * This is only possible after retry_open, we drop tty_lock()
+ * without tty_del_file().
+ */
+ if (tty_hung_up_p(filp))
+ filp->f_op = &tty_fops;
+
tty_add_file(tty, filp);

check_tty_count(tty, __func__);
@@ -2145,11 +2152,6 @@ retry_open:
return retval;

schedule();
- /*
- * Need to reset f_op in case a hangup happened.
- */
- if (tty_hung_up_p(filp))
- filp->f_op = &tty_fops;
goto retry_open;
}
clear_bit(TTY_HUPPED, &tty->flags);


2016-04-15 19:29:17

by Peter Hurley

[permalink] [raw]
Subject: Re: Q: tty_open() && hangup

Hi Oleg,

On 04/15/2016 10:19 AM, Oleg Nesterov wrote:
> Hi,
>
> I am fingting with obscure pty bug in rhel6 which _might be_ explained
> by some hangup/reopen races, and this motivated me to look at upstream
> code which I can't understand too ;)
>
> Lets look at tty_open(),
>
> 2112 tty = tty_open_current_tty(device, filp);
> 2113 if (!tty)
> 2114 tty = tty_open_by_driver(device, inode, filp);
> 2115
> 2116 if (IS_ERR(tty)) {
> 2117 tty_free_file(filp);
> 2118 retval = PTR_ERR(tty);
> 2119 if (retval != -EAGAIN || signal_pending(current))
> 2120 return retval;
> 2121 schedule();
> 2122 goto retry_open;
>
> And why do we need this schedule() above? It is nop in TASK_RUNNING.
>
> If we need it for non-preempt kernels to avoid some sort of livelock
> this can't work if rt_task(current) == T.
>
> If we just want to add a preemption point then perhaps we should use
> cond_resched/might_resched ?

Yeah, it's just a preemption point that pre-dates cond_resched.


> I am not saying this is wrong, but looks confusing.
>
> 2130 if (tty->ops->open)
> 2131 retval = tty->ops->open(tty, filp);
> 2132 else
> 2133 retval = -ENODEV;
> 2134 filp->f_flags = saved_flags;
> 2135
> 2136 if (retval) {
> 2137 tty_debug_hangup(tty, "open error %d, releasing\n", retval);
> 2138
> 2139 tty_unlock(tty); /* need to call tty_release without BTM */
> 2140 tty_release(inode, filp);
> 2141 if (retval != -ERESTARTSYS)
> 2142 return retval;
> 2143
> 2144 if (signal_pending(current))
> 2145 return retval;
>
> So we can only get here if tty->ops->open returns -ERESTARTSYS without
> signal_pending(). This doesn't look good. But OK, lets forget this.

tty_port_block_til_ready() returns -ERESTARTSYS if !ASYNC_HUP_NOTIFY and
tty was hungup or the h/w was reset (eg. while waiting for tty lock).
IOW, parallel hangup() or release() happened at the point where the
tty lock was not held.


> 2147 schedule();
>
> Again, this looks confusing.
>
> 2148 /*
> 2149 * Need to reset f_op in case a hangup happened.
> 2150 */
> 2151 if (tty_hung_up_p(filp))
> 2152 filp->f_op = &tty_fops;
>
> And this is called after tty_add_file() which makes this filp visible to
> __tty_hangup(), and without tty_lock().

tty_release() has removed the filp from the files list already (with the
tty lock held), so if the file operations were reset it's because a
hangup happened when the tty lock was not held (eg. waiting for reopen
or after dropping the lock and reacquiring it in tty_release())


> How can we trust this check? __tty_hangup() can set hung_up_tty_fops()
> right after tty_hung_up_p() returns F.

__tty_hangup() will not have had visibility to this filp after
tty_release().


> Don't we need something like below? Otherwise afaics tty_open() can
> actually succeed (and clear TTY_HUPPED) after restart, but return with
> tty_hung_up_p(filp) == T.

No.

Regards,
Peter Hurley



> --- x/drivers/tty/tty_io.c
> +++ x/drivers/tty/tty_io.c
> @@ -2122,6 +2122,13 @@ retry_open:
> goto retry_open;
> }
>
> + /*
> + * This is only possible after retry_open, we drop tty_lock()
> + * without tty_del_file().
> + */
> + if (tty_hung_up_p(filp))
> + filp->f_op = &tty_fops;
> +
> tty_add_file(tty, filp);
>
> check_tty_count(tty, __func__);
> @@ -2145,11 +2152,6 @@ retry_open:
> return retval;
>
> schedule();
> - /*
> - * Need to reset f_op in case a hangup happened.
> - */
> - if (tty_hung_up_p(filp))
> - filp->f_op = &tty_fops;
> goto retry_open;
> }
> clear_bit(TTY_HUPPED, &tty->flags);
>

2016-04-16 16:29:13

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Q: tty_open() && hangup

Hi Peter,

On 04/15, Peter Hurley wrote:
> >
> > If we just want to add a preemption point then perhaps we should use
> > cond_resched/might_resched ?
>
> Yeah, it's just a preemption point that pre-dates cond_resched.

OK, so perhaps s/schedule/might_reched/ makes sense to make it more
clear.
> > So we can only get here if tty->ops->open returns -ERESTARTSYS without
> > signal_pending(). This doesn't look good. But OK, lets forget this.
>
> tty_port_block_til_ready() returns -ERESTARTSYS if !ASYNC_HUP_NOTIFY and
> tty was hungup or the h/w was reset

Yes, yes, I saw this code, and I am not saying this is buggy.

I meant this looks confusing. At least to me, until I grepped for ERESTARTSYS
I thought that these code paths do something non-trivial with signal handling.

IMHO, ERESTARTSYS should only be used if signal_pending() is true and we are
going to return this error code to user-space. But tty_port_block_til_ready()
returns ERESTARTSYS if !ASYNC_HUP_NOTIFY _or_ signal_pending(), that is why
tty_open() has to check signal_pending() too.

I think this deserves a cleanup, but this is minor and of course subjective,
please forget.

> > 2148 /*
> > 2149 * Need to reset f_op in case a hangup happened.
> > 2150 */
> > 2151 if (tty_hung_up_p(filp))
> > 2152 filp->f_op = &tty_fops;
> >
> > And this is called after tty_add_file() which makes this filp visible to
> > __tty_hangup(), and without tty_lock().
>
> tty_release() has removed the filp from the files list already (with the
> tty lock held),

Heh. Can't understand how did I miss that ;)

Thanks Peter.

Oleg.