Hello Oleg, everyone,
I have noticed something, which may be considered a race in the
interaction of ptrace and pseudoterminal interfaces. Basically, what
happens is this:
- we have two processes: A and B. B has the slave end of the pty open,
A has the master. A is tracing B.
- B writes some data through the slave end and then stops.
- A waits for B to stop.
- A does a select on the master pty endpoint. select returns there is
no data available
- later, A tries the select again, and this time the data appears.
We are encountering this (very rare) issue in our debugger test suite,
where we check the stdout of the tracee to make sure it is behaving as
expected. I have attached a small program reproducing this behavior
(it fails after about 1000 iterations on a 3.13.0 kernel, I can retry
it on a newer kernel next week if you believe it might work there).
Interestingly, when I replace the pty with a regular pipe, it works as
expected (the data is available as soon as the program stops).
My question is: Is this behavior something that you would consider a
bug? If yes, do you have any pointers, as to where I should look to
fix it?
kind regards,
pavel
Hi Pavel,
On 11/03/2015 06:16 PM, Pavel Labath wrote:
> Hello Oleg, everyone,
>
> I have noticed something, which may be considered a race in the
> interaction of ptrace and pseudoterminal interfaces. Basically, what
> happens is this:
> - we have two processes: A and B. B has the slave end of the pty open,
> A has the master. A is tracing B.
> - B writes some data through the slave end and then stops.
> - A waits for B to stop.
> - A does a select on the master pty endpoint. select returns there is
> no data available
> - later, A tries the select again, and this time the data appears.
This happens because a separate kworker processes the input from slave
and wakes the master. At the moment of select() on the master pty, the
kworker has not processed the latest input (in fact it may only be
scheduled and not running yet).
Essentially, you're measuring an asynchronous i/o path with a synchronous
method.
> We are encountering this (very rare) issue in our debugger test suite,
> where we check the stdout of the tracee to make sure it is behaving as
> expected. I have attached a small program reproducing this behavior
> (it fails after about 1000 iterations on a 3.13.0 kernel, I can retry
> it on a newer kernel next week if you believe it might work there).
> Interestingly, when I replace the pty with a regular pipe, it works as
> expected (the data is available as soon as the program stops).
>
> My question is: Is this behavior something that you would consider a
> bug? If yes, do you have any pointers, as to where I should look to
> fix it?
I don't consider it a bug.
That said, I could see a couple of different ways to add this
functionality:
1. Implement f_op->fsync() for ttys, which would flush the workqueue
(thus waiting for i/o completion). The debugger would fsync() before
select() on the master.
2. Automagically for ptraced processes. The basic idea would be that
writes to the slave end while a process was being ptraced would
set state that would trigger workqueue flush by select/poll/read of
the master end.
Regards,
Peter Hurley
On 11/04, Peter Hurley wrote:
>
> Hi Pavel,
>
> On 11/03/2015 06:16 PM, Pavel Labath wrote:
> > Hello Oleg, everyone,
> >
> > I have noticed something, which may be considered a race in the
> > interaction of ptrace and pseudoterminal interfaces. Basically, what
> > happens is this:
> > - we have two processes: A and B. B has the slave end of the pty open,
> > A has the master. A is tracing B.
> > - B writes some data through the slave end and then stops.
> > - A waits for B to stop.
> > - A does a select on the master pty endpoint. select returns there is
> > no data available
> > - later, A tries the select again, and this time the data appears.
>
> This happens because a separate kworker processes the input from slave
> and wakes the master. At the moment of select() on the master pty, the
> kworker has not processed the latest input (in fact it may only be
> scheduled and not running yet).
>
> Essentially, you're measuring an asynchronous i/o path with a synchronous
> method.
Thanks a lot Peter!
> > We are encountering this (very rare) issue in our debugger test suite,
> > where we check the stdout of the tracee to make sure it is behaving as
> > expected. I have attached a small program reproducing this behavior
> > (it fails after about 1000 iterations on a 3.13.0 kernel, I can retry
> > it on a newer kernel next week if you believe it might work there).
> > Interestingly, when I replace the pty with a regular pipe, it works as
> > expected (the data is available as soon as the program stops).
> >
> > My question is: Is this behavior something that you would consider a
> > bug? If yes, do you have any pointers, as to where I should look to
> > fix it?
>
> I don't consider it a bug.
>
> That said, I could see a couple of different ways to add this
> functionality:
> 1. Implement f_op->fsync() for ttys, which would flush the workqueue
> (thus waiting for i/o completion). The debugger would fsync() before
> select() on the master.
> 2. Automagically for ptraced processes. The basic idea would be that
> writes to the slave end while a process was being ptraced would
> set state that would trigger workqueue flush by select/poll/read of
> the master end.
Oh, I don't think "Automagically if ptrace" makes any sense... What makes
ptrace special? Afaics nothing.
We can modify this test-case to use signals/futexes/whatever to let the
parent know that the child has already done write(writefd), and it can
"fail" the same way.
Oleg.
On 11/04/2015 02:43 PM, Oleg Nesterov wrote:
> On 11/04, Peter Hurley wrote:
>>
>> Hi Pavel,
>>
>> On 11/03/2015 06:16 PM, Pavel Labath wrote:
>>> Hello Oleg, everyone,
>>>
>>> I have noticed something, which may be considered a race in the
>>> interaction of ptrace and pseudoterminal interfaces. Basically, what
>>> happens is this:
>>> - we have two processes: A and B. B has the slave end of the pty open,
>>> A has the master. A is tracing B.
>>> - B writes some data through the slave end and then stops.
>>> - A waits for B to stop.
>>> - A does a select on the master pty endpoint. select returns there is
>>> no data available
>>> - later, A tries the select again, and this time the data appears.
>>
>> This happens because a separate kworker processes the input from slave
>> and wakes the master. At the moment of select() on the master pty, the
>> kworker has not processed the latest input (in fact it may only be
>> scheduled and not running yet).
>>
>> Essentially, you're measuring an asynchronous i/o path with a synchronous
>> method.
>
> Thanks a lot Peter!
>
>>> We are encountering this (very rare) issue in our debugger test suite,
>>> where we check the stdout of the tracee to make sure it is behaving as
>>> expected. I have attached a small program reproducing this behavior
>>> (it fails after about 1000 iterations on a 3.13.0 kernel, I can retry
>>> it on a newer kernel next week if you believe it might work there).
>>> Interestingly, when I replace the pty with a regular pipe, it works as
>>> expected (the data is available as soon as the program stops).
>>>
>>> My question is: Is this behavior something that you would consider a
>>> bug? If yes, do you have any pointers, as to where I should look to
>>> fix it?
>>
>> I don't consider it a bug.
>>
>> That said, I could see a couple of different ways to add this
>> functionality:
>> 1. Implement f_op->fsync() for ttys, which would flush the workqueue
>> (thus waiting for i/o completion). The debugger would fsync() before
>> select() on the master.
>> 2. Automagically for ptraced processes. The basic idea would be that
>> writes to the slave end while a process was being ptraced would
>> set state that would trigger workqueue flush by select/poll/read of
>> the master end.
>
> Oh, I don't think "Automagically if ptrace" makes any sense... What makes
> ptrace special? Afaics nothing.
>
> We can modify this test-case to use signals/futexes/whatever to let the
> parent know that the child has already done write(writefd), and it can
> "fail" the same way.
True.
Also, new patches in mainline head make this _much_ less likely
by scheduling the input processing kworker on the unbound wq (which means
the kworker can start immediately on another cpu rather than pinned to
the cpu performing the slave write).
After thinking more about this, this use-case seems trivially solvable
by re-select()ing with a timeout prior to reporting an output mismatch
failure.
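
(A minimal sketch of the re-select idea, not taken from the attached
reproducer. The helper name drain_master and the settle_timeout value are
assumptions made for illustration. It keeps select()ing on the master end
until nothing new shows up within the timeout, so output still queued
behind the kworker has a chance to arrive before the test compares it.)

```python
import os
import select


def drain_master(master_fd, settle_timeout=0.2, chunk_size=4096):
    """Collect output from a pty master until it stays quiet for settle_timeout."""
    chunks = []
    while True:
        # Re-select with a timeout instead of polling once: the data written
        # to the slave may not have been pushed to the master side yet.
        readable, _, _ = select.select([master_fd], [], [], settle_timeout)
        if not readable:
            break  # quiet for the whole timeout: assume nothing is pending
        try:
            data = os.read(master_fd, chunk_size)
        except OSError:
            break  # Linux reports EIO once every slave end has been closed
        if not data:
            break  # EOF on platforms that report it that way
        chunks.append(data)
    return b"".join(chunks)
```

The test would call drain_master() after the tracee stops and compare the
result against the expected output. The timeout only narrows the window;
it does not close it.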
Regards,
Peter Hurley
On 5 November 2015 at 05:25, Peter Hurley wrote:
> On 11/04/2015 02:43 PM, Oleg Nesterov wrote:
>> Oh, I don't think "Automagically if ptrace" makes any sense... What makes
>> ptrace special? Afaics nothing.
>>
>> We can modify this test-case to use signals/futexes/whatever to let the
>> parent know that the child has already done write(writefd), and it can
>> "fail" the same way.
>
> True.
>
> Also, new patches in mainline head make this _much_ less likely
> by scheduling the input processing kworker on the unbound wq (which means
> the kworker can start immediately on another cpu rather than pinned to
> the cpu performing the slave write).
>
> After thinking more about this, this use-case seems trivially solvable
> by re-select()ing with a timeout prior to reporting an output mismatch
> failure.
>
> Regards,
> Peter Hurley
>
Thank you for the replies.
I agree that this can be worked around on our side, but I wanted to
confirm whether this is expected behavior or a bug. Judging from your
answers, it seems this is working as intended.
That said, it seems to me that this could be a generally useful
feature. For the test suite, I can insert a sleep (even a large one,
to be sure), but this seems like a sub-optimal solution for general
debugger operation. E.g., when we want to display all tracee output(*)
before we print out the debugger prompt, we don't know if the tracee
has written anything, and we would need to sleep always, just in case
it has done that. This is especially tricky for remote debugging, as
the current gdb-remote protocol does not allow sending stdio after the
stop notification. So, I actually quite like the fsync() idea, but I
don't know if this is something that would be generally accepted (?).
(*) To avoid mixing output we don't have the tracee share the same
terminal with the debugger, but we create a new one, and do the
forwarding ourselves. Aside from avoiding output mixing, this
facilitates IDE integration, remote debugging, etc.
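
(A rough sketch of that arrangement, assuming a Linux host. This is not
lldb's actual code: run_with_private_pty is a hypothetical helper, and a
real debugger would forward the data as it arrives rather than collect it.
The child gets the slave end as stdout/stderr and the parent reads
everything back through the master.)

```python
import os
import pty
import subprocess


def run_with_private_pty(argv):
    """Run argv with its stdout/stderr on a fresh pty and return what it wrote."""
    master_fd, slave_fd = pty.openpty()
    child = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=slave_fd,
        stderr=slave_fd,
    )
    os.close(slave_fd)  # the parent keeps only the master end
    captured = []
    while True:
        try:
            data = os.read(master_fd, 4096)
        except OSError:
            break  # Linux reports EIO once the child has closed the slave
        if not data:
            break  # EOF on platforms that report it that way
        captured.append(data)
    os.close(master_fd)
    child.wait()
    return b"".join(captured)
```

Because the output passes through the tty layer, newlines written by the
child normally reach the master as "\r\n".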
A side question: When I replace the pty with a pipe, the data seems to
be delivered immediately. Is this something that is guaranteed, or
does it only happen to work by accident and could change in the future
(e.g. by moving the pipe processing to a kworker process or whatever)?
regards,
pl
On 11/05/2015 01:35 PM, Pavel Labath wrote:
> On 5 November 2015 at 05:25, Peter Hurley wrote:
>> On 11/04/2015 02:43 PM, Oleg Nesterov wrote:
>>> Oh, I don't think "Automagically if ptrace" makes any sense... What makes
>>> ptrace special? Afaics nothing.
>>>
>>> We can modify this test-case to use signals/futexes/whatever to let the
>>> parent know that the child has already done write(writefd), and it can
>>> "fail" the same way.
>>
>> True.
>>
>> Also, new patches in mainline head make this _much_ less likely
>> by scheduling the input processing kworker on the unbound wq (which means
>> the kworker can start immediately on another cpu rather than pinned to
>> the cpu performing the slave write).
>>
>> After thinking more about this, this use-case seems trivially solvable
>> by re-select()ing with a timeout prior to reporting an output mismatch
>> failure.
>>
>> Regards,
>> Peter Hurley
>>
>
> Thank you for the replies.
>
> I agree that this can be worked around on our side, but I wanted to
> confirm whether this is expected behavior or a bug. Judging from your
> answers, it seems this is working as intended.
>
> That said, it seems to me that this could be a generally useful
> feature. For the test suite, I can insert a sleep (even a large one,
> to be sure), but this seems like a sub-optimal solution for general
> debugger operation. E.g., when we want to display all tracee output(*)
> before we print out the debugger prompt, we don't know if the tracee
> has written anything, and we would need to sleep always, just in case
> it has done that.
My comment suggesting re-select()ing was aimed at the test suite only.
For the debugger, I would always mix in new output from the target
regardless of when it arrived. But feel free to ignore my unsolicited
design advice :)
> This is especially tricky for remote debugging, as
> the current gdb-remote protocol does not allow sending stdio after the
> stop notification.
Hmm, I could swear I've seen gdb scrolling away with new output while
stopped.
> So, I actually quite like the fsync() idea, but I
> don't know if this is something that would be generally accepted (?).
Let me think more on this; maybe I can come up with a way to trip it
within an existing method.
> (*) To avoid mixing output we don't have the tracee share the same
> terminal with the debugger, but we create a new one, and do the
> forwarding ourselves. Aside from avoiding output mixing, this
> facilitates IDE integration, remote debugging, etc.
>
>
> A side question: When I replace the pty with a pipe, the data seems to
> be delivered immediately. Is this something that is guaranteed, or
> does it only happen to work by accident and could change in the future
> (e.g. by moving the pipe processing to a kworker process or whatever)?
I would think the existing pipe behavior is more or less guaranteed, since
pipes are commonly used for process synchronization.
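
(A small sketch of the pipe case for comparison. The helper name
pipe_ready_immediately is made up for illustration. A pipe write puts the
data into the pipe buffer before write() returns, so a zero-timeout
select() on the read end sees it straight away.)

```python
import os
import select


def pipe_ready_immediately(payload=b"x"):
    """Write to a pipe, then poll the read end once without waiting."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, payload)
        # Timeout 0 means a single non-blocking poll, no waiting at all.
        readable, _, _ = select.select([read_fd], [], [], 0)
        return read_fd in readable
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

The same zero-timeout poll on a pty master may come back empty, since the
data has to pass through the kworker first.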
Regards,
Peter Hurley
On 5 November 2015 at 20:29, Peter Hurley wrote:
> On 11/05/2015 01:35 PM, Pavel Labath wrote:
>> That said, it seems to me that this could be a generally useful
>> feature. For the test suite, I can insert a sleep (even a large one,
>> to be sure), but this seems like a sub-optimal solution for general
>> debugger operation. E.g., when we want to display all tracee output(*)
>> before we print out the debugger prompt, we don't know if the tracee
>> has written anything, and we would need to sleep always, just in case
>> it has done that.
>
> My comment suggesting re-select()ing was aimed at the test suite only.
>
> For the debugger, I would always mix in new output from the target
> regardless of when it arrived. But feel free to ignore my unsolicited
> design advice :)
;)
>
>
>> This is especially tricky for remote debugging, as
>> the current gdb-remote protocol does not allow sending stdio after the
>> stop notification.
>
> Hmm, I could swear I've seen gdb scrolling away with new output while
> stopped.
That's quite possible if this wasn't a remote session. Gdb shares the
terminal with the tracee, so the order the output comes out really
depends on the internal terminal implementation. In lldb, we create a
new pty for the tracee and control the output forwarding ourselves.
If it was a remote session, then I would be very interested in it, as I
don't think the remote protocol supports that.
>
>> So, I actually quite like the fsync() idea, but I
>> don't know if this is something that would be generally accepted (?).
>
> Let me think more on this; maybe I can come up with a way to trip it
> within an existing method.
Thanks. I have not seen this occur since I contacted you, so it's
not a big priority for me now, but I may revisit it later.
>> A side question: When I replace the pty with a pipe, the data seems to
>> be delivered immediately. Is this something that is guaranteed, or
>> does it only happen to work by accident and could change in the future
>> (e.g. by moving the pipe processing to a kworker process or whatever)?
>
> I would think the existing pipe behavior is more or less guaranteed, since
> pipes are commonly used for process synchronization.
That's good to know, thanks. :)
cheers,
pl