2011-03-01 15:25:06

by Tejun Heo

[permalink] [raw]
Subject: [RFC] Proposal for ptrace improvements

The current ptrace implementation has many issues on various aspects.
Some of them are outright bugs. Some are ambiguously defined grey
areas and others are missing features. Among these, the most
promienent is interactions with jctl (job control) where nothing is
really well defined and the current behaviors are broken to the point
where achieving transparency with userland work-arounds is impossible.

During the past couple of months, there have been some dicussions on
how to improve ptrace[1]. I'd like to summarize some of it and
describe what I think would be a good way to proceed.


IDENTIFIED ISSUES
-----------------

I1. TASK_STOPPED and TASK_TRACED

Currently, a tracee may stop in two different ways. When stopping for
jctl, it stops inside do_signal_stop() and puts itself into
TASK_STOPPED. For ptrace traps, it stops inside ptrace_stop() with
TASK_TRACED.

The biggest difference between the two stops is that when a tracee is
in TASK_STOPPED, it can be resumed by emission of SIGCONT (as Roland
pointed out, emission ends jctl stop, not reception), while only the
tracer and SIGKILL can resume from TASK_TRACED.

When a tracer issues a ptrace request to a TASK_STOPPED tracee, the
tracer silently changes the tracee's state from TASK_STOPPED to
TASK_TRACED. This behavior is probably intended to enable some level
of job control transparency, so that a tracee can still be stopped and
resumed by jctl; unfortunately, this silent transition is problematic.

* Some architectures require tracees to take certain steps before
being poked by tracers. This is implemented as arch_ptrace_stop()
callback in ptrace_stop(). The silent transition from TASK_STOPPED
to TASK_TRACED skips this step and may result in presenting
incorrect tracee states to tracers.

* Any ptrace request initiates the silent transition. As tracers
can't obtain a lot of information from wait(2), they usually have to
issue one or more ptrace requests after notification, which forces
tracees into TASK_TRACED making the whole transparency thing moot.

* The mixed use of jctl and ptrace stop is error-prone. For example,
wait(2) exit_code handling is different between TASK_STOPPED and
TASK_TRACED. Using jctl stop while ptraced makes it more
complicated and fragile.

* If a tracee is continued by SIGCONT before its tracer issues a
ptrace request, the ptrace request would fail with -ESRCH. Due to
the tracer behavior described above, the window is usually very
small. This necessiates a cold path which would be travelled
seldomly and thus not tested very well.


I2. Loss of jctl notifications to real parent

When a task is ptraced, it gets "re-parented" to the tracer. The
tracer becomes the parent and intercepts jctl notifications. This
means, among other things, that when gdb(1) or strace(1) is attached
to a process which is run from an interactive shell, the usual jctl
mechanism via ^Z doesn't work. The STOP signal is sent but the shell
is never notified that the child has stopped.


I3. Not well-defined job control behaviors while traced

In general, jctl behaviors while ptraced aren't well defined. The
currently implemented behaviors are undeterministic and ambiguous on
many aspects; however, thanks to the previously described
shortcomings, jctl while traced is broken to the point where these
ambiguities don't matter all that much.


I4. SIGSTOP sent on PTRACE_ATTACH

PTRACE_ATTACH implies SIGSTOP. This makes it impossible for the
tracer to be transparent with respect to jctl from the get-go.


BASELINE
--------

First, I'd like to lay out two existing rules of the current ptrace
implementation as they became points of contention.

* ptrace is by large task-centric. When PTRACE_ATTACH happens, the
reparenting separates the tracee from the task group (process) and
most interactions are confined between the tracer and tracee. In
the current code, the only notable exception is the implied SIGSTOP
on attach which affects the whole process.

* PTRACE_CONT and other requests which resume the tracee overrides, or
rather works below, jctl stop. If jctl stop takes place on the task
group a tracee belongs to, the tracee will eventually participate in
the group stop and its tracer will be notified; however, when
PTRACE_CONT or other resuming request is made, the tracee will
resume execution regardless of and without affecting the jctl stop.

I don't know whether these are by design or just happened as
by-products of the evolution of task group implementation in the
kernel, but regardless, in my opinion, both rules are sound and
useful. They might not be immediately intuitive and the resulting
behavior might seem quirky but to me it seems to be one of those
things which looks awkward at first but is ultimately right in its
usefulness and relative simplicity.

More importantly, it doesn't matter what I or, for that matter, anyone
else thinks about them. They're tightly ingrained into the
userland-visible behavior and actively exploited by the current users
- for example, dynamic evalution in tracee context in gdb(1).
Changing behaviors as fundamental as these would impact the current
applications and debugging behaviors expected by (human) users.

So, I don't think it's possible or even desirable to change these
basic rules even if it makes certain aspects of jctl and ptrace
interaction more elegant.

I don't believe every detail of kernel behavior should remain
completely static. There are behavior changes which go unnoticed or
are even wildly welcome but changing these is way out of scope. If
we're gonna make changes as fundamental as these, we really should be
looking at implementing a completely new API and planning for
deprecation of the current one. Such API deprecation, in turn,
requires very strong supporting rationales, which I don't see here,
not when the existing one can be improved to be, far from perfect but,
useful and sane _enough_.

What we can and should do is much more gradual approach. First, fix
the existing bugs, iron out ambiguities and so on. In the process,
there will be minor behavior changes. We'll be fixing user-visible
bugs too after all, but we actually have some latitude thanks to the
wild breakages. Then, we can add small pieces to augment the existing
interface.


PROPOSAL
--------

P1. Always TASK_TRACED while ptraced

The silent transition from TASK_STOPPED to TASK_TRACED is outright
buggy. If the tracer wants to transit the tracee into TASK_TRACED, it
should ask the tracee to wake up, execute the necessary steps and then
enter TASK_TRACED.

As described in I1, entering TASK_STOPPED while ptraced doesn't bring
a lot of benefits while giving rise to several issues. I think it's
best to always enter TASK_TRACED while traced whether the stop is for
jctl or ptrace trap. After all, it's not like jctl stops while traced
can be handled the same way as usual jctl stops. They require special
ptrace specific handling.

This introduces two behavioral differences. One is that the
TASK_STOPPED <-> TASK_RUNNING <-> TASK_TRACED transitions become
visible via /proc and other subtleties. We can use different levels
of workarounds to mask these transitions. In my opinion, it's enough
to mask the transition from the tracing task itself. IOW, if the
tracer is multi-thread or process, the transitions could be visible to
other threads and processes but are always transparent to the ptracing
thread.

The second difference is that the tracee would now be in TASK_TRACED
immediately after it stops for jctl while ptraced. As described above
this feature isn't really useful and the existing users can't and thus
don't take advantage of it. They immediately follow wait(2)
notifications with PTRACE requests putting the tracee into
TASK_TRACED. I highly doubt the change would be noticeable or missed.


P2. Fix notifications to the real parent

This pleasantly proved to be the least contentious change to make.
The usual group stop / continued notifications should be propagated to
the real parent whether the children are ptraced or not. There isn't
much to be discussed about the wanted behavior. Notifications which
would have been generated and delivered to the real parent in the
absense of ptrace should be generated and delivered to the real parent
the same.


P3. Keep ptrace resume separate from and beneath jctl stop

As written above, I think the current ptrace behavior, despite a lot
of rough edges, is in the right direction in that ptrace operates
beneath jctl. Therefore, keep the basic operation principles but
clearly define how jctl and ptrace interacts, or rather, how they
don't. The following two rules clearly separate jctl and ptrace.

* jctl stop initiates when one of the stop signals is received and
completes when all the member tasks participate in the group stop,
where participation preciesly means that a member task stops in
do_signal_stop(). Any member task can only participate once in any
given group stop. ptrace does NOT make any difference in this
regard.

* However, PTRACE_DETACH should maintain the integrity of group stop.
After a tracee is detached, it should be in a state which is
conformant to the current jctl state. If jctl stop is in effect,
the task should be put into TASK_STOPPED; otherwise, TASK_RUNNING.


P4. PTRACE_SEIZE

As the implied SIGSTOP is very visible from userland, solving I4
mandates a different way to attach to a tracee. There is a proposal
from Roland[2], but I'd like to propose something slightly different.

Roland proposed two new ptrace requests - PTRACE_ATTACH_NOSTOP and
PTRACE_INTERRUPT. As the name implies, PTRACE_ATTACH_NOSTOP attaches
to the specified task but doesn't do anything about its execution
state and PTRACE_INTERRUPT interrupts execution of a tracee without
affecting its jctl state.

I don't think it's a good idea to attach without putting the tracee
into TASK_TRACED. The API becomes more complex because attaching
doesn't atomically establish a fixed state as shown by the necessity
for PTRACE_O_INHERIT and the ability to set other options on
PTRACE_ATTACH_NOSTOP.

I can't see much, if any, benefit in implementing ATTACH and INTERRUPT
separately. They can be combined into one request, say, PTRACE_SEIZE.
If the target task isn't already attached, it attaches and puts the
tracee into TASK_TRACED. If already attached, the tracee is forced
into TASK_TRACED. In both cases, jctl state is unaffected.

Completion notification is delivered in the usual way via wait(2). If
the task was in jctl stop, it would report the stop signal with the
matching siginfo. If the task hits an existing ptrace trap condition,
the matching SIGTRAP will be reported; otherwise, SIGTRAP will be
reported with siginfo indicating PTRACE_SEIZE trap.

IOW, PTRACE_SEIZE guarantees that the tracee, whether new or existing,
enters TASK_TRACED. If there is an existing stop condition, that will
be taken and reported; otherwise, PTRACE_SEIZE trap will be reported.


P5. "^Z" and "fg" for tracees

A ptracer, as it currently stands and proposed here, has full control
over the execution state of its tracee. The tracer is notified
whenever the tracee stops and can always resume its execution;
however, there is one missing piece.

As proposed, when a tracee enters jctl stop, it enters TASK_TRACED
from which emission of SIGCONT can't resume the tracee. This makes it
impossible for a tracer to become transparent with respect to jctl.
For example, after strace(1) is attached to a task, the task can be
^Z'd but then can't be fg'd.

One approach to this problem is somehow making it work implicitly from
the kernel - as in putting the tracee into TASK_STOPPED or somehow
handling TASK_TRACED for jctl stop differently; however, I think such
approach is cumbersome in both concept and implementation. Instead of
being able to say "while ptraced, a tracee's execution is fully under
the control of its tracer", subtle and fragile exceptions need to be
introduced.

A better way to solve this is simply giving the tracer the capability
to listen for the end of jctl stop. That way, the problem is solved
in a manner which is consistent, may not be to everyone's liking but
nonetheless consistent, with the rest of ptrace. Execution state of
the tracee is always under the control of the tracer. The only thing
which changes is that the tracer now can find out when jctl stop ends,
which also could be an additional useful debugging feature.

It would be most fitting to use wait(2) for delivery of this
notification. WCONTINUED is the obvious candidate but I think it is
better to use STOPPED notification because the task is not really
resumed. Only its mode of stop changes. What state the tracee is in
can be determined by retriving siginfo using PTRACE_GETSIGINFO.

This also effectively makes the notification level-triggered instead
of edge-triggered, which is a big plus. No matter which state the
tracee is in, a jctl stopped notification is guaranteed to happen
after the lastest event and the tracer can always find out the latest
state with PTRACE_GETSIGINFO.

Using stopped notification also makes the new addition harmless to the
existing users. It's just another stopped notification. Both
strace(1) and gdb(1) don't distinguish the signal delivery and jctl
stop notifications and react the same way by resuming the tracee
unconditionally. One more stopped notification on SIGCONT emission
doesn't change much.

Of course, another way to add this is selectively enabling it when the
tracee was attached with PTRACE_SEIZE, but unless necessary, and given
that SIGCONT currently simply doesn't work while ptraced I think it's
unnecessary, it would be much better to avoid such implied subtle
behavior difference.


WAY FORWARD (yeah, I'm feeling some marketing vibe)
-----------

ptrace currently is in a pretty bad shape and I think one of the
biggest reasons is a lot of effort has been spent trying to come up
with something completely new instead of concentrating on improving
what's already there. I think the existing principles are pretty
sound. They just need some love and attention here and there.

I believe the proposed approach covers most of the raised issues in a
gradual and evolutionary manner. If I missed something, scream it to
me but let's _please_ concentrate on gradual improvements. What
someone would want if one could start from the scratch is interesting
but ultimately irrelevant. We have what we have and that's where we
build from. Like our eyes - the frigging wiring is in front of the
sensor array but still my pair have been working pretty well for me.

Once agreed upon, I think I'll be able to implement the proposed
changes in relatively short time, probably ready to be merged during
2.6.40-rc1. So, let's move on.

Thank you.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1093410
[2] http://sourceware.org/ml/archer/2011-q1/msg00026.html


2011-03-01 16:59:41

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Tuesday 01 March 2011 16:24, Tejun Heo wrote:
> PROPOSAL
> --------
> ...
> P3. Keep ptrace resume separate from and beneath jctl stop
>
> As written above, I think the current ptrace behavior, despite a lot
> of rough edges, is in the right direction in that ptrace operates
> beneath jctl. Therefore, keep the basic operation principles but
> clearly define how jctl and ptrace interacts, or rather, how they
> don't. The following two rules clearly separate jctl and ptrace.
>
> * jctl stop initiates when one of the stop signals is received and
> completes when all the member tasks participate in the group stop,
> where participation preciesly means that a member task stops in
> do_signal_stop(). Any member task can only participate once in any
> given group stop. ptrace does NOT make any difference in this
> regard.

This proposal adds a new rule for handling of stop notification
by debuggers. Let's spell it out, because currently strace
doesn't act according to this new rule:


"When waitpid indicates stop on a *stop* signal, then it may be either:
* a signal delivery (strace will inject this signal with PTRACE_SYSCALL(sig));
* or it may be a stop notification, in which case strace *must not*
try to inject this signal (this would be a bug, it'd make task running).
Instead, strace should just go back to waiting in waitpid().

These two possibilities can be distinquished by querying
PTRACE_GETSIGINFO. On stop notifications, PTRACE_GETSIGINFO
errors out - stop notification is not a signal delivery
and therefore it has no siginfo."


This is easy to implement (in strace at least).

> * However, PTRACE_DETACH should maintain the integrity of group stop.
> After a tracee is detached, it should be in a state which is
> conformant to the current jctl state. If jctl stop is in effect,
> the task should be put into TASK_STOPPED; otherwise, TASK_RUNNING.

This means that without changes to gdb, this:

# gdb -p pid_of_application_currently_in_jctl_stop
(gdb) print getpid()
(gdb) print show_me_your_internal_debug_dump()
(gdb) continue

will make application run, whereas this:

# gdb -p pid_of_application_currently_in_jctl_stop
(gdb) print getpid()
(gdb) print show_me_your_internal_debug_dump()
(gdb) quit

will leave application stopped. This looks a bit inconsistent to me.

Do you propose gdb to be chaged so that "continue" command
issues PTRACE_CONT only if gdb knows that application is not
in jctl stop?

--
vda

2011-03-01 17:09:59

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hello, Denys.

On Tue, Mar 01, 2011 at 05:57:48PM +0100, Denys Vlasenko wrote:
> > * jctl stop initiates when one of the stop signals is received and
> > completes when all the member tasks participate in the group stop,
> > where participation preciesly means that a member task stops in
> > do_signal_stop(). Any member task can only participate once in any
> > given group stop. ptrace does NOT make any difference in this
> > regard.
>
> This proposal adds a new rule for handling of stop notification
> by debuggers. Let's spell it out, because currently strace
> doesn't act according to this new rule:
>
> "When waitpid indicates stop on a *stop* signal, then it may be either:
> * a signal delivery (strace will inject this signal with PTRACE_SYSCALL(sig));
> * or it may be a stop notification, in which case strace *must not*
> try to inject this signal (this would be a bug, it'd make task running).
> Instead, strace should just go back to waiting in waitpid().
>
> These two possibilities can be distinquished by querying
> PTRACE_GETSIGINFO. On stop notifications, PTRACE_GETSIGINFO
> errors out - stop notification is not a signal delivery
> and therefore it has no siginfo."

Hmmm... but the above also holds for the current kernel too. That
hasn't really changed and the current broken behavior - unconditional
PTRACE_SYSCALL(sig) - will behave the same regardless of the proposed
changes.

The difference would be that if you implement the above on the current
kernel, there will be no way to tell when the job control stop ends,
so the current broken behavior seems to be justified.

> This is easy to implement (in strace at least).

Yeay!

> > * However, PTRACE_DETACH should maintain the integrity of group stop.
> > After a tracee is detached, it should be in a state which is
> > conformant to the current jctl state. If jctl stop is in effect,
> > the task should be put into TASK_STOPPED; otherwise, TASK_RUNNING.
>
> This means that without changes to gdb, this:
>
> # gdb -p pid_of_application_currently_in_jctl_stop
> (gdb) print getpid()
> (gdb) print show_me_your_internal_debug_dump()
> (gdb) continue
>
> will make application run, whereas this:
>
> # gdb -p pid_of_application_currently_in_jctl_stop
> (gdb) print getpid()
> (gdb) print show_me_your_internal_debug_dump()
> (gdb) quit
>
> will leave application stopped. This looks a bit inconsistent to me.
>
> Do you propose gdb to be chaged so that "continue" command
> issues PTRACE_CONT only if gdb knows that application is not
> in jctl stop?

gdb can do whatever it wants to do but I don't think the above needs
fixing. In the first case, the user is explicitly telling gdb to
continue the tracee, so it continues as it always has. In the latter
case, the debugging session is over. The tracee now should do
whatever it's supposed to do. I don't see any conflict there. In
fact, with the recent removal of the unditional extra
wake_up_process() from ptrace detach, you're already likely to see the
above behavior (it isn't deterministic tho).

Thanks.

--
tejun

2011-03-01 17:12:38

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Tue, Mar 01, 2011 at 06:09:53PM +0100, Tejun Heo wrote:
> > Do you propose gdb to be chaged so that "continue" command
> > issues PTRACE_CONT only if gdb knows that application is not
> > in jctl stop?
>
> gdb can do whatever it wants to do but I don't think the above needs
> fixing.

Oh, just in case, I'm not saying gdb shouldn't be improved. gdb can
definitely take advantage of better ptrace behaviors and do something
more useful. e.g. informing the user that the task is in group stop
and ask for confirmation for forced resume or something.

--
tejun

2011-03-01 17:22:58

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Tue, Mar 1, 2011 at 6:09 PM, Tejun Heo <[email protected]> wrote:
> Hello, Denys.
>
> On Tue, Mar 01, 2011 at 05:57:48PM +0100, Denys Vlasenko wrote:
>> > * jctl stop initiates when one of the stop signals is received and
>> > ? completes when all the member tasks participate in the group stop,
>> > ? where participation preciesly means that a member task stops in
>> > ? do_signal_stop(). ?Any member task can only participate once in any
>> > ? given group stop. ?ptrace does NOT make any difference in this
>> > ? regard.
>>
>> This proposal adds a new rule for handling of stop notification
>> by debuggers. Let's spell it out, because currently strace
>> doesn't act according to this new rule:
>>
>> "When waitpid indicates stop on a *stop* signal, then it may be either:
>> * a signal delivery (strace will inject this signal with PTRACE_SYSCALL(sig));
>> * or it may be a stop notification, in which case strace *must not*
>> ? try to inject this signal (this would be a bug, it'd make task running).
>> ? Instead, strace should just go back to waiting in waitpid().
>>
>> These two possibilities can be distinquished by querying
>> PTRACE_GETSIGINFO. On stop notifications, PTRACE_GETSIGINFO
>> errors out - stop notification is not a signal delivery
>> and therefore it has no siginfo."
>
> Hmmm... but the above also holds for the current kernel too. ?That
> hasn't really changed and the current broken behavior - unconditional
> PTRACE_SYSCALL(sig) - will behave the same regardless of the proposed
> changes.

There is a different proposal under which current strace behavior
would be a correct one.

I'm not saying that I like that other proposal more -
I am saying that your proposal needs to specify whether strace
needs to be changed, and how exactly, to correctly handle
SIGSTOPs under this proposal.


>> > * However, PTRACE_DETACH should maintain the integrity of group stop.
>> > ? After a tracee is detached, it should be in a state which is
>> > ? conformant to the current jctl state. ?If jctl stop is in effect,
>> > ? the task should be put into TASK_STOPPED; otherwise, TASK_RUNNING.
>>
>> This means that without changes to gdb, this:
>>
>> # gdb -p pid_of_application_currently_in_jctl_stop
>> (gdb) print getpid()
>> (gdb) print show_me_your_internal_debug_dump()
>> (gdb) continue
>>
>> will make application run, whereas this:
>>
>> # gdb -p pid_of_application_currently_in_jctl_stop
>> (gdb) print getpid()
>> (gdb) print show_me_your_internal_debug_dump()
>> (gdb) quit
>>
>> will leave application stopped. This looks a bit inconsistent to me.
>>
>> Do you propose gdb to be chaged so that "continue" command
>> issues PTRACE_CONT only if gdb knows that application is not
>> in jctl stop?
>
> gdb can do whatever it wants to do but I don't think the above needs
> fixing. ?In the first case, the user is explicitly telling gdb to
> continue the tracee, so it continues as it always has.

It does not look like that to me.

User attached to some process. User might be unaware that
the process is currently stopped (imagine a group of processes
which use SIGSTOP/SIGCONT in their normal interactions).

User peeked some state, and then wants to let process
continue whatever process was doing, but remain in the debugger.

What user did not know is that "whatever process was doing" =
"being stopped by SIGSTOP, waiting to be woken up".
Therefore, if "continue" makes process run, it does not
return process to whatever process was doing.

> In the latter
> case, the debugging session is over. ?The tracee now should do
> whatever it's supposed to do.

It should do that in both cases.

--
vda

2011-03-01 18:34:58

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Tue, Mar 01, 2011 at 06:21:49PM +0100, Denys Vlasenko wrote:
> I'm not saying that I like that other proposal more -
> I am saying that your proposal needs to specify whether strace
> needs to be changed, and how exactly, to correctly handle
> SIGSTOPs under this proposal.

Yeah, definitely.

> > gdb can do whatever it wants to do but I don't think the above needs
> > fixing. ?In the first case, the user is explicitly telling gdb to
> > continue the tracee, so it continues as it always has.
>
> It does not look like that to me.
>
> User attached to some process. User might be unaware that
> the process is currently stopped (imagine a group of processes
> which use SIGSTOP/SIGCONT in their normal interactions).
>
> User peeked some state, and then wants to let process
> continue whatever process was doing, but remain in the debugger.
>
> What user did not know is that "whatever process was doing" =
> "being stopped by SIGSTOP, waiting to be woken up".
> Therefore, if "continue" makes process run, it does not
> return process to whatever process was doing.
>
> > In the latter
> > case, the debugging session is over. ?The tracee now should do
> > whatever it's supposed to do.
>
> It should do that in both cases.

Maybe it should, maybe not, but that's mostly irrelevant because the
described behavior is the current behavior. There is no
continue-if-not-job-control-stopped operation and we shouldn't change
that beneath gdb because otherwise not only the behavior changes
unexpectedly but also the user doesn't have a way to resume the tracee
from within gdb. The user has to go to another terminal and send
SIGCONT explicitly. If gdb wants to improve the behavior, it sure can
implement proper job control behavior using the proposed changes.

Thanks.

--
tejun

2011-03-01 19:07:03

by Jan Kratochvil

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Tue, 01 Mar 2011 16:24:57 +0100, Tejun Heo wrote:
> P4. PTRACE_SEIZE
[...]
> Completion notification is delivered in the usual way via wait(2). If
> the task was in jctl stop, it would report the stop signal with the
> matching siginfo. If the task hits an existing ptrace trap condition,
> the matching SIGTRAP will be reported; otherwise, SIGTRAP will be
> reported with siginfo indicating PTRACE_SEIZE trap.

This is a change from the Roland's proposed PTRACE_ATTACH_NOSTOP.
Currently it is already a problem that apps did not / do not expect the first
waitpid after PTRACE_ATTACH may not be SIGSTOP. And in the racy case of
a pending signal during PTRACE_ATTACH which gets delivered by first waidpid
the tracer either crashes or loses the signal etc. (It was a problem for
older GDB, it is fixed in recent GDBs, there exist other ptrace using tools.)

And after all it gets complicated to call PTRACE_ATTACH and then postpone the
reception of SIGSTOP while other signals are being delivered first.


> WAY FORWARD (yeah, I'm feeling some marketing vibe)
> -----------
[...]
> What someone would want if one could start from the scratch is interesting
> but ultimately irrelevant.

The ugdb project provides unified debugging facility for both local and remote
debugging. With the ptrace extensions a network protocol server needs to be
developed/used anyway and it will be just a thin userland layer mapping
read/write calls to the ptrace calls on the server side in an ineffective way.

OTOH one can look at the similar case of TUX/khttpd (also) has never been
merged into the kernel.


Regards,
Jan

2011-03-01 22:14:36

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Tue, Mar 1, 2011 at 8:06 PM, Jan Kratochvil
<[email protected]> wrote:
> On Tue, 01 Mar 2011 16:24:57 +0100, Tejun Heo wrote:
>> P4. PTRACE_SEIZE
> [...]
>> Completion notification is delivered in the usual way via wait(2). ?If
>> the task was in jctl stop, it would report the stop signal with the
>> matching siginfo. ?If the task hits an existing ptrace trap condition,
>> the matching SIGTRAP will be reported; otherwise, SIGTRAP will be
>> reported with siginfo indicating PTRACE_SEIZE trap.
>
> This is a change from the Roland's proposed PTRACE_ATTACH_NOSTOP.

As I see it, Tejun's PTRACE_SEIZE looks almost identical to the sequence of
PTRACE_ATTACH_NOSTOP ("attach, but don't affect current state of the
tracee. In particular, if it runs normally, let it continue to run")
plus
PTRACE_INTERRUPT ("if tracee isn't stopped yet, stop it with magic SIGTRAP").

There may be reasons to have PTRACE_SEIZE operation split like that.
For one, this allows debugger to do PTRACE_CONT, and later issue
PTRACE_INTERRUPT to stop the tracee again. PTRACE_INTERRUPT stop may
be better for some scenarios where debugger wants to make the stop
invisible to the parent, or when debugger wants to stop just one
thread of the process.


> Currently it is already a problem that apps did not / do not expect the first
> waitpid after PTRACE_ATTACH may not be SIGSTOP.

That's exactly why we want to add a better alternative, which doesn't
insert that blasted SIGSTOP.

--
vda

2011-03-01 22:59:24

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Tue, Mar 1, 2011 at 4:24 PM, Tejun Heo <[email protected]> wrote:
> PROPOSAL
> --------
>
> P1. Always TASK_TRACED while ptraced
> P2. Fix notifications to the real parent
> P3. Keep ptrace resume separate from and beneath jctl stop
> P4. PTRACE_SEIZE
> P5. "^Z" and "fg" for tracees

Since this plan might get transformed into a file in Documentation/,
I would like to chime in a few less important things we might want
to clarify while we are at it.



P6. Make PTRACE_GETSIGINFO useful for determining the type of ptrace stop.

PTRACE_GETSIGINFO on real signal should return associated struct siginfo.

PTRACE_GETSIGINFO on job control stops and process exits (WEXITED /
WSIGNALED) should error out.

What does PTRACE_GETSIGINFO return on various ptrace-generated
SIGTRAPs and SIGSTOPs?
Error? If not, what value is returned in siginfo->si_code?

We have the following kinds of SIGTRAPs:

* real SIGTRAP signal sent by breakpoint instruction
* real SIGTRAP signal sent by other process
* entering syscall (after PTRACE_SYSCALL)
* exiting syscall (after PTRACE_SYSCALL)
* after single step (after PTRACE_SINGLESTEP)
* after execve (if PTRACE_O_TRACEEXEC opt is not on)
* after execve (if PTRACE_O_TRACEEXEC opt is on)
* in parent after fork (if PTRACE_O_TRACEFORK opt is on)
* in parent after vfork (if PTRACE_O_TRACEVFORK opt is on)
* in parent after clone (if PTRACE_O_TRACECLONE opt is on)
* in child after vfork (if PTRACE_O_TRACEVFORKDONE opt is on)
* before exit (if PTRACE_O_TRACEEXIT opt is on)

Currently strace has to keep precise track on the alternating sequence
of syscall enter/syscall exit stops. Which gets even trickier
with extra magic SIGTRAP thrown in by execve and such.
There were (and I suspect will be) hard to debug bugs when strace
was getting out-of sync and printing garbage.

Defining the PTRACE_GETSIGINFO's si_code so that each of these stops
can be easily distinguished would be useful. I propose using values
of SI_KERNEL + 1, SI_KERNEL + 2 etc, suitably #defined of course.


We also have magic SIGSTOPs (magic in a sense they aren't
real signals sent by other processes):
* at PTRACE_ATTACH
* in child (if PTRACE_O_TRACE[V]FORK or PTRACE_O_TRACECLONE opt is on)

For example, flagging PTRACE_ATTACH SIGSTOP so that it can be
uniquely identified would solve some problems gdb is having with it.

I propose to similarly do it by using unique si_code for each.

--
vda

2011-03-01 23:16:47

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Tue, Mar 1, 2011 at 4:24 PM, Tejun Heo <[email protected]> wrote:
> PROPOSAL
> --------
> ...
> P5. "^Z" and "fg" for tracees
>
> A ptracer, as it currently stands and proposed here, has full control
> over the execution state of its tracee. ?The tracer is notified
> whenever the tracee stops and can always resume its execution;
> however, there is one missing piece.
>
> As proposed, when a tracee enters jctl stop, it enters TASK_TRACED
> from which emission of SIGCONT can't resume the tracee. ?This makes it
> impossible for a tracer to become transparent with respect to jctl.
> For example, after strace(1) is attached to a task, the task can be
> ^Z'd but then can't be fg'd.
>
> One approach to this problem is somehow making it work implicitly from
> the kernel - as in putting the tracee into TASK_STOPPED or somehow
> handling TASK_TRACED for jctl stop differently; however, I think such
> approach is cumbersome in both concept and implementation. ?Instead of
> being able to say "while ptraced, a tracee's execution is fully under
> the control of its tracer", subtle and fragile exceptions need to be
> introduced.
>
> A better way to solve this is simply giving the tracer the capability
> to listen for the end of jctl stop. ?That way, the problem is solved
> in a manner which is consistent, may not be to everyone's liking but
> nonetheless consistent, with the rest of ptrace. ?Execution state of
> the tracee is always under the control of the tracer. ?The only thing
> which changes is that the tracer now can find out when jctl stop ends,
> which also could be an additional useful debugging feature.
>
> It would be most fitting to use wait(2) for delivery of this
> notification. ?WCONTINUED is the obvious candidate but I think it is
> better to use STOPPED notification because the task is not really
> resumed. ?Only its mode of stop changes. ?What state the tracee is in
> can be determined by retriving siginfo using PTRACE_GETSIGINFO.
>
> This also effectively makes the notification level-triggered instead
> of edge-triggered, which is a big plus. ?No matter which state the
> tracee is in, a jctl stopped notification is guaranteed to happen
> after the lastest event and the tracer can always find out the latest
> state with PTRACE_GETSIGINFO.
>
> Using stopped notification also makes the new addition harmless to the
> existing users. ?It's just another stopped notification. ?Both
> strace(1) and gdb(1) don't distinguish the signal delivery and jctl
> stop notifications and react the same way by resuming the tracee
> unconditionally. ?One more stopped notification on SIGCONT emission
> doesn't change much.

Let's spell this out in detail. Please correct me if
I misunderstood your proposal:

We have a stopped task under ptrace.
(More precisely: debugger got a WSTOPPED notification via waitpid.
Debugger decided to emulate the job control stop, therefore it
keeps tracee stopped, therefore it just waits on waitpid
without doing any PTRACE_CONTs).

Another task sends SIGCONT to the tracee.

Debugger gets waitpid notification of the
WSTOPPED, WSTOPSIG == SIGCONT form.

Debugger can check PTRACE_GETSIGINFO, which succeeds.
Debugger now knows it's a signal delivery notification.
(This step looks optional, since currently
WSTOPPED, WSTOPSIG == SIGCONT combination is only possible
on signal delivery, unlike, for example,
WSTOPPED, WSTOPSIG == SIGSTOP, which is ambiguous).

Debugger performs PTRACE_CONT(SIGCONT) - it injects the signal.
[Question: what if debugger doesn't? IOW: is it possible
for debugger to suppress SIGCONTs, or not?
IOW2: what should happen if debugger
(a) does not do any PTRACE_CONT at all? or
(b) does PTRACE_CONT(<other_sig>)? or
(c) does PTRACE_CONT(0)?
]

Debugger gets WCONTINUED waitpid notification.
[question: do we need this?]


--
vda

2011-03-01 23:51:46

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Tue, Mar 1, 2011 at 7:34 PM, Tejun Heo <[email protected]> wrote:
> On Tue, Mar 01, 2011 at 06:21:49PM +0100, Denys Vlasenko wrote:
>> > gdb can do whatever it wants to do but I don't think the above needs
>> > fixing. ?In the first case, the user is explicitly telling gdb to
>> > continue the tracee, so it continues as it always has.
>>
>> It does not look like that to me.
>>
>> User attached to some process. User might be unaware that
>> the process is currently stopped (imagine a group of processes
>> which use SIGSTOP/SIGCONT in their normal interactions).
>>
>> User peeked some state, and then wants to let process
>> continue whatever process was doing, but remain in the debugger.
>>
>> What user did not know is that "whatever process was doing" =
>> "being stopped by SIGSTOP, waiting to be woken up".
>> Therefore, if "continue" makes process run, it does not
>> return process to whatever process was doing.
>>
>> > In the latter
>> > case, the debugging session is over. ?The tracee now should do
>> > whatever it's supposed to do.
>>
>> It should do that in both cases.
>
> Maybe it should, maybe not, but that's mostly irrelevant because the
> described behavior is the current behavior.

This is not a good argument. We are in this discussion exactly because
there are cases of current behavior in strace and gdb which are clearly
wrong and which we want to change. Therefore, "it's the current
behavior" does not automatically mean it is desired behavior.

>?There is no
> continue-if-not-job-control-stopped operation and we shouldn't change
> that beneath gdb

I feel that "continue" is meant to be such operation. Currently
it is not merely because ptrace is buggy.

There is already continue-no-matter-what command in gdb:

(gdb) signal SIGCONT

> because otherwise not only the behavior changes
> unexpectedly

"strace no longer breaks ^Z" is also an unexpected
change in behavior. This doesn't mean we should avoid it,
right?

Bottom line is: I am not trying to shoot your proposal down.
It looks good to me.

I am only discussing it in more detail from the userspace API
POV and from "what changes will be needed in strace and gdb?"
and "what improvements in strace and gdb are becoming possible
with this proposal, and how exactly to implement them there?"
POVs.

--
vda

2011-03-02 05:07:46

by Indan Zupancic

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hello,

On Tue, March 1, 2011 19:34, Tejun Heo wrote:
> Maybe it should, maybe not, but that's mostly irrelevant because the
> described behavior is the current behavior. There is no
> continue-if-not-job-control-stopped operation and we shouldn't change
> that beneath gdb because otherwise not only the behavior changes
> unexpectedly but also the user doesn't have a way to resume the tracee
> from within gdb. The user has to go to another terminal and send
> SIGCONT explicitly. If gdb wants to improve the behavior, it sure can
> implement proper job control behavior using the proposed changes.

I'm not sure what Denys is talking about: Currently it's impossible to
pass along SIGSTOP to traced processes. Quoting the ptrace manpage:

PTRACE_CONT
Restarts the stopped child process. If data is nonzero and not
SIGSTOP, it is interpreted as a signal to be delivered to the
child; otherwise, no signal is delivered.

So you can pass along signals, but it's ignored if it happens to be
SIGSTOP. If you emulate SIGSTOP signals in the tracer by not calling
PTRACE_CONT, SIGCONT on the task won't work because the task can't
receive the signal until it's continued by the tracer, but the tracer
doesn't get a new notification until it resumes the task.

As for distinguishing STOP signals from stopped childs: Just don't set
the WUNTRACED flag in the tracer for waitpid.

If you want SIGSTOP to work on a traced task, then you have to add a
new PTRACE_SETOPTIONS option which enables the changed behaviour.
E.g. PTRACE_O_SIGSTOP which does pass all signals, even SIGSTOP for
PTRACE_CONT/SYSCALL/etc. Without something like this I don't see how
you can make job control for traced tasks work.

To me it seems clear that job ctl state should be managed independently
of ptrace stopped state. I'm not sure how that fits in with your
proposed changes, but my impression is that you make everything a lot
simpler by separating traced stopping/continuing from SIGSTOP/SIGCONT
job control. It's just not the same. A task stopped by a trace event
shouldn't generate a STOP signal for it's parent, only for real SIGSTOPS.

Greetings,

Indan

2011-03-02 07:10:59

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hello, Denys.

On Wed, Mar 02, 2011 at 12:51:24AM +0100, Denys Vlasenko wrote:
> > Maybe it should, maybe not, but that's mostly irrelevant because the
> > described behavior is the current behavior.
>
> This is not a good argument. We are in this discussion exactly because
> there are cases of current behavior in strace and gdb which are clearly
> wrong and which we want to change. Therefore, "it's the current
> behavior" does not automatically mean it is desired behavior.

That actually is the whole point of the proposal and the only viable
way to proceed. We shouldn't change the kernel so that gdb somehow
magically changes its behavior to fit someone's "should"s. We fix the
bugs and provide new features which future versions of strace and gdb
can take advantage of to implement better behavior.

So, _NO_, we don't want to make things work magically.

> >?There is no
> > continue-if-not-job-control-stopped operation and we shouldn't change
> > that beneath gdb
>
> I feel that "continue" is meant to be such operation. Currently
> it is not merely because ptrace is buggy.

Again, that's _YOUR_ feeling. I feel differently and what you or I
feel ultimately is irrelevant because that's a very user-visible
behavior. We CAN'T change that. What we can do is providing
mechanisms to userland so that they can implement what they think is
appropriate.

If gdb maintainers think cont should obey job control, fine, but
that's gdb's decision to make. e.g. gdb can be updated to inform the
user that the tracee entered job control stop instead and ask what the
user wants to do - wait, resume the tracee or send SIGCONT, but these
decisions don't belong in the kernel and we definitely can't and
shouldn't make that happen beneath gdb.

> There is already continue-no-matter-what command in gdb:
>
> (gdb) signal SIGCONT

Whatever, it doesn't matter. That's not the current mode of operation
people are used to. People expect the tracee to resume execution no
matter which condition it was in when they type cont and press enter.
That's it.

> > because otherwise not only the behavior changes
> > unexpectedly
>
> "strace no longer breaks ^Z" is also an unexpected
> change in behavior. This doesn't mean we should avoid it,
> right?

The proposal doen't change that either. It _PROVIDES_ ways to
implement better behavior. It doesn't magically change the existing
behaviors. That's the whole frigging point.

> Bottom line is: I am not trying to shoot your proposal down.
> It looks good to me.
>
> I am only discussing it in more detail from the userspace API
> POV and from "what changes will be needed in strace and gdb?"
> and "what improvements in strace and gdb are becoming possible
> with this proposal, and how exactly to implement them there?"
> POVs.

If you were arguing "what change will be needed in userland", we
wouldn't be arguing at all. You're arguing about changing existing
behavior beneath the current users. If you're trying to do the
former, I gotta pull a SJ here, you're doing it wrong.

Thanks.

--
tejun

2011-03-02 07:28:54

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hello,

On Tue, Mar 01, 2011 at 11:14:14PM +0100, Denys Vlasenko wrote:
> There may be reasons to have PTRACE_SEIZE operation split like that.
> For one, this allows debugger to do PTRACE_CONT, and later issue
> PTRACE_INTERRUPT to stop the tracee again. PTRACE_INTERRUPT stop may
> be better for some scenarios where debugger wants to make the stop
> invisible to the parent, or when debugger wants to stop just one
> thread of the process.

PTRACE_SEIZE can be used like PTRACE_INTERRUPT. It works whether the
tracee is attached or not.

Thanks.

--
tejun

2011-03-02 07:32:42

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hey,

On Tue, Mar 01, 2011 at 11:59:02PM +0100, Denys Vlasenko wrote:
> Currently strace has to keep precise track on the alternating sequence
> of syscall enter/syscall exit stops. Which gets even trickier
> with extra magic SIGTRAP thrown in by execve and such.

I see. Yeah, it would be a good idea to make sure each trap condition
can be uniquely identified by siginfo and this should definitely be
documented in the man page.

> There were (and I suspect will be) hard to debug bugs when strace
> was getting out-of sync and printing garbage.
>
> Defining the PTRACE_GETSIGINFO's si_code so that each of these stops
> can be easily distinguished would be useful. I propose using values
> of SI_KERNEL + 1, SI_KERNEL + 2 etc, suitably #defined of course.
>
>
> We also have magic SIGSTOPs (magic in a sense they aren't
> real signals sent by other processes):
> * at PTRACE_ATTACH
> * in child (if PTRACE_O_TRACE[V]FORK or PTRACE_O_TRACECLONE opt is on)
>
> For example, flagging PTRACE_ATTACH SIGSTOP so that it can be
> uniquely identified would solve some problems gdb is having with it.

This, I don't agree with. All we need is a better attach call without
the implied SIGSTOP, there's no reason to diddle with PTRACE_ATTACH
further.

Thanks.

--
tejun

2011-03-02 07:37:33

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, Mar 02, 2011 at 12:16:23AM +0100, Denys Vlasenko wrote:
> Let's spell this out in detail. Please correct me if
> I misunderstood your proposal:
>
> We have a stopped task under ptrace.
> (More precisely: debugger got a WSTOPPED notification via waitpid.
> Debugger decided to emulate the job control stop, therefore it
> keeps tracee stopped, therefore it just waits on waitpid
> without doing any PTRACE_CONTs).
>
> Another task sends SIGCONT to the tracee.
>
> Debugger gets waitpid notification of the
> WSTOPPED, WSTOPSIG == SIGCONT form.

I think WSTOPSIG should be SIGTRAP as the tracee left group stop and
entered ptrace trap.

> Debugger can check PTRACE_GETSIGINFO, which succeeds.
> Debugger now knows it's a signal delivery notification.

No, it's not a signal delivery notification. It's a ptrace trap
notification. SIGCONT may not be delivered to this task. Please
remember that it's the emission of SIGCONT which ends a group stop,
not delivery.

> (This step looks optional, since currently
> WSTOPPED, WSTOPSIG == SIGCONT combination is only possible
> on signal delivery, unlike, for example,
> WSTOPPED, WSTOPSIG == SIGSTOP, which is ambiguous).
>
> Debugger performs PTRACE_CONT(SIGCONT) - it injects the signal.
> [Question: what if debugger doesn't? IOW: is it possible
> for debugger to suppress SIGCONTs, or not?

SIGCONT shouldn't be used here and wouldn't make any difference.
We're not in signal delivery path.

> IOW2: what should happen if debugger
> (a) does not do any PTRACE_CONT at all? or

The tracee stays stopped.

> (b) does PTRACE_CONT(<other_sig>)? or
> (c) does PTRACE_CONT(0)?

See above.

> Debugger gets WCONTINUED waitpid notification.
> [question: do we need this?]

I don't think we need this. The tracer needs all the stopped
notifications but it doesn't need the continued notification because a
tracee is never continued without the tracer saying so.

Thanks.

--
tejun

2011-03-02 07:44:53

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, Mar 02, 2011 at 06:07:35AM +0100, Indan Zupancic wrote:
> I'm not sure what Denys is talking about: Currently it's impossible to
> pass along SIGSTOP to traced processes. Quoting the ptrace manpage:
>
> PTRACE_CONT
> Restarts the stopped child process. If data is nonzero and not
> SIGSTOP, it is interpreted as a signal to be delivered to the
> child; otherwise, no signal is delivered.

AFAICS, that's not true. SIGSTOP isn't treated differently from other
signals in the ptrace signal delivery path. Maybe it was true in the
past.

> As for distinguishing STOP signals from stopped childs: Just don't set
> the WUNTRACED flag in the tracer for waitpid.

I'm not following. Can you please elaborate?

> To me it seems clear that job ctl state should be managed independently
> of ptrace stopped state. I'm not sure how that fits in with your
> proposed changes, but my impression is that you make everything a lot
> simpler by separating traced stopping/continuing from SIGSTOP/SIGCONT
> job control. It's just not the same. A task stopped by a trace event
> shouldn't generate a STOP signal for it's parent, only for real SIGSTOPS.

Again, not following. In the proposal, job control and ptrace operate
independently, so on that we seem to agree, but I can't understand
where the STOP signal for the parent comes from? What are you
referring to?

Thanks.

--
tejun

2011-03-02 10:58:38

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, Mar 2, 2011 at 8:28 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Tue, Mar 01, 2011 at 11:14:14PM +0100, Denys Vlasenko wrote:
>> There may be reasons to have PTRACE_SEIZE operation split like that.
>> For one, this allows debugger to do PTRACE_CONT, and later issue
>> PTRACE_INTERRUPT to stop the tracee again. PTRACE_INTERRUPT stop may
>> be better for some scenarios where debugger wants to make the stop
>> invisible to the parent, or when debugger wants to stop just one
>> thread of the process.
>
> PTRACE_SEIZE can be used like PTRACE_INTERRUPT. ?It works whether the
> tracee is attached or not.

Makes sense.

--
vda

2011-03-02 11:07:32

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, Mar 2, 2011 at 8:32 AM, Tejun Heo <[email protected]> wrote:
> On Tue, Mar 01, 2011 at 11:59:02PM +0100, Denys Vlasenko wrote:
>> We also have magic SIGSTOPs (magic in a sense they aren't
>> real signals sent by other processes):
>> * at PTRACE_ATTACH
>> * in child (if PTRACE_O_TRACE[V]FORK or PTRACE_O_TRACECLONE opt is on)
>>
>> For example, flagging PTRACE_ATTACH SIGSTOP so that it can be
>> uniquely identified would solve some problems gdb is having with it.
>
> This, I don't agree with. ?All we need is a better attach call without
> the implied SIGSTOP, there's no reason to diddle with PTRACE_ATTACH
> further.

Sure.

What do you think about SIGSTOP generated in in children on auto-attach
via PTRACE_O_TRACE[V]FORK / PTRACE_O_TRACECLONE options?

IMHO, it would be good if we'd have a way to distinguish them from
real SIGSTOP signals.

--
vda

2011-03-02 11:21:50

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, Mar 2, 2011 at 8:37 AM, Tejun Heo <[email protected]> wrote:
> On Wed, Mar 02, 2011 at 12:16:23AM +0100, Denys Vlasenko wrote:
>> Let's spell this out in detail. Please correct me if
>> I misunderstood your proposal:
>>
>> We have a stopped task under ptrace.
>> (More precisely: debugger got a WSTOPPED notification via waitpid.
>> Debugger decided to emulate the job control stop, therefore it
>> keeps tracee stopped, therefore it just waits on waitpid
>> without doing any PTRACE_CONTs).
>>
>> Another task sends SIGCONT to the tracee.
>>
>> Debugger gets waitpid notification of the
>> WSTOPPED, WSTOPSIG == SIGCONT form.
>
> I think WSTOPSIG should be SIGTRAP as the tracee left group stop and
> entered ptrace trap.

This would be, by my count, 13th kind of SIGTRAP use by ptrace.
Which makes multi-level if's in debuggers even more complex
and more error-prone.

Why not SIGCONT? This event is, after all, caused by SIGCONT.
It would be so much nicer to be able to detect it with single if()
in the debugger...


>> Debugger can check PTRACE_GETSIGINFO, which succeeds.
>> Debugger now knows it's a signal delivery notification.
>
> No, it's not a signal delivery notification. ?It's a ptrace trap
> notification. ?SIGCONT may not be delivered to this task. ?Please
> remember that it's the emission of SIGCONT which ends a group stop,
> not delivery.

>From userspace POV it's really a kernel's implementation detail.


>> Debugger performs PTRACE_CONT(SIGCONT) - it injects the signal.
>> [Question: what if debugger doesn't? IOW: is it possible
>> for debugger to suppress SIGCONTs, or not?
>
> SIGCONT shouldn't be used here and wouldn't make any difference.
> We're not in signal delivery path.
>
>> IOW2: what should happen if debugger
>> (a) does not do any PTRACE_CONT at all? or
>
> The tracee stays stopped.
>
>> (b) does PTRACE_CONT(<other_sig>)? or
>> (c) does PTRACE_CONT(0)?
>
> See above.

This means that SIGCONT handler will be executed in the tracee
after debugger does PTRACE_CONT(<any_signo>) at this point.

Which makes SIGCONT special: debugger can suppress execution
of other signal handlers in tracee, but not SIGCONT handler.
Another special case. Can we avoid having it?


>> Debugger gets WCONTINUED waitpid notification.
>> [question: do we need this?]
>
> I don't think we need this. ?The tracer needs all the stopped
> notifications but it doesn't need the continued notification because a
> tracee is never continued without the tracer saying so.

Yes, I think it's ok.

--
vda

2011-03-02 11:23:13

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hey,

On Wed, Mar 02, 2011 at 12:02:37PM +0100, Denys Vlasenko wrote:
> What do you think about SIGSTOP generated in in children on auto-attach
> via PTRACE_O_TRACE[V]FORK / PTRACE_O_TRACECLONE options?

Ah, you're right. I actually haven't been thinking about them.

> IMHO, it would be good if we'd have a way to distinguish them from
> real SIGSTOP signals.

Yeah, probably.

Thanks.

--
tejun

2011-03-02 11:28:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, Mar 02, 2011 at 12:21:28PM +0100, Denys Vlasenko wrote:
> > I think WSTOPSIG should be SIGTRAP as the tracee left group stop and
> > entered ptrace trap.
>
> This would be, by my count, 13th kind of SIGTRAP use by ptrace.
> Which makes multi-level if's in debuggers even more complex
> and more error-prone.

Of course, all ptrace traps are SIGTRAPs.

> Why not SIGCONT? This event is, after all, caused by SIGCONT.
> It would be so much nicer to be able to detect it with single if()
> in the debugger...

I disagree. It's a ptrace trap. It should use SIGTRAP. We just need
well defined siginfo output to distinguish between them. It's not
like we can avoid siginfo anyway.

> > No, it's not a signal delivery notification. ?It's a ptrace trap
> > notification. ?SIGCONT may not be delivered to this task. ?Please
> > remember that it's the emission of SIGCONT which ends a group stop,
> > not delivery.
>
> From userspace POV it's really a kernel's implementation detail.

Not really. This is actually a visible difference. Roland wrote in
the previous discussion. One visible difference is that ptrace can
veto job control stop but it can't veto the end of job control. Job
control actions happen before SIGCONT hits the signal delivery path
which is visible through ptrace.

> >> (b) does PTRACE_CONT(<other_sig>)? or
> >> (c) does PTRACE_CONT(0)?
> >
> > See above.
>
> This means that SIGCONT handler will be executed in the tracee
> after debugger does PTRACE_CONT(<any_signo>) at this point.
>
> Which makes SIGCONT special: debugger can suppress execution
> of other signal handlers in tracee, but not SIGCONT handler.
> Another special case. Can we avoid having it?

Hmmm.... I think you're confused about how SIGCONT is handled or maybe
I am. Either way, please elaborate. I can't really follow.

Thanks.

--
tejun

2011-03-02 11:32:41

by Indan Zupancic

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, March 2, 2011 08:44, Tejun Heo wrote:
> On Wed, Mar 02, 2011 at 06:07:35AM +0100, Indan Zupancic wrote:
>> I'm not sure what Denys is talking about: Currently it's impossible to
>> pass along SIGSTOP to traced processes. Quoting the ptrace manpage:
>>
>> PTRACE_CONT
>> Restarts the stopped child process. If data is nonzero and not
>> SIGSTOP, it is interpreted as a signal to be delivered to the
>> child; otherwise, no signal is delivered.
>
> AFAICS, that's not true. SIGSTOP isn't treated differently from other
> signals in the ptrace signal delivery path. Maybe it was true in the
> past.

Well, I can't find it in the code either, but it's probably a side-effect
of how ptrace is currently implemented. Test program code below, see for
yourself. I hope it's a program bug, but perhaps it's a kernel bug, as I
seem to get two SIGSTOP events when I allow the SIGSTOP, but only one when
denying it. In both cases the traced process isn't actually stopped.

What might happen is that, because the current code handles the traced
task as stopped, the new SIGSTOP signal is first added and then cleared
when the task continues. This doesn't explain the double SIGSTOP
notifications, I'd expect it to either loop indefinitely or to not
notify the SIGSTOP twice.

>> As for distinguishing STOP signals from stopped childs: Just don't set
>> the WUNTRACED flag in the tracer for waitpid.
>
> I'm not following. Can you please elaborate?

Sorry, should have quoted Denys:

This proposal adds a new rule for handling of stop notification
by debuggers. Let's spell it out, because currently strace
doesn't act according to this new rule:


"When waitpid indicates stop on a *stop* signal, then it may be either:
* a signal delivery (strace will inject this signal with PTRACE_SYSCALL(sig));
* or it may be a stop notification, in which case strace *must not*
try to inject this signal (this would be a bug, it'd make task running).
Instead, strace should just go back to waiting in waitpid().

These two possibilities can be distinquished by querying
PTRACE_GETSIGINFO. On stop notifications, PTRACE_GETSIGINFO
errors out - stop notification is not a signal delivery
and therefore it has no siginfo."

End quote.

You don't get the second case when not setting the WUNTRACED flag.

>> To me it seems clear that job ctl state should be managed independently
>> of ptrace stopped state. I'm not sure how that fits in with your
>> proposed changes, but my impression is that you make everything a lot
>> simpler by separating traced stopping/continuing from SIGSTOP/SIGCONT
>> job control. It's just not the same. A task stopped by a trace event
>> shouldn't generate a STOP signal for it's parent, only for real SIGSTOPS.
>
> Again, not following. In the proposal, job control and ptrace operate
> independently, so on that we seem to agree, but I can't understand
> where the STOP signal for the parent comes from? What are you
> referring to?

What I mean is, if you have a parent P with a child C, and C is ptraced by T,
P shouldn't get SIGSTOP notifications when it waits for C with WUNTRACED set
and C is stopped because of a ptrace event.

Greetings,

Indan

---

Test program:

#include <errno.h>
#include <sched.h> /* clone */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/stat.h>

typedef enum event_type
{
EVENT_NONE,
EVENT_INTR,
EVENT_SIG,
EVENT_EXIT,
} event_t;

struct event
{
event_t type;
int pid;
int num;
};

int trace_fork(void);
event_t trace_event(struct event* e);
void event_allow(struct event* e);
void event_deny(struct event* e);
static void event_end(struct event* e, int deny);

int trace_fork(void)
{
int pid;
int ret;

fflush(NULL);
pid = fork();

/* child */
if (pid == 0){
ret = ptrace(PTRACE_TRACEME, 0, NULL, NULL);
if (ret)
perror("ptrace");
return 0;
}
/* parent process */
return pid;
}

static void event_end(struct event* e, int deny)
{
int ret;
event_t type = e->type;
long sig = 0;

switch (type) {
case EVENT_NONE:
case EVENT_INTR:
case EVENT_EXIT:
return;
case EVENT_SIG:
if (!deny)
sig = e->num;
break;
}
/* continue the stopped child, sending signal 'sig' */
ret = ptrace(PTRACE_SYSCALL, e->pid, NULL, (void*)sig);
if (ret)
perror("event_end");
}

void event_allow(struct event* e)
{
event_end(e, 0);
}

void event_deny(struct event* e)
{
event_end(e, 1);
}


event_t trace_event(struct event* e)
{
long pid;
int status, sig;
retry:
memset(e, 0, sizeof(*e));
// pid = waitpid(-1, &status, __WALL);
pid = wait(&status);
if (pid == -1){
if (errno == EINTR){
e->type = EVENT_INTR;
return EVENT_INTR;
}
/* We've no child processes left */
return EVENT_NONE;
}
e->pid = pid;
if (!WIFSTOPPED(status)){
e->type = EVENT_EXIT;
if (WIFEXITED(status)){
e->num = WEXITSTATUS(status);
} else if (WIFSIGNALED(status)){
e->num = WTERMSIG(status) + 128;
} else {
/* Don't send SIGCHLD signals up */
goto retry;
}
return EVENT_EXIT;
}
e->type = EVENT_SIG;
sig = WSTOPSIG(status);
switch (sig){
case SIGTRAP:
/* Don't kill the prisoner by sending it SIGTRAP */
event_deny(e);
/* Don't sent SIGTRAP events up */
goto retry;
default:
/* Signal */
e->num = sig;
}
return e->type;
}

int main(int argc, char* argv[])
{
struct event e;
int pid;

pid = trace_fork();
if (pid < 0){
perror("trace_fork");
return errno;
} else if (pid == 0){
/* Child */
while (1){
sleep(2);
printf("Still running!\n");
}
return 0;
}
/* Tracer */
printf("New child %d\n", pid);
while (1){
int event = trace_event(&e);

if (event == EVENT_NONE)
return 0;
if (event == EVENT_SIG)
printf("Got signal %d\n", e.num);
event_allow(&e);
}
return 0;
}

2011-03-02 11:50:11

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, Mar 2, 2011 at 12:27 PM, Tejun Heo <[email protected]> wrote:
> On Wed, Mar 02, 2011 at 12:21:28PM +0100, Denys Vlasenko wrote:
>> > I think WSTOPSIG should be SIGTRAP as the tracee left group stop and
>> > entered ptrace trap.
>>
>> This would be, by my count, 13th kind of SIGTRAP use by ptrace.
>> Which makes multi-level if's in debuggers even more complex
>> and more error-prone.
>
> Of course, all ptrace traps are SIGTRAPs.

Except for those SIGSTOPs in children on auto-attach
via PTRACE_O_TRACE[V]FORK / PTRACE_O_TRACECLONE options...

>> Why not SIGCONT? This event is, after all, caused by SIGCONT.
>> It would be so much nicer to be able to detect it with single if()
>> in the debugger...
>
> I disagree. ?It's a ptrace trap. ?It should use SIGTRAP. ?We just need
> well defined siginfo output to distinguish between them. ?It's not
> like we can avoid siginfo anyway.

Performance problem here. Strace is already suffering from being
rather slow, especially for multi-threaded processes.

So far strace was able to avoid querying siginfo on every stop.

In order to make job control stop work properly, it will now need
to query siginfo, but only if signo==SIGSTOP. SIGSTOPs don't
occur too often, definitely not twice per syscall as SIGTRAPs do,
so it's not a problem.

With your proposal to show resume-from-job-control-stop-via-SIGCONT
as SIGTRAP, *every* SIGTRAP stop needs to be followed
by PTRACE_GETSIGINFO.


>> > No, it's not a signal delivery notification. ?It's a ptrace trap
>> > notification. ?SIGCONT may not be delivered to this task. ?Please
>> > remember that it's the emission of SIGCONT which ends a group stop,
>> > not delivery.
>>
>> From userspace POV it's really a kernel's implementation detail.
>
> Not really. ?This is actually a visible difference. ?Roland wrote in
> the previous discussion. ?One visible difference is that ptrace can
> veto job control stop but it can't veto the end of job control. ?Job
> control actions happen before SIGCONT hits the signal delivery path
> which is visible through ptrace.
>
>> >> (b) does PTRACE_CONT(<other_sig>)? or
>> >> (c) does PTRACE_CONT(0)?
>> >
>> > See above.
>>
>> This means that SIGCONT handler will be executed in the tracee
>> after debugger does PTRACE_CONT(<any_signo>) at this point.
>>
>> Which makes SIGCONT special: debugger can suppress execution
>> of other signal handlers in tracee, but not SIGCONT handler.
>> Another special case. Can we avoid having it?
>
> Hmmm.... I think you're confused about how SIGCONT is handled or maybe
> I am. ?Either way, please elaborate. ?I can't really follow.

signal(SIGCONT, my_handler) does install a handler for SIGCONT
in userspace, and this handler does run when SIGCONT is delivered:

#include <errno.h>
#include <string.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
static void sig(int n)
{
char buf[128];
int e = errno;
sprintf(buf, "sig: %d %s\n", n, strsignal(n));
write(1, buf, strlen(buf));
errno = e;
}
int main()
{
signal(SIGSTOP, sig);
signal(SIGCONT, sig);
signal(SIGWINCH, sig);
signal(SIGABRT, sig);
again:
printf("PID: %d\n", getpid());
fflush(NULL);
errno = 0;
sleep(30);
int e = errno;
printf("after sleep: errno=%d %s\n", e, strerror(e));
if (e) goto again;
return 0;
}

# ./a.out
PID: 16382
<------ kill -STOP 16382
<------ kill -ABRT 16382
<------ kill -WINCH 16382
<------ kill -CONT 16382
sig: 28 Window changed
sig: 18 Continued
sig: 6 Aborted
after sleep: errno=4 Interrupted system call
PID: 16382


Therefore we also need to think about this aspect of SIGCONT behavior
under debuggers.

Do we provide for the mechanism for debuggers to
prevent execution of *SIGCONT userspace handler*?

And, looking at the example above, I see that on resume from stop,
*SIGCONT userspace handler* actually doesn't run as *the first handler*
after SIGCONT. Other pending signal's handlers may be executed before it.

How would the above example look under ptraced process? Particularly,
this sequence:
<------ kill -STOP 16382
<------ kill -ABRT 16382
<------ kill -WINCH 16382
<------ kill -CONT 16382
sig: 28 Window changed
sig: 18 Continued
sig: 6 Aborted


--
vda

2011-03-02 11:52:58

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, Mar 2, 2011 at 12:32 PM, Indan Zupancic <[email protected]> wrote:
> On Wed, March 2, 2011 08:44, Tejun Heo wrote:
>> On Wed, Mar 02, 2011 at 06:07:35AM +0100, Indan Zupancic wrote:
>>> I'm not sure what Denys is talking about: Currently it's impossible to
>>> pass along SIGSTOP to traced processes. Quoting the ptrace manpage:
>>>
>>> ? ?PTRACE_CONT
>>> ? ? ? ? ? Restarts ?the stopped child process. ?If data is nonzero and not
>>> ? ? ? ? ? SIGSTOP, it is interpreted as a signal to be ?delivered ?to ?the
>>> ? ? ? ? ? child; ?otherwise, ?no ?signal is delivered.
>>
>> AFAICS, that's not true. ?SIGSTOP isn't treated differently from other
>> signals in the ptrace signal delivery path. ?Maybe it was true in the
>> past.
>
> Well, I can't find it in the code either, but it's probably a side-effect
> of how ptrace is currently implemented. Test program code below, see for
> yourself. I hope it's a program bug, but perhaps it's a kernel bug, as I
> seem to get two SIGSTOP events when I allow the SIGSTOP, but only one when
> denying it.

This was discussed recently (again). Second SIGSTOP you see
is a job control stop notification (as opposed to signal delivery notification).
It's not a bug.

--
vda

2011-03-02 13:40:42

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On 03/02, Tejun Heo wrote:
>
> On Wed, Mar 02, 2011 at 06:07:35AM +0100, Indan Zupancic wrote:
> > I'm not sure what Denys is talking about: Currently it's impossible to
> > pass along SIGSTOP to traced processes. Quoting the ptrace manpage:
> >
> > PTRACE_CONT
> > Restarts the stopped child process. If data is nonzero and not
> > SIGSTOP, it is interpreted as a signal to be delivered to the
> > child; otherwise, no signal is delivered.
>
> AFAICS, that's not true. SIGSTOP isn't treated differently from other
> signals in the ptrace signal delivery path. Maybe it was true in the
> past.

Yes, this is not true. And it seems this was never true.

This is the second time this manpage confuses people in this discussion,
probably it should be fixed...

Add Michael.

Oleg.

2011-03-02 14:43:28

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hi,

On Wed, Mar 02, 2011 at 12:48:56PM +0100, Denys Vlasenko wrote:
> > Of course, all ptrace traps are SIGTRAPs.
>
> Except for those SIGSTOPs in children on auto-attach
> via PTRACE_O_TRACE[V]FORK / PTRACE_O_TRACECLONE options...

Aren't they just using SIGSTOP's to trigger signal delivery path? The
signal can be delivered, right? Checking the source code.... Yeah, it
genuinely generates SIGSTOP.

> >> Why not SIGCONT? This event is, after all, caused by SIGCONT.
> >> It would be so much nicer to be able to detect it with single if()
> >> in the debugger...
> >
> > I disagree. ?It's a ptrace trap. ?It should use SIGTRAP. ?We just need
> > well defined siginfo output to distinguish between them. ?It's not
> > like we can avoid siginfo anyway.
>
> Performance problem here. Strace is already suffering from being
> rather slow, especially for multi-threaded processes.
>
> So far strace was able to avoid querying siginfo on every stop.
>
> In order to make job control stop work properly, it will now need
> to query siginfo, but only if signo==SIGSTOP. SIGSTOPs don't
> occur too often, definitely not twice per syscall as SIGTRAPs do,
> so it's not a problem.
>
> With your proposal to show resume-from-job-control-stop-via-SIGCONT
> as SIGTRAP, *every* SIGTRAP stop needs to be followed
> by PTRACE_GETSIGINFO.

I don't think it's a good idea to make these basic design choices
based on optimization concerns like the above. We're not talking
about read/write(2) here. If PTRACE_GETSIGINFO really becomes that
much of a performance bottleneck, we can put it in vdso, but I really
doubt it would come to that.

> int main()
> {
> signal(SIGSTOP, sig);
> signal(SIGCONT, sig);
> signal(SIGWINCH, sig);
> signal(SIGABRT, sig);
> again:
> printf("PID: %d\n", getpid());
> fflush(NULL);
> errno = 0;
> sleep(30);
> int e = errno;
> printf("after sleep: errno=%d %s\n", e, strerror(e));
> if (e) goto again;
> return 0;
> }
>
> # ./a.out
> PID: 16382
> <------ kill -STOP 16382
> <------ kill -ABRT 16382
> <------ kill -WINCH 16382
> <------ kill -CONT 16382
> sig: 28 Window changed
> sig: 18 Continued
> sig: 6 Aborted
> after sleep: errno=4 Interrupted system call
> PID: 16382
>
>
> Therefore we also need to think about this aspect of SIGCONT behavior
> under debuggers.
>
> Do we provide for the mechanism for debuggers to
> prevent execution of *SIGCONT userspace handler*?

Yeah, it's not different from any other signal. Just squash the
signal when ptrace signal delivery trap is taken, which is completely
separate from termination of job control stop triggered by _emission_
of SIGCONT. The two are separate. The proposed changes don't affect
the delivery path at all. I really can't understand what your point
is.

> And, looking at the example above, I see that on resume from stop,
> *SIGCONT userspace handler* actually doesn't run as *the first handler*
> after SIGCONT. Other pending signal's handlers may be executed before it.

Signal delivery is not FIFO. There are some rules that the code
describes. If you're interested, take a look at the code but in
general it would be better to avoid assuming fixed order between
signal generations and deliveries.

> How would the above example look under ptraced process? Particularly,
> this sequence:
> <------ kill -STOP 16382
> <------ kill -ABRT 16382
> <------ kill -WINCH 16382
> <------ kill -CONT 16382
> sig: 28 Window changed
> sig: 18 Continued
> sig: 6 Aborted

There's NO difference regarding signal delivery. It stays the SAME.

Thanks.

--
tejun

2011-03-02 14:50:59

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hello,

On Wed, Mar 02, 2011 at 12:32:36PM +0100, Indan Zupancic wrote:
> What might happen is that, because the current code handles the traced
> task as stopped, the new SIGSTOP signal is first added and then cleared
> when the task continues. This doesn't explain the double SIGSTOP
> notifications, I'd expect it to either loop indefinitely or to not
> notify the SIGSTOP twice.

That happens with any stopping signals. They're two different
notifications for two different events. Please read the original
thread referenced in the RFC for details.

> "When waitpid indicates stop on a *stop* signal, then it may be either:
> * a signal delivery (strace will inject this signal with PTRACE_SYSCALL(sig));
> * or it may be a stop notification, in which case strace *must not*
> try to inject this signal (this would be a bug, it'd make task running).
> Instead, strace should just go back to waiting in waitpid().
>
> These two possibilities can be distinquished by querying
> PTRACE_GETSIGINFO. On stop notifications, PTRACE_GETSIGINFO
> errors out - stop notification is not a signal delivery
> and therefore it has no siginfo."
>
> End quote.
>
> You don't get the second case when not setting the WUNTRACED flag.

WUNTRACED is ignored while ptracing.

> > Again, not following. In the proposal, job control and ptrace operate
> > independently, so on that we seem to agree, but I can't understand
> > where the STOP signal for the parent comes from? What are you
> > referring to?
>
> What I mean is, if you have a parent P with a child C, and C is ptraced by T,
> P shouldn't get SIGSTOP notifications when it waits for C with WUNTRACED set
> and C is stopped because of a ptrace event.

Yeah, sure, what I'm confused about is why you're bringing that up.
Nothing changes anything related to that. There's no reason to bring
it up. Am I missing something?

Thanks.

--
tejun

2011-03-02 15:17:34

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, Mar 2, 2011 at 3:43 PM, Tejun Heo <[email protected]> wrote:
>> # ./a.out
>> PID: 16382
>> ?<------ kill -STOP 16382
>> ?<------ kill -ABRT 16382
>> ?<------ kill -WINCH 16382
>> ?<------ kill -CONT 16382
>> sig: 28 Window changed
>> sig: 18 Continued
>> sig: 6 Aborted
>> after sleep: errno=4 Interrupted system call
>> PID: 16382
>>
>>
>> Therefore we also need to think about this aspect of SIGCONT behavior
>> under debuggers.
>>
>> Do we provide for the mechanism for debuggers to
>> prevent execution of *SIGCONT userspace handler*?
>
> Yeah, it's not different from any other signal. ?Just squash the
> signal when ptrace signal delivery trap is taken, which is completely
> separate from termination of job control stop triggered by _emission_
> of SIGCONT. ?The two are separate. ?The proposed changes don't affect
> the delivery path at all. ?I really can't understand what your point
> is.
>
>> And, looking at the example above, I see that on resume from stop,
>> *SIGCONT userspace handler* actually doesn't run as *the first handler*
>> after SIGCONT. Other pending signal's handlers may be executed before it.
>
> Signal delivery is not FIFO. ?There are some rules that the code
> describes. ?If you're interested, take a look at the code but in
> general it would be better to avoid assuming fixed order between
> signal generations and deliveries.

The above example does not show any FIFO-like behavior.

What it does show is that signals queued during stop take effect
immediately after job control stop is terminated.

>> How would the above example look under ptraced process? Particularly,
>> this sequence:
>> ?<------ kill -STOP 16382
>> ?<------ kill -ABRT 16382
>> ?<------ kill -WINCH 16382
>> ?<------ kill -CONT 16382
>> sig: 28 Window changed
>> sig: 18 Continued
>> sig: 6 Aborted
>
> There's NO difference regarding signal delivery. ?It stays the SAME.

Ok, let's see whether I understand you.

Assuming the program is run under simple debugger which
resumes execution using PTRACE_CONT(sig) on signal delivery stops,
with PTRACE_CONT(0) on ptrace stops,
and doesn't do any PTRACE_CONT on job control stops,
with your proposal the debugger will see and perform
the following actions:

waitpid...
<------ kill -STOP 16382
waitpid returns WSTOPPED, WSTOPSIG = SIGSTOP
ptrace(PTRACE_GETSIGINFO) doesn't fail (=> it's signal delivery)
ptrace(PTRACE_CONT, SIGSTOP)
waitpid returns WSTOPPED, WSTOPSIG = SIGSTOP
ptrace(PTRACE_GETSIGINFO) fails (=> it's job control stop)
waitpid...
<------ kill -ABRT 16382
...debugger doesn't wake up...
<------ kill -WINCH 16382
...debugger doesn't wake up...
<------ kill -CONT 16382
waitpid returns WSTOPPED, WSTOPSIG = SIGTRAP (it's a ptrace-stop)
ptrace(PTRACE_CONT, 0)
waitpid returns WSTOPPED, WSTOPSIG = SIGWINCH
ptrace(PTRACE_CONT, SIGWINCH)
waitpid returns WSTOPPED, WSTOPSIG = SIGCONT
ptrace(PTRACE_CONT, SIGCONT)
waitpid returns WSTOPPED, WSTOPSIG = SIGABRT
ptrace(PTRACE_CONT, SIGABRT)

Correct?

--
vda

2011-03-02 15:31:06

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hey,

On Wed, Mar 02, 2011 at 04:16:50PM +0100, Denys Vlasenko wrote:
> Assuming the program is run under simple debugger which
> resumes execution using PTRACE_CONT(sig) on signal delivery stops,
> with PTRACE_CONT(0) on ptrace stops,
> and doesn't do any PTRACE_CONT on job control stops,
> with your proposal the debugger will see and perform
> the following actions:
>
> waitpid...
> <------ kill -STOP 16382
> waitpid returns WSTOPPED, WSTOPSIG = SIGSTOP
> ptrace(PTRACE_GETSIGINFO) doesn't fail (=> it's signal delivery)
> ptrace(PTRACE_CONT, SIGSTOP)
> waitpid returns WSTOPPED, WSTOPSIG = SIGSTOP
> ptrace(PTRACE_GETSIGINFO) fails (=> it's job control stop)
> waitpid...
> <------ kill -ABRT 16382
> ...debugger doesn't wake up...
> <------ kill -WINCH 16382
> ...debugger doesn't wake up...
> <------ kill -CONT 16382
> waitpid returns WSTOPPED, WSTOPSIG = SIGTRAP (it's a ptrace-stop)
> ptrace(PTRACE_CONT, 0)
> waitpid returns WSTOPPED, WSTOPSIG = SIGWINCH
> ptrace(PTRACE_CONT, SIGWINCH)
> waitpid returns WSTOPPED, WSTOPSIG = SIGCONT
> ptrace(PTRACE_CONT, SIGCONT)
> waitpid returns WSTOPPED, WSTOPSIG = SIGABRT
> ptrace(PTRACE_CONT, SIGABRT)
>
> Correct?

Yeah, seems correct to me.

Thanks.

--
tejun

2011-03-03 00:47:55

by Indan Zupancic

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, March 2, 2011 14:32, Oleg Nesterov wrote:
> On 03/02, Tejun Heo wrote:
>> On Wed, Mar 02, 2011 at 06:07:35AM +0100, Indan Zupancic wrote:
>> > I'm not sure what Denys is talking about: Currently it's impossible to
>> > pass along SIGSTOP to traced processes. Quoting the ptrace manpage:
>> >
>> > PTRACE_CONT
>> > Restarts the stopped child process. If data is nonzero and not
>> > SIGSTOP, it is interpreted as a signal to be delivered to the
>> > child; otherwise, no signal is delivered.
>>
>> AFAICS, that's not true. SIGSTOP isn't treated differently from other
>> signals in the ptrace signal delivery path. Maybe it was true in the
>> past.
>
> Yes, this is not true. And it seems this was never true.
>
> This is the second time this manpage confuses people in this discussion,
> probably it should be fixed...

Passing SIGSTOP does not actually stop the traced task, which is in line
with what the manpage says. All it does is generating that second SIGSTOP
notification, but when the task is continued it's running, not stopped.
So ptraced tasks can't be stopped with SIGSTOP and continued with SIGCONT.

What the manpage says might not be "true", but it is the behaviour
actually seen.

On Wed, March 2, 2011 15:50, Tejun Heo wrote:
> That happens with any stopping signals. They're two different
> notifications for two different events. Please read the original
> thread referenced in the RFC for details.

I found subthreads:
http://article.gmane.org/gmane.linux.kernel/1099908
http://thread.gmane.org/gmane.linux.kernel/1093410

So you guys seem to know that SIGSTOP/SIGCONT doesn't work with
current ptrace, why say the manpage is wrong then? I'm confused.

You can fix ptrace to behave sensibly and then update the manpage,
but don't blame the manpage for describing the current behaviour.

>
> WUNTRACED is ignored while ptracing.

Do you mean that waitid always behaves as if WUNTRACED is set? That
is, it reports process stop events even when not asked to? That is not
documented anywhere, neither in the ptrace manpage, nor the waitpid one!

I consider that a bug, it would be a lot more useful if the default was
to not notify stopped events except when asked to. Anyway, for my program
it doesn't matter, as long as blindly passing on the second SIGSTOP is
harmless.

>
>> > Again, not following. In the proposal, job control and ptrace operate
>> > independently, so on that we seem to agree, but I can't understand
>> > where the STOP signal for the parent comes from? What are you
>> > referring to?
>>
>> What I mean is, if you have a parent P with a child C, and C is ptraced by T,
>> P shouldn't get SIGSTOP notifications when it waits for C with WUNTRACED set
>> and C is stopped because of a ptrace event.
>
> Yeah, sure, what I'm confused about is why you're bringing that up.
> Nothing changes anything related to that. There's no reason to bring
> it up. Am I missing something?

Ignore it, it's not important. I think I got confused about what you
were talking about, because of the entanglement between tracing and
task stopped state.

Guys, any change to ptrace to make it behave more sensible is welcome,
it's fine to change the behaviour if it improves it, as long as it's
backward compatible in the sense that it doesn't break old ptrace users.
Especially SIGSTOP related stuff is more or less transparent for tracers.
Currently it doesn't work and no matter what the tracer does, it can't
work.

I'd argue that not notifying stopped tasks events by default when WUNTRACED
isn't set is fine too, because doing that is just weird and unexpected by
pretty much all current ptrace users.

Greetings,

Indan

2011-03-03 01:30:35

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Thursday 03 March 2011 01:47, Indan Zupancic wrote:
> On Wed, March 2, 2011 14:32, Oleg Nesterov wrote:
> > On 03/02, Tejun Heo wrote:
> >> On Wed, Mar 02, 2011 at 06:07:35AM +0100, Indan Zupancic wrote:
> >> > I'm not sure what Denys is talking about: Currently it's impossible to
> >> > pass along SIGSTOP to traced processes. Quoting the ptrace manpage:
> >> >
> >> > PTRACE_CONT
> >> > Restarts the stopped child process. If data is nonzero and not
> >> > SIGSTOP, it is interpreted as a signal to be delivered to the
> >> > child; otherwise, no signal is delivered.
> >>
> >> AFAICS, that's not true. SIGSTOP isn't treated differently from other
> >> signals in the ptrace signal delivery path. Maybe it was true in the
> >> past.
> >
> > Yes, this is not true. And it seems this was never true.
> >
> > This is the second time this manpage confuses people in this discussion,
> > probably it should be fixed...
>
> Passing SIGSTOP does not actually stop the traced task, which is in line
> with what the manpage says. All it does is generating that second SIGSTOP
> notification, but when the task is continued it's running, not stopped.

It can be argued that after this the task is running _precisely_
because it was continued by the debugger.

> So ptraced tasks can't be stopped with SIGSTOP and continued with SIGCONT.

It can be stopped - just do not PTRACE_CONT it after second SIGSTOP
notification.

The bug is that it can't be continued with SIGCONT after that.

That's the gist of Tejun Heo's proposal.

Oleg's proposal is a bit different. It proposes that we do need
to do PTRACE_CONT after second SIGSTOP notification too,
but task will be indeed stopped after this, and resumed
when SIGCONT arrives.

--
vda

2011-03-03 01:55:37

by Indan Zupancic

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Thu, March 3, 2011 02:30, Denys Vlasenko wrote:
> On Thursday 03 March 2011 01:47, Indan Zupancic wrote:
>> On Wed, March 2, 2011 14:32, Oleg Nesterov wrote:
>> > On 03/02, Tejun Heo wrote:
>> >> On Wed, Mar 02, 2011 at 06:07:35AM +0100, Indan Zupancic wrote:
>> >> > I'm not sure what Denys is talking about: Currently it's impossible to
>> >> > pass along SIGSTOP to traced processes. Quoting the ptrace manpage:
>> >> >
>> >> > PTRACE_CONT
>> >> > Restarts the stopped child process. If data is nonzero and not
>> >> > SIGSTOP, it is interpreted as a signal to be delivered to the
>> >> > child; otherwise, no signal is delivered.
>> >>
>> >> AFAICS, that's not true. SIGSTOP isn't treated differently from other
>> >> signals in the ptrace signal delivery path. Maybe it was true in the
>> >> past.
>> >
>> > Yes, this is not true. And it seems this was never true.
>> >
>> > This is the second time this manpage confuses people in this discussion,
>> > probably it should be fixed...
>>
>> Passing SIGSTOP does not actually stop the traced task, which is in line
>> with what the manpage says. All it does is generating that second SIGSTOP
>> notification, but when the task is continued it's running, not stopped.
>
> It can be argued that after this the task is running _precisely_
> because it was continued by the debugger.

That would be an incorrect argument. PTRACE_CONT is supposed to let the
task continue doing whatever it was doing, and in the case of receiving
SIGSTOP that was going into a stopped state.

>> So ptraced tasks can't be stopped with SIGSTOP and continued with SIGCONT.
>
> It can be stopped - just do not PTRACE_CONT it after second SIGSTOP
> notification.

Stopped for tracing is not the same as stopped by SIGSTOP! So saying
to just hang in the traced state is not a good solution. This would
mean that there are special rules for SIGSTOP handling compared to
other signals, that's stupid.

- The tracer shouldn't get that second notification at all if it didn't
ask for it.

- It's hard to distinguish a SIGSTOP from an undocumented "task stopping"
notification which is lying because the task didn't stop after that
event was handled.

- There is no need or use for this complexity, just pass along SIGSTOP!

>
> The bug is that it can't be continued with SIGCONT after that.

No, that is a side effect of your buggy work around which proves that
SIGSTOP doesn't currently work.

For a debugger it doesn't matter much, but for anything that tries to be
unobtrusive like strace this is just madness.

> That's the gist of Tejun Heo's proposal.
>
> Oleg's proposal is a bit different. It proposes that we do need
> to do PTRACE_CONT after second SIGSTOP notification too,
> but task will be indeed stopped after this, and resumed
> when SIGCONT arrives.

Please implement Oleg's proposal, anything else is madness.

Ptrace and task stopping because of SIGSTOP are two independent things
and should be treated like that, if you want to improve ptrace.

Greetings,

Indan

2011-03-03 07:03:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hey, Indan.

On Thu, Mar 03, 2011 at 02:55:32AM +0100, Indan Zupancic wrote:
> > It can be argued that after this the task is running _precisely_
> > because it was continued by the debugger.
>
> That would be an incorrect argument. PTRACE_CONT is supposed to let the
> task continue doing whatever it was doing, and in the case of receiving
> SIGSTOP that was going into a stopped state.

Again, that's your "supposed" which apparently is widely different
depending on who you ask and irrelevant because the currently
implemented visible behavior is completely different.

> Stopped for tracing is not the same as stopped by SIGSTOP! So saying
> to just hang in the traced state is not a good solution. This would
> mean that there are special rules for SIGSTOP handling compared to
> other signals, that's stupid.
>
> - The tracer shouldn't get that second notification at all if it didn't
> ask for it.

You're confused about the two notifications. The first one is signal
delivery notification. The second one is job control stop
notification. They are completely separate.

> - It's hard to distinguish a SIGSTOP from an undocumented "task stopping"
> notification which is lying because the task didn't stop after that
> event was handled.

It is not difficult to distinguish. Use GET_SIGINFO. It has been
like that from the beginning.

> - There is no need or use for this complexity, just pass along SIGSTOP!

No, SIGSTOP delivery and job control stop are two separate steps.

> > The bug is that it can't be continued with SIGCONT after that.
>
> No, that is a side effect of your buggy work around which proves that
> SIGSTOP doesn't currently work.

You're basically trying to restart the previous discussion from the
beginning. Go back and read the thread and read the RFC again.

> For a debugger it doesn't matter much, but for anything that tries to be
> unobtrusive like strace this is just madness.

It actually seems pretty sane to me, but if you think it's a madness,
think of it as wiring in front of sensors. It will help you to accept
it. Why do you think I spent time writing the RFC?

> > Oleg's proposal is a bit different. It proposes that we do need
> > to do PTRACE_CONT after second SIGSTOP notification too,
> > but task will be indeed stopped after this, and resumed
> > when SIGCONT arrives.
>
> Please implement Oleg's proposal, anything else is madness.
>
> Ptrace and task stopping because of SIGSTOP are two independent things
> and should be treated like that, if you want to improve ptrace.

Again, go back, read the thread. Understand it. Read the RFC.
Understand it and come back if you still have something to contribute.
You're pretty short on details but long on claims.

Thanks.

--
tejun

2011-03-03 17:42:57

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hi Tejun,

I didn't read the whole thread yet... perhaps this was already
discussed in more details. IOW, please ignore "I don't understand"
parts, I'll ask the questions later.

On 03/01, Tejun Heo wrote:
>
> I3. Not well-defined job control behaviors while traced
>
> In general, jctl behaviors while ptraced aren't well defined.

I'd say it is not defined at all ;) And to me this is the root of
all problems.

So, many thanks for this RFC. This is the first time someone tries
to define the rules.

> * PTRACE_CONT and other requests which resume the tracee overrides, or
> rather works below, jctl stop. If jctl stop takes place on the task
> group a tracee belongs to, the tracee will eventually participate in
> the group stop and its tracer will be notified; however, when
> PTRACE_CONT or other resuming request is made, the tracee will
> resume execution regardless of and without affecting the jctl stop.
>
> I don't know whether these are by design or just happened as
> by-products of the evolution of task group implementation in the
> kernel, but regardless, in my opinion, both rules are sound and
> useful. They might not be immediately intuitive and the resulting
> behavior might seem quirky but to me it seems to be one of those
> things which looks awkward at first but is ultimately right in its
> usefulness and relative simplicity.
>
> More importantly, it doesn't matter what I or, for that matter, anyone
> else thinks about them. They're tightly ingrained into the
> userland-visible behavior and actively exploited by the current users
> - for example, dynamic evalution in tracee context in gdb(1).
> Changing behaviors as fundamental as these would impact the current
> applications and debugging behaviors expected by (human) users.

OK. I have to agree. Lets forget the PTRACE_CONT-needs-SIGCONT idea.
Nobody like it, including Jan (even if he didn't nack it explicitly).
Forget.

> PROPOSAL
> --------
>
> P1. Always TASK_TRACED while ptraced

OK.

> P2. Fix notifications to the real parent
>
> This pleasantly proved to be the least contentious change to make.
> The usual group stop / continued notifications should be propagated to
> the real parent whether the children are ptraced or not.

Agreed.

> P3. Keep ptrace resume separate from and beneath jctl stop

This is not exactly clear to me... Probably I understand what
you mean, but I am not sure about details.

> P4. PTRACE_SEIZE

This is the new request. You know, I'd like to discuss the details
later and separately. Actually, I think the user-space developers
should participate and tell what they need. As for me, I certainly
agree that SIGSTOP from PTRACE_ATTACH is very wrong, and it is very
bad that gdb has to send SIGSTOP if it wants to stop the tracee.
IOW, I agree that something like this is needed and useful. In
particular,

> In both cases, jctl state is unaffected.

Agreed.

> P5. "^Z" and "fg" for tracees
>
> As proposed, when a tracee enters jctl stop, it enters TASK_TRACED
> from which emission of SIGCONT can't resume the tracee. This makes it
> impossible for a tracer to become transparent with respect to jctl.
> For example, after strace(1) is attached to a task, the task can be
> ^Z'd but then can't be fg'd.
>
> A better way to solve this is simply giving the tracer the capability
> to listen for the end of jctl stop.

Hmm. I don't understand what "the end of jctl stop" actually means.

Since you are talking about WCONTINUED below, I guess it means that
the process is not group-stopped any longer, say, SIGCONT comes.
OK. If the tracer is notified about the end of jctl stop, it can
resume the tracee if it wants. end-of-jctl-stop is per-process, but
I guess the debugger will be notified per-thread. Looks reasonable
to me.

> WCONTINUED is the obvious candidate but I think it is
> better to use STOPPED notification because the task is not really
> resumed. Only its mode of stop changes.

OK.

> What state the tracee is in
> can be determined by retriving siginfo using PTRACE_GETSIGINFO.

I don't understand this this details right now... But I guess this
doesn't matter right now.

Either way, debugger should have the ability to know the tracee's
state wrt group-stop. To oversimplify, it should know the state of
SIGNAL_STOP_STOPPED bit. Correct?

Oleg.

2011-03-03 19:35:06

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On 03/02, Tejun Heo wrote:
>
> Hey,
>
> On Wed, Mar 02, 2011 at 12:02:37PM +0100, Denys Vlasenko wrote:
> > What do you think about SIGSTOP generated in in children on auto-attach
> > via PTRACE_O_TRACE[V]FORK / PTRACE_O_TRACECLONE options?
>
> Ah, you're right. I actually haven't been thinking about them.
>
> > IMHO, it would be good if we'd have a way to distinguish them from
> > real SIGSTOP signals.
>
> Yeah, probably.

Perhaps... but at first glance I think we should not change this
at all.

We are not going to change PTRACE_ATTACH, we can't. We are going
to add the new request which avoids tkill(SIGSTOP). The same with
PTRACE_O_TRACECLONE/etc imho. We need the better control on
auto-attach, but this needs another discussion.

Oleg.

2011-03-03 20:31:20

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On 03/03, Oleg Nesterov wrote:
>
> I'll ask the questions later.

Right now I do not see any holes (but I'll try more ;)

One question, to ensure I really understand you. To simplify,
consider this particular example.

Tracee:

int main(void)
{
kill(SIGSTOP, getpid());

printf("I am running\n");

for (;;)
;
}

To simplify again, suppose that the debugger attaches when it is
already stopped, then it does PTRACE_CONT(0).

In this case the tracee remains SIGNAL_STOP_STOPPED but prints
"I am running" and enters the endless loop.

(the new debugger can do PTRACE_SEIZE after that and "return"
it to the stopped state without affecting jctl state).

Now, if SIGCONT comes (from anywhere) it clears SIGNAL_STOP_STOPPED,
the tracee traps and reports this event to debugger.

Correct?


And, once again. In the mt case, I assume that SIGCONT makes
every traced thread to report this event individually, right?

(I am talking about the case when the group-stop was finished,
iow "every" probably means the threads which participated and
reported CLD_STOPPED to the debugger).


In both cases, later then this SIGCONT will be reported again
as any "normal" signal when some thread dequeues it.


Is my understanding correct?

Thanks,

Oleg.

2011-03-04 08:23:36

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hey, Oleg.

On Thu, Mar 03, 2011 at 09:22:46PM +0100, Oleg Nesterov wrote:
> On 03/03, Oleg Nesterov wrote:
> >
> > I'll ask the questions later.
>
> Right now I do not see any holes (but I'll try more ;)

Heh, I'm sure you'll find some. :-)

> One question, to ensure I really understand you. To simplify,
> consider this particular example.
>
> Tracee:
>
> int main(void)
> {
> kill(SIGSTOP, getpid());
>
> printf("I am running\n");
>
> for (;;)
> ;
> }
>
> To simplify again, suppose that the debugger attaches when it is
> already stopped, then it does PTRACE_CONT(0).
>
> In this case the tracee remains SIGNAL_STOP_STOPPED but prints
> "I am running" and enters the endless loop.
>
> (the new debugger can do PTRACE_SEIZE after that and "return"
> it to the stopped state without affecting jctl state).
>
> Now, if SIGCONT comes (from anywhere) it clears SIGNAL_STOP_STOPPED,
> the tracee traps and reports this event to debugger.
>
> Correct?

The notification of the end of job control stop (ie. emission of
SIGCONT) is probably the most hazy part and probably would change a
bit while implemented, but here are the baselines I have on mind.

* The notification of the job control stop itself is the only time
that wait(2) reports the job control signal and the siginfo which
was sent together.

* When job control stop ends, exit_code is changed to indicate ptrace
trap and siginfo indicates the trap site and that job control stop
is no long in effect. This of course should wake up the tracer if
it's wait(2)ing.

* The above requires another ptrace trap site which can probably
shared with PTRACE_SEIZE. The question is whether to make group
stop state available for other trap sites too or just enable it in
the new trap site. ATM, I'm leaning toward the latter.

> And, once again. In the mt case, I assume that SIGCONT makes
> every traced thread to report this event individually, right?
>
> (I am talking about the case when the group-stop was finished,
> iow "every" probably means the threads which participated and
> reported CLD_STOPPED to the debugger).

Yeap, it's per-task ptrace trap which is broadcasted to every ptraced
task which participated in the group stop.

> In both cases, later then this SIGCONT will be reported again
> as any "normal" signal when some thread dequeues it.

Yeap, that's something which happens in the delivery path for SIGCONT.
It should behave the same (other than fixing notification to real
parent, that is).

> Is my understanding correct?

Yeap, seems pretty accurate.

Thank you.

--
tejun

2011-03-04 08:46:35

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hey, Oleg.

On Thu, Mar 03, 2011 at 06:34:22PM +0100, Oleg Nesterov wrote:
> > P4. PTRACE_SEIZE
>
> This is the new request. You know, I'd like to discuss the details
> later and separately. Actually, I think the user-space developers
> should participate and tell what they need. As for me, I certainly
> agree that SIGSTOP from PTRACE_ATTACH is very wrong, and it is very
> bad that gdb has to send SIGSTOP if it wants to stop the tracee.
> IOW, I agree that something like this is needed and useful. In
> particular,

While discussing is good, I'd like to keep things slightly more
driven. I think, as anything else, there's a balance to hit between
discussing and just pushing things forward. We did fair amount of
discussion past two+ months and well I think it's about time to push
forward.

By now gdb/strace ppl should be aware of what's going on, right? So,
if you guys have something on mind w.r.t. kernel behavior, please
share, but I won't wait for some discussion elsewhere to reach a
conclusion especially because the issues which seem to cause the most
amount of delay are not really about what's necessary to achieve the
desired functionalities but wildly different wet wild dreams about
sexy unicorns.

> > A better way to solve this is simply giving the tracer the capability
> > to listen for the end of jctl stop.
>
> Hmm. I don't understand what "the end of jctl stop" actually means.
>
> Since you are talking about WCONTINUED below, I guess it means that
> the process is not group-stopped any longer, say, SIGCONT comes.

Yeap, I meant emssion of SIGCONT ending the job control stop in
effect.

> > What state the tracee is in can be determined by retriving siginfo
> > using PTRACE_GETSIGINFO.
>
> I don't understand this this details right now... But I guess this
> doesn't matter right now.
>
> Either way, debugger should have the ability to know the tracee's
> state wrt group-stop. To oversimplify, it should know the state of
> SIGNAL_STOP_STOPPED bit. Correct?

Yes, that information should be available via PTRACE_GETSIGINFO.

Thanks.

--
tejun

2011-03-04 13:01:36

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Thu, Mar 3, 2011 at 9:22 PM, Oleg Nesterov <[email protected]> wrote:
> On 03/03, Oleg Nesterov wrote:
>>
>> I'll ask the questions later.
>
> Right now I do not see any holes (but I'll try more ;)
>
> One question, to ensure I really understand you. To simplify,
> consider this particular example.
>
> Tracee:
>
> ? ? ? ?int main(void)
> ? ? ? ?{
> ? ? ? ? ? ? ? ?kill(SIGSTOP, getpid());
>
> ? ? ? ? ? ? ? ?printf("I am running\n");
>
> ? ? ? ? ? ? ? ?for (;;)
> ? ? ? ? ? ? ? ? ? ? ? ?;
> ? ? ? ?}
>
> To simplify again, suppose that the debugger attaches when it is
> already stopped, then it does PTRACE_CONT(0).

I think whatever new attach operation we invent needs to provide
a way to know whether attached task is stopped or not.

If debugger wants to say "please continue doing whatever
you were doing before attachment", then, using this information,
debugger can decide whether to do PTRACE_CONT(0) and
sleep on waitpid, or _don't_ do PTRACE_CONT(0) and
sleep on waitpid.

> In this case the tracee remains SIGNAL_STOP_STOPPED but prints
> "I am running" and enters the endless loop.
>
> (the new debugger can do PTRACE_SEIZE after that and "return"
> ?it to the stopped state without affecting jctl state).

As far as I can understand the proposal, yes.
Basically, this behavior is intended for gdb to have a way
to implement it's backdoor-ish hack to have stopped tasks
to nevertheless run some code.
(I bet other people will eventually abuse this in horrible ways)


> Now, if SIGCONT comes (from anywhere) it clears SIGNAL_STOP_STOPPED,
> the tracee traps and reports this event to debugger.

And again, I would like to ask you kernel guys to give
userspace a way to distinguish this stop from other possible stops.

IOW: debugger PTRACE_SYSCALL(0)'ed a job control stopped task.
Debugger gets WIFSTOPPED, WSTOPSIG = SIGTRAP.
Debugger wants to know: is it a "syscall entry/exit" stop
or an "end of job control stop" stop?
Preferably without the need to query PTRACE_GETSIGINFO
on every SIGTRAP - that'd slow strace down a lot.

I imagine one way to do it is to #define a PTRACE_EVENT_foo
for "end of job control stop" stop and return it in high byte
of waitpid status, just like other PTRACE_EVENTs are returned today.

--
vda

2011-03-04 13:41:32

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hello,

On Fri, Mar 04, 2011 at 02:01:04PM +0100, Denys Vlasenko wrote:
> I think whatever new attach operation we invent needs to provide
> a way to know whether attached task is stopped or not.
>
> If debugger wants to say "please continue doing whatever
> you were doing before attachment", then, using this information,
> debugger can decide whether to do PTRACE_CONT(0) and
> sleep on waitpid, or _don't_ do PTRACE_CONT(0) and
> sleep on waitpid.

Definitely.

> > Now, if SIGCONT comes (from anywhere) it clears SIGNAL_STOP_STOPPED,
> > the tracee traps and reports this event to debugger.
>
> And again, I would like to ask you kernel guys to give
> userspace a way to distinguish this stop from other possible stops.
>
> IOW: debugger PTRACE_SYSCALL(0)'ed a job control stopped task.
> Debugger gets WIFSTOPPED, WSTOPSIG = SIGTRAP.
> Debugger wants to know: is it a "syscall entry/exit" stop
> or an "end of job control stop" stop?
> Preferably without the need to query PTRACE_GETSIGINFO
> on every SIGTRAP - that'd slow strace down a lot.

There's no question that it should be distinguishible. I was still
mostly thinking about siginfo tho. Is PTRACE_GETSIGINFO such big
deal? It's a very simple syscall. It enters kernel, copyout the
structure and returns. Are you interested in adding GET_SIGINFO call
to the current ptrace and see how much things actually slow down?

> I imagine one way to do it is to #define a PTRACE_EVENT_foo
> for "end of job control stop" stop and return it in high byte
> of waitpid status, just like other PTRACE_EVENTs are returned today.

Maybe but I'm not fully sure yet because I think for debuggers the
group stop status needs to be available for other ptrace traps too, so
it might really not fit there. I'd like to avoid twisting debugging
API for performance. As I wrote before, if this actually is a problem
and can't be done cleanly without siginfo, I think a better way would
be using vdso but that's a bit of complexity and would require some
numbers to justify the complexity.

Thanks.

--
tejun

2011-03-04 14:00:09

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Fri, Mar 4, 2011 at 2:41 PM, Tejun Heo <[email protected]> wrote:
>> And again, I would like to ask you kernel guys to give
>> userspace a way to distinguish this stop from other possible stops.
>>
>> IOW: debugger PTRACE_SYSCALL(0)'ed a job control stopped task.
>> Debugger gets WIFSTOPPED, WSTOPSIG = SIGTRAP.
>> Debugger wants to know: is it a "syscall entry/exit" stop
>> or an "end of job control stop" stop?
>> Preferably without the need to query PTRACE_GETSIGINFO
>> on every SIGTRAP - that'd slow strace down a lot.
>
> There's no question that it should be distinguishible. ?I was still
> mostly thinking about siginfo tho. ?Is PTRACE_GETSIGINFO such big
> deal? ?It's a very simple syscall. ?It enters kernel, copyout the
> structure and returns. ?Are you interested in adding GET_SIGINFO call
> to the current ptrace and see how much things actually slow down?

I would rather speed strace up than slow it down further, even if slightly.


>> I imagine one way to do it is to #define a PTRACE_EVENT_foo
>> for "end of job control stop" stop and return it in high byte
>> of waitpid status, just like other PTRACE_EVENTs are returned today.
>
> Maybe but I'm not fully sure yet because I think for debuggers the
> group stop status needs to be available for other ptrace traps too, so

By all means, do make this information accessible
with PTRACE_GETSIGINFO too. I have nothing against that.


> it might really not fit there. ?I'd like to avoid twisting debugging
> API

It is *already* twisted that way. Other ptrace stops *already*
return extra information in waitpid status: run "man ptrace"
and read PTRACE_SETOPTIONS section. For example:

"PTRACE_O_TRACEEXEC (since Linux 2.5.46)
Stop the child at the next execve(2) call with
SIGTRAP | PTRACE_EVENT_EXEC << 8."

I don't see why adding another PTRACE_EVENT_foo
is a big deal.

--
vda

2011-03-04 14:07:51

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hello,

On Fri, Mar 04, 2011 at 02:59:32PM +0100, Denys Vlasenko wrote:
> I would rather speed strace up than slow it down further, even if
> slightly.

The question to ask is at what cost? If mostly unnoticeable slow down
makes the API cleaner, I'll go that way. Everything is a tradeoff.

> > it might really not fit there. ?I'd like to avoid twisting debugging
> > API
>
> It is *already* twisted that way. Other ptrace stops *already*
> return extra information in waitpid status: run "man ptrace"
> and read PTRACE_SETOPTIONS section. For example:
>
> "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
> Stop the child at the next execve(2) call with
> SIGTRAP | PTRACE_EVENT_EXEC << 8."
>
> I don't see why adding another PTRACE_EVENT_foo
> is a big deal.

No, I wasn't talking about that. My concern was that. Let's say, we
need a flag somewhere to indicate group stop state (because the state
is necessary regardless of which trap is taken) and then we can't or
that to SIGTRAP transparently because that would change it for the
existing events and at the same time I don't want to make the
information available in two separate forms.

That said, I think, for strace, it should be okay. Unless the tracer
resumes the tracee while job control stop is in effect, the trap which
is taken after SIGCONT emission will be a new distinguishible trap
which will be assigned a separate PTRACE_EVENT_*, so it wouldn't have
to depend on siginfo.

Anyways, I'll keep the distinction performance on mind but if it comes
down to API convolution, I don't think bending that way is a good
idea. If we're gonna bend, there should be numbers justifying the
bending.

Thanks.

--
tejun

2011-03-04 14:31:58

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Fri, Mar 4, 2011 at 3:07 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Fri, Mar 04, 2011 at 02:59:32PM +0100, Denys Vlasenko wrote:
>> I would rather speed strace up than slow it down further, even if
>> slightly.
>
> The question to ask is at what cost? ?If mostly unnoticeable slow down
> makes the API cleaner, I'll go that way. ?Everything is a tradeoff.

# time sh -c 'ls -lR /usr/share >/dev/null'
real 0m2.633s
user 0m0.540s
sys 0m2.047s

strace without PTRACE_GETSIGINFO:

# time sh -c './strace -e trace=exit_group ls -lR /usr/share >/dev/null'
exit_group(0) = ?
real 0m47.023s
user 0m3.843s
sys 0m29.050s
# time sh -c './strace -e trace=exit_group ls -lR /usr/share >/dev/null'
exit_group(0) = ?
real 0m48.799s
user 0m3.808s
sys 0m29.736s
# time sh -c './strace -e trace=exit_group ls -lR /usr/share >/dev/null'
exit_group(0) = ?
real 0m47.695s
user 0m3.965s
sys 0m29.047s

strace with PTRACE_GETSIGINFO:

# time sh -c './strace -e trace=exit_group ls -lR /usr/share >/dev/null'
exit_group(0) = ?
real 0m51.958s
user 0m3.899s
sys 0m31.584s
# time sh -c './strace -e trace=exit_group ls -lR /usr/share >/dev/null'
exit_group(0) = ?
real 0m53.773s
user 0m3.920s
sys 0m29.713s
# time sh -c './strace -e trace=exit_group ls -lR /usr/share >/dev/null'
exit_group(0) = ?
real 0m51.625s
user 0m3.961s
sys 0m30.970s

--
vda

2011-03-04 14:41:16

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Fri, Mar 04, 2011 at 03:31:15PM +0100, Denys Vlasenko wrote:
> On Fri, Mar 4, 2011 at 3:07 PM, Tejun Heo <[email protected]> wrote:
> > Hello,
> >
> > On Fri, Mar 04, 2011 at 02:59:32PM +0100, Denys Vlasenko wrote:
> >> I would rather speed strace up than slow it down further, even if
> >> slightly.
> >
> > The question to ask is at what cost? ?If mostly unnoticeable slow down
> > makes the API cleaner, I'll go that way. ?Everything is a tradeoff.
>
> # time sh -c 'ls -lR /usr/share >/dev/null'
> real 0m2.633s
>
> strace without PTRACE_GETSIGINFO:
> real 0m47.023s
> real 0m48.799s
> real 0m47.695s
>
> strace with PTRACE_GETSIGINFO:
> real 0m51.958s
> real 0m53.773s
> real 0m51.625s

Great, numbers, so it's ~10 slow down. Gees, with or without that
change, strace(2) is heavy, >18 times slower than without. Maybe it
should give up on ptrace and use the new tracing infrastructure?

Anyways, thanks a lot for the numbers. Much appreciated.

--
tejun

2011-03-04 16:10:57

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On 03/04, Tejun Heo wrote:
>
> Hey, Oleg.
>
> On Thu, Mar 03, 2011 at 06:34:22PM +0100, Oleg Nesterov wrote:
> > > P4. PTRACE_SEIZE
> >
> > This is the new request. You know, I'd like to discuss the details
> > later and separately. Actually, I think the user-space developers
> > should participate and tell what they need. As for me, I certainly
> > agree that SIGSTOP from PTRACE_ATTACH is very wrong, and it is very
> > bad that gdb has to send SIGSTOP if it wants to stop the tracee.
> > IOW, I agree that something like this is needed and useful. In
> > particular,
>
> While discussing is good, I'd like to keep things slightly more
> driven. I think, as anything else, there's a balance to hit between
> discussing and just pushing things forward. We did fair amount of
> discussion past two+ months and well I think it's about time to push
> forward.
>
> By now gdb/strace ppl should be aware of what's going on, right?

Yes. I think I wasn't clear.

What I meant, I think the exact details can be discussed separately.
Say, personally I'd prefer 2 different requests, ATTACH && INTERUPT,
but I think this is very minor, and I agree with everything as long
as user-space developers do not object. I just tried to avoid the
discussion of the "cosmetic" details at this point.

> So,
> if you guys have something on mind w.r.t. kernel behavior, please
> share, but I won't wait for some discussion elsewhere

No. I agree, this should be discussed here.

Oleg.

2011-03-04 16:15:48

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hello, Oleg.

On Fri, Mar 04, 2011 at 05:01:51PM +0100, Oleg Nesterov wrote:
> What I meant, I think the exact details can be discussed separately.
> Say, personally I'd prefer 2 different requests, ATTACH && INTERUPT,
> but I think this is very minor, and I agree with everything as long
> as user-space developers do not object. I just tried to avoid the
> discussion of the "cosmetic" details at this point.

Understood. One thing tho. Do you think having ATTACH_NO_STOP would
be better? I think that's a noticeable difference. To me, it seems
to only complicate things. If we decide to go for
ATTACH_STOP_WITHOUT_SIDE_EFFECT then the difference between that and
INTERRUPT again becomes really small, so that was the reason why I
proposed to have a unified one.

Thanks.

--
tejun

2011-03-04 16:22:12

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On 03/04, Denys Vlasenko wrote:
>
> # time sh -c './strace -e trace=exit_group ls -lR /usr/share >/dev/null'
> exit_group(0) = ?
> real 0m47.695s
> user 0m3.965s
> sys 0m29.047s
>
> strace with PTRACE_GETSIGINFO:
>
> # time sh -c './strace -e trace=exit_group ls -lR /usr/share >/dev/null'
> exit_group(0) = ?
> real 0m51.958s
> user 0m3.899s
> sys 0m31.584s

I am wondering how much ptrace_check_attach()->wait_task_inactive()
contributes... it can sleep.

And note that many ptrace requests do not really need this,
PTRACE_GETSIGINFO in particular.

Oleg.

2011-03-04 16:23:51

by Jan Kratochvil

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Tue, 01 Mar 2011 23:14:14 +0100, Denys Vlasenko wrote:
> On Tue, Mar 1, 2011 at 8:06 PM, Jan Kratochvil <[email protected]> wrote:
> > Currently it is already a problem that apps did not / do not expect the first
> > waitpid after PTRACE_ATTACH may not be SIGSTOP.
>
> That's exactly why we want to add a better alternative, which doesn't
> insert that blasted SIGSTOP.

But it insteads blasted SIGTRAP (or some other signal) instead.

It would be best if such PTRACE_SEIZE (similar to PTRACE_INTERRUPT) would
guarantee the first waitpid afterwards returns the artificial signal from
PTRACE_SEIZE.


Thanks,
Jan

2011-03-04 16:35:11

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On 03/04, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Fri, Mar 04, 2011 at 05:01:51PM +0100, Oleg Nesterov wrote:
> > What I meant, I think the exact details can be discussed separately.
> > Say, personally I'd prefer 2 different requests, ATTACH && INTERUPT,
> > but I think this is very minor, and I agree with everything as long
> > as user-space developers do not object. I just tried to avoid the
> > discussion of the "cosmetic" details at this point.
>
> Understood. One thing tho. Do you think having ATTACH_NO_STOP would
> be better?

No, I don't think so. More precisely, I simply do not know, we should
probably ask Jan.

But. from the previous discussions, gdb seems to need
PTRACE_O_ATTACH_NEW_THREADS_AND_DO_NOT_STOP. Because gdb simply do not
want to know about the new thread, until it does something "interesting".
However this leads to other questions.

Oleg.

2011-03-04 16:39:20

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On 03/04, Oleg Nesterov wrote:
>
> On 03/04, Denys Vlasenko wrote:
> >
> > # time sh -c './strace -e trace=exit_group ls -lR /usr/share >/dev/null'
> > exit_group(0) = ?
> > real 0m47.695s
> > user 0m3.965s
> > sys 0m29.047s
> >
> > strace with PTRACE_GETSIGINFO:
> >
> > # time sh -c './strace -e trace=exit_group ls -lR /usr/share >/dev/null'
> > exit_group(0) = ?
> > real 0m51.958s
> > user 0m3.899s
> > sys 0m31.584s
>
> I am wondering how much ptrace_check_attach()->wait_task_inactive()
> contributes... it can sleep.
>
> And note that many ptrace requests do not really need this,
> PTRACE_GETSIGINFO in particular.

could you replace the extra PTRACE_GETSIGINFO by invalid request?

Oleg.

2011-03-04 16:42:17

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Fri, Mar 4, 2011 at 5:14 PM, Jan Kratochvil
<[email protected]> wrote:
> On Tue, 01 Mar 2011 23:14:14 +0100, Denys Vlasenko wrote:
>> On Tue, Mar 1, 2011 at 8:06 PM, Jan Kratochvil <[email protected]> wrote:
>> > Currently it is already a problem that apps did not / do not expect the first
>> > waitpid after PTRACE_ATTACH may not be SIGSTOP.
>>
>> That's exactly why we want to add a better alternative, which doesn't
>> insert that blasted SIGSTOP.
>
> But it inserts blasted SIGTRAP (or some other signal) instead.

Not a problem. The problem with SIGSTOP is that we race against
possible real SIGSTOP.

> It would be best if such PTRACE_SEIZE (similar to PTRACE_INTERRUPT)

My understanding is that those are the same idea with different names.

> would
> guarantee the first waitpid afterwards returns the artificial signal from
> PTRACE_SEIZE.

Yes.

--
vda

2011-03-04 17:06:02

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Fri, Mar 4, 2011 at 3:40 PM, Tejun Heo <[email protected]> wrote:
> On Fri, Mar 04, 2011 at 03:31:15PM +0100, Denys Vlasenko wrote:
>> On Fri, Mar 4, 2011 at 3:07 PM, Tejun Heo <[email protected]> wrote:
>> > Hello,
>> >
>> > On Fri, Mar 04, 2011 at 02:59:32PM +0100, Denys Vlasenko wrote:
>> >> I would rather speed strace up than slow it down further, even if
>> >> slightly.
>> >
>> > The question to ask is at what cost? ?If mostly unnoticeable slow down
>> > makes the API cleaner, I'll go that way. ?Everything is a tradeoff.
>>
>> # time sh -c 'ls -lR /usr/share >/dev/null'
>> real ?0m2.633s
>>
>> strace without PTRACE_GETSIGINFO:
>> real ?0m47.023s
>> real ?0m48.799s
>> real ?0m47.695s
>>
>> strace with PTRACE_GETSIGINFO:
>> real ?0m51.958s
>> real ?0m53.773s
>> real ?0m51.625s
>
> Great, numbers, so it's ~10 slow down. ?Gees, with or without that
> change, strace(2) is heavy, >18 times slower than without.

Here's a typical sequence of operations strace is doing:

wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], __WALL, NULL) = 10445
ptrace(PTRACE_GETSIGINFO, 10445, 0, {si_signo=SIGTRAP, si_code=0x5,
si_pid=10445, si_uid=0, si_value={int=0, ptr=0}}) = 0
ptrace(PTRACE_PEEKUSER, 10445, 8*ORIG_RAX, [0x36]) = 0
ptrace(PTRACE_PEEKUSER, 10445, 8*CS, [0x23]) = 0
ptrace(PTRACE_PEEKUSER, 10445, 8*RAX, [0xffffffffffffffda]) = 0
ptrace(PTRACE_PEEKUSER, 10445, 8*RBX, [0x1]) = 0
ptrace(PTRACE_PEEKUSER, 10445, 8*RCX, [0x5401]) = 0
ptrace(PTRACE_PEEKUSER, 10445, 8*RDX, [0xffb83f10]) = 0
ptrace(PTRACE_SYSCALL, 10445, 0x1, SIG_0) = 0

IOW: strace wastes a lot of time just transiting to the kernel and back
with simple syscalls to get the information.

What would help here is a "vectorized" waitpid operation
which retrieves much more information in one go:

waitpid_vec(int max_results, int status_vector[], siginfo_t
si_vector[], struct pt_regs reg_vector[])

--
vda

2011-03-04 17:13:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Fri, Mar 4, 2011 at 9:05 AM, Denys Vlasenko <[email protected]> wrote:
>
> Here's a typical sequence of operations strace is doing:
>
> wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], __WALL, NULL) = 10445
> ptrace(PTRACE_GETSIGINFO, 10445, 0, {si_signo=SIGTRAP, si_code=0x5,
> si_pid=10445, si_uid=0, si_value={int=0, ptr=0}}) = 0
> ptrace(PTRACE_PEEKUSER, 10445, 8*ORIG_RAX, [0x36]) = 0
> ptrace(PTRACE_PEEKUSER, 10445, 8*CS, [0x23]) = 0
> ptrace(PTRACE_PEEKUSER, 10445, 8*RAX, [0xffffffffffffffda]) = 0
> ptrace(PTRACE_PEEKUSER, 10445, 8*RBX, [0x1]) = 0
> ptrace(PTRACE_PEEKUSER, 10445, 8*RCX, [0x5401]) = 0
> ptrace(PTRACE_PEEKUSER, 10445, 8*RDX, [0xffb83f10]) = 0
> ptrace(PTRACE_SYSCALL, 10445, 0x1, SIG_0) = 0
>
> IOW: strace wastes a lot of time just transiting to the kernel and back
> with simple syscalls to get the information.
>
> What would help here is a "vectorized" waitpid operation
> which retrieves much more information in one go:
>
> waitpid_vec(int max_results, int status_vector[], siginfo_t
> si_vector[], struct pt_regs reg_vector[])

No. Start off with what we already have: make ptrace use
"PTRACE_GETREGSET" instead of PTRACE_PEEKUSER.

That should get the whole "user_regs_struct" in one single system call.

Linus

2011-03-04 17:16:43

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On 03/04, Jan Kratochvil wrote:
>
> On Tue, 01 Mar 2011 23:14:14 +0100, Denys Vlasenko wrote:
> > On Tue, Mar 1, 2011 at 8:06 PM, Jan Kratochvil <[email protected]> wrote:
> > > Currently it is already a problem that apps did not / do not expect the first
> > > waitpid after PTRACE_ATTACH may not be SIGSTOP.
> >
> > That's exactly why we want to add a better alternative, which doesn't
> > insert that blasted SIGSTOP.
>
> But it insteads blasted SIGTRAP (or some other signal) instead.

I don't really understand your concerns... If you modify gdb to use
PTRACE_SEIZE you can forget about the current problems with the first
signal.

Currently gdb has to take care, but mostly because it should "dismiss"
the real signal sent by attach.

> It would be best if such PTRACE_SEIZE (similar to PTRACE_INTERRUPT) would
> guarantee the first waitpid afterwards returns the artificial signal from
> PTRACE_SEIZE.

Again, I don't think this really matters.

Suppose that the tracee reports, say, a signal after PTRACE_SEIZE/INTERRUPT.
And this is possible anyway if the debugger races with kill(). Why this
is bad?

Oleg.

2011-03-04 18:13:17

by Jan Kratochvil

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Fri, 04 Mar 2011 18:07:37 +0100, Oleg Nesterov wrote:
> Suppose that the tracee reports, say, a signal after PTRACE_SEIZE/INTERRUPT.
> And this is possible anyway if the debugger races with kill(). Why this
> is bad?

I was asking if it is possible or if it could be avoided.

When you check gdb-6.8.tar it asserts the first received signal is SIGSTOP or
in a different case it ignores the first signal (whatever it is). This is
because if the programmer sees during the development the first signal that
comes is SIGSTOP (s)he automatically writes the code with that assumption.

When the tracer has a function to attach a task it should be a self-sufficient
function returning the tracee in some normal task like after other events.
So the attach operation should neither leave pending some excessive signals
nor it should eat some normal vital signals (like PTRACE_EVENT_FORK).

Sure the tracer can always handle it some way, ignore this signal, remember if
it has seen that signal etc. But if we design a new ptrace interface it
should be simple to use and it should not suggest coding racy/buggy tracers.


Thanks,
Jan

2011-03-04 18:25:06

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On 03/04, Tejun Heo wrote:
>
> > Tracee:
> >
> > int main(void)
> > {
> > kill(SIGSTOP, getpid());
> >
> > printf("I am running\n");
> >
> > for (;;)
> > ;
> > }
> >
> > To simplify again, suppose that the debugger attaches when it is
> > already stopped, then it does PTRACE_CONT(0).
> >
> > In this case the tracee remains SIGNAL_STOP_STOPPED but prints
> > "I am running" and enters the endless loop.
> >
> > (the new debugger can do PTRACE_SEIZE after that and "return"
> > it to the stopped state without affecting jctl state).
> >
> > Now, if SIGCONT comes (from anywhere) it clears SIGNAL_STOP_STOPPED,
> > the tracee traps and reports this event to debugger.
> >
> > Correct?
>
> * The above requires another ptrace trap site which can probably
> shared with PTRACE_SEIZE.

OK, thanks. I was a bit confused by TASK_TRACED | TASK_STOPPED state
idea. Obviously the state can't be TASK_RUNNING | TASK_STOPPED.

> The question is whether to make group
> stop state available for other trap sites too or just enable it in
> the new trap site. ATM, I'm leaning toward the latter.

Yes.

There is another corner case. Suppose that another SIGSTOP comes
while the tracee runs the endless loop above.

In this case nothing changes, the tracee should report this signal.
But what should it do if the debugger does PTRACE_CONT(SIGSTOP) after
that?

Should it stop and report another job control stop after that, or
should it ignore this signal? In the first case, at least we should
not notify the real parent again. In the latter case, perhaps the
naive debugger can be confused and this differs from the current
behaviour.

And, if it stops, should this also stop other PTRACE_CONT'ed threads
as well? Currently we do...

Not that I think this is terribly important, but I think it makes
sense to discuss/document this case anyway.


Anyway. I think this RFC is fine.

Oleg.

2011-03-04 19:00:23

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Friday 04 March 2011 18:12, Linus Torvalds wrote:
> On Fri, Mar 4, 2011 at 9:05 AM, Denys Vlasenko <[email protected]> wrote:
> >
> > Here's a typical sequence of operations strace is doing:
> >
> > wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], __WALL, NULL) = 10445
> > ptrace(PTRACE_GETSIGINFO, 10445, 0, {si_signo=SIGTRAP, si_code=0x5,
> > si_pid=10445, si_uid=0, si_value={int=0, ptr=0}}) = 0
> > ptrace(PTRACE_PEEKUSER, 10445, 8*ORIG_RAX, [0x36]) = 0
> > ptrace(PTRACE_PEEKUSER, 10445, 8*CS, [0x23]) = 0
> > ptrace(PTRACE_PEEKUSER, 10445, 8*RAX, [0xffffffffffffffda]) = 0
> > ptrace(PTRACE_PEEKUSER, 10445, 8*RBX, [0x1]) = 0
> > ptrace(PTRACE_PEEKUSER, 10445, 8*RCX, [0x5401]) = 0
> > ptrace(PTRACE_PEEKUSER, 10445, 8*RDX, [0xffb83f10]) = 0
> > ptrace(PTRACE_SYSCALL, 10445, 0x1, SIG_0) = 0
> >
> > IOW: strace wastes a lot of time just transiting to the kernel and back
> > with simple syscalls to get the information.
> >
> > What would help here is a "vectorized" waitpid operation
> > which retrieves much more information in one go:
> >
> > waitpid_vec(int max_results, int status_vector[], siginfo_t
> > si_vector[], struct pt_regs reg_vector[])
>
> No. Start off with what we already have: make ptrace use
> "PTRACE_GETREGSET" instead of PTRACE_PEEKUSER.
>
> That should get the whole "user_regs_struct" in one single system call.

Aye. That's the idea: get more info in one go.

I bet a part of the rationale for PTRACE_GETREGSET
was to get not one register per syscall, but all of them.

We have more of these:

readv/writev

Get not one signal, but many of them (signalfd).

Get not one task's wait notification, but many of them,
and also get accompanying siginfo etc (waitpidv?
or better, waitfd?)

--
vda

2011-03-04 19:24:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Fri, Mar 4, 2011 at 10:59 AM, Denys Vlasenko
<[email protected]> wrote:
>>
>> That should get the whole "user_regs_struct" in one single system call.
>
> Aye. That's the idea: get more info in one go.

My point is that it's totally idiotic to start talking about new
kernel interfaces, when strace doesn't even use the ones we HAVE, and
have had for a long time.

So the strace use is clearly not a valid reason to start worrying
about new interfaces. It doesn't even care about all the improvements
we've done several years ago.

Linus

2011-03-05 08:33:45

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hey, Oleg.

On Fri, Mar 04, 2011 at 07:16:30PM +0100, Oleg Nesterov wrote:
> > The question is whether to make group
> > stop state available for other trap sites too or just enable it in
> > the new trap site. ATM, I'm leaning toward the latter.

I changed my mind and am now leaning toward the former. That would
make things easier for debuggers which resume the tracee but wants to
keep track of the job control state. Let's see how implementation
turns out. Planning can do only so much and implementation often
comes with surprises of its own.

> There is another corner case. Suppose that another SIGSTOP comes
> while the tracee runs the endless loop above.
>
> In this case nothing changes, the tracee should report this signal.
> But what should it do if the debugger does PTRACE_CONT(SIGSTOP) after
> that?
>
> Should it stop and report another job control stop after that, or
> should it ignore this signal? In the first case, at least we should
> not notify the real parent again. In the latter case, perhaps the
> naive debugger can be confused and this differs from the current
> behaviour.

Interesting. Ignoring would be the cleaner choice but I think we're
bound to deliver it to the tracer and suppress notification to the
real parent; otherwise, the behavior would change a bit too much - an
invisible stop signal mask out of nowhere wouldn't be too pleasant.

> And, if it stops, should this also stop other PTRACE_CONT'ed threads
> as well? Currently we do...

I think so; otherwise, again, the behavior would change too much for
the current users. This definitely needs to be documented in detail.

> Not that I think this is terribly important, but I think it makes
> sense to discuss/document this case anyway.

Yeah, definitely. I wasn't thinking about cases like the above but we
need to so please keep them coming. :-)

> Anyway. I think this RFC is fine.

Great. Thanks.

--
tejun

2011-03-05 08:47:14

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Fri, Mar 04, 2011 at 07:12:50PM +0100, Jan Kratochvil wrote:
> On Fri, 04 Mar 2011 18:07:37 +0100, Oleg Nesterov wrote:
> > Suppose that the tracee reports, say, a signal after PTRACE_SEIZE/INTERRUPT.
> > And this is possible anyway if the debugger races with kill(). Why this
> > is bad?
>
> I was asking if it is possible or if it could be avoided.

First of all, if the conditions are distinguishible reliably, I think
things should be fine. Also, after seizing, at least two conditions
must be distinguishible immediately - whether the tracee was in group
stop or just ptrace trapped, so I don't think guaranteeing single
condition is feasible to begin with.

> When the tracer has a function to attach a task it should be a self-sufficient
> function returning the tracee in some normal task like after other events.
> So the attach operation should neither leave pending some excessive signals
> nor it should eat some normal vital signals (like PTRACE_EVENT_FORK).

That was one of the reasons why I suggested PTRACE_SEIZE to be both
attach and interrupt. In both cases, the possible states of the task
would be exactly the same. As debugger is likely to travel various
paths in the interrupt case regularly, it will be much easier to get
it right and once you get that right, attach becomes right at no
additional cost!

> Sure the tracer can always handle it some way, ignore this signal,
> remember if it has seen that signal etc. But if we design a new
> ptrace interface it should be simple to use and it should not
> suggest coding racy/buggy tracers.

There's no signal to ignore. It might not be as simple as you fancy
but will (at least strive to) be reliable and race-free. I think
racy/buggy tracers have a lot more to do with lack of documentation
and clear rules. Everyone is left to guess and everyone of course
ends up with different conclusions. Let's add a detailed
documentation of rules and subtleties.

Thanks.

--
tejun

2011-03-07 15:16:43

by Oleg Nesterov

[permalink] [raw]
Subject: PTRACE_SEIZE/INTERRUPT: [RFC] Proposal for ptrace improvements

On 03/01, Tejun Heo wrote:
>
> P4. PTRACE_SEIZE

Now that we more or less agree with Tejun's ideas, I'd like to
discuss the "cosmetic" issues. But first of all let me repeat,
I do not have the strong opinion, I agree with everything in
advance.

The only really important thing (to me) is that ATTACH/INTERRUPT do
not induce SIGSTOP but force the tracee to trap (if needed), this
looks obviously good.

> I don't think it's a good idea to attach without putting the tracee
> into TASK_TRACED.

Agreed.

Jan, do you have any reason for attach-without-putting-into-TRACED?

> I can't see much, if any, benefit in implementing ATTACH and INTERRUPT
> separately.

I can ;) Although the benefit is minor.

First of all, the single request can "hide" the user-space bugs.
For example, gdb is very complex, suppose that it uses the wrong
pid during the debugging session. In this case it would be nice if
the kernel returns the error instead of attaching to the wrong
thread.

Of course, gdb can use the wrong pid during attach as well, but
this is less likely. Sometimes gdb doesn't even know that some
thread has gone away, suppose that it blindly does PTRACE_SEIZE
after that and this pid was alredy reused.

Or. Suppose that gdb tries to attach the whole thread group (it
always does this), and attaches to the same thread twice by mistake.
The kernel can help in this case and return the error.

And I think there are other reasons. Say, suppose we want to add
the options for ATTACH/INTERRUPT. Right now I do not see the need,
but who knows.


And. Error codes. Currently ptrace_check_attach() can only return
ESRCH, this doesn't look very convenient. If we add INTERRUPT we
can make this more informative. The same for PTRACE_ATTACH, btw.
EPERM should mean security error, nothing else. IOW, error codes
should be different depending on attach/interrupt mode.


In any case. Jan, Denys, what do you think? Your opinion is more
important, I can only speculate.



Final note... Previously I thought that we should not (I meant, can
not) change the current behaviour of PTRACE_O_TRACECLONE/etc which
sends SIGSTOP during the auto-attach. Now I am not sure, probably
we can avoid SIGSTOP if the forking task was PTRACE_SEIZE'ed. IOW,
perhaps the new ATTACH can have other "side effects". But, otoh,
this can complicate the transition to the new requests. Say, you
can't simply change strace to use PTRACE_SEIZE without auditing
the "-f" code.


And. This is off-topic, but we can also add PTRACE_DETACH_XXX which
does not require the stopped tracee. strace certainly needs it,
although INTERRUPT can solve most of the problems.

Oleg.

2011-03-07 20:44:26

by Roland McGrath

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

I've only skimmed through this whole thread, and I'm not going to try to
respond to all the details. I've lost interest in working in this area
and I don't plan to keep up with all the details any more. If you want
to reach me about kernel subjects after March 11, you'll need to use the
address <[email protected]> as I won't be getting @redhat.com any more.

I'll just give you a few thoughts that I don't think have been raised.
Then I'll leave it to Oleg and the rest of you to decide how to move
forward.

I've said before more than once what I think are the important
principles about compatibility that ought to be maintained so as not to
break existing applications such as older versions of GDB and strace
(not to mention things less well-known and not publically visible, where
code has come to depend on details of ptrace behavior and there may not
even be anyone who really knows what they are depending on by now).
When real-world applications have worked in practice, even if the
behavior they were seeing was not pedantically reliable, they should not
be broken. Saner behavior can be provided when new requests or new
options are used, without breaking any old usage.

I see the appeal of the PTRACE_SEIZE idea, and I don't dismiss it
entirely. But I am quite skeptical that this is a good approach to be
the sole new mechanism to replace PTRACE_ATTACH with something sane and
predictable. There are a few things that concern me.

A problem long identified with ptrace is that there is no way to attach
or detach without perturbing some of the user-visible behavior of the
traced threads. (There will always be some perturbation of the timing
of the thread's activities, but I mean factors other than that alone.)
Not overloading SIGSTOP is certainly an improvement. But, PTRACE_SEIZE
still has this problem in ways that the proposed PTRACE_ATTACH_NOSTOP
does not. For any passive tracing use (such as strace -p), you don't
actually want the thing to stop right away, you only want it to stop
when a new event happens (such as the next syscall entry/exit). The
PTRACE_SEIZE idea does not give the option of attaching without any
perturbation when you don't care about "seizing".

Anything that works via interruption can perturb the user-visible
behavior of a system call already in progress. It would be nice if all
uninterruptible waits were truly reliably short and if all system call
paths supported syscall restart thoroughly so that they could be
interrupted with TIF_SIGPENDING and then restarted (a la SA_RESTART, or
its equivalent when there is no actual signal to handle) with no change
in semantics that userland can perceive (aside from timing). But it
just isn't so, and the way the kernel is organized makes it a difficult
and open-ended task (perhaps an impossible one for some cases) to try to
hunt down and fix every violation of that principle or to prevent
introductions of new violations in the future.

The PTRACE_INTERRUPT idea addresses this in two key ways. Firstly, it's
a separate request not unavoidably implied by attaching, so tracers have
the option of simply not doing it if they don't want to perturb the
tracee at all. Second, the PTRACE_INTERRUPT idea came with a variant
that's either a parallel request PTRACE_REPORT, or option flags in the
one request, or whatever, that does not interrupt at all. Instead, it
uses the TIF_NOTIFY_RESUME mechanism (see set_notify_resume() in
linux/tracehook.h) to arrest all user-mode activity without affecting
kernel-mode activity at all (hence, the report/stop may not be immediate
at all if the thread might block in the kernel, but it will immediately
guarantee that no more user instructions will run before a report).
These possibilities give an intelligent tracer the opportunity to
control the user thread in the ways it needs to while avoiding
perturbation of the user-visible semantics of kernel operations.

The other areas of concern with PTRACE_SEIZE are its robustness and
scalability. The whole point of this request is that the one ptrace
call does a full synchronization with the tracee, blocking until it has
been interrupted and stopped. This necessarily entails some ping-pong
scheduling to interrupt the thread, yield to it, and wait for it to
stop, and yield back to the tracer thread. That's fine and useful to
roll into one kernel operation when you want to synchronously arrest a
single tracee thread. But there are two problems there.

For robustness, the PTRACE_SEIZE block itself must be interruptible.
If the tracer thread will block arbitrarily waiting for the tracee
thread to finish up its kernel activities and stop, then it would be
irresponsible to make it impossible for the tracer thread to be
interrupted from this block. If it does get interrupted, then in what
state does it leave the tracee? Normal principles of interrupted and
restartable calls would suggest that it leave the tracee as it was
before, i.e. unattached. Is that the plan? If not, then would it
leave the tracee attached but not "seized", which defeats the whole
purpose of simple and predictable behavior for the tracer. If the
call is not interruptible at all, then that's a recipe for permanently
wedged debugger threads and that's just a lousy thing to permit. Of
course the bare minimum would be to make it a noninterruptible but
killable wait (TASK_KILLABLE), meaning all you can do is SIGKILL a
wedged debugger. That's better than nothing, but it's hardly good
enough to allow one to write a robust debugger in any sensible fashion.

Finally, there is the scalability concern. This synchronization delay
happens inside each single PTRACE_SEIZE call acting on one tracee
thread. When a debugger wants to attach to a process, it wants to
attach to all the threads in that process, and there can be thousands.
For that to work at all reasonably with nontrivial numbers of threads,
the act of attaching to each individual thread should be asynchronous.
>From the debugger implementor's perspective, it's certainly even more
desireable to have a single kernel call that will attach to all the
threads in the process. But that needs to be scalable as well, and
it's not necessarily at all what the debugger wants that you stop each
and every thread at attach time.

None of this means at all that PTRACE_SEIZE is worthless. But it is
certainly inadequate to meet the essential needs that motivate adding
new interfaces in this area. The PTRACE_ATTACH_NOSTOP idea I
suggested is far from complete for all the issues as well, but it is a
more versatile building block than PTRACE_SEIZE.

I don't intend to debate all these subjects. As I said, I'm no longer
going to work on this area of the kernel. I'm glad Oleg and Tejun are
putting effort into improving it, regardless of how the details shake
out. I've raised some issues that I think have been overlooked by the
discussion so far. I hope they'll now be considered more carefully in
how the work moves forward.


Thanks,
Roland

2011-03-09 09:41:44

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_SEIZE/INTERRUPT: [RFC] Proposal for ptrace improvements

Hello, Oleg.

On Mon, Mar 07, 2011 at 04:08:03PM +0100, Oleg Nesterov wrote:
> Now that we more or less agree with Tejun's ideas,

Yay! I finally succeeded at wearing down everyone. :-)

> > I can't see much, if any, benefit in implementing ATTACH and INTERRUPT
> > separately.
>
> I can ;) Although the benefit is minor.
>
> First of all, the single request can "hide" the user-space bugs.
> For example, gdb is very complex, suppose that it uses the wrong
> pid during the debugging session. In this case it would be nice if
> the kernel returns the error instead of attaching to the wrong
> thread.
>
> Of course, gdb can use the wrong pid during attach as well, but
> this is less likely. Sometimes gdb doesn't even know that some
> thread has gone away, suppose that it blindly does PTRACE_SEIZE
> after that and this pid was alredy reused.
>
> Or. Suppose that gdb tries to attach the whole thread group (it
> always does this), and attaches to the same thread twice by mistake.
> The kernel can help in this case and return the error.

Hmmm, okay, I see.

> And I think there are other reasons. Say, suppose we want to add
> the options for ATTACH/INTERRUPT. Right now I do not see the need,
> but who knows.

I think it would actually be better to share the option flags. The
two operations (whether implemented as separate operations or not)
share the interrupting aspect and I think using separate set of
options can be a bit confusing. Hmmm... maybe it's actually better to
make them have different prefixes and let the attach also accept the
interrupt flags.

> And. Error codes. Currently ptrace_check_attach() can only return
> ESRCH, this doesn't look very convenient. If we add INTERRUPT we
> can make this more informative. The same for PTRACE_ATTACH, btw.
> EPERM should mean security error, nothing else. IOW, error codes
> should be different depending on attach/interrupt mode.

I think there are two ways - two separate operations, say SEIZE (I
want to a different verb from ATTACH to avoid confusion) and
INTERRUPT, or let SEIZE take an option which enables attach mode, but
if we're gonna distinguish them it's probably better to opt for
separate opcodes.

> Final note... Previously I thought that we should not (I meant, can
> not) change the current behaviour of PTRACE_O_TRACECLONE/etc which
> sends SIGSTOP during the auto-attach. Now I am not sure, probably
> we can avoid SIGSTOP if the forking task was PTRACE_SEIZE'ed. IOW,
> perhaps the new ATTACH can have other "side effects". But, otoh,
> this can complicate the transition to the new requests. Say, you
> can't simply change strace to use PTRACE_SEIZE without auditing
> the "-f" code.

We can add attach option SIGSTOP_ON_AUTOATTACH to help the transition
but then again it requires userland application change anyway so I
think it would be better to simply enforce the new behavior when the
new attach is used. It's not like lack of SIGSTOP is gonna be super
subtle.

> And. This is off-topic, but we can also add PTRACE_DETACH_XXX which
> does not require the stopped tracee. strace certainly needs it,
> although INTERRUPT can solve most of the problems.

I don't know. The thing is that guaranteeing the tracee is in
TASK_TRACED on attach/detach prevents a lot of subtle corner cases.
Unless there are pretty compelling reasons, I'd like to keep that
invariant intact. What else does strace require that can't be
provided by INTERRUPT + DETACH?

Thanks.

--
tejun

2011-03-09 10:33:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hello, Roland.

On Mon, Mar 07, 2011 at 12:43:46PM -0800, Roland McGrath wrote:
> I've only skimmed through this whole thread, and I'm not going to try to
> respond to all the details. I've lost interest in working in this area
> and I don't plan to keep up with all the details any more. If you want
> to reach me about kernel subjects after March 11, you'll need to use the
> address <[email protected]> as I won't be getting @redhat.com any more.

I see. That's a pity. I'll keep the new address cc'd for related
discussions.

> I've said before more than once what I think are the important
> principles about compatibility that ought to be maintained so as not to
> break existing applications such as older versions of GDB and strace
> (not to mention things less well-known and not publically visible, where
> code has come to depend on details of ptrace behavior and there may not
> even be anyone who really knows what they are depending on by now).
> When real-world applications have worked in practice, even if the
> behavior they were seeing was not pedantically reliable, they should not
> be broken. Saner behavior can be provided when new requests or new
> options are used, without breaking any old usage.

The biggest changes the current ptrace users are gonna see are
probably the ones from P1 and those are really corner cases - /proc
state, behavior change visible only to other thread in a multithreaded
debugger, and behavior change on back-to-back DETACH/ATTACH sequence
on STOPPED task, which BTW was broken due to the extra
wake_up_process() anyway.

The biggest visible changes are the ones visible to a real parent
while the children are being ptraced - most of the changes introduced
by the recent P2 patchset. As noted there, I don't think
conditionalizing those behavior changes is necessary given that the
previous behavior was utterly broken. If somebody was actually
depending on job control events being broken while ptraced, well, I
primarily don't care, but if the problem actually is significant we'll
think about workarounds.

What I'm trying to say is that it's _ALWAYS_ about balances and trade
offs. Sticking to some or any rules in fundamentalistic manner is a
guaranteed way to horrible code base which is not only painful to
develop and maintain but also will deliever a lot of WTF moments to
its users too in the long run.

So, let's balance it. Avoiding changes to the userland visible
behaviors does have a lot of weight but its mass is NOT infinite.

> A problem long identified with ptrace is that there is no way to attach
> or detach without perturbing some of the user-visible behavior of the
> traced threads. (There will always be some perturbation of the timing
> of the thread's activities, but I mean factors other than that alone.)
> Not overloading SIGSTOP is certainly an improvement. But, PTRACE_SEIZE
> still has this problem in ways that the proposed PTRACE_ATTACH_NOSTOP
> does not. For any passive tracing use (such as strace -p), you don't
> actually want the thing to stop right away, you only want it to stop
> when a new event happens (such as the next syscall entry/exit). The
> PTRACE_SEIZE idea does not give the option of attaching without any
> perturbation when you don't care about "seizing".
>
> Anything that works via interruption can perturb the user-visible
> behavior of a system call already in progress. It would be nice if all
> uninterruptible waits were truly reliably short and if all system call
> paths supported syscall restart thoroughly so that they could be
> interrupted with TIF_SIGPENDING and then restarted (a la SA_RESTART, or
> its equivalent when there is no actual signal to handle) with no change
> in semantics that userland can perceive (aside from timing). But it
> just isn't so, and the way the kernel is organized makes it a difficult
> and open-ended task (perhaps an impossible one for some cases) to try to
> hunt down and fix every violation of that principle or to prevent
> introductions of new violations in the future.

But the only side effect would be that from signal_wake_up(). Our
hibernation code does that to every single thread and naturally any
signal delivery would also do that. It's something fundamentally
ingrained into the design of the whole UNIX syscall mechanism. If we
have undocumented behaviors there, we should fix and/or document them.
I don't think ptrace is the right place to to incorporate workaround
for such basic assumption.

Also, ptrace is inherently a very heavy mechanism. It is intertwined
with the whole process model and hijacks the target task and if you
look at the provided operations, they aren't designed for light weight
monitoring. The whole thing is designed to be heavy weight for dirty
diddling.

If someone is looking for completely transparent light weight
monitoring, there is a much better fitting mechanism for that and it
works frigging well and provides much better insight into what's going
on with the system.

Use tracing for tracing.

> The other areas of concern with PTRACE_SEIZE are its robustness and
> scalability. The whole point of this request is that the one ptrace
> call does a full synchronization with the tracee, blocking until it has
> been interrupted and stopped.

No, I'm planning to do the waiting by wait(2), so there won't be
latency, interruptible sleep or scalability (compared to the current
attach) problems.

> None of this means at all that PTRACE_SEIZE is worthless. But it is
> certainly inadequate to meet the essential needs that motivate adding
> new interfaces in this area. The PTRACE_ATTACH_NOSTOP idea I
> suggested is far from complete for all the issues as well, but it is a
> more versatile building block than PTRACE_SEIZE.

I skipped a lot of parts but in general I think that you're trying to
do too much with ptrace. ptrace has its place which is called
debugging. Let's concentrate on that. It doesn't have to do every
thing one can dream of. There are a lot better tools for most of
them.

Thanks.

--
tejun

2011-03-09 17:39:26

by Oleg Nesterov

[permalink] [raw]
Subject: Re: PTRACE_SEIZE/INTERRUPT: [RFC] Proposal for ptrace improvements

Hi Tejun,

On 03/09, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Mon, Mar 07, 2011 at 04:08:03PM +0100, Oleg Nesterov wrote:
> > Now that we more or less agree with Tejun's ideas,
>
> Yay! I finally succeeded at wearing down everyone. :-)

Yes, thanks for correcting me, this is what I actually meant ;)

> > And I think there are other reasons. Say, suppose we want to add
> > the options for ATTACH/INTERRUPT. Right now I do not see the need,
> > but who knows.
>
> I think it would actually be better to share the option flags. The
> two operations (whether implemented as separate operations or not)
> share the interrupting aspect and I think using separate set of
> options can be a bit confusing. Hmmm... maybe it's actually better to
> make them have different prefixes and let the attach also accept the
> interrupt flags.

Perhaps, I agree with everything. As I said, I don't have a strong
opinion, just some random thoughts.

> > Final note... Previously I thought that we should not (I meant, can
> > not) change the current behaviour of PTRACE_O_TRACECLONE/etc which
> > sends SIGSTOP during the auto-attach. Now I am not sure, probably
> > we can avoid SIGSTOP if the forking task was PTRACE_SEIZE'ed. IOW,
> > perhaps the new ATTACH can have other "side effects". But, otoh,
> > this can complicate the transition to the new requests. Say, you
> > can't simply change strace to use PTRACE_SEIZE without auditing
> > the "-f" code.
>
> We can add attach option SIGSTOP_ON_AUTOATTACH to help the transition
> but then again it requires userland application change anyway so I
> think it would be better to simply enforce the new behavior when the
> new attach is used. It's not like lack of SIGSTOP is gonna be super
> subtle.

Agreed.

> > And. This is off-topic, but we can also add PTRACE_DETACH_XXX which
> > does not require the stopped tracee. strace certainly needs it,
> > although INTERRUPT can solve most of the problems.
>
> I don't know. The thing is that guaranteeing the tracee is in
> TASK_TRACED on attach/detach prevents a lot of subtle corner cases.

Agreed, but detach is different. It has to work with the running
tracee anyway.

However,

> Unless there are pretty compelling reasons, I'd like to keep that
> invariant intact. What else does strace require that can't be
> provided by INTERRUPT + DETACH?

Yes, probably INTERRUPT + DETACH is enough. At least this certainly
solves the most annoying problems. IIRC. Denys can correct me.

Oleg.

2011-03-10 15:55:43

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Mon, Mar 07, 2011 at 12:43:46PM -0800, Roland McGrath wrote:
> I've only skimmed through this whole thread, and I'm not going to try to
> respond to all the details. I've lost interest in working in this area
> and I don't plan to keep up with all the details any more. If you want
> to reach me about kernel subjects after March 11, you'll need to use the
> address <[email protected]> as I won't be getting @redhat.com any more.

Sorry to hear this. I guess working on ptrace make anyone want to leave.

I guess your last patch to this matter should then be:

diff --git a/MAINTAINERS b/MAINTAINERS
index 560ecce..62859ca 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5022,7 +5022,6 @@ S: Maintained
F: drivers/block/ps3vram.c

PTRACE SUPPORT
-M: Roland McGrath <[email protected]>
M: Oleg Nesterov <[email protected]>
S: Maintained
F: include/asm-generic/syscall.h


I hope you will still be involved in the community for other aspects of the
kernel.

-- Steve

2011-03-10 18:33:09

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

On Wed, Mar 09, 2011 at 11:28:55AM +0100, Tejun Heo wrote:
>
> If someone is looking for completely transparent light weight
> monitoring, there is a much better fitting mechanism for that and it
> works frigging well and provides much better insight into what's going
> on with the system.
>
> Use tracing for tracing.
>

Hmm, what tracing utility exactly? If I want to trace a running task,
that I have the debug info on it where I would have the ability to
insert probes, which utility would you recommend?

strace and gdb use ptrace

ftrace focuses on the kernel.

I don't think perf has a good way to trace userspace yet.

I haven't taken a good look at lttng, but I think it has some sort of
library that is attached to the process. Is there a better way than
attching a library to said task.

systemtap may have ways too, but I think it depends on utrace which has
pretty much been nak'd in the kernel.

-- Steve

2011-03-11 08:13:28

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

Hello, Steven.

On Thu, Mar 10, 2011 at 01:33:06PM -0500, Steven Rostedt wrote:
> > Use tracing for tracing.
>
> Hmm, what tracing utility exactly? If I want to trace a running task,
> that I have the debug info on it where I would have the ability to
> insert probes, which utility would you recommend?
>
> strace and gdb use ptrace

strace only monitors kernel events - syscalls, fork, exit and signals.
We'll probably need to beef up the tracepoints a bit to reach feature
parity but it shouldn't be too difficult and provides much better
insight into what happens inside the kernel.

> ftrace focuses on the kernel.
>
> I don't think perf has a good way to trace userspace yet.
>
> I haven't taken a good look at lttng, but I think it has some sort of
> library that is attached to the process. Is there a better way than
> attching a library to said task.
>
> systemtap may have ways too, but I think it depends on utrace which has
> pretty much been nak'd in the kernel.

I don't have much idea on how to do userland tracing but am pretty
sure ptrace is not the answer. ptrace involves switching to another
process context and back on each event. It just ain't gonna work and
none of Roland's suggestions changes anything about that.

Thanks.

--
tejun

2011-03-11 08:23:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements


* Steven Rostedt <[email protected]> wrote:

> I don't think perf has a good way to trace userspace yet.

uprobes are being worked on (and that comes integrated with perf), and that will
give a good mechanism - and it (of course) wont be doing any context-switching.

Peter might be able to fill us in on the details of the review progress of the
uprobes patches.

> ftrace focuses on the kernel.

It will hopefully be integrated into the perf ABI soon, so that developers and users
have a unified event platform for doing tracing, profiling, event logging and more -
all of which go back to a very similar looking event platform. (Not sure what's
holding up that process - there's no technical roadblocks really - it 'just has to
be done'.)

Thanks,

Ingo

2011-03-11 09:40:19

by Srikar Dronamraju

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements

>
> * Steven Rostedt <[email protected]> wrote:
>
> > I don't think perf has a good way to trace userspace yet.
>
> uprobes are being worked on (and that comes integrated with perf), and that will
> give a good mechanism - and it (of course) wont be doing any context-switching.
>

> Peter might be able to fill us in on the details of the review progress of the
> uprobes patches.

Just an update on uprobes.
I am currently testing my next version of uprobes and I should be
sending it out for review next monday.


--
Thanks and Regards
Srikar

2011-03-11 09:43:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements


* Srikar Dronamraju <[email protected]> wrote:

> >
> > * Steven Rostedt <[email protected]> wrote:
> >
> > > I don't think perf has a good way to trace userspace yet.
> >
> > uprobes are being worked on (and that comes integrated with perf), and that will
> > give a good mechanism - and it (of course) wont be doing any context-switching.
> >
>
> > Peter might be able to fill us in on the details of the review progress of the
> > uprobes patches.
>
> Just an update on uprobes.
> I am currently testing my next version of uprobes and I should be
> sending it out for review next monday.

Great, thanks!

Ingo

2011-03-14 01:04:31

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [RFC] Proposal for ptrace improvements


Hi -

tj wrote:

> [...]
>> I've said before more than once what I think are the important
>> principles about compatibility that ought to be maintained [...]
> The biggest changes the current ptrace users are gonna see are
> probably the ones from P1 and those are really corner cases - /proc
> state, behavior change visible only to other thread in a multithreaded
> debugger [...]
> Also, ptrace is inherently a very heavy mechanism. [...]
> If someone is looking for completely transparent light weight
> monitoring [look elsewhere...]
> I skipped a lot of parts but in general I think that you're trying
> to do too much with ptrace. ptrace has its place which is called
> debugging. [...]

Take care not to define your problems away by something close to a
false dichotomy: intrusive ptrace vs. transparent tracing. One needs
transparent enough debugging too, so that the presence of the debugger
does not subvert the occurrence of complex or transient phenomena.
This hazard is especially acute for multithreaded programs. While
ptrace is known to be an imperfect substrate for multithreaded
debugging, it would be nice not to make it any worse.

Note that I'm not claiming that your current proposals necessarily
would do so, but the argument for treating ptrace as machinery
appropriate only for "heavy-weight diddling" could lead that way.

- FChE