2006-03-15 09:24:47

by Chuck Ebbert

Subject: [RFC] Proposed manpage additions for ptrace(2)

The following is what I propose to add to the manpage entry for
ptrace(2). Some of it came from experimentation, some from linux-kernel
messages and the rest came from reading the source code.


PTRACE_GETSIGINFO
Retrieve information about the signal that caused the stop.
Copies a siginfo_t from the child to location data in the parent.

PTRACE_SETSIGINFO
Set signal information. Copies a siginfo_t from location data
in the parent to the child.

PTRACE_SETOPTIONS
Sets ptrace options from data in the parent. data is interpreted
as a bitmask of options, which are specified by the following
(addr is ignored):

PTRACE_O_TRACESYSGOOD
When delivering syscall traps, set bit 7 in the signal
number (i.e. deliver (SIGTRAP | 0x80)). This makes it easy
for the tracer to tell the difference between normal
traps and those caused by a syscall.

PTRACE_O_TRACEFORK
Stop the child at the next fork() call with SIGTRAP |
PTRACE_EVENT_FORK << 8 and automatically start tracing
the newly forked process, which will start with a
SIGSTOP. The pid for the new process can be retrieved
with PTRACE_GETEVENTMSG.

PTRACE_O_TRACEVFORK
Stop the child at the next vfork() call with SIGTRAP |
PTRACE_EVENT_VFORK << 8 and automatically start tracing
the newly vforked process, which will start with a
SIGSTOP. The pid for the new process can be retrieved
with PTRACE_GETEVENTMSG.

PTRACE_O_TRACECLONE
Stop the child at the next clone() call with SIGTRAP |
PTRACE_EVENT_CLONE << 8 and automatically start tracing
the newly cloned process, which will start with a
SIGSTOP. The pid for the new process can be retrieved
with PTRACE_GETEVENTMSG.

PTRACE_O_TRACEEXEC
Stop the child at the next exec() call with SIGTRAP |
PTRACE_EVENT_EXEC << 8.

PTRACE_O_TRACEVFORKDONE
Stop the child at the completion of the next vfork() call
with SIGTRAP | PTRACE_EVENT_VFORK_DONE << 8.

PTRACE_O_TRACEEXIT
Stop the child at exit with SIGTRAP | PTRACE_EVENT_EXIT
<< 8. The child's exit status can be retrieved with
PTRACE_GETEVENTMSG. This stop will be done early during
process exit whereas the normal notification is done
after the process is done exiting.

PTRACE_GETEVENTMSG
Retrieve a message (as an unsigned long) about the ptrace event
that just happened to the location data in the parent. For
PTRACE_EVENT_EXIT this is the child's exit code. For
PTRACE_EVENT_FORK, PTRACE_EVENT_VFORK and PTRACE_EVENT_CLONE
this is the pid of the new process.

PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
For PTRACE_SYSEMU, continue and stop on entry to the next
syscall, which will not be executed. For PTRACE_SYSEMU_SINGLESTEP,
do the same but also singlestep if not a syscall.
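
As a rough illustration of how a tracer might tell these stops apart, the
sketch below decodes a waitpid() status. It assumes the Linux/glibc
wait-status layout (stop signal in bits 8-15, ptrace event code in bits
16-23). The helper names is_syscall_stop() and ptrace_event_of() are made up
for this example and are not part of any API.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>   /* PTRACE_EVENT_* codes */
#include <sys/wait.h>

/* Hypothetical helper: true if the status reports a syscall stop that was
 * marked by PTRACE_O_TRACESYSGOOD, i.e. the stop signal is SIGTRAP | 0x80. */
bool is_syscall_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}

/* Hypothetical helper: returns the PTRACE_EVENT_* code of an event stop,
 * or 0 if the status is not a SIGTRAP stop carrying an event.  An event
 * stop satisfies: status >> 8 == (SIGTRAP | (event << 8)). */
int ptrace_event_of(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xff;
}
```

After an event stop, the tracer would then typically issue
PTRACE_GETEVENTMSG to fetch the new pid or the exit status, as described
above.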

--
Chuck
"Penguins don't come from next door, they come from the Antarctic!"


2006-03-15 20:39:18

by Michael Kerrisk

Subject: Re: [RFC] Proposed manpage additions for ptrace(2)

Chuck,

> The following is what I propose to add to the manpage entry
> for ptrace(2). Some of it came from experimentation, some from
> linux-kernel messages and the rest came from reading the source code.

Thanks -- this looks promising. I'm not sure, but I think Daniel Jacobowitz
made a number of these additions to ptrace(). Perhaps he
has some comments on the accuracy and completeness of the text.

Any comments Daniel?

Cheers,

Michael


--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
ftp://ftp.win.tue.nl/pub/linux-local/manpages/,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2006-03-16 20:02:06

by Daniel Jacobowitz

Subject: Re: [RFC] Proposed manpage additions for ptrace(2)

First of all, thanks for doing this.

One nice addition might be when they became available; it varies,
but most of these were mid-2.5. PTRACE_O_TRACESYSGOOD is older,
but it hasn't always worked on all architectures (and I'm not sure if
it does now).

On Wed, Mar 15, 2006 at 04:12:10AM -0500, Chuck Ebbert wrote:
> The following is what I propose to add to the manpage entry for
> ptrace(2). Some of it came from experimentation, some from linux-kernel
> messages and the rest came from reading the source code.
>
>
> PTRACE_GETSIGINFO
> Retrieve information about the signal that caused the stop.
> Copies a siginfo_t from the child to location data in the par-
> ent.
>
> PTRACE_SETSIGINFO
> Set signal information. Copies a siginfo_t from location data
> in the parent to the child.

Right. These are usable any time we're stopped in ptrace_stop, and
nowadays I think that ought to be any time we're stopped. However,
when we come from ptrace_notify, you'll notice that the resulting
siginfo is discarded! It's only used in the get_signal_to_deliver
case.

I don't know offhand if there's a robust way to tell which of those
two cases we're in.

> PTRACE_O_TRACEFORK
> Stop the child at the next fork() call with SIGTRAP |
> PTRACE_EVENT_FORK << 8 and automatically start tracing
> the newly forked process, which will start with a
> SIGSTOP. The pid for the new process can be retrieved
> with PTRACE_GETEVENTMSG.
>
> PTRACE_O_TRACEVFORK
> Stop the child at the next vfork() call with SIGTRAP |
> PTRACE_EVENT_VFORK << 8 and automatically start tracing
> the newly vforked process, which will start with a
> SIGSTOP. The pid for the new process can be retrieved
> with PTRACE_GETEVENTMSG.
>
> PTRACE_O_TRACECLONE
> Stop the child at the next clone() call with SIGTRAP |
> PTRACE_EVENT_CLONE << 8 and automatically start tracing
> the newly cloned process, which will start with a
> SIGSTOP. The pid for the new process can be retrieved
> with PTRACE_GETEVENTMSG.

Specifically, the three kinds of cloning are distinguished as:

if CLONE_VFORK -> PTRACE_EVENT_VFORK
else if clone exit signal == SIGCHLD -> PTRACE_EVENT_FORK
else PTRACE_EVENT_CLONE

You need to do some juggling to get the actual clone flags.
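
The rule above could be sketched in C roughly as follows. This is only an
illustration of the three-way test, not kernel code: the function name
expected_clone_event() is invented here, and it assumes the exit signal
occupies the low byte of the clone flags (the CSIGNAL mask from
<linux/sched.h>).

```c
#include <linux/sched.h>   /* CLONE_VFORK, CSIGNAL */
#include <signal.h>        /* SIGCHLD */
#include <sys/ptrace.h>    /* PTRACE_EVENT_* */

/* Hypothetical helper: which event a tracer should expect for a given
 * clone() flags word.  The low byte (CSIGNAL) holds the exit signal. */
int expected_clone_event(unsigned long clone_flags)
{
    if (clone_flags & CLONE_VFORK)
        return PTRACE_EVENT_VFORK;
    if ((clone_flags & CSIGNAL) == SIGCHLD)
        return PTRACE_EVENT_FORK;
    return PTRACE_EVENT_CLONE;
}
```

So a plain fork() maps to PTRACE_EVENT_FORK, while a thread-style clone()
with no SIGCHLD exit signal maps to PTRACE_EVENT_CLONE.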

> PTRACE_O_TRACEEXEC
> Stop the child at the next exec() call with SIGTRAP |
> PTRACE_EVENT_EXEC << 8.
>
> PTRACE_O_TRACEVFORKDONE
> Stop the child at the completion of the next vfork() call
> with SIGTRAP | PTRACE_EVENT_VFORK_DONE << 8.

Right. BTW, I believe there are still some potential deadlocks between
the vfork event and the vfork done event; I used to regularly generate
unkillable processes working on this code.

> PTRACE_O_TRACEEXIT
> Stop the child at exit with SIGTRAP | PTRACE_EVENT_EXIT
> << 8. The child's exit status can be retrieved with
> PTRACE_GETEVENTMSG. This stop will be done early during
> process exit whereas the normal notification is done
> after the process is done exiting.

Right. This is useful because the process's registers are still
available - we can actually check where we were before exiting!
However, there's no way to prevent the exit at this point.

> PTRACE_GETEVENTMSG
> Retrieve a message (as an unsigned long) about the ptrace event
> that just happened to the location data in the parent. For
> PTRACE_EVENT_EXIT this is the child's exit code. For
> PTRACE_EVENT_FORK, PTRACE_EVENT_VFORK and PTRACE_EVENT_CLONE
> this is the pid of the new process.

Right.

> PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
> For PTRACE_SYSEMU, continue and stop on entry to the next
> syscall, which will not be executed. For PTRACE_SYSEMU_SINGLESTEP,
> do the same but also singlestep if not a syscall.

I think this is right; I had nothing to do with these :-)


--
Daniel Jacobowitz
CodeSourcery

2006-03-16 21:17:44

by Charles P. Wright

Subject: Re: [RFC] Proposed manpage additions for ptrace(2)

On Thu, 2006-03-16 at 15:02 -0500, Daniel Jacobowitz wrote:
> > PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
> > For PTRACE_SYSEMU, continue and stop on entry to the next
> > syscall, which will not be executed. For PTRACE_SYSEMU_SINGLESTEP,
> > do the same but also singlestep if not a syscall.
>
> I think this is right; I had nothing to do with these :-)
I didn't have anything to do with it, but this description is correct
(if a bit confusing). I think that you should explicitly say (assuming
that Paolo does not have any objections):

PTRACE_SYSEMU only makes sense at a call's exit, not at entry.
PTRACE_SYSEMU is only practical if you want to emulate all of a
process's system calls (as is done in UML), because you can not examine
the process's registers before making the decision to emulate a call.

Charles

2006-03-17 11:47:41

by Chuck Ebbert

Subject: Re: [RFC] Proposed manpage additions for ptrace(2)

In-Reply-To: <[email protected]>

On Thu, 16 Mar 2006 15:02:01 -0500, Daniel Jacobowitz wrote:

> > PTRACE_O_TRACEFORK
> > Stop the child at the next fork() call with SIGTRAP |
> > PTRACE_EVENT_FORK << 8 and automatically start tracing
> > the newly forked process, which will start with a
> > SIGSTOP. The pid for the new process can be retrieved
> > with PTRACE_GETEVENTMSG.
> >
> > PTRACE_O_TRACEVFORK
> > Stop the child at the next vfork() call with SIGTRAP |
> > PTRACE_EVENT_VFORK << 8 and automatically start tracing
> > the newly vforked process, which will start with a
> > SIGSTOP. The pid for the new process can be retrieved
> > with PTRACE_GETEVENTMSG.
> >
> > PTRACE_O_TRACECLONE
> > Stop the child at the next clone() call with SIGTRAP |
> > PTRACE_EVENT_CLONE << 8 and automatically start tracing
> > the newly cloned process, which will start with a
> > SIGSTOP. The pid for the new process can be retrieved
> > with PTRACE_GETEVENTMSG.
>
> Specifically, the three kinds of cloning are distinguished as:
>
> if CLONE_VFORK -> PTRACE_EVENT_VFORK
> else if clone exit signal == SIGCHLD -> PTRACE_EVENT_FORK
> else PTRACE_EVENT_CLONE
>
> You need to do some juggling to get the actual clone flags.

It might be best to leave these descriptions in terms of C library functions
rather than kernel-internal. Looking at sys_clone() and sys_fork() I can see
what you mean but I'm not sure how to describe it to a programmer.

> BTW, I believe there are still some potential deadlocks between
> the vfork event and the vfork done event; I used to regularly generate
> unkillable processes working on this code.

I have a test program and haven't hit any problems yet. Maybe this was fixed?


--
Chuck
"Penguins don't come from next door, they come from the Antarctic!"

2006-03-17 18:51:44

by Blaisorblade

Subject: Re: [RFC] Proposed manpage additions for ptrace(2)

Side note - I'm attaching another (incomplete) patch for ptrace.2 with some
additions. It already contains something worth adding, though I only listed the
2.6-only ptrace options without describing them, since I don't know what they do.

On Thursday 16 March 2006 22:16, Charles P. Wright wrote:
> On Thu, 2006-03-16 at 15:02 -0500, Daniel Jacobowitz wrote:
> > > PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
> > > For PTRACE_SYSEMU, continue and stop on entry to
> > > the next syscall, which will not be executed. For
> > > PTRACE_SYSEMU_SINGLESTEP, do the same but also singlestep if not a
> > > syscall.
> >
> > I think this is right; I had nothing to do with these :-)
>
> I didn't have anything to do with it, but this description is correct
> (if a bit confusing). I think that you should explicitly say (assuming
> that Paolo does not have any objections):

> PTRACE_SYSEMU only makes sense at a call's exit, not at entry.

I still need to check the details, but I'm almost certain that this point is
wrong; conceptually, I understand what you mean,
but mixing PTRACE_SYSEMU and PTRACE_SYSCALL has non-obvious results.

Indeed, whether a syscall is performed or skipped is decided by the request that
led to the stop at syscall entry, not by the request that resumes the syscall.

Actually, choosing what to do at resume time would make more sense, but for
historical reasons this was overlooked (see also the end of this email).

I've described the exact semantics below; however, I feel that mixing the calls
does not make any sense. We implemented this support mainly for testing: on
2.4 hosts we could test UML performance this way, on 2.6 we couldn't, and so we
"fixed" the API to behave in this strange way again.

The exact semantics are these:

Remember that PTRACE_SYSEMU is called once per syscall: you attach to the
process, do a PTRACE_SYSEMU on it, and it'll stop at syscall #1 entry, and
this syscall will not be executed; you look at the call's parameters, perform
what you want to do, and then resume it.

If you resume it with PTRACE_SYSEMU, it'll stop at next syscall entry, as
expected, and next syscall will not be executed.

If you resume it with PTRACE_SYSCALL (which made sense only for debugging),
the only thing which changes is that _next_ syscall will be executed
normally; then after stopping at syscall #2 exit you can choose to resume
with PTRACE_SYSEMU. You can do that even at syscall #2 entry, but you get the
same result.
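
To make the rule concrete, here is a tiny model of it in C. It is not kernel
code and does not call ptrace(); the enum and the function name are invented
for this sketch. It only encodes the statement that the request which brought
the tracee to the syscall-entry stop decides whether that syscall runs.

```c
#include <stdbool.h>

/* Stand-ins for the two resume requests; local to this sketch. */
enum resume_req { REQ_SYSCALL, REQ_SYSEMU };

/* Model of the rule described above: the request that brought the tracee
 * to the syscall-entry stop decides whether that syscall is executed.
 * The request used to resume from the entry stop is deliberately ignored. */
bool syscall_is_executed(enum resume_req arrived_with,
                         enum resume_req resumed_with)
{
    (void)resumed_with;   /* has no influence on the current syscall */
    return arrived_with != REQ_SYSEMU;
}
```

In this model, resuming a SYSEMU entry stop with PTRACE_SYSCALL still leaves
the current syscall skipped; only the next one is executed.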

> PTRACE_SYSEMU is only practical if you want to emulate all of a
> process's system calls (as is done in UML), because you can not examine
> the process's registers before making the decision to emulate a call.

This is true.

However, I remember replying to your request to fix this problem with some
patches to test (I was fairly sure of their correctness, as far as
can be judged by code inspection), but I got no answer and didn't finish
anything. Did you lose the email or the interest?

I was also busy, so I didn't test them myself (also because reading this code
and following the exact states gives me a headache).

Bye

> Charles

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade


Attachments:
(No filename) (3.23 kB)
Man-Patch (1.77 kB)

2006-03-17 20:04:36

by Daniel Jacobowitz

Subject: Re: [RFC] Proposed manpage additions for ptrace(2)

On Fri, Mar 17, 2006 at 06:44:21AM -0500, Chuck Ebbert wrote:
> > Specifically, the three kinds of cloning are distinguished as:
> >
> > if CLONE_VFORK -> PTRACE_EVENT_VFORK
> > else if clone exit signal == SIGCHLD -> PTRACE_EVENT_FORK
> > else PTRACE_EVENT_CLONE
> >
> > You need to do some juggling to get the actual clone flags.
>
> It might be best to leave these descriptions in terms of C library functions
> rather than kernel-internal. Looking at sys_clone() and sys_fork() I can see
> what you mean but I'm not sure how to describe it to a programmer.

Those are user accessible flags. Fork will give you a
PTRACE_EVENT_FORK, vfork will give you a PTRACE_EVENT_VFORK, but
clone may give you any of the above, depending on what arguments you
pass to it. The SIGCHLD test matches the bit described in clone(2)
for __WALL or __WCLONE, for instance.

> > BTW, I believe there are still some potential deadlocks between
> > the vfork event and the vfork done event; I used to regularly generate
> > unkillable processes working on this code.
>
> I have a test program and didn't hit any problems yet. Maybe this was fixed?

One thing that IIRC was a problem was killing the parent before the
child (or maybe the other way round) when stopped at this point - such
as would happen if you typed "kill" at a GDB prompt after catch vfork.

--
Daniel Jacobowitz
CodeSourcery

2006-03-18 20:37:47

by Charles P. Wright

Subject: Re: [RFC] Proposed manpage additions for ptrace(2)

On Fri, 2006-03-17 at 19:46 +0100, Blaisorblade wrote:
> Side note - I'm attaching another (incomplete) patch for ptrace.2 with some
> addition - there is already something worth adding, though I only listed the
> 2.6-only ptrace options since I don't know what they do.
>
> On Thursday 16 March 2006 22:16, Charles P. Wright wrote:
> > On Thu, 2006-03-16 at 15:02 -0500, Daniel Jacobowitz wrote:
> > > > PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
> > > > For PTRACE_SYSEMU, continue and stop on entry to
> > > > the next syscall, which will not be executed. For
> > > > PTRACE_SYSEMU_SINGLESTEP, do the same but also singlestep if not a
> > > > syscall.
> > >
> > > I think this is right; I had nothing to do with these :-)
> >
> > I didn't have anything to do with it, but this description is correct
> > (if a bit confusing). I think that you should explicitly say (assuming
> > that Paolo does not have any objections):
>
> > PTRACE_SYSEMU only makes sense at a call's exit, not at entry.
>
> I must indeed check the detail but I'm almost totally sure that this point is
> totally wrong; conceptually, I understand what you mean,
> but mixing PTRACE_SYSEMU and PTRACE_SYSCALL has non-obvious results.
Yes, that is probably a better warning than what I was suggesting.

> Indeed, whether a syscall is performed or skipped is decided by the call which
> stops at syscall entry, not by the call which resumes the syscall.
>
> Actually, choosing what to do at resume time would make more sense, but for
> historical reasons this was overlooked (also see at the end of this email).
>
> I've described the exact semantics below, however I feel that mixing the calls
> does not make any sense - we implemented this support mainly for testing - on
> 2.4 hosts we could test UML performance this way, on 2.6 we couldn't and then
> "fixed" the API to be again this strange one.
>
> The exact semantics are these:
>
> remember that PTRACE_SYSEMU is called once per syscall; you attach the
> process, do a PTRACE_SYSEMU on it, and it'll stop at syscall #1 entry, and
> this syscall will not be executed; you look at calls param, perform what you
> want to do, and then resume it.
>
> If you resume it with PTRACE_SYSEMU, it'll stop at next syscall entry, as
> expected, and next syscall will not be executed.
>
> If you resume it with PTRACE_SYSCALL (which made sense only for debugging),
> the only thing which changes is that _next_ syscall will be executed
> normally; then after stopping at syscall #2 exit you can choose to resume
> with PTRACE_SYSEMU. You can do that even at syscall #2 entry, but you get the
> same result.
If you do the PTRACE_SYSEMU at the entry, then it seems that you

> However, I remember I answered to your request to fix this problem with some
> patches to test (I remember I was sure enough of their correctness, for what
> can be seen by code inspection), but got no answer and didn't finish
> anything. Lost the email or the interest?
Actually, I remember that you said that it wasn't very practical to try
and fix SYSEMU because UML already relies on its interface. You
suggested a "checked" version of the call, which I didn't actually lose
interest in. I've included a patch to 2.6.15 (based on your original
patch) that I've been using that adds "PTRACE_CHECKEMU", which I think
has more user-friendly semantics.

The PTRACE_CHECKEMU call makes the emulation decision after
ptrace_notify is called so that the tracing process can examine/update
registers before issuing PTRACE_CHECKEMU (to emulate the call) or
PTRACE_SYSCALL (to let the call go through).
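
As a sketch of the intended difference, the following models the entry-path
decision as I read the do_syscall_trace() change in the patch included with
this message. It is a simplified model only: the enum and function name are
invented for illustration, and the exit path and singlestep handling are
ignored.

```c
#include <stdbool.h>

/* Stand-ins for the three resume requests; local to this sketch. */
enum resume_req { REQ_SYSCALL, REQ_SYSEMU, REQ_CHECKEMU };

/* Model of the entry-path decision: is_sysemu is sampled before the stop
 * (TIF_SYSCALL_EMU), then forced on if the tracer resumed the tracee with
 * PTRACE_CHECKEMU (TIF_SYSCALL_CHECKEMU tested after ptrace_notify()).
 * Returns true if the syscall is skipped, i.e. left to be emulated. */
bool syscall_is_skipped(enum resume_req arrived_with,
                        enum resume_req resumed_with)
{
    bool is_sysemu = (arrived_with == REQ_SYSEMU);

    if (resumed_with == REQ_CHECKEMU)
        is_sysemu = true;
    return is_sysemu;
}
```

With CHECKEMU the tracer can therefore inspect the registers at the entry
stop and only then pick emulation (resume with CHECKEMU) or execution
(resume with SYSCALL).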

I've also got a patch that allows you to execute the call, but skip the
return. This is useful when you are emulating a subset of calls, and
don't care about the return value of unemulated calls.

> I was also busy so I didn't test them myself (even because reading this code
> and following the exact states causes me a headache).
There is indeed some headache in here. I think particularly for SYSEMU,
because there is a large gap between calling it and the decision that is
made.

Charles

This patch adds support for checked PTRACE_SYSEMU, or PTRACE_CHECKEMU. The key
difference between SYSEMU and CHECKEMU is that the ptrace monitor can decide
whether to emulate a call after examining the registers rather than only before
examining the registers.

This is the interface improvement described here:
http://lkml.org/lkml/2005/7/30/131

Signed-off-by: Gopalan Sivathanu <[email protected]>
Signed-off-by: Charles P. Wright <[email protected]>

diff -ur linux-2.6.15-vanilla/arch/i386/kernel/entry.S linux-2.6.15-checkemu/arch/i386/kernel/entry.S
--- linux-2.6.15-vanilla/arch/i386/kernel/entry.S 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15-checkemu/arch/i386/kernel/entry.S 2006-02-03 00:46:37.000000000 -0500
@@ -203,7 +203,7 @@
GET_THREAD_INFO(%ebp)

/* Note, _TIF_SECCOMP is bit number 8, and so it needs testw and not testb */
- testw $(_TIF_SYSCALL_EMU|_TIF_SYSCALL_TRACE|_TIF_SECCOMP|_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
+ testw $(_TIF_SYSCALL_CHECKEMU|_TIF_SYSCALL_EMU|_TIF_SYSCALL_TRACE|_TIF_SECCOMP|_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
jnz syscall_trace_entry
cmpl $(nr_syscalls), %eax
jae syscall_badsys
@@ -228,7 +228,7 @@
GET_THREAD_INFO(%ebp)
# system call tracing in operation / emulation
/* Note, _TIF_SECCOMP is bit number 8, and so it needs testw and not testb */
- testw $(_TIF_SYSCALL_EMU|_TIF_SYSCALL_TRACE|_TIF_SECCOMP|_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
+ testw $(_TIF_SYSCALL_CHECKEMU|_TIF_SYSCALL_EMU|_TIF_SYSCALL_TRACE|_TIF_SECCOMP|_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
jnz syscall_trace_entry
cmpl $(nr_syscalls), %eax
jae syscall_badsys
diff -ur linux-2.6.15-vanilla/arch/i386/kernel/ptrace.c linux-2.6.15-checkemu/arch/i386/kernel/ptrace.c
--- linux-2.6.15-vanilla/arch/i386/kernel/ptrace.c 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15-checkemu/arch/i386/kernel/ptrace.c 2006-02-03 00:46:40.000000000 -0500
@@ -475,6 +475,7 @@
break;

case PTRACE_SYSEMU: /* continue and stop at next syscall, which will not be executed */
+ case PTRACE_CHECKEMU: /* like SYSEMU, but allow per-call emulation decisions. */
case PTRACE_SYSCALL: /* continue and stop at next (return from) syscall */
case PTRACE_CONT: /* restart after signal. */
ret = -EIO;
@@ -483,12 +484,19 @@
if (request == PTRACE_SYSEMU) {
set_tsk_thread_flag(child, TIF_SYSCALL_EMU);
clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+ clear_tsk_thread_flag(child, TIF_SYSCALL_CHECKEMU);
} else if (request == PTRACE_SYSCALL) {
set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
+ clear_tsk_thread_flag(child, TIF_SYSCALL_CHECKEMU);
+ } else if (request == PTRACE_CHECKEMU) {
+ set_tsk_thread_flag(child, TIF_SYSCALL_CHECKEMU);
+ clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+ clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
} else {
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+ clear_tsk_thread_flag(child, TIF_SYSCALL_CHECKEMU);
}
child->exit_code = data;
/* make sure the single step bit is not set. */
@@ -524,6 +532,7 @@
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);

clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+ clear_tsk_thread_flag(child, TIF_SYSCALL_CHECKEMU);
set_singlestep(child);
child->exit_code = data;
/* give it a chance to run. */
@@ -655,6 +664,7 @@
int do_syscall_trace(struct pt_regs *regs, int entryexit)
{
int is_sysemu = test_thread_flag(TIF_SYSCALL_EMU);
+ int is_systrace = test_thread_flag(TIF_SYSCALL_TRACE) || test_thread_flag(TIF_SYSCALL_CHECKEMU);
/*
* With TIF_SYSCALL_EMU set we want to ignore TIF_SINGLESTEP for syscall
* interception
@@ -697,7 +707,7 @@
if (is_singlestep)
send_sigtrap(current, regs, 0);

- if (!test_thread_flag(TIF_SYSCALL_TRACE) && !is_sysemu)
+ if (!is_systrace && !is_sysemu)
goto out;

/* the 0x80 provides a way for the tracing parent to distinguish
@@ -705,6 +715,12 @@
/* Note that the debugger could change the result of test_thread_flag!*/
ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) ? 0x80:0));

+ /* The difference between PTRACE_SYSEMU and PTRACE_CHECKEMU is
+ * that PTRACE_CHECKEMU allows you to decide whether to emulate
+ * the call after you've examined the registers. */
+ if (test_thread_flag(TIF_SYSCALL_CHECKEMU))
+ is_sysemu = 1;
+
/*
* this isn't the same as continuing with a signal, but it will do
* for normal use. strace only continues with a signal if the
diff -ur linux-2.6.15-vanilla/include/asm-i386/thread_info.h linux-2.6.15-checkemu/include/asm-i386/thread_info.h
--- linux-2.6.15-vanilla/include/asm-i386/thread_info.h 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15-checkemu/include/asm-i386/thread_info.h 2006-02-03 00:47:15.000000000 -0500
@@ -142,6 +142,7 @@
#define TIF_SYSCALL_EMU 6 /* syscall emulation active */
#define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
#define TIF_SECCOMP 8 /* secure computing */
+#define TIF_SYSCALL_CHECKEMU 9 /* checked syscall emulation active */
#define TIF_POLLING_NRFLAG 16 /* true if poll_idle() is polling TIF_NEED_RESCHED */
#define TIF_MEMDIE 17

@@ -152,6 +153,7 @@
#define _TIF_SINGLESTEP (1<<TIF_SINGLESTEP)
#define _TIF_IRET (1<<TIF_IRET)
#define _TIF_SYSCALL_EMU (1<<TIF_SYSCALL_EMU)
+#define _TIF_SYSCALL_CHECKEMU (1<<TIF_SYSCALL_CHECKEMU)
#define _TIF_SYSCALL_AUDIT (1<<TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP (1<<TIF_SECCOMP)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
@@ -159,7 +161,7 @@
/* work to do on interrupt/exception return */
#define _TIF_WORK_MASK \
(0x0000FFFF & ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP|\
- _TIF_SECCOMP|_TIF_SYSCALL_EMU))
+ _TIF_SECCOMP|_TIF_SYSCALL_EMU|_TIF_SYSCALL_CHECKEMU))
/* work to do on any return to u-space */
#define _TIF_ALLWORK_MASK (0x0000FFFF & ~_TIF_SECCOMP)

diff -ur linux-2.6.15-vanilla/include/linux/ptrace.h linux-2.6.15-checkemu/include/linux/ptrace.h
--- linux-2.6.15-vanilla/include/linux/ptrace.h 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15-checkemu/include/linux/ptrace.h 2006-02-03 00:47:18.000000000 -0500
@@ -22,6 +22,7 @@
#define PTRACE_SYSCALL 24
#define PTRACE_SYSEMU 31
#define PTRACE_SYSEMU_SINGLESTEP 32
+#define PTRACE_CHECKEMU 33

/* 0x4200-0x4300 are reserved for architecture-independent additions. */
#define PTRACE_SETOPTIONS 0x4200
diff -ur linux-2.6.15-vanilla/kernel/fork.c linux-2.6.15-checkemu/kernel/fork.c
--- linux-2.6.15-vanilla/kernel/fork.c 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15-checkemu/kernel/fork.c 2006-02-03 00:47:11.000000000 -0500
@@ -1016,6 +1016,9 @@
#ifdef TIF_SYSCALL_EMU
clear_tsk_thread_flag(p, TIF_SYSCALL_EMU);
#endif
+#ifdef TIF_SYSCALL_CHECKEMU
+ clear_tsk_thread_flag(p, TIF_SYSCALL_CHECKEMU);
+#endif

/* Our parent execution domain becomes current domain
These must match for thread signalling to apply */

2006-03-25 00:07:43

by Blaisorblade

Subject: Re: [RFC] Proposed manpage additions for ptrace(2)

On Saturday 18 March 2006 21:37, Charles P. Wright wrote:
> On Fri, 2006-03-17 at 19:46 +0100, Blaisorblade wrote:

> > If you resume it with PTRACE_SYSEMU, it'll stop at next syscall entry, as
> > expected, and next syscall will not be executed.
> >
> > If you resume it with PTRACE_SYSCALL (which made sense only for
> > debugging), the only thing which changes is that _next_ syscall will be
> > executed normally; then after stopping at syscall #2 exit you can choose
> > to resume with PTRACE_SYSEMU. You can do that even at syscall #2 entry,
> > but you get the same result.

> If you do the PTRACE_SYSEMU at the entry, then it seems that you

What were you going to write here?

> > However, I remember I answered to your request to fix this problem with
> > some patches to test (I remember I was sure enough of their correctness,
> > for what can be seen by code inspection), but got no answer and didn't
> > finish anything. Lost the email or the interest?

> Actually, I remember that you said that it wasn't very practical to try
> and fix SYSEMU because UML already relies on its interface.

Yep, I implemented in fact an extension of the call, via setting a ptrace
option.

> You
> suggested a "checked" version of the call, which I didn't actually lose
> interest in. I've included a patch to 2.6.15 (based on your original
> patch) that I've been using that adds "PTRACE_CHECKEMU", which I think
> has more user-friendly semantics.

Indeed, it is simpler to do PTRACE_CHECKEMU than to set an option and do
PTRACE_SYSEMU, but on a quick read I felt that doing so adds
complication to the code (mostly in the flag setting in arch_ptrace()).

Also, I need to look closely at your changes to do_syscall_trace() before judging
them. The logic is different from what I wrote, though it seems valid
too. Until I look closely (and I don't have the time now) I won't be able to tell
whether it's an improvement or not.

> The PTRACE_CHECKEMU call makes the emulation decision after
> ptrace_notify is called so that the tracing process can examine/update
> registers before issuing PTRACE_CHECKEMU (to emulate the call) or
> PTRACE_SYSCALL (to let the call go through).

Ok, if I'm not missing anything this is the "better interface" we talked
about.

> I've also got a patch that allows you to execute the call, but skip the
> return. This is useful when you are emulating a subset of calls, and
> don't care about the return value of unemulated calls.

> > I was also busy so I didn't test them myself (even because reading this
> > code and following the exact states causes me a headache).

> There is indeed some headache in here. I think particularly for SYSEMU,
> because there is a large gap between calling it and the decision that is
> made.

Yes, but the "large gap" is achieved by the initial checking of the
tsk_thread_flag.

> @@ -475,6 +475,7 @@
> break;
>
> case PTRACE_SYSEMU: /* continue and stop at next syscall, which will not be executed */
> + case PTRACE_CHECKEMU: /* like SYSEMU, but allow per-call emulation decisions. */
> case PTRACE_SYSCALL: /* continue and stop at next (return from) syscall */
> case PTRACE_CONT: /* restart after signal. */
> ret = -EIO;
> @@ -483,12 +484,19 @@

This "if" has probably become too large - better to move the common code into
an inline(?) function and split the requests into separate "case
PTRACE_XXX:" labels.

> if (request == PTRACE_SYSEMU) {
> set_tsk_thread_flag(child, TIF_SYSCALL_EMU);
> clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
> + clear_tsk_thread_flag(child, TIF_SYSCALL_CHECKEMU);
> } else if (request == PTRACE_SYSCALL) {
> set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
> clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
> + clear_tsk_thread_flag(child, TIF_SYSCALL_CHECKEMU);
> + } else if (request == PTRACE_CHECKEMU) {
> + set_tsk_thread_flag(child, TIF_SYSCALL_CHECKEMU);
> + clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
> + clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
> } else {
> clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
> clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
> + clear_tsk_thread_flag(child, TIF_SYSCALL_CHECKEMU);
> }
> child->exit_code = data;
> /* make sure the single step bit is not set. */
> @@ -524,6 +532,7 @@
> clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
>
> clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
> + clear_tsk_thread_flag(child, TIF_SYSCALL_CHECKEMU);
> set_singlestep(child);
> child->exit_code = data;
> /* give it a chance to run. */

> diff -ur linux-2.6.15-vanilla/include/linux/ptrace.h linux-2.6.15-checkemu/include/linux/ptrace.h
> --- linux-2.6.15-vanilla/include/linux/ptrace.h 2006-01-02 22:21:10.000000000 -0500
> +++ linux-2.6.15-checkemu/include/linux/ptrace.h 2006-02-03 00:47:18.000000000 -0500
> @@ -22,6 +22,7 @@
> #define PTRACE_SYSCALL 24
> #define PTRACE_SYSEMU 31
> #define PTRACE_SYSEMU_SINGLESTEP 32
> +#define PTRACE_CHECKEMU 33

Argh - PTRACE_CHECKEMU shouldn't be 33. That was my mistake; I only learned
this later. It also shouldn't be in linux/ptrace.h: either
use the arch-independent range 0x4200-0x4300 (which is the better road IMHO)
or move it to asm-i386/ptrace.h:

> /* 0x4200-0x4300 are reserved for architecture-independent additions. */
> #define PTRACE_SETOPTIONS 0x4200

Indeed, I already moved PTRACE_SYSEMU* to asm-i386, because their value
conflicted with another ptrace code, on another arch.
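
For illustration only, a check for whether a request number falls into that
reserved range might look like the sketch below. The macro and function names
are invented here, and treating both ends of 0x4200-0x4300 as inclusive is an
assumption, since the header comment does not say.

```c
#include <stdbool.h>

/* Bounds taken from the comment in linux/ptrace.h; names are hypothetical. */
#define ARCH_INDEP_REQ_FIRST 0x4200L   /* PTRACE_SETOPTIONS is 0x4200 */
#define ARCH_INDEP_REQ_LAST  0x4300L   /* assumed inclusive */

/* True if the request number lies in the architecture-independent range. */
bool is_arch_independent_request(long request)
{
    return request >= ARCH_INDEP_REQ_FIRST && request <= ARCH_INDEP_REQ_LAST;
}
```

By this test, 0x4200 (PTRACE_SETOPTIONS) is in the range, while 31, 32 and 33
are not, which is why small numbers like these can collide with per-arch
request codes.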

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade


