2014-01-06 22:52:43

by Sergio Durigan Junior

[permalink] [raw]
Subject: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

Hi,

This is a somewhat long-wanted feature in the GDB world. I'm taking my
chances and trying to implement this, let's see how it goes!

This patch implements the new PTRACE_EVENT_SYSCALL_{ENTER,EXIT} events
for ptrace. The goal is kind of obvious: it lets the tracer to request
for notifications when a syscall is called or has returned in the
tracee. This is very useful because currently there is no easy/direct
way to inspect whether we are dealing with a call or a return of a
syscall. GDB itself has an open bug about this, because it can get
confused when the program being debugged is restarted in the middle of a
syscall that has been caught by "catch syscall".

The other nice thing that I have implemented is the ability to obtain
the syscall number related to the event by using PTRACE_GET_EVENTMSG.
This way, we don't need to inspect registers anymore when we want to
know which syscall is responsible for this or that event. This was easy
to implement, and is a pretty good thing for Linux to export to
userspace, so I decided to include in the patch as well.

I only tested this patch on x86_64 GNU/Linux (Fedora 19). No tests have
been done on the other architectures affected by this patch, mostly
because I don't have access to them, but also because the modifications
seem pretty straightforward. However, I will gladly welcome testers for
this.

This patch is mostly an RFC, so I did not take much time to split it
into smaller pieces and send them to the correct maintainers of each
part. What I'm looking for now is a feedback of my approach. The
really interesting bits are at the bottom of the patch, which deal with
tracehook.h and its friends in order to implement the new ptrace
options.

Comments are welcome. Thanks.

--
Sergio

Signed-off-by: Sergio Durigan Junior <[email protected]>

diff --git a/arch/alpha/kernel/ptrace.c b/arch/alpha/kernel/ptrace.c
index 2a4a80f..fbbea67 100644
--- a/arch/alpha/kernel/ptrace.c
+++ b/arch/alpha/kernel/ptrace.c
@@ -317,7 +317,8 @@ asmlinkage unsigned long syscall_trace_enter(void)
{
unsigned long ret = 0;
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
- tracehook_report_syscall_entry(current_pt_regs()))
+ tracehook_report_syscall_entry(current_pt_regs(),
+ current_pt_regs()->r0))
ret = -1UL;
return ret ?: current_pt_regs()->r0;
}
@@ -326,5 +327,6 @@ asmlinkage void
syscall_trace_leave(void)
{
if (test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(current_pt_regs(), 0);
+ tracehook_report_syscall_exit(current_pt_regs(), 0,
+ current_pt_regs()->r0);
}
diff --git a/arch/arc/kernel/ptrace.c b/arch/arc/kernel/ptrace.c
index 5d76706..063949a 100644
--- a/arch/arc/kernel/ptrace.c
+++ b/arch/arc/kernel/ptrace.c
@@ -156,7 +156,7 @@ long arch_ptrace(struct task_struct *child, long request,

asmlinkage int syscall_trace_entry(struct pt_regs *regs)
{
- if (tracehook_report_syscall_entry(regs))
+ if (tracehook_report_syscall_entry(regs, regs->r8))
return ULONG_MAX;

return regs->r8;
@@ -164,5 +164,5 @@ asmlinkage int syscall_trace_entry(struct pt_regs *regs)

asmlinkage void syscall_trace_exit(struct pt_regs *regs)
{
- tracehook_report_syscall_exit(regs, 0);
+ tracehook_report_syscall_exit(regs, 0, regs->r8);
}
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 0dd3b79..1a115c7 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -921,8 +921,10 @@ static int tracehook_report_syscall(struct pt_regs *regs,
regs->ARM_ip = dir;

if (dir == PTRACE_SYSCALL_EXIT)
- tracehook_report_syscall_exit(regs, 0);
- else if (tracehook_report_syscall_entry(regs))
+ tracehook_report_syscall_exit(regs, 0,
+ current_thread_info()->syscall);
+ else if (tracehook_report_syscall_entry(regs,
+ current_thread_info()->syscall))
current_thread_info()->syscall = -1;

regs->ARM_ip = ip;
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 6a8928b..34d283a 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1079,8 +1079,8 @@ asmlinkage int syscall_trace(int dir, struct pt_regs *regs)
}

if (dir)
- tracehook_report_syscall_exit(regs, 0);
- else if (tracehook_report_syscall_entry(regs))
+ tracehook_report_syscall_exit(regs, 0, regs->syscallno);
+ else if (tracehook_report_syscall_entry(regs, regs->syscallno))
regs->syscallno = ~0UL;

if (is_compat_task())
diff --git a/arch/blackfin/kernel/ptrace.c b/arch/blackfin/kernel/ptrace.c
index e1f88e0..e282468 100644
--- a/arch/blackfin/kernel/ptrace.c
+++ b/arch/blackfin/kernel/ptrace.c
@@ -392,7 +392,7 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
int ret = 0;

if (test_thread_flag(TIF_SYSCALL_TRACE))
- ret = tracehook_report_syscall_entry(regs);
+ ret = tracehook_report_syscall_entry(regs, regs->p0);

return ret;
}
@@ -403,5 +403,5 @@ asmlinkage void syscall_trace_leave(struct pt_regs *regs)

step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, step);
+ tracehook_report_syscall_exit(regs, step, regs->p0);
}
diff --git a/arch/c6x/kernel/ptrace.c b/arch/c6x/kernel/ptrace.c
index 3c494e8..7f8dd66 100644
--- a/arch/c6x/kernel/ptrace.c
+++ b/arch/c6x/kernel/ptrace.c
@@ -167,7 +167,7 @@ long arch_ptrace(struct task_struct *child, long request,
*/
asmlinkage unsigned long syscall_trace_entry(struct pt_regs *regs)
{
- if (tracehook_report_syscall_entry(regs))
+ if (tracehook_report_syscall_entry(regs, regs->b0))
/* tracing decided this syscall should not happen, so
* We'll return a bogus call number to get an ENOSYS
* error, but leave the original number in
@@ -183,5 +183,5 @@ asmlinkage unsigned long syscall_trace_entry(struct pt_regs *regs)
*/
asmlinkage void syscall_trace_exit(struct pt_regs *regs)
{
- tracehook_report_syscall_exit(regs, 0);
+ tracehook_report_syscall_exit(regs, 0, regs->b0);
}
diff --git a/arch/frv/kernel/ptrace.c b/arch/frv/kernel/ptrace.c
index 3987ff8..d2038c8 100644
--- a/arch/frv/kernel/ptrace.c
+++ b/arch/frv/kernel/ptrace.c
@@ -355,7 +355,7 @@ long arch_ptrace(struct task_struct *child, long request,
asmlinkage unsigned long syscall_trace_entry(void)
{
__frame->__status |= REG__STATUS_SYSC_ENTRY;
- if (tracehook_report_syscall_entry(__frame)) {
+ if (tracehook_report_syscall_entry(__frame, __frame->syscallno)) {
/* tracing decided this syscall should not happen, so
* We'll return a bogus call number to get an ENOSYS
* error, but leave the original number in
@@ -373,5 +373,5 @@ asmlinkage unsigned long syscall_trace_entry(void)
asmlinkage void syscall_trace_exit(void)
{
__frame->__status |= REG__STATUS_SYSC_EXIT;
- tracehook_report_syscall_exit(__frame, 0);
+ tracehook_report_syscall_exit(__frame, 0, __frame->syscallno);
}
diff --git a/arch/hexagon/kernel/traps.c b/arch/hexagon/kernel/traps.c
index 7858663..2f698cd 100644
--- a/arch/hexagon/kernel/traps.c
+++ b/arch/hexagon/kernel/traps.c
@@ -366,7 +366,7 @@ void do_trap0(struct pt_regs *regs)

/* allow strace to catch syscall args */
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACE) &&
- tracehook_report_syscall_entry(regs)))
+ tracehook_report_syscall_entry(regs, regs->r06)))
return; /* return -ENOSYS somewhere? */

/* Interrupts should be re-enabled for syscall processing */
@@ -404,7 +404,7 @@ void do_trap0(struct pt_regs *regs)

/* allow strace to get the syscall return state */
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACE)))
- tracehook_report_syscall_exit(regs, 0);
+ tracehook_report_syscall_exit(regs, 0, regs->r06);

break;
case TRAP_DEBUG:
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index b7a5fff..605a22e 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -1211,7 +1211,7 @@ syscall_trace_enter (long arg0, long arg1, long arg2, long arg3,
struct pt_regs regs)
{
if (test_thread_flag(TIF_SYSCALL_TRACE))
- if (tracehook_report_syscall_entry(&regs))
+ if (tracehook_report_syscall_entry(&regs, regs.r15))
return -ENOSYS;

/* copy user rbs to kernel rbs */
@@ -1237,7 +1237,7 @@ syscall_trace_leave (long arg0, long arg1, long arg2, long arg3,

step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(&regs, step);
+ tracehook_report_syscall_exit(&regs, step, regs.r15);

/* copy user rbs to kernel rbs */
if (test_thread_flag(TIF_RESTORE_RSE))
diff --git a/arch/m68k/kernel/ptrace.c b/arch/m68k/kernel/ptrace.c
index 1bc10e6..6f3e159 100644
--- a/arch/m68k/kernel/ptrace.c
+++ b/arch/m68k/kernel/ptrace.c
@@ -292,13 +292,15 @@ asmlinkage int syscall_trace_enter(void)
int ret = 0;

if (test_thread_flag(TIF_SYSCALL_TRACE))
- ret = tracehook_report_syscall_entry(task_pt_regs(current));
+ ret = tracehook_report_syscall_entry(task_pt_regs(current),
+ task_pt_regs(current)->d1);
return ret;
}

asmlinkage void syscall_trace_leave(void)
{
if (test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(task_pt_regs(current), 0);
+ tracehook_report_syscall_exit(task_pt_regs(current), 0,
+ task_pt_regs(current)->d1);
}
#endif /* CONFIG_COLDFIRE */
diff --git a/arch/metag/kernel/ptrace.c b/arch/metag/kernel/ptrace.c
index 7563628..4b84218 100644
--- a/arch/metag/kernel/ptrace.c
+++ b/arch/metag/kernel/ptrace.c
@@ -396,7 +396,7 @@ int syscall_trace_enter(struct pt_regs *regs)
int ret = 0;

if (test_thread_flag(TIF_SYSCALL_TRACE))
- ret = tracehook_report_syscall_entry(regs);
+ ret = tracehook_report_syscall_entry(regs, regs->ctx.DX[0].U1);

if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->ctx.DX[0].U1);
@@ -410,5 +410,5 @@ void syscall_trace_leave(struct pt_regs *regs)
trace_sys_exit(regs, regs->ctx.DX[0].U1);

if (test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, 0);
+ tracehook_report_syscall_exit(regs, 0, regs->ctx.DX[0].U1);
}
diff --git a/arch/microblaze/kernel/ptrace.c b/arch/microblaze/kernel/ptrace.c
index 39cf508..60a44c0 100644
--- a/arch/microblaze/kernel/ptrace.c
+++ b/arch/microblaze/kernel/ptrace.c
@@ -139,7 +139,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
secure_computing_strict(regs->r12);

if (test_thread_flag(TIF_SYSCALL_TRACE) &&
- tracehook_report_syscall_entry(regs))
+ tracehook_report_syscall_entry(regs, regs->r12))
/*
* Tracing decided this syscall should not happen.
* We'll return a bogus call number to get an ENOSYS
@@ -161,7 +161,7 @@ asmlinkage void do_syscall_trace_leave(struct pt_regs *regs)

step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, step);
+ tracehook_report_syscall_exit(regs, step, regs->r12);
}

void ptrace_disable(struct task_struct *child)
diff --git a/arch/mn10300/kernel/ptrace.c b/arch/mn10300/kernel/ptrace.c
index 5bd5851..621fa8d 100644
--- a/arch/mn10300/kernel/ptrace.c
+++ b/arch/mn10300/kernel/ptrace.c
@@ -365,7 +365,7 @@ long arch_ptrace(struct task_struct *child, long request,
*/
asmlinkage unsigned long syscall_trace_entry(struct pt_regs *regs)
{
- if (tracehook_report_syscall_entry(regs))
+ if (tracehook_report_syscall_entry(regs, regs->orig_d0))
/* tracing decided this syscall should not happen, so
* We'll return a bogus call number to get an ENOSYS
* error, but leave the original number in
@@ -381,5 +381,5 @@ asmlinkage unsigned long syscall_trace_entry(struct pt_regs *regs)
*/
asmlinkage void syscall_trace_exit(struct pt_regs *regs)
{
- tracehook_report_syscall_exit(regs, 0);
+ tracehook_report_syscall_exit(regs, 0, regs->orig_d0);
}
diff --git a/arch/openrisc/kernel/ptrace.c b/arch/openrisc/kernel/ptrace.c
index 71a2a0c..70c819c 100644
--- a/arch/openrisc/kernel/ptrace.c
+++ b/arch/openrisc/kernel/ptrace.c
@@ -179,7 +179,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
long ret = 0;

if (test_thread_flag(TIF_SYSCALL_TRACE) &&
- tracehook_report_syscall_entry(regs))
+ tracehook_report_syscall_entry(regs, regs->gpr[11]))
/*
* Tracing decided this syscall should not happen.
* We'll return a bogus call number to get an ENOSYS
@@ -202,5 +202,5 @@ asmlinkage void do_syscall_trace_leave(struct pt_regs *regs)

step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, step);
+ tracehook_report_syscall_exit(regs, step, regs->gpr[11]);
}
diff --git a/arch/parisc/kernel/ptrace.c b/arch/parisc/kernel/ptrace.c
index e842ee2..13f75ee 100644
--- a/arch/parisc/kernel/ptrace.c
+++ b/arch/parisc/kernel/ptrace.c
@@ -271,7 +271,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
long ret = 0;

if (test_thread_flag(TIF_SYSCALL_TRACE) &&
- tracehook_report_syscall_entry(regs))
+ tracehook_report_syscall_entry(regs, regs->gr[20]))
ret = -1L;

#ifdef CONFIG_64BIT
@@ -300,5 +300,5 @@ void do_syscall_trace_exit(struct pt_regs *regs)
audit_syscall_exit(regs);

if (stepping || test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, stepping);
+ tracehook_report_syscall_exit(regs, stepping, regs->gr[20]);
}
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 2e3d2bf..f9430b9 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -1775,7 +1775,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
secure_computing_strict(regs->gpr[0]);

if (test_thread_flag(TIF_SYSCALL_TRACE) &&
- tracehook_report_syscall_entry(regs))
+ tracehook_report_syscall_entry(regs, regs->gpr[0]))
/*
* Tracing decided this syscall should not happen.
* We'll return a bogus call number to get an ENOSYS
@@ -1815,7 +1815,7 @@ void do_syscall_trace_leave(struct pt_regs *regs)

step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, step);
+ tracehook_report_syscall_exit(regs, step, regs->gpr[0]);

user_enter();
}
diff --git a/arch/s390/kernel/ptrace.c b/arch/s390/kernel/ptrace.c
index e65c91c..8088fde 100644
--- a/arch/s390/kernel/ptrace.c
+++ b/arch/s390/kernel/ptrace.c
@@ -798,7 +798,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
* call number to gprs[2].
*/
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
- (tracehook_report_syscall_entry(regs) ||
+ (tracehook_report_syscall_entry(regs, regs->gprs[2]) ||
regs->gprs[2] >= NR_syscalls)) {
/*
* Tracing decided this syscall should not happen or the
@@ -829,7 +829,7 @@ asmlinkage void do_syscall_trace_exit(struct pt_regs *regs)
trace_sys_exit(regs, regs->gprs[2]);

if (test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, 0);
+ tracehook_report_syscall_exit(regs, 0, regs->gprs[2]);
}

/*
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 668c816..f3b8a72 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -502,7 +502,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
secure_computing_strict(regs->regs[0]);

if (test_thread_flag(TIF_SYSCALL_TRACE) &&
- tracehook_report_syscall_entry(regs))
+ tracehook_report_syscall_entry(regs, regs->regs[3]))
/*
* Tracing decided this syscall should not happen.
* We'll return a bogus call number to get an ENOSYS
@@ -531,5 +531,5 @@ asmlinkage void do_syscall_trace_leave(struct pt_regs *regs)

step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, step);
+ tracehook_report_syscall_exit(regs, step, regs->regs[3]);
}
diff --git a/arch/sh/kernel/ptrace_64.c b/arch/sh/kernel/ptrace_64.c
index af90339..93089e4 100644
--- a/arch/sh/kernel/ptrace_64.c
+++ b/arch/sh/kernel/ptrace_64.c
@@ -525,7 +525,7 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
secure_computing_strict(regs->regs[9]);

if (test_thread_flag(TIF_SYSCALL_TRACE) &&
- tracehook_report_syscall_entry(regs))
+ tracehook_report_syscall_entry(regs, regs->regs[9]))
/*
* Tracing decided this syscall should not happen.
* We'll return a bogus call number to get an ENOSYS
@@ -554,7 +554,7 @@ asmlinkage void do_syscall_trace_leave(struct pt_regs *regs)

step = test_thread_flag(TIF_SINGLESTEP);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, step);
+ tracehook_report_syscall_exit(regs, step, regs->regs[9]);
}

/* Called with interrupts disabled */
diff --git a/arch/sparc/kernel/ptrace_32.c b/arch/sparc/kernel/ptrace_32.c
index 896ba7c..174a675 100644
--- a/arch/sparc/kernel/ptrace_32.c
+++ b/arch/sparc/kernel/ptrace_32.c
@@ -447,9 +447,11 @@ asmlinkage int syscall_trace(struct pt_regs *regs, int syscall_exit_p)

if (test_thread_flag(TIF_SYSCALL_TRACE)) {
if (syscall_exit_p)
- tracehook_report_syscall_exit(regs, 0);
+ tracehook_report_syscall_exit(regs, 0,
+ regs->u_regs[UREG_G1]);
else
- ret = tracehook_report_syscall_entry(regs);
+ ret = tracehook_report_syscall_entry(regs,
+ regs->u_regs[UREG_G1]);
}

return ret;
diff --git a/arch/sparc/kernel/ptrace_64.c b/arch/sparc/kernel/ptrace_64.c
index c13c9f2..b2019a9 100644
--- a/arch/sparc/kernel/ptrace_64.c
+++ b/arch/sparc/kernel/ptrace_64.c
@@ -1071,7 +1071,8 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
user_exit();

if (test_thread_flag(TIF_SYSCALL_TRACE))
- ret = tracehook_report_syscall_entry(regs);
+ ret = tracehook_report_syscall_entry(regs,
+ regs->u_regs[UREG_G1]);

if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->u_regs[UREG_G1]);
@@ -1099,7 +1100,7 @@ asmlinkage void syscall_trace_leave(struct pt_regs *regs)
trace_sys_exit(regs, regs->u_regs[UREG_I0]);

if (test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, 0);
+ tracehook_report_syscall_exit(regs, 0, regs->u_regs[UREG_G1]);

if (test_thread_flag(TIF_NOHZ))
user_enter();
diff --git a/arch/tile/kernel/ptrace.c b/arch/tile/kernel/ptrace.c
index de98c6d..5a22bfd 100644
--- a/arch/tile/kernel/ptrace.c
+++ b/arch/tile/kernel/ptrace.c
@@ -253,7 +253,8 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
int do_syscall_trace_enter(struct pt_regs *regs)
{
if (test_thread_flag(TIF_SYSCALL_TRACE)) {
- if (tracehook_report_syscall_entry(regs))
+ if (tracehook_report_syscall_entry(regs,
+ regs->regs[TREG_SYSCALL_NR]))
regs->regs[TREG_SYSCALL_NR] = -1;
}

@@ -281,7 +282,8 @@ void do_syscall_trace_exit(struct pt_regs *regs)
regs->regs[1] = 0;

if (test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, 0);
+ tracehook_report_syscall_exit(regs, 0,
+ regs->regs[TREG_SYSCALL_NR]);

if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
trace_sys_exit(regs, regs->regs[0]);
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index 694d551..addb5b6 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -175,7 +175,7 @@ void syscall_trace_enter(struct pt_regs *regs)
if (!test_thread_flag(TIF_SYSCALL_TRACE))
return;

- tracehook_report_syscall_entry(regs);
+ tracehook_report_syscall_entry(regs, UPT_SYSCALL_NR(&regs->regs));
}

void syscall_trace_leave(struct pt_regs *regs)
@@ -191,7 +191,7 @@ void syscall_trace_leave(struct pt_regs *regs)
if (!test_thread_flag(TIF_SYSCALL_TRACE))
return;

- tracehook_report_syscall_exit(regs, 0);
+ tracehook_report_syscall_exit(regs, 0, UPT_SYSCALL_NR(&regs->regs));
/* force do_signal() --> is_syscall() */
if (ptraced & PT_PTRACED)
set_thread_flag(TIF_SIGPENDING);
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 7461f50..f487e55 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1481,7 +1481,7 @@ long syscall_trace_enter(struct pt_regs *regs)
ret = -1L;

if ((ret || test_thread_flag(TIF_SYSCALL_TRACE)) &&
- tracehook_report_syscall_entry(regs))
+ tracehook_report_syscall_entry(regs, regs->orig_ax))
ret = -1L;

if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
@@ -1529,7 +1529,7 @@ void syscall_trace_leave(struct pt_regs *regs)
step = unlikely(test_thread_flag(TIF_SINGLESTEP)) &&
!test_thread_flag(TIF_SYSCALL_EMU);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, step);
+ tracehook_report_syscall_exit(regs, step, regs->orig_ax);

user_enter();
}
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 07d0df6..15c06e8 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -31,6 +31,8 @@
#define PT_TRACE_VFORK_DONE PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
#define PT_TRACE_EXIT PT_EVENT_FLAG(PTRACE_EVENT_EXIT)
#define PT_TRACE_SECCOMP PT_EVENT_FLAG(PTRACE_EVENT_SECCOMP)
+#define PT_TRACE_SYSCALL_ENTER PT_EVENT_FLAG(PTRACE_EVENT_SYSCALL_ENTER)
+#define PT_TRACE_SYSCALL_EXIT PT_EVENT_FLAG(PTRACE_EVENT_SYSCALL_EXIT)

#define PT_EXITKILL (PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 1e98b55..e977a71 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -53,16 +53,28 @@
struct linux_binprm;

/*
- * ptrace report for syscall entry and exit looks identical.
+ * ptrace report for syscall entry and exit.
*/
-static inline int ptrace_report_syscall(struct pt_regs *regs)
+static inline int ptrace_report_syscall(struct pt_regs *regs, int entry,
+ unsigned int sysno)
{
int ptrace = current->ptrace;
+ int is_sysenter = ptrace & PT_TRACE_SYSCALL_ENTER;
+ int is_sysexit = ptrace & PT_TRACE_SYSCALL_EXIT;
+ int is_ptsysgood = ptrace & PT_TRACESYSGOOD;

if (!(ptrace & PT_PTRACED))
return 0;

- ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));
+ if (is_sysenter || is_sysexit) {
+ if (entry && is_sysenter)
+ ptrace_event(PTRACE_EVENT_SYSCALL_ENTER, sysno);
+ else if (!entry && is_sysexit)
+ ptrace_event(PTRACE_EVENT_SYSCALL_EXIT, sysno);
+ else
+ return 0;
+ } else
+ ptrace_notify(SIGTRAP | (is_ptsysgood ? 0x80 : 0));

/*
* this isn't the same as continuing with a signal, but it will do
@@ -80,6 +92,7 @@ static inline int ptrace_report_syscall(struct pt_regs *regs)
/**
* tracehook_report_syscall_entry - task is about to attempt a system call
* @regs: user register state of current task
+ * @sysno: system call number to be reported
*
* This will be called if %TIF_SYSCALL_TRACE has been set, when the
* current task has just entered the kernel for a system call.
@@ -97,15 +110,16 @@ static inline int ptrace_report_syscall(struct pt_regs *regs)
* Called without locks, just after entering kernel mode.
*/
static inline __must_check int tracehook_report_syscall_entry(
- struct pt_regs *regs)
+ struct pt_regs *regs, unsigned int sysno)
{
- return ptrace_report_syscall(regs);
+ return ptrace_report_syscall(regs, 1, sysno);
}

/**
* tracehook_report_syscall_exit - task has just finished a system call
* @regs: user register state of current task
* @step: nonzero if simulating single-step or block-step
+ * @sysno: system call number to be reported
*
* This will be called if %TIF_SYSCALL_TRACE has been set, when the
* current task has just finished an attempted system call. Full
@@ -119,7 +133,8 @@ static inline __must_check int tracehook_report_syscall_entry(
*
* Called without locks, just before checking for pending signals.
*/
-static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
+static inline void tracehook_report_syscall_exit(struct pt_regs *regs,
+ int step, unsigned int sysno)
{
if (step) {
siginfo_t info;
@@ -128,7 +143,7 @@ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
return;
}

- ptrace_report_syscall(regs);
+ ptrace_report_syscall(regs, 0, sysno);
}

/**
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index cf1019e..012ecc9 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -68,13 +68,16 @@ struct ptrace_peeksiginfo_args {
#define PTRACE_PEEKSIGINFO_SHARED (1 << 0)

/* Wait extended result codes for the above trace options. */
-#define PTRACE_EVENT_FORK 1
-#define PTRACE_EVENT_VFORK 2
-#define PTRACE_EVENT_CLONE 3
-#define PTRACE_EVENT_EXEC 4
-#define PTRACE_EVENT_VFORK_DONE 5
-#define PTRACE_EVENT_EXIT 6
-#define PTRACE_EVENT_SECCOMP 7
+#define PTRACE_EVENT_FORK 1
+#define PTRACE_EVENT_VFORK 2
+#define PTRACE_EVENT_CLONE 3
+#define PTRACE_EVENT_EXEC 4
+#define PTRACE_EVENT_VFORK_DONE 5
+#define PTRACE_EVENT_EXIT 6
+#define PTRACE_EVENT_SECCOMP 7
+#define PTRACE_EVENT_SYSCALL_ENTER 8
+#define PTRACE_EVENT_SYSCALL_EXIT 9
+
/* Extended result codes which enabled by means other than options. */
#define PTRACE_EVENT_STOP 128

@@ -87,11 +90,13 @@ struct ptrace_peeksiginfo_args {
#define PTRACE_O_TRACEVFORKDONE (1 << PTRACE_EVENT_VFORK_DONE)
#define PTRACE_O_TRACEEXIT (1 << PTRACE_EVENT_EXIT)
#define PTRACE_O_TRACESECCOMP (1 << PTRACE_EVENT_SECCOMP)
+#define PTRACE_O_SYSCALL_ENTER (1 << PTRACE_EVENT_SYSCALL_ENTER)
+#define PTRACE_O_SYSCALL_EXIT (1 << PTRACE_EVENT_SYSCALL_EXIT)

/* eventless options */
#define PTRACE_O_EXITKILL (1 << 20)

-#define PTRACE_O_MASK (0x000000ff | PTRACE_O_EXITKILL)
+#define PTRACE_O_MASK (0x00000fff | PTRACE_O_EXITKILL)

#include <asm/ptrace.h>


2014-01-07 15:30:46

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

On 01/06, Sergio Durigan Junior wrote:
>
> This patch implements the new PTRACE_EVENT_SYSCALL_{ENTER,EXIT} events
> for ptrace. The goal is kind of obvious: it lets the tracer to request
> for notifications when a syscall is called or has returned in the
> tracee. This is very useful because currently there is no easy/direct
> way to inspect whether we are dealing with a call or a return of a
> syscall. GDB itself has an open bug about this, because it can get
> confused when the program being debugged is restarted in the middle of a
> syscall that has been caught by "catch syscall".

Yes, this was suggested many times, probably makes sense.

But I am not sure about semantics, let me add more cc's.

> The other nice thing that I have implemented is the ability to obtain
> the syscall number related to the event by using PTRACE_GET_EVENTMSG.
> This way, we don't need to inspect registers anymore when we want to
> know which syscall is responsible for this or that event.

I won't argue, but it is not clear to me if this is really useful,
given that the debugger can read the regs.

And even if we do this, I disagree with this implementation, please
see below.

> --- a/arch/alpha/kernel/ptrace.c
> +++ b/arch/alpha/kernel/ptrace.c
> @@ -317,7 +317,8 @@ asmlinkage unsigned long syscall_trace_enter(void)
> {
> unsigned long ret = 0;
> if (test_thread_flag(TIF_SYSCALL_TRACE) &&
> - tracehook_report_syscall_entry(current_pt_regs()))
> + tracehook_report_syscall_entry(current_pt_regs(),
> + current_pt_regs()->r0))
> ret = -1UL;
> return ret ?: current_pt_regs()->r0;
> }
> @@ -326,5 +327,6 @@ asmlinkage void
> syscall_trace_leave(void)
> {
> if (test_thread_flag(TIF_SYSCALL_TRACE))
> - tracehook_report_syscall_exit(current_pt_regs(), 0);
> + tracehook_report_syscall_exit(current_pt_regs(), 0,
> + current_pt_regs()->r0);
> }

And every arch/ is changed the same way. I do not think this is needed.
We already have syscall_get_nr(), this is what ptrace_report_syscall()
needs. So afaics this patch do not need to touch arch/ at all.


> +static inline int ptrace_report_syscall(struct pt_regs *regs, int entry,
> + unsigned int sysno)
> {
> int ptrace = current->ptrace;
> + int is_sysenter = ptrace & PT_TRACE_SYSCALL_ENTER;
> + int is_sysexit = ptrace & PT_TRACE_SYSCALL_EXIT;
> + int is_ptsysgood = ptrace & PT_TRACESYSGOOD;
>
> if (!(ptrace & PT_PTRACED))
> return 0;
>
> - ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));
> + if (is_sysenter || is_sysexit) {
> + if (entry && is_sysenter)
> + ptrace_event(PTRACE_EVENT_SYSCALL_ENTER, sysno);
> + else if (!entry && is_sysexit)
> + ptrace_event(PTRACE_EVENT_SYSCALL_EXIT, sysno);
> + else
> + return 0;
> + } else
> + ptrace_notify(SIGTRAP | (is_ptsysgood ? 0x80 : 0));

OK. So PTRACE_O_SYSCALL_ENTER acts like PTRACE_O_TRACESYSGOOD, you still
need ptrace(PTRACE_SYSCALL) if you want PTRACE_EVENT_SYSCALL_ENTER.

If we add the new API, perhaps we should change ptrace_resume ?
I mean,

--- x/kernel/ptrace.c
+++ x/kernel/ptrace.c
@@ -723,7 +723,9 @@ static int ptrace_resume(struct task_str
if (!valid_signal(data))
return -EIO;

- if (request == PTRACE_SYSCALL)
+ if (request == PTRACE_SYSCALL ||
+ ptrace_event_enabled(PTRACE_EVENT_SYSCALL_ENTER) ||
+ ptrace_event_enabled(PTRACE_EVENT_SYSCALL_EXIT))
set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
else
clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);


This way PTRACE_O_SYSCALL_* will work like other ptrace options which
ask to report an event.

Or. Instead, perhaps we can add a single option PTRACE_O_TRACESYSREALLYGOOD
which doesn't report the new event and simply does something like

current->ptrace_message = syscall_get_nr() | (entry << 31);
ptrace_notify(SIGTRAP | 0x80);

Finally. If we add this feature, we should probably also report
is_compat_task() somehow. Currently the debugger can't know if, say,
a 64bit tracee does int80.

OTOH, perhaps it would be better to report this via regs->flags as
(iirc) Linus suggested.

Once again, personally I am fine either way. Just I think we should
discuss every possible option.

Oleg.

2014-01-07 16:37:57

by Sergio Durigan Junior

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

On Tuesday, January 07 2014, Oleg Nesterov wrote:

> On 01/06, Sergio Durigan Junior wrote:
>>
>> This patch implements the new PTRACE_EVENT_SYSCALL_{ENTER,EXIT} events
>> for ptrace. The goal is kind of obvious: it lets the tracer to request
>> for notifications when a syscall is called or has returned in the
>> tracee. This is very useful because currently there is no easy/direct
>> way to inspect whether we are dealing with a call or a return of a
>> syscall. GDB itself has an open bug about this, because it can get
>> confused when the program being debugged is restarted in the middle of a
>> syscall that has been caught by "catch syscall".
>
> Yes, this was suggested many times, probably makes sense.
>
> But I am not sure about semantics, let me add more cc's.

Thanks for the feedback, Oleg.

>> The other nice thing that I have implemented is the ability to obtain
>> the syscall number related to the event by using PTRACE_GET_EVENTMSG.
>> This way, we don't need to inspect registers anymore when we want to
>> know which syscall is responsible for this or that event.
>
> I won't argue, but it is not clear to me if this is really useful,
> given that the debugger can read the regs.

Right. The debugger can read the regs, yes, but I still find it useful
to have a standard way to obtain the syscall number if the kernel can
easily provide it (as is the case). If we have this, it means that
"catch syscall" on GDB (for example) will work out of the box for every
architecture supported by Linux, without needing to worry about details
about register access and such.

> And even if we do this, I disagree with this implementation, please
> see below.
>
>> --- a/arch/alpha/kernel/ptrace.c
>> +++ b/arch/alpha/kernel/ptrace.c
>> @@ -317,7 +317,8 @@ asmlinkage unsigned long syscall_trace_enter(void)
>> {
>> unsigned long ret = 0;
>> if (test_thread_flag(TIF_SYSCALL_TRACE) &&
>> - tracehook_report_syscall_entry(current_pt_regs()))
>> + tracehook_report_syscall_entry(current_pt_regs(),
>> + current_pt_regs()->r0))
>> ret = -1UL;
>> return ret ?: current_pt_regs()->r0;
>> }
>> @@ -326,5 +327,6 @@ asmlinkage void
>> syscall_trace_leave(void)
>> {
>> if (test_thread_flag(TIF_SYSCALL_TRACE))
>> - tracehook_report_syscall_exit(current_pt_regs(), 0);
>> + tracehook_report_syscall_exit(current_pt_regs(), 0,
>> + current_pt_regs()->r0);
>> }
>
> And every arch/ is changed the same way. I do not think this is needed.
> We already have syscall_get_nr(), this is what ptrace_report_syscall()
> needs. So afaics this patch do not need to touch arch/ at all.

Oh, nice, I didn't know about it. I will promptly rewrite this part in
order to use syscall_get_nr(), thanks.

>> +static inline int ptrace_report_syscall(struct pt_regs *regs, int entry,
>> + unsigned int sysno)
>> {
>> int ptrace = current->ptrace;
>> + int is_sysenter = ptrace & PT_TRACE_SYSCALL_ENTER;
>> + int is_sysexit = ptrace & PT_TRACE_SYSCALL_EXIT;
>> + int is_ptsysgood = ptrace & PT_TRACESYSGOOD;
>>
>> if (!(ptrace & PT_PTRACED))
>> return 0;
>>
>> - ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));
>> + if (is_sysenter || is_sysexit) {
>> + if (entry && is_sysenter)
>> + ptrace_event(PTRACE_EVENT_SYSCALL_ENTER, sysno);
>> + else if (!entry && is_sysexit)
>> + ptrace_event(PTRACE_EVENT_SYSCALL_EXIT, sysno);
>> + else
>> + return 0;
>> + } else
>> + ptrace_notify(SIGTRAP | (is_ptsysgood ? 0x80 : 0));
>
> OK. So PTRACE_O_SYSCALL_ENTER acts like PTRACE_O_TRACESYSGOOD, you still
> need ptrace(PTRACE_SYSCALL) if you want PTRACE_EVENT_SYSCALL_ENTER.
>
> If we add the new API, perhaps we should change ptrace_resume ?
> I mean,
>
> --- x/kernel/ptrace.c
> +++ x/kernel/ptrace.c
> @@ -723,7 +723,9 @@ static int ptrace_resume(struct task_str
> if (!valid_signal(data))
> return -EIO;
>
> - if (request == PTRACE_SYSCALL)
> + if (request == PTRACE_SYSCALL ||
> + ptrace_event_enabled(PTRACE_EVENT_SYSCALL_ENTER) ||
> + ptrace_event_enabled(PTRACE_EVENT_SYSCALL_EXIT))
> set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
> else
> clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
>
>
> This way PTRACE_O_SYSCALL_* will work like other ptrace options which
> ask to report an event.

Neat. I liked this suggestion very much. In fact, I was thinking about
this yesterday, but I confess it wasn't much clear in my mind. Glad you
figured that out :-). Thanks!

> Or. Instead, perhaps we can add a single option PTRACE_O_TRACESYSREALLYGOOD
> which doesn't report the new event and simply does something like
>
> current->ptrace_message = syscall_get_nr() | (entry << 31);
> ptrace_notify(SIGTRAP | 0x80);

TBH I didn't like this suggestion, so I will implement the one above.

> Finally. If we add this feature, we should probably also report
> is_compat_task() somehow. Currently the debugger can't know if, say,
> a 64bit tracee does int80.

OK, I will look into it, have no idea how to do that. Suggestions are
welcome, of course.

> OTOH, perhaps it would be better to report this via regs->flags as
> (iirc) Linus suggested.
>
> Once again, personally I am fine either way. Just I think we should
> discuss every possible option.

Thanks a lot for the feedback. I will write a new version of the patch
and post it for appreciation later. Meanwhile, other suggestions and
discussions are welcome, of course.

--
Sergio

2014-01-07 19:12:34

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

On 01/07, Sergio Durigan Junior wrote:
>
> On Tuesday, January 07 2014, Oleg Nesterov wrote:
>
> > Finally. If we add this feature, we should probably also report
> > is_compat_task() somehow. Currently the debugger can't know if, say,
> > a 64bit tracee does int80.
>
> OK, I will look into it, have no idea how to do that. Suggestions are
> welcome, of course.

Well, you can probably encode is_compat_task() in ->ptrace_message along
with syscall_get_nr(). But I am not sure about __X32_SYSCALL_BIT in
->orig_ax. Probably we should not clear it and report both bits.

Oleg.q

2014-01-09 18:49:23

by Pedro Alves

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

On 01/07/2014 03:30 PM, Oleg Nesterov wrote:
>
> If we add the new API, perhaps we should change ptrace_resume ?
> I mean,
>
> --- x/kernel/ptrace.c
> +++ x/kernel/ptrace.c
> @@ -723,7 +723,9 @@ static int ptrace_resume(struct task_str
> if (!valid_signal(data))
> return -EIO;
>
> - if (request == PTRACE_SYSCALL)
> + if (request == PTRACE_SYSCALL ||
> + ptrace_event_enabled(PTRACE_EVENT_SYSCALL_ENTER) ||
> + ptrace_event_enabled(PTRACE_EVENT_SYSCALL_EXIT))
> set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
> else
> clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
>
>
> This way PTRACE_O_SYSCALL_* will work like other ptrace options which
> ask to report an event.

+10^6. With PTRACE_SYSCALL/sysgood, we don't have a way to trace
syscalls when single-stepping, which isn't much of a problem for
strace, but of course is for GDB. That is one of the things the
new API should definitely sort out.

--
Pedro Alves

2014-01-09 21:11:26

by Roland McGrath

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

> I won't argue, but it is not clear to me if this is really useful,
> given that the debugger can read the regs.

I do see the utility of having a consistent machine-independent way to get
the syscall number from userspace. But note that this could be probably
accomplished with just a uapi header change, providing an inline or macro
to extract the syscall number from the regset data. (It was once the case
that there were some machines where the syscall number is not in any
register and has to be extracted by decoding the instruction. I'm not sure
if that is still an issue anywhere.)

Also note that adding a separate copy of the syscall number introduces a
new wrinkle into the interface. This might be considered to be good, bad,
or indifferent, but I think it should at least be considered explicitly.
That is, at the entry stop the syscall number (when it's in a register) can
be changed via ptrace. So if the number is delivered via ptrace_message,
that's a separate copy of the original number that does not reflect any
changes made via ptrace--so it reflects what userland asked for, as opposed
to what the kernel actually acted on.

I don't have a particular opinion about which way to go with that.
I just wanted all the issues (I'm aware of) to be considered.

I absolutely concur that all new tracing features should work in the
existing style of PTRACE_EVENT_* that a PTRACE_O_* flag enables it and
then it applies for all kinds of running under ptrace. Things like
PTRACE_SYSCALL that combine what-to-report with how-to-run are terrible
and should never be done again.


Thanks,
Roland

2014-01-10 13:57:37

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

On 01/09, Pedro Alves wrote:
>
> On 01/07/2014 03:30 PM, Oleg Nesterov wrote:
> >
> > This way PTRACE_O_SYSCALL_* will work like other ptrace options which
> > ask to report an event.
>
> +10^6. With PTRACE_SYSCALL/sysgood, we don't have a way to trace
> syscalls when single-stepping, which isn't much of a problem for
> strace, but of course is for GDB. That is one of the things the
> new API should definitely sort out.

Hmm. I think this is a good point, but this needs more discussion.

So suppose that gdb does ptrace(PTRACE_SINGLESTEP) and the tracee
executes the "syscall" insn. What it should report?

syscall-entry looks obvious, PTRACE_EVENT_SYSCALL_ENTER should be
reported if PTRACE_O_SYSCALL_ENTER was set.

But what should syscall-exit do? Should it still report SIGSEGV as
it currently does, or should it report _SYSCALL_EXIT instead (if
PTRACE_O_SYSCALL_EXIT of course), or should it report both?

Oleg.

2014-01-13 13:35:52

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

On Mon, Jan 6, 2014 at 11:52 PM, Sergio Durigan Junior
<[email protected]> wrote:
> The other nice thing that I have implemented is the ability to obtain
> the syscall number related to the event by using PTRACE_GET_EVENTMSG.
> This way, we don't need to inspect registers anymore when we want to
> know which syscall is responsible for this or that event.

OTOH, by fetching registers using just one ptrace call you
get a lot more data.
So, this isn't *that* useful -- the debuggers already know how to fetch
and interpret regs - but it is also a cheap change. Why not?


> -static inline int ptrace_report_syscall(struct pt_regs *regs)
> +static inline int ptrace_report_syscall(struct pt_regs *regs, int entry,
> + unsigned int sysno)


This function looks ripe for de-inlining.


> /* Wait extended result codes for the above trace options. */
> -#define PTRACE_EVENT_FORK 1
> -#define PTRACE_EVENT_VFORK 2
> -#define PTRACE_EVENT_CLONE 3
> -#define PTRACE_EVENT_EXEC 4
> -#define PTRACE_EVENT_VFORK_DONE 5
> -#define PTRACE_EVENT_EXIT 6
> -#define PTRACE_EVENT_SECCOMP 7
> +#define PTRACE_EVENT_FORK 1
> +#define PTRACE_EVENT_VFORK 2
> +#define PTRACE_EVENT_CLONE 3
> +#define PTRACE_EVENT_EXEC 4
> +#define PTRACE_EVENT_VFORK_DONE 5
> +#define PTRACE_EVENT_EXIT 6
> +#define PTRACE_EVENT_SECCOMP 7
> +#define PTRACE_EVENT_SYSCALL_ENTER 8
> +#define PTRACE_EVENT_SYSCALL_EXIT 9
> +
> /* Extended result codes which enabled by means other than options. */
> #define PTRACE_EVENT_STOP 128
>
> @@ -87,11 +90,13 @@ struct ptrace_peeksiginfo_args {
> #define PTRACE_O_TRACEVFORKDONE (1 << PTRACE_EVENT_VFORK_DONE)
> #define PTRACE_O_TRACEEXIT (1 << PTRACE_EVENT_EXIT)
> #define PTRACE_O_TRACESECCOMP (1 << PTRACE_EVENT_SECCOMP)
> +#define PTRACE_O_SYSCALL_ENTER (1 << PTRACE_EVENT_SYSCALL_ENTER)
> +#define PTRACE_O_SYSCALL_EXIT (1 << PTRACE_EVENT_SYSCALL_EXIT)
>
> /* eventless options */
> #define PTRACE_O_EXITKILL (1 << 20)
>
> -#define PTRACE_O_MASK (0x000000ff | PTRACE_O_EXITKILL)
> +#define PTRACE_O_MASK (0x00000fff | PTRACE_O_EXITKILL)


You added just two bits, why did you extend the mask by four bits?
IOW: shouldn't it be 0x00003ff instead?

2014-01-19 02:29:42

by Sergio Durigan Junior

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

On Monday, January 13 2014, Denys Vlasenko wrote:

> On Mon, Jan 6, 2014 at 11:52 PM, Sergio Durigan Junior
> <[email protected]> wrote:
>> The other nice thing that I have implemented is the ability to obtain
>> the syscall number related to the event by using PTRACE_GET_EVENTMSG.
>> This way, we don't need to inspect registers anymore when we want to
>> know which syscall is responsible for this or that event.
>
> OTOH, by fetching registers using just one ptrace call you
> get a lot more data.
> So, this isn't *that* useful -- the debuggers already know how to fetch
> and interpret regs - but it is also a cheap change. Why not?

Thanks for the review, and sorry for the delay in answering.

I hope I have already touched this point in my other message to Oleg,
about making a single interface for all targets supported by the Linux
kernel and GDB, thus enabling them to obtain this information directly
from Linux instead of having to go and fetch the regs by themselves.

>> -static inline int ptrace_report_syscall(struct pt_regs *regs)
>> +static inline int ptrace_report_syscall(struct pt_regs *regs, int entry,
>> + unsigned int sysno)
>
>
> This function looks ripe for de-inlining.

Done.

>> /* Wait extended result codes for the above trace options. */
>> -#define PTRACE_EVENT_FORK 1
>> -#define PTRACE_EVENT_VFORK 2
>> -#define PTRACE_EVENT_CLONE 3
>> -#define PTRACE_EVENT_EXEC 4
>> -#define PTRACE_EVENT_VFORK_DONE 5
>> -#define PTRACE_EVENT_EXIT 6
>> -#define PTRACE_EVENT_SECCOMP 7
>> +#define PTRACE_EVENT_FORK 1
>> +#define PTRACE_EVENT_VFORK 2
>> +#define PTRACE_EVENT_CLONE 3
>> +#define PTRACE_EVENT_EXEC 4
>> +#define PTRACE_EVENT_VFORK_DONE 5
>> +#define PTRACE_EVENT_EXIT 6
>> +#define PTRACE_EVENT_SECCOMP 7
>> +#define PTRACE_EVENT_SYSCALL_ENTER 8
>> +#define PTRACE_EVENT_SYSCALL_EXIT 9
>> +
>> /* Extended result codes which enabled by means other than options. */
>> #define PTRACE_EVENT_STOP 128
>>
>> @@ -87,11 +90,13 @@ struct ptrace_peeksiginfo_args {
>> #define PTRACE_O_TRACEVFORKDONE (1 << PTRACE_EVENT_VFORK_DONE)
>> #define PTRACE_O_TRACEEXIT (1 << PTRACE_EVENT_EXIT)
>> #define PTRACE_O_TRACESECCOMP (1 << PTRACE_EVENT_SECCOMP)
>> +#define PTRACE_O_SYSCALL_ENTER (1 << PTRACE_EVENT_SYSCALL_ENTER)
>> +#define PTRACE_O_SYSCALL_EXIT (1 << PTRACE_EVENT_SYSCALL_EXIT)
>>
>> /* eventless options */
>> #define PTRACE_O_EXITKILL (1 << 20)
>>
>> -#define PTRACE_O_MASK (0x000000ff | PTRACE_O_EXITKILL)
>> +#define PTRACE_O_MASK (0x00000fff | PTRACE_O_EXITKILL)
>
>
> You added just two bits, why did you extend the mask by four bits?
> IOW: shouldn't it be 0x00003ff instead?

No particular reason, I just didn't give it much thinking. But thanks
for pointing that out, I've changed it.

--
Sergio

2014-01-19 02:40:22

by Sergio Durigan Junior

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

On Thursday, January 09 2014, Roland McGrath wrote:

>> I won't argue, but it is not clear to me if this is really useful,
>> given that the debugger can read the regs.
>
> I do see the utility of having a consistent machine-independent way to get
> the syscall number from userspace. But note that this could be probably
> accomplished with just a uapi header change, providing an inline or macro
> to extract the syscall number from the regset data. (It was once the case
> that there were some machines where the syscall number is not in any
> register and has to be extracted by decoding the instruction. I'm not sure
> if that is still an issue anywhere.)
>
> Also note that adding a separate copy of the syscall number introduces a
> new wrinkle into the interface. This might be considered to be good, bad,
> or indifferent, but I think it should at least be considered explicitly.
> That is, at the entry stop the syscall number (when it's in a register) can
> be changed via ptrace. So if the number is delivered via ptrace_message,
> that's a separate copy of the original number that does not reflect any
> changes made via ptrace--so it reflects what userland asked for, as opposed
> to what the kernel actually acted on.
>
> I don't have a particular opinion about which way to go with that.
> I just wanted all the issues (I'm aware of) to be considered.

Hm, thanks for your insights.

I don't really have a strong opinion here. I could say that I think
ptrace should report the syscall that was originally called (and not the
one that will effectively be called), but maybe that would sound like I
am defending what I current have, heh...

Anyway, one question that I'm having now is what currently happens.
Guess I will take a look.

Having said that, if adding the syscall number information to
ptrace_message is too much controversial, I guess I will abandon this
idea for now and just implement the notifications.

--
Sergio

2014-01-19 02:49:08

by Sergio Durigan Junior

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

On Friday, January 10 2014, Oleg Nesterov wrote:

> So suppose that gdb does ptrace(PTRACE_SINGLESTEP) and the tracee
> executes the "syscall" insn. What it should report?
[...]
> But what should syscall-exit do? Should it still report SIGSEGV as
> it currently does, or should it report _SYSCALL_EXIT instead (if
> PTRACE_O_SYSCALL_EXIT of course), or should it report both?

Both only if _SYSCALL_EXIT is set. Otherwise, stick to the current
behavior, I guess. Isn't it what my current patch does, by the way? I
didn't test this scenario so I'm just guessing here...

--
Sergio

2014-01-19 15:29:08

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

On 01/19, Sergio Durigan Junior wrote:
>
> On Friday, January 10 2014, Oleg Nesterov wrote:
>
> > So suppose that gdb does ptrace(PTRACE_SINGLESTEP) and the tracee
> > executes the "syscall" insn. What it should report?
> [...]
> > But what should syscall-exit do? Should it still report SIGSEGV as
> > it currently does, or should it report _SYSCALL_EXIT instead (if
> > PTRACE_O_SYSCALL_EXIT of course), or should it report both?
>
> Both only if _SYSCALL_EXIT is set. Otherwise, stick to the current
> behavior, I guess.

OK, both. In which order? Probably _EXIT first. But this looks a bit
strange. Suppose that the tracee reports _EXIT, then debugger does
ptrace(PTRACE_CONT), should the tracee report SIGTRAP?

SIGTRAP before _EXIT looks a bit strange too... Single-step trap should
be reported after insn, but we are still in syscall.

So perhaps _EXIT should win and do not report the step?

> Isn't it what my current patch does, by the way?

I forgot how this patch looks so I can be easily wrong, but iirc no.
Note that tracehook_report_syscall_exit() doesn't even call
ptrace_report_syscall() if step == T.

Btw, if you send v2, please CC Michael Kerrisk <[email protected]>.

Oleg.