2008-08-07 22:27:34

by Eduardo Habkost

[permalink] [raw]
Subject: 'strace -f' regression, bisected to tracehook


Hi,

I have just hit a problem with strace when following forks, using
recent trees. I have bisected the problem to commit 09a05394 (tracehook:
clone).

'strace -f' is not being able to trace child processes just after fork,
and traces them only after the child has run for some time. I am getting
the following output, when tracing a test program whose child exits just
after returning from fork:

clone(Process 399 attached (waiting for parent)
* resume: ptrace(PTRACE_SYSCALL, ...): No such process
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8df681a780) = 399
[pid 398] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 398] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[...]

What I expect to get (and was getting on 2.6.26 and before the bisected
commit) is:

clone(Process 391 attached (waiting for parent)
* Process 391 resumed (parent 390 ready)
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa84cf3c780) = 391
* [pid 391] exit_group(1) = ?
* Process 391 detached
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[...]


strace uses a trick to set the CLONE_PTRACE flag on clone() syscalls
made by the traced process. I don't know if the trick used by strace is
broken, or the handling of CLONE_PTRACE itself is broken.


The bisected commit is this:

commit 09a05394fe2448a4139b014936330af23fa7ec83
Author: Roland McGrath <[email protected]>
Date: Fri Jul 25 19:45:47 2008 -0700

tracehook: clone

This moves all the ptrace initialization and tracing logic for task
creation into tracehook.h and ptrace.h inlines. It reorganizes the code
slightly, but should not change any behavior.

There are four tracehook entry points, at each important stage of task
creation. This keeps the interface from the core fork.c code fairly
clean, while supporting the complex setup required for ptrace or something
like it.

Signed-off-by: Roland McGrath <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index c74abfc..dae6d85 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -154,6 +154,28 @@ static inline int ptrace_event(int mask, int event, unsigned long message)
return 1;
}

+/**
+ * ptrace_init_task - initialize ptrace state for a new child
+ * @child: new child task
+ * @ptrace: true if child should be ptrace'd by parent's tracer
+ *
+ * This is called immediately after adding @child to its parent's children
+ * list. @ptrace is false in the normal case, and true to ptrace @child.
+ *
+ * Called with current's siglock and write_lock_irq(&tasklist_lock) held.
+ */
+static inline void ptrace_init_task(struct task_struct *child, bool ptrace)
+{
+ INIT_LIST_HEAD(&child->ptrace_entry);
+ INIT_LIST_HEAD(&child->ptraced);
+ child->parent = child->real_parent;
+ child->ptrace = 0;
+ if (unlikely(ptrace)) {
+ child->ptrace = current->ptrace;
+ __ptrace_link(child, current->parent);
+ }
+}
+
#ifndef force_successful_syscall_return
/*
* System call handlers that, upon successful completion, need to return a
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 967ab47..3ebc58b 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -110,4 +110,104 @@ static inline void tracehook_report_exit(long *exit_code)
ptrace_event(PT_TRACE_EXIT, PTRACE_EVENT_EXIT, *exit_code);
}

+/**
+ * tracehook_prepare_clone - prepare for new child to be cloned
+ * @clone_flags: %CLONE_* flags from clone/fork/vfork system call
+ *
+ * This is called before a new user task is to be cloned.
+ * Its return value will be passed to tracehook_finish_clone().
+ *
+ * Called with no locks held.
+ */
+static inline int tracehook_prepare_clone(unsigned clone_flags)
+{
+ if (clone_flags & CLONE_UNTRACED)
+ return 0;
+
+ if (clone_flags & CLONE_VFORK) {
+ if (current->ptrace & PT_TRACE_VFORK)
+ return PTRACE_EVENT_VFORK;
+ } else if ((clone_flags & CSIGNAL) != SIGCHLD) {
+ if (current->ptrace & PT_TRACE_CLONE)
+ return PTRACE_EVENT_CLONE;
+ } else if (current->ptrace & PT_TRACE_FORK)
+ return PTRACE_EVENT_FORK;
+
+ return 0;
+}
+
+/**
+ * tracehook_finish_clone - new child created and being attached
+ * @child: new child task
+ * @clone_flags: %CLONE_* flags from clone/fork/vfork system call
+ * @trace: return value from tracehook_clone_prepare()
+ *
+ * This is called immediately after adding @child to its parent's children list.
+ * The @trace value is that returned by tracehook_prepare_clone().
+ *
+ * Called with current's siglock and write_lock_irq(&tasklist_lock) held.
+ */
+static inline void tracehook_finish_clone(struct task_struct *child,
+ unsigned long clone_flags, int trace)
+{
+ ptrace_init_task(child, (clone_flags & CLONE_PTRACE) || trace);
+}
+
+/**
+ * tracehook_report_clone - in parent, new child is about to start running
+ * @trace: return value from tracehook_clone_prepare()
+ * @regs: parent's user register state
+ * @clone_flags: flags from parent's system call
+ * @pid: new child's PID in the parent's namespace
+ * @child: new child task
+ *
+ * Called after a child is set up, but before it has been started running.
+ * The @trace value is that returned by tracehook_clone_prepare().
+ * This is not a good place to block, because the child has not started yet.
+ * Suspend the child here if desired, and block in tracehook_clone_complete().
+ * This must prevent the child from self-reaping if tracehook_clone_complete()
+ * uses the @child pointer; otherwise it might have died and been released by
+ * the time tracehook_report_clone_complete() is called.
+ *
+ * Called with no locks held, but the child cannot run until this returns.
+ */
+static inline void tracehook_report_clone(int trace, struct pt_regs *regs,
+ unsigned long clone_flags,
+ pid_t pid, struct task_struct *child)
+{
+ if (unlikely(trace)) {
+ /*
+ * The child starts up with an immediate SIGSTOP.
+ */
+ sigaddset(&child->pending.signal, SIGSTOP);
+ set_tsk_thread_flag(child, TIF_SIGPENDING);
+ }
+}
+
+/**
+ * tracehook_report_clone_complete - new child is running
+ * @trace: return value from tracehook_clone_prepare()
+ * @regs: parent's user register state
+ * @clone_flags: flags from parent's system call
+ * @pid: new child's PID in the parent's namespace
+ * @child: child task, already running
+ *
+ * This is called just after the child has started running. This is
+ * just before the clone/fork syscall returns, or blocks for vfork
+ * child completion if @clone_flags has the %CLONE_VFORK bit set.
+ * The @child pointer may be invalid if a self-reaping child died and
+ * tracehook_report_clone() took no action to prevent it from self-reaping.
+ *
+ * Called with no locks held.
+ */
+static inline void tracehook_report_clone_complete(int trace,
+ struct pt_regs *regs,
+ unsigned long clone_flags,
+ pid_t pid,
+ struct task_struct *child)
+{
+ if (unlikely(trace))
+ ptrace_event(0, trace, pid);
+}
+
#endif /* <linux/tracehook.h> */
diff --git a/kernel/fork.c b/kernel/fork.c
index 80e83e4..b42f8ed 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -37,6 +37,7 @@
#include <linux/swap.h>
#include <linux/syscalls.h>
#include <linux/jiffies.h>
+#include <linux/tracehook.h>
#include <linux/futex.h>
#include <linux/task_io_accounting_ops.h>
#include <linux/rcupdate.h>
@@ -865,8 +866,7 @@ static void copy_flags(unsigned long clone_flags, struct task_struct *p)

new_flags &= ~PF_SUPERPRIV;
new_flags |= PF_FORKNOEXEC;
- if (!(clone_flags & CLONE_PTRACE))
- p->ptrace = 0;
+ new_flags |= PF_STARTING;
p->flags = new_flags;
clear_freeze_flag(p);
}
@@ -907,7 +907,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
struct pt_regs *regs,
unsigned long stack_size,
int __user *child_tidptr,
- struct pid *pid)
+ struct pid *pid,
+ int trace)
{
int retval;
struct task_struct *p;
@@ -1163,8 +1164,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
*/
p->group_leader = p;
INIT_LIST_HEAD(&p->thread_group);
- INIT_LIST_HEAD(&p->ptrace_entry);
- INIT_LIST_HEAD(&p->ptraced);

/* Now that the task is set up, run cgroup callbacks if
* necessary. We need to run them before the task is visible
@@ -1195,7 +1194,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->real_parent = current->real_parent;
else
p->real_parent = current;
- p->parent = p->real_parent;

spin_lock(&current->sighand->siglock);

@@ -1237,8 +1235,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,

if (likely(p->pid)) {
list_add_tail(&p->sibling, &p->real_parent->children);
- if (unlikely(p->ptrace & PT_PTRACED))
- __ptrace_link(p, current->parent);
+ tracehook_finish_clone(p, clone_flags, trace);

if (thread_group_leader(p)) {
if (clone_flags & CLONE_NEWPID)
@@ -1323,29 +1320,13 @@ struct task_struct * __cpuinit fork_idle(int cpu)
struct pt_regs regs;

task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL,
- &init_struct_pid);
+ &init_struct_pid, 0);
if (!IS_ERR(task))
init_idle(task, cpu);

return task;
}

-static int fork_traceflag(unsigned clone_flags)
-{
- if (clone_flags & CLONE_UNTRACED)
- return 0;
- else if (clone_flags & CLONE_VFORK) {
- if (current->ptrace & PT_TRACE_VFORK)
- return PTRACE_EVENT_VFORK;
- } else if ((clone_flags & CSIGNAL) != SIGCHLD) {
- if (current->ptrace & PT_TRACE_CLONE)
- return PTRACE_EVENT_CLONE;
- } else if (current->ptrace & PT_TRACE_FORK)
- return PTRACE_EVENT_FORK;
-
- return 0;
-}
-
/*
* Ok, this is the main fork-routine.
*
@@ -1380,14 +1361,14 @@ long do_fork(unsigned long clone_flags,
}
}

- if (unlikely(current->ptrace)) {
- trace = fork_traceflag (clone_flags);
- if (trace)
- clone_flags |= CLONE_PTRACE;
- }
+ /*
+ * When called from kernel_thread, don't do user tracing stuff.
+ */
+ if (likely(user_mode(regs)))
+ trace = tracehook_prepare_clone(clone_flags);

p = copy_process(clone_flags, stack_start, regs, stack_size,
- child_tidptr, NULL);
+ child_tidptr, NULL, trace);
/*
* Do this prior waking up the new thread - the thread pointer
* might get invalid after that point, if the thread exits quickly.
@@ -1405,24 +1386,30 @@ long do_fork(unsigned long clone_flags,
init_completion(&vfork);
}

- if ((p->ptrace & PT_PTRACED) || (clone_flags & CLONE_STOPPED)) {
+ tracehook_report_clone(trace, regs, clone_flags, nr, p);
+
+ /*
+ * We set PF_STARTING at creation in case tracing wants to
+ * use this to distinguish a fully live task from one that
+ * hasn't gotten to tracehook_report_clone() yet. Now we
+ * clear it and set the child going.
+ */
+ p->flags &= ~PF_STARTING;
+
+ if (unlikely(clone_flags & CLONE_STOPPED)) {
/*
* We'll start up with an immediate SIGSTOP.
*/
sigaddset(&p->pending.signal, SIGSTOP);
set_tsk_thread_flag(p, TIF_SIGPENDING);
- }
-
- if (!(clone_flags & CLONE_STOPPED))
- wake_up_new_task(p, clone_flags);
- else
__set_task_state(p, TASK_STOPPED);
-
- if (unlikely (trace)) {
- current->ptrace_message = nr;
- ptrace_notify ((trace << 8) | SIGTRAP);
+ } else {
+ wake_up_new_task(p, clone_flags);
}

+ tracehook_report_clone_complete(trace, regs,
+ clone_flags, nr, p);
+
if (clone_flags & CLONE_VFORK) {
freezer_do_not_count();
wait_for_completion(&vfork);
--
Eduardo


2008-08-07 23:38:45

by Eduardo Habkost

[permalink] [raw]
Subject: CLONE_PTRACE Oops (was Re: 'strace -f' regression, bisected to tracehook)

On Thu, Aug 07, 2008 at 07:24:34PM -0300, Eduardo Habkost wrote:
>
> Hi,
>
> I have just hit a problem with strace when following forks, using
> recent trees. I have bisected the problem to commit 09a05394 (tracehook:
> clone).
>
> 'strace -f' is not being able to trace child processes just after fork,
> and traces them only after the child has run for some time. I am getting
> the following output, when tracing a test program whose child exits just
> after returning from fork:
>
> clone(Process 399 attached (waiting for parent)
> * resume: ptrace(PTRACE_SYSCALL, ...): No such process
> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8df681a780) = 399
> [pid 398] --- SIGCHLD (Child exited) @ 0 (0) ---
> [pid 398] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> [...]
>
> What I expect to get (and was getting on 2.6.26 and before the bisected
> commit) is:
>
> clone(Process 391 attached (waiting for parent)
> * Process 391 resumed (parent 390 ready)
> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa84cf3c780) = 391
> * [pid 391] exit_group(1) = ?
> * Process 391 detached
> --- SIGCHLD (Child exited) @ 0 (0) ---
> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> [...]
>
>
> strace uses a trick to set the CLONE_PTRACE flag on clone() syscalls
> made by the traced process. I don't know if the trick used by strace is
> broken, or the handling of CLONE_PTRACE itself is broken.

While trying to investigate this, I have hit a BUG_ON() that can be
triggered by user-space code.

Steps to reproduce:

Compile the C program below. It will call clone() with the CLONE_PTRACE
flag set. Run it from bash (_not_ under strace).

====================
#include <sched.h>
#include <stdlib.h>

int e(void *p)
{
exit(1);
}

char stack[4096*2];

int main()
{
int r = clone(e, stack+4096, CLONE_PTRACE, 0);
if (r < 0) {
perror("clone");
return 1;
}
return 0;
}
====================

When running the program, bash hangs on a wait4() loop. Probably because
it is getting notified of the termination of the CLONE_PTRACE child but
doesn't know anything about it.

Send SIGTERM to bash. It won't have any effect.

Send SIGKILL to bash. It will trigger the BUG_ON(!child->ptrace)
at __ptrace_unlink():

------------[ cut here ]------------
kernel BUG at kernel/ptrace.c:69!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1784, comm: bash Not tainted 2.6.26-kvm #47
RIP: 0010:[<ffffffff802393d0>] [<ffffffff802393d0>] __ptrace_unlink+0xa/0x5b
RSP: 0018:ffff88007e9dbc88 EFLAGS: 00010046
RAX: ffff88007f900328 RBX: ffff88007e9dbcc8 RCX: ffffffff8066a5a0
RDX: ffff88007ea982d8 RSI: ffff88007e9dbc58 RDI: ffff88007f900080
RBP: ffff88007e9dbc88 R08: ffffffff80681880 R09: ffffffff806817e0
R10: ffff88007e9dbc58 R11: 0000000000000282 R12: ffff88007ea98040
R13: ffff88007f900080 R14: ffff88007f9ad440 R15: 00000000ffffffff
FS: 0000000000000000(0000) GS:ffffffff806a5a80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000483400 CR3: 0000000000201000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 1784, threadinfo ffff88007e9da000, task ffff88007ea98040)
Stack: ffff88007e9dbd08 ffffffff80234cad 0000000100000087 ffff88007f9ad438
ffff88007ea982d8 ffff88007f900328 ffff88007f9ad450 ffff88007ea98030
ffff88007e9dbcc8 ffff88007e9dbcc8 ffff88007e9dbd18 ffff88007f9ad440
Call Trace:
[<ffffffff80234cad>] do_exit+0x34c/0x7f3
[<ffffffff802351d1>] do_group_exit+0x7d/0xaa
[<ffffffff8023de2c>] get_signal_to_deliver+0x31a/0x342
[<ffffffff8020b4d5>] ? sysret_signal+0x3d/0x67
[<ffffffff8020a68e>] do_notify_resume+0x7b/0x89f
[<ffffffff80209b47>] ? __switch_to+0x1b6/0x3b2
[<ffffffff80225e69>] ? set_next_entity+0x62/0xb2
[<ffffffff804ef652>] ? thread_return+0x3d/0xc5
[<ffffffff8020b4d5>] ? sysret_signal+0x3d/0x67
[<ffffffff8020b877>] ptregscall_common+0x67/0xb0


Code: 48 89 df e8 2b 2e 00 00 48 8b bb 68 05 00 00 48 81 c7 08 08 00 00 e8 2e 0f fe ff 90 41 5b 5b c9 c3 83 7f 18 00 55 48 89 e5 75 04 <0f> 0b eb fe 48 8b 87 60 02 00 00 48 8b 97 a8 02 00 00 48 8d 8f
RIP [<ffffffff802393d0>] __ptrace_unlink+0xa/0x5b
RSP <ffff88007e9dbc88>
---[ end trace 9740fb23e0450ea6 ]---


The problem was reproduced on commit 09a05394, and not reproduced on
the commit immediately before it.


>
>
> The bisected commit is this:
>
> commit 09a05394fe2448a4139b014936330af23fa7ec83
> Author: Roland McGrath <[email protected]>
> Date: Fri Jul 25 19:45:47 2008 -0700
>
> tracehook: clone
>
> This moves all the ptrace initialization and tracing logic for task
> creation into tracehook.h and ptrace.h inlines. It reorganizes the code
> slightly, but should not change any behavior.
>
> There are four tracehook entry points, at each important stage of task
> creation. This keeps the interface from the core fork.c code fairly
> clean, while supporting the complex setup required for ptrace or something
> like it.
>
> Signed-off-by: Roland McGrath <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Reviewed-by: Ingo Molnar <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>

--
Eduardo

2008-08-08 00:38:34

by Roland McGrath

[permalink] [raw]
Subject: [PATCH] tracehook: fix CLONE_PTRACE

The following changes since commit 685d87f7ccc649ab92b55e18e507a65d0e694eb9:
Linus Torvalds (1):
Revert "pcm_native.c: remove unused label"

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git tracehook

Roland McGrath (1):
tracehook: fix CLONE_PTRACE

include/linux/ptrace.h | 2 +-
include/linux/tracehook.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

Thanks,
Roland

---
[PATCH] tracehook: fix CLONE_PTRACE

In the change in commit 09a05394fe2448a4139b014936330af23fa7ec83, I
overlooked two nits in the logic and this broke using CLONE_PTRACE
when PTRACE_O_TRACE* are not being used.

A parent that is itself traced at all but not using PTRACE_O_TRACE*,
using CLONE_PTRACE would have its new child fail to be traced.

A parent that is not itself traced at all that uses CLONE_PTRACE
(which should be a no-op in this case) would confuse the bookkeeping
and lead to a crash at exit time.

This restores the missing checks and fixes both failure modes.

Reported-by: Eduardo Habkost <[email protected]>
Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/ptrace.h | 2 +-
include/linux/tracehook.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index fd31756..ea7416c 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -172,7 +172,7 @@ static inline void ptrace_init_task(struct task_struct *child, bool ptrace)
child->ptrace = 0;
if (unlikely(ptrace)) {
child->ptrace = current->ptrace;
- __ptrace_link(child, current->parent);
+ ptrace_link(child, current->parent);
}
}

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index ab3ef7a..b48d819 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -280,7 +280,7 @@ static inline void tracehook_report_clone(int trace, struct pt_regs *regs,
unsigned long clone_flags,
pid_t pid, struct task_struct *child)
{
- if (unlikely(trace)) {
+ if (unlikely(trace) || unlikely(clone_flags & CLONE_PTRACE)) {
/*
* The child starts up with an immediate SIGSTOP.
*/