2011-06-17 14:53:49

by Tejun Heo

[permalink] [raw]
Subject: [PATCHSET] ptrace: kill most tracehooks

Hello,

At this point, tracehooks aren't useful to mainline kernel and mostly
just add an extra layer of obfuscation. Although they have comments,
without actual in-kernel users, it is difficult to tell what are their
assumptions and they're actually trying to achieve. To mainline
kernel, they just aren't worth keeping around.

This patchset kills most tracehooks which aren't used from arch codes.
The remaining ones will be dealt with with future changes.

0001-ptrace-kill-task_ptrace.patch
0002-ptrace-introduce-ptrace_event_enabled-and-simplify-p.patch
0003-ptrace-move-SIGTRAP-on-exec-2-logic-to-ptrace_event.patch
0004-ptrace-kill-trivial-tracehooks.patch
0005-ptrace-kill-clone-exec-tracehooks.patch
0006-ptrace-kill-detah-tracehooks.patch
0007-ptrace-s-tracehook_tracer_task-ptrace_parent.patch

Most conversions are straightforward. The only tricky one is 0006
which reimplements the decision logic.

This patch is on top of Oleg's ptrace branch[1] - 544b2c91a9 (ptrace:
implement PTRACE_LISTEN), and available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-tracehook

diffstat follows.

arch/s390/kernel/traps.c | 4
fs/exec.c | 9 -
fs/proc/array.c | 2
fs/proc/base.c | 2
include/linux/ptrace.h | 71 ++++++---
include/linux/tracehook.h | 333 ---------------------------------------------
kernel/exit.c | 43 +++--
kernel/fork.c | 43 ++++-
kernel/signal.c | 22 +-
mm/nommu.c | 3
mm/oom_kill.c | 3
security/apparmor/domain.c | 2
security/selinux/hooks.c | 4
13 files changed, 136 insertions(+), 405 deletions(-)

--
tejun

[1] git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git ptrace


2011-06-17 14:53:46

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 1/7] ptrace: kill task_ptrace()

task_ptrace(task) simply dereferences task->ptrace and isn't even used
consistently only adding confusion. Kill it and directly access
->ptrace instead.

This doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/ptrace.h | 11 -----------
include/linux/tracehook.h | 16 ++++++++--------
kernel/exit.c | 8 ++++----
kernel/signal.c | 14 +++++++-------
mm/oom_kill.c | 3 +--
5 files changed, 20 insertions(+), 32 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4f224f1..3ff20b3 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -146,17 +146,6 @@ int generic_ptrace_pokedata(struct task_struct *tsk, unsigned long addr,
unsigned long data);

/**
- * task_ptrace - return %PT_* flags that apply to a task
- * @task: pointer to &task_struct in question
- *
- * Returns the %PT_* flags that apply to @task.
- */
-static inline int task_ptrace(struct task_struct *task)
-{
- return task->ptrace;
-}
-
-/**
* ptrace_event - possibly stop for a ptrace event notification
* @mask: %PT_* bit to check in @current->ptrace
* @event: %PTRACE_EVENT_* value to report if @mask is set
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 15745cd..a3e8387 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -63,7 +63,7 @@ struct linux_binprm;
*/
static inline int tracehook_expect_breakpoints(struct task_struct *task)
{
- return (task_ptrace(task) & PT_PTRACED) != 0;
+ return (task->ptrace & PT_PTRACED) != 0;
}

/*
@@ -71,7 +71,7 @@ static inline int tracehook_expect_breakpoints(struct task_struct *task)
*/
static inline void ptrace_report_syscall(struct pt_regs *regs)
{
- int ptrace = task_ptrace(current);
+ int ptrace = current->ptrace;

if (!(ptrace & PT_PTRACED))
return;
@@ -155,7 +155,7 @@ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
static inline int tracehook_unsafe_exec(struct task_struct *task)
{
int unsafe = 0;
- int ptrace = task_ptrace(task);
+ int ptrace = task->ptrace;
if (ptrace & PT_PTRACED) {
if (ptrace & PT_PTRACE_CAP)
unsafe |= LSM_UNSAFE_PTRACE_CAP;
@@ -178,7 +178,7 @@ static inline int tracehook_unsafe_exec(struct task_struct *task)
*/
static inline struct task_struct *tracehook_tracer_task(struct task_struct *tsk)
{
- if (task_ptrace(tsk) & PT_PTRACED)
+ if (tsk->ptrace & PT_PTRACED)
return rcu_dereference(tsk->parent);
return NULL;
}
@@ -202,7 +202,7 @@ static inline void tracehook_report_exec(struct linux_binfmt *fmt,
struct pt_regs *regs)
{
if (!ptrace_event(PT_TRACE_EXEC, PTRACE_EVENT_EXEC, 0) &&
- unlikely(task_ptrace(current) & PT_PTRACED))
+ unlikely(current->ptrace & PT_PTRACED))
send_sig(SIGTRAP, current, 0);
}

@@ -285,7 +285,7 @@ static inline void tracehook_report_clone(struct pt_regs *regs,
unsigned long clone_flags,
pid_t pid, struct task_struct *child)
{
- if (unlikely(task_ptrace(child))) {
+ if (unlikely(child->ptrace)) {
/*
* It doesn't matter who attached/attaching to this
* task, the pending SIGSTOP is right in any case.
@@ -403,7 +403,7 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info,
static inline int tracehook_consider_ignored_signal(struct task_struct *task,
int sig)
{
- return (task_ptrace(task) & PT_PTRACED) != 0;
+ return (task->ptrace & PT_PTRACED) != 0;
}

/**
@@ -422,7 +422,7 @@ static inline int tracehook_consider_ignored_signal(struct task_struct *task,
static inline int tracehook_consider_fatal_signal(struct task_struct *task,
int sig)
{
- return (task_ptrace(task) & PT_PTRACED) != 0;
+ return (task->ptrace & PT_PTRACED) != 0;
}

#define DEATH_REAP -1
diff --git a/kernel/exit.c b/kernel/exit.c
index 289f59d..e5cc056 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -765,7 +765,7 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
p->exit_signal = SIGCHLD;

/* If it has exited notify the new parent about this child's death. */
- if (!task_ptrace(p) &&
+ if (!p->ptrace &&
p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) {
do_notify_parent(p, p->exit_signal);
if (task_detached(p)) {
@@ -795,7 +795,7 @@ static void forget_original_parent(struct task_struct *father)
do {
t->real_parent = reaper;
if (t->parent == father) {
- BUG_ON(task_ptrace(t));
+ BUG_ON(t->ptrace);
t->parent = t->real_parent;
}
if (t->pdeath_signal)
@@ -1565,7 +1565,7 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
* Notification and reaping will be cascaded to the real
* parent when the ptracer detaches.
*/
- if (likely(!ptrace) && unlikely(task_ptrace(p))) {
+ if (likely(!ptrace) && unlikely(p->ptrace)) {
/* it will become visible, clear notask_error */
wo->notask_error = 0;
return 0;
@@ -1608,7 +1608,7 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
* own children, it should create a separate process which
* takes the role of real parent.
*/
- if (likely(!ptrace) && task_ptrace(p) &&
+ if (likely(!ptrace) && p->ptrace &&
same_thread_group(p->parent, p->real_parent))
return 0;

diff --git a/kernel/signal.c b/kernel/signal.c
index 97e575a..0f33708 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1592,7 +1592,7 @@ int do_notify_parent(struct task_struct *tsk, int sig)
/* do_notify_parent_cldstop should have been called instead. */
BUG_ON(task_is_stopped_or_traced(tsk));

- BUG_ON(!task_ptrace(tsk) &&
+ BUG_ON(!tsk->ptrace &&
(tsk->group_leader != tsk || !thread_group_empty(tsk)));

info.si_signo = sig;
@@ -1631,7 +1631,7 @@ int do_notify_parent(struct task_struct *tsk, int sig)

psig = tsk->parent->sighand;
spin_lock_irqsave(&psig->siglock, flags);
- if (!task_ptrace(tsk) && sig == SIGCHLD &&
+ if (!tsk->ptrace && sig == SIGCHLD &&
(psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
(psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
/*
@@ -1731,7 +1731,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,

static inline int may_ptrace_stop(void)
{
- if (!likely(task_ptrace(current)))
+ if (!likely(current->ptrace))
return 0;
/*
* Are we in the middle of do_coredump?
@@ -1989,7 +1989,7 @@ static bool do_signal_stop(int signr)
if (!(sig->flags & SIGNAL_STOP_STOPPED))
sig->group_exit_code = signr;
else
- WARN_ON_ONCE(!task_ptrace(current));
+ WARN_ON_ONCE(!current->ptrace);

sig->group_stop_count = 0;

@@ -2014,7 +2014,7 @@ static bool do_signal_stop(int signr)
}
}

- if (likely(!task_ptrace(current))) {
+ if (likely(!current->ptrace)) {
int notify = 0;

/*
@@ -2093,7 +2093,7 @@ static void do_jobctl_trap(void)
static int ptrace_signal(int signr, siginfo_t *info,
struct pt_regs *regs, void *cookie)
{
- if (!task_ptrace(current))
+ if (!current->ptrace)
return signr;

ptrace_signal_deliver(regs, cookie);
@@ -2179,7 +2179,7 @@ relock:
do_notify_parent_cldstop(current, false, why);

leader = current->group_leader;
- if (task_ptrace(leader) && !real_parent_is_ptracer(leader))
+ if (leader->ptrace && !real_parent_is_ptracer(leader))
do_notify_parent_cldstop(leader, true, why);

read_unlock(&tasklist_lock);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index e4b0991..b0be989 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -339,8 +339,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
* then wait for it to finish before killing
* some other task unnecessarily.
*/
- if (!(task_ptrace(p->group_leader) &
- PT_TRACE_EXIT))
+ if (!(p->group_leader->ptrace & PT_TRACE_EXIT))
return ERR_PTR(-1UL);
}
}
--
1.7.5.4

2011-06-17 14:51:54

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 2/7] ptrace: introduce ptrace_event_enabled() and simplify ptrace_event() and tracehook_prepare_clone()

This patch implements ptrace_event_enabled() which tests whether a
given PTRACE_EVENT_* is enabled and use it to simplify ptrace_event()
and tracehook_prepare_clone().

PT_EVENT_FLAG() macro is added which calculates PT_TRACE_* flag from
PTRACE_EVENT_*. This is used to define PT_TRACE_* flags and by
ptrace_event_enabled() to find the matching flag.

This is used to make ptrace_event() and tracehook_prepare_clone()
simpler.

* ptrace_event() callers were responsible for providing mask to test
whether the event was enabled. This patch implements
ptrace_event_enabled() and make ptrace_event() drop @mask and
determine whether the event is enabled from @event. Note that
@event is constant and this conversion doesn't add runtime overhead.

All conversions except tracehook_report_clone_complete() are
trivial. tracehook_report_clone_complete() used to use 0 for @mask
(always enabled) but now tests whether the specified event is
enabled. This doesn't cause any behavior difference as it's
guaranteed that the event specified by @trace is enabled.

* tracehook_prepare_clone() now only determines which event is
applicable and use ptrace_event_enabled() for enable test.

This doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/ptrace.h | 46 +++++++++++++++++++++++++++++++-------------
include/linux/tracehook.h | 26 ++++++++++++------------
2 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 3ff20b3..18feac6 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -90,12 +90,17 @@
#define PT_DTRACE 0x00000002 /* delayed trace (used on m68k, i386) */
#define PT_TRACESYSGOOD 0x00000004
#define PT_PTRACE_CAP 0x00000008 /* ptracer can follow suid-exec */
-#define PT_TRACE_FORK 0x00000010
-#define PT_TRACE_VFORK 0x00000020
-#define PT_TRACE_CLONE 0x00000040
-#define PT_TRACE_EXEC 0x00000080
-#define PT_TRACE_VFORK_DONE 0x00000100
-#define PT_TRACE_EXIT 0x00000200
+
+/* PT_TRACE_* event enable flags */
+#define PT_EVENT_FLAG_SHIFT 4
+#define PT_EVENT_FLAG(event) (1 << (PT_EVENT_FLAG_SHIFT + (event) - 1))
+
+#define PT_TRACE_FORK PT_EVENT_FLAG(PTRACE_EVENT_FORK)
+#define PT_TRACE_VFORK PT_EVENT_FLAG(PTRACE_EVENT_VFORK)
+#define PT_TRACE_CLONE PT_EVENT_FLAG(PTRACE_EVENT_CLONE)
+#define PT_TRACE_EXEC PT_EVENT_FLAG(PTRACE_EVENT_EXEC)
+#define PT_TRACE_VFORK_DONE PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
+#define PT_TRACE_EXIT PT_EVENT_FLAG(PTRACE_EVENT_EXIT)

#define PT_TRACE_MASK 0x000003f4

@@ -146,25 +151,38 @@ int generic_ptrace_pokedata(struct task_struct *tsk, unsigned long addr,
unsigned long data);

/**
+ * ptrace_event_enabled - test whether a ptrace event is enabled
+ * @task: ptracee of interest
+ * @event: %PTRACE_EVENT_* to test
+ *
+ * Test whether @event is enabled for ptracee @task.
+ *
+ * Returns %true if @event is enabled, %false otherwise.
+ */
+static inline bool ptrace_event_enabled(struct task_struct *task, int event)
+{
+ return task->ptrace & PT_EVENT_FLAG(event);
+}
+
+/**
* ptrace_event - possibly stop for a ptrace event notification
- * @mask: %PT_* bit to check in @current->ptrace
- * @event: %PTRACE_EVENT_* value to report if @mask is set
+ * @event: %PTRACE_EVENT_* value to report
* @message: value for %PTRACE_GETEVENTMSG to return
*
- * This checks the @mask bit to see if ptrace wants stops for this event.
- * If so we stop, reporting @event and @message to the ptrace parent.
+ * Check whether @event is enabled and, if so, report @event and @message
+ * to the ptrace parent.
*
* Returns nonzero if we did a ptrace notification, zero if not.
*
* Called without locks.
*/
-static inline int ptrace_event(int mask, int event, unsigned long message)
+static inline int ptrace_event(int event, unsigned long message)
{
- if (mask && likely(!(current->ptrace & mask)))
- return 0;
+ if (likely(!ptrace_event_enabled(current, event)))
+ return false;
current->ptrace_message = message;
ptrace_notify((event << 8) | SIGTRAP);
- return 1;
+ return true;
}

/**
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index a3e8387..7d38571 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -201,7 +201,7 @@ static inline void tracehook_report_exec(struct linux_binfmt *fmt,
struct linux_binprm *bprm,
struct pt_regs *regs)
{
- if (!ptrace_event(PT_TRACE_EXEC, PTRACE_EVENT_EXEC, 0) &&
+ if (!ptrace_event(PTRACE_EVENT_EXEC, 0) &&
unlikely(current->ptrace & PT_PTRACED))
send_sig(SIGTRAP, current, 0);
}
@@ -218,7 +218,7 @@ static inline void tracehook_report_exec(struct linux_binfmt *fmt,
*/
static inline void tracehook_report_exit(long *exit_code)
{
- ptrace_event(PT_TRACE_EXIT, PTRACE_EVENT_EXIT, *exit_code);
+ ptrace_event(PTRACE_EVENT_EXIT, *exit_code);
}

/**
@@ -232,19 +232,19 @@ static inline void tracehook_report_exit(long *exit_code)
*/
static inline int tracehook_prepare_clone(unsigned clone_flags)
{
+ int event = 0;
+
if (clone_flags & CLONE_UNTRACED)
return 0;

- if (clone_flags & CLONE_VFORK) {
- if (current->ptrace & PT_TRACE_VFORK)
- return PTRACE_EVENT_VFORK;
- } else if ((clone_flags & CSIGNAL) != SIGCHLD) {
- if (current->ptrace & PT_TRACE_CLONE)
- return PTRACE_EVENT_CLONE;
- } else if (current->ptrace & PT_TRACE_FORK)
- return PTRACE_EVENT_FORK;
+ if (clone_flags & CLONE_VFORK)
+ event = PTRACE_EVENT_VFORK;
+ else if ((clone_flags & CSIGNAL) != SIGCHLD)
+ event = PTRACE_EVENT_CLONE;
+ else
+ event = PTRACE_EVENT_FORK;

- return 0;
+ return ptrace_event_enabled(current, event) ? event : 0;
}

/**
@@ -318,7 +318,7 @@ static inline void tracehook_report_clone_complete(int trace,
struct task_struct *child)
{
if (unlikely(trace))
- ptrace_event(0, trace, pid);
+ ptrace_event(trace, pid);
}

/**
@@ -336,7 +336,7 @@ static inline void tracehook_report_clone_complete(int trace,
static inline void tracehook_report_vfork_done(struct task_struct *child,
pid_t pid)
{
- ptrace_event(PT_TRACE_VFORK_DONE, PTRACE_EVENT_VFORK_DONE, pid);
+ ptrace_event(PTRACE_EVENT_VFORK_DONE, pid);
}

/**
--
1.7.5.4

2011-06-17 14:51:46

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 3/7] ptrace: move SIGTRAP on exec(2) logic to ptrace_event()

Move SIGTRAP on exec(2) logic from tracehook_report_exec() to
ptrace_event(). This is part of changes to make ptrace_event()
smarter and handle ptrace event related details in one place.

This doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/ptrace.h | 16 ++++++++--------
include/linux/tracehook.h | 4 +---
2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 18feac6..b546fd6 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -172,17 +172,17 @@ static inline bool ptrace_event_enabled(struct task_struct *task, int event)
* Check whether @event is enabled and, if so, report @event and @message
* to the ptrace parent.
*
- * Returns nonzero if we did a ptrace notification, zero if not.
- *
* Called without locks.
*/
-static inline int ptrace_event(int event, unsigned long message)
+static inline void ptrace_event(int event, unsigned long message)
{
- if (likely(!ptrace_event_enabled(current, event)))
- return false;
- current->ptrace_message = message;
- ptrace_notify((event << 8) | SIGTRAP);
- return true;
+ if (unlikely(ptrace_event_enabled(current, event))) {
+ current->ptrace_message = message;
+ ptrace_notify((event << 8) | SIGTRAP);
+ } else if (event == PTRACE_EVENT_EXEC && unlikely(current->ptrace)) {
+ /* legacy EXEC report via SIGTRAP */
+ send_sig(SIGTRAP, current, 0);
+ }
}

/**
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 7d38571..3b68aa8 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -201,9 +201,7 @@ static inline void tracehook_report_exec(struct linux_binfmt *fmt,
struct linux_binprm *bprm,
struct pt_regs *regs)
{
- if (!ptrace_event(PTRACE_EVENT_EXEC, 0) &&
- unlikely(current->ptrace & PT_PTRACED))
- send_sig(SIGTRAP, current, 0);
+ ptrace_event(PTRACE_EVENT_EXEC, 0);
}

/**
--
1.7.5.4

2011-06-17 14:53:44

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 4/7] ptrace: kill trivial tracehooks

At this point, tracehooks aren't useful to mainline kernel and mostly
just add an extra layer of obfuscation. Although they have comments,
without actual in-kernel users, it is difficult to tell what are their
assumptions and they're actually trying to achieve. To mainline
kernel, they just aren't worth keeping around.

This patch kills the following trivial tracehooks.

* Ones testing whether task is ptraced. Replace with ->ptrace test.

tracehook_expect_breakpoints()
tracehook_consider_ignored_signal()
tracehook_consider_fatal_signal()

* ptrace_event() wrappers. Call directly.

tracehook_report_exec()
tracehook_report_exit()
tracehook_report_vfork_done()

* ptrace_release_task() wrapper. Call directly.

tracehook_finish_release_task()

* noop

tracehook_prepare_release_task()
tracehook_report_death()

This doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
---
arch/s390/kernel/traps.c | 4 +-
fs/exec.c | 2 +-
include/linux/tracehook.h | 156 ---------------------------------------------
kernel/exit.c | 7 +--
kernel/fork.c | 2 +-
kernel/signal.c | 8 +-
mm/nommu.c | 3 +-
7 files changed, 11 insertions(+), 171 deletions(-)

diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c
index a65d2e8..a63d34c 100644
--- a/arch/s390/kernel/traps.c
+++ b/arch/s390/kernel/traps.c
@@ -331,7 +331,7 @@ void __kprobes do_per_trap(struct pt_regs *regs)
{
if (notify_die(DIE_SSTEP, "sstep", regs, 0, 0, SIGTRAP) == NOTIFY_STOP)
return;
- if (tracehook_consider_fatal_signal(current, SIGTRAP))
+ if (current->ptrace)
force_sig(SIGTRAP, current);
}

@@ -425,7 +425,7 @@ static void __kprobes illegal_op(struct pt_regs *regs, long pgm_int_code,
if (get_user(*((__u16 *) opcode), (__u16 __user *) location))
return;
if (*((__u16 *) opcode) == S390_BREAKPOINT_U16) {
- if (tracehook_consider_fatal_signal(current, SIGTRAP))
+ if (current->ptrace)
force_sig(SIGTRAP, current);
else
signal = SIGILL;
diff --git a/fs/exec.c b/fs/exec.c
index a9f2b36..b37030d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1384,7 +1384,7 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
bprm->recursion_depth = depth;
if (retval >= 0) {
if (depth == 0)
- tracehook_report_exec(fmt, bprm, regs);
+ ptrace_event(PTRACE_EVENT_EXEC, 0);
put_binfmt(fmt);
allow_write_access(bprm->file);
if (bprm->file)
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 3b68aa8..8b06d4f 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -51,21 +51,6 @@
#include <linux/security.h>
struct linux_binprm;

-/**
- * tracehook_expect_breakpoints - guess if task memory might be touched
- * @task: current task, making a new mapping
- *
- * Return nonzero if @task is expected to want breakpoint insertion in
- * its memory at some point. A zero return is no guarantee it won't
- * be done, but this is a hint that it's known to be likely.
- *
- * May be called with @task->mm->mmap_sem held for writing.
- */
-static inline int tracehook_expect_breakpoints(struct task_struct *task)
-{
- return (task->ptrace & PT_PTRACED) != 0;
-}
-
/*
* ptrace report for syscall entry and exit looks identical.
*/
@@ -184,42 +169,6 @@ static inline struct task_struct *tracehook_tracer_task(struct task_struct *tsk)
}

/**
- * tracehook_report_exec - a successful exec was completed
- * @fmt: &struct linux_binfmt that performed the exec
- * @bprm: &struct linux_binprm containing exec details
- * @regs: user-mode register state
- *
- * An exec just completed, we are shortly going to return to user mode.
- * The freshly initialized register state can be seen and changed in @regs.
- * The name, file and other pointers in @bprm are still on hand to be
- * inspected, but will be freed as soon as this returns.
- *
- * Called with no locks, but with some kernel resources held live
- * and a reference on @fmt->module.
- */
-static inline void tracehook_report_exec(struct linux_binfmt *fmt,
- struct linux_binprm *bprm,
- struct pt_regs *regs)
-{
- ptrace_event(PTRACE_EVENT_EXEC, 0);
-}
-
-/**
- * tracehook_report_exit - task has begun to exit
- * @exit_code: pointer to value destined for @current->exit_code
- *
- * @exit_code points to the value passed to do_exit(), which tracing
- * might change here. This is almost the first thing in do_exit(),
- * before freeing any resources or setting the %PF_EXITING flag.
- *
- * Called with no locks held.
- */
-static inline void tracehook_report_exit(long *exit_code)
-{
- ptrace_event(PTRACE_EVENT_EXIT, *exit_code);
-}
-
-/**
* tracehook_prepare_clone - prepare for new child to be cloned
* @clone_flags: %CLONE_* flags from clone/fork/vfork system call
*
@@ -320,52 +269,6 @@ static inline void tracehook_report_clone_complete(int trace,
}

/**
- * tracehook_report_vfork_done - vfork parent's child has exited or exec'd
- * @child: child task, already running
- * @pid: new child's PID in the parent's namespace
- *
- * Called after a %CLONE_VFORK parent has waited for the child to complete.
- * The clone/vfork system call will return immediately after this.
- * The @child pointer may be invalid if a self-reaping child died and
- * tracehook_report_clone() took no action to prevent it from self-reaping.
- *
- * Called with no locks held.
- */
-static inline void tracehook_report_vfork_done(struct task_struct *child,
- pid_t pid)
-{
- ptrace_event(PTRACE_EVENT_VFORK_DONE, pid);
-}
-
-/**
- * tracehook_prepare_release_task - task is being reaped, clean up tracing
- * @task: task in %EXIT_DEAD state
- *
- * This is called in release_task() just before @task gets finally reaped
- * and freed. This would be the ideal place to remove and clean up any
- * tracing-related state for @task.
- *
- * Called with no locks held.
- */
-static inline void tracehook_prepare_release_task(struct task_struct *task)
-{
-}
-
-/**
- * tracehook_finish_release_task - final tracing clean-up
- * @task: task in %EXIT_DEAD state
- *
- * This is called in release_task() when @task is being in the middle of
- * being reaped. After this, there must be no tracing entanglements.
- *
- * Called with write_lock_irq(&tasklist_lock) held.
- */
-static inline void tracehook_finish_release_task(struct task_struct *task)
-{
- ptrace_release_task(task);
-}
-
-/**
* tracehook_signal_handler - signal handler setup is complete
* @sig: number of signal being delivered
* @info: siginfo_t of signal being delivered
@@ -388,41 +291,6 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info,
ptrace_notify(SIGTRAP);
}

-/**
- * tracehook_consider_ignored_signal - suppress short-circuit of ignored signal
- * @task: task receiving the signal
- * @sig: signal number being sent
- *
- * Return zero iff tracing doesn't care to examine this ignored signal,
- * so it can short-circuit normal delivery and never even get queued.
- *
- * Called with @task->sighand->siglock held.
- */
-static inline int tracehook_consider_ignored_signal(struct task_struct *task,
- int sig)
-{
- return (task->ptrace & PT_PTRACED) != 0;
-}
-
-/**
- * tracehook_consider_fatal_signal - suppress special handling of fatal signal
- * @task: task receiving the signal
- * @sig: signal number being sent
- *
- * Return nonzero to prevent special handling of this termination signal.
- * Normally handler for signal is %SIG_DFL. It can be %SIG_IGN if @sig is
- * ignored, in which case force_sig() is about to reset it to %SIG_DFL.
- * When this returns zero, this signal might cause a quick termination
- * that does not give the debugger a chance to intercept the signal.
- *
- * Called with or without @task->sighand->siglock held.
- */
-static inline int tracehook_consider_fatal_signal(struct task_struct *task,
- int sig)
-{
- return (task->ptrace & PT_PTRACED) != 0;
-}
-
#define DEATH_REAP -1
#define DEATH_DELAYED_GROUP_LEADER -2

@@ -457,30 +325,6 @@ static inline int tracehook_notify_death(struct task_struct *task,
return task->ptrace ? SIGCHLD : DEATH_DELAYED_GROUP_LEADER;
}

-/**
- * tracehook_report_death - task is dead and ready to be reaped
- * @task: @current task now exiting
- * @signal: return value from tracheook_notify_death()
- * @death_cookie: value passed back from tracehook_notify_death()
- * @group_dead: nonzero if this was the last thread in the group to die
- *
- * Thread has just become a zombie or is about to self-reap. If positive,
- * @signal is the signal number just sent to the parent (usually %SIGCHLD).
- * If @signal is %DEATH_REAP, this thread will self-reap. If @signal is
- * %DEATH_DELAYED_GROUP_LEADER, this is a delayed_group_leader() zombie.
- * The @death_cookie was passed back by tracehook_notify_death().
- *
- * If normal reaping is not inhibited, @task->exit_state might be changing
- * in parallel.
- *
- * Called without locks.
- */
-static inline void tracehook_report_death(struct task_struct *task,
- int signal, void *death_cookie,
- int group_dead)
-{
-}
-
#ifdef TIF_NOTIFY_RESUME
/**
* set_notify_resume - cause tracehook_notify_resume() to be called
diff --git a/kernel/exit.c b/kernel/exit.c
index e5cc056..d49134a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -169,7 +169,6 @@ void release_task(struct task_struct * p)
struct task_struct *leader;
int zap_leader;
repeat:
- tracehook_prepare_release_task(p);
/* don't need to get the RCU readlock here - the process is dead and
* can't be modifying its own credentials. But shut RCU-lockdep up */
rcu_read_lock();
@@ -179,7 +178,7 @@ repeat:
proc_flush_task(p);

write_lock_irq(&tasklist_lock);
- tracehook_finish_release_task(p);
+ ptrace_release_task(p);
__exit_signal(p);

/*
@@ -868,8 +867,6 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
wake_up_process(tsk->signal->group_exit_task);
write_unlock_irq(&tasklist_lock);

- tracehook_report_death(tsk, signal, cookie, group_dead);
-
/* If the process is dead, release it - nobody will wait for it */
if (signal == DEATH_REAP)
release_task(tsk);
@@ -924,7 +921,7 @@ NORET_TYPE void do_exit(long code)
*/
set_fs(USER_DS);

- tracehook_report_exit(&code);
+ ptrace_event(PTRACE_EVENT_EXIT, code);

validate_creds_for_do_exit(tsk);

diff --git a/kernel/fork.c b/kernel/fork.c
index 0276c30..d4f0dff 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1527,7 +1527,7 @@ long do_fork(unsigned long clone_flags,
freezer_do_not_count();
wait_for_completion(&vfork);
freezer_count();
- tracehook_report_vfork_done(p, nr);
+ ptrace_event(PTRACE_EVENT_VFORK_DONE, nr);
}
} else {
nr = PTR_ERR(p);
diff --git a/kernel/signal.c b/kernel/signal.c
index 0f33708..1550aee 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -87,7 +87,7 @@ static int sig_ignored(struct task_struct *t, int sig, int from_ancestor_ns)
/*
* Tracers may want to know about even ignored signals.
*/
- return !tracehook_consider_ignored_signal(t, sig);
+ return !t->ptrace;
}

/*
@@ -493,7 +493,8 @@ int unhandled_signal(struct task_struct *tsk, int sig)
return 1;
if (handler != SIG_IGN && handler != SIG_DFL)
return 0;
- return !tracehook_consider_fatal_signal(tsk, sig);
+ /* if ptraced, let the tracer determine */
+ return !tsk->ptrace;
}

/*
@@ -981,8 +982,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
if (sig_fatal(p, sig) &&
!(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
!sigismember(&t->real_blocked, sig) &&
- (sig == SIGKILL ||
- !tracehook_consider_fatal_signal(t, sig))) {
+ (sig == SIGKILL || !t->ptrace)) {
/*
* This signal will be fatal to the whole group.
*/
diff --git a/mm/nommu.c b/mm/nommu.c
index 1fd0c51..54ae707 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -22,7 +22,6 @@
#include <linux/pagemap.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
-#include <linux/tracehook.h>
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
#include <linux/mount.h>
@@ -1087,7 +1086,7 @@ static unsigned long determine_vm_flags(struct file *file,
* it's being traced - otherwise breakpoints set in it may interfere
* with another untraced process
*/
- if ((flags & MAP_PRIVATE) && tracehook_expect_breakpoints(current))
+ if ((flags & MAP_PRIVATE) && current->ptrace)
vm_flags &= ~VM_MAYSHARE;

return vm_flags;
--
1.7.5.4

2011-06-17 14:52:37

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 5/7] ptrace: kill clone/exec tracehooks

At this point, tracehooks aren't useful to mainline kernel and mostly
just add an extra layer of obfuscation. Although they have comments,
without actual in-kernel users, it is difficult to tell what are their
assumptions and they're actually trying to achieve. To mainline
kernel, they just aren't worth keeping around.

This patch kills the following clone and exec related tracehooks.

tracehook_prepare_clone()
tracehook_finish_clone()
tracehook_report_clone()
tracehook_report_clone_complete()
tracehook_unsafe_exec()

The changes are mostly trivial - logic is moved to the caller and
comments are merged and adjusted appropriately.

The only exception is in check_unsafe_exec() where LSM_UNSAFE_PTRACE*
are OR'd to bprm->unsafe instead of setting it, which produces the
same result as the field is always zero on entry. It also tests
p->ptrace instead of (p->ptrace & PT_PTRACED) for consistency, which
also gives the same result.

This doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Christoph Hellwig <[email protected]>
---
fs/exec.c | 7 ++-
include/linux/tracehook.h | 121 ---------------------------------------------
kernel/fork.c | 41 ++++++++++++---
3 files changed, 38 insertions(+), 131 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b37030d..8dca45b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1224,7 +1224,12 @@ int check_unsafe_exec(struct linux_binprm *bprm)
unsigned n_fs;
int res = 0;

- bprm->unsafe = tracehook_unsafe_exec(p);
+ if (p->ptrace) {
+ if (p->ptrace & PT_PTRACE_CAP)
+ bprm->unsafe |= LSM_UNSAFE_PTRACE_CAP;
+ else
+ bprm->unsafe |= LSM_UNSAFE_PTRACE;
+ }

n_fs = 1;
spin_lock(&p->fs->lock);
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 8b06d4f..bcc4ca7 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -130,27 +130,6 @@ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
}

/**
- * tracehook_unsafe_exec - check for exec declared unsafe due to tracing
- * @task: current task doing exec
- *
- * Return %LSM_UNSAFE_* bits applied to an exec because of tracing.
- *
- * @task->signal->cred_guard_mutex is held by the caller through the do_execve().
- */
-static inline int tracehook_unsafe_exec(struct task_struct *task)
-{
- int unsafe = 0;
- int ptrace = task->ptrace;
- if (ptrace & PT_PTRACED) {
- if (ptrace & PT_PTRACE_CAP)
- unsafe |= LSM_UNSAFE_PTRACE_CAP;
- else
- unsafe |= LSM_UNSAFE_PTRACE;
- }
- return unsafe;
-}
-
-/**
* tracehook_tracer_task - return the task that is tracing the given task
* @tsk: task to consider
*
@@ -169,106 +148,6 @@ static inline struct task_struct *tracehook_tracer_task(struct task_struct *tsk)
}

/**
- * tracehook_prepare_clone - prepare for new child to be cloned
- * @clone_flags: %CLONE_* flags from clone/fork/vfork system call
- *
- * This is called before a new user task is to be cloned.
- * Its return value will be passed to tracehook_finish_clone().
- *
- * Called with no locks held.
- */
-static inline int tracehook_prepare_clone(unsigned clone_flags)
-{
- int event = 0;
-
- if (clone_flags & CLONE_UNTRACED)
- return 0;
-
- if (clone_flags & CLONE_VFORK)
- event = PTRACE_EVENT_VFORK;
- else if ((clone_flags & CSIGNAL) != SIGCHLD)
- event = PTRACE_EVENT_CLONE;
- else
- event = PTRACE_EVENT_FORK;
-
- return ptrace_event_enabled(current, event) ? event : 0;
-}
-
-/**
- * tracehook_finish_clone - new child created and being attached
- * @child: new child task
- * @clone_flags: %CLONE_* flags from clone/fork/vfork system call
- * @trace: return value from tracehook_prepare_clone()
- *
- * This is called immediately after adding @child to its parent's children list.
- * The @trace value is that returned by tracehook_prepare_clone().
- *
- * Called with current's siglock and write_lock_irq(&tasklist_lock) held.
- */
-static inline void tracehook_finish_clone(struct task_struct *child,
- unsigned long clone_flags, int trace)
-{
- ptrace_init_task(child, (clone_flags & CLONE_PTRACE) || trace);
-}
-
-/**
- * tracehook_report_clone - in parent, new child is about to start running
- * @regs: parent's user register state
- * @clone_flags: flags from parent's system call
- * @pid: new child's PID in the parent's namespace
- * @child: new child task
- *
- * Called after a child is set up, but before it has been started running.
- * This is not a good place to block, because the child has not started
- * yet. Suspend the child here if desired, and then block in
- * tracehook_report_clone_complete(). This must prevent the child from
- * self-reaping if tracehook_report_clone_complete() uses the @child
- * pointer; otherwise it might have died and been released by the time
- * tracehook_report_clone_complete() is called.
- *
- * Called with no locks held, but the child cannot run until this returns.
- */
-static inline void tracehook_report_clone(struct pt_regs *regs,
- unsigned long clone_flags,
- pid_t pid, struct task_struct *child)
-{
- if (unlikely(child->ptrace)) {
- /*
- * It doesn't matter who attached/attaching to this
- * task, the pending SIGSTOP is right in any case.
- */
- sigaddset(&child->pending.signal, SIGSTOP);
- set_tsk_thread_flag(child, TIF_SIGPENDING);
- }
-}
-
-/**
- * tracehook_report_clone_complete - new child is running
- * @trace: return value from tracehook_prepare_clone()
- * @regs: parent's user register state
- * @clone_flags: flags from parent's system call
- * @pid: new child's PID in the parent's namespace
- * @child: child task, already running
- *
- * This is called just after the child has started running. This is
- * just before the clone/fork syscall returns, or blocks for vfork
- * child completion if @clone_flags has the %CLONE_VFORK bit set.
- * The @child pointer may be invalid if a self-reaping child died and
- * tracehook_report_clone() took no action to prevent it from self-reaping.
- *
- * Called with no locks held.
- */
-static inline void tracehook_report_clone_complete(int trace,
- struct pt_regs *regs,
- unsigned long clone_flags,
- pid_t pid,
- struct task_struct *child)
-{
- if (unlikely(trace))
- ptrace_event(trace, pid);
-}
-
-/**
* tracehook_signal_handler - signal handler setup is complete
* @sig: number of signal being delivered
* @info: siginfo_t of signal being delivered
diff --git a/kernel/fork.c b/kernel/fork.c
index d4f0dff..3c72a5b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1340,7 +1340,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
}

if (likely(p->pid)) {
- tracehook_finish_clone(p, clone_flags, trace);
+ ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);

if (thread_group_leader(p)) {
if (is_child_reaper(pid))
@@ -1481,10 +1481,22 @@ long do_fork(unsigned long clone_flags,
}

/*
- * When called from kernel_thread, don't do user tracing stuff.
+ * Determine whether and which event to report to ptracer. When
+ * called from kernel_thread or CLONE_UNTRACED is explicitly
+ * requested, no event is reported; otherwise, report if the event
+ * for the type of forking is enabled.
*/
- if (likely(user_mode(regs)))
- trace = tracehook_prepare_clone(clone_flags);
+ if (likely(user_mode(regs)) && !(clone_flags & CLONE_UNTRACED)) {
+ if (clone_flags & CLONE_VFORK)
+ trace = PTRACE_EVENT_VFORK;
+ else if ((clone_flags & CSIGNAL) != SIGCHLD)
+ trace = PTRACE_EVENT_CLONE;
+ else
+ trace = PTRACE_EVENT_FORK;
+
+ if (likely(!ptrace_event_enabled(current, trace)))
+ trace = 0;
+ }

p = copy_process(clone_flags, stack_start, regs, stack_size,
child_tidptr, NULL, trace);
@@ -1508,20 +1520,31 @@ long do_fork(unsigned long clone_flags,
}

audit_finish_fork(p);
- tracehook_report_clone(regs, clone_flags, nr, p);
+
+ /*
+ * Child is ready but hasn't started running yet. Queue
+ * SIGSTOP if it's gonna be ptraced - it doesn't matter who
+ * attached/attaching to this task, the pending SIGSTOP is
+ * right in any case.
+ */
+ if (unlikely(p->ptrace)) {
+ sigaddset(&p->pending.signal, SIGSTOP);
+ set_tsk_thread_flag(p, TIF_SIGPENDING);
+ }

/*
* We set PF_STARTING at creation in case tracing wants to
* use this to distinguish a fully live task from one that
- * hasn't gotten to tracehook_report_clone() yet. Now we
- * clear it and set the child going.
+ * hasn't finished SIGSTOP raising yet. Now we clear it
+ * and set the child going.
*/
p->flags &= ~PF_STARTING;

wake_up_new_task(p);

- tracehook_report_clone_complete(trace, regs,
- clone_flags, nr, p);
+ /* forking complete and child started to run, tell ptracer */
+ if (unlikely(trace))
+ ptrace_event(trace, nr);

if (clone_flags & CLONE_VFORK) {
freezer_do_not_count();
--
1.7.5.4

2011-06-17 14:52:35

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 6/7] ptrace: kill detah tracehooks

This patch kills tracehook_notify_death() by reimplementing the logic
in its caller - exit_notify(). I opted for re-implementation as the
existing code was a bit difficult to wrap one's head around.

The reimplemented logic yields the same result in all combinations of
task_detached(), thread_group_empty(), ptrace_reparented() and
->ptrace.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Christoph Hellwig <[email protected]>
---
include/linux/tracehook.h | 34 ----------------------------------
kernel/exit.c | 28 +++++++++++++++++++++-------
2 files changed, 21 insertions(+), 41 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index bcc4ca7..fea0d4b 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -170,40 +170,6 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info,
ptrace_notify(SIGTRAP);
}

-#define DEATH_REAP -1
-#define DEATH_DELAYED_GROUP_LEADER -2
-
-/**
- * tracehook_notify_death - task is dead, ready to notify parent
- * @task: @current task now exiting
- * @death_cookie: value to pass to tracehook_report_death()
- * @group_dead: nonzero if this was the last thread in the group to die
- *
- * A return value >= 0 means call do_notify_parent() with that signal
- * number. Negative return value can be %DEATH_REAP to self-reap right
- * now, or %DEATH_DELAYED_GROUP_LEADER to a zombie without notifying our
- * parent. Note that a return value of 0 means a do_notify_parent() call
- * that sends no signal, but still wakes up a parent blocked in wait*().
- *
- * Called with write_lock_irq(&tasklist_lock) held.
- */
-static inline int tracehook_notify_death(struct task_struct *task,
- void **death_cookie, int group_dead)
-{
- if (task_detached(task))
- return task->ptrace ? SIGCHLD : DEATH_REAP;
-
- /*
- * If something other than our normal parent is ptracing us, then
- * send it a SIGCHLD instead of honoring exit_signal. exit_signal
- * only has special meaning to our real parent.
- */
- if (thread_group_empty(task) && !ptrace_reparented(task))
- return task->exit_signal;
-
- return task->ptrace ? SIGCHLD : DEATH_DELAYED_GROUP_LEADER;
-}
-
#ifdef TIF_NOTIFY_RESUME
/**
* set_notify_resume - cause tracehook_notify_resume() to be called
diff --git a/kernel/exit.c b/kernel/exit.c
index d49134a..f3c49ca 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -819,8 +819,7 @@ static void forget_original_parent(struct task_struct *father)
*/
static void exit_notify(struct task_struct *tsk, int group_dead)
{
- int signal;
- void *cookie;
+ bool dead = false;

/*
* This does two things:
@@ -856,11 +855,26 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
tsk->self_exec_id != tsk->parent_exec_id))
tsk->exit_signal = SIGCHLD;

- signal = tracehook_notify_death(tsk, &cookie, group_dead);
- if (signal >= 0)
- signal = do_notify_parent(tsk, signal);
+ /*
+ * Notify interested parent if the whole group is dead. Ptracer
+ * should always be notified. If something other than our normal
+ * parent is ptracing us, then send it a SIGCHLD instead of
+ * honoring exit_signal. exit_signal only has special meaning to
+ * our real parent.
+ */
+ if (!task_detached(tsk) && thread_group_empty(tsk)) {
+ if (!ptrace_reparented(tsk))
+ do_notify_parent(tsk, tsk->exit_signal);
+ else
+ do_notify_parent(tsk, SIGCHLD);
+ } else {
+ if (unlikely(tsk->ptrace))
+ do_notify_parent(tsk, SIGCHLD);
+ else if (task_detached(tsk))
+ dead = true;
+ }

- tsk->exit_state = signal == DEATH_REAP ? EXIT_DEAD : EXIT_ZOMBIE;
+ tsk->exit_state = dead ? EXIT_DEAD : EXIT_ZOMBIE;

/* mt-exec, de_thread() is waiting for group leader */
if (unlikely(tsk->signal->notify_count < 0))
@@ -868,7 +882,7 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
write_unlock_irq(&tasklist_lock);

/* If the process is dead, release it - nobody will wait for it */
- if (signal == DEATH_REAP)
+ if (dead)
release_task(tsk);
}

--
1.7.5.4

2011-06-17 14:52:32

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 7/7] ptrace: s/tracehook_tracer_task()/ptrace_parent()/

tracehook.h is on the way out. Rename tracehook_tracer_task() to
ptrace_parent() and move it from tracehook.h to ptrace.h.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: John Johansen <[email protected]>
Cc: Stephen Smalley <[email protected]>
---
fs/proc/array.c | 2 +-
fs/proc/base.c | 2 +-
include/linux/ptrace.h | 18 ++++++++++++++++++
include/linux/tracehook.h | 18 ------------------
security/apparmor/domain.c | 2 +-
security/selinux/hooks.c | 4 ++--
6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 9b45ee8..3a1dafd 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -172,7 +172,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
task_tgid_nr_ns(rcu_dereference(p->real_parent), ns) : 0;
tpid = 0;
if (pid_alive(p)) {
- struct task_struct *tracer = tracehook_tracer_task(p);
+ struct task_struct *tracer = ptrace_parent(p);
if (tracer)
tpid = task_pid_nr_ns(tracer, ns);
}
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 14def99..c883dad 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -216,7 +216,7 @@ static struct mm_struct *__check_mem_permission(struct task_struct *task)
if (task_is_stopped_or_traced(task)) {
int match;
rcu_read_lock();
- match = (tracehook_tracer_task(task) == current);
+ match = (ptrace_parent(task) == current);
rcu_read_unlock();
if (match && ptrace_may_access(task, PTRACE_MODE_ATTACH))
return mm;
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index b546fd6..bb157bd 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -151,6 +151,24 @@ int generic_ptrace_pokedata(struct task_struct *tsk, unsigned long addr,
unsigned long data);

/**
+ * ptrace_parent - return the task that is tracing the given task
+ * @task: task to consider
+ *
+ * Returns %NULL if no one is tracing @task, or the &struct task_struct
+ * pointer to its tracer.
+ *
+ * Must called under rcu_read_lock(). The pointer returned might be kept
+ * live only by RCU. During exec, this may be called with task_lock() held
+ * on @task, still held from when check_unsafe_exec() was called.
+ */
+static inline struct task_struct *ptrace_parent(struct task_struct *task)
+{
+ if (unlikely(task->ptrace))
+ return rcu_dereference(task->parent);
+ return NULL;
+}
+
+/**
* ptrace_event_enabled - test whether a ptrace event is enabled
* @task: ptracee of interest
* @event: %PTRACE_EVENT_* to test
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index fea0d4b..a71a292 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -130,24 +130,6 @@ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
}

/**
- * tracehook_tracer_task - return the task that is tracing the given task
- * @tsk: task to consider
- *
- * Returns NULL if no one is tracing @task, or the &struct task_struct
- * pointer to its tracer.
- *
- * Must called under rcu_read_lock(). The pointer returned might be kept
- * live only by RCU. During exec, this may be called with task_lock()
- * held on @task, still held from when tracehook_unsafe_exec() was called.
- */
-static inline struct task_struct *tracehook_tracer_task(struct task_struct *tsk)
-{
- if (tsk->ptrace & PT_PTRACED)
- return rcu_dereference(tsk->parent);
- return NULL;
-}
-
-/**
* tracehook_signal_handler - signal handler setup is complete
* @sig: number of signal being delivered
* @info: siginfo_t of signal being delivered
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index c825c6e..7312bf9 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -67,7 +67,7 @@ static int may_change_ptraced_domain(struct task_struct *task,
int error = 0;

rcu_read_lock();
- tracer = tracehook_tracer_task(task);
+ tracer = ptrace_parent(task);
if (tracer) {
/* released below */
cred = get_task_cred(tracer);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a0d3845..fc07d18 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2048,7 +2048,7 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
u32 ptsid = 0;

rcu_read_lock();
- tracer = tracehook_tracer_task(current);
+ tracer = ptrace_parent(current);
if (likely(tracer != NULL)) {
sec = __task_cred(tracer)->security;
ptsid = sec->sid;
@@ -5314,7 +5314,7 @@ static int selinux_setprocattr(struct task_struct *p,
Otherwise, leave SID unchanged and fail. */
ptsid = 0;
task_lock(p);
- tracer = tracehook_tracer_task(p);
+ tracer = ptrace_parent(p);
if (tracer)
ptsid = task_sid(tracer);
task_unlock(p);
--
1.7.5.4

2011-06-20 11:17:05

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCHSET] ptrace: kill most tracehooks

On Fri, Jun 17, 2011 at 04:50:33PM +0200, Tejun Heo wrote:
> Hello,
>
> At this point, tracehooks aren't useful to mainline kernel and mostly
> just add an extra layer of obfuscation. Although they have comments,
> without actual in-kernel users, it is difficult to tell what are their
> assumptions and they're actually trying to achieve. To mainline
> kernel, they just aren't worth keeping around.
>
> This patchset kills most tracehooks which aren't used from arch codes.
> The remaining ones will be dealt with with future changes.

Thanks, this makes the whole ptrace interaction a lot more sensible and
actually possible to follow.

2011-06-20 19:41:26

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 6/7] ptrace: kill detah tracehooks

On 06/17, Tejun Heo wrote:
>
> This patch kills tracehook_notify_death() by reimplementing the logic
> in its caller - exit_notify(). I opted for re-implementation as the
> existing code was a bit difficult to wrap one's head around.

I always hated this particular tracehook much more than others ;)

But the patch doesn't look right, please see below.

> - signal = tracehook_notify_death(tsk, &cookie, group_dead);
> - if (signal >= 0)
> - signal = do_notify_parent(tsk, signal);
> + /*
> + * Notify interested parent if the whole group is dead. Ptracer
> + * should always be notified. If something other than our normal
> + * parent is ptracing us, then send it a SIGCHLD instead of
> + * honoring exit_signal. exit_signal only has special meaning to
> + * our real parent.
> + */
> + if (!task_detached(tsk) && thread_group_empty(tsk)) {
> + if (!ptrace_reparented(tsk))
> + do_notify_parent(tsk, tsk->exit_signal);
> + else
> + do_notify_parent(tsk, SIGCHLD);

Forget about ptrace, suppose a single threaded child exits and its
parent ignores SIGCHLD. In this case do_notify_parent() sets
exit_signal = -1 to mark it dead, we should set exit_state = EXIT_DEAD
and release it.


Note! This looks very ugly, I wanted to cleanup this a long ago.
I think we should never change ->exit_signal, and do_notify_parent()
should return a boolean. I'll try to make the patches tomorrow.
Then _perhaps_ this patch will become simpler.

Oleg.

2011-06-20 20:18:30

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 7/7] ptrace: s/tracehook_tracer_task()/ptrace_parent()/

On 06/17, Tejun Heo wrote:
>
> tracehook.h is on the way out. Rename tracehook_tracer_task() to
> ptrace_parent() and move it from tracehook.h to ptrace.h.

I am a bit surpised you decided to keep this helper. Can't we simply
kill it?

OK, we will see. I guess this change is mostly needed to remove yet
another function from tracehook.h.

> @@ -216,7 +216,7 @@ static struct mm_struct *__check_mem_permission(struct task_struct *task)
> if (task_is_stopped_or_traced(task)) {
> int match;
> rcu_read_lock();
> - match = (tracehook_tracer_task(task) == current);
> + match = (ptrace_parent(task) == current);
> rcu_read_unlock();
> if (match && ptrace_may_access(task, PTRACE_MODE_ATTACH))

All we need

if (task_is_traced(task) && task->parent == current) {
if (ptrace_may_access()
return mm;
}

Of course I do not blame this patch, my only point is that this helper
only adds more confusion imho.




> @@ -67,7 +67,7 @@ static int may_change_ptraced_domain(struct task_struct *task,
> int error = 0;
>
> rcu_read_lock();
> - tracer = tracehook_tracer_task(task);
> + tracer = ptrace_parent(task);
> if (tracer) {
> /* released below */
> cred = get_task_cred(tracer);

Hmm. And then this task_struct is used after we dropped rcu_read_lock().

John, is this correct?

Oleg.

2011-06-20 20:26:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/7] ptrace: move SIGTRAP on exec(2) logic to ptrace_event()

On 06/17, Tejun Heo wrote:
>
> Move SIGTRAP on exec(2) logic from tracehook_report_exec() to
> ptrace_event(). This is part of changes to make ptrace_event()
> smarter and handle ptrace event related details in one place.

I am going to apply all patches except 6/7, but to be honest this
one looks a bit ugly to me.

> -static inline int ptrace_event(int event, unsigned long message)
> +static inline void ptrace_event(int event, unsigned long message)
> {
> - if (likely(!ptrace_event_enabled(current, event)))
> - return false;
> - current->ptrace_message = message;
> - ptrace_notify((event << 8) | SIGTRAP);
> - return true;
> + if (unlikely(ptrace_event_enabled(current, event))) {
> + current->ptrace_message = message;
> + ptrace_notify((event << 8) | SIGTRAP);
> + } else if (event == PTRACE_EVENT_EXEC && unlikely(current->ptrace)) {
> + /* legacy EXEC report via SIGTRAP */
> + send_sig(SIGTRAP, current, 0);
> + }

Why does this make ptrace_event() smarter?

OK, tracehooks should die. But why should we move this special case
into ptrace_event? Say, a simple

static inline void ptrace_exec_event(...)
{
if (!ptrace_event_enabled(PTRACE_EVENT_EXEC))
send_sig(SIGTRAP, current, 0);
else
ptrace_event(PTRACE_EVENT_EXEC);
}

in fs/exec.c looks a bit better to me.

Oleg.

2011-06-20 20:35:43

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 5/7] ptrace: kill clone/exec tracehooks

On 06/17, Tejun Heo wrote:
>
> At this point, tracehooks aren't useful to mainline kernel and mostly
> just add an extra layer of obfuscation.

Yes, this is true.

> @@ -1481,10 +1481,22 @@ long do_fork(unsigned long clone_flags,
> }
>
> /*
> - * When called from kernel_thread, don't do user tracing stuff.
> + * Determine whether and which event to report to ptracer. When
> + * called from kernel_thread or CLONE_UNTRACED is explicitly
> + * requested, no event is reported; otherwise, report if the event
> + * for the type of forking is enabled.
> */
> - if (likely(user_mode(regs)))
> - trace = tracehook_prepare_clone(clone_flags);
> + if (likely(user_mode(regs)) && !(clone_flags & CLONE_UNTRACED)) {

Off-topic, but I never understood this user_mode() check... OK, a
traced task can call kernel_thread() in kernel, but this used
CLONE_UNTRACED.

> + if (clone_flags & CLONE_VFORK)
> + trace = PTRACE_EVENT_VFORK;
> + else if ((clone_flags & CSIGNAL) != SIGCHLD)
> + trace = PTRACE_EVENT_CLONE;
> + else
> + trace = PTRACE_EVENT_FORK;
> +
> + if (likely(!ptrace_event_enabled(current, trace)))
> + trace = 0;
> + }

As I said am going to apply this all except 6/7, but imho this
deserves a trivial helper anyway.

Oleg.

2011-06-21 07:21:19

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 3/7] ptrace: move SIGTRAP on exec(2) logic to ptrace_event()

Hello,

On Mon, Jun 20, 2011 at 10:25 PM, Oleg Nesterov <[email protected]> wrote:
> Why does this make ptrace_event() smarter?
>
> OK, tracehooks should die. But why should we move this special case
> into ptrace_event? Say, a simple
>
> ? ? ? ?static inline void ptrace_exec_event(...)
> ? ? ? ?{
> ? ? ? ? ? ? ? ?if (!ptrace_event_enabled(PTRACE_EVENT_EXEC))
> ? ? ? ? ? ? ? ? ? ? ? ?send_sig(SIGTRAP, current, 0);
> ? ? ? ? ? ? ? ?else
> ? ? ? ? ? ? ? ? ? ? ? ?ptrace_event(PTRACE_EVENT_EXEC);
> ? ? ? ?}
>
> in fs/exec.c looks a bit better to me.

The intention is to concentrate ptrace specific logic in
ptrace_event(). We'll have more of them, mostly dependent on
PT_SEIZED and I don't think it's a good idea to scatter them across
the kernel. They're of no interest outside of ptrace after all. I
think it's better to have them collected in one place than scattered
around. The PT_SEIZED ones would probably have some commonalities
among them too.

Thanks.

--
tejun

2011-06-21 07:24:24

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 5/7] ptrace: kill clone/exec tracehooks

Hello,

On Mon, Jun 20, 2011 at 10:33 PM, Oleg Nesterov <[email protected]> wrote:
>> + ? ? if (likely(user_mode(regs)) && !(clone_flags & CLONE_UNTRACED)) {
>
> Off-topic, but I never understood this user_mode() check... OK, a
> traced task can call kernel_thread() in kernel, but this used
> CLONE_UNTRACED.

Me neither. I suppose it's added as a safety net. Maybe we can turn
it into WARN_ON(!user_mode) && !CLONE_UNTRACED) but just leaving it
alone doesn't sound like a bad idea.

>> + ? ? ? ? ? ? if (clone_flags & CLONE_VFORK)
>> + ? ? ? ? ? ? ? ? ? ? trace = PTRACE_EVENT_VFORK;
>> + ? ? ? ? ? ? else if ((clone_flags & CSIGNAL) != SIGCHLD)
>> + ? ? ? ? ? ? ? ? ? ? trace = PTRACE_EVENT_CLONE;
>> + ? ? ? ? ? ? else
>> + ? ? ? ? ? ? ? ? ? ? trace = PTRACE_EVENT_FORK;
>> +
>> + ? ? ? ? ? ? if (likely(!ptrace_event_enabled(current, trace)))
>> + ? ? ? ? ? ? ? ? ? ? trace = 0;
>> + ? ? }
>
> As I said am going to apply this all except 6/7, but imho this
> deserves a trivial helper anyway.

Yeah, maybe. Preference, I guess.

Thanks.

--
tejun

2011-06-21 11:44:48

by John Johansen

[permalink] [raw]
Subject: Re: [PATCH 7/7] ptrace: s/tracehook_tracer_task()/ptrace_parent()/

On 06/20/2011 01:16 PM, Oleg Nesterov wrote:
> On 06/17, Tejun Heo wrote:
>>
>> tracehook.h is on the way out. Rename tracehook_tracer_task() to
>> ptrace_parent() and move it from tracehook.h to ptrace.h.
>
> I am a bit surpised you decided to keep this helper. Can't we simply
> kill it?
>
> OK, we will see. I guess this change is mostly needed to remove yet
> another function from tracehook.h.
>
>> @@ -216,7 +216,7 @@ static struct mm_struct *__check_mem_permission(struct task_struct *task)
>> if (task_is_stopped_or_traced(task)) {
>> int match;
>> rcu_read_lock();
>> - match = (tracehook_tracer_task(task) == current);
>> + match = (ptrace_parent(task) == current);
>> rcu_read_unlock();
>> if (match && ptrace_may_access(task, PTRACE_MODE_ATTACH))
>
> All we need
>
> if (task_is_traced(task) && task->parent == current) {
> if (ptrace_may_access()
> return mm;
> }
>
> Of course I do not blame this patch, my only point is that this helper
> only adds more confusion imho.
>
>
>
>
>> @@ -67,7 +67,7 @@ static int may_change_ptraced_domain(struct task_struct *task,
>> int error = 0;
>>
>> rcu_read_lock();
>> - tracer = tracehook_tracer_task(task);
>> + tracer = ptrace_parent(task);
>> if (tracer) {
>> /* released below */
>> cred = get_task_cred(tracer);
>
> Hmm. And then this task_struct is used after we dropped rcu_read_lock().
>
> John, is this correct?
>
nope this use is wrong. The following patch should fix this

===

AppArmor: Fix reference to rcu protected pointer outside of rcu_read_lock

The pointer returned from tracehook_tracer_task() is only valid inside
the rcu_read_lock. However the tracer pointer obtained is being passed
to aa_may_ptrace outside of the rcu_read_lock critical section.

Mover the aa_may_ptrace test into the rcu_read_lock critical section, to
fix this.

Signed-off-by: John Johansen <[email protected]>
---
security/apparmor/domain.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index c825c6e..78adc43 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -73,7 +73,6 @@ static int may_change_ptraced_domain(struct task_struct *task,
cred = get_task_cred(tracer);
tracerp = aa_cred_profile(cred);
}
- rcu_read_unlock();

/* not ptraced */
if (!tracer || unconfined(tracerp))
@@ -82,6 +81,7 @@ static int may_change_ptraced_domain(struct task_struct *task,
error = aa_may_ptrace(tracer, tracerp, to_profile, PTRACE_MODE_ATTACH);

out:
+ rcu_read_unlock();
if (cred)
put_cred(cred);

--
1.7.4.1

2011-06-21 20:26:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 6/7] ptrace: kill detah tracehooks

On 06/20, Oleg Nesterov wrote:
>
> Note! This looks very ugly, I wanted to cleanup this a long ago.
> I think we should never change ->exit_signal, and do_notify_parent()
> should return a boolean. I'll try to make the patches tomorrow.
> Then _perhaps_ this patch will become simpler.

Damn. Tomorrow (I promise ;) I'll send the full series. IMHO, we really
need to cleanup the do_notify_parent/task_detached logic, and exit_signal
should be "const"

But, to remove this tracehook, we only need the patch below and then
exit_notify() should do:

if (unlikely(tsk->ptrace)) {
int sig = ptrace_reparented(tsk) || task_detached(tsk) ?
SIGCHLD : tsk->exit_signal;
autoreap = do_notify_parent(tsk, sig);
WARN_ON(autoreap);
} else if (thread_group_leader(tsk)) {
autoreap = thread_group_empty(tsk) &&
do_notify_parent(tsk, tsk->exit_signal);
} else {
autoreap = true;
}

tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;

Note the WARN_ON() above, perhaps instead we should autoreap if the
tracer is the real parent and it ignores SIGCHLD.

And. ptrace_reparented() should be changed to check same_thread_group(),
I think. This also looks fine for wait_task_zombie().

Oleg.

-----------------------------------------------------------------------------
[PATCH 3/XXX] make do_notify_parent() return bool

- change do_notify_parent() to return a boolean, true if the task should
be reaped because its parent ignores SIGCHLD.

- update the only caller which checks the returned value, exit_notify(),
to rely DEATH_REAP only if we do not call do_notify_parent().

This temporary uglifies exit_notify() even more, will be cleanuped by
the next change.

Signed-off-by: Oleg Nesterov <[email protected]>
---

--- ptrace/include/linux/sched.h~2_do_notify_parent_bool 2011-06-17 20:12:29.000000000 +0200
+++ ptrace/include/linux/sched.h 2011-06-21 19:18:00.000000000 +0200
@@ -2146,7 +2146,7 @@ static inline int dequeue_signal_lock(st
spin_unlock_irqrestore(&tsk->sighand->siglock, flags);

return ret;
-}
+}

extern void block_all_signals(int (*notifier)(void *priv), void *priv,
sigset_t *mask);
@@ -2161,7 +2161,7 @@ extern int kill_pid_info_as_uid(int, str
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
extern int kill_proc_info(int, struct siginfo *, pid_t);
-extern int do_notify_parent(struct task_struct *, int);
+extern bool do_notify_parent(struct task_struct *, int);
extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
extern void force_sig(int, struct task_struct *);
extern int send_sig(int, struct task_struct *, int);
--- ptrace/kernel/signal.c~2_do_notify_parent_bool 2011-06-20 20:40:49.000000000 +0200
+++ ptrace/kernel/signal.c 2011-06-21 20:29:57.000000000 +0200
@@ -1577,15 +1577,15 @@ ret:
* Let a parent know about the death of a child.
* For a stopped/continued status change, use do_notify_parent_cldstop instead.
*
- * Returns -1 if our parent ignored us and so we've switched to
- * self-reaping, or else @sig.
+ * Returns true if our parent ignored us and so we've switched to
+ * self-reaping.
*/
-int do_notify_parent(struct task_struct *tsk, int sig)
+bool do_notify_parent(struct task_struct *tsk, int sig)
{
struct siginfo info;
unsigned long flags;
struct sighand_struct *psig;
- int ret = sig;
+ bool autoreap = false;

BUG_ON(sig == -1);

@@ -1649,16 +1649,17 @@ int do_notify_parent(struct task_struct
* is implementation-defined: we do (if you don't want
* it, just use SIG_IGN instead).
*/
- ret = tsk->exit_signal = -1;
+ autoreap = true;
+ tsk->exit_signal = -1;
if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
- sig = -1;
+ sig = 0;
}
- if (valid_signal(sig) && sig > 0)
+ if (valid_signal(sig) && sig)
__group_send_sig_info(sig, &info, tsk->parent);
__wake_up_parent(tsk, tsk->parent);
spin_unlock_irqrestore(&psig->siglock, flags);

- return ret;
+ return autoreap;
}

/**
--- ptrace/kernel/exit.c~2_do_notify_parent_bool 2011-06-21 18:36:24.000000000 +0200
+++ ptrace/kernel/exit.c 2011-06-21 20:37:24.000000000 +0200
@@ -821,6 +821,7 @@ static void forget_original_parent(struc
static void exit_notify(struct task_struct *tsk, int group_dead)
{
int signal;
+ bool autoreap;
void *cookie;

/*
@@ -859,9 +860,11 @@ static void exit_notify(struct task_stru

signal = tracehook_notify_death(tsk, &cookie, group_dead);
if (signal >= 0)
- signal = do_notify_parent(tsk, signal);
+ autoreap = do_notify_parent(tsk, signal);
+ else
+ autoreap = (signal == DEATH_REAP);

- tsk->exit_state = signal == DEATH_REAP ? EXIT_DEAD : EXIT_ZOMBIE;
+ tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;

/* mt-exec, de_thread() is waiting for group leader */
if (unlikely(tsk->signal->notify_count < 0))
@@ -869,7 +872,7 @@ static void exit_notify(struct task_stru
write_unlock_irq(&tasklist_lock);

/* If the process is dead, release it - nobody will wait for it */
- if (signal == DEATH_REAP)
+ if (autoreap)
release_task(tsk);
}

2011-06-21 20:42:32

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/7] ptrace: move SIGTRAP on exec(2) logic to ptrace_event()

On 06/21, Tejun Heo wrote:
>
> Hello,
>
> On Mon, Jun 20, 2011 at 10:25 PM, Oleg Nesterov <[email protected]> wrote:
> > Why does this make ptrace_event() smarter?
> >
> > OK, tracehooks should die. But why should we move this special case
> > into ptrace_event? Say, a simple
> >
> > ? ? ? ?static inline void ptrace_exec_event(...)
> > ? ? ? ?{
> > ? ? ? ? ? ? ? ?if (!ptrace_event_enabled(PTRACE_EVENT_EXEC))
> > ? ? ? ? ? ? ? ? ? ? ? ?send_sig(SIGTRAP, current, 0);
> > ? ? ? ? ? ? ? ?else
> > ? ? ? ? ? ? ? ? ? ? ? ?ptrace_event(PTRACE_EVENT_EXEC);
> > ? ? ? ?}
> >
> > in fs/exec.c looks a bit better to me.
>
> The intention is to concentrate ptrace specific logic in
> ptrace_event(). We'll have more of them, mostly dependent on
> PT_SEIZED and I don't think it's a good idea to scatter them across
> the kernel. They're of no interest outside of ptrace after all. I
> think it's better to have them collected in one place than scattered
> around.

This was one of the reasons for tracehooks ;)

OK, we can move this helper to ptrace.h although I do not think this
makes sense. As for "scattered around", imho the code which calculates
trace in do_fork() falls into the same category.

I still can't understand why ptrace_event() should check EVENT_EXEC.
This is the special case, it should be handled specially. And while
I think this is not that important, this is not friendly to do_fork,
compiler has to generate the code to check event.

But OK, I applied 1-5 and 7. This is minor, and we can reconsider this
later. I mean, right now I think I'll send the cleanup later, and you
will have to explain your nack ;)

Oleg.

2011-06-22 21:10:16

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 0/8] kill task_detached() (Was: ptrace: kill detah tracehooks)

Remove tracehook_notify_death(), and kill task_detached() logic.

IMHO, task_detached() is very strange and confusing. Sometimes it used
instead of !thread_group_leader() when the task is live. Sometimes it
is used to detect the EXIT_DEAD tasks (reparent_leader). Sometimes it
used after do_notify_parent() which sets ->exit_signal = -1 to signal
the caller it should set EXIT_DEAD.

Basically, this series does:

- s/task_detached/thread_group_leader/ when the caller need
to know if it is the group_leader or not

- s/task_detached/EXIT_DEAD/ when the caller needs to filter
out the EXIT_DEAD tasks

- changes do_notify_parent() to return a boolean and converts
the callers from

do_notify_parent(p);
if (task_detached(p))
// aha, do_notify_parent() changed ->exit_signal
release_task(p)

to
if (do_notify_parent(p))
release_task(p)

- kills the now unused task_detached()

- makes task->exit_signal "const" and redefines thread_group_leader()
for micro-optimization.


fs/exec.c | 1 +
include/linux/sched.h | 15 ++++-------
include/linux/tracehook.h | 34 --------------------------
kernel/exit.c | 58 +++++++++++++++++++-------------------------
kernel/ptrace.c | 32 +++++++++++++-----------
kernel/signal.c | 16 ++++++------
6 files changed, 57 insertions(+), 99 deletions(-)

Oleg.

2011-06-22 21:10:30

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 1/8] make do_notify_parent() return bool

- change do_notify_parent() to return a boolean, true if the task should
be reaped because its parent ignores SIGCHLD.

- update the only caller which checks the returned value, exit_notify().

This temporary uglifies exit_notify() even more, will be cleanuped by
the next change.

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/sched.h | 4 ++--
kernel/signal.c | 17 +++++++++--------
kernel/exit.c | 9 ++++++---
3 files changed, 17 insertions(+), 13 deletions(-)

--- ptrace/include/linux/sched.h~1_do_notify_parent_bool 2011-06-22 22:47:05.000000000 +0200
+++ ptrace/include/linux/sched.h 2011-06-22 22:47:11.000000000 +0200
@@ -2146,7 +2146,7 @@ static inline int dequeue_signal_lock(st
spin_unlock_irqrestore(&tsk->sighand->siglock, flags);

return ret;
-}
+}

extern void block_all_signals(int (*notifier)(void *priv), void *priv,
sigset_t *mask);
@@ -2161,7 +2161,7 @@ extern int kill_pid_info_as_uid(int, str
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
extern int kill_proc_info(int, struct siginfo *, pid_t);
-extern int do_notify_parent(struct task_struct *, int);
+extern bool do_notify_parent(struct task_struct *, int);
extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
extern void force_sig(int, struct task_struct *);
extern int send_sig(int, struct task_struct *, int);
--- ptrace/kernel/signal.c~1_do_notify_parent_bool 2011-06-22 22:47:05.000000000 +0200
+++ ptrace/kernel/signal.c 2011-06-22 22:47:11.000000000 +0200
@@ -1577,15 +1577,15 @@ ret:
* Let a parent know about the death of a child.
* For a stopped/continued status change, use do_notify_parent_cldstop instead.
*
- * Returns -1 if our parent ignored us and so we've switched to
- * self-reaping, or else @sig.
+ * Returns true if our parent ignored us and so we've switched to
+ * self-reaping.
*/
-int do_notify_parent(struct task_struct *tsk, int sig)
+bool do_notify_parent(struct task_struct *tsk, int sig)
{
struct siginfo info;
unsigned long flags;
struct sighand_struct *psig;
- int ret = sig;
+ bool autoreap = false;

BUG_ON(sig == -1);

@@ -1649,16 +1649,17 @@ int do_notify_parent(struct task_struct
* is implementation-defined: we do (if you don't want
* it, just use SIG_IGN instead).
*/
- ret = tsk->exit_signal = -1;
+ autoreap = true;
+ tsk->exit_signal = -1;
if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
- sig = -1;
+ sig = 0;
}
- if (valid_signal(sig) && sig > 0)
+ if (valid_signal(sig) && sig)
__group_send_sig_info(sig, &info, tsk->parent);
__wake_up_parent(tsk, tsk->parent);
spin_unlock_irqrestore(&psig->siglock, flags);

- return ret;
+ return autoreap;
}

/**
--- ptrace/kernel/exit.c~1_do_notify_parent_bool 2011-06-22 22:47:05.000000000 +0200
+++ ptrace/kernel/exit.c 2011-06-22 22:47:11.000000000 +0200
@@ -819,6 +819,7 @@ static void forget_original_parent(struc
static void exit_notify(struct task_struct *tsk, int group_dead)
{
int signal;
+ bool autoreap;
void *cookie;

/*
@@ -857,9 +858,11 @@ static void exit_notify(struct task_stru

signal = tracehook_notify_death(tsk, &cookie, group_dead);
if (signal >= 0)
- signal = do_notify_parent(tsk, signal);
+ autoreap = do_notify_parent(tsk, signal);
+ else
+ autoreap = (signal == DEATH_REAP);

- tsk->exit_state = signal == DEATH_REAP ? EXIT_DEAD : EXIT_ZOMBIE;
+ tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;

/* mt-exec, de_thread() is waiting for group leader */
if (unlikely(tsk->signal->notify_count < 0))
@@ -867,7 +870,7 @@ static void exit_notify(struct task_stru
write_unlock_irq(&tasklist_lock);

/* If the process is dead, release it - nobody will wait for it */
- if (signal == DEATH_REAP)
+ if (autoreap)
release_task(tsk);
}

2011-06-22 21:11:17

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 2/8] kill tracehook_notify_death()

Kill tracehook_notify_death(), reimplement the logic in its caller,
exit_notify().

This also fixes a minor bug, if the exiting task is the group_leader
and it is traced by its real_parent, tracehook_notify_death() returns
task->exit_signal or SIGCHLD depending on thread_group_empty(), this
looks strange.

Also, change the exec_id check to use thread_group_leader() instead
of task_detached(), this looks more clear.

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/tracehook.h | 34 ----------------------------------
kernel/exit.c | 19 +++++++++++--------
2 files changed, 11 insertions(+), 42 deletions(-)

--- ptrace/include/linux/tracehook.h~2_kill_notify_death 2011-06-22 22:47:04.000000000 +0200
+++ ptrace/include/linux/tracehook.h 2011-06-22 22:47:11.000000000 +0200
@@ -152,40 +152,6 @@ static inline void tracehook_signal_hand
ptrace_notify(SIGTRAP);
}

-#define DEATH_REAP -1
-#define DEATH_DELAYED_GROUP_LEADER -2
-
-/**
- * tracehook_notify_death - task is dead, ready to notify parent
- * @task: @current task now exiting
- * @death_cookie: value to pass to tracehook_report_death()
- * @group_dead: nonzero if this was the last thread in the group to die
- *
- * A return value >= 0 means call do_notify_parent() with that signal
- * number. Negative return value can be %DEATH_REAP to self-reap right
- * now, or %DEATH_DELAYED_GROUP_LEADER to a zombie without notifying our
- * parent. Note that a return value of 0 means a do_notify_parent() call
- * that sends no signal, but still wakes up a parent blocked in wait*().
- *
- * Called with write_lock_irq(&tasklist_lock) held.
- */
-static inline int tracehook_notify_death(struct task_struct *task,
- void **death_cookie, int group_dead)
-{
- if (task_detached(task))
- return task->ptrace ? SIGCHLD : DEATH_REAP;
-
- /*
- * If something other than our normal parent is ptracing us, then
- * send it a SIGCHLD instead of honoring exit_signal. exit_signal
- * only has special meaning to our real parent.
- */
- if (thread_group_empty(task) && !ptrace_reparented(task))
- return task->exit_signal;
-
- return task->ptrace ? SIGCHLD : DEATH_DELAYED_GROUP_LEADER;
-}
-
#ifdef TIF_NOTIFY_RESUME
/**
* set_notify_resume - cause tracehook_notify_resume() to be called
--- ptrace/kernel/exit.c~2_kill_notify_death 2011-06-22 22:47:11.000000000 +0200
+++ ptrace/kernel/exit.c 2011-06-22 22:47:11.000000000 +0200
@@ -818,9 +818,7 @@ static void forget_original_parent(struc
*/
static void exit_notify(struct task_struct *tsk, int group_dead)
{
- int signal;
bool autoreap;
- void *cookie;

/*
* This does two things:
@@ -851,16 +849,21 @@ static void exit_notify(struct task_stru
* we have changed execution domain as these two values started
* the same after a fork.
*/
- if (tsk->exit_signal != SIGCHLD && !task_detached(tsk) &&
+ if (thread_group_leader(tsk) && tsk->exit_signal != SIGCHLD &&
(tsk->parent_exec_id != tsk->real_parent->self_exec_id ||
tsk->self_exec_id != tsk->parent_exec_id))
tsk->exit_signal = SIGCHLD;

- signal = tracehook_notify_death(tsk, &cookie, group_dead);
- if (signal >= 0)
- autoreap = do_notify_parent(tsk, signal);
- else
- autoreap = (signal == DEATH_REAP);
+ if (unlikely(tsk->ptrace)) {
+ int sig = thread_group_leader(tsk) && !ptrace_reparented(tsk)
+ ? tsk->exit_signal : SIGCHLD;
+ autoreap = do_notify_parent(tsk, sig);
+ } else if (thread_group_leader(tsk)) {
+ autoreap = thread_group_empty(tsk) &&
+ do_notify_parent(tsk, tsk->exit_signal);
+ } else {
+ autoreap = true;
+ }

tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;

2011-06-22 21:11:05

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 3/8] __ptrace_detach: avoid task_detached(), check do_notify_parent()

__ptrace_detach() relies on the current obscure behaviour of
do_notify_parent(tsk) which changes tsk->exit_signal if this child
should be silently reaped. That is why we check task_detached(), it
is true if the task is sub-thread, or it is the group_leader but
its exit_signal was changed by do_notify_parent().

This is confusing, change the code to rely on !thread_group_leader()
or the value returned by do_notify_parent().

Signed-off-by: Oleg Nesterov <[email protected]>
---

kernel/ptrace.c | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)

--- ptrace/kernel/ptrace.c~3_detach_ck_notify 2011-06-22 22:47:03.000000000 +0200
+++ ptrace/kernel/ptrace.c 2011-06-22 22:47:11.000000000 +0200
@@ -370,25 +370,28 @@ static int ignoring_children(struct sigh
*/
static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
{
+ bool dead;
+
__ptrace_unlink(p);

- if (p->exit_state == EXIT_ZOMBIE) {
- if (!task_detached(p) && thread_group_empty(p)) {
- if (!same_thread_group(p->real_parent, tracer))
- do_notify_parent(p, p->exit_signal);
- else if (ignoring_children(tracer->sighand)) {
- __wake_up_parent(p, tracer);
- p->exit_signal = -1;
- }
- }
- if (task_detached(p)) {
- /* Mark it as in the process of being reaped. */
- p->exit_state = EXIT_DEAD;
- return true;
+ if (p->exit_state != EXIT_ZOMBIE)
+ return false;
+
+ dead = !thread_group_leader(p);
+
+ if (!dead && thread_group_empty(p)) {
+ if (!same_thread_group(p->real_parent, tracer))
+ dead = do_notify_parent(p, p->exit_signal);
+ else if (ignoring_children(tracer->sighand)) {
+ __wake_up_parent(p, tracer);
+ p->exit_signal = -1;
+ dead = true;
}
}
-
- return false;
+ /* Mark it as in the process of being reaped. */
+ if (dead)
+ p->exit_state = EXIT_DEAD;
+ return dead;
}

static int ptrace_detach(struct task_struct *child, unsigned int data)

2011-06-22 21:11:33

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 4/8] make do_notify_parent() __must_check, update the callers

Change other callers of do_notify_parent() to check the value it
returns, this makes the subsequent task_detached() unnecessary.
Mark do_notify_parent() as __must_check.

Use thread_group_leader() instead of !task_detached() to check
if we need to notify the real parent in wait_task_zombie().

Remove the stale comment in release_task(). "just for sanity" is
no longer true, we have to set EXIT_DEAD to avoid the races with
do_wait().

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/sched.h | 2 +-
kernel/exit.c | 29 ++++++++---------------------
2 files changed, 9 insertions(+), 22 deletions(-)

--- ptrace/include/linux/sched.h~4_all_notify_callers 2011-06-22 22:47:11.000000000 +0200
+++ ptrace/include/linux/sched.h 2011-06-22 22:47:12.000000000 +0200
@@ -2161,7 +2161,7 @@ extern int kill_pid_info_as_uid(int, str
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
extern int kill_proc_info(int, struct siginfo *, pid_t);
-extern bool do_notify_parent(struct task_struct *, int);
+extern __must_check bool do_notify_parent(struct task_struct *, int);
extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
extern void force_sig(int, struct task_struct *);
extern int send_sig(int, struct task_struct *, int);
--- ptrace/kernel/exit.c~4_all_notify_callers 2011-06-22 22:47:11.000000000 +0200
+++ ptrace/kernel/exit.c 2011-06-22 22:47:12.000000000 +0200
@@ -190,21 +190,12 @@ repeat:
leader = p->group_leader;
if (leader != p && thread_group_empty(leader) && leader->exit_state == EXIT_ZOMBIE) {
BUG_ON(task_detached(leader));
- do_notify_parent(leader, leader->exit_signal);
/*
* If we were the last child thread and the leader has
* exited already, and the leader's parent ignores SIGCHLD,
* then we are the one who should release the leader.
- *
- * do_notify_parent() will have marked it self-reaping in
- * that case.
- */
- zap_leader = task_detached(leader);
-
- /*
- * This maintains the invariant that release_task()
- * only runs on a task in EXIT_DEAD, just for sanity.
*/
+ zap_leader = do_notify_parent(leader, leader->exit_signal);
if (zap_leader)
leader->exit_state = EXIT_DEAD;
}
@@ -765,8 +756,7 @@ static void reparent_leader(struct task_
/* If it has exited notify the new parent about this child's death. */
if (!p->ptrace &&
p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) {
- do_notify_parent(p, p->exit_signal);
- if (task_detached(p)) {
+ if (do_notify_parent(p, p->exit_signal)) {
p->exit_state = EXIT_DEAD;
list_move_tail(&p->sibling, dead);
}
@@ -1348,16 +1338,13 @@ static int wait_task_zombie(struct wait_
/* We dropped tasklist, ptracer could die and untrace */
ptrace_unlink(p);
/*
- * If this is not a detached task, notify the parent.
- * If it's still not detached after that, don't release
- * it now.
+ * If this is not a sub-thread, notify the parent.
+ * If parent wants a zombie, don't release it now.
*/
- if (!task_detached(p)) {
- do_notify_parent(p, p->exit_signal);
- if (!task_detached(p)) {
- p->exit_state = EXIT_ZOMBIE;
- p = NULL;
- }
+ if (thread_group_leader(p) &&
+ !do_notify_parent(p, p->exit_signal)) {
+ p->exit_state = EXIT_ZOMBIE;
+ p = NULL;
}
write_unlock_irq(&tasklist_lock);
}

2011-06-22 21:12:00

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 5/8] reparent_leader: check EXIT_DEAD instead of task_detached()

Change reparent_leader() to check ->exit_state instead of ->exit_signal,
this matches the similar EXIT_DEAD check in wait_consider_task() and
allows us to cleanup the do_notify_parent/task_detached logic.

task_detached() was really needed during reparenting before 9cd80bbb
"do_wait() optimization: do not place sub-threads on ->children list"
to filter out the sub-threads. After this change task_detached(p) can
only be true if p is the dead group_leader and its parent ignores
SIGCHLD, in this case the caller of do_notify_parent() is going to
reap this task and it should set EXIT_DEAD.

Signed-off-by: Oleg Nesterov <[email protected]>
---

kernel/exit.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- ptrace/kernel/exit.c~5_reparent_leader_ck_exit_dead 2011-06-22 22:47:12.000000000 +0200
+++ ptrace/kernel/exit.c 2011-06-22 22:47:12.000000000 +0200
@@ -741,7 +741,7 @@ static void reparent_leader(struct task_
{
list_move_tail(&p->sibling, &p->real_parent->children);

- if (task_detached(p))
+ if (p->exit_state == EXIT_DEAD)
return;
/*
* If this is a threaded reparent there is no need to

2011-06-22 21:11:51

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 6/8] kill task_detached()

Upadate the last user of task_detached(), wait_task_zombie(), to
use thread_group_leader() and kill task_detached().

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/sched.h | 5 -----
kernel/exit.c | 5 ++---
2 files changed, 2 insertions(+), 8 deletions(-)

--- ptrace/include/linux/sched.h~6_kill_task_detached 2011-06-22 22:47:12.000000000 +0200
+++ ptrace/include/linux/sched.h 2011-06-22 22:47:13.000000000 +0200
@@ -2319,11 +2319,6 @@ static inline int thread_group_empty(str
#define delay_group_leader(p) \
(thread_group_leader(p) && !thread_group_empty(p))

-static inline int task_detached(struct task_struct *p)
-{
- return p->exit_signal == -1;
-}
-
/*
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
--- ptrace/kernel/exit.c~6_kill_task_detached 2011-06-22 22:47:12.000000000 +0200
+++ ptrace/kernel/exit.c 2011-06-22 22:47:13.000000000 +0200
@@ -189,7 +189,6 @@ repeat:
zap_leader = 0;
leader = p->group_leader;
if (leader != p && thread_group_empty(leader) && leader->exit_state == EXIT_ZOMBIE) {
- BUG_ON(task_detached(leader));
/*
* If we were the last child thread and the leader has
* exited already, and the leader's parent ignores SIGCHLD,
@@ -1228,9 +1227,9 @@ static int wait_task_zombie(struct wait_
traced = ptrace_reparented(p);
/*
* It can be ptraced but not reparented, check
- * !task_detached() to filter out sub-threads.
+ * thread_group_leader() to filter out sub-threads.
*/
- if (likely(!traced) && likely(!task_detached(p))) {
+ if (likely(!traced) && thread_group_leader(p)) {
struct signal_struct *psig;
struct signal_struct *sig;
unsigned long maxrss;

2011-06-22 21:12:24

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 7/8] do not change dead_task->exit_signal

__ptrace_detach() and do_notify_parent() set task->exit_signal = -1
to mark the task dead. This is no longer needed, nobody checks
exit_signal to detect the EXIT_DEAD task.

Signed-off-by: Oleg Nesterov <[email protected]>
---

kernel/ptrace.c | 1 -
kernel/signal.c | 1 -
2 files changed, 2 deletions(-)

--- ptrace/kernel/ptrace.c~7_dont_change_exit_signal 2011-06-22 22:47:11.000000000 +0200
+++ ptrace/kernel/ptrace.c 2011-06-22 22:47:14.000000000 +0200
@@ -384,7 +384,6 @@ static bool __ptrace_detach(struct task_
dead = do_notify_parent(p, p->exit_signal);
else if (ignoring_children(tracer->sighand)) {
__wake_up_parent(p, tracer);
- p->exit_signal = -1;
dead = true;
}
}
--- ptrace/kernel/signal.c~7_dont_change_exit_signal 2011-06-22 22:47:11.000000000 +0200
+++ ptrace/kernel/signal.c 2011-06-22 22:47:14.000000000 +0200
@@ -1650,7 +1650,6 @@ bool do_notify_parent(struct task_struct
* it, just use SIG_IGN instead).
*/
autoreap = true;
- tsk->exit_signal = -1;
if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
sig = 0;
}

2011-06-22 21:12:30

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 8/8] redefine thread_group_leader() as exit_signal >= 0

Change de_thread() to set old_leader->exit_signal = -1. This is
good for the consistency, it is no longer the leader and all
sub-threads have exit_signal = -1 set by copy_process(CLONE_THREAD).

And this allows us to micro-optimize thread_group_leader(), it can
simply check exit_signal >= 0. This also makes sense because we
should move ->group_leader from task_struct to signal_struct.

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/sched.h | 6 ++++--
fs/exec.c | 1 +
2 files changed, 5 insertions(+), 2 deletions(-)

--- ptrace/include/linux/sched.h~8_thread_group_leader 2011-06-22 22:47:13.000000000 +0200
+++ ptrace/include/linux/sched.h 2011-06-22 22:47:14.000000000 +0200
@@ -2285,8 +2285,10 @@ static inline int get_nr_threads(struct
return tsk->signal->nr_threads;
}

-/* de_thread depends on thread_group_leader not being a pid based check */
-#define thread_group_leader(p) (p == p->group_leader)
+static inline bool thread_group_leader(struct task_struct *p)
+{
+ return p->exit_signal >= 0;
+}

/* Do to the insanities of de_thread it is possible for a process
* to have the pid of the thread group leader without actually being
--- ptrace/fs/exec.c~8_thread_group_leader 2011-06-22 22:47:00.000000000 +0200
+++ ptrace/fs/exec.c 2011-06-22 22:47:14.000000000 +0200
@@ -963,6 +963,7 @@ static int de_thread(struct task_struct
leader->group_leader = tsk;

tsk->exit_signal = SIGCHLD;
+ leader->exit_signal = -1;

BUG_ON(leader->exit_state != EXIT_ZOMBIE);
leader->exit_state = EXIT_DEAD;

2011-06-23 08:58:20

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 3/7] ptrace: move SIGTRAP on exec(2) logic to ptrace_event()

Hello,

On Tue, Jun 21, 2011 at 10:40:01PM +0200, Oleg Nesterov wrote:
> > The intention is to concentrate ptrace specific logic in
> > ptrace_event(). We'll have more of them, mostly dependent on
> > PT_SEIZED and I don't think it's a good idea to scatter them across
> > the kernel. They're of no interest outside of ptrace after all. I
> > think it's better to have them collected in one place than scattered
> > around.
>
> This was one of the reasons for tracehooks ;)

Yeah, sure. The problem with tracehook is not that it's a
fundamentally wrong thing to do but it's way over done than actually
necessary and full of requirements which are of no interest to inside
the upstream kernel.

> OK, we can move this helper to ptrace.h although I do not think this
> makes sense. As for "scattered around", imho the code which calculates
> trace in do_fork() falls into the same category.
>
> I still can't understand why ptrace_event() should check EVENT_EXEC.
> This is the special case, it should be handled specially. And while
> I think this is not that important, this is not friendly to do_fork,
> compiler has to generate the code to check event.

The generated code should be the same. I didn't check all the cases
but the few I check didn't really change.

> But OK, I applied 1-5 and 7. This is minor, and we can reconsider this
> later. I mean, right now I think I'll send the cleanup later, and you
> will have to explain your nack ;)

Yeah, yeah, it's a minor point one way or the other. I was (and still
is) thinking about folding most of ptrace-specific event logics into
ptrace_event() and this was just a first step as I think adding more
condition checks in fs/exec.c doesn't make much sense - ie. I might
end up adding PT_SEZIED there too.

Thanks.

--
tejun

2011-06-23 09:14:08

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 7/7] ptrace: s/tracehook_tracer_task()/ptrace_parent()/

Hello,

On Mon, Jun 20, 2011 at 10:16:03PM +0200, Oleg Nesterov wrote:
> On 06/17, Tejun Heo wrote:
> >
> > tracehook.h is on the way out. Rename tracehook_tracer_task() to
> > ptrace_parent() and move it from tracehook.h to ptrace.h.
>
> I am a bit surpised you decided to keep this helper. Can't we simply
> kill it?

I was being a bit lazy. If it can be killed, it would be great.

> OK, we will see. I guess this change is mostly needed to remove yet
> another function from tracehook.h.
>
> > @@ -216,7 +216,7 @@ static struct mm_struct *__check_mem_permission(struct task_struct *task)
> > if (task_is_stopped_or_traced(task)) {
> > int match;
> > rcu_read_lock();
> > - match = (tracehook_tracer_task(task) == current);
> > + match = (ptrace_parent(task) == current);
> > rcu_read_unlock();
> > if (match && ptrace_may_access(task, PTRACE_MODE_ATTACH))
>
> All we need
>
> if (task_is_traced(task) && task->parent == current) {
> if (ptrace_may_access()
> return mm;
> }
>
> Of course I do not blame this patch, my only point is that this helper
> only adds more confusion imho.

Yeah, it didn't seem like all the users actually required rcu
dereferencing and both the tracehook and ptrace function names are
confusing and not consistent in their use. We can convert the ones
which don't require rcu dereferencing to use ->parent directly and if
the left ones are few enough, maybe kill the helper altogether.

Thanks.

--
tejun

2011-06-23 09:24:08

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 6/7] ptrace: kill detah tracehooks

Hey, Oleg.

On Mon, Jun 20, 2011 at 09:39:29PM +0200, Oleg Nesterov wrote:
> > - signal = tracehook_notify_death(tsk, &cookie, group_dead);
> > - if (signal >= 0)
> > - signal = do_notify_parent(tsk, signal);
> > + /*
> > + * Notify interested parent if the whole group is dead. Ptracer
> > + * should always be notified. If something other than our normal
> > + * parent is ptracing us, then send it a SIGCHLD instead of
> > + * honoring exit_signal. exit_signal only has special meaning to
> > + * our real parent.
> > + */
> > + if (!task_detached(tsk) && thread_group_empty(tsk)) {
> > + if (!ptrace_reparented(tsk))
> > + do_notify_parent(tsk, tsk->exit_signal);
> > + else
> > + do_notify_parent(tsk, SIGCHLD);
>
> Forget about ptrace, suppose a single threaded child exits and its
> parent ignores SIGCHLD. In this case do_notify_parent() sets
> exit_signal = -1 to mark it dead, we should set exit_state = EXIT_DEAD
> and release it.

Oops, right. Missed do_notify_parent() may modify exit_signal.
Grumble. It seems you already have posted patches to address this
one. Let's continue there.

Thanks.

--
tejun

2011-06-23 09:52:45

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/8] make do_notify_parent() return bool

Hello,

On Wed, Jun 22, 2011 at 11:08:18PM +0200, Oleg Nesterov wrote:
> - change do_notify_parent() to return a boolean, true if the task should
> be reaped because its parent ignores SIGCHLD.
>
> - update the only caller which checks the returned value, exit_notify().
>
> This temporary uglifies exit_notify() even more, will be cleanuped by
> the next change.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> @@ -2161,7 +2161,7 @@ extern int kill_pid_info_as_uid(int, str
> extern int kill_pgrp(struct pid *pid, int sig, int priv);
> extern int kill_pid(struct pid *pid, int sig, int priv);
> extern int kill_proc_info(int, struct siginfo *, pid_t);
> -extern int do_notify_parent(struct task_struct *, int);
> +extern bool do_notify_parent(struct task_struct *, int);
> extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
> extern void force_sig(int, struct task_struct *);
> extern int send_sig(int, struct task_struct *, int);

While at it, would you mind adding the name of the parameters to the
prototype? I find prototypes w/o parameter names a bit irksome.

Other than that,

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2011-06-23 12:22:59

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/8] kill tracehook_notify_death()

Hello, Oleg.

On Wed, Jun 22, 2011 at 11:08:34PM +0200, Oleg Nesterov wrote:
> Kill tracehook_notify_death(), reimplement the logic in its caller,
> exit_notify().
>
> This also fixes a minor bug, if the exiting task is the group_leader
> and it is traced by its real_parent, tracehook_notify_death() returns
> task->exit_signal or SIGCHLD depending on thread_group_empty(), this
> looks strange.

Maybe we should do the above in a separate patch?

> - if (tsk->exit_signal != SIGCHLD && !task_detached(tsk) &&
> + if (thread_group_leader(tsk) && tsk->exit_signal != SIGCHLD &&

Hmmm... it probably depends on POV but wouldn't (exit_signal != -1 &&
exit_signal != SIGCHLD) be easier? The logic here is about demoting
specials sigs to SIGCHLD under certain circumstances and the check is
there to prevent promoting -1 to SIGCHLD. I agree the detached() is
more distracting than helping but thread_group_leader() seems
unnecessarily indirect.

Thanks.

--
tejun

2011-06-23 13:23:24

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/8] kill tracehook_notify_death()

Hi Tejun,

On 06/23, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Wed, Jun 22, 2011 at 11:08:34PM +0200, Oleg Nesterov wrote:
> > Kill tracehook_notify_death(), reimplement the logic in its caller,
> > exit_notify().
> >
> > This also fixes a minor bug, if the exiting task is the group_leader
> > and it is traced by its real_parent, tracehook_notify_death() returns
> > task->exit_signal or SIGCHLD depending on thread_group_empty(), this
> > looks strange.
>
> Maybe we should do the above in a separate patch?

Do you think this makes sense? OK, I can do this...

>
> > - if (tsk->exit_signal != SIGCHLD && !task_detached(tsk) &&
> > + if (thread_group_leader(tsk) && tsk->exit_signal != SIGCHLD &&
>
> Hmmm... it probably depends on POV but wouldn't (exit_signal != -1 &&
> exit_signal != SIGCHLD) be easier?

I disagree.

> The logic here is about demoting
> specials sigs to SIGCHLD under certain circumstances

Yes. And what is ->exit_signal? It is in fact per-process, lives in the
group_leader's task_struct. We could move it to signal_struct.

> and the check is
> there to prevent promoting -1 to SIGCHLD.

Yes, because we should never change ->exit_signal for sub-threads.

And it doesn't make sense to check exec ids (this is again per-process,
should be cleanuped) unless the task is the group leader.

> thread_group_leader() seems
> unnecessarily indirect.

This is what I disagree with. Contrary, I think thread_group_leader() exactly
explains what do we want to check. (but once again, exec_id logic should be
cleanuped, not only in this function).

Oleg.

2011-06-23 13:25:40

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 3/8] __ptrace_detach: avoid task_detached(), check do_notify_parent()

On Wed, Jun 22, 2011 at 11:08:53PM +0200, Oleg Nesterov wrote:
> __ptrace_detach() relies on the current obscure behaviour of
> do_notify_parent(tsk) which changes tsk->exit_signal if this child
> should be silently reaped. That is why we check task_detached(), it
> is true if the task is sub-thread, or it is the group_leader but
> its exit_signal was changed by do_notify_parent().
>
> This is confusing, change the code to rely on !thread_group_leader()
> or the value returned by do_notify_parent().
>
> Signed-off-by: Oleg Nesterov <[email protected]>

This looks good to me.

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2011-06-23 13:28:00

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/8] kill tracehook_notify_death()

Hey,

On Thu, Jun 23, 2011 at 03:21:26PM +0200, Oleg Nesterov wrote:
> > > This also fixes a minor bug, if the exiting task is the group_leader
> > > and it is traced by its real_parent, tracehook_notify_death() returns
> > > task->exit_signal or SIGCHLD depending on thread_group_empty(), this
> > > looks strange.
> >
> > Maybe we should do the above in a separate patch?
>
> Do you think this makes sense? OK, I can do this...

Having subtle behavior change mixed with reorganization isn't too
nice, so I think separating is better.

> > thread_group_leader() seems unnecessarily indirect.
>
> This is what I disagree with. Contrary, I think thread_group_leader() exactly
> explains what do we want to check. (but once again, exec_id logic should be
> cleanuped, not only in this function).

Hmmm... well, this was minor to begin with and thread_group_leader()
matches later patches better, so using thread_group_leader() seems
fine.

Thanks.

--
tejun

2011-06-23 13:30:56

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/8] kill tracehook_notify_death()

On 06/23, Tejun Heo wrote:
>
> Hey,
>
> On Thu, Jun 23, 2011 at 03:21:26PM +0200, Oleg Nesterov wrote:
> > > > This also fixes a minor bug, if the exiting task is the group_leader
> > > > and it is traced by its real_parent, tracehook_notify_death() returns
> > > > task->exit_signal or SIGCHLD depending on thread_group_empty(), this
> > > > looks strange.
> > >
> > > Maybe we should do the above in a separate patch?
> >
> > Do you think this makes sense? OK, I can do this...
>
> Having subtle behavior change mixed with reorganization isn't too
> nice, so I think separating is better.

Hmm. OK, you are right. I'll redo.

Oleg.

2011-06-23 13:38:14

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 4/8] make do_notify_parent() __must_check, update the callers

On Wed, Jun 22, 2011 at 11:09:09PM +0200, Oleg Nesterov wrote:
> Change other callers of do_notify_parent() to check the value it
> returns, this makes the subsequent task_detached() unnecessary.
> Mark do_notify_parent() as __must_check.
>
> Use thread_group_leader() instead of !task_detached() to check
> if we need to notify the real parent in wait_task_zombie().
>
> Remove the stale comment in release_task(). "just for sanity" is
> no longer true, we have to set EXIT_DEAD to avoid the races with
> do_wait().
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2011-06-23 13:56:44

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 8/8] redefine thread_group_leader() as exit_signal >= 0

Hello,

On Wed, Jun 22, 2011 at 11:10:26PM +0200, Oleg Nesterov wrote:
> Change de_thread() to set old_leader->exit_signal = -1. This is
> good for the consistency, it is no longer the leader and all
> sub-threads have exit_signal = -1 set by copy_process(CLONE_THREAD).
>
> And this allows us to micro-optimize thread_group_leader(), it can
> simply check exit_signal >= 0. This also makes sense because we
> should move ->group_leader from task_struct to signal_struct.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Patches 5-8 look correct to me but I frankly don't know the area
enough to assert acked-by's. Please feel free to add reviewed-by's
tho.

Thanks.

--
tejun

2011-06-23 17:09:18

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/8] kill tracehook_notify_death()

On 06/23, Oleg Nesterov wrote:
>
> On 06/23, Tejun Heo wrote:
> >
> > Having subtle behavior change mixed with reorganization isn't too
> > nice, so I think separating is better.
>
> Hmm. OK, you are right. I'll redo.

------------------------------------------------------------------------------
[PATCH v2 2/8] kill tracehook_notify_death()

Kill tracehook_notify_death(), reimplement the logic in its caller,
exit_notify().

Also, change the exec_id's check to use thread_group_leader() instead
of task_detached(), this is more clear. This logic only applies to
the exiting leader, a sub-thread must never change its exit_signal.

Note: when the traced group leader exits the exit_signal-or-SIGCHLD
logic looks really strange:

- we notify the tracer even if !thread_group_empty() but
do_wait(WEXITED) can't work until all threads exit

- if the tracer is real_parent, it is not clear why can't
we use ->exit_signal event if !thread_group_empty()

-v2: do not try to fix the 2nd oddity to avoid the subtle behavior
change mixed with reorganization, suggested by Tejun.

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/tracehook.h | 34 ----------------------------------
kernel/exit.c | 21 +++++++++++++--------
2 files changed, 13 insertions(+), 42 deletions(-)

--- ptrace/include/linux/tracehook.h~2_kill_notify_death 2011-06-22 22:47:04.000000000 +0200
+++ ptrace/include/linux/tracehook.h 2011-06-22 22:47:11.000000000 +0200
@@ -152,40 +152,6 @@ static inline void tracehook_signal_hand
ptrace_notify(SIGTRAP);
}

-#define DEATH_REAP -1
-#define DEATH_DELAYED_GROUP_LEADER -2
-
-/**
- * tracehook_notify_death - task is dead, ready to notify parent
- * @task: @current task now exiting
- * @death_cookie: value to pass to tracehook_report_death()
- * @group_dead: nonzero if this was the last thread in the group to die
- *
- * A return value >= 0 means call do_notify_parent() with that signal
- * number. Negative return value can be %DEATH_REAP to self-reap right
- * now, or %DEATH_DELAYED_GROUP_LEADER to a zombie without notifying our
- * parent. Note that a return value of 0 means a do_notify_parent() call
- * that sends no signal, but still wakes up a parent blocked in wait*().
- *
- * Called with write_lock_irq(&tasklist_lock) held.
- */
-static inline int tracehook_notify_death(struct task_struct *task,
- void **death_cookie, int group_dead)
-{
- if (task_detached(task))
- return task->ptrace ? SIGCHLD : DEATH_REAP;
-
- /*
- * If something other than our normal parent is ptracing us, then
- * send it a SIGCHLD instead of honoring exit_signal. exit_signal
- * only has special meaning to our real parent.
- */
- if (thread_group_empty(task) && !ptrace_reparented(task))
- return task->exit_signal;
-
- return task->ptrace ? SIGCHLD : DEATH_DELAYED_GROUP_LEADER;
-}
-
#ifdef TIF_NOTIFY_RESUME
/**
* set_notify_resume - cause tracehook_notify_resume() to be called
--- ptrace/kernel/exit.c~2_kill_notify_death 2011-06-22 22:47:11.000000000 +0200
+++ ptrace/kernel/exit.c 2011-06-23 18:33:59.000000000 +0200
@@ -818,9 +818,7 @@ static void forget_original_parent(struc
*/
static void exit_notify(struct task_struct *tsk, int group_dead)
{
- int signal;
bool autoreap;
- void *cookie;

/*
* This does two things:
@@ -851,16 +849,23 @@ static void exit_notify(struct task_stru
* we have changed execution domain as these two values started
* the same after a fork.
*/
- if (tsk->exit_signal != SIGCHLD && !task_detached(tsk) &&
+ if (thread_group_leader(tsk) && tsk->exit_signal != SIGCHLD &&
(tsk->parent_exec_id != tsk->real_parent->self_exec_id ||
tsk->self_exec_id != tsk->parent_exec_id))
tsk->exit_signal = SIGCHLD;

- signal = tracehook_notify_death(tsk, &cookie, group_dead);
- if (signal >= 0)
- autoreap = do_notify_parent(tsk, signal);
- else
- autoreap = (signal == DEATH_REAP);
+ if (unlikely(tsk->ptrace)) {
+ int sig = thread_group_leader(tsk) &&
+ thread_group_empty(tsk) &&
+ !ptrace_reparented(tsk) ?
+ tsk->exit_signal : SIGCHLD;
+ autoreap = do_notify_parent(tsk, sig);
+ } else if (thread_group_leader(tsk)) {
+ autoreap = thread_group_empty(tsk) &&
+ do_notify_parent(tsk, tsk->exit_signal);
+ } else {
+ autoreap = true;
+ }

tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;

2011-06-25 14:16:00

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/8] kill tracehook_notify_death()

Hello,

On Thu, Jun 23, 2011 at 07:06:50PM +0200, Oleg Nesterov wrote:
> [PATCH v2 2/8] kill tracehook_notify_death()
>
> Kill tracehook_notify_death(), reimplement the logic in its caller,
> exit_notify().
>
> Also, change the exec_id's check to use thread_group_leader() instead
> of task_detached(), this is more clear. This logic only applies to
> the exiting leader, a sub-thread must never change its exit_signal.
>
> Note: when the traced group leader exits the exit_signal-or-SIGCHLD
> logic looks really strange:
>
> - we notify the tracer even if !thread_group_empty() but
> do_wait(WEXITED) can't work until all threads exit

Yeap, we've discussed this before and this indeed is odd. However, is
there something ptracer can't do with PTRACE_EVENT_EXIT instead? If
PTRACE_EVENT_EXIT can be used instead for most practical purposes, I
think we're better off just documenting the fact and advise use of
PTRACE_EVENT_EXIT rather than trying to change the behavior.

> - if the tracer is real_parent, it is not clear why can't
> we use ->exit_signal event if !thread_group_empty()

I've been thinking a bit more about this and it doesn't seem that
changing this is necessarily a good idea. The current behavior does
make certain sense (overridden exit_signal is used only for the real
parent when the process is being reaped) and doesn't cause any actual
problem, so I don't think we need to change this behavior.

> + if (unlikely(tsk->ptrace)) {
> + int sig = thread_group_leader(tsk) &&
> + thread_group_empty(tsk) &&
> + !ptrace_reparented(tsk) ?
> + tsk->exit_signal : SIGCHLD;

Heh, I think this needs to be prettier even at the cost of an inline
function.

Thanks.

--
tejun

2011-06-26 20:55:04

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/8] kill tracehook_notify_death()

On 06/25, Tejun Heo wrote:
>
> Hello,
>
> On Thu, Jun 23, 2011 at 07:06:50PM +0200, Oleg Nesterov wrote:
> > [PATCH v2 2/8] kill tracehook_notify_death()
> >
> > Kill tracehook_notify_death(), reimplement the logic in its caller,
> > exit_notify().
> >
> > Also, change the exec_id's check to use thread_group_leader() instead
> > of task_detached(), this is more clear. This logic only applies to
> > the exiting leader, a sub-thread must never change its exit_signal.
> >
> > Note: when the traced group leader exits the exit_signal-or-SIGCHLD
> > logic looks really strange:
> >
> > - we notify the tracer even if !thread_group_empty() but
> > do_wait(WEXITED) can't work until all threads exit
>
> Yeap, we've discussed this before and this indeed is odd. However, is
> there something ptracer can't do with PTRACE_EVENT_EXIT instead?

Firstly, I think PTRACE_EVENT_EXIT should not stop the tracee if it
was SIGKILL'ed. Even if the tracee stops, it can be killed later.
The tracer can't detach after that, it can't even wait() to detecte
a zombie leader.

> rather than trying to change the behavior.

Yes, perhaps we shouldn't (or can't) change this behaviour, I am not
sure. We will see.

> > - if the tracer is real_parent, it is not clear why can't
> > we use ->exit_signal event if !thread_group_empty()
>
> I've been thinking a bit more about this and it doesn't seem that
> changing this is necessarily a good idea.

Yes, agreed. This doesn't buy us something really useful.

> The current behavior does
> make certain sense (overridden exit_signal is used only for the real
> parent when the process is being reaped)

Oh, but this is the traced task. I do not think this behaviour was
really designed, I can be wrong of course. For example, what "being
reaped" actually means? Say, the group leader can exit after all
other sub-threads have already exited, but thread_group_empty() == F
exactly because a sub-thread is traced and wasn't reaped yet.

To me, it would be more clean to do

if (tsk->ptrace) {
int sig = ptrace_reparented(tsk) ?
SIGCHLD : tsk->group_leader->exit_signal;

}

> and doesn't cause any actual
> problem, so I don't think we need to change this behavior.

Agreed.

> > + if (unlikely(tsk->ptrace)) {
> > + int sig = thread_group_leader(tsk) &&
> > + thread_group_empty(tsk) &&
> > + !ptrace_reparented(tsk) ?
> > + tsk->exit_signal : SIGCHLD;
>
> Heh, I think this needs to be prettier even at the cost of an inline
> function.

May be, but the code is sooooo simple. In fact I thought about the helper,
but can't find a good name.

Anyway, this is so minor, unless you strongly object I am going to push
this patch as is. We can add a helper later although I don't think it is
needed.

The same logic could be written as

if (thread_group_empty(tsk)) {
int sig = ptrace_reparented(tsk) ?
SIGCHLD : tsk->exit_signal;
autoreap = do_notify_parent(tsk, sig);
} else if (task->ptrace) {
autoreap = do_notify_parent(tsk, SIGCHLD);
} else {
autoreap = !thread_group_leader();
}

note that it certainly looks "prettier". However, personaly I strongly
prefer the non-pretty code above, imho it is more straighforward and
understandable. It is hardly possible to misread/misunderstand it.

Oleg.

2011-06-27 08:25:36

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/8] kill tracehook_notify_death()

Hello, Oleg.

On Sun, Jun 26, 2011 at 10:51:57PM +0200, Oleg Nesterov wrote:
> > Yeap, we've discussed this before and this indeed is odd. However, is
> > there something ptracer can't do with PTRACE_EVENT_EXIT instead?
>
> Firstly, I think PTRACE_EVENT_EXIT should not stop the tracee if it
> was SIGKILL'ed. Even if the tracee stops, it can be killed later.
> The tracer can't detach after that, it can't even wait() to detecte
> a zombie leader.

For SIGKILL, yes, it is different, but if PTRACE_EVENT_EXIT is enough
for all other cases, I think we're mostly set. BTW, it seems like we
would actually stop at PTRACE_EVENT_EXIT even after SIGKILL. This is
wrong & racy. may_ptrace_stop() should be checking for
sigkill_pending(), right?

> > Heh, I think this needs to be prettier even at the cost of an inline
> > function.
>
> May be, but the code is sooooo simple. In fact I thought about the helper,
> but can't find a good name.
>
> Anyway, this is so minor, unless you strongly object I am going to push
> this patch as is. We can add a helper later although I don't think it is
> needed.
>
> The same logic could be written as
>
> if (thread_group_empty(tsk)) {
> int sig = ptrace_reparented(tsk) ?
> SIGCHLD : tsk->exit_signal;
> autoreap = do_notify_parent(tsk, sig);
> } else if (task->ptrace) {
> autoreap = do_notify_parent(tsk, SIGCHLD);
> } else {
> autoreap = !thread_group_leader();
> }
>
> note that it certainly looks "prettier". However, personaly I strongly
> prefer the non-pretty code above, imho it is more straighforward and
> understandable. It is hardly possible to misread/misunderstand it.

Well, it's cosmetic after all and you're the maintainer. I don't have
any major problem with the original. Please go ahead.

Thanks.

--
tejun

2011-06-27 14:23:42

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/8] kill tracehook_notify_death()

Hi,

On 06/27, Tejun Heo wrote:
>
> On Sun, Jun 26, 2011 at 10:51:57PM +0200, Oleg Nesterov wrote:
> > > Yeap, we've discussed this before and this indeed is odd. However, is
> > > there something ptracer can't do with PTRACE_EVENT_EXIT instead?
> >
> > Firstly, I think PTRACE_EVENT_EXIT should not stop the tracee if it
> > was SIGKILL'ed. Even if the tracee stops, it can be killed later.
> > The tracer can't detach after that, it can't even wait() to detecte
> > a zombie leader.
>
> For SIGKILL, yes, it is different, but if PTRACE_EVENT_EXIT is enough
> for all other cases, I think we're mostly set.

I think this is not that simple. I already mentioned this before, I think
we need a separate discussion. I'll try to return to this in a few days.

Firstly, we should decide when PTRACE_EVENT_EXIT should stop, and when it
shouldn't. In this discussion I'll assume sys_exit_group() should respect
PTRACE_EVENT_EXIT.

> BTW, it seems like we
> would actually stop at PTRACE_EVENT_EXIT even after SIGKILL. This is
> wrong & racy.

Yes! because the tracee can call ptrace_stop() after the pending SIGKILL
was already dequeued from task->pending, this fools
schedule()->signal_pending_state().

So, __fatal_signal_pending() is too "weak",

> may_ptrace_stop() should be checking for
> sigkill_pending(), right?

Yes, but at the same time even __fatal_signal_pending() is too strong!
What if the tracee exits on its own, and its sys_exit() races with
exit_group() from another thread? In this case I think it should stop,
but __fatal_signal_pending() is true.


And worse. What if the tracee stops in PTRACE_EVENT_EXIT, and _then_
another thread does sys_exit_group()? The tracee will be "killed".
I do not think this is right. I think the "implicit" SIGKILL in this
case should _not_ wake up the tracee. Only the real SIGKILL (or any
fatal signal which mutates to SIGKILL). Otherwise we simply can't
guarantee PTRACE_EVENT_EXIT works "reliably" in this case.


We have signal_group_exit()/SIGNAL_GROUP_EXIT. I think we also need
SIGNAL_THE_REAL_SIGKILL_WAS_SENT flag. Note also we have the similar
problems with the coredump. SIGKILL should abort it. Also, we should
define what TIF_SIGPENDIND and interruptible wait mean after exit_signals()
and/or exit_notify(). Some drivers (tty? I do not remember) expect that
the exiting task can do wait_event_interruptible() and react to ^C.



> I don't have
> any major problem with the original. Please go ahead.

Thanks. Can I add your reviewed-by/acked-by ?

Oleg.

2011-06-27 14:36:29

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/8] kill tracehook_notify_death()

Hello,

On Mon, Jun 27, 2011 at 04:21:36PM +0200, Oleg Nesterov wrote:
> Yes, but at the same time even __fatal_signal_pending() is too strong!
> What if the tracee exits on its own, and its sys_exit() races with
> exit_group() from another thread? In this case I think it should stop,
> but __fatal_signal_pending() is true.
>
> And worse. What if the tracee stops in PTRACE_EVENT_EXIT, and _then_
> another thread does sys_exit_group()? The tracee will be "killed".
> I do not think this is right. I think the "implicit" SIGKILL in this
> case should _not_ wake up the tracee. Only the real SIGKILL (or any
> fatal signal which mutates to SIGKILL). Otherwise we simply can't
> guarantee PTRACE_EVENT_EXIT works "reliably" in this case.

Indeed, for PTRACE_EVENT_EXIT to work properly, we would need to
discern between actual KILLs and the implicit ones.

> We have signal_group_exit()/SIGNAL_GROUP_EXIT. I think we also need
> SIGNAL_THE_REAL_SIGKILL_WAS_SENT flag. Note also we have the similar
> problems with the coredump. SIGKILL should abort it. Also, we should
> define what TIF_SIGPENDIND and interruptible wait mean after exit_signals()
> and/or exit_notify(). Some drivers (tty? I do not remember) expect that
> the exiting task can do wait_event_interruptible() and react to ^C.

Eh... that sounds very shady. Maybe we should try to remove that
weird requirement instead?

> > I don't have
> > any major problem with the original. Please go ahead.
>
> Thanks. Can I add your reviewed-by/acked-by ?

Sure, please feel free to add reviewed-by. Thanks.

--
tejun