2008-07-17 07:26:28

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 00/23] tracehook

This patch series introduces the "tracehook" interface layer of inlines
in <linux/tracehook.h>. There are more details in the log entry for
patch 01/23 and in the header file comments inside that patch.
Most of these changes move code around with little or no change,
and they should not break anything or change any behavior.

This sets a new standard for uniform arch support to enable clean
arch-independent implementations of new debugging and tracing stuff,
denoted by CONFIG_HAVE_ARCH_TRACEHOOK. Patch 20/23 adds that symbol to
arch/Kconfig, with comments listing everything an arch has to do before
setting "select HAVE_ARCH_TRACEHOOK". These are elaborted a bit at:
http://sourceware.org/systemtap/wiki/utrace/arch/HowTo
The new inlines that arch code must define or call have detailed
kerneldoc comments in the generic header files that say what is required.

No arch is obligated to do any work, and no arch's build should be
broken by these changes. There are several steps that each arch should
take so it can set HAVE_ARCH_TRACEHOOK. Most of these are simple.
Providing this support will let new things people add for doing
debugging and tracing of user-level threads "just work" for your arch
in the future. For an arch that does not provide HAVE_ARCH_TRACEHOOK,
some new options for such features will not be available for config.

I have done some arch work and will submit this to the arch maintainers
after the generic tracehook series settles in. For now, that work is
available in my GIT repositories, and in patch and mbox-of-patches form
at http://people.redhat.com/roland/utrace/2.6-current/

This paves the way for my "utrace" work, to be submitted later. But it
is not innately tied to that. I hope that the tracehook series can go
in soon regardless of what eventually does or doesn't go on top of it.
For anyone implementing any kind of new tracing/debugging plan, or just
understanding all the context of the existing ptrace implementation,
having tracehook.h makes things much easier to find and understand.

This series must be applied after the "ptrace & wait cleanup" series,
but only because of patch conflicts. They can be reviewed separately.

This series of patches is also available in GIT at the URL below, and in a
mail folder of patch messages (use git-am or formail -s patch -p1) at:
http://people.redhat.com/roland/utrace/2.6-current/tracehook.mbox

The following changes since commit 666f164f4fbfa78bd00fb4b74788b42a39842c64:
Roland McGrath (1):
fix dangling zombie when new parent ignores children

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git tracehook

Roland McGrath (23):
tracehook: add linux/tracehook.h
tracehook: exec
tracehook: unexport ptrace_notify
tracehook: exit
tracehook: clone
tracehook: vfork-done
tracehook: release_task
tracehook: tracehook_tracer_task
tracehook: tracehook_expect_breakpoints
tracehook: tracehook_signal_handler
tracehook: tracehook_consider_ignored_signal
tracehook: tracehook_consider_fatal_signal
tracehook: syscall
tracehook: get_signal_to_deliver
tracehook: job control
tracehook: death
tracehook: force signal_pending()
tracehook: TIF_NOTIFY_RESUME
tracehook: asm/syscall.h
tracehook: CONFIG_HAVE_ARCH_TRACEHOOK
tracehook: wait_task_inactive
task_current_syscall
/proc/PID/syscall

arch/Kconfig | 18 ++
arch/ia64/kernel/perfmon.c | 4 +-
arch/x86/ia32/ia32_aout.c | 6 -
fs/binfmt_aout.c | 6 -
fs/binfmt_elf.c | 6 -
fs/binfmt_elf_fdpic.c | 7 -
fs/binfmt_flat.c | 3 -
fs/binfmt_som.c | 2 -
fs/exec.c | 12 +-
fs/proc/array.c | 9 +-
fs/proc/base.c | 39 +++-
include/asm-generic/syscall.h | 141 ++++++++++
include/linux/ptrace.h | 72 +++++
include/linux/sched.h | 10 +-
include/linux/tracehook.h | 575 +++++++++++++++++++++++++++++++++++++++++
kernel/exit.c | 53 ++---
kernel/fork.c | 74 ++----
kernel/kthread.c | 2 +-
kernel/ptrace.c | 2 +-
kernel/sched.c | 29 ++-
kernel/signal.c | 99 +++++---
lib/Makefile | 2 +
lib/syscall.c | 75 ++++++
mm/nommu.c | 4 +-
security/selinux/hooks.c | 22 +--
25 files changed, 1084 insertions(+), 188 deletions(-)
create mode 100644 include/asm-generic/syscall.h
create mode 100644 include/linux/tracehook.h
create mode 100644 lib/syscall.c


Thanks,
Roland


2008-07-17 07:28:26

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 01/23] tracehook: add linux/tracehook.h

This adds the new kernel-internal header file <linux/tracehook.h>.
This is not yet used at all. The comments in the header introduce
what the following series of patches is about.

The aim is to formalize and consolidate all the places that the core
kernel code and the arch code now ties into the ptrace implementation.

These patches mostly don't cause any functional change. They just
move the details of ptrace logic out of core code into tracehook.h
inlines, where they are mostly compiled away to the same as before.
All that changes is that everything is thoroughly documented and any
future reworking of ptrace, or addition of something new, would not
have to touch core code all over, just change the tracehook.h inlines.

The new linux/ptrace.h inlines are used by the following patches in the
new tracehook_*() inlines. Using these helpers for the ptrace event
stops makes it simple to change or disable the old ptrace implementation
of these stops conditionally later.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/ptrace.h | 33 ++++++++++++++++++++++++++++
include/linux/tracehook.h | 52 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index c6f5f9d..c74abfc 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -121,6 +121,39 @@ static inline void ptrace_unlink(struct task_struct *child)
int generic_ptrace_peekdata(struct task_struct *tsk, long addr, long data);
int generic_ptrace_pokedata(struct task_struct *tsk, long addr, long data);

+/**
+ * task_ptrace - return %PT_* flags that apply to a task
+ * @task: pointer to &task_struct in question
+ *
+ * Returns the %PT_* flags that apply to @task.
+ */
+static inline int task_ptrace(struct task_struct *task)
+{
+ return task->ptrace;
+}
+
+/**
+ * ptrace_event - possibly stop for a ptrace event notification
+ * @mask: %PT_* bit to check in @current->ptrace
+ * @event: %PTRACE_EVENT_* value to report if @mask is set
+ * @message: value for %PTRACE_GETEVENTMSG to return
+ *
+ * This checks the @mask bit to see if ptrace wants stops for this event.
+ * If so we stop, reporting @event and @message to the ptrace parent.
+ *
+ * Returns nonzero if we did a ptrace notification, zero if not.
+ *
+ * Called without locks.
+ */
+static inline int ptrace_event(int mask, int event, unsigned long message)
+{
+ if (mask && likely(!(current->ptrace & mask)))
+ return 0;
+ current->ptrace_message = message;
+ ptrace_notify((event << 8) | SIGTRAP);
+ return 1;
+}
+
#ifndef force_successful_syscall_return
/*
* System call handlers that, upon successful completion, need to return a
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
new file mode 100644
index 0000000..bea0f3e
--- /dev/null
+++ b/include/linux/tracehook.h
@@ -0,0 +1,52 @@
+/*
+ * Tracing hooks
+ *
+ * Copyright (C) 2008 Red Hat, Inc. All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * This file defines hook entry points called by core code where
+ * user tracing/debugging support might need to do something. These
+ * entry points are called tracehook_*(). Each hook declared below
+ * has a detailed kerneldoc comment giving the context (locking et
+ * al) from which it is called, and the meaning of its return value.
+ *
+ * Each function here typically has only one call site, so it is ok
+ * to have some nontrivial tracehook_*() inlines. In all cases, the
+ * fast path when no tracing is enabled should be very short.
+ *
+ * The purpose of this file and the tracehook_* layer is to consolidate
+ * the interface that the kernel core and arch code uses to enable any
+ * user debugging or tracing facility (such as ptrace). The interfaces
+ * here are carefully documented so that maintainers of core and arch
+ * code do not need to think about the implementation details of the
+ * tracing facilities. Likewise, maintainers of the tracing code do not
+ * need to understand all the calling core or arch code in detail, just
+ * documented circumstances of each call, such as locking conditions.
+ *
+ * If the calling core code changes so that locking is different, then
+ * it is ok to change the interface documented here. The maintainer of
+ * core code changing should notify the maintainers of the tracing code
+ * that they need to work out the change.
+ *
+ * Some tracehook_*() inlines take arguments that the current tracing
+ * implementations might not necessarily use. These function signatures
+ * are chosen to pass in all the information that is on hand in the
+ * caller and might conceivably be relevant to a tracer, so that the
+ * core code won't have to be updated when tracing adds more features.
+ * If a call site changes so that some of those parameters are no longer
+ * already on hand without extra work, then the tracehook_* interface
+ * can change so there is no make-work burden on the core code. The
+ * maintainer of core code changing should notify the maintainers of the
+ * tracing code that they need to work out the change.
+ */
+
+#ifndef _LINUX_TRACEHOOK_H
+#define _LINUX_TRACEHOOK_H 1
+
+#include <linux/sched.h>
+#include <linux/ptrace.h>
+
+#endif /* <linux/tracehook.h> */

2008-07-17 07:28:46

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 03/23] tracehook: unexport ptrace_notify

The ptrace_notify() function should not be called by any modules.
It was only ever exported to be called by binfmt exec functions.
But that is no longer necessary since fs/exec.c deals with that
generically now. There should be no calls to ptrace_notify() from
outside the core kernel.

Signed-off-by: Roland McGrath <[email protected]>
---
kernel/signal.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 6c0958e..02d3a11 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1921,7 +1921,6 @@ EXPORT_SYMBOL_GPL(dequeue_signal);
EXPORT_SYMBOL(flush_signals);
EXPORT_SYMBOL(force_sig);
EXPORT_SYMBOL(kill_proc);
-EXPORT_SYMBOL(ptrace_notify);
EXPORT_SYMBOL(send_sig);
EXPORT_SYMBOL(send_sig_info);
EXPORT_SYMBOL(sigprocmask);

2008-07-17 07:29:15

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 02/23] tracehook: exec

This moves all the ptrace hooks related to exec into tracehook.h inlines.

This also lifts the calls for tracing out of the binfmt load_binary hooks
into search_binary_handler() after it calls into the binfmt module. This
change has no effect, since all the binfmt modules' load_binary functions
did the call at the end on success, and now search_binary_handler() does
it immediately after return if successful. We consolidate the repeated
code, and binfmt modules no longer need to import ptrace_notify().

Signed-off-by: Roland McGrath <[email protected]>
---
arch/x86/ia32/ia32_aout.c | 6 -----
fs/binfmt_aout.c | 6 -----
fs/binfmt_elf.c | 6 -----
fs/binfmt_elf_fdpic.c | 7 ------
fs/binfmt_flat.c | 3 --
fs/binfmt_som.c | 2 -
fs/exec.c | 12 +++-------
include/linux/tracehook.h | 46 +++++++++++++++++++++++++++++++++++++++++++++
8 files changed, 50 insertions(+), 38 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 58cccb6..a0e1dbe 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -441,12 +441,6 @@ beyond_if:
regs->r8 = regs->r9 = regs->r10 = regs->r11 =
regs->r12 = regs->r13 = regs->r14 = regs->r15 = 0;
set_fs(USER_DS);
- if (unlikely(current->ptrace & PT_PTRACED)) {
- if (current->ptrace & PT_TRACE_EXEC)
- ptrace_notify((PTRACE_EVENT_EXEC << 8) | SIGTRAP);
- else
- send_sig(SIGTRAP, current, 0);
- }
return 0;
}

diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index ba4cddb..204cfd1 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -444,12 +444,6 @@ beyond_if:
regs->gp = ex.a_gpvalue;
#endif
start_thread(regs, ex.a_entry, current->mm->start_stack);
- if (unlikely(current->ptrace & PT_PTRACED)) {
- if (current->ptrace & PT_TRACE_EXEC)
- ptrace_notify ((PTRACE_EVENT_EXEC << 8) | SIGTRAP);
- else
- send_sig(SIGTRAP, current, 0);
- }
return 0;
}

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index d48ff5f..66a2bdc 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -974,12 +974,6 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
#endif

start_thread(regs, elf_entry, bprm->p);
- if (unlikely(current->ptrace & PT_PTRACED)) {
- if (current->ptrace & PT_TRACE_EXEC)
- ptrace_notify ((PTRACE_EVENT_EXEC << 8) | SIGTRAP);
- else
- send_sig(SIGTRAP, current, 0);
- }
retval = 0;
out:
kfree(loc);
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index d051a32..1280d5d 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -433,13 +433,6 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm,
entryaddr = interp_params.entry_addr ?: exec_params.entry_addr;
start_thread(regs, entryaddr, current->mm->start_stack);

- if (unlikely(current->ptrace & PT_PTRACED)) {
- if (current->ptrace & PT_TRACE_EXEC)
- ptrace_notify((PTRACE_EVENT_EXEC << 8) | SIGTRAP);
- else
- send_sig(SIGTRAP, current, 0);
- }
-
retval = 0;

error:
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 2cb1acd..56372ec 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -920,9 +920,6 @@ static int load_flat_binary(struct linux_binprm * bprm, struct pt_regs * regs)

start_thread(regs, start_addr, current->mm->start_stack);

- if (current->ptrace & PT_PTRACED)
- send_sig(SIGTRAP, current, 0);
-
return 0;
}

diff --git a/fs/binfmt_som.c b/fs/binfmt_som.c
index fdc36bf..68be580 100644
--- a/fs/binfmt_som.c
+++ b/fs/binfmt_som.c
@@ -274,8 +274,6 @@ load_som_binary(struct linux_binprm * bprm, struct pt_regs * regs)
map_hpux_gateway_page(current,current->mm);

start_thread_som(regs, som_entry, bprm->p);
- if (current->ptrace & PT_PTRACED)
- send_sig(SIGTRAP, current, 0);
return 0;

/* error cleanup */
diff --git a/fs/exec.c b/fs/exec.c
index fd92343..346db53 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -43,7 +43,6 @@
#include <linux/module.h>
#include <linux/namei.h>
#include <linux/proc_fs.h>
-#include <linux/ptrace.h>
#include <linux/mount.h>
#include <linux/security.h>
#include <linux/syscalls.h>
@@ -51,6 +50,7 @@
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
#include <linux/audit.h>
+#include <linux/tracehook.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -1075,13 +1075,8 @@ EXPORT_SYMBOL(prepare_binprm);

static int unsafe_exec(struct task_struct *p)
{
- int unsafe = 0;
- if (p->ptrace & PT_PTRACED) {
- if (p->ptrace & PT_PTRACE_CAP)
- unsafe |= LSM_UNSAFE_PTRACE_CAP;
- else
- unsafe |= LSM_UNSAFE_PTRACE;
- }
+ int unsafe = tracehook_unsafe_exec(p);
+
if (atomic_read(&p->fs->count) > 1 ||
atomic_read(&p->files->count) > 1 ||
atomic_read(&p->sighand->count) > 1)
@@ -1218,6 +1213,7 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
read_unlock(&binfmt_lock);
retval = fn(bprm, regs);
if (retval >= 0) {
+ tracehook_report_exec(fmt, bprm, regs);
put_binfmt(fmt);
allow_write_access(bprm->file);
if (bprm->file)
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index bea0f3e..6276353 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -48,5 +48,51 @@

#include <linux/sched.h>
#include <linux/ptrace.h>
+#include <linux/security.h>
+struct linux_binprm;
+
+/**
+ * tracehook_unsafe_exec - check for exec declared unsafe due to tracing
+ * @task: current task doing exec
+ *
+ * Return %LSM_UNSAFE_* bits applied to an exec because of tracing.
+ *
+ * Called with task_lock() held on @task.
+ */
+static inline int tracehook_unsafe_exec(struct task_struct *task)
+{
+ int unsafe = 0;
+ int ptrace = task_ptrace(task);
+ if (ptrace & PT_PTRACED) {
+ if (ptrace & PT_PTRACE_CAP)
+ unsafe |= LSM_UNSAFE_PTRACE_CAP;
+ else
+ unsafe |= LSM_UNSAFE_PTRACE;
+ }
+ return unsafe;
+}
+
+/**
+ * tracehook_report_exec - a successful exec was completed
+ * @fmt: &struct linux_binfmt that performed the exec
+ * @bprm: &struct linux_binprm containing exec details
+ * @regs: user-mode register state
+ *
+ * An exec just completed, we are shortly going to return to user mode.
+ * The freshly initialized register state can be seen and changed in @regs.
+ * The name, file and other pointers in @bprm are still on hand to be
+ * inspected, but will be freed as soon as this returns.
+ *
+ * Called with no locks, but with some kernel resources held live
+ * and a reference on @fmt->module.
+ */
+static inline void tracehook_report_exec(struct linux_binfmt *fmt,
+ struct linux_binprm *bprm,
+ struct pt_regs *regs)
+{
+ if (!ptrace_event(PT_TRACE_EXEC, PTRACE_EVENT_EXEC, 0) &&
+ unlikely(task_ptrace(current) & PT_PTRACED))
+ send_sig(SIGTRAP, current, 0);
+}

#endif /* <linux/tracehook.h> */

2008-07-17 07:29:32

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 04/23] tracehook: exit

This moves the PTRACE_EVENT_EXIT tracing into a tracehook.h inline,
tracehook_report_exec(). The change has no effect, just clean-up.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 15 +++++++++++++++
kernel/exit.c | 6 ++----
2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 6276353..967ab47 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -95,4 +95,19 @@ static inline void tracehook_report_exec(struct linux_binfmt *fmt,
send_sig(SIGTRAP, current, 0);
}

+/**
+ * tracehook_report_exit - task has begun to exit
+ * @exit_code: pointer to value destined for @current->exit_code
+ *
+ * @exit_code points to the value passed to do_exit(), which tracing
+ * might change here. This is almost the first thing in do_exit(),
+ * before freeing any resources or setting the %PF_EXITING flag.
+ *
+ * Called with no locks held.
+ */
+static inline void tracehook_report_exit(long *exit_code)
+{
+ ptrace_event(PT_TRACE_EXIT, PTRACE_EVENT_EXIT, *exit_code);
+}
+
#endif /* <linux/tracehook.h> */
diff --git a/kernel/exit.c b/kernel/exit.c
index 93d2711..bd243c3 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -46,6 +46,7 @@
#include <linux/resource.h>
#include <linux/blkdev.h>
#include <linux/task_io_accounting_ops.h>
+#include <linux/tracehook.h>

#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -1005,10 +1006,7 @@ NORET_TYPE void do_exit(long code)
if (unlikely(!tsk->pid))
panic("Attempted to kill the idle task!");

- if (unlikely(current->ptrace & PT_TRACE_EXIT)) {
- current->ptrace_message = code;
- ptrace_notify((PTRACE_EVENT_EXIT << 8) | SIGTRAP);
- }
+ tracehook_report_exit(&code);

/*
* We're taking recursive faults here in do_exit. Safest is to just

2008-07-17 07:29:52

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 07/23] tracehook: release_task

This moves the ptrace-related logic from release_task into tracehook.h and
ptrace.h inlines. It provides clean hooks both before and after locking
tasklist_lock, for future tracing logic to do more cleanup without the lock.

This also changes release_task() itself in the rare "zap_leader" case to
set the leader to EXIT_DEAD before iterating. This maintains the
invariant that release_task() only ever handles a task in EXIT_DEAD.
This is a common-sense invariant that is already always true except in
this one arcane case of zombie leader whose parent ignores SIGCHLD.

This change is harmless and only costs one store in this one rare case.
It keeps the expected state more consisently sane, which is nicer when
debugging weirdness in release_task(). It also lets some future code
in the tracehook entry points rely on this invariant for bookkeeping.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/ptrace.h | 13 +++++++++++++
include/linux/tracehook.h | 28 ++++++++++++++++++++++++++++
kernel/exit.c | 21 +++++++++------------
3 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index dae6d85..ed69c03 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -176,6 +176,19 @@ static inline void ptrace_init_task(struct task_struct *child, bool ptrace)
}
}

+/**
+ * ptrace_release_task - final ptrace-related cleanup of a zombie being reaped
+ * @task: task in %EXIT_DEAD state
+ *
+ * Called with write_lock(&tasklist_lock) held.
+ */
+static inline void ptrace_release_task(struct task_struct *task)
+{
+ BUG_ON(!list_empty(&task->ptraced));
+ ptrace_unlink(task);
+ BUG_ON(!list_empty(&task->ptrace_entry));
+}
+
#ifndef force_successful_syscall_return
/*
* System call handlers that, upon successful completion, need to return a
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 830e6e1..9a5b3be 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -228,4 +228,32 @@ static inline void tracehook_report_vfork_done(struct task_struct *child,
ptrace_event(PT_TRACE_VFORK_DONE, PTRACE_EVENT_VFORK_DONE, pid);
}

+/**
+ * tracehook_prepare_release_task - task is being reaped, clean up tracing
+ * @task: task in %EXIT_DEAD state
+ *
+ * This is called in release_task() just before @task gets finally reaped
+ * and freed. This would be the ideal place to remove and clean up any
+ * tracing-related state for @task.
+ *
+ * Called with no locks held.
+ */
+static inline void tracehook_prepare_release_task(struct task_struct *task)
+{
+}
+
+/**
+ * tracehook_finish_release_task - task is being reaped, clean up tracing
+ * @task: task in %EXIT_DEAD state
+ *
+ * This is called in release_task() when @task is being in the middle of
+ * being reaped. After this, there must be no tracing entanglements.
+ *
+ * Called with write_lock_irq(&tasklist_lock) held.
+ */
+static inline void tracehook_finish_release_task(struct task_struct *task)
+{
+ ptrace_release_task(task);
+}
+
#endif /* <linux/tracehook.h> */
diff --git a/kernel/exit.c b/kernel/exit.c
index bd243c3..a416a84 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -153,27 +153,17 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
put_task_struct(container_of(rhp, struct task_struct, rcu));
}

-/*
- * Do final ptrace-related cleanup of a zombie being reaped.
- *
- * Called with write_lock(&tasklist_lock) held.
- */
-static void ptrace_release_task(struct task_struct *p)
-{
- BUG_ON(!list_empty(&p->ptraced));
- ptrace_unlink(p);
- BUG_ON(!list_empty(&p->ptrace_entry));
-}

void release_task(struct task_struct * p)
{
struct task_struct *leader;
int zap_leader;
repeat:
+ tracehook_prepare_release_task(p);
atomic_dec(&p->user->processes);
proc_flush_task(p);
write_lock_irq(&tasklist_lock);
- ptrace_release_task(p);
+ tracehook_finish_release_task(p);
__exit_signal(p);

/*
@@ -195,6 +185,13 @@ repeat:
* that case.
*/
zap_leader = task_detached(leader);
+
+ /*
+ * This maintains the invariant that release_task()
+ * only runs on a task in EXIT_DEAD, just for sanity.
+ */
+ if (zap_leader)
+ leader->exit_state = EXIT_DEAD;
}

write_unlock_irq(&tasklist_lock);

2008-07-17 07:30:23

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 05/23] tracehook: clone

This moves all the ptrace initialization and tracing logic for task
creation into tracehook.h and ptrace.h inlines. It reorganizes the code
slightly, but should not change any behavior.

There are four tracehook entry points, at each important stage of task
creation. This keeps the interface from the core fork.c code fairly clean,
while supporting the complex setup required for ptrace or something like it.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/ptrace.h | 22 ++++++++++
include/linux/tracehook.h | 100 +++++++++++++++++++++++++++++++++++++++++++++
kernel/fork.c | 69 +++++++++++++------------------
3 files changed, 150 insertions(+), 41 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index c74abfc..dae6d85 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -154,6 +154,28 @@ static inline int ptrace_event(int mask, int event, unsigned long message)
return 1;
}

+/**
+ * ptrace_init_task - initialize ptrace state for a new child
+ * @child: new child task
+ * @ptrace: true if child should be ptrace'd by parent's tracer
+ *
+ * This is called immediately after adding @child to its parent's children
+ * list. @ptrace is false in the normal case, and true to ptrace @child.
+ *
+ * Called with current's siglock and write_lock_irq(&tasklist_lock) held.
+ */
+static inline void ptrace_init_task(struct task_struct *child, bool ptrace)
+{
+ INIT_LIST_HEAD(&child->ptrace_entry);
+ INIT_LIST_HEAD(&child->ptraced);
+ child->parent = child->real_parent;
+ child->ptrace = 0;
+ if (unlikely(ptrace)) {
+ child->ptrace = current->ptrace;
+ __ptrace_link(child, current->parent);
+ }
+}
+
#ifndef force_successful_syscall_return
/*
* System call handlers that, upon successful completion, need to return a
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 967ab47..3ebc58b 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -110,4 +110,104 @@ static inline void tracehook_report_exit(long *exit_code)
ptrace_event(PT_TRACE_EXIT, PTRACE_EVENT_EXIT, *exit_code);
}

+/**
+ * tracehook_prepare_clone - prepare for new child to be cloned
+ * @clone_flags: %CLONE_* flags from clone/fork/vfork system call
+ *
+ * This is called before a new user task is to be cloned.
+ * Its return value will be passed to tracehook_finish_clone().
+ *
+ * Called with no locks held.
+ */
+static inline int tracehook_prepare_clone(unsigned clone_flags)
+{
+ if (clone_flags & CLONE_UNTRACED)
+ return 0;
+
+ if (clone_flags & CLONE_VFORK) {
+ if (current->ptrace & PT_TRACE_VFORK)
+ return PTRACE_EVENT_VFORK;
+ } else if ((clone_flags & CSIGNAL) != SIGCHLD) {
+ if (current->ptrace & PT_TRACE_CLONE)
+ return PTRACE_EVENT_CLONE;
+ } else if (current->ptrace & PT_TRACE_FORK)
+ return PTRACE_EVENT_FORK;
+
+ return 0;
+}
+
+/**
+ * tracehook_finish_clone - new child created and being attached
+ * @child: new child task
+ * @clone_flags: %CLONE_* flags from clone/fork/vfork system call
+ * @trace: return value from tracehook_clone_prepare()
+ *
+ * This is called immediately after adding @child to its parent's children list.
+ * The @trace value is that returned by tracehook_prepare_clone().
+ *
+ * Called with current's siglock and write_lock_irq(&tasklist_lock) held.
+ */
+static inline void tracehook_finish_clone(struct task_struct *child,
+ unsigned long clone_flags, int trace)
+{
+ ptrace_init_task(child, (clone_flags & CLONE_PTRACE) || trace);
+}
+
+/**
+ * tracehook_report_clone - in parent, new child is about to start running
+ * @trace: return value from tracehook_clone_prepare()
+ * @regs: parent's user register state
+ * @clone_flags: flags from parent's system call
+ * @pid: new child's PID in the parent's namespace
+ * @child: new child task
+ *
+ * Called after a child is set up, but before it has been started running.
+ * The @trace value is that returned by tracehook_clone_prepare().
+ * This is not a good place to block, because the child has not started yet.
+ * Suspend the child here if desired, and block in tracehook_clone_complete().
+ * This must prevent the child from self-reaping if tracehook_clone_complete()
+ * uses the @child pointer; otherwise it might have died and been released by
+ * the time tracehook_report_clone_complete() is called.
+ *
+ * Called with no locks held, but the child cannot run until this returns.
+ */
+static inline void tracehook_report_clone(int trace, struct pt_regs *regs,
+ unsigned long clone_flags,
+ pid_t pid, struct task_struct *child)
+{
+ if (unlikely(trace)) {
+ /*
+ * The child starts up with an immediate SIGSTOP.
+ */
+ sigaddset(&child->pending.signal, SIGSTOP);
+ set_tsk_thread_flag(child, TIF_SIGPENDING);
+ }
+}
+
+/**
+ * tracehook_report_clone_complete - new child is running
+ * @trace: return value from tracehook_clone_prepare()
+ * @regs: parent's user register state
+ * @clone_flags: flags from parent's system call
+ * @pid: new child's PID in the parent's namespace
+ * @child: child task, already running
+ *
+ * This is called just after the child has started running. This is
+ * just before the clone/fork syscall returns, or blocks for vfork
+ * child completion if @clone_flags has the %CLONE_VFORK bit set.
+ * The @child pointer may be invalid if a self-reaping child died and
+ * tracehook_report_clone() took no action to prevent it from self-reaping.
+ *
+ * Called with no locks held.
+ */
+static inline void tracehook_report_clone_complete(int trace,
+ struct pt_regs *regs,
+ unsigned long clone_flags,
+ pid_t pid,
+ struct task_struct *child)
+{
+ if (unlikely(trace))
+ ptrace_event(0, trace, pid);
+}
+
#endif /* <linux/tracehook.h> */
diff --git a/kernel/fork.c b/kernel/fork.c
index adefc11..56b6ab0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -55,6 +55,7 @@
#include <linux/tty.h>
#include <linux/proc_fs.h>
#include <linux/blkdev.h>
+#include <linux/tracehook.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -833,8 +834,7 @@ static void copy_flags(unsigned long clone_flags, struct task_struct *p)

new_flags &= ~PF_SUPERPRIV;
new_flags |= PF_FORKNOEXEC;
- if (!(clone_flags & CLONE_PTRACE))
- p->ptrace = 0;
+ new_flags |= PF_STARTING;
p->flags = new_flags;
clear_freeze_flag(p);
}
@@ -875,7 +875,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
struct pt_regs *regs,
unsigned long stack_size,
int __user *child_tidptr,
- struct pid *pid)
+ struct pid *pid,
+ int trace)
{
int retval;
struct task_struct *p;
@@ -1125,8 +1126,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
*/
p->group_leader = p;
INIT_LIST_HEAD(&p->thread_group);
- INIT_LIST_HEAD(&p->ptrace_entry);
- INIT_LIST_HEAD(&p->ptraced);

/* Now that the task is set up, run cgroup callbacks if
* necessary. We need to run them before the task is visible
@@ -1157,7 +1156,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->real_parent = current->real_parent;
else
p->real_parent = current;
- p->parent = p->real_parent;

spin_lock(&current->sighand->siglock);

@@ -1199,8 +1197,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,

if (likely(p->pid)) {
list_add_tail(&p->sibling, &p->real_parent->children);
- if (unlikely(p->ptrace & PT_PTRACED))
- __ptrace_link(p, current->parent);
+ tracehook_finish_clone(p, clone_flags, trace);

if (thread_group_leader(p)) {
if (clone_flags & CLONE_NEWPID)
@@ -1285,29 +1282,13 @@ struct task_struct * __cpuinit fork_idle(int cpu)
struct pt_regs regs;

task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL,
- &init_struct_pid);
+ &init_struct_pid, 0);
if (!IS_ERR(task))
init_idle(task, cpu);

return task;
}

-static int fork_traceflag(unsigned clone_flags)
-{
- if (clone_flags & CLONE_UNTRACED)
- return 0;
- else if (clone_flags & CLONE_VFORK) {
- if (current->ptrace & PT_TRACE_VFORK)
- return PTRACE_EVENT_VFORK;
- } else if ((clone_flags & CSIGNAL) != SIGCHLD) {
- if (current->ptrace & PT_TRACE_CLONE)
- return PTRACE_EVENT_CLONE;
- } else if (current->ptrace & PT_TRACE_FORK)
- return PTRACE_EVENT_FORK;
-
- return 0;
-}
-
/*
* Ok, this is the main fork-routine.
*
@@ -1342,14 +1323,14 @@ long do_fork(unsigned long clone_flags,
}
}

- if (unlikely(current->ptrace)) {
- trace = fork_traceflag (clone_flags);
- if (trace)
- clone_flags |= CLONE_PTRACE;
- }
+ /*
+ * When called from kernel_thread, don't do user tracing stuff.
+ */
+ if (likely(user_mode(regs)))
+ trace = tracehook_prepare_clone(clone_flags);

p = copy_process(clone_flags, stack_start, regs, stack_size,
- child_tidptr, NULL);
+ child_tidptr, NULL, trace);
/*
* Do this prior waking up the new thread - the thread pointer
* might get invalid after that point, if the thread exits quickly.
@@ -1367,24 +1348,30 @@ long do_fork(unsigned long clone_flags,
init_completion(&vfork);
}

- if ((p->ptrace & PT_PTRACED) || (clone_flags & CLONE_STOPPED)) {
+ tracehook_report_clone(trace, regs, clone_flags, nr, p);
+
+ /*
+ * We set PF_STARTING at creation in case tracing wants to
+ * use this to distinguish a fully live task from one that
+ * hasn't gotten to tracehook_report_clone() yet. Now we
+ * clear it and set the child going.
+ */
+ p->flags &= ~PF_STARTING;
+
+ if (unlikely(clone_flags & CLONE_STOPPED)) {
/*
* We'll start up with an immediate SIGSTOP.
*/
sigaddset(&p->pending.signal, SIGSTOP);
set_tsk_thread_flag(p, TIF_SIGPENDING);
- }
-
- if (!(clone_flags & CLONE_STOPPED))
- wake_up_new_task(p, clone_flags);
- else
__set_task_state(p, TASK_STOPPED);
-
- if (unlikely (trace)) {
- current->ptrace_message = nr;
- ptrace_notify ((trace << 8) | SIGTRAP);
+ } else {
+ wake_up_new_task(p, clone_flags);
}

+ tracehook_report_clone_complete(trace, regs,
+ clone_flags, nr, p);
+
if (clone_flags & CLONE_VFORK) {
freezer_do_not_count();
wait_for_completion(&vfork);

2008-07-17 07:30:46

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 06/23] tracehook: vfork-done

This moves the PTRACE_EVENT_VFORK_DONE tracing into a tracehook.h inline,
tracehook_report_vfork_done(). The change has no effect, just clean-up.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 18 ++++++++++++++++++
kernel/fork.c | 5 +----
2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 3ebc58b..830e6e1 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -210,4 +210,22 @@ static inline void tracehook_report_clone_complete(int trace,
ptrace_event(0, trace, pid);
}

+/**
+ * tracehook_report_vfork_done - vfork parent's child has exited or exec'd
+ * @child: child task, already running
+ * @pid: new child's PID in the parent's namespace
+ *
+ * Called after a %CLONE_VFORK parent has waited for the child to complete.
+ * The clone/vfork system call will return immediately after this.
+ * The @child pointer may be invalid if a self-reaping child died and
+ * tracehook_report_clone() took no action to prevent it from self-reaping.
+ *
+ * Called with no locks held.
+ */
+static inline void tracehook_report_vfork_done(struct task_struct *child,
+ pid_t pid)
+{
+ ptrace_event(PT_TRACE_VFORK_DONE, PTRACE_EVENT_VFORK_DONE, pid);
+}
+
#endif /* <linux/tracehook.h> */
diff --git a/kernel/fork.c b/kernel/fork.c
index 56b6ab0..5345b7d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1376,10 +1376,7 @@ long do_fork(unsigned long clone_flags,
freezer_do_not_count();
wait_for_completion(&vfork);
freezer_count();
- if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE)) {
- current->ptrace_message = nr;
- ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
- }
+ tracehook_report_vfork_done(p, nr);
}
} else {
nr = PTR_ERR(p);

2008-07-17 07:31:08

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 08/23] tracehook: tracehook_tracer_task

This adds the tracehook_tracer_task() hook to consolidate all forms of
"Who is using ptrace on me?" logic. This is used for "TracerPid:" in
/proc and for permission checks. We also clean up the selinux code
the called an identical accessor.

Signed-off-by: Roland McGrath <[email protected]>
---
fs/proc/array.c | 9 +++++++--
fs/proc/base.c | 13 +++++++++----
include/linux/tracehook.h | 18 ++++++++++++++++++
security/selinux/hooks.c | 22 +++-------------------
4 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 797d775..0d6eb33 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -80,6 +80,7 @@
#include <linux/delayacct.h>
#include <linux/seq_file.h>
#include <linux/pid_namespace.h>
+#include <linux/tracehook.h>

#include <asm/pgtable.h>
#include <asm/processor.h>
@@ -168,8 +169,12 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
rcu_read_lock();
ppid = pid_alive(p) ?
task_tgid_nr_ns(rcu_dereference(p->real_parent), ns) : 0;
- tpid = pid_alive(p) && p->ptrace ?
- task_pid_nr_ns(rcu_dereference(p->parent), ns) : 0;
+ tpid = 0;
+ if (pid_alive(p)) {
+ struct task_struct *tracer = tracehook_tracer_task(p);
+ if (tracer)
+ tpid = task_pid_nr_ns(tracer, ns);
+ }
seq_printf(m,
"State:\t%s\n"
"Tgid:\t%d\n"
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 58c3e6a..6f21daa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -69,6 +69,7 @@
#include <linux/mount.h>
#include <linux/security.h>
#include <linux/ptrace.h>
+#include <linux/tracehook.h>
#include <linux/cgroup.h>
#include <linux/cpuset.h>
#include <linux/audit.h>
@@ -231,10 +232,14 @@ static int check_mem_permission(struct task_struct *task)
* If current is actively ptrace'ing, and would also be
* permitted to freshly attach with ptrace now, permit it.
*/
- if (task->parent == current && (task->ptrace & PT_PTRACED) &&
- task_is_stopped_or_traced(task) &&
- ptrace_may_access(task, PTRACE_MODE_ATTACH))
- return 0;
+ if (task_is_stopped_or_traced(task)) {
+ int match;
+ rcu_read_lock();
+ match = (tracehook_tracer_task(task) == current);
+ rcu_read_unlock();
+ if (match && ptrace_may_access(task, PTRACE_MODE_ATTACH))
+ return 0;
+ }

/*
* Noone else is allowed.
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 9a5b3be..6468ca0 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -73,6 +73,24 @@ static inline int tracehook_unsafe_exec(struct task_struct *task)
}

/**
+ * tracehook_tracer_task - return the task that is tracing the given task
+ * @tsk: task to consider
+ *
+ * Returns NULL if noone is tracing @task, or the &struct task_struct
+ * pointer to its tracer.
+ *
+ * Must called under rcu_read_lock(). The pointer returned might be kept
+ * live only by RCU. During exec, this may be called with task_lock()
+ * held on @task, still held from when tracehook_unsafe_exec() was called.
+ */
+static inline struct task_struct *tracehook_tracer_task(struct task_struct *tsk)
+{
+ if (task_ptrace(tsk) & PT_PTRACED)
+ return rcu_dereference(tsk->parent);
+ return NULL;
+}
+
+/**
* tracehook_report_exec - a successful exec was completed
* @fmt: &struct linux_binfmt that performed the exec
* @bprm: &struct linux_binprm containing exec details
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 63f131f..3481cde 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -25,7 +25,7 @@

#include <linux/init.h>
#include <linux/kernel.h>
-#include <linux/ptrace.h>
+#include <linux/tracehook.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/security.h>
@@ -1971,22 +1971,6 @@ static int selinux_vm_enough_memory(struct mm_struct *mm, long pages)
return __vm_enough_memory(mm, pages, cap_sys_admin);
}

-/**
- * task_tracer_task - return the task that is tracing the given task
- * @task: task to consider
- *
- * Returns NULL if noone is tracing @task, or the &struct task_struct
- * pointer to its tracer.
- *
- * Must be called under rcu_read_lock().
- */
-static struct task_struct *task_tracer_task(struct task_struct *task)
-{
- if (task->ptrace & PT_PTRACED)
- return rcu_dereference(task->parent);
- return NULL;
-}
-
/* binprm security operations */

static int selinux_bprm_alloc_security(struct linux_binprm *bprm)
@@ -2238,7 +2222,7 @@ static void selinux_bprm_apply_creds(struct linux_binprm *bprm, int unsafe)
u32 ptsid = 0;

rcu_read_lock();
- tracer = task_tracer_task(current);
+ tracer = tracehook_tracer_task(current);
if (likely(tracer != NULL)) {
sec = tracer->security;
ptsid = sec->sid;
@@ -5247,7 +5231,7 @@ static int selinux_setprocattr(struct task_struct *p,
Otherwise, leave SID unchanged and fail. */
task_lock(p);
rcu_read_lock();
- tracer = task_tracer_task(p);
+ tracer = tracehook_tracer_task(p);
if (tracer != NULL) {
struct task_security_struct *ptsec = tracer->security;
u32 ptsid = ptsec->sid;

2008-07-17 07:31:37

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 10/23] tracehook: tracehook_signal_handler

This defines tracehook_signal_handler() as a hook for the arch signal
handling code to call. It gives ptrace the opportunity to stop for a
pseudo-single-step trap immediately after signal handler setup is done.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 23 +++++++++++++++++++++++
1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index e113e09..2d1426f 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -289,4 +289,27 @@ static inline void tracehook_finish_release_task(struct task_struct *task)
ptrace_release_task(task);
}

+/**
+ * tracehook_signal_handler - signal handler setup is complete
+ * @sig: number of signal being delivered
+ * @info: siginfo_t of signal being delivered
+ * @ka: sigaction setting that chose the handler
+ * @regs: user register state
+ * @stepping: nonzero if debugger single-step or block-step in use
+ *
+ * Called by the arch code after a signal handler has been set up.
+ * Register and stack state reflects the user handler about to run.
+ * Signal mask changes have already been made.
+ *
+ * Called without locks, shortly before returning to user mode
+ * (or handling more signals).
+ */
+static inline void tracehook_signal_handler(int sig, siginfo_t *info,
+ const struct k_sigaction *ka,
+ struct pt_regs *regs, int stepping)
+{
+ if (stepping)
+ ptrace_notify(SIGTRAP);
+}
+
#endif /* <linux/tracehook.h> */

2008-07-17 07:31:57

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 11/23] tracehook: tracehook_consider_ignored_signal

This defines tracehook_consider_ignored_signal() has a fine-grained
hook for deciding to prevent the normal short-circuit of sending an
ignored signal, as ptrace does. There is no change, only cleanup.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 19 +++++++++++++++++++
kernel/signal.c | 27 ++++++++++++++++-----------
2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 2d1426f..8cffd34 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -312,4 +312,23 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info,
ptrace_notify(SIGTRAP);
}

+/**
+ * tracehook_consider_ignored_signal - suppress short-circuit of ignored signal
+ * @task: task receiving the signal
+ * @sig: signal number being sent
+ * @handler: %SIG_IGN or %SIG_DFL
+ *
+ * Return zero iff tracing doesn't care to examine this ignored signal,
+ * so it can short-circuit normal delivery and never even get queued.
+ * Either @handler is %SIG_DFL and @sig's default is ignore, or it's %SIG_IGN.
+ *
+ * Called with @task->sighand->siglock held.
+ */
+static inline int tracehook_consider_ignored_signal(struct task_struct *task,
+ int sig,
+ void __user *handler)
+{
+ return (task_ptrace(task) & PT_PTRACED) != 0;
+}
+
#endif /* <linux/tracehook.h> */
diff --git a/kernel/signal.c b/kernel/signal.c
index 02d3a11..f68fb01 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -22,6 +22,7 @@
#include <linux/ptrace.h>
#include <linux/signal.h>
#include <linux/signalfd.h>
+#include <linux/tracehook.h>
#include <linux/capability.h>
#include <linux/freezer.h>
#include <linux/pid_namespace.h>
@@ -39,24 +40,21 @@

static struct kmem_cache *sigqueue_cachep;

-static int __sig_ignored(struct task_struct *t, int sig)
+static void __user *sig_handler(struct task_struct *t, int sig)
{
- void __user *handler;
+ return t->sighand->action[sig - 1].sa.sa_handler;
+}

+static int sig_handler_ignored(void __user *handler, int sig)
+{
/* Is it explicitly or implicitly ignored? */
-
- handler = t->sighand->action[sig - 1].sa.sa_handler;
return handler == SIG_IGN ||
(handler == SIG_DFL && sig_kernel_ignore(sig));
}

static int sig_ignored(struct task_struct *t, int sig)
{
- /*
- * Tracers always want to know about signals..
- */
- if (t->ptrace & PT_PTRACED)
- return 0;
+ void __user *handler;

/*
* Blocked signals are never ignored, since the
@@ -66,7 +64,14 @@ static int sig_ignored(struct task_struct *t, int sig)
if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
return 0;

- return __sig_ignored(t, sig);
+ handler = sig_handler(t, sig);
+ if (!sig_handler_ignored(handler, sig))
+ return 0;
+
+ /*
+ * Tracers may want to know about even ignored signals.
+ */
+ return !tracehook_consider_ignored_signal(t, sig, handler);
}

/*
@@ -2324,7 +2329,7 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
* (for example, SIGCHLD), shall cause the pending signal to
* be discarded, whether or not it is blocked"
*/
- if (__sig_ignored(t, sig)) {
+ if (sig_handler_ignored(sig_handler(t, sig), sig)) {
sigemptyset(&mask);
sigaddset(&mask, sig);
rm_from_queue_full(&mask, &t->signal->shared_pending);

2008-07-17 07:32:28

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 12/23] tracehook: tracehook_consider_fatal_signal

This defines tracehook_consider_fatal_signal() has a fine-grained
hook for deciding to skip the special cases for a fatal signal,
as ptrace does. There is no change, only cleanup.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 21 +++++++++++++++++++++
kernel/signal.c | 9 +++++----
2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 8cffd34..8b4c15e 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -331,4 +331,25 @@ static inline int tracehook_consider_ignored_signal(struct task_struct *task,
return (task_ptrace(task) & PT_PTRACED) != 0;
}

+/**
+ * tracehook_consider_fatal_signal - suppress special handling of fatal signal
+ * @task: task receiving the signal
+ * @sig: signal number being sent
+ * @handler: %SIG_DFL or %SIG_IGN
+ *
+ * Return nonzero to prevent special handling of this termination signal.
+ * Normally @handler is %SIG_DFL. It can be %SIG_IGN if @sig is ignored,
+ * in which case force_sig() is about to reset it to %SIG_DFL.
+ * When this returns zero, this signal might cause a quick termination
+ * that does not give the debugger a chance to intercept the signal.
+ *
+ * Called with or without @task->sighand->siglock held.
+ */
+static inline int tracehook_consider_fatal_signal(struct task_struct *task,
+ int sig,
+ void __user *handler)
+{
+ return (task_ptrace(task) & PT_PTRACED) != 0;
+}
+
#endif /* <linux/tracehook.h> */
diff --git a/kernel/signal.c b/kernel/signal.c
index f68fb01..860afb8 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -300,12 +300,12 @@ flush_signal_handlers(struct task_struct *t, int force_default)

int unhandled_signal(struct task_struct *tsk, int sig)
{
+ void __user *handler = tsk->sighand->action[sig-1].sa.sa_handler;
if (is_global_init(tsk))
return 1;
- if (tsk->ptrace & PT_PTRACED)
+ if (handler != SIG_IGN && handler != SIG_DFL)
return 0;
- return (tsk->sighand->action[sig-1].sa.sa_handler == SIG_IGN) ||
- (tsk->sighand->action[sig-1].sa.sa_handler == SIG_DFL);
+ return !tracehook_consider_fatal_signal(tsk, sig, handler);
}


@@ -770,7 +770,8 @@ static void complete_signal(int sig, struct task_struct *p, int group)
if (sig_fatal(p, sig) &&
!(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
!sigismember(&t->real_blocked, sig) &&
- (sig == SIGKILL || !(t->ptrace & PT_PTRACED))) {
+ (sig == SIGKILL ||
+ !tracehook_consider_fatal_signal(t, sig, SIG_DFL))) {
/*
* This signal will be fatal to the whole group.
*/

2008-07-17 07:32:46

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 09/23] tracehook: tracehook_expect_breakpoints

This adds tracehook_expect_breakpoints() as a formal hook for
the nommu code to use for its, "Is text-poking likely?" check
at mmap time. This names the actual semantics the code means
to test, and documents it.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 15 +++++++++++++++
mm/nommu.c | 4 ++--
2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 6468ca0..e113e09 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -52,6 +52,21 @@
struct linux_binprm;

/**
+ * tracehook_expect_breakpoints - guess if task memory might be touched
+ * @task: current task, making a new mapping
+ *
+ * Return nonzero if @task is expected to want breakpoint insertion in
+ * its memory at some point. A zero return is no guarantee it won't
+ * be done, but this is a hint that it's known to be likely.
+ *
+ * May be called with @task->mm->mmap_sem held for writing.
+ */
+static inline int tracehook_expect_breakpoints(struct task_struct *task)
+{
+ return (task_ptrace(task) & PT_PTRACED) != 0;
+}
+
+/**
* tracehook_unsafe_exec - check for exec declared unsafe due to tracing
* @task: current task doing exec
*
diff --git a/mm/nommu.c b/mm/nommu.c
index 4462b6a..5edccd9 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -22,7 +22,7 @@
#include <linux/pagemap.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
-#include <linux/ptrace.h>
+#include <linux/tracehook.h>
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
#include <linux/mount.h>
@@ -745,7 +745,7 @@ static unsigned long determine_vm_flags(struct file *file,
* it's being traced - otherwise breakpoints set in it may interfere
* with another untraced process
*/
- if ((flags & MAP_PRIVATE) && (current->ptrace & PT_PTRACED))
+ if ((flags & MAP_PRIVATE) && tracehook_expect_breakpoints(current))
vm_flags &= ~VM_MAYSHARE;

return vm_flags;

2008-07-17 07:32:59

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 13/23] tracehook: syscall

This adds standard tracehook.h inlines for arch code to call when
TIF_SYSCALL_TRACE has been set. This replaces having each arch
implement the ptrace guts for its syscall tracing support.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 70 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 8b4c15e..3548694 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -66,6 +66,76 @@ static inline int tracehook_expect_breakpoints(struct task_struct *task)
return (task_ptrace(task) & PT_PTRACED) != 0;
}

+/*
+ * ptrace report for syscall entry and exit looks identical.
+ */
+static inline void ptrace_report_syscall(struct pt_regs *regs)
+{
+ int ptrace = task_ptrace(current);
+
+ if (!(ptrace & PT_PTRACED))
+ return;
+
+ ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));
+
+ /*
+ * this isn't the same as continuing with a signal, but it will do
+ * for normal use. strace only continues with a signal if the
+ * stopping signal is not SIGTRAP. -brl
+ */
+ if (current->exit_code) {
+ send_sig(current->exit_code, current, 1);
+ current->exit_code = 0;
+ }
+}
+
+/**
+ * tracehook_report_syscall_entry - task is about to attempt a system call
+ * @regs: user register state of current task
+ *
+ * This will be called if %TIF_SYSCALL_TRACE has been set, when the
+ * current task has just entered the kernel for a system call.
+ * Full user register state is available here. Changing the values
+ * in @regs can affect the system call number and arguments to be tried.
+ * It is safe to block here, preventing the system call from beginning.
+ *
+ * Returns zero normally, or nonzero if the calling arch code should abort
+ * the system call. That must prevent normal entry so no system call is
+ * made. If @task ever returns to user mode after this, its register state
+ * is unspecified, but should be something harmless like an %ENOSYS error
+ * return.
+ *
+ * Called without locks, just after entering kernel mode.
+ */
+static inline __must_check int tracehook_report_syscall_entry(
+ struct pt_regs *regs)
+{
+ ptrace_report_syscall(regs);
+ return 0;
+}
+
+/**
+ * tracehook_report_syscall_exit - task has just finished a system call
+ * @regs: user register state of current task
+ * @step: nonzero if simulating single-step or block-step
+ *
+ * This will be called if %TIF_SYSCALL_TRACE has been set, when the
+ * current task has just finished an attempted system call. Full
+ * user register state is available here. It is safe to block here,
+ * preventing signals from being processed.
+ *
+ * If @step is nonzero, this report is also in lieu of the normal
+ * trap that would follow the system call instruction because
+ * user_enable_block_step() or user_enable_single_step() was used.
+ * In this case, %TIF_SYSCALL_TRACE might not be set.
+ *
+ * Called without locks, just before checking for pending signals.
+ */
+static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
+{
+ ptrace_report_syscall(regs);
+}
+
/**
* tracehook_unsafe_exec - check for exec declared unsafe due to tracing
* @task: current task doing exec

2008-07-17 07:33:28

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 15/23] tracehook: job control

This defines the tracehook_notify_jctl() hook to formalize
the ptrace effects on the job control notifications.
There is no change, only cleanup.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 20 ++++++++++++++++++++
kernel/signal.c | 10 +++++-----
2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 42a0d7b..6dc428d 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -451,4 +451,24 @@ static inline int tracehook_get_signal(struct task_struct *task,
return 0;
}

+/**
+ * tracehook_notify_jctl - report about job control stop/continue
+ * @notify: nonzero if this is the last thread in the group to stop
+ * @why: %CLD_STOPPED or %CLD_CONTINUED
+ *
+ * This is called when we might call do_notify_parent_cldstop().
+ * It's called when about to stop for job control; we are already in
+ * %TASK_STOPPED state, about to call schedule(). It's also called when
+ * a delayed %CLD_STOPPED or %CLD_CONTINUED report is ready to be made.
+ *
+ * Return nonzero to generate a %SIGCHLD with @why, which is
+ * normal if @notify is nonzero.
+ *
+ * Called with no locks held.
+ */
+static inline int tracehook_notify_jctl(int notify, int why)
+{
+ return notify || (current->ptrace & PT_PTRACED);
+}
+
#endif /* <linux/tracehook.h> */
diff --git a/kernel/signal.c b/kernel/signal.c
index 1766c47..9b8d94a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -605,9 +605,6 @@ static int check_kill_permission(int sig, struct siginfo *info,
return security_task_kill(t, info, sig, 0);
}

-/* forward decl */
-static void do_notify_parent_cldstop(struct task_struct *tsk, int why);
-
/*
* Handle magic process-wide effects of stop/continue signals. Unlike
* the signal actions, these happen immediately at signal-generation
@@ -1629,7 +1626,7 @@ finish_stop(int stop_count)
* a group stop in progress and we are the last to stop,
* report to the parent. When ptraced, every thread reports itself.
*/
- if (stop_count == 0 || (current->ptrace & PT_PTRACED)) {
+ if (tracehook_notify_jctl(stop_count == 0, CLD_STOPPED)) {
read_lock(&tasklist_lock);
do_notify_parent_cldstop(current, CLD_STOPPED);
read_unlock(&tasklist_lock);
@@ -1766,6 +1763,9 @@ relock:
signal->flags &= ~SIGNAL_CLD_MASK;
spin_unlock_irq(&sighand->siglock);

+ if (unlikely(!tracehook_notify_jctl(1, why)))
+ goto relock;
+
read_lock(&tasklist_lock);
do_notify_parent_cldstop(current->group_leader, why);
read_unlock(&tasklist_lock);
@@ -1931,7 +1931,7 @@ void exit_signals(struct task_struct *tsk)
out:
spin_unlock_irq(&tsk->sighand->siglock);

- if (unlikely(group_stop)) {
+ if (unlikely(group_stop) && tracehook_notify_jctl(1, CLD_STOPPED)) {
read_lock(&tasklist_lock);
do_notify_parent_cldstop(tsk, CLD_STOPPED);
read_unlock(&tasklist_lock);

2008-07-17 07:33:47

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 14/23] tracehook: get_signal_to_deliver

This defines the tracehook_get_signal() hook to allow tracing code to
slip in before normal signal dequeuing. This lays the groundwork for
new tracing features that can inject synthetic signals outside the
normal queue or control the disposition of delivered signals. The
calling convention lets tracehook_get_signal() decide both exactly
what will happen and what signal number to report in the handler/exit.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 29 +++++++++++++++++++++++++++++
kernel/signal.c | 38 +++++++++++++++++++++++++++-----------
2 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 3548694..42a0d7b 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -422,4 +422,33 @@ static inline int tracehook_consider_fatal_signal(struct task_struct *task,
return (task_ptrace(task) & PT_PTRACED) != 0;
}

+/**
+ * tracehook_get_signal - deliver synthetic signal to traced task
+ * @task: @current
+ * @regs: task_pt_regs(@current)
+ * @info: details of synthetic signal
+ * @return_ka: sigaction for synthetic signal
+ *
+ * Return zero to check for a real pending signal normally.
+ * Return -1 after releasing the siglock to repeat the check.
+ * Return a signal number to induce an artifical signal delivery,
+ * setting *@info and *@return_ka to specify its details and behavior.
+ *
+ * The @return_ka->sa_handler value controls the disposition of the
+ * signal, no matter the signal number. For %SIG_DFL, the return value
+ * is a representative signal to indicate the behavior (e.g. %SIGTERM
+ * for death, %SIGQUIT for core dump, %SIGSTOP for job control stop,
+ * %SIGTSTP for stop unless in an orphaned pgrp), but the signal number
+ * reported will be @info->si_signo instead.
+ *
+ * Called with @task->sighand->siglock held, before dequeuing pending signals.
+ */
+static inline int tracehook_get_signal(struct task_struct *task,
+ struct pt_regs *regs,
+ siginfo_t *info,
+ struct k_sigaction *return_ka)
+{
+ return 0;
+}
+
#endif /* <linux/tracehook.h> */
diff --git a/kernel/signal.c b/kernel/signal.c
index 860afb8..1766c47 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1779,17 +1779,33 @@ relock:
do_signal_stop(0))
goto relock;

- signr = dequeue_signal(current, &current->blocked, info);
- if (!signr)
- break; /* will return 0 */
+ /*
+ * Tracing can induce an artifical signal and choose sigaction.
+ * The return value in @signr determines the default action,
+ * but @info->si_signo is the signal number we will report.
+ */
+ signr = tracehook_get_signal(current, regs, info, return_ka);
+ if (unlikely(signr < 0))
+ goto relock;
+ if (unlikely(signr != 0))
+ ka = return_ka;
+ else {
+ signr = dequeue_signal(current, &current->blocked,
+ info);

- if (signr != SIGKILL) {
- signr = ptrace_signal(signr, info, regs, cookie);
if (!signr)
- continue;
+ break; /* will return 0 */
+
+ if (signr != SIGKILL) {
+ signr = ptrace_signal(signr, info,
+ regs, cookie);
+ if (!signr)
+ continue;
+ }
+
+ ka = &sighand->action[signr-1];
}

- ka = &sighand->action[signr-1];
if (ka->sa.sa_handler == SIG_IGN) /* Do nothing. */
continue;
if (ka->sa.sa_handler != SIG_DFL) {
@@ -1837,7 +1853,7 @@ relock:
spin_lock_irq(&sighand->siglock);
}

- if (likely(do_signal_stop(signr))) {
+ if (likely(do_signal_stop(info->si_signo))) {
/* It released the siglock. */
goto relock;
}
@@ -1858,7 +1874,7 @@ relock:

if (sig_kernel_coredump(signr)) {
if (print_fatal_signals)
- print_fatal_signal(regs, signr);
+ print_fatal_signal(regs, info->si_signo);
/*
* If it was able to dump core, this kills all
* other threads in the group and synchronizes with
@@ -1867,13 +1883,13 @@ relock:
* first and our do_group_exit call below will use
* that value and ignore the one we pass it.
*/
- do_coredump((long)signr, signr, regs);
+ do_coredump(info->si_signo, info->si_signo, regs);
}

/*
* Death signals, no core dump.
*/
- do_group_exit(signr);
+ do_group_exit(info->si_signo);
/* NOTREACHED */
}
spin_unlock_irq(&sighand->siglock);

2008-07-17 07:34:08

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 16/23] tracehook: death

This moves the ptrace logic in task death (exit_notify) into
tracehook.h inlines. Some code is rearranged slightly to make
things nicer. There is no change, only cleanup.

There is one hook called with the tasklist_lock write-locked, as ptrace
needs. There is also a new hook called after exit_state changes and
without locks. This is a better place for tracing work to be in the
future, since it doesn't delay the whole system with locking.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/sched.h | 2 +-
include/linux/tracehook.h | 52 +++++++++++++++++++++++++++++++++++++++++++++
kernel/exit.c | 26 +++++++--------------
kernel/signal.c | 10 ++++++--
4 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1941d8b..c58e8ab 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1785,7 +1785,7 @@ extern int kill_pid_info_as_uid(int, struct siginfo *, struct pid *, uid_t, uid_
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
extern int kill_proc_info(int, struct siginfo *, pid_t);
-extern void do_notify_parent(struct task_struct *, int);
+extern int do_notify_parent(struct task_struct *, int);
extern void force_sig(int, struct task_struct *);
extern void force_sig_specific(int, struct task_struct *);
extern int send_sig(int, struct task_struct *, int);
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 6dc428d..4c50e1b 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -471,4 +471,56 @@ static inline int tracehook_notify_jctl(int notify, int why)
return notify || (current->ptrace & PT_PTRACED);
}

+/**
+ * tracehook_notify_death - task is dead, ready to notify parent
+ * @task: @current task now exiting
+ * @death_cookie: value to pass to tracehook_report_death()
+ * @group_dead: nonzero if this was the last thread in the group to die
+ *
+ * Return the signal number to send our parent with do_notify_parent(), or
+ * zero to send no signal and leave a zombie, or -1 to self-reap right now.
+ *
+ * Called with write_lock_irq(&tasklist_lock) held.
+ */
+static inline int tracehook_notify_death(struct task_struct *task,
+ void **death_cookie, int group_dead)
+{
+ if (task->exit_signal == -1)
+ return task->ptrace ? SIGCHLD : -1;
+
+ /*
+ * If something other than our normal parent is ptracing us, then
+ * send it a SIGCHLD instead of honoring exit_signal. exit_signal
+ * only has special meaning to our real parent.
+ */
+ if (thread_group_empty(task) && !ptrace_reparented(task))
+ return task->exit_signal;
+
+ return task->ptrace ? SIGCHLD : 0;
+}
+
+/**
+ * tracehook_report_death - task is dead and ready to be reaped
+ * @task: @current task now exiting
+ * @signal: signal number sent to parent, or 0 or -1
+ * @death_cookie: value passed back from tracehook_notify_death()
+ * @group_dead: nonzero if this was the last thread in the group to die
+ *
+ * Thread has just become a zombie or is about to self-reap. If positive,
+ * @signal is the signal number just sent to the parent (usually %SIGCHLD).
+ * If @signal is -1, this thread will self-reap. If @signal is 0, this is
+ * a delayed_group_leader() zombie. The @death_cookie was passed back by
+ * tracehook_notify_death().
+ *
+ * If normal reaping is not inhibited, @task->exit_state might be changing
+ * in parallel.
+ *
+ * Called without locks.
+ */
+static inline void tracehook_report_death(struct task_struct *task,
+ int signal, void *death_cookie,
+ int group_dead)
+{
+}
+
#endif /* <linux/tracehook.h> */
diff --git a/kernel/exit.c b/kernel/exit.c
index a416a84..78c6307 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -861,7 +861,8 @@ static void forget_original_parent(struct task_struct *father)
*/
static void exit_notify(struct task_struct *tsk, int group_dead)
{
- int state;
+ int signal;
+ void *cookie;

/*
* This does two things:
@@ -898,22 +899,11 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
!capable(CAP_KILL))
tsk->exit_signal = SIGCHLD;

- /* If something other than our normal parent is ptracing us, then
- * send it a SIGCHLD instead of honoring exit_signal. exit_signal
- * only has special meaning to our real parent.
- */
- if (!task_detached(tsk) && thread_group_empty(tsk)) {
- int signal = ptrace_reparented(tsk) ?
- SIGCHLD : tsk->exit_signal;
- do_notify_parent(tsk, signal);
- } else if (tsk->ptrace) {
- do_notify_parent(tsk, SIGCHLD);
- }
+ signal = tracehook_notify_death(tsk, &cookie, group_dead);
+ if (signal > 0)
+ signal = do_notify_parent(tsk, signal);

- state = EXIT_ZOMBIE;
- if (task_detached(tsk) && likely(!tsk->ptrace))
- state = EXIT_DEAD;
- tsk->exit_state = state;
+ tsk->exit_state = signal < 0 ? EXIT_DEAD : EXIT_ZOMBIE;

/* mt-exec, de_thread() is waiting for us */
if (thread_group_leader(tsk) &&
@@ -923,8 +913,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)

write_unlock_irq(&tasklist_lock);

+ tracehook_report_death(tsk, signal, cookie, group_dead);
+
/* If the process is dead, release it - nobody will wait for it */
- if (state == EXIT_DEAD)
+ if (signal < 0)
release_task(tsk);
}

diff --git a/kernel/signal.c b/kernel/signal.c
index 9b8d94a..cbec18e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1346,9 +1346,11 @@ static inline void __wake_up_parent(struct task_struct *p,
/*
* Let a parent know about the death of a child.
* For a stopped/continued status change, use do_notify_parent_cldstop instead.
+ *
+ * Returns -1 if our parent ignored us and so we've switched to
+ * self-reaping, or else @sig.
*/
-
-void do_notify_parent(struct task_struct *tsk, int sig)
+int do_notify_parent(struct task_struct *tsk, int sig)
{
struct siginfo info;
unsigned long flags;
@@ -1420,12 +1422,14 @@ void do_notify_parent(struct task_struct *tsk, int sig)
*/
tsk->exit_signal = -1;
if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
- sig = 0;
+ sig = -1;
}
if (valid_signal(sig) && sig > 0)
__group_send_sig_info(sig, &info, tsk->parent);
__wake_up_parent(tsk, tsk->parent);
spin_unlock_irqrestore(&psig->siglock, flags);
+
+ return sig;
}

static void do_notify_parent_cldstop(struct task_struct *tsk, int why)

2008-07-17 07:34:34

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 17/23] tracehook: force signal_pending()

This defines a new hook tracehook_force_sigpending() that lets
tracing code decide to force TIF_SIGPENDING on in recalc_sigpending().

This is not used yet, so it compiles away to nothing for now.
It lays the groundwork for new tracing code that can interrupt
a task synthetically without actually sending a signal.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 14 ++++++++++++++
kernel/signal.c | 4 +++-
2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 4c50e1b..43bc51b 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -423,6 +423,20 @@ static inline int tracehook_consider_fatal_signal(struct task_struct *task,
}

/**
+ * tracehook_force_sigpending - let tracing force signal_pending(current) on
+ *
+ * Called when recomputing our signal_pending() flag. Return nonzero
+ * to force the signal_pending() flag on, so that tracehook_get_signal()
+ * will be called before the next return to user mode.
+ *
+ * Called with @current->sighand->siglock held.
+ */
+static inline int tracehook_force_sigpending(void)
+{
+ return 0;
+}
+
+/**
* tracehook_get_signal - deliver synthetic signal to traced task
* @task: @current
* @regs: task_pt_regs(@current)
diff --git a/kernel/signal.c b/kernel/signal.c
index cbec18e..44b5823 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -134,7 +134,9 @@ void recalc_sigpending_and_wake(struct task_struct *t)

void recalc_sigpending(void)
{
- if (!recalc_sigpending_tsk(current) && !freezing(current))
+ if (unlikely(tracehook_force_sigpending()))
+ set_thread_flag(TIF_SIGPENDING);
+ else if (!recalc_sigpending_tsk(current) && !freezing(current))
clear_thread_flag(TIF_SIGPENDING);

}

2008-07-17 07:34:49

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 19/23] tracehook: asm/syscall.h

This adds asm-generic/syscall.h, which documents what a real
asm-ARCH/syscall.h file should define. This is not used yet,
but will provide all the machine-dependent details of examining
a user system call about to begin, in progress, or just ended.

Each arch should add an asm-ARCH/syscall.h that defines all the
entry points documented in asm-generic/syscall.h, as short inlines
if possible. This lets us write new tracing code that understands
user system call registers, without any new arch-specific work.

Signed-off-by: Roland McGrath <[email protected]>
---
include/asm-generic/syscall.h | 141 +++++++++++++++++++++++++++++++++++++++++
include/linux/tracehook.h | 3 +-
2 files changed, 143 insertions(+), 1 deletions(-)

diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
new file mode 100644
index 0000000..abcf34c
--- /dev/null
+++ b/include/asm-generic/syscall.h
@@ -0,0 +1,141 @@
+/*
+ * Access to user system call parameters and results
+ *
+ * Copyright (C) 2008 Red Hat, Inc. All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * This file is a stub providing documentation for what functions
+ * asm-ARCH/syscall.h files need to define. Most arch definitions
+ * will be simple inlines.
+ *
+ * All of these functions expect to be called with no locks,
+ * and only when the caller is sure that the task of interest
+ * cannot return to user mode while we are looking at it.
+ */
+
+#ifndef _ASM_SYSCALL_H
+#define _ASM_SYSCALL_H 1
+
+struct task_struct;
+struct pt_regs;
+
+/**
+ * syscall_get_nr - find what system call a task is executing
+ * @task: task of interest, must be blocked
+ * @regs: task_pt_regs() of @task
+ *
+ * If @task is executing a system call or is at system call
+ * tracing about to attempt one, returns the system call number.
+ * If @task is not executing a system call, i.e. it's blocked
+ * inside the kernel for a fault or signal, returns -1.
+ *
+ * It's only valid to call this when @task is known to be blocked.
+ */
+long syscall_get_nr(struct task_struct *task, struct pt_regs *regs);
+
+/**
+ * syscall_rollback - roll back registers after an aborted system call
+ * @task: task of interest, must be in system call exit tracing
+ * @regs: task_pt_regs() of @task
+ *
+ * It's only valid to call this when @task is stopped for system
+ * call exit tracing (due to TIF_SYSCALL_TRACE or TIF_SYSCALL_AUDIT),
+ * after tracehook_report_syscall_entry() returned nonzero to prevent
+ * the system call from taking place.
+ *
+ * This rolls back the register state in @regs so it's as if the
+ * system call instruction was a no-op. The registers containing
+ * the system call number and arguments are as they were before the
+ * system call instruction. This may not be the same as what the
+ * register state looked like at system call entry tracing.
+ */
+void syscall_rollback(struct task_struct *task, struct pt_regs *regs);
+
+/**
+ * syscall_get_error - check result of traced system call
+ * @task: task of interest, must be blocked
+ * @regs: task_pt_regs() of @task
+ *
+ * Returns 0 if the system call succeeded, or -ERRORCODE if it failed.
+ *
+ * It's only valid to call this when @task is stopped for tracing on exit
+ * from a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ */
+long syscall_get_error(struct task_struct *task, struct pt_regs *regs);
+
+/**
+ * syscall_get_return_value - get the return value of a traced system call
+ * @task: task of interest, must be blocked
+ * @regs: task_pt_regs() of @task
+ *
+ * Returns the return value of the successful system call.
+ * This value is meaningless if syscall_get_error() returned nonzero.
+ *
+ * It's only valid to call this when @task is stopped for tracing on exit
+ * from a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ */
+long syscall_get_return_value(struct task_struct *task, struct pt_regs *regs);
+
+/**
+ * syscall_set_return_value - change the return value of a traced system call
+ * @task: task of interest, must be blocked
+ * @regs: task_pt_regs() of @task
+ * @error: negative error code, or zero to indicate success
+ * @val: user return value if @error is zero
+ *
+ * This changes the results of the system call that user mode will see.
+ * If @error is zero, the user sees a successful system call with a
+ * return value of @val. If @error is nonzero, it's a negated errno
+ * code; the user sees a failed system call with this errno code.
+ *
+ * It's only valid to call this when @task is stopped for tracing on exit
+ * from a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ */
+void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs,
+ int error, long val);
+
+/**
+ * syscall_get_arguments - extract system call parameter values
+ * @task: task of interest, must be blocked
+ * @regs: task_pt_regs() of @task
+ * @i: argument index [0,5]
+ * @n: number of arguments; n+i must be [1,6].
+ * @args: array filled with argument values
+ *
+ * Fetches @n arguments to the system call starting with the @i'th argument
+ * (from 0 through 5). Argument @i is stored in @args[0], and so on.
+ * An arch inline version is probably optimal when @i and @n are constants.
+ *
+ * It's only valid to call this when @task is stopped for tracing on
+ * entry to a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * It's invalid to call this with @i + @n > 6; we only support system calls
+ * taking up to 6 arguments.
+ */
+void syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
+ unsigned int i, unsigned int n, unsigned long *args);
+
+/**
+ * syscall_set_arguments - change system call parameter value
+ * @task: task of interest, must be in system call entry tracing
+ * @regs: task_pt_regs() of @task
+ * @i: argument index [0,5]
+ * @n: number of arguments; n+i must be [1,6].
+ * @args: array of argument values to store
+ *
+ * Changes @n arguments to the system call starting with the @i'th argument.
+ * @n'th argument to @val. Argument @i gets value @args[0], and so on.
+ * An arch inline version is probably optimal when @i and @n are constants.
+ *
+ * It's only valid to call this when @task is stopped for tracing on
+ * entry to a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * It's invalid to call this with @i + @n > 6; we only support system calls
+ * taking up to 6 arguments.
+ */
+void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+ unsigned int i, unsigned int n,
+ const unsigned long *args);
+
+#endif /* _ASM_SYSCALL_H */
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 32867ab..589f429 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -103,7 +103,8 @@ static inline void ptrace_report_syscall(struct pt_regs *regs)
* the system call. That must prevent normal entry so no system call is
* made. If @task ever returns to user mode after this, its register state
* is unspecified, but should be something harmless like an %ENOSYS error
- * return.
+ * return. It should preserve enough information so that syscall_rollback()
+ * can work (see asm-generic/syscall.h).
*
* Called without locks, just after entering kernel mode.
*/

2008-07-17 07:35:17

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 20/23] tracehook: CONFIG_HAVE_ARCH_TRACEHOOK

This adds the generic HAVE_ARCH_TRACEHOOK kconfig item.
Each arch should add to some Kconfig file:
select HAVE_ARCH_TRACEHOOK
if the arch code uses all the latest hooks to enable newfangled
tracing and debugging code. The comment in arch/Kconfig lists
all the prerequisite arch support. When all these are available,
setting HAVE_ARCH_TRACEHOOK will allow enabling any new features
that depend on the modern arch interfaces.

Signed-off-by: Roland McGrath <[email protected]>
---
arch/Kconfig | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index ad89a33..4641edf 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -37,6 +37,24 @@ config HAVE_KPROBES
config HAVE_KRETPROBES
def_bool n

+#
+# An arch should select this if it provides all these things:
+#
+# task_pt_regs() in asm/processor.h or asm/ptrace.h
+# arch_has_single_step() if there is hardware single-step support
+# arch_has_block_step() if there is hardware block-step support
+# arch_ptrace() and not #define __ARCH_SYS_PTRACE
+# compat_arch_ptrace() and #define __ARCH_WANT_COMPAT_SYS_PTRACE
+# asm/syscall.h supplying asm-generic/syscall.h interface
+# linux/regset.h user_regset interfaces
+# CORE_DUMP_USE_REGSET #define'd in linux/elf.h
+# TIF_SYSCALL_TRACE calls tracehook_report_syscall_{entry,exit}
+# TIF_NOTIFY_RESUME calls tracehook_notify_resume()
+# signal delivery calls tracehook_signal_handler()
+#
+config HAVE_ARCH_TRACEHOOK
+ def_bool n
+
config HAVE_DMA_ATTRS
def_bool n

2008-07-17 07:36:07

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 18/23] tracehook: TIF_NOTIFY_RESUME

This adds tracehook.h inlines to enable a new arch feature in support
of user debugging/tracing. This is not used yet, but it lays the
groundwork for a debugger to be able to wrangle a task that's possibly
running, without interrupting its syscalls in progress.

Each arch should define TIF_NOTIFY_RESUME, and in their entry.S code
treat it much like TIF_SIGPENDING. That is, it causes you to take the
slow path when returning to user mode, where you get the full user-mode
state accessible as for signal handling or ptrace. The arch code
should check TIF_NOTIFY_RESUME after handling TIF_SIGPENDING.
When it's set, clear it and then call tracehook_notify_resume().

In future, tracing code will call set_notify_resume() when it
wants to get a callback in tracehook_notify_resume().

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/tracehook.h | 34 ++++++++++++++++++++++++++++++++++
1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 43bc51b..32867ab 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -537,4 +537,38 @@ static inline void tracehook_report_death(struct task_struct *task,
{
}

+#ifdef TIF_NOTIFY_RESUME
+/**
+ * set_notify_resume - cause tracehook_notify_resume() to be called
+ * @task: task that will call tracehook_notify_resume()
+ *
+ * Calling this arranges that @task will call tracehook_notify_resume()
+ * before returning to user mode. If it's already running in user mode,
+ * it will enter the kernel and call tracehook_notify_resume() soon.
+ * If it's blocked, it will not be woken.
+ */
+static inline void set_notify_resume(struct task_struct *task)
+{
+ if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_RESUME))
+ kick_process(task);
+}
+
+/**
+ * tracehook_notify_resume - report when about to return to user mode
+ * @regs: user-mode registers of @current task
+ *
+ * This is called when %TIF_NOTIFY_RESUME has been set. Now we are
+ * about to return to user mode, and the user state in @regs can be
+ * inspected or adjusted. The caller in arch code has cleared
+ * %TIF_NOTIFY_RESUME before the call. If the flag gets set again
+ * asynchronously, this will be called again before we return to
+ * user mode.
+ *
+ * Called without locks.
+ */
+static inline void tracehook_notify_resume(struct pt_regs *regs)
+{
+}
+#endif /* TIF_NOTIFY_RESUME */
+
#endif /* <linux/tracehook.h> */

2008-07-17 07:36:35

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 22/23] task_current_syscall

This adds the new function task_current_syscall() on machines where the
asm/syscall.h interface is supported (CONFIG_HAVE_ARCH_TRACEHOOK).
It's exported for modules to use in the future. This function safely
samples the state of a blocked thread to collect what system call it
is blocked in, and the six system call argument registers.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/ptrace.h | 4 ++
lib/Makefile | 2 +
lib/syscall.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index ed69c03..fd31756 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -314,6 +314,10 @@ static inline void user_enable_block_step(struct task_struct *task)
#define arch_ptrace_stop(code, info) do { } while (0)
#endif

+extern int task_current_syscall(struct task_struct *target, long *callno,
+ unsigned long args[6], unsigned int maxargs,
+ unsigned long *sp, unsigned long *pc);
+
#endif

#endif
diff --git a/lib/Makefile b/lib/Makefile
index 2c62a9c..0dd88c4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -82,6 +82,8 @@ lib-$(CONFIG_GENERIC_BUG) += bug.o

obj-$(CONFIG_HAVE_LMB) += lmb.o

+obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
+
hostprogs-y := gen_crc32table
clean-files := crc32table.h

diff --git a/lib/syscall.c b/lib/syscall.c
new file mode 100644
index 0000000..a4f7067
--- /dev/null
+++ b/lib/syscall.c
@@ -0,0 +1,75 @@
+#include <linux/ptrace.h>
+#include <linux/sched.h>
+#include <linux/module.h>
+#include <asm/syscall.h>
+
+static int collect_syscall(struct task_struct *target, long *callno,
+ unsigned long args[6], unsigned int maxargs,
+ unsigned long *sp, unsigned long *pc)
+{
+ struct pt_regs *regs = task_pt_regs(target);
+ if (unlikely(!regs))
+ return -EAGAIN;
+
+ *sp = user_stack_pointer(regs);
+ *pc = instruction_pointer(regs);
+
+ *callno = syscall_get_nr(target, regs);
+ if (*callno != -1L && maxargs > 0)
+ syscall_get_arguments(target, regs, 0, maxargs, args);
+
+ return 0;
+}
+
+/**
+ * task_current_syscall - Discover what a blocked task is doing.
+ * @target: thread to examine
+ * @callno: filled with system call number or -1
+ * @args: filled with @maxargs system call arguments
+ * @maxargs: number of elements in @args to fill
+ * @sp: filled with user stack pointer
+ * @pc: filled with user PC
+ *
+ * If @target is blocked in a system call, returns zero with *@callno
+ * set to the the call's number and @args filled in with its arguments.
+ * Registers not used for system call arguments may not be available and
+ * it is not kosher to use &struct user_regset calls while the system
+ * call is still in progress. Note we may get this result if @target
+ * has finished its system call but not yet returned to user mode, such
+ * as when it's stopped for signal handling or syscall exit tracing.
+ *
+ * If @target is blocked in the kernel during a fault or exception,
+ * returns zero with *@callno set to -1 and does not fill in @args.
+ * If so, it's now safe to examine @target using &struct user_regset
+ * get() calls as long as we're sure @target won't return to user mode.
+ *
+ * Returns -%EAGAIN if @target does not remain blocked.
+ *
+ * Returns -%EINVAL if @maxargs is too large (maximum is six).
+ */
+int task_current_syscall(struct task_struct *target, long *callno,
+ unsigned long args[6], unsigned int maxargs,
+ unsigned long *sp, unsigned long *pc)
+{
+ long state;
+ unsigned long ncsw;
+
+ if (unlikely(maxargs > 6))
+ return -EINVAL;
+
+ if (target == current)
+ return collect_syscall(target, callno, args, maxargs, sp, pc);
+
+ state = target->state;
+ if (unlikely(!state))
+ return -EAGAIN;
+
+ ncsw = wait_task_inactive(target, state);
+ if (unlikely(!ncsw) ||
+ unlikely(collect_syscall(target, callno, args, maxargs, sp, pc)) ||
+ unlikely(wait_task_inactive(target, state) != ncsw))
+ return -EAGAIN;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(task_current_syscall);

2008-07-17 07:37:42

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 23/23] /proc/PID/syscall

This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files.
These use task_current_syscall() to show the task's current system call
number and argument registers, stack pointer and PC. For a task blocked
but not in a syscall, the file shows "-1" in place of the syscall number,
followed by only the SP and PC. For a task that's not blocked, it shows
"running".

Signed-off-by: Roland McGrath <[email protected]>
---
fs/proc/base.c | 26 ++++++++++++++++++++++++++
1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6f21daa..6c950c7 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -509,6 +509,26 @@ static int proc_pid_limits(struct task_struct *task, char *buffer)
return count;
}

+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+static int proc_pid_syscall(struct task_struct *task, char *buffer)
+{
+ long nr;
+ unsigned long args[6], sp, pc;
+
+ if (task_current_syscall(task, &nr, args, 6, &sp, &pc))
+ return sprintf(buffer, "running\n");
+
+ if (nr < 0)
+ return sprintf(buffer, "%ld 0x%lx 0x%lx\n", nr, sp, pc);
+
+ return sprintf(buffer,
+ "%ld 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx\n",
+ nr,
+ args[0], args[1], args[2], args[3], args[4], args[5],
+ sp, pc);
+}
+#endif /* CONFIG_HAVE_ARCH_TRACEHOOK */
+
/************************************************************************/
/* Here the fs part begins */
/************************************************************************/
@@ -2425,6 +2445,9 @@ static const struct pid_entry tgid_base_stuff[] = {
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, pid_sched),
#endif
+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+ INF("syscall", S_IRUSR, pid_syscall),
+#endif
INF("cmdline", S_IRUGO, pid_cmdline),
ONE("stat", S_IRUGO, tgid_stat),
ONE("statm", S_IRUGO, pid_statm),
@@ -2757,6 +2780,9 @@ static const struct pid_entry tid_base_stuff[] = {
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, pid_sched),
#endif
+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+ INF("syscall", S_IRUSR, pid_syscall),
+#endif
INF("cmdline", S_IRUGO, pid_cmdline),
ONE("stat", S_IRUGO, tid_stat),
ONE("statm", S_IRUGO, pid_statm),

2008-07-17 07:37:10

by Roland McGrath

[permalink] [raw]
Subject: [PATCH 21/23] tracehook: wait_task_inactive

This extends wait_task_inactive() with a new argument so it
can be used in a "soft" mode where it will check for the task
changing state unexpectedly and back off. There is no change
to existing callers. This lays the groundwork to allow robust,
noninvasive tracing that can try to sample a blocked thread but
back off safely if it wakes up.

Signed-off-by: Roland McGrath <[email protected]>
---
arch/ia64/kernel/perfmon.c | 4 ++--
include/linux/sched.h | 8 ++++++--
kernel/kthread.c | 2 +-
kernel/ptrace.c | 2 +-
kernel/sched.c | 29 +++++++++++++++++++++++++++--
5 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 19d4493..fc8f350 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2626,7 +2626,7 @@ pfm_task_incompatible(pfm_context_t *ctx, struct task_struct *task)
/*
* make sure the task is off any CPU
*/
- wait_task_inactive(task);
+ wait_task_inactive(task, 0);

/* more to come... */

@@ -4774,7 +4774,7 @@ recheck:

UNPROTECT_CTX(ctx, flags);

- wait_task_inactive(task);
+ wait_task_inactive(task, 0);

PROTECT_CTX(ctx, flags);

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c58e8ab..48832a2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1872,9 +1872,13 @@ extern void set_task_comm(struct task_struct *tsk, char *from);
extern char *get_task_comm(char *to, struct task_struct *tsk);

#ifdef CONFIG_SMP
-extern void wait_task_inactive(struct task_struct * p);
+extern unsigned long wait_task_inactive(struct task_struct *, long match_state);
#else
-#define wait_task_inactive(p) do { } while (0)
+static inline unsigned long wait_task_inactive(struct task_struct *p,
+ long match_state)
+{
+ return 1;
+}
#endif

#define next_task(p) list_entry(rcu_dereference((p)->tasks.next), struct task_struct, tasks)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index ac3fb73..88538db 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -176,7 +176,7 @@ void kthread_bind(struct task_struct *k, unsigned int cpu)
return;
}
/* Must have done schedule() in kthread() before we set_task_cpu */
- wait_task_inactive(k);
+ wait_task_inactive(k, 0);
set_task_cpu(k, cpu);
k->cpus_allowed = cpumask_of_cpu(cpu);
k->rt.nr_cpus_allowed = 1;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 8392a9d..082b3fc 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -107,7 +107,7 @@ int ptrace_check_attach(struct task_struct *child, int kill)
read_unlock(&tasklist_lock);

if (!ret && !kill)
- wait_task_inactive(child);
+ ret = wait_task_inactive(child, TASK_TRACED) ? 0 : -ESRCH;

/* All systems go.. */
return ret;
diff --git a/kernel/sched.c b/kernel/sched.c
index 99e6d85..33aec6d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1946,16 +1946,24 @@ migrate_task(struct task_struct *p, int dest_cpu, struct migration_req *req)
/*
* wait_task_inactive - wait for a thread to unschedule.
*
+ * If @match_state is nonzero, it's the @p->state value just checked and
+ * not expected to change. If it changes, i.e. @p might have woken up,
+ * then return zero. When we succeed in waiting for @p to be off its CPU,
+ * we return a positive number (its total switch count). If a second call
+ * a short while later returns the same number, the caller can be sure that
+ * @p has remained unscheduled the whole time.
+ *
* The caller must ensure that the task *will* unschedule sometime soon,
* else this function might spin for a *long* time. This function can't
* be called with interrupts off, or it may introduce deadlock with
* smp_call_function() if an IPI is sent by the same process we are
* waiting to become inactive.
*/
-void wait_task_inactive(struct task_struct *p)
+unsigned long wait_task_inactive(struct task_struct *p, long match_state)
{
unsigned long flags;
int running, on_rq;
+ unsigned long ncsw;
struct rq *rq;

for (;;) {
@@ -1978,8 +1986,11 @@ void wait_task_inactive(struct task_struct *p)
* return false if the runqueue has changed and p
* is actually now running somewhere else!
*/
- while (task_running(rq, p))
+ while (task_running(rq, p)) {
+ if (match_state && unlikely(p->state != match_state))
+ return 0;
cpu_relax();
+ }

/*
* Ok, time to look more closely! We need the rq
@@ -1989,9 +2000,21 @@ void wait_task_inactive(struct task_struct *p)
rq = task_rq_lock(p, &flags);
running = task_running(rq, p);
on_rq = p->se.on_rq;
+ ncsw = 0;
+ if (!match_state || p->state == match_state) {
+ ncsw = p->nivcsw + p->nvcsw;
+ if (unlikely(!ncsw))
+ ncsw = 1;
+ }
task_rq_unlock(rq, &flags);

/*
+ * If it changed from the expected state, bail out now.
+ */
+ if (unlikely(!ncsw))
+ break;
+
+ /*
* Was it really running after all now that we
* checked with the proper locks actually held?
*
@@ -2023,6 +2046,8 @@ void wait_task_inactive(struct task_struct *p)
*/
break;
}
+
+ return ncsw;
}

/***

2008-07-17 07:41:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

On Thu, 17 Jul 2008 00:25:41 -0700 (PDT) Roland McGrath <[email protected]> wrote:

> This patch series introduces the "tracehook" interface layer of inlines
> in <linux/tracehook.h>.
>
> ...
>
> 25 files changed, 1084 insertions(+), 188 deletions(-)

It's a strange time to be sending this.

We're in the middle of a massive dump from linux-next into mainline and
soon we'll be doing a more modest dump of -mm into mainline and I'm
tiptoeing around getting all anxious when people try to sneak
wasnt-in-linux-next stuff into mainline thus wrecking all my junk.

I'm largely in ignore-new-stuff mode as I'm trying to stabilise
the existing stuff for 2.6.27-rc1 and hopefully the subsystem
maintainers are in the same mode.

So, hum. It's not a good time for anyone to be merging new
probably-2.6.28 material like this!

So for now, please let's just review the code and if it does get decent
review and if there are no unresolveable objections, please ask Stephen
to include this tree in linux-next.


That being said, the impact on existing code is fairly modest and we
could perhaps slip it into 2.6.27-rc1. I haven't looked at it yet.

Still. Strange timing!

2008-07-17 08:12:09

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

> It's a strange time to be sending this.

I've never managed to understand how the timing of your cycles works
exactly, and I really just work on the "send it when you can" principle.
To be honest all I really know is that stuff is going in now, and that
I think this stuff of mine is ready to go.

> That being said, the impact on existing code is fairly modest and we
> could perhaps slip it into 2.6.27-rc1. I haven't looked at it yet.

I really hope this can be merged in soon (for 2.6.27). I do think its
impact is low. The sooner this goes in, the sooner arch folks can round
out all their bits and the more time folks like me can spend on actually
making something new and interesting happen in this area.

I am happy to do whatever merging/rebasing work helps that happen the
soonest. If there is a different GIT tree to branch a rebase from,
that is easy for me to do.


Thanks,
Roland

2008-07-17 08:31:23

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

On Thu, 17 Jul 2008 01:11:21 -0700 (PDT) Roland McGrath <[email protected]> wrote:

> > It's a strange time to be sending this.
>
> I've never managed to understand how the timing of your cycles works
> exactly, and I really just work on the "send it when you can" principle.

It's pretty simple - it's just piplining. During the -rc's we
accumulate stuff for the next release in linux-next and in -mm. During
the 2.6.x->2.6.x+1-rc1 merge window we transfer that material into
mainline then the cycle starts again.

So new code which turns up during the merge window is a problem. It
should be reviewed, integrated on top of the already-pending work and
should get some testing in linux-next. But during the patch-merging
chaos nobody has the time or a suitable tree upon which to be doing
that.

> To be honest all I really know is that stuff is going in now, and that
> I think this stuff of mine is ready to go.

Yes, but all the stuff which is going in now has been reviewed and has
had testing in linux-next and/or -mm. (In theory).

> > That being said, the impact on existing code is fairly modest and we
> > could perhaps slip it into 2.6.27-rc1. I haven't looked at it yet.
>
> I really hope this can be merged in soon (for 2.6.27). I do think its
> impact is low. The sooner this goes in, the sooner arch folks can round
> out all their bits and the more time folks like me can spend on actually
> making something new and interesting happen in this area.
>
> I am happy to do whatever merging/rebasing work helps that happen the
> soonest. If there is a different GIT tree to branch a rebase from,
> that is easy for me to do.
>

erk.

Well I got it all merged up relatively easily but I don't know what it
does and I haven't looked at it yet.

It will take some urgent effort from reviewers and, it appears, arch
maintainers to be able to get this over the line. Even then it wom't
have had any linux-next testing. This should all have started weeks
ago!

2008-07-17 08:38:44

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

> It will take some urgent effort from reviewers and, it appears, arch
> maintainers to be able to get this over the line.

All the arch considerations are opt-in, and mostly quite simple. I
have arch patches for review by a few arch maintainers who I don't
think will mind too much. All others can look into it at their
leisure and expect not to get broken.


Thanks,
Roland

2008-07-17 08:51:17

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH 01/23] tracehook: add linux/tracehook.h

On Thu, Jul 17, 2008 at 12:27:55AM -0700, Roland McGrath wrote:
> The aim is to formalize and consolidate all the places that the core
> kernel code and the arch code now ties into the ptrace implementation.
>
> These patches mostly don't cause any functional change. They just
> move the details of ptrace logic out of core code into tracehook.h
> inlines, where they are mostly compiled away to the same as before.

> All that changes is that everything is thoroughly documented

This is fine.

> and any future reworking of ptrace, or addition of something new,
> would not have to touch core code all over, just change the tracehook.h
> inlines.

And this is suprising wish given one can't predict how exactly those
"future reworking" will look like.

> The new linux/ptrace.h inlines are used by the following patches in the
> new tracehook_*() inlines. Using these helpers for the ptrace event
> stops makes it simple to change or disable the old ptrace implementation
> of these stops conditionally later.

Call them "utrace_*" from the start?

> --- a/include/linux/ptrace.h
> +++ b/include/linux/ptrace.h
> @@ -121,6 +121,39 @@ static inline void ptrace_unlink(struct task_struct *child)
> int generic_ptrace_peekdata(struct task_struct *tsk, long addr, long data);
> int generic_ptrace_pokedata(struct task_struct *tsk, long addr, long data);
>
> +/**
> + * task_ptrace - return %PT_* flags that apply to a task
> + * @task: pointer to &task_struct in question
> + *
> + * Returns the %PT_* flags that apply to @task.
> + */
> +static inline int task_ptrace(struct task_struct *task)
> +{
> + return task->ptrace;
> +}

Pointless 1:1 wrapper unless you're going to #ifdef ->ptrace later.

2008-07-17 08:52:19

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

On Thu, 17 Jul 2008 00:25:41 -0700 (PDT) Roland McGrath <[email protected]> wrote:

> This patch series introduces the "tracehook" interface layer of inlines
> in <linux/tracehook.h>.

Looks sane to me from a quick scan.

A ~200 byte increase in i386 allnoconfig .text is liveable with. But
nothing defines CONFIG_HAVE_ARCH_TRACEHOOK yet. What effect will that
have?

I don't like the name! We have ftrace and we have static tracepoints
and we have dynamic tracepoints and we have linux trace toolkit and
whatever is in kernel/trace/trace.c etc, etc. Now this work comes
along with _userspace_ tracing capabilities and it goes and calls it,
of all things, "trace"!

Things would be much less confusing were we to do s/trace/xyzzy/g on
the whole patchset.


Apart from that, I think the other big issue with this patchset is that
it doesn't do anything yet. It's effectively a blank cheque. There's
not a lot of point in merging all this work unless we also merge
something which uses it (is this correct?). And afacit the thing which
will use it is utrace and utrace hasn't been sighted for a year or more
and it has met objections.

If we merge this and then utrace crashes on a rocky shore then there
was no (or little) point in having merged this.

Or am I wrong about that? Does it have sufficient standalone value to
justify a standalone merge (yet alone to justify such a late merge)?

2008-07-17 11:07:06

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH 01/23] tracehook: add linux/tracehook.h

On Thu, 2008-07-17 at 12:48 +0400, Alexey Dobriyan wrote:
> On Thu, Jul 17, 2008 at 12:27:55AM -0700, Roland McGrath wrote:
> > The aim is to formalize and consolidate all the places that the core
> > kernel code and the arch code now ties into the ptrace implementation.
> >
> > These patches mostly don't cause any functional change. They just
> > move the details of ptrace logic out of core code into tracehook.h
> > inlines, where they are mostly compiled away to the same as before.
>
> > All that changes is that everything is thoroughly documented
>
> This is fine.
>
> > and any future reworking of ptrace, or addition of something new,
> > would not have to touch core code all over, just change the tracehook.h
> > inlines.
>
> And this is suprising wish given one can't predict how exactly those
> "future reworking" will look like.
>
> > The new linux/ptrace.h inlines are used by the following patches in the
> > new tracehook_*() inlines. Using these helpers for the ptrace event
> > stops makes it simple to change or disable the old ptrace implementation
> > of these stops conditionally later.
>
> Call them "utrace_*" from the start?

Ah, maybe justified, because I don't expect any other re-implementation
of the same after utrace is finished, but -- there's still the old
ptrace implementation, so _not_ mentioning utrace seems a bit better to
me.

> > --- a/include/linux/ptrace.h
> > +++ b/include/linux/ptrace.h
> > @@ -121,6 +121,39 @@ static inline void ptrace_unlink(struct task_struct *child)
> > int generic_ptrace_peekdata(struct task_struct *tsk, long addr, long data);
> > int generic_ptrace_pokedata(struct task_struct *tsk, long addr, long data);
> >
> > +/**
> > + * task_ptrace - return %PT_* flags that apply to a task
> > + * @task: pointer to &task_struct in question
> > + *
> > + * Returns the %PT_* flags that apply to @task.
> > + */
> > +static inline int task_ptrace(struct task_struct *task)
> > +{
> > + return task->ptrace;
> > +}
>
> Pointless 1:1 wrapper unless you're going to #ifdef ->ptrace later.

And that's exactly what's going to happen. Look at Roland's git. ;)

Petr

2008-07-17 13:35:17

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 01/23] tracehook: add linux/tracehook.h

On Thu, Jul 17, 2008 at 12:48:18PM +0400, Alexey Dobriyan wrote:
> > The new linux/ptrace.h inlines are used by the following patches in the
> > new tracehook_*() inlines. Using these helpers for the ptrace event
> > stops makes it simple to change or disable the old ptrace implementation
> > of these stops conditionally later.
>
> Call them "utrace_*" from the start?

Yes, please.

> Pointless 1:1 wrapper unless you're going to #ifdef ->ptrace later.

Even then I don't think it's a good idea. This one should only be
touched in very very few places

2008-07-17 21:53:54

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH 01/23] tracehook: add linux/tracehook.h

On Thu, Jul 17, 2008 at 01:06:52PM +0200, Petr Tesarik wrote:
> On Thu, 2008-07-17 at 12:48 +0400, Alexey Dobriyan wrote:
> > On Thu, Jul 17, 2008 at 12:27:55AM -0700, Roland McGrath wrote:
> > > The aim is to formalize and consolidate all the places that the core
> > > kernel code and the arch code now ties into the ptrace implementation.
> > >
> > > These patches mostly don't cause any functional change. They just
> > > move the details of ptrace logic out of core code into tracehook.h
> > > inlines, where they are mostly compiled away to the same as before.
> >
> > > All that changes is that everything is thoroughly documented
> >
> > This is fine.
> >
> > > and any future reworking of ptrace, or addition of something new,
> > > would not have to touch core code all over, just change the tracehook.h
> > > inlines.
> >
> > And this is suprising wish given one can't predict how exactly those
> > "future reworking" will look like.
> >
> > > The new linux/ptrace.h inlines are used by the following patches in the
> > > new tracehook_*() inlines. Using these helpers for the ptrace event
> > > stops makes it simple to change or disable the old ptrace implementation
> > > of these stops conditionally later.
> >
> > Call them "utrace_*" from the start?
>
> Ah, maybe justified, because I don't expect any other re-implementation
> of the same after utrace is finished, but -- there's still the old
> ptrace implementation, so _not_ mentioning utrace seems a bit better to
> me.

ptrace(2) will start calling utrace_* hooks and functions.

These tracehooks are generic and utrace is generic as well.

2008-07-17 22:59:26

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH 23/23] /proc/PID/syscall

On Thu, Jul 17, 2008 at 12:31:44AM -0700, Roland McGrath wrote:
> This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files.
> These use task_current_syscall() to show the task's current system call
> number and argument registers, stack pointer and PC.

> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -509,6 +509,26 @@ static int proc_pid_limits(struct task_struct *task, char *buffer)
> return count;
> }
>
> +#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
> +static int proc_pid_syscall(struct task_struct *task, char *buffer)
> +{
> + long nr;
> + unsigned long args[6], sp, pc;
> +
> + if (task_current_syscall(task, &nr, args, 6, &sp, &pc))
> + return sprintf(buffer, "running\n");
> +
> + if (nr < 0)
> + return sprintf(buffer, "%ld 0x%lx 0x%lx\n", nr, sp, pc);
> +
> + return sprintf(buffer,
> + "%ld 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx\n",
> + nr,
> + args[0], args[1], args[2], args[3], args[4], args[5],
> + sp, pc);
> +}
> +#endif /* CONFIG_HAVE_ARCH_TRACEHOOK */

My gut feeling this code needs ptrace_may_access() checks.

2008-07-18 08:08:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook


* Andrew Morton <[email protected]> wrote:

> On Thu, 17 Jul 2008 00:25:41 -0700 (PDT) Roland McGrath <[email protected]> wrote:
>
> > This patch series introduces the "tracehook" interface layer of
> > inlines in <linux/tracehook.h>.
>
> Looks sane to me from a quick scan.

same here.

Reviewed-by: Ingo Molnar <[email protected]>

> A ~200 byte increase in i386 allnoconfig .text is liveable with. But
> nothing defines CONFIG_HAVE_ARCH_TRACEHOOK yet. What effect will that
> have?

this is the second subtle step towards utrace and
next-gen-instrumentation. (regset was the first, by far most risky step)

> I don't like the name! We have ftrace and we have static tracepoints
> and we have dynamic tracepoints and we have linux trace toolkit and
> whatever is in kernel/trace/trace.c etc, etc. Now this work comes
> along with _userspace_ tracing capabilities and it goes and calls it,
> of all things, "trace"!

> Apart from that, I think the other big issue with this patchset is
> that it doesn't do anything yet. It's effectively a blank cheque.
> There's not a lot of point in merging all this work unless we also
> merge something which uses it (is this correct?). And afacit the
> thing which will use it is utrace and utrace hasn't been sighted for a
> year or more and it has met objections.
>
> If we merge this and then utrace crashes on a rocky shore then there
> was no (or little) point in having merged this.
>
> Or am I wrong about that? Does it have sufficient standalone value to
> justify a standalone merge (yet alone to justify such a late merge)?

It has enough standalone value to me - it generally cleans up all things
"abstract kernel events", collects them into a single entity and lets
tracers interact with the kernel (not just passively observe its
events). So it's good for next-gen debuggers too, etc.

And task_current_syscall() avoids us hundreds of crappy hooks in every
syscall handler and gives us in-kernel strace in essence. (it's not just
useful to sample blocked threads - it could also be used by ftrace&co to
sample the currently executing task) That alone makes it worth it IMO
;-)

also in places it cleans up current special-case-for-ptrace code and
makes it shorter - like in kernel/exit.c. Well thought out scheme and
structure - as we've come to expect from Roland.

Ingo

2008-07-18 09:20:44

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

From: Roland McGrath <[email protected]>
Date: Thu, 17 Jul 2008 00:25:41 -0700 (PDT)

> I have done some arch work and will submit this to the arch maintainers
> after the generic tracehook series settles in. For now, that work is
> available in my GIT repositories, and in patch and mbox-of-patches form
> at http://people.redhat.com/roland/utrace/2.6-current/

Thanks for doing this work.

I took a quick look at the sparc64 tracehook support and only spotted
one issue.

You'll need special handling for 32-bit compat tasks when copying the
arguments in. You can't just use the full 64-bit values because they
get 32-bit zero extended by the 32-bit syscall entry code, and this
doesn't propagate into the pt_regs copies.

See arch/sparc64/kernel/syscalls.S:linux_sparc_syscall32

So at a minimum, when _TIF_32BIT is set, you'll need to "u32" cast
the regs->u_regs[] values before assigning them into the unsigned long
array slots.

Actually, it's more complicated than that. We have to sign extend
arguments in some cases, and this is performed by a class of stubs
in arch/sparc64/kernel/sys32.S which then revector to the real
system call handler.

I don't know how you want to handle those cases. Should the tracehook
user know that it's a 32-bit task, what system call it is, and therefore
do the appropriate sign extensions? That doesn't sound right.

So likely we have to properly sign extend those things here in the
asm/syscall.h handlers.

If you look at arch/sparc64/kernel/sys32.S it uses macros and then a
list of cases that could be slightly modified such that they could be
used to build argument sign extension tables. Simply change the
explicit register name arguments to pure numbers (ie. %o0 becomes
plain 0) and adjust the assembler macros accordingly.

Then, in a C file somewhere, you set these macro defines differently
then feed the list of instantiations through them and build the table.

The other arch's with compat support will have this same exact issue.

2008-07-18 11:24:43

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

David Miller <[email protected]> writes:
>
> The other arch's with compat support will have this same exact issue.

On x86-64 the sign extension is in C code, so it would be somewhat more
complicated to do a generic table or converter.

I think the tracer will always need to know if it's tracing 32bit
or 64bit though, because the interfaces are not always 100% compatible.
e.g. I suspect any serious one will need to look at memory
arguments for some cases and those generally differ.

-Andi

2008-07-18 11:32:36

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

From: Andi Kleen <[email protected]>
Date: Fri, 18 Jul 2008 13:24:33 +0200

> David Miller <[email protected]> writes:
> >
> > The other arch's with compat support will have this same exact issue.
>
> On x86-64 the sign extension is in C code, so it would be somewhat more
> complicated to do a generic table or converter.

I was suggesting to use the list of syscalls in the sparc64 code
in a generic spot, making a table indexed by __NR_* number
that the tracehook syscall stuff can use.

But yes, there are other issues such as datastructure layout.

It all depends upon what Roland thinks those arg values obtained by
the asm/syscall.h functions should mean.

2008-07-18 11:57:28

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH 01/23] tracehook: add linux/tracehook.h

On Thu, 2008-07-17 at 09:34 -0400, Christoph Hellwig wrote:
> On Thu, Jul 17, 2008 at 12:48:18PM +0400, Alexey Dobriyan wrote:
> > > The new linux/ptrace.h inlines are used by the following patches in the
> > > new tracehook_*() inlines. Using these helpers for the ptrace event
> > > stops makes it simple to change or disable the old ptrace implementation
> > > of these stops conditionally later.
> >
> > Call them "utrace_*" from the start?
>
> Yes, please.
>
> > Pointless 1:1 wrapper unless you're going to #ifdef ->ptrace later.
>
> Even then I don't think it's a good idea. This one should only be
> touched in very very few places

So, do you suggest putting those #ifdef's in the middle of the function
which uses it? Anyway, this patch should probably be posted later, i.e.
when the other implementation is actually introduced.

My $0.02.

Petr

2008-07-21 08:19:40

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 23/23] /proc/PID/syscall

> My gut feeling this code needs ptrace_may_access() checks.

That would be fine with me. The nodes are readable only by user, which
seems sufficient to me. But I certainly don't object to the stronger check
there. I didn't put too much thought into /proc/pid/syscall, it's just
there to demonstrate and test the internal task_current_syscall() call.


Thanks,
Roland

2008-07-21 08:49:32

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 02/23] tracehook: exec

On Thu, Jul 17, 2008 at 12:28:20AM -0700, Roland McGrath wrote:
> This moves all the ptrace hooks related to exec into tracehook.h inlines.
>
> This also lifts the calls for tracing out of the binfmt load_binary hooks
> into search_binary_handler() after it calls into the binfmt module. This
> change has no effect, since all the binfmt modules' load_binary functions
> did the call at the end on success, and now search_binary_handler() does
> it immediately after return if successful. We consolidate the repeated
> code, and binfmt modules no longer need to import ptrace_notify().

Care to first just consolidate the code from the binary handlers to
exec.c and then restrucure it? Currently mainline doesn't even
have ptrace_event yet so it's hard to verify the code is the same.
It certainly isn't for binfmt_som although that's like just a bugfix
that needs to be documented. Also the two new routines are too large
to be inlined, and I would rather not rely on the guarantee that no
other caller will pop up.

2008-07-21 09:59:48

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

Ingo said very well what I think the intrinsic value is. Indeed, I
think it is of use even if utrace were never merged. Anyone who
decides tomorrow that my crap is useless and whips up a different
plan instead of utrace, will benefit from reading all the kerneldoc
in tracehook.h.

The practical value of the patch series right now is that it makes
life easier for merging and patch juggling. The new utrace patches
are indeed coming too, and I'm expecting some vigorous feedback.
Different versions are likely to bounce around for a while.
Meanwhile, there are some people trying to track latest+utrace via
GIT or patching. The tracehook series makes it so that the utrace
patches proper touch very little core code and few files, so merge
and patch conflicts are rare. I can tell you from doing it a lot
that there are frequently niggling conflicts to fix up in rebasing
the tracehook patches after miscellaneous core kernel changes.
Things requiring changes to tracehook.h are unlikely, but unrelated
changes textually near the tracehook calls that foul automatic patch
merging are common.

I'll do another few days of polish on utrace first too, but one main
reason I posted tracehook alone first was to see how much shrapnel I
took in the face just for this, before getting to the substantive
bits. If this much is accepted, that's enough to make it possible
for utrace or another new thing to be added as a config option.

The new utrace has internal differences from the first version, but
the more essential point here is that it's entirely optional at
config time. Whatever traits utrace has, it's only a new
non-default config option marked EXPERIMENTAL. Once the tracehook
series is accepted, then relative to that, no utrace patches will do
anything at all if CONFIG_UTRACE is not chosen.

Another value is for arch folks. I know at least a few arch
maintainers are interested in adding this support. The generic
tracehook series sets a base on which all the arch support can be
finished up and ready to enable utrace or something different,
whatever it is. If this base is in a canonical shared tree to be
merged into, then any interested arch people can fully complete this
work and push it to their arch's users. This lets people experiment
on utrace (or a replacement) without also juggling arch patches.


Thanks,
Roland

2008-07-21 10:54:46

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

> I took a quick look at the sparc64 tracehook support and only spotted
> one issue.

Thanks for the feedback. The sparc64 (and ia64) code I have is meant
only as a starting point, and there are some unfinished holes. I'd
intended to point you (arch maintainers) at what I whipped up but
leave it to you to decide on the correct arch bits and merge them
eventually. (The x86 and powerpc patches are the only ones I've
really used myself and would submit as claimed to be ready.)

> You'll need special handling for 32-bit compat tasks when copying the
> arguments in. You can't just use the full 64-bit values because they
> get 32-bit zero extended by the 32-bit syscall entry code, and this
> doesn't propagate into the pt_regs copies.

This is true of powerpc too. I guess I'd figured that uses of the
calls would know when it's 32-bit and truncate what they saw. Isn't
that what happens now when a 64-bit ptrace caller uses the 64-bit
getregs flavor on a target task that is actually 32-bit?

But I don't disagree it might well be better for syscall_get_arguments
to truncate values to 32 bits. I don't think there's a need for it to
reflect sign-extended 64-bit values when it's a 32-bit task. Probably
better not to, i.e. see "123 0xffffffff ..." in /proc/pid/syscall on a
64-bit kernel for a 32-bit task as on 32-bit, instead of 32-bit apps
that don't normally notice whether the kernel is 64 or 32 sometimes
seeing "123 0xffffffffffffffff ...".

> Actually, it's more complicated than that. We have to sign extend
> arguments in some cases, and this is performed by a class of stubs
> in arch/sparc64/kernel/sys32.S which then revector to the real
> system call handler.

It's odd that you need sign-extended arguments for all those calls
when other machines don't.

> I don't know how you want to handle those cases. Should the tracehook
> user know that it's a 32-bit task, what system call it is, and therefore
> do the appropriate sign extensions? That doesn't sound right.

Which call are you talking about? For syscall_get_arguments, I think
it's fine to get all values zero-extended from 32 bits (maybe even
fine to get random high bits). The consumer of those values in the
kernel code ought well enough to know whether it's a 32-bit or 64-bit
task's system call--that changes what the call number and arguments
mean, anyway.

For syscall_set_arguments, it's a different question. In that case,
we should expect that whatever original source of the value being
poked in was, it just knows it's poking a 32-bit task and doesn't even
consider that the kernel might be 64 bits. It should poke a 32-bit
value and have the effect that poking that 32-bit value has on a
32-bit kernel.

Anyway, I don't see how this side of it is really an issue at all.
syscall_set_arguments is only valid to use at the syscall entry
tracing point. This is before the dispatch to those stubs doing the
sign extension. (Or it could be used at a point entirely outside of
syscall entry, like on the state just after -ERESTARTSYS handling
to change the args before the user re-executes the syscall.)

> The other arch's with compat support will have this same exact issue.

They don't seem to.


Thanks,
Roland

2008-07-21 15:18:55

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 00/23] tracehook

From: Roland McGrath <[email protected]>
Date: Mon, 21 Jul 2008 03:54:31 -0700 (PDT)

> > The other arch's with compat support will have this same exact issue.
>
> They don't seem to.

Look at S390 and PowerPC, they do the sign extension cases as well.

PowerPC does it with C level stubs, whereas I believe S390 uses a
similar assembler stub scheme as sparc64.

But you are right, this is probably not an issue, and providing 32-bit
truncated incoming args, and letting the tracer do whatever it likes
for outgoing args, should work just fine.